Crossref Labs - Crossref

Feedback on automatic digital preservation and self-healing DOIs

Martin Eve – 2023 September 28

Thank you to everyone who responded with feedback on the Op Cit proposal. This post clarifies, defends, and amends the original proposal in light of the responses that have been sent. We have endeavoured to respond to every point that was raised, either here or in the document comments themselves. We strongly prefer for this to be developed in collaboration with CLOCKSS, LOCKSS, and/or Portico, i.e. through established preservation services that already have existing arrangements in place, are properly funded, and understand the problem space.

Follow the money, or how to link grants to research outputs

Dominika Tkaczyk – 2022 March 22

In Grants, Linking, Crossref Labs

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, a paper cites a book, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact. No wonder that finding and exposing such relationships is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Lately, another instance has been drawing more and more attention: linking research outputs with grants used as their funding source. How can this be done and how many such links can we observe?

Double trouble with DOIs

Dominika Tkaczyk – 2020 March 10

In Crossref Labs, Metadata, Metadata Quality

Detective Matcher stopped abruptly behind the corner of a short building, praying that his loud heartbeat wouldn’t give away his presence. This missing DOI case was unlike any other before, keeping him awake for many seconds already. It took a great effort and a good amount of help from his clever assistant Fuzzy Comparison to make sense of the sparse clues provided by Miss Unstructured Reference, an elegant young lady with a shy smile, who begged him to take up this case at any cost.

What’s your (citations’) style?

Dominika Tkaczyk – 2019 October 29

In Citation, Crossref Labs, Machine Learning

Bibliographic references in scientific papers are the end result of a process that typically involves finding the right document to cite, obtaining its metadata, and formatting the metadata using a specific citation style. This end result, however, does not preserve any information about the citation style used to generate it. Can the citation style be guessed from the reference string alone? TL;DR: I built an automatic citation style classifier. It classifies a given bibliographic reference string into one of 17 citation styles or “unknown”.

What if I told you that bibliographic references can be structured?

Dominika Tkaczyk – 2019 July 08

In Linking, Citation, Crossref Labs, Reference Matching

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

Reference matching: for real this time

Dominika Tkaczyk – 2018 December 18

In Linking, Citation, Crossref Labs, Reference Matching

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Matchmaker, matchmaker, make me a match

Dominika Tkaczyk – 2018 November 12

In Linking, Citation, Crossref Labs, Reference Matching

Matching (or resolving) bibliographic references to target records in the collection is a crucial task in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections and calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?

What does the sample say?

Dominika Tkaczyk – 2018 November 09

In Linking, Citation, Crossref Labs, Reference Matching

At Crossref Labs, we often come across interesting research questions and try to answer them by analyzing our data. Depending on the nature of the experiment, processing over 100M records might be time-consuming or even impossible. In those dark moments we turn to sampling and statistical tools. But what can we infer from only a sample of the data?
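
As a rough illustration of the kind of inference a sample allows, the sketch below estimates a proportion (for example, the share of records that have some property) from a random sample and attaches a 95% confidence interval using the normal approximation. The function names and the boolean-flag representation of records are assumptions made for this example; they are not taken from the post.

```python
import math
import random


def proportion_confidence_interval(sample, z=1.96):
    """Estimate a proportion from a sample of boolean flags.

    Returns (estimate, lower, upper) using the normal approximation;
    z=1.96 corresponds to a 95% confidence level.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must not be empty")
    p = sum(1 for flag in sample if flag) / n
    margin = z * math.sqrt(p * (1 - p) / n)
    # Clamp to [0, 1] because a proportion cannot leave that range.
    return p, max(0.0, p - margin), min(1.0, p + margin)


def estimate_from_population(population, sample_size, seed=0):
    """Draw a simple random sample (without replacement) and estimate from it.

    `population` is a hypothetical list of boolean flags, one per record.
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    return proportion_confidence_interval(sample)
```

The interval narrows in proportion to the square root of the sample size, so quadrupling the sample only halves the margin of error. The normal approximation is also unreliable for very small samples or proportions close to 0 or 1.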

URLs and DOIs: a complicated relationship

Joe Wass – 2016 November 04

In Content Registration, Crossref Labs, Event Data, Identifiers, Persistence

As the linking hub for scholarly content, it’s our job to tame URLs and put in their place something better. Why? Most URLs suffer from link rot and can be created, deleted or changed at any time. And that’s a problem if you’re trying to cite them.

Using AWS S3 as a large key-value store for Chronograph

Joe Wass – 2016 August 02

In Crossref Labs, DOIs, Event Data, Programming, Wikipedia

One of the cool things about working in Crossref Labs is that interesting experiments come up from time to time. One experiment, entitled “what happens if you plot DOI referral domains on a chart?” turned into the Chronograph project. In case you missed it, Chronograph analyses our DOI resolution logs and shows how many times each DOI link was resolved per month, and also how many times a given domain referred traffic to DOI links per day.
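
The two counts described above can be sketched in a few lines. This is only an illustration under stated assumptions: the event tuple layout (ISO timestamp, DOI, referrer URL) and the function name are hypothetical, and nothing here reflects the actual resolution log format or Chronograph's implementation.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse


def count_resolutions(events):
    """Aggregate resolution events into the two Chronograph-style counts.

    `events` is an iterable of (iso_timestamp, doi, referrer_url) tuples,
    a hypothetical layout chosen for this example.
    Returns (per_doi_month, per_domain_day) as Counters keyed by
    (doi, "YYYY-MM") and (domain, "YYYY-MM-DD") respectively.
    """
    per_doi_month = Counter()
    per_domain_day = Counter()
    for timestamp, doi, referrer in events:
        when = datetime.fromisoformat(timestamp)
        # DOIs are case-insensitive, so normalise before counting.
        per_doi_month[(doi.lower(), when.strftime("%Y-%m"))] += 1
        domain = urlparse(referrer).hostname if referrer else None
        if domain:
            # Events without a usable referrer are skipped for this count.
            per_domain_day[(domain, when.strftime("%Y-%m-%d"))] += 1
    return per_doi_month, per_domain_day
```

Keys such as `(doi, month)` flatten naturally into strings, which is one reason this kind of data suits a key-value store.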
