Blog

Discovering relationships between preprints and journal articles

Dominika Tkaczyk – 2023 December 07

In Preprints, Linking

In the scholarly communications environment, the evolution of a journal article can be traced by the relationships it has with its preprints. Those preprint–journal article relationships are an important component of the research nexus. Some of those relationships are provided by Crossref members (including publishers, universities, research groups, funders, etc.) when they deposit metadata with Crossref, but we know that a significant number of them are missing. To fill this gap, we developed a new automated strategy for discovering relationships between preprints and journal articles and applied it to all the preprints in the Crossref database. We made the resulting dataset, containing both publisher-asserted and automatically discovered relationships, publicly available for anyone to analyse.

Forming new relationships: Contributing to Open source

TL;DR One of the things that makes me glad to work at Crossref is the principles to which we hold ourselves, and the most public and measurable of those must be the Principles of Open Scholarly Infrastructure, or POSI for short. These principles lay out how we want to operate: to be open in our governance, in our membership, and in our source code and data. It’s that openness of source code that’s the reason for my post today: on 26th September 2022, our first collaboration with the JSON Forms open-source project was released into the wild.

Accessibility for Crossref DOI Links: Call for comments on proposed new guidelines

Our entire community – members, metadata users, service providers, community organizations and researchers – create and/or use DOIs in some way, so making them more accessible is a worthy and overdue effort. For the first time in five years and only the second time ever, we are recommending some changes to our DOI display guidelines (the changes aren’t really for display, but more on that below). We don’t take such changes lightly, because we know it means updating established workflows.

With a little help from your Crossref friends: Better metadata

We talk so much about more and better metadata that a reasonable question might be: what is Crossref doing to help? Members and their service partners do the heavy lifting to provide Crossref with metadata, and we don’t change what is supplied to us. One reason we don’t is that members can and often do change their records (important note: updated records do not incur fees!). However, we do a fair amount of behind-the-scenes work to check and report on the metadata as well as to add context and relationships.

Follow the money, or how to link grants to research outputs

The ecosystem of scholarly metadata is filled with relationships between items of various types: a person authored a paper, a paper cites a book, a funder funded research. Those relationships are absolutely essential: an item without them is missing the most basic context about its structure, origin, and impact. No wonder that finding and exposing such relationships is considered very important by virtually all parties involved. Probably the most famous instance of this problem is finding citation links between research outputs. Lately, another instance has been drawing more and more attention: linking research outputs with grants used as their funding source. How can this be done and how many such links can we observe?

Fast, citable feedback: Peer reviews for preprints and other record types

Crossref has supported depositing metadata for preprints since 2016 and for peer reviews since 2018. Now we are putting the two together; in fact, we will permit peer reviews to be registered for any record type.

What if I told you that bibliographic references can be structured?

Last year I spent several weeks studying how to automatically match unstructured references to DOIs (you can read about these experiments in my previous blog posts). But what about references that are not in the form of an unstructured string, but rather a structured collection of metadata fields? Are we matching them, and how? Let’s find out.

A simpler text query form

The Simple Text Query form (STQ) allows users to retrieve existing DOIs for journal articles, books, and chapters by cutting and pasting a reference or reference list into a simple query box. For years the service has been heavily used by students, editors, researchers, and publishers eager to match and link references.

We had changes to the service planned for the first half of this year - an upgraded reference matching algorithm, a more modern interface, etc. In the spirit of openness and transparency, part of our project plan was to communicate these pending changes to STQ users well in advance of our 30 April completion date. What would users think? Could they help us improve upon our plans?

Reference matching: for real this time

In my previous blog post, Matchmaker, matchmaker, make me a match, I compared four approaches for reference matching. The comparison was done using a dataset composed of automatically-generated reference strings. Now it’s time for the matching algorithms to face the real enemy: the unstructured reference strings deposited with Crossref by some members. Are the matching algorithms ready for this challenge? Which algorithm will prove worthy of becoming the guardian of the mighty citation network? Buckle up and enjoy our second matching battle!

Matchmaker, matchmaker, make me a match

Matching (or resolving) bibliographic references to target records in the collection is a crucial algorithm in the Crossref ecosystem. Automatic reference matching lets us discover citation relations in large document collections, calculate citation counts, H-indexes, impact factors, etc. At Crossref, we currently use a matching approach based on reference string parsing. Some time ago we realized there is a much simpler approach. And now it is finally battle time: which of the two approaches is better?