Blog

Tony Hammond

Tony worked alongside Crossref at nature.com between 2006 and 2010.

XMP Primer

Tony Hammond – 2009 June 10

In XMP

There’s a new XMP Primer (PDF) by Ron Roskiewicz (ed. Dianne Kennedy) available from XMP-Open. It is copyrighted 2008 but I only just saw it. This 43-page document provides a very gentle introduction to metadata and the labelling of media, then introduces XMP into the content lifecycle and talks to the business case for using XMP. The primer covers the following areas:

- Introduction to Metadata
- Introduction to XMP
- XMP and the Content Lifecycle
- XMP in Action; Use Cases
- Additional XMP Resources

One small gripe would be that this seems to have been prepared for US letter-sized pages. Although it is printable on A4, there is the slightest of clipping on the right-hand margin, with no real loss of information, but it does confer a sense of “incompleteness”.

Aligning OpenSearch and SRU

Tony Hammond – 2009 June 05

In Search

[Update - 2009.06.07: As pointed out by Todd Carpenter of NISO (see comments below) the phrase “SRU by contrast is an initiative to update Z39.50 for the Web” is inaccurate. I should have said “By contrast SRU is an initiative recognized by ZING (Z39.50 International Next Generation) to bring Z39.50 functionality into the mainstream Web”.]

[Update - 2009.06.08: Bizarrely I find in mentioning query languages below that I omitted to mention SQL. I don’t know what that means. Probably just that there’s no Web-based API. And that again it’s tied to a particular technology - RDBMS.]

[Figure: queryType.png, showing query forms for OpenSearch and SRU and their associated result types.]

There are two well-known public search APIs for generic Web-based search: OpenSearch and SRU. (Note that the key term here is “generic”, so neither Solr/Lucene nor XQuery really qualify for that slot. Also, I am concentrating here on “classic” query languages rather than on semantic query languages such as SPARQL.)

OpenSearch was created by Amazon’s A9.com and is a cheap and cheerful means to interface to a search service by declaring a template URL and returning a structured XML format. It therefore allows for structured result sets while placing no constraints on the query string. As outlined in my earlier post Search Web Service, there is support for search operation control parameters (pagination, encoding, etc.), but no inroads are made into the query string itself which is regarded as opaque.
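To make the template mechanism concrete, here is a minimal sketch of how a client might fill in an OpenSearch template URL. The endpoint and the `fill_template` helper are hypothetical, made up for illustration; only the `{searchTerms}`, `{startPage?}` and `{count?}` parameter names come from OpenSearch itself. Note how the search terms are passed through as one opaque, percent-encoded string.

```python
import re
from urllib.parse import quote

# Hypothetical template URL, of the kind an OpenSearch description declares.
TEMPLATE = "http://example.org/search?q={searchTerms}&page={startPage?}&n={count?}"


def fill_template(template, **values):
    """Substitute {name} and optional {name?} parameters in a template URL."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # The value is treated as opaque text and simply percent-encoded.
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # an optional parameter with no value is left empty
        raise KeyError(f"missing required parameter: {name}")

    return re.sub(r"\{([A-Za-z:]+)(\?)?\}", substitute, template)


url = fill_template(TEMPLATE, searchTerms="dc.title = prism", startPage=2)
print(url)
```

The helper knows nothing about what is inside `searchTerms`; whether that string is a bag of keywords or a structured query is entirely up to the server.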

SRU by contrast is an initiative to update Z39.50 for the Web and is firmly focussed on structured queries and responses. Specifically a query can be expressed in the high-level query language CQL which is independent of any underlying implementation. Result records are returned using any declared W3C XML Schema format and are transported within a defined XML wrapper format for SRU. (Note that the SRU 2.0 draft provides support for arbitrary result formats based on media type.)
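As a rough illustration of the SRU side, the sketch below builds a `searchRetrieve` request URL carrying a CQL query. The endpoint and the `sru_search_url` helper are hypothetical; the parameter names (`operation`, `version`, `query`, `startRecord`, `maximumRecords`, `recordSchema`) are the SRU 1.x ones, and the `dc.title` and `dc.creator` indexes are assumed to be supported by the target server.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoint, used only for illustration.
BASE_URL = "http://example.org/sru"


def sru_search_url(cql, start=1, maximum=10, schema=None, version="1.2"):
    """Build an SRU searchRetrieve URL for the given CQL query string."""
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql,  # structured CQL, not an opaque string
        "startRecord": start,
        "maximumRecords": maximum,
    }
    if schema is not None:
        # Ask for records in a particular declared record schema.
        params["recordSchema"] = schema
    return BASE_URL + "?" + urlencode(params)


sru_url = sru_search_url('dc.title = "search" and dc.creator = "hammond"', maximum=5)
print(sru_url)
```

Unlike the OpenSearch case, the `query` value here has a grammar the server is expected to parse, which is what makes diagnostics about the query itself possible.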

One can summarize the respective OpenSearch and SRU functionalities as in this table:

<table>
<tr>
<th align="center">Structure</th>
<th width="33%" align="center">OpenSearch</th>
<th width="33%" align="center">SRU</th>
</tr>
<tr>
<td>query</td>
<td align="center">no</td>
<td align="center">yes</td>
</tr>
<tr>
<td>results</td>
<td align="center">yes</td>
<td align="center">yes</td>
</tr>
<tr>
<td>control</td>
<td align="center">yes</td>
<td align="center">yes</td>
</tr>
<tr>
<td>diagnostics</td>
<td align="center">no</td>
<td align="center">yes</td>
</tr>
</table>

What I wanted to discuss here was the OpenSearch and SRU interfaces to a Search Web Service such as outlined in my previous post. The diagram at top of this post shows query forms for OpenSearch and SRU and associated result types. The Search Web Service is taken to be exposing an SRU interface. It might be simplest to walk through each of the cases.

(Continues below.)

Search Web Service

Tony Hammond – 2009 May 30

In Search

While the OASIS Search Web Services TC is currently working towards reconciling SRU and OpenSearch, I thought it would be useful to share here a simple graphic outlining how a search web service for structured search might be architected. Basically there are two views of this search web service (described in separate XML description files and discoverable through autodiscovery links added to HTML pages):

Structured Search Using PRISM Elements

Tony Hammond – 2009 May 30

In Search

We just registered in the SRU (Search and Retrieve by URL) search registry the following components:

Context Sets
- PRISM Context Set version 2.0
- PRISM Context Set version 2.1

Schemas
- PRISM Aggregator Message Record Schema Version 2.0
- PRISM Aggregator Message Record Schema Version 2.1

This means that an SRU (Search and Retrieve by URL) search engine that supported one of the PRISM context sets registered above could accept CQL (Contextual Query Language) queries such as the following:

OAI-ORE: Workshop Slides

Tony Hammond – 2009 May 26

In Interoperability

An Overview of the OAI Object Reuse and Exchange Interoperability Framework

This is a very slick presentation by Herbert Van de Sompel on OAI-ORE which he’s due to give today for a workshop at the INFORUM 2009 15th Conference on Professional Information Resources in Prague. It’s on the long side at 167 slides, but even if you just flip through or sample it selectively you’ll be bound to come away with something.

PRISM Aggregator Message

Tony Hammond – 2009 May 08

In Interoperability

The new OAI-PMH interface to Nature.com sports one particular novelty which may well be of interest here: it makes use of the PRISM Aggregator Message. (For an announcement of this service see the post on our web publishing blog Nascent.)

OAI-PMH is a protocol for harvesting metadata records from a digital repository, and its records may be expressed in a variety of different metadata formats. For reasons of interoperability a base metadata format (‘Dublin Core’) is mandated for all OAI-PMH implementations. The expectation is that this base format would be augmented by community-specific vocabularies.
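For readers who have not met OAI-PMH before, here is a minimal sketch of how a harvester might ask a repository for records in a given metadata format. The base URL and the `list_records_url` helper are hypothetical; `verb`, `metadataPrefix`, `set` and `from` are standard OAI-PMH request arguments, and `oai_dc` is the prefix for the mandated Dublin Core format. Any other prefix (such as the `pam` used in the usage line) is an assumption about what a particular repository advertises.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "http://example.org/oai"


def list_records_url(metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # restrict the harvest to one set
    if from_date:
        params["from"] = from_date  # selective harvesting by datestamp
    return BASE_URL + "?" + urlencode(params)


# Base format that every OAI-PMH repository has to support.
print(list_records_url())
# A community-specific format; the "pam" prefix is an assumed example.
print(list_records_url("pam", from_date="2009-05-01"))
```

Switching formats is just a matter of changing `metadataPrefix`, provided the repository lists that prefix in its `ListMetadataFormats` response.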

Our natural inclination was to mirror the article descriptions which we already circulate in our RSS feeds and within our HTML pages (as META tags) and PDF files (as XMP packets). In these cases we have used open data models (e.g. RDF) with simple properties cherry-picked from the DC and PRISM namespaces. But OAI-PMH has a special ‘gotcha’ in this regard: any metadata format must allow for W3C XML Schema validation. That is, the properties need to be constrained by an XSD data model. Enter PRISM Aggregator Message (PAM).

(Continues)

Real PRISM in the RSS Wilds

Tony Hammond – 2009 February 19

In RSS

Alf Eaton just posted a real nice analysis of ticTOCs RSS feeds. Good to see that almost half of the feeds (46%) are now in RDF and that fully a third (34%) are using PRISM metadata to disclose bibliographic fields. The one downside from a Crossref point of view is that these feeds are still using the old PRISM version (1.2) and not the new version (2.0) which was released a year ago and blogged here.

CURIE Syntax 1.0

Tony Hammond – 2009 January 19

In Identifiers

The W3C has recently (Jan. 16) released CURIE Syntax 1.0 as a Candidate Recommendation and is inviting implementations. (Note that I made a fuller post here on CURIEs and erroneously described the Editor’s Draft (Oct. 23, ’08) as a Candidate Recommendation. Well, at least it’s got there now.)

Standard InChI Defined

Tony Hammond – 2009 January 17

In Identifiers, InChI

IUPAC has just released the final version (1.02) of its InChI software, which generates Standard InChIs and Standard InChIKeys. (InChI is the IUPAC International Chemical Identifier.) The Standard InChI “removes options for properties such as tautomerism and stereoconfiguration”, so that a molecule will always generate the same stable identifier - a unique InChI - which facilitates “interoperability/compatibility between large databases/web searching and information exchange”. Note also that any “shortcomings in Standard InChI may be addressed using non-Standard InChI (currently obtainable using InChI version 1.

XMP Library for Flash

Tony Hammond – 2009 January 16

In XMP

Update about new XMP Library from Adobe Labs: “The new Adobe XMP Library for ActionScript is now available for download on Adobe Labs. Adobe Extensible Metadata Platform (XMP) is a labeling technology that allows you to embed data about a file, known as metadata, into the file itself. XMP is an open technology based on RDF and RDF/XML. With this new library you can read existing XMP metadata from Flash based file formats via the Adobe Flash Player.