Author Archives: Christoph Lange

CEUR-WS.org's Core Values include “freeness and openness” and a “clear copyright”. With regard to the openness of its data, our current implementation of these two values leads to a stark self-contradiction. Here is why.

Let’s start by revisiting these values:

  1. Freeness and openness: The publication service is free of cost and openly accessible to the academic community. Freedom from cost refers to the main publication service, i.e. publishing a submission that is essentially free of errors.
  2. Clear copyright: The authors shall keep the copyright to their papers. The editors keep the copyright to the proceedings as a whole.

Seems reasonable, doesn’t it? – It does, but only for the papers we publish, not for the metadata about these papers.

I am starting this discussion in my role as CEUR-WS.org's technical editor. This is so far my personal view, not (yet) the consensus of the team. Part of my mission is working towards the publication of the metadata as Linked Open Data. In particular, I helped to shape the definitions of the 2014 and 2015 Semantic Publishing Challenges to make them a major driver of the technical developments necessary for this mission.

We are an open access publication platform; thus, any paper published with CEUR-WS.org is gold open access. Not only accessing papers, but also publishing them, is free of charge.

We do not actually publish open content, because the Open Definition states that open content “can be freely used, modified, and shared by anyone for any purpose”. This contradicts the way we are currently implementing the “clear copyright” value: neither paper authors nor volume editors have to grant any permission; they reserve all rights.

By the same argument, the metadata about the papers and workshop volumes is not open. Let’s first discuss why data should be open. According to the Open Knowledge Foundation, there are three common reasons, and all of them apply to scientific publishing:

  1. Transparency: Not only do citizens want to understand what their governments are doing, the members of the scientific community also want to be able to assess the quality of the scientific output of their peers (which is the primary motivation for the Semantic Publishing Challenges).
  2. Releasing social and commercial value: Not only assessing the quality of a workshop series or of a paper, but even finding a good paper about some topic, or finding an expert in some field, requires access to data. If one can merely download the HTML and PDF files of workshops, retrieval and quality assessment are hard to realise in practice, and delivering additional social and commercial value is even harder. To give a concrete example, researchers recently enquired about the possibility to develop a summarization service for our volumes and to re-publish such summarizations – which would only be possible with the consent of the copyright owners, i.e. the paper authors, whereas our publication process, to keep things simple, never asks authors for such consent.
  3. Participation and engagement: CEUR-WS.org is participatory, by its third fundamental value (“from scientists for scientists”). Every scientist can participate in CEUR-WS.org by publishing a workshop volume, or by contributing their papers to such a volume – but once such a volume is published, participation is reduced to being able to look at the papers.

Now assume you want to open your data – how do you, technically, implement this openness, including transparency, the possibility to add value, and the possibility to participate and engage? The 5 Star Open Data scheme argues that Linked Data is the way to go:

  1. using Web-wide unique identifiers (i.e. URIs) for things (here: papers, proceedings volumes, authors, conferences, etc.) – CEUR-WS.org has been using stable URIs for a long time,
  2. using HTTP URLs for these identifiers so that information about a thing (here, e.g., the table of contents of a proceedings volume) can be downloaded by simply typing its identifier into the browser’s address bar – this is the case at CEUR-WS.org,
  3. providing machine-comprehensible information about things for download from these URLs – this is not the case, as we only serve HTML and PDF designed for human consumption,
  4. providing links to other things so that further information can be discovered – this is not the case, as we leave submitted HTML and PDF files unchanged.

Linked Data principles (1) and (2) are prerequisites for 4-star open data, so is (3), and (4) is a prerequisite for the fifth star. All in all, the papers, published as PDF, gain one star, and the HTML tables of content gain between one and three stars: you can manipulate them (e.g. enlarge the font size for readability) without proprietary software, but you can only manipulate their presentational aspects; you cannot, e.g., access them like a database to filter papers by topic or by author.
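To make the third and fourth stars concrete, here is a minimal sketch of what machine-comprehensible, linked metadata for a table of contents could look like, emitted as RDF in Turtle syntax. The volume and paper URIs, and the choice of Dublin Core and BIBO terms, are illustrative assumptions, not the platform's actual vocabulary.

```python
# Sketch: emitting a proceedings volume's table of contents as Turtle
# (RDF). URIs and vocabulary terms below are illustrative assumptions.
def toc_to_turtle(volume_uri, volume_title, papers):
    """papers: list of (paper_uri, title, authors) tuples."""
    lines = [
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix bibo: <http://purl.org/ontology/bibo/> .",
        "",
        f"<{volume_uri}> a bibo:Proceedings ;",
        f'    dcterms:title "{volume_title}" .',
    ]
    for uri, title, authors in papers:
        creators = ", ".join(authors)
        lines += [
            "",
            f"<{uri}> a bibo:Article ;",
            f'    dcterms:title "{title}" ;',
            f'    dcterms:creator "{creators}" ;',
            # linking the paper to its volume is what earns the fifth star
            f"    dcterms:isPartOf <{volume_uri}> .",
        ]
    return "\n".join(lines)

print(toc_to_turtle(
    "http://example.org/Vol-1000/",            # hypothetical volume URI
    "Proceedings of an Example Workshop",
    [("http://example.org/Vol-1000/paper1", "An Example Paper", ["Alice"])],
))
```

Serving such a representation alongside the existing HTML (e.g. via HTTP content negotiation) would let clients access the data like a database rather than scraping presentation markup.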

After the 2014 Semantic Publishing Challenge, and on the verge of announcing the 2015 Challenge, we are technically ready to publish at least the metadata of all papers as Linked Data. The information extraction tools developed by the participants of the 2014 Challenge, in particular the winning one by Maxim Kolchin and Fedor Kozlov, combined with some scripts for automating the publishing workflow, make it possible.
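The kind of extraction these tools perform can be sketched as a toy example: pulling paper titles out of an HTML table of contents. The markup below is a deliberately simplified assumption (one list item per paper, title inside a link); real volumes vary far more, which is precisely what made the Challenge non-trivial.

```python
# Toy sketch of table-of-contents extraction; assumes simplified markup.
from html.parser import HTMLParser

class TocParser(HTMLParser):
    """Collects the text content of every <a> element as a paper title."""
    def __init__(self):
        super().__init__()
        self.papers = []       # extracted titles
        self._in_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            title = "".join(self._buffer).strip()
            if title:
                self.papers.append(title)

toc_html = "<ul><li><a href='paper1.pdf'>An Example Paper</a></li></ul>"
parser = TocParser()
parser.feed(toc_html)
print(parser.papers)  # → ['An Example Paper']
```

A production pipeline would additionally extract authors, page ranges, and workshop metadata, and emit the result as RDF rather than a Python list.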

However, there is a legal obstacle. The editors of the proceedings volumes own the copyright, and we have never asked their permission to re-publish derivatives of the metadata of workshops and papers. An RDF representation of a workshop’s table of contents is such a derivative, even if just w.r.t. the technical format, not w.r.t. the content. One may argue that the fact that someone published a paper somewhere is public, non-copyrightable information, and our tables of contents contain little more information than that. One may also argue that others have been publishing derivatives of the metadata for a long time: DBLP indexes a subset of CEUR-WS.org with the consent of the publisher, but not with the consent of the copyright owners, i.e. the proceedings editors; it even publishes these derivatives under an open license and makes them available as RDF Linked Data. This is widely regarded as fair use, but DBLP are doing so at their own risk – and would CEUR-WS.org itself want to run such a risk?

To be fair, CEUR-WS.org has been making an effort towards open data and linked data for a while: based on the results of a survey among former editors, the CC0 open data license became mandatory for metadata in 2014 (effective as of volume 1263). The first linked data enthusiasts published a volume annotated with machine-comprehensible RDFa attributes as early as 2009. RDFa became officially supported in 2013, and the ceur-make tool facilitates its generation – but this is still something for technophiles, used by fewer than 1 in 10 volume editors.

As a result, most of CEUR-WS.org’s data is neither open nor linked. We could wait until volume 2526, when CC0-licensed metadata will be in the majority, but thorough quality analysis requires a look back into the history of workshops, and the “old” proceedings volumes also still provide the majority of connection points to other linked open datasets, including DBLP, the Semantic Web Dog Food Corpus, COLINDA and even datasets of commercial publishers.

So, what can we do to open and to link the metadata of all volumes < 1263? Note that technically it is possible to partition a linked dataset and to give its different parts different licenses – CC0 for volumes ≥ 1263, and “all rights reserved” for volumes < 1263. The question is whether this is how we want to continue implementing our values.
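Such a partitioning rule is trivial to express in code. The sketch below follows the split stated above (CC0 effective as of volume 1263); the CC0 URI is the standard Creative Commons identifier, while representing "all rights reserved" as the absence of a license statement is an assumption of this sketch.

```python
# Sketch: per-volume license assignment in a partitioned linked dataset.
CC0 = "http://creativecommons.org/publicdomain/zero/1.0/"

def license_for_volume(volume_number):
    """Return the license URI for a volume, or None for "all rights reserved"
    (i.e. no license triple would be emitted for that part of the dataset)."""
    return CC0 if volume_number >= 1263 else None

print(license_for_volume(1263))  # → the CC0 URI
print(license_for_volume(1262))  # → None (all rights reserved)
```

The legal question the post raises remains, of course: code can partition the data, but only the copyright owners can open the closed part.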

The Semantic Publishing Challenge at ESWC 2014 has the general objective of assessing the quality of scientific output by translating scientific publication data and metadata into a linked dataset and answering queries over the latter.

In particular, its Task 1 is concerned with assessing the quality of workshops published with CEUR-WS.org. The challenge is supported by CEUR-WS.org and co-chaired by its technical editor Christoph Lange.

The basics of participation: implement a tool that translates the HTML tables of contents of the workshop proceedings volumes to a linked dataset, and answer the given queries correctly. Write a 5-page paper that explains your tool. Submit both by 14 March 2014. If your submission is accepted, participate in the challenge on one day between 25–29 May 2014, and hope to win ☺
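The "answer the given queries" part can be illustrated over a toy triple set instead of a real RDF store. The triples and the query below are illustrative assumptions about what an extracted dataset might contain; the actual Challenge queries are defined by its organisers.

```python
# Toy sketch: querying an extracted dataset, here a plain list of
# (subject, predicate, object) triples. Data below is illustrative.
triples = [
    ("Vol-1000/paper1", "dcterms:creator", "Alice"),
    ("Vol-1000/paper1", "dcterms:title", "An Example Paper"),
    ("Vol-1000/paper2", "dcterms:creator", "Bob"),
]

def papers_by_author(triples, author):
    """All subjects that have the given author as dcterms:creator."""
    return [s for (s, p, o) in triples
            if p == "dcterms:creator" and o == author]

print(papers_by_author(triples, "Alice"))  # → ['Vol-1000/paper1']
```

A real submission would load the extracted RDF into a triple store and answer the evaluation queries in SPARQL, but the principle is the same: once the tables of contents are data rather than presentation markup, such questions become one query each.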