Dear CEUR-WS editors and users!

CEUR-WS.org celebrates its 20th anniversary on April 21, 2015.

Actually, the first volume was already published on April 7, 1995, but the instructions on how to submit were published on April 21, 1995. We count that date as the official start of the service.

CEUR-WS.org became possible after Sun Microsystems Germany donated a powerful server (“Sun SITE”) to the Informatik V institute of Prof. Matthias Jarke at RWTH Aachen, Germany. The Sun SITE got the name “Sun SITE Central Europe” and CEUR-WS became one of its first services. Indeed, the acronym “CEUR” was derived from “Central Europe”. We chose that label because the service was founded in Aachen, which was the residence of Charlemagne around 800 AD and continued to be the place where the medieval emperors were crowned.

After a slow start, CEUR-WS.org now attracts 200 volume submissions per year (2014 figure) and has become a popular publication channel, in particular for workshops in the computer science domain.

We thank all workshop organizers who published with CEUR-WS.org for their trust, and look forward to improving the service with your support!

Manfred Jeusfeld, founder of CEUR-WS.org

In September 2013, I posted an article here titled “Is a paper just a PDF file”. Scientific articles are frequently based on data sets, program code, or detailed images. The results of an article should be repeatable by other scientists, and this requires convenient access to these artefacts.

I have now set up an experimental directory structure that can cater for these needs. The directory contains the original paper (e.g. as PDF) plus all additional elements. The semantically enhanced index.html file is the entry point.

http://ceur-ws.org/Vol-XXX/paperX/

What do you think about this model? Is there anything that is missing?

Comments are welcome!

Kind greetings, Manfred

The CEUR-WS.org Core Values include “freeness and openness” and a “clear copyright”. With regard to the openness of its data, our current implementation of these two values leads to a stark self-contradiction. Here is why.

Let’s start by revisiting these values:

  1. Freeness and openness: The publication service is free of cost and openly accessible for the academic community. The freeness of costs refers to the main publication service, i.e. to publish a submission that is essentially free of errors.
  2. Clear copyright: The authors shall keep the copyright to their papers. The editors keep the copyright to the proceedings as a whole.

Seems reasonable, doesn’t it? – It does, but only for the papers we publish, not for the metadata about these papers.

I’m starting this discussion in my role as the CEUR-WS.org technical editor. So far, this is my personal view, not (yet) the consensus of the CEUR-WS.org team. Part of my mission is working towards the publication of the CEUR-WS.org metadata as Linked Open Data. In particular, I helped to shape the definitions of the 2014 and 2015 Semantic Publishing Challenges to make them a major driver of the technical developments necessary for this mission.

We are an open access publication platform; thus, any paper published with CEUR-WS.org is gold open access. Not only accessing papers, but also publishing them is free of charge.

However, we do not actually publish open content, because the Open Definition defines that open content “can be freely used, modified, and shared by anyone for any purpose”. This contradicts the way we are currently implementing the “clear copyright” value: neither paper authors nor volume editors have to grant any permission; they reserve all rights.

By the same argument, the metadata about the papers and workshop volumes is not open. Let’s first discuss why data should be open. According to the Open Knowledge Foundation, there are three common reasons, and all of them apply to scientific publishing:

  1. Transparency: Not only do citizens want to understand what their governments are doing, the members of the scientific community also want to be able to assess the quality of the scientific output of their peers (which is the primary motivation for the Semantic Publishing Challenges).
  2. Releasing social and commercial value: Not only assessing the quality of a workshop series or of a paper, but even finding a good paper about some topic, or finding an expert in some field, requires access to data. By merely being able to download the HTML and PDF files of CEUR-WS.org workshops, it is hard to realise retrieval or quality assessment in practice. It is even harder to deliver additional social and commercial value. To give a concrete example, researchers recently enquired about the possibility of developing a summarization service for our volumes and re-publishing such summarizations. This would only be possible with the consent of the copyright owners, i.e. the paper authors. To keep the publication process simple, however, CEUR-WS.org does not ask authors for such consent.
  3. Participation and engagement: CEUR-WS.org is participatory, by its third fundamental value (“from scientists for scientists”). Every scientist can participate in CEUR-WS.org by publishing a workshop volume, or contributing their papers to such a volume – but once such a volume is published, participation gets reduced to being able to look at papers.

Now assume you want to open your data – how do you, technically, implement this openness, including transparency, the possibility to add value, and the possibility to participate and engage? The 5 Star Open Data scheme argues that Linked Data is the way to go:

  1. using Web-wide unique identifiers (i.e. URIs) for things (here: papers, proceedings volumes, authors, conferences, etc.) – CEUR-WS.org has been using stable URIs such as http://ceur-ws.org/Vol-1155/ for a long time,
  2. using HTTP URLs for these identifiers so that information about a thing (here, e.g., the table of contents of a proceedings volume) can be downloaded by simply typing its identifier into the browser’s address bar – this is the case at CEUR-WS.org,
  3. providing machine-comprehensible information about things for download from these URLs – this is not the case, as we only serve HTML and PDF designed for human consumption,
  4. providing links to other things so that further information can be discovered – this is not the case, as we leave submitted HTML and PDF files unchanged.

Linked Data principles (1), (2) and (3) are prerequisites for 4-star open data, and (4) is a prerequisite for the fifth star. All in all, the CEUR-WS.org papers, published as PDF, gain one star, and the HTML tables of contents gain between one and three stars: you can manipulate them (e.g. enlarge the font size for readability) without proprietary software, but you can only manipulate their presentational aspects; you cannot, e.g., access them like a database to filter papers by topic or by author.
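
To make principles (3) and (4) more tangible, here is a minimal sketch of how the metadata of a single paper could be written out as machine-comprehensible N-Triples. It is an illustration only, not the CEUR-WS.org implementation: the choice of the Dublin Core Terms vocabulary is an assumption, and the paper URI, the title and the author URI are hypothetical placeholders. Only the volume URI is taken from the text above.

```python
# Illustrative sketch only: the vocabulary choice (Dublin Core Terms) and all
# example values except the volume URI are assumptions, not CEUR-WS.org practice.
DCT = "http://purl.org/dc/terms/"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value):
    """Quote a plain string literal, escaping the characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def paper_triples(volume_uri, paper_uri, title, author_uris):
    """Return one N-Triples line per statement about the paper."""
    triples = [
        (iri(paper_uri), iri(DCT + "title"), literal(title)),
        # Link from the paper to its volume (principle 4: links to other things).
        (iri(paper_uri), iri(DCT + "isPartOf"), iri(volume_uri)),
    ]
    for author_uri in author_uris:
        triples.append((iri(paper_uri), iri(DCT + "creator"), iri(author_uri)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


lines = paper_triples(
    volume_uri="http://ceur-ws.org/Vol-1155/",
    paper_uri="http://ceur-ws.org/Vol-1155/paper-01.pdf",  # hypothetical paper URI
    title='An "example" paper',  # hypothetical title
    author_uris=["http://example.org/person/jane-doe"],  # hypothetical author URI
)
print("\n".join(lines))
```

Unlike an HTML table of contents, statements of this kind can be loaded into a database and filtered by author or volume, which is exactly what the lower star levels do not allow.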

After the 2014 Semantic Publishing Challenge, and on the verge of announcing the 2015 Challenge, we are technically ready to publish at least the metadata of all CEUR-WS.org papers as Linked Data. The information extraction tools developed by the participants of the 2014 Challenge, in particular the winning one by Maxim Kolchin and Fedor Kozlov, combined with some scripts for automating the publishing workflow, make it possible.

However, there is a legal obstacle. The editors of the proceedings volumes own the copyright, and CEUR-WS.org never asked for their permission to re-publish derivatives of the metadata of workshops and papers. An RDF representation of a workshop’s table of contents is such a derivative, even if only w.r.t. the technical format, not w.r.t. the content. One may argue that the fact that someone published a paper somewhere is public, non-copyrightable information, and our tables of contents contain little more information than that. One may also argue that others have been publishing derivatives of the CEUR-WS.org metadata for a long time: DBLP indexes a subset of CEUR-WS.org with the consent of the CEUR-WS.org publisher, but not with the consent of the copyright owners, i.e. the proceedings editors; it even publishes these derivatives under an open license and makes them available as RDF Linked Data. This is widely regarded as fair use, but DBLP are doing so at their own risk – and would CEUR-WS.org itself want to run such a risk?

To be fair, CEUR-WS.org has been making an effort towards open data and linked data for a while: based on the results of a survey among former editors, the CC0 open data license became mandatory for metadata in 2014 (effective as of volume 1263). The first linked data enthusiasts published a volume annotated with machine-comprehensible RDFa attributes as early as 2009. RDFa became officially supported in 2013, and the ceur-make tool facilitates its generation – but this is still something for technophiles and is used by fewer than 1 in 10 volume editors.

As a result, most of CEUR-WS.org’s data is neither open nor linked. We could wait until volume 2526, when CC0-licensed metadata will be in the majority, but thorough quality analysis requires a look back into the history of workshops, and the “old” proceedings volumes also still provide the majority of connection points to other linked open datasets, including DBLP, the Semantic Web Dog Food Corpus, COLINDA and even datasets of commercial publishers.

So, what can we do to open and to link the metadata of all volumes < 1263? Note that technically it is possible to partition a linked dataset and to give its different parts different licenses – CC0 for volumes ≥ 1263, and “all rights reserved” for volumes < 1263. The question is whether this is how we want to continue implementing our values.
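
The partitioning by volume number, and the earlier estimate of when CC0-licensed metadata will be in the majority, can be sketched in a few lines. This is a rough illustration, not an existing CEUR-WS.org tool: the function names are hypothetical, and the share calculation assumes that volume numbers are assigned consecutively without gaps.

```python
# Hypothetical helper names; the threshold comes from the text
# ("effective as of volume 1263").
CC0_FROM_VOLUME = 1263


def metadata_license(volume_number):
    """Return the metadata license that applies to a given volume number."""
    if volume_number < 1:
        raise ValueError("volume numbers start at 1")
    return "CC0" if volume_number >= CC0_FROM_VOLUME else "all rights reserved"


def cc0_share(latest_volume):
    """Fraction of volumes 1..latest_volume whose metadata is CC0-licensed.

    Assumes consecutive volume numbering without gaps.
    """
    cc0_volumes = max(0, latest_volume - CC0_FROM_VOLUME + 1)
    return cc0_volumes / latest_volume


for volume in (1262, 1263, 2526):
    print(volume, metadata_license(volume), f"{cc0_share(volume):.4f}")
```

Under these assumptions the CC0 share only just exceeds one half at volume 2526, which is why waiting for the balance to shift by itself does not help anyone who needs the historical volumes.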

CEUR-WS.org now has a new publisher, Ruzica Piskac, who has been managing editor since 2008. When Ruzica joined, we had just 350 volumes published from 1995 (the year when the service started) to 2008, i.e. in 13 years. Now we have more than 1200 volumes published, and probably the majority of volume publications were handled by Ruzica. I wish you great success in the new role, Ruzica!

CEUR-WS.org now also has a new board called the ‘advisory team’. As I stepped down as publisher, I assume the role of chair of that board. I am happy to welcome two brilliant senior colleagues to this board: Diego Calvanese and Ralf Klamma. Both have a long history of using CEUR-WS.org as either proceedings editors or authors.

Christoph will continue as technical editor in the management team. I am very happy to have you on board! Exciting innovations are about to come!

The full details of the CEUR-WS.org Team are at http://ceur-ws.org/CEURWS-TEAM.html

Kind greetings, Manfred, 2014-10-07

Dear colleagues,

we have been publishing the open-access workshop proceedings series CEUR-WS.org since 1995.
The service is free-of-charge for readers and editors of the proceedings.

Some weeks ago I sent a questionnaire to past editors of workshop proceedings
about their preferences for the licenses for
(1) the metadata of the proceedings, i.e. the bibliographic details
of the papers and the proceedings as a whole
(2) the papers themselves

The relative majority of respondents preferred CC0 (“public domain”) for the metadata
but there was a rather unclear result about the preferred license for the papers.

I would like to bring this question to your attention.

Some OA services prefer a CC-BY license. But the consequence can be harmful!

Suppose that there is a pool of papers, all published with a CC-BY license
and all about some subject, let’s say about R-Trees.

The papers could all come from different sources (OA conference proceedings,
OA journals, OA workshop proceedings).

Now, anyone (let’s call him John Doe) can retrieve a subset of those papers and publish them in
a new book, edited by a person who never talked to the authors or to
the editors of the original publication.

If it is just CC-BY, then John Doe can even slightly change the papers, e.g. omitting
some chapters or some figures. This would not violate the CC-BY license as long as John Doe
includes in his new book a page with the references to the original papers.

John Doe could also include his own (low quality) papers on R-Trees in his new book,
side by side with the peer-reviewed papers that he downloaded.

Now, this would be morally wrong and a violation of scientific standards.
But it shows that CC-BY is rather inappropriate for scientific papers.
Even CC-BY-ND is problematic, since it still allows republication without
the consent of the authors.

Likewise CC-BY-NC is not helping. It may prevent commercial players from
making money from OA papers. But still, a person could corrupt the original
papers by re-publishing them free-of-charge in a FALSE CONTEXT.

The conclusion is: CEUR-WS.org should keep the existing copyright clause
“Copyright © XXXX for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.”

This does not license re-publication.

Comments welcome!

Manfred Jeusfeld

I sent a survey on the preferred license model for papers and for metadata (mostly of bibliographic nature) to former editors of proceedings published at CEUR-WS.org. The results indicate that the preferred license model is CC0 (public domain). The votes so far are:

 

5: no change (i.e. no specific license)

30: CC0 (i.e. public domain)

11: CC4.0-BY

17: ODC1.0-BY

 

The CC0 license is the simplest one. Note that this is only about the metadata, not the papers. So, basically the references to the papers can be used as public domain data. This allows organizations that build indexes of scientific output to use the index files of CEUR-WS.org volumes without further licenses and without asking the editors for permission.
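
As a quick check of the tally above, the following sketch computes each option's share of the votes. The vote counts are taken from the list; the variable names are just illustrative.

```python
# Vote counts as listed above; names are illustrative.
votes = {
    "no change": 5,
    "CC0": 30,
    "CC4.0-BY": 11,
    "ODC1.0-BY": 17,
}

total = sum(votes.values())
shares = {option: count / total for option, count in votes.items()}

# The option with the most votes.
winner = max(votes, key=votes.get)

# Relative majority: strictly more votes than every other single option.
has_relative_majority = all(
    votes[winner] > count for option, count in votes.items() if option != winner
)
# Absolute majority: strictly more than half of all votes cast.
has_absolute_majority = votes[winner] > total / 2

for option, share in shares.items():
    print(f"{option}: {share:.1%}")
print(winner, has_relative_majority, has_absolute_majority)
```

CC0 has more votes than any other single option but fewer than half of all votes cast, so it holds a relative majority rather than an absolute one.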