We are excited to welcome Matteo Baldoni from the University of Turin (Italy) to our CEUR-WS.org Management Team. Matteo will be responsible for the AI*IA subseries inside CEUR-WS.org, i.e. for workshop proceedings submitted by members of the Italian Association of Artificial Intelligence (AI*IA).


Regular PDF files do not include all fonts but rather expect that the PDF reader on the device (PC, tablet, …) has those fonts installed locally. This could make PDF files unreadable in the future when they are viewed on devices that do not have those fonts.

PDF/A in its various incarnations promises a way out of this dilemma by including all required font definitions in the PDF file, and restricting certain other elements such as hyperlinks. The great disadvantage is that such PDF files get really big.

My question: Should we move to PDF/A in scientific publishing anyway? If yes: which of the variants?

Or should we rather promote a non-PDF format such as HTML5?

Opinions welcome!


Workshops are typically created when researchers feel the need to discuss some new ideas in a specialized community. The proceedings editors of such workshops are a vital part of the community and it makes perfect sense that they have something to contribute, for example research papers.

The very first volume of CEUR-WS.org (KRDB-94) was co-edited by me and I actually also published a paper there. So, I did this in the past. It wasn’t my greatest paper, I must say.

But times are changing. The number of workshops is growing and I see quite a number of workshops at CEUR-WS.org where a major portion of the published papers is co-authored by one of the editors.

I believe that all such papers are peer-reviewed but still I feel that there is something wrong if editors publish papers in their own proceedings volume.

So my question to the community is: Should we ban such papers in the future?

Cheers: Manfred

Dear CEUR-WS editors and users!

CEUR-WS.org celebrates its 20th anniversary on April 21, 2015.

Actually, the first volume was published already on April 7, 1995 but the instructions on how to submit were published on April 21, 1995. That is the official start of the service.

CEUR-WS.org became possible after Sun Microsystems Germany had donated a powerful server (“Sun SITE”) to the Informatik V institute of Prof. Matthias Jarke at RWTH Aachen, Germany. The Sun SITE got the name “Sun SITE Central Europe” and CEUR-WS became one of its first services. Indeed, the acronym “CEUR” was derived from “Central Europe”. We chose that label because the service was founded in Aachen, which was the residence of Charlemagne around 800 AD and continued to be the place where the medieval emperors were crowned.

After a slow start of the service, CEUR-WS.org now attracts 200 volume submissions per year (figure of 2014) and has become a popular publication channel, in particular for workshops in the computer science domain.

We thank all workshop organizers who published with CEUR-WS.org for their trust, and look forward to improving the service with your support!

Manfred Jeusfeld, founder of CEUR-WS.org

In September 2013, I posted an article here titled “Is a paper just a PDF file”. Scientific articles are frequently based on data sets, program code, or detailed images. The results of the article should be repeatable by other scientists, and this requires convenient access to these artefacts.

I have now set up an experimental directory structure that can cater for these needs. The directory contains the original paper (e.g. as PDF) plus all additional elements. The semantically enhanced index.html file is the entry point.


What do you think about this model? Is there anything that is missing?

Comments are welcome!

Kind greetings, Manfred

The CEUR-WS.org Core Values include “freeness and openness” and a “clear copyright”. With regard to the openness of its data, our current implementation of these two values leads to a stark self-contradiction. Here is why.

Let’s start by revisiting these values:

  1. Freeness and openness: The publication service is free of cost and openly accessible for the academic community. The freeness of costs refers to the main publication service, i.e. to publish a submission that is essentially free of errors.
  2. Clear copyright: The authors shall keep the copyright to their papers. The editors keep the copyright to the proceedings as a whole.

Seems reasonable, doesn’t it? – It does, but only for the papers we publish, not for the metadata about these papers.

I’m starting this discussion in my role as the CEUR-WS.org technical editor. This is so far my personal view, not (yet) the consensus of the CEUR-WS.org team. Part of my mission is working towards the publication of the CEUR-WS.org metadata as Linked Open Data. In particular, I helped to shape the definitions of the 2014 and 2015 Semantic Publishing Challenges to make them a major driver of the technical developments necessary for this mission.

We are an open access publication platform; thus, any paper published with CEUR-WS.org is gold open access. Not only accessing papers, but also publishing them is free of charge.

However, we do not actually publish open content, because the Open Definition states that open content “can be freely used, modified, and shared by anyone for any purpose”. This contradicts the way we are currently implementing the “clear copyright” value: neither paper authors nor volume editors have to grant any permission; they reserve all rights.

By the same argument, the metadata about the papers and workshop volumes is not open. Let’s first discuss why data should be open. According to the Open Knowledge Foundation, there are three common reasons, and all of them apply to scientific publishing:

  1. Transparency: Not only do citizens want to understand what their governments are doing, the members of the scientific community also want to be able to assess the quality of the scientific output of their peers (which is the primary motivation for the Semantic Publishing Challenges).
  2. Releasing social and commercial value: Not only assessing the quality of a workshop series or of a paper, but even finding a good paper about some topic, or finding an expert in some field, requires access to data. By merely being able to download the HTML and PDF files of CEUR-WS.org workshops, it is hard to realise retrieval or quality assessment in practice. It is even harder to deliver additional social and commercial value. To give a concrete example, researchers recently enquired about the possibility of developing a summarization service for our volumes and re-publishing such summaries. This would only be possible with the consent of the copyright owners, i.e. the paper authors; but, to keep the publication process simple, CEUR-WS.org does not ask them for such consent.
  3. Participation and engagement: CEUR-WS.org is participatory, by its third fundamental value (“from scientists for scientists”). Every scientist can participate in CEUR-WS.org by publishing a workshop volume, or contributing their papers to such a volume – but once such a volume is published, participation gets reduced to being able to look at papers.

Now assume you want to open your data – how do you, technically, implement this openness, including transparency, the possibility to add value, and the possibility to participate and engage? The 5 Star Open Data scheme argues that Linked Data is the way to go:

  1. using Web-wide unique identifiers (i.e. URIs) for things (here: papers, proceedings volumes, authors, conferences, etc.) – CEUR-WS.org has been using stable URIs such as http://ceur-ws.org/Vol-1155/ for a long time,
  2. using HTTP URLs for these identifiers so that information about a thing (here, e.g., the table of contents of a proceedings volume) can be downloaded by simply typing its identifier into the browser’s address bar – this is the case at CEUR-WS.org,
  3. providing machine-comprehensible information about things for download from these URLs – this is not the case, as we only serve HTML and PDF designed for human consumption,
  4. providing links to other things so that further information can be discovered – this is not the case, as we leave submitted HTML and PDF files unchanged.

Linked Data principles (1) and (2) are prerequisites for 4-star open data, as is (3), and (4) is a prerequisite for the fifth star. All in all, the CEUR-WS.org papers, published as PDF, gain one star, and the HTML tables of contents gain between one and three stars: you can manipulate them (e.g. enlarge the font size for readability) without proprietary software, but you can only manipulate their presentational aspects; you cannot, e.g., access them like a database to filter papers by topic or by author.
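To illustrate what principles (3) and (4) could look like in practice, here is a minimal Python sketch that describes a volume and its papers as JSON-LD and then filters papers by author. It is not how CEUR-WS.org publishes data: the function names, the paper URIs, titles and author names are hypothetical, and the choice of the Dublin Core vocabulary is an assumption made for the example. Only the volume URI is taken from the text above.

```python
import json

# Dublin Core namespaces (real vocabularies; using them here is an assumption).
DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"


def volume_to_jsonld(volume_uri, title, papers):
    """Describe a volume and its papers as a JSON-LD document.

    `papers` is a list of dicts with the keys "uri", "title" and "authors".
    """
    graph = [{"@id": volume_uri, DC + "title": title}]
    for paper in papers:
        graph.append({
            "@id": paper["uri"],
            DC + "title": paper["title"],
            DC + "creator": list(paper["authors"]),
            # Link each paper to its volume so that clients can follow the
            # link and discover further information (principle 4).
            DCTERMS + "isPartOf": {"@id": volume_uri},
        })
    return {"@graph": graph}


def papers_by_author(doc, author):
    """Return the URIs of all papers that list `author` as a creator."""
    return [
        node["@id"]
        for node in doc["@graph"]
        if author in node.get(DC + "creator", [])
    ]


# Hypothetical example data; only the volume URI appears in the text above.
doc = volume_to_jsonld(
    "http://ceur-ws.org/Vol-1155/",
    "Example Workshop Proceedings",
    [
        {
            "uri": "http://ceur-ws.org/Vol-1155/paper-01.pdf",
            "title": "First Example Paper",
            "authors": ["A. Author", "B. Author"],
        },
        {
            "uri": "http://ceur-ws.org/Vol-1155/paper-02.pdf",
            "title": "Second Example Paper",
            "authors": ["B. Author"],
        },
    ],
)

print(json.dumps(doc, indent=2))
print(papers_by_author(doc, "A. Author"))
```

Once the metadata has this shape, filtering papers by author becomes a one-line query, which is exactly what the HTML tables of contents do not allow.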

After the 2014 Semantic Publishing Challenge, and on the verge of announcing the 2015 Challenge, we are technically ready to publish at least the metadata of all CEUR-WS.org papers as Linked Data. The information extraction tools developed by the participants of the 2014 Challenge, in particular the winning one by Maxim Kolchin and Fedor Kozlov, combined with some scripts for automating the publishing workflow, make it possible.

However, there is a legal obstacle. The editors of the proceedings volumes own the copyright, and in particular CEUR-WS.org never asked for their permission to re-publish derivatives of the metadata of workshops and papers. An RDF representation of a workshop’s table of contents is such a derivative, even if just w.r.t. the technical format, not w.r.t. the content. One may argue that the fact that someone published a paper somewhere is public, non-copyrightable information, and our tables of contents contain little more information than that. One may also argue that others have been publishing derivatives of the CEUR-WS.org metadata for a long time: DBLP indexes a subset of CEUR-WS.org with the consent of the CEUR-WS.org publisher, but actually not with the consent of the copyright owners, i.e. the proceedings editors; it even publishes these derivatives under an open license and makes them available as RDF Linked Data. This is widely regarded as fair use, but DBLP are doing so at their own risk – and would CEUR-WS.org itself want to run such a risk?

To be fair, CEUR-WS.org has been making an effort towards open data and linked data for a while: based on the results of a survey among former editors, the CC0 open data license became mandatory for metadata in 2014 (effective as of volume 1263). The first linked data enthusiasts published a volume annotated with machine-comprehensible RDFa attributes as early as 2009. RDFa became officially supported in 2013, and the ceur-make tool facilitates its generation – but this is still something for technophiles and is used by fewer than 1 out of 10 volume editors.

As a result, most of CEUR-WS.org’s data is neither open nor linked. We could wait until volume 2526, when CC0-licensed metadata will be in the majority, but thorough quality analysis requires a look back into the history of workshops, and the “old” proceedings volumes also still provide the majority of connection points to other linked open datasets, including DBLP, the Semantic Web Dog Food Corpus, COLINDA and even datasets of commercial publishers.

So, what can we do to open and to link the metadata of all volumes < 1263? Note that technically it is possible to partition a linked dataset and to give its different parts different licenses – CC0 for volumes ≥ 1263, and “all rights reserved” for volumes < 1263. The question is whether this is how we want to continue implementing our values.
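The partitioning mentioned above could be sketched as follows in Python. This is only an illustration under stated assumptions: the function names and the sample volume numbers are hypothetical, and the only fact taken from this post is that CC0 applies as of volume 1263.

```python
# First volume whose metadata is CC0-licensed, as stated in this post.
CC0_FIRST_VOLUME = 1263


def license_for(volume_number):
    """Return the metadata license that applies to a volume number."""
    if volume_number >= CC0_FIRST_VOLUME:
        return "CC0"
    return "all rights reserved"


def partition_by_license(volume_numbers):
    """Group volume numbers by the license of their metadata."""
    parts = {}
    for number in volume_numbers:
        parts.setdefault(license_for(number), []).append(number)
    return parts


# Hypothetical sample of volume numbers on both sides of the boundary.
sample = [1, 1155, 1262, 1263, 1300]
print(partition_by_license(sample))
```

Each resulting group could then be published as a separate part of the dataset with its own license statement.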