Assessing the quality of scientific output using data – call for participation in the Semantic Publishing Challenge @ ESWC 2014

The Semantic Publishing Challenge at ESWC 2014 aims to assess the quality of scientific output by translating scientific publication data and metadata into a linked dataset and answering queries over that dataset.

In particular, its Task 1 is concerned with assessing the quality of workshops published with CEUR-WS.org. The challenge is supported by CEUR-WS.org and co-chaired by CEUR-WS.org’s technical editor Christoph Lange.

The basics of participation: implement a tool that translates the HTML tables of contents of the workshop proceedings volumes to a linked dataset, and answer the given queries correctly. Write a 5-page paper that explains your tool. Submit both by 14 March 2014. If your submission is accepted, participate in the challenge on one day between 25–29 May 2014, and hope to win ☺

We would ultimately like to compute quality indicators such as the following:

  • If a workshop series has had a long history, this hints at high quality.
  • If a workshop attracts many submissions (possibly growing over years), it may be of high quality.
  • If a conference attracts many high-quality workshops, it is a high-quality conference.
  • High-quality workshops outside big conferences might be of interest to the organisers of these conferences.
  • If a workshop has a high ratio of invited papers, it may be of low quality (unless there are high-profile invited speakers).
  • A high ratio of submissions (co-)authored by a workshop’s chairs may indicate low quality.
  • A fast publication turnaround (proceedings published quickly after, or even before the workshop) gives an impression of professional organisation, possibly of quality.
  • If person P1 is an invited speaker in a workshop chaired by person P2 and vice versa, these persons might not be good speakers (who would thus contribute to a high-quality workshop), but rather just good friends.
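
Two of the indicators above (the ratio of papers co-authored by chairs, and reciprocal speaker invitations) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Workshop` record and both function names are hypothetical, the data is held in memory rather than queried from a linked dataset, and people are identified by plain name strings.

```python
from dataclasses import dataclass, field


@dataclass
class Workshop:
    """Hypothetical in-memory record of one workshop (not the challenge's data model)."""
    name: str
    chairs: set
    invited_speakers: set = field(default_factory=set)
    papers: list = field(default_factory=list)  # each paper is a set of author names


def chair_authored_ratio(ws):
    """Fraction of papers with at least one workshop chair among the authors."""
    if not ws.papers:
        return 0.0
    by_chairs = sum(1 for authors in ws.papers if authors & ws.chairs)
    return by_chairs / len(ws.papers)


def reciprocal_invitations(workshops):
    """Pairs of people who each chaired a workshop at which the other was invited."""
    invited_by = set()  # (chair, invited speaker) pairs
    for ws in workshops:
        for chair in ws.chairs:
            for speaker in ws.invited_speakers:
                if chair != speaker:
                    invited_by.add((chair, speaker))
    # Keep a pair only if the invitation also exists in the opposite direction.
    return {tuple(sorted(p)) for p in invited_by if (p[1], p[0]) in invited_by}
```

In the challenge itself, such indicators would be approximated by queries over the linked dataset produced by a participant's tool; the sketch only shows the underlying logic.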

The queries that challenge participants are required to answer approximate some of these indicators. Most of the underlying information is not available from existing linked open datasets such as DBLP (see, e.g., this entry for Vol-994), but so far only from the legacy HTML tables of contents published at CEUR-WS.org. This challenge calls for the implementation of a tool that transforms them into machine-friendly linked open data. The resulting dataset shall be published under a licence compatible with that of the workshop proceedings, and will thus expose the information about workshops for future analysis and scientometrics.
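
As a rough sketch of what such a transformation involves, the Python below extracts paper links from a table of contents and emits one N-Triples statement per paper title. It rests on several assumptions that are not taken from the challenge rules: each paper is taken to be a link inside a list item (real proceedings volumes vary and need more robust handling), the base URI is a placeholder, `TocParser` and `toc_to_ntriples` are hypothetical names, and the Dublin Core `title` property is just one possible vocabulary choice.

```python
from html.parser import HTMLParser

# Dublin Core "title" property; the choice of vocabulary is an assumption.
DCT_TITLE = "http://purl.org/dc/terms/title"


class TocParser(HTMLParser):
    """Collects (href, link text) pairs for links found inside <li> elements."""

    def __init__(self):
        super().__init__()
        self.papers = []
        self._in_li = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
        elif tag == "a" and self._in_li:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so the title fits on one line.
            title = " ".join("".join(self._text).split())
            self.papers.append((self._href, title))
            self._href = None
        elif tag == "li":
            self._in_li = False


def toc_to_ntriples(html, base):
    """Return one N-Triples line per paper link, using `base` as the URI prefix."""
    parser = TocParser()
    parser.feed(html)
    lines = []
    for href, title in parser.papers:
        # Escape backslashes and double quotes for the N-Triples literal.
        literal = title.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{base}{href}> <{DCT_TITLE}> "{literal}" .')
    return lines
```

A call such as `toc_to_ntriples(html_text, "http://example.org/Vol-X/")` returns a list of strings that can be written to a `.nt` file. A real submission would also need to capture authors, editors, workshop series and dates in order to answer the challenge queries.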

Similarly, Task 2 calls for creating a linked open dataset from PubMedCentral and other bibliographic databases in order to assess the value of journal articles and their citations. Participants are free to submit solutions to just one task or to both; there will be independent awards for each task.

If you are interested, please check the challenge homepage for further information. We invite you to discuss anything specific to the involvement of CEUR-WS.org in this challenge by posting comments here; for questions about participating in the challenge, however, please subscribe to the challenge’s mailing list.
