#PDF/A – Curse or blessing for #openaccess #ceurws

I recently attended the local #openaccess week at our university. My great colleague Thomas gave a presentation on PDF/A and on tools that test for PDF/A compliance. I learned that there are many versions of PDF and also of PDF/A. PDF/A is meant for the long-term archiving of documents. But there isn’t even an agreement on the precise interpretation of the PDF/A rules. For example, PDF/A is very picky about (unencrypted) metadata. So, if you include a PDF (or JPG) image inside a PDF/A document, then different experts have different opinions on whether the embedded image must come with metadata.

Since I myself prefer LaTeX, I was wondering whether the PDF produced by LaTeX is compliant with PDF/A. Well, it usually is NOT compliant. In particular, it seems very difficult to create PDF/A-1 compliant output via LaTeX. The situation is technically better when using MS-Word or LibreOffice. However, even then most PDF documents do not come with proper metadata because authors do not care.

So, what is the value of PDF/A for science when we can hardly produce it? For CEUR-WS.org, we would be interested in facilitating long-term archival. But I am sceptical about the contribution of PDF/A. It is a format from the printer age.

Formats like HTML, SGML, or XHTML may be more promising since their focus is on content rather than on fonts and page layout.

An HTML (or similar) document can directly link to references, data sets, and tools. It may even be queried on its content.
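To make the "queried on its content" point a bit more concrete, here is a minimal sketch in Python that pulls all links out of an HTML fragment using only the standard library. The sample fragment, its example.org URLs, and the names `LinkCollector` and `extract_links` are made up for illustration; a real paper would of course need more careful handling.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from <a> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text chunks seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return all (href, text) pairs found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Hypothetical paper fragment; the URLs are placeholders, not real resources.
SAMPLE = """
<article>
  <p>We build on <a href="https://example.org/paper1">earlier work</a>
  and publish our <a href="https://example.org/data.csv">data set</a>.</p>
</article>
"""

links = extract_links(SAMPLE)
for href, text in links:
    print(href, "->", text)
```

Doing the same with a PDF would first require recovering the text and link annotations from the page layout, which is exactly the kind of detour I would like to avoid.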

What is your view on PDF and PDF/A?



