#PDF/A – Curse or blessing for #openaccess #ceurws
I recently attended the local #openaccess week at our university. My great colleague Thomas gave a presentation on PDF/A and on tools to test for PDF/A compliance. I learned that there are many versions of PDF and also of PDF/A. PDF/A is meant for the long-term archiving of documents. But there isn’t even an agreement on the precise interpretation of the PDF/A rules. For example, PDF/A is very picky about (unencrypted) metadata. So, if you include a PDF (or JPG) image inside a PDF/A document, then different experts have different opinions on whether the embedded image must come with its own metadata.
Since I prefer LaTeX myself, I was wondering whether the PDF produced by LaTeX is compliant with PDF/A. Well, it usually is NOT compliant. In particular, it seems very difficult to create PDF/A-1 compliant output via LaTeX. The situation is technically better when using MS-Word or LibreOffice. However, even then most PDF documents do not come with proper metadata because authors do not care.
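To get a first impression of what a given PDF claims about itself, one can look for the PDF/A identification fields (`pdfaid:part` and `pdfaid:conformance`) in the document's XMP metadata. The following is only a rough sketch: the function name `claimed_pdfa_level` is made up for illustration, and it assumes the XMP packet is stored uncompressed and in a single-byte-compatible encoding such as UTF-8. It reports the declared level only; it does NOT test compliance, for which a real validator is needed.

```python
import re

# The identification fields may appear as XML attributes (pdfaid:part="1")
# or as XML elements (<pdfaid:part>1</pdfaid:part>); accept both spellings.
PART_RE = re.compile(rb"pdfaid:part\s*(?:=\s*[\"']|>)\s*(\d)")
CONF_RE = re.compile(rb"pdfaid:conformance\s*(?:=\s*[\"']|>)\s*([A-Za-z])")


def claimed_pdfa_level(pdf_bytes):
    """Return the PDF/A level a file declares, e.g. 'PDF/A-1b', or None.

    Heuristic only: scans the raw bytes for the XMP identification fields.
    A declaration says nothing about whether the file really complies.
    """
    part = PART_RE.search(pdf_bytes)
    if part is None:
        return None  # no PDF/A declaration found
    level = part.group(1).decode("ascii")
    conf = CONF_RE.search(pdf_bytes)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level


# Tiny hand-made byte strings standing in for real files (illustrative only).
SAMPLE_ATTR = b'%PDF-1.4 <rdf:Description pdfaid:part="1" pdfaid:conformance="B"/>'
SAMPLE_ELEM = b"<pdfaid:part>2</pdfaid:part><pdfaid:conformance>U</pdfaid:conformance>"
SAMPLE_NONE = b"%PDF-1.5 no identification metadata here"
```

For a real file one would pass the result of reading it in binary mode, for example the bytes of a (hypothetical) `paper.pdf`. A typical PDF straight out of pdflatex would be expected to yield `None` here, since no such declaration is written by default.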
So, what is the value of PDF/A for science when we can hardly produce it? For CEUR-WS.org, we would be interested in facilitating long-term archival. But I am sceptical about the contribution of PDF/A. It is a format from the printer age.
Formats like HTML, SGML, and XHTML may be more promising since their focus is on content rather than on fonts and page layout.
An HTML (or similar) document can directly link to references, data sets, and tools. It may even be queried on its content.
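As a small illustration of what "querying a document on its content" could look like, here is a minimal sketch that lists every link in an HTML paper using only the Python standard library. The class name `LinkCollector`, the helper `extract_links`, and the sample snippet with its example.org URL are all invented for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (target, anchor text) for every <a href="..."> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> element currently open, if any
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return all links of an HTML document as (target, anchor text) pairs."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Made-up fragment of a paper with a data set link and a reference link.
SAMPLE = (
    '<p>We evaluate on the <a href="https://example.org/dataset">benchmark data</a> '
    'and build on <a href="#ref1">[1]</a>.</p>'
)

for target, text in extract_links(SAMPLE):
    print(target, "->", text)
```

The same question asked of a PDF would first require recovering the text and link annotations from a page description, which is exactly the kind of detour a content-oriented format avoids.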
What is your view on PDF and PDF/A?
Manfred