# PDF or PDF/A file? - (Moved)

Solved by bdr529.



Hello, is it possible to use AutoIt to determine whether a file is in PDF or PDF/A (PDF/A-1, -2, or -3) format?

##### Share on other sites

Have you reviewed this thread?

##### Share on other sites

Moved to the appropriate AutoIt General Help and Support forum, as the Developer General Discussion forum very clearly states:

Quote

General development and scripting discussions.

Do not create AutoIt-related topics here, use the AutoIt General Help and Support or AutoIt Technical Discussion forums.

Moderation Team


##### Share on other sites
12 hours ago, Danp2 said:

Have you reviewed this thread?

Yes, but the thread does not contain PDF/A...

##### Share on other sites

I assume you tried running the code. What were the results?

##### Share on other sites
1 hour ago, Danp2 said:

I assume you tried running the code. What were the results?

The code checks whether the file is a PDF (the version, e.g. 1.5 or 1.7, appears at the very beginning of a PDF file), but it cannot tell the difference between PDF and PDF/A.
I ran it, and the result was: "PDF-File detected" 🙂
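
For reference, the header check described here can be sketched in a few lines. This is only a minimal illustration, not the code from the linked thread: `_PdfHeaderVersion` and `example.pdf` are hypothetical names, and the sketch assumes the `%PDF-x.y` marker appears within the first 1024 bytes of the file.

```autoit
; Sketch: return the version from the "%PDF-x.y" header of a file.
; _PdfHeaderVersion is a hypothetical helper name used for illustration only.
Func _PdfHeaderVersion($sPath)
    Local $hFile = FileOpen($sPath, 16) ; 16 = binary mode
    If $hFile = -1 Then Return SetError(1, 0, "") ; file could not be opened
    ; Assumption: the header sits within the first 1024 bytes.
    Local $sHead = BinaryToString(FileRead($hFile, 1024))
    FileClose($hFile)
    ; Flag 1 returns an array of captured groups, or sets @error if nothing matches.
    Local $aMatch = StringRegExp($sHead, "%PDF-(\d\.\d)", 1)
    If @error Then Return SetError(2, 0, "") ; no PDF header found
    Return $aMatch[0]
EndFunc

ConsoleWrite(_PdfHeaderVersion("example.pdf") & @CRLF)
```

A header like this says nothing about PDF/A, because a PDF/A file starts with the same kind of version marker as any other PDF.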

##### Share on other sites

When the original file comes from Word, I can recognize PDF/A from parts of the PDF file.
When it contains graphics, there are "Chinese characters" inside the PDF...

##### Share on other sites

Validating that a document is PDF/A is kind of a black-magic test. There are test suites that check the contents of a PDF against the standard, but the standard is kind of nebulous in its definitions. The most reliable (IMHO) is from VeraPDF (https://verapdf.org/software/). It's Java-based but comes with a decent command-line tool.

The command line:

verapdf.bat --format text "c:\path\to\yourfile.PDF"

will simplify the output to pass/fail.
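
If the command line works on the target PC, the call could be wrapped in AutoIt roughly as follows. This is a sketch under assumptions, not a tested integration: `_VeraPdfPasses` and `$sVeraDir` are hypothetical names, `verapdf.bat` is assumed to sit in `$sVeraDir`, and the text report is assumed to begin each result line with `PASS` or `FAIL`.

```autoit
#include <AutoItConstants.au3>

; Sketch: run the veraPDF command line and look for a PASS line in its text output.
; Assumptions: verapdf.bat is in $sVeraDir, and each result line of the
; "--format text" report starts with PASS or FAIL.
Func _VeraPdfPasses($sVeraDir, $sPdf)
    Local $sCmd = @ComSpec & ' /c verapdf.bat --format text "' & $sPdf & '"'
    Local $iPid = Run($sCmd, $sVeraDir, @SW_HIDE, $STDOUT_CHILD)
    If $iPid = 0 Then Return SetError(1, 0, False) ; the process could not be started
    ProcessWaitClose($iPid)
    Local $sOut = StdoutRead($iPid) ; collect everything the tool printed
    ; With the default flag, StringRegExp returns 1 on a match and 0 otherwise.
    Return StringRegExp($sOut, "(?m)^PASS\b") = 1
EndFunc
```

Check the actual output of your veraPDF version before relying on the `PASS` pattern.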

##### Share on other sites
Thanks. The GUI works, but the command prompt says "access denied"...
I'm writing software for a specific professional group of users to sign a set of PDF files. Due to a change in the law we have to use PDF/A, so I need a check. Since my software runs on many other PCs, I'm looking for a solution where I can write some code in AutoIt and send it out as an update...
I can see that it probably won't be easy 🙂

##### Share on other sites

Not sure why you'd get an access denied in any of it. The applet isn't really "installed," just kind of copied to your user profile. Maybe a Java issue?

The VeraPDF test suite is actually open source (GPL3/MPL2), so if you had the time and talent (unlike me!) you could compile your own version of the test or convert it to your language of choice. See https://github.com/verapdf. Now that I think on it, since it's so liberally licensed, you might even be able to bundle it with your app, as long as some form of Java runtime is present on the PC as well (a custom/mini build of OpenJDK would work to get around Oracle's fees for business/enterprise use).

##### Share on other sites
• Solution
#include <String.au3>

MsgBox(0, "", check_pdfa("AutoIt_Featured_640x480.pdf"))

; Returns the PDF version, followed by " PDF/A-<part><conformance>" when the XMP metadata declares it.
; @error: 0 = PDF/A declared, 1 = no valid PDF/A declaration found, 2 = invalid combination "1u".
Func check_pdfa($file_init_pdf)
    Local $fileopen = FileOpen($file_init_pdf, 16) ; 16 = binary mode
    Local $fileread = BinaryToString(FileRead($fileopen))
    FileClose($fileopen)
    Local $versione_pdf = StringMid($fileread, 2, 7) ; e.g. "PDF-1.7" from "%PDF-1.7"

    ; The values may be written as attributes with single quotes...
    Local $_StringBetween_part = _StringBetween($fileread, "pdfaid:part='", "'")
    Local $_StringBetween_conformance = _StringBetween($fileread, "pdfaid:conformance='", "'")
    ; ...as attributes with double quotes...
    If Not IsArray($_StringBetween_part) Or Not IsArray($_StringBetween_conformance) Then
        $_StringBetween_part = _StringBetween($fileread, 'pdfaid:part="', '"')
        $_StringBetween_conformance = _StringBetween($fileread, 'pdfaid:conformance="', '"')
    EndIf
    ; ...or as XML elements.
    If Not IsArray($_StringBetween_part) Or Not IsArray($_StringBetween_conformance) Then
        $_StringBetween_part = _StringBetween($fileread, "pdfaid:part>", "<")
        $_StringBetween_conformance = _StringBetween($fileread, "pdfaid:conformance>", "<")
    EndIf

    ; The = operator compares strings case-insensitively, so "A" and "a" both match.
    If IsArray($_StringBetween_part) And IsArray($_StringBetween_conformance) And _
            ($_StringBetween_part[0] = "1" Or $_StringBetween_part[0] = "2" Or $_StringBetween_part[0] = "3") And _
            ($_StringBetween_conformance[0] = "a" Or $_StringBetween_conformance[0] = "b" Or $_StringBetween_conformance[0] = "u") Then
        If $_StringBetween_part[0] & $_StringBetween_conformance[0] <> "1u" Then
            Return SetError(0, 0, $versione_pdf & " PDF/A-" & $_StringBetween_part[0] & $_StringBetween_conformance[0])
        Else
            Return SetError(2, 0, $versione_pdf)
        EndIf
    Else
        Return SetError(1, 0, $versione_pdf)
    EndIf
EndFunc

##### Share on other sites

You are a great magician! This is exactly what I was looking for. I only understand a little code, but I will learn everything.
Thank you so much, I want to dance with joy! (now many nights await me on your code 🙂 )

##### Share on other sites

I'm the one who should be thanking the AutoIt community.

##### Share on other sites

@bdr529 Until I read your code, I never thought to read the metadata. I opened a PDF in HxD and there it is: the PDF/A version and the conformance level. I didn't realize that some of the metadata is excluded from the viewable properties. Great work!
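
To look at that metadata without a hex editor, something like the following could print the XMP packet. It is a rough sketch with hypothetical names (`_DumpXmp`, `example.pdf`), and it assumes the metadata is stored uncompressed in the file, which is what makes it readable in HxD in the first place.

```autoit
#include <String.au3>

; Sketch: return the first XMP packet found in a PDF so the pdfaid:* entries can be inspected.
; _DumpXmp is a hypothetical helper; it assumes the metadata stream is not compressed.
Func _DumpXmp($sPath)
    Local $hFile = FileOpen($sPath, 16) ; 16 = binary mode
    If $hFile = -1 Then Return SetError(1, 0, "") ; file could not be opened
    Local $sData = BinaryToString(FileRead($hFile))
    FileClose($hFile)
    ; _StringBetween returns an array of matches, or sets @error when none are found.
    Local $aXmp = _StringBetween($sData, "<x:xmpmeta", "</x:xmpmeta>")
    If @error Then Return SetError(2, 0, "") ; no readable XMP packet
    Return "<x:xmpmeta" & $aXmp[0] & "</x:xmpmeta>"
EndFunc

ConsoleWrite(_DumpXmp("example.pdf") & @CRLF)
```

Keep in mind that these entries only show what the file declares about itself. Reading them is not the same as validating the file against the standard, which is what a tool like VeraPDF does.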
