dabus

How to get the amount of pages of pdf-files?

10 posts in this topic

Well, I have made a TIF and PDF to PDF converter that works quite reliably, but I am fairly sure I hit an error some time ago while testing a lot of files (about 100 conversions). So I would like to check whether the output file has as many pages as it should, just to make it rock-solid.

I tried the ExtProp UDF, but Explorer does not seem to expose the page count.

Is there an (easy) way to get it without opening Adobe Reader? (I know how to get the information from there.)

I tried Googling this, but as you might expect, the results were poor. :)

I'm not very familiar with DllCall, but perhaps you could use it with AcroRd32.dll. I believe GetNumPages is the function you would want to use.

@all

This is the code to get the page count:

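; Initialize the COM error handler
$oMyError = ObjEvent("AutoIt.Error", "MyErrFunc")

$acrApp = ObjCreate("AcroExch.App")
$avDoc = ObjCreate("AcroExch.AVDoc")

; These objects are only available with the full version of Acrobat
If IsObj($acrApp) And IsObj($avDoc) Then
    ConsoleWrite("OK" & @CRLF)
Else
    ConsoleWrite("Acrobat OLE automation is not available" & @CRLF)
    Exit
EndIf

; Open the document, get its PDDoc and read the page count
If $avDoc.Open("C:\Test.pdf", "") Then
    $pdDoc = $avDoc.GetPDDoc()
    MsgBox(0, "# Pages", "nbrPages = " & $pdDoc.GetNumPages())
    $avDoc.Close(True) ; True = close without saving changes
EndIf
$acrApp.Exit()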

;------------------------------ This is a COM Error handler --------------------------------
Func MyErrFunc()
  $HexNumber=hex($oMyError.number,8)
  Msgbox(0,"COM Error Test","We intercepted a COM Error !"       & @CRLF  & @CRLF & _
             "err.description is: "    & @TAB & $oMyError.description    & @CRLF & _
             "err.windescription:"     & @TAB & $oMyError.windescription & @CRLF & _
             "err.number is: "         & @TAB & $HexNumber              & @CRLF & _
             "err.lastdllerror is: "   & @TAB & $oMyError.lastdllerror   & @CRLF & _
             "err.scriptline is: "     & @TAB & $oMyError.scriptline     & @CRLF & _
             "err.source is: "         & @TAB & $oMyError.source         & @CRLF & _
             "err.helpfile is: "       & @TAB & $oMyError.helpfile       & @CRLF & _
             "err.helpcontext is: "    & @TAB & $oMyError.helpcontext _
            )
  SetError(1)  ; to check for after this function returns
Endfunc

The condition, though, is that you need the full version of Acrobat Writer in order to run OLE automation.

For more information, see the Acrobat OLE automation documentation.

regards,

ptrex

I'm probably way out of my league here, but I think GMK might be right about using DllCall(). Looking at the links GMK and ptrex provided, I thought there had to be another way; after all, not everyone is going to have the full version of Acrobat Writer. On page 12 of the document GMK linked there are other related documents, including Developing for Adobe Reader. A quick look at that one shows that page 21 mentions the PrintParams object and its lastPage property. If you open any PDF in Adobe Reader and print, the dialog box offers a print range from/to; the "to" value is what I think corresponds to the lastPage property.

As great as ptrex is, and especially the code he provided in this thread, I think there's a way to accomplish this without the full version of Acrobat Writer. Unfortunately, like GMK, I'm new to DllCall(). Hope this helps, dabus.

Edit: More findings... AcroRd32.dll contains a function called PDDocGetNumPages, but I can't get it to work with DllCall("AcroRd32.dll", "int", "PDDocGetNumPages", "int", $filename). Everything I found by searching Google looks similar to ptrex's code, using "AcroExch.App" and "AcroExch.AVDoc". I would have thought there was a way to do it without Writer :)

Edited by ssubirias3

You can read it directly from the file: search for the /Root object (the document catalog), get the /Pages object it refers to, and read the page count from that object's /Count entry.

Attached is some extracted code.

HTH, Reinhard

#include <GUIConstants.au3>
#include <string.au3>

DIM $gw = 320, $gh = 120

$mainGUI = GUICreate("Read Pdf", $gw, $gh, 40, 40, -1, $WS_EX_ACCEPTFILES) ; accept dropped files
GUICtrlCreateLabel("PDF (drag'n'drop):", 10, 5)
$fileINP = GUICtrlCreateInput ( "", 10,  20, 300, 20)
    GUICtrlSetState(-1,$GUI_DROPACCEPTED)
$btnRead = GUICtrlCreateButton ("Read", 10,  50, 60, 20)

GUISetState () 

$msg = 0
While $msg <> $GUI_EVENT_CLOSE
    $msg = GUIGetMsg()
    Select
        Case $msg = $btnRead
            ReadFile(GUICtrlRead($fileINP))
    EndSelect
WEnd

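Func ReadFile($file)
    Local $xCtPg = 0
    ; Read the whole file in binary mode (16); the data is then searched as a hex string
    $fsz = FileGetSize($file)
    $fileHd = FileOpen($file, 16)
    $Fch = FileRead($fileHd, $fsz)
    FileClose($fileHd)

    ; The trailer's /Root entry points to the document catalog
    $objNr = GetObj($Fch, "/Root")
    $objCon = ReadObj($Fch, $objNr)
    ;; get total pages: the catalog's /Pages entry points to the page tree, whose /Count is the total
    If StringInStr($objCon, "/Pages") > 0 Then
        $xon = GetObj(_StringToHex($objCon), "/Pages")
        $xoc = ReadObj($Fch, $xon)
        $xCtPg = GetCount($xoc)
    EndIf
    MsgBox(0, "", "Page(s): " & $xCtPg)
EndFunc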
    

func GetObj($fc,$type);; Get the objectnumber
;; get obj
    $typeh = _StringToHex($type)
    $x =stringmid($fc,StringInstr($fc,$typeh),80)
    $x = _hexToString($x)
    $xct = stringlen($type)
    $xon = stringmid($x,$xct+2,stringInstr($x," R")-$xct-2)
;msgBox(0,$type,$xon)   
    return $xon
endfunc


func ReadObj($fc,$xon)  
;; get the content of object $xon (the text between " obj" and "endobj")
    $objh = _stringtoHex(" obj")
    $objendh = _stringtoHex("endobj")
    $xonh = _stringtoHex($xon)
    $xonhs = "0D"&$xonh&$objh
    $x = stringmid($fc,StringInstr($fc,$xonhs,0,-1))
    if $x="" then $x = stringmid($fc,StringInstr($fc,"0A"&$xonh&$objh,0,-1));uncompressed
;msgBox(0,"",_hextostring($x))
    $xoch = stringmid($x,1,StringInstr($x,$objEndh)-1)
;msgBox(0,$xon,_hextostring($xoch))
    $xoc = _hextostring($xoch)
    return $xoc
endfunc

func GetCount($xoc) ;; get pageCount 
    $xocx = stringMid($xoc,StringInStr($xoc,"/Count ")+7)
        $xct=""
        for $i = 1 to 5 
            $x = stringMid($xocx,$i,1)
            $y = stringIsInt($x)
        ;msgbox(0,"",$i&"="&$x&"isn"&$y)
            if $y = 1 then 
                $xct &= $x
            Else
                exitloop
            EndIf
        next    
;MsgBox(0,"",$xCt)
    return $xCt
endfunc
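A regular expression can do the same /Count lookup as GetCount() a little more robustly: it accepts any amount of whitespace after /Count and any number of digits, whereas GetCount() expects exactly one space and reads at most five digits. This is a minimal sketch, assuming the object text has already been extracted with ReadObj(); GetCountRegex is a hypothetical name, not part of the posted script.

```autoit
; Hypothetical alternative to GetCount(): extracts the number after "/Count"
; regardless of the whitespace after it or the number of digits.
; Returns "" when the object text contains no "/Count <number>" entry.
Func GetCountRegex($xoc)
    ; Flag 1 = return an array holding the capture groups of the first match
    Local $aMatch = StringRegExp($xoc, "/Count\s+(\d+)", 1)
    If @error Then Return ""
    Return $aMatch[0]
EndFunc

; Usage
ConsoleWrite(GetCountRegex("<</Type /Pages /Kids [3 0 R 4 0 R] /Count 2>>") & @CRLF) ; prints 2
```

In ReadFile(), GetCountRegex($xoc) could be used in place of GetCount($xoc).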

:) It does exactly what you said, without Adobe Writer!!! Very cool, and I'm sure there's some good stuff to learn from your script. Thanks for sharing!

:) It does exactly like you said without Adobe ... !

From time to time I work with Foxit Reader ;-).

The code from ptrex, and objects/methods like PDDoc, AVDoc, pdDoc.GetNumPages, ..., can only be used, as he said, with the full version of Adobe Acrobat, which costs some money, and there is no backdoor (at least since version 6).

There are possibilities with Reader alone (like reading directly from the control, executing some Adobe JavaScript, ...), but none of them is really good: some are tricky, and they often differ from version to version.

An alternative is to use free command-line tools. The well-known PdfTk.exe is slower than the code I posted, because it gathers and reports much more information. I think a dedicated tool like PdfInfo.exe may be faster.
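Calling such a tool from AutoIt mostly means running it hidden, capturing its console output and picking out the page count. This is a minimal sketch, assuming pdfinfo.exe (from xpdf or Poppler) is on the PATH and prints a line like "Pages: 12"; the function names are hypothetical.

```autoit
#include <Constants.au3>

; Sketch: let an external tool report the page count.
; Assumes pdfinfo.exe (xpdf/Poppler) is on the PATH; the function names are hypothetical.
Func _PdfInfoPageCount($sPdf, $sPdfInfo = "pdfinfo.exe")
    ; Run the tool hidden and capture its standard output
    Local $iPid = Run('"' & $sPdfInfo & '" "' & $sPdf & '"', "", @SW_HIDE, $STDOUT_CHILD)
    If @error Then Return SetError(1, 0, -1)
    ProcessWaitClose($iPid)
    Local $sOut = StdoutRead($iPid)
    Local $iPages = _ParsePdfInfoPages($sOut)
    Return SetError(@error, 0, $iPages)
EndFunc

; pdfinfo prints one "Name: value" line per property, e.g. "Pages:          12"
; Returns the number, or -1 with @error = 2 if there is no Pages line
Func _ParsePdfInfoPages($sOut)
    Local $aMatch = StringRegExp($sOut, "(?m)^Pages:\s+(\d+)", 1)
    If @error Then Return SetError(2, 0, -1)
    Return Number($aMatch[0])
EndFunc

; Usage
Local $iCount = _PdfInfoPageCount("C:\Test.pdf")
If @error Then
    ConsoleWrite("Could not get the page count (error " & @error & ")" & @CRLF)
Else
    ConsoleWrite("Pages: " & $iCount & @CRLF)
EndIf
```

PdfTk can be used the same way: "pdftk.exe file.pdf dump_data" prints a "NumberOfPages:" line, so only the command line and the pattern change.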

HTH, Reinhard

Edited by ReFran

Awesome. A fine way to start my work again. :)

.... Since AU3 can create PDF using this UDF. ...

Nice link, thanks.

With that UDF, PdfTK's ability to add watermarks and stamps, and a bit more scripting, you can easily add headers, footers, ... to an existing PDF for printing or whatever.

I would also recommend running PdfTK's repair function first, because some objects do not seem to be written exactly to spec.

Adobe Acrobat/Reader forgives that, but some freeware tools may not.
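Chained together, the repair and stamp steps could look like the following. This is a minimal sketch, assuming pdftk.exe is on the PATH and the overlay (for example a header/footer page made with the PDF UDF) already exists as a one-page PDF; the helper names and the temporary file are hypothetical.

```autoit
; Sketch only: assumes pdftk.exe is on the PATH. The helper names and the
; temp-file location are hypothetical, not part of PdfTK or any UDF.

; "pdftk in.pdf output out.pdf" rewrites the file; PdfTK documents this as the
; way to repair a corrupted xref table. dont_ask stops pdftk from prompting.
Func _PdftkRepairCmd($sIn, $sOut)
    Return 'pdftk.exe "' & $sIn & '" output "' & $sOut & '" dont_ask'
EndFunc

; "stamp" overlays the first page of $sStamp on top of every page of $sIn
Func _PdftkStampCmd($sIn, $sStamp, $sOut)
    Return 'pdftk.exe "' & $sIn & '" stamp "' & $sStamp & '" output "' & $sOut & '" dont_ask'
EndFunc

; Repair first, then stamp; returns True on success, or False with @error = step number
Func _PdftkRepairAndStamp($sIn, $sStamp, $sOut)
    Local $sFixed = @TempDir & "\pdftk_repaired.pdf"
    Local $iRc = RunWait(_PdftkRepairCmd($sIn, $sFixed), "", @SW_HIDE)
    If @error Or $iRc <> 0 Then Return SetError(1, $iRc, False)
    $iRc = RunWait(_PdftkStampCmd($sFixed, $sStamp, $sOut), "", @SW_HIDE)
    Local $iErr = @error
    FileDelete($sFixed) ; remove the intermediate file either way
    If $iErr Or $iRc <> 0 Then Return SetError(2, $iRc, False)
    Return True
EndFunc

; Usage: header.pdf would be a one-page overlay, e.g. created with the PDF UDF
If Not _PdftkRepairAndStamp("C:\in.pdf", "C:\header.pdf", "C:\out.pdf") Then
    ConsoleWrite("pdftk step " & @error & " failed" & @CRLF)
EndIf
```

Keeping the command-line builders separate makes it easy to swap "stamp" for "background" if the overlay should go underneath the page content instead.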

br, Reinhard

Edited by ReFran
