How can I check whether a PDF file can be opened?

sex123

I want to know how to check whether a PDF file is intact or damaged. I usually download PDF files with a browser such as Firefox, but sometimes a downloaded file cannot be opened because an unstable connection damaged it. At the moment I start Foxit Reader, open the PDF, and judge from the Foxit Reader window title whether the file opened. That wastes time, because I have a great many PDF files downloaded from the net. How can I do this check in the background? I searched the net and found that iText can do it, but iText is an open-source PDF library for Java, .NET, Android, etc. How can I check whether a PDF is damaged using AutoIt?

jguinch

You can check the health of the PDF file by retrieving some information from it.

An example with _XFDF_Info : https://www.autoitscript.com/forum/topic/160718-code-to-extract-plain-text-from-a-pdf-file/?do=findComment&comment=1166469

If IsArray( _XFDF_Info("file.pdf")) Then MsgBox(0, "", "PDF file is clean")


 

 

sex123

I tried your method, but even when the PDF is broken I still get the message box saying that the PDF file is clean.

sex123

Here are three damaged PDF files. None of them can be opened in Foxit Reader. The two small files are not really PDFs; they are files of another type that were renamed to .pdf. The large file is a genuine PDF, but it was probably not downloaded completely by Firefox.

3pdfzip.zip
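
The two failure modes described above (a renamed non-PDF file and a truncated download) can often be caught without any external program. The following is only a minimal sketch under stated assumptions: a well-formed PDF starts with "%PDF-" and has an "%%EOF" marker near its end, and a 1024-byte search window is assumed to be enough. The function name _PDF_QuickCheck is hypothetical and not part of any UDF. Passing this check does not prove that the file is fully readable.

```autoit
; Hypothetical helper (not from any UDF): cheap structural check of a PDF file.
; Returns True if the file starts with "%PDF-" and "%%EOF" occurs in its last 1024 bytes.
; On failure returns False and sets @error: 1 = cannot open, 2 = bad header, 3 = no %%EOF.
Func _PDF_QuickCheck($sFile)
    Local $hFile = FileOpen($sFile, 16) ; 16 = $FO_BINARY
    If $hFile = -1 Then Return SetError(1, 0, False)

    ; Header: the first five bytes should read "%PDF-"
    Local $sHead = BinaryToString(FileRead($hFile, 5))

    ; Trailer: read at most the last 1024 bytes of the file
    Local $iSize = FileGetSize($sFile)
    Local $iTail = 1024
    If $iSize < $iTail Then $iTail = $iSize
    FileSetPos($hFile, -$iTail, 2) ; 2 = $FILE_END
    Local $bTail = FileRead($hFile, $iTail)
    FileClose($hFile)

    If $sHead <> "%PDF-" Then Return SetError(2, 0, False)

    ; Search the hex form so that null bytes in the tail cannot cut the search short.
    ; "2525454F46" is "%%EOF"; the (?:..)*? prefix keeps the match aligned to whole bytes.
    If Not StringRegExp(Hex($bTail), "^(?:[0-9A-F]{2})*?2525454F46") Then Return SetError(3, 0, False)

    Return True
EndFunc
```

A renamed file would be expected to fail with @error 2 and a cut-off download with @error 3, so a call such as _PDF_QuickCheck("file.pdf") could serve as a fast first filter before a more thorough test.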

water

When searching the internet for "PDF analysis repair" you'll find tools like PDF-Tools.


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2018-12-03 - Version 1.4.11.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (2018-10-31 - Version 1.3.4.1) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
PowerPoint (2017-06-06 - Version 0.0.5.0) - Download - General Help & Support
Excel - Example Scripts - Wiki
Word - Wiki
 
Tutorials:

ADO - Wiki

 

sex123

I do not need to repair the PDF. I just need to determine whether a PDF file is damaged.

water

Then at least the analysis part will help you ;)


jguinch

@sex123: I saw that there was an error in the code I linked. Could you please try again with the following code?

If IsArray( _XFDF_Info("file.pdf")) Then
    MsgBox(0, "", "PDF file is clean")
Else
    MsgBox(16, "", "PDF file seems to be corrupted")
EndIf

; #FUNCTION# ====================================================================================================================
; Name...........: _XFDF_Info
; Description....: Retrieves information from a PDF file
; Syntax.........: _XFDF_Info ( "File" [, "Info"] )
; Parameters.....: File    - PDF file.
;                  Info    - The information to retrieve
; Return values..: Success - If the Info parameter is not empty, returns the value of the requested information
;                            (or an empty string if the file does not provide it)
;                          - If the Info parameter is empty, returns an array with all available information
;                  Failure - 0, and sets @error to:
;                   1 - PDF file not found
;                   2 - Unable to find the external program (pdfinfo.exe)
;                   3 - No information could be read from the file (it is probably corrupted or not a PDF)
; Remarks........: The array returned is two-dimensional and is made up as follows:
;                   $array[0][0] = Label of the first information (title, author, pages...)
;                   $array[0][1] = Value of the first information
;                   ...
; ===============================================================================================================================
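Func _XFDF_Info($sPDFFile, $sInfo = "")
    ; pdfinfo.exe is expected in the same folder as the script
    Local $sXPDFInfo = @ScriptDir & "\pdfinfo.exe"

    If Not FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If Not FileExists($sXPDFInfo) Then Return SetError(2, 0, 0)
    ; Use the short (8.3) name so that spaces in the program path do not break the command line
    $sXPDFInfo = FileGetShortName($sXPDFInfo)

    ; Run pdfinfo hidden and capture its standard output (2 = $STDOUT_CHILD)
    Local $iPid = Run(@ComSpec & ' /c ' & $sXPDFInfo & ' "' & $sPDFFile & '"', @ScriptDir, @SW_HIDE, 2)

    ; Keep reading until StdoutRead reports the end of the output through @error
    Local $sResult = ""
    While 1
        $sResult &= StdoutRead($iPid)
        If @error Then ExitLoop
    WEnd

    ; Split the "Label:   value" lines into a flat array: label, value, label, value, ...
    Local $aInfos = StringRegExp($sResult, "(?m)^(.+?): +(.*)$", 3)
    ; No match, or an odd number of items, means that no usable information was returned
    If @error Or Mod(UBound($aInfos, 1), 2) = 1 Then Return SetError(3, 0, 0)

    Local $aResult[UBound($aInfos, 1) / 2][2]

    For $i = 0 To UBound($aInfos) - 1 Step 2
        ; If a single item was requested, return its value as soon as it is found
        If $sInfo <> "" And $aInfos[$i] = $sInfo Then Return $aInfos[$i + 1]
        $aResult[$i / 2][0] = $aInfos[$i]
        $aResult[$i / 2][1] = $aInfos[$i + 1]
    Next

    ; The requested item is not present in the output
    If $sInfo <> "" Then Return ""

    Return $aResult
EndFunc ; ---> _XFDF_Info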

I have just edited the linked code as well.

sex123

I tried it, and it worked. Thanks. Is your software free? If it is free for me to use, I will use it. From the linked topic I understood that it is not free. So is it free or not?

BTW, why is this line in the function?

  $sXPDFInfo = FileGetShortName($sXPDFInfo)

Another problem: if the PDF file is not found, the script also shows the message "PDF file seems to be corrupted", although the real cause is that the file does not exist. Can you fix that?

 

jguinch

Try this:

#Include <Array.au3>

$aInfos = _XFDF_Info("file.pdf")
Switch @error
    Case 0
        MsgBox(0, "", "PDF file is clean")
        _ArrayDisplay($aInfos)
    Case 1
        MsgBox(16, "", "PDF file not found")
    Case 2
        MsgBox(16, "", "PDFInfo.exe program not found")
    Case 3
        MsgBox(16, "", "PDF file seems to be corrupted")
EndSwitch
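
Since the original question was about checking many downloaded files in the background, the same @error handling can be put in a loop. This is only a sketch: it assumes that the _XFDF_Info function from this thread is defined in the same script, the folder name is hypothetical, and the results go to the console instead of message boxes so that nothing blocks the run.

```autoit
#include <File.au3>

; Hypothetical folder; change it to wherever the downloads are stored
Local $sFolder = @ScriptDir & "\downloads"

; 1 = $FLTA_FILES (files only), True = return full paths
Local $aFiles = _FileListToArray($sFolder, "*.pdf", 1, True)
If @error Then
    ConsoleWrite("No PDF files found in " & $sFolder & @CRLF)
    Exit
EndIf

; $aFiles[0] holds the number of files that were found
For $i = 1 To $aFiles[0]
    _XFDF_Info($aFiles[$i]) ; assumes the function from this thread is in this script
    Switch @error
        Case 0
            ConsoleWrite("OK        " & $aFiles[$i] & @CRLF)
        Case 2
            ConsoleWrite("pdfinfo.exe not found, stopping" & @CRLF)
            ExitLoop
        Case Else
            ConsoleWrite("DAMAGED?  " & $aFiles[$i] & @CRLF)
    EndSwitch
Next
```

The console lines could just as well be written to a log file with FileWriteLine if the script runs unattended.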

For license information, see http://www.foolabs.com/xpdf/about.html

FileGetShortName handles the case where the program path contains spaces.
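
As an illustration of that point, an alternative to the short name is to put double quotes around both paths and start the program directly, without @ComSpec. This is only a sketch under the assumption that pdfinfo.exe sits next to the script, as in the function above; the variable $sCmd is introduced here for illustration.

```autoit
; Assumed locations, for illustration only
Local $sXPDFInfo = @ScriptDir & "\pdfinfo.exe"
Local $sPDFFile = "file.pdf"

; Wrap both paths in double quotes so that spaces in either one are harmless
Local $sCmd = '"' & $sXPDFInfo & '" "' & $sPDFFile & '"'

; Start the program directly and capture its standard output (2 = $STDOUT_CHILD)
Local $iPid = Run($sCmd, @ScriptDir, @SW_HIDE, 2)
```

Starting the executable directly avoids the extra layer of quote handling that cmd.exe applies to the text after /c, which is one reason the short-name approach is convenient when @ComSpec is used.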
