Posted

Hi all,

I'm new to AutoIt and have a question about working with PDF files. Say I have a PDF file (please see the attached example). I need to read the file line by line and highlight a line if a condition is met, e.g. if the score is 90 or above. Can this be achieved with AutoIt? Any guidance is very welcome. Thank you all.

Score.pdf

Posted

If you have Word, this will read the entire document and split each line into an array; however, the ID column is cut off. I assume it's because the PDF file "contains interactive features".

#include <Array.au3>
#include <Word.au3>

_Func('...\Score.pdf')

Func _Func($sFile)
    Local $oWord = _Word_Create()
    If @error Then
        ConsoleWrite('Error: _Word_Create' & @CRLF)
        Exit
    EndIf
    Local $oDoc = _Word_DocOpen($oWord, $sFile)
    If @error Then
        ; _Word_Quit expects the Word application object, not the document
        _Word_Quit($oWord)
        ConsoleWrite('Error: _Word_DocOpen' & @CRLF)
        Exit
    EndIf
    Local $oRange = $oDoc.Range
    Local $sText = $oRange.Text
    ConsoleWrite($sText)
    Local $aLines = StringSplit($sText, @CRLF)
    _ArrayDisplay($aLines)
    ; Close the document (not saved by default), then quit Word
    _Word_DocClose($oDoc)
    _Word_Quit($oWord)
EndFunc

 

Posted

Hi, thanks for the reply. I came up with a general plan, as below.

1. Use Xpdf to export the PDF file to text file

2. Use FileOpen to open the text file

3. Use _FileCountLines to get the number of lines

4. Loop each line, use FileReadLine to read the line

5. Use StringRegExp to check whether the line matches the expected format, and extract the values, e.g. the Score value

6. Determine if extracted values meet criteria. E.g. score >= 90

7. If yes, highlight the line in PDF file (PDF comment/highlight button)

8. Save the PDF and delete the text file

Is there anything I missed or got wrong here? I also have a concern about step 7: how do I do it? And if the PDF file contains pictures or charts, will the line numbers in the text file and the PDF file tally?
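Steps 2 to 6 of the plan above could be sketched roughly as follows. This is only a minimal sketch under assumptions: `_GetHighScoreLines` is a hypothetical helper name, the text file is assumed to have been produced already (step 1), and each record is assumed to look like "<ID> <Name> <Score>". It uses FileReadToArray rather than _FileCountLines plus FileReadLine, which reads the file only once.

```autoit
#include <Array.au3>

; Hypothetical helper (not part of any UDF): returns an array holding the lines
; of $sTextFile whose score is >= $iMinScore.
; Assumed record layout: "<ID> <Name> <Score>", e.g. "1001 Alice 95".
Func _GetHighScoreLines($sTextFile, $iMinScore = 90)
    ; Read the whole file into a zero-based array, one element per line
    Local $aLines = FileReadToArray($sTextFile)
    If @error Then Return SetError(1, 0, 0)

    Local $sResult = ""
    For $i = 0 To UBound($aLines) - 1
        ; Flag 1 returns only the captured group, i.e. the score
        Local $aMatch = StringRegExp($aLines[$i], '[0-9]+\s+[A-Za-z]+\s+([0-9]+)', 1)
        If @error Then ContinueLoop ; line does not look like a record
        If Number($aMatch[0]) >= $iMinScore Then $sResult &= $aLines[$i] & @LF
    Next

    If $sResult = "" Then Return SetError(2, 0, 0) ; no record met the criterion
    ; Flag 2 = do not store the element count in [0]
    Return StringSplit(StringTrimRight($sResult, 1), @LF, 2)
EndFunc

; Example usage; the temp.txt path is an assumption
Local $aHits = _GetHighScoreLines(@ScriptDir & "\temp.txt")
If Not @error Then _ArrayDisplay($aHits)
```

Keeping the full text of each matching line, rather than its line number, means a later search in the PDF does not depend on the line numbers of the text file and the PDF tallying.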

 

 

Posted
  On 6/10/2021 at 2:57 AM, ducphu said:

Save the PDF and delete the text file


Do the PDFs you are intending to use definitely allow editing?

Can you edit them now manually?

If so, using what product exactly?


Posted

Ok, let me try to clarify. Basically I have an input file, which is a PDF.

Currently, a user needs to open the file manually, check through the records and highlight the records which have Score >= 90. The example I attached is the OUTPUT. The INPUT is the same file but without the records highlighted.

What I want to achieve is to get this process done automatically using AutoIt.

And yes, the PDF file allows the user to edit it.

 

Posted

Do you have access to Word & Excel?

If so, does this function read all the columns for the input file? (Where no records have been highlighted)

  On 6/9/2021 at 3:27 PM, Luke94 said:

If you have Word, this will read the entire document and split each line into an array; however, the ID column is cut off. I assume it's because the PDF file "contains interactive features".


What I'm thinking is to read the PDF file with the above function, move the data into Excel, highlight the records as requested and save it as a PDF file. It might be a long-ass way about it but it would get you what you want. I would have thought there's an easier solution; I just don't know of it.
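A rough sketch of the Excel part of that idea, using the standard Excel UDF. Treat it as an illustration under assumptions rather than a tested solution: `_ExportHighlightedPdf` is a hypothetical helper name, the records are assumed to already be in a 2D array with the columns ID, Name, Score, and the exported PDF will have Excel's layout, not the layout of the original PDF.

```autoit
#include <Excel.au3>

; Hypothetical helper: writes $aRecords (2D array, assumed columns: ID, Name, Score)
; to a new workbook, highlights rows with Score >= $iMinScore and exports the sheet as PDF.
Func _ExportHighlightedPdf(ByRef $aRecords, $sPdfOut, $iMinScore = 90)
    Local $oExcel = _Excel_Open(False) ; False = keep Excel hidden
    If @error Then Return SetError(1, 0, 0)

    Local $oBook = _Excel_BookNew($oExcel, 1)
    If @error Then
        _Excel_Close($oExcel, False)
        Return SetError(2, 0, 0)
    EndIf

    ; Write the whole array to the active sheet, starting at A1
    _Excel_RangeWrite($oBook, Default, $aRecords, "A1")

    Local $oSheet = $oBook.ActiveSheet
    For $i = 0 To UBound($aRecords) - 1
        ; A header cell such as "Score" converts to 0 and is therefore skipped
        If Number($aRecords[$i][2]) >= $iMinScore Then
            ; 0x00FFFF is yellow in Excel's BGR colour order
            $oSheet.Range("A" & ($i + 1) & ":C" & ($i + 1)).Interior.Color = 0x00FFFF
        EndIf
    Next

    ; _Excel_Export defaults to the PDF format
    _Excel_Export($oExcel, $oSheet, $sPdfOut)
    Local $iErr = @error

    ; Discard the temporary workbook and close Excel
    _Excel_BookClose($oBook, False)
    _Excel_Close($oExcel, False)
    If $iErr Then Return SetError(3, $iErr, 0)
    Return 1
EndFunc
```

The helper returns 1 on success; on failure it returns 0 and sets @error to 1 (Excel could not be started), 2 (no workbook) or 3 (export failed).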

Posted

OK, I wrote some code, as below.

#include <MsgBoxConstants.au3>
#include <File.au3>

Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions = ""

    If Not FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If Not FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

    If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
    If $bLayout Then $sOptions &= " -layout"

    ; An exit code of 0 is treated as success
    Local $iReturn = ShellExecuteWait($sXPDFToText, $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
    If $iReturn = 0 Then Return 1

    Return 0
EndFunc

Local $sTempFile = "C:\Users\Duc Phu\Desktop\temp.txt"
_XPDF_ToText("C:\Users\Duc Phu\Desktop\Score.pdf", $sTempFile, 1, 0, True)

; Retrieve the number of lines in the temp file (_FileCountLines is documented as taking a file path)
Local $iCountLines = _FileCountLines($sTempFile)
; Open the temp text file for reading
Local $hFileOpen = FileOpen($sTempFile, 0)
Local $ReadLine[$iCountLines]
Local $ReadLineFull[$iCountLines]
Local $ReadLineScore[$iCountLines]

For $i = 1 To $iCountLines
    ; Read through the open handle instead of reopening the file for every line
    $ReadLine[$i-1] = FileReadLine($hFileOpen, $i)
    ; Flag 2 returns the full match in [0] and the captured score in [1]
    Local $RegResult = StringRegExp($ReadLine[$i-1], '[0-9]+\s+[A-Za-z]+\s+([0-9]+)', 2)
    If Not @error Then
        ; The regex found a match
        $ReadLineScore[$i-1] = $RegResult[1]
        ; If score >= 90, write the match to the $ReadLineFull array. We need this array for PDF searching and highlighting later on
        If Number($ReadLineScore[$i-1]) >= 90 Then
            $ReadLineFull[$i-1] = $RegResult[0]
        Else
            $ReadLineFull[$i-1] = "-"
        EndIf
    Else
        ; No match on this line
        $ReadLineFull[$i-1] = "-"
        $ReadLineScore[$i-1] = "-"
    EndIf
Next

; Close the handle returned by FileOpen.
FileClose($hFileOpen)

; Here, we have the $ReadLineFull array. We need to loop through the array and, if the value <> "-":
; First, open the PDF file
; Send Ctrl+F to open the Acrobat Reader search box
; Send Ctrl+V to paste the value into the search box
; Wait a second to ensure the search result is returned
; Click on the full matched result. The line containing the result should be selected
; Send a control click to the highlight button to highlight the line

 

Do you think the idea described in the comments at the end of the script is doable?

 

Posted

I managed to complete this project. Below is the full code

#include <MsgBoxConstants.au3>
#include <FileConstants.au3>
#include <File.au3>

Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)

    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions

    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

    If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
    If $bLayout = True Then $sOptions &= " -layout"

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
    If $iReturn = 0 Then Return 1

    Return 0

EndFunc

Func FileSelection()
    ; Display an open dialog to select one or more PDF files.
    Local $sFileOpenDialog = FileOpenDialog("Select file(s)", @DesktopDir & "\", "Adobe PDF Files (*.pdf)", BitOR($FD_FILEMUSTEXIST, $FD_MULTISELECT))
    If @error Then
        ; Display the error message.
        MsgBox(0, "", "No file(s) were selected.")
        Exit
    EndIf
    Return $sFileOpenDialog
EndFunc


; With several files selected, FileOpenDialog returns "Directory|file1|file2|...".
; With a single file it returns the full path without "|".
Local $sSelection = FileSelection()
If Not StringInStr($sSelection, "|") Then
    ; Single file: convert "Dir\file.pdf" to the "Dir|file.pdf" form so the loop below handles it too
    Local $iPos = StringInStr($sSelection, "\", 0, -1)
    $sSelection = StringLeft($sSelection, $iPos - 1) & "|" & StringMid($sSelection, $iPos + 1)
EndIf
Local $FilesArr = StringSplit($sSelection, "|")
Local $Dir = $FilesArr[1]
Local $File[$FilesArr[0]-1]

For $iFile = 0 to $FilesArr[0]-1-1
    $File[$iFile] = $FilesArr[$iFile+2]
    Local $CurrentFile = $Dir & "\" & $File[$iFile]

    _XPDF_ToText($CurrentFile,@ScriptDir & "\temp.txt",1,0,true)

    ; Retrieve the number of lines in the temp file (_FileCountLines is documented as taking a file path)
    Local $iCountLines = _FileCountLines(@ScriptDir & "\temp.txt")
    ; Open temp text file
    Local $hFileOpen = FileOpen(@ScriptDir & "\temp.txt",0)
    Local $ReadLine[$iCountLines]
    Local $ReadLineFull[$iCountLines]
    Local $ReadLineScore[$iCountLines]

    For $iLine = 1 to $iCountLines
    ; Read through the open handle instead of reopening the file for every line
    $ReadLine[$iLine-1] = FileReadLine($hFileOpen,$iLine)
    Local $RegResult = StringRegExp($ReadLine[$iLine-1],'[0-9]+\s+[A-Za-z]+\s+([0-9]+)',2)
    If Not @error Then
        ;If regex found matches
        $ReadLineScore[$iLine-1] = $RegResult[1]
        ;If score >=90 then write the match to $ReadLineFull array. We need this array for PDF searching and highlighting later on
        If Number($ReadLineScore[$iLine-1]) >= 90 Then
            $ReadLineFull[$iLine-1] = $ReadLine[$iLine-1]
        Else
            $ReadLineFull[$iLine-1] = "-"
        EndIf
    Else
        ; If not
        $ReadLineFull[$iLine-1] = "-"
        $ReadLineScore[$iLine-1] = "-"
    EndIf
    Next

    ; Close the handle returned by FileOpen.
    FileClose($hFileOpen)

    ; Here, we have the $ReadLineFull array. For each value other than "-":
    ; Send Ctrl+F to open the Acrobat search box
    ; Send Ctrl+V to paste the value into the search box
    ; Wait a second to ensure the search result is returned
    ; Send the Enter key
    ; Send a control click to the highlight button to highlight the line

    ; First, open the PDF file
    ShellExecute($CurrentFile,"","","",@SW_MAXIMIZE)

    ; Wait up to 5 seconds for the Acrobat window to become active
    $WinActive = WinWaitActive("[CLASS:AcrobatSDIWindow]", "", 5)

    If $WinActive = 0 Then
        MsgBox(0,"Error", "No Acrobat Reader window")
        Exit
    Else
        For $iLine = 1 to $iCountLines
        If $ReadLineFull[$iLine-1] <> "-" Then
            ClipPut($ReadLineFull[$iLine-1])
            Send("^f")
            Sleep(1000)
            Send("^v")
            Sleep(1000)
            Send("{ENTER}")
            Sleep(1000)
            ControlFocus("[CLASS:AcrobatSDIWindow]","","[CLASS:AVL_AVView; INSTANCE:38]")
            Sleep(1000)
            ControlClick("[CLASS:AcrobatSDIWindow]","","[CLASS:AVL_AVView; INSTANCE:38]", "left", 1, 65, 10)
            Sleep(1000)
        EndIf

        Next
        Sleep(1000)
        Send("^s")
        Sleep(1000)
        WinClose("[CLASS:AcrobatSDIWindow]", "")
        Sleep(5000)
    EndIf

Next

MsgBox(0,"AutoProcess","Done. File(s) saved and closed.",5)

 

Posted (edited)
  On 6/10/2021 at 9:39 AM, ducphu said:

Acrobat reader


No.

Reader is not an editor.

Have you thought about Acrobat Professional?

If so, take a look here:

 

 


  • 2 weeks later...
Posted

I've been out for a week (family vacation)

It looks like this is printed from Word... could you edit this in Word before printing it to PDF? (the document properties say "Application: Acrobat PDFMaker 20 for Word")
