Strategy for a very primitive app


Hi,

I'm brand new and I'm about to create a very simple app that deletes specific pages from PDF files. My high-level strategy is listed below. I'm looking for recommendations from more experienced AutoIt developers: which functions I should review in the Help File, code snippets, and potential roadblocks.

High Level Outline & Desired Behavior:

  1. Prompt user to select a directory
  2. Find all .pdf files in the selected directory (non-recursively) and store them in an array
  3. List all .pdf files in the array and prompt user to confirm (proceed or abort)
  4. Only if proceeding, launch Acrobat
  5. For each file in the array, run a series of actions:
    1. Open the file in Acrobat (Ctrl+O, type file name in dialog box, hit Open, wait for file to open)
    2. Search file for a specific string (Ctrl+F to open search box, enter string to search, hit Enter)
    3. If the string is not found, Acrobat shows a prompt saying so: identify the prompt, accept it, send the keystrokes to save and close the file, then proceed to the next file (step 5.1). If the prompt does not appear, Acrobat found the string: send the keystroke to delete the page and repeat the search (step 5.2).
  6. After all files in the array are processed, exit Acrobat and prompt user that the operation completed.
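
Step 3 of the outline (list the files and let the user proceed or abort) could be sketched roughly as follows. This is only an illustration: the function names `_BuildConfirmText` and `_ConfirmProceed` are made up for this example, and the array is assumed to use the 1-based layout that `_FileListToArray` returns, with the count in element [0].

```autoit
#include <MsgBoxConstants.au3>

; Builds the text shown to the user from a 1-based array
; whose element [0] holds the file count (assumed layout).
Func _BuildConfirmText($aFiles)
    Local $sText = $aFiles[0] & " PDF file(s) found:" & @CRLF
    For $i = 1 To $aFiles[0]
        $sText &= @CRLF & $aFiles[$i]
    Next
    Return $sText
EndFunc   ;==>_BuildConfirmText

; Shows an OK/Cancel prompt; returns True only if the user clicks OK.
Func _ConfirmProceed($aFiles)
    Local $iAnswer = MsgBox($MB_OKCANCEL + $MB_ICONQUESTION, "Confirm", _
            _BuildConfirmText($aFiles) & @CRLF & @CRLF & "Proceed?")
    Return ($iAnswer = $IDOK)
EndFunc   ;==>_ConfirmProceed
```

The calling script would then only launch Acrobat inside `If _ConfirmProceed($aFiles) Then ... EndIf`. For folders with a very large number of files a message box may become too tall, so a list control in a small GUI might be a better fit there.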

I'm thinking this should be pretty simple to construct, but now that I've sat down to actually do it, I'm not sure where to start.  Thoughts?

 

Thanks!


5 minutes ago, Danp2 said:

There may be a better way to automate this.  FWIW, I previously wrote a script that took a large PDF and separated it into individual files. To do this, I used a combination of pdfsam and xpdf.

I'm familiar with pdfsam and use it in a lot of scripts, but the real challenge is locating the string (which is a series of words and includes punctuation).

I got very close using Acrobat JavaScript, but the API for searching will only locate individual words, not phrases in the document, which makes things much more complex.

When I went to do it manually, I discovered that I could do the entire operation with keystrokes, so I figured that would be pretty easy to port to AutoIt, but I'd need to loop through a list of file names. I also want a GUI that lets the user select the directory, to make it more user-friendly.


55 minutes ago, Danp2 said:

I used pdftotext from xpdf to convert the entire PDF to a text file. I then read the entire file into memory and used StringSplit to create an array with each element containing an individual page. From there, it was easy to process each page, search for specific text, and take action when found.

Thanks for that suggestion--it's an interesting way to attack the problem, and it would be good if I needed this to run headless.  If you're able to share the code, I might be able to adapt it to my requirements.
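
The pdftotext approach quoted above can be adapted to phrase searching along these lines. This is a minimal sketch, not the code from that script: it assumes the text of the PDF has already been read into a string, that pages are separated by form feed characters (Chr(12)) as described in the quote, and the function name `_FindPagesWithPhrase` is invented for the example.

```autoit
; Returns a comma-separated list of 1-based page numbers whose text
; contains $sPhrase, or "" when no page matches.
Func _FindPagesWithPhrase($sText, $sPhrase)
    ; Split on form feed so each array element holds one page
    Local $aPages = StringSplit($sText, Chr(12))
    Local $sHits = ""
    For $i = 1 To $aPages[0]
        ; StringInStr is case-insensitive by default
        If StringInStr($aPages[$i], $sPhrase) > 0 Then
            If $sHits <> "" Then $sHits &= ","
            $sHits &= $i
        EndIf
    Next
    Return $sHits
EndFunc   ;==>_FindPagesWithPhrase
```

One caveat: a phrase that wraps across a line break in the extracted text will not match a plain substring search, so it may be necessary to normalize whitespace in both the page text and the phrase first.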

 

In the meantime, I've been able to operationalize steps 4 through 6 of my strategy above and have it confirmed working.

Now I just need help presenting a standard Windows Browse for Folder dialog and saving the paths of all the PDF files to an array. I think if I can get a directory listing, I could use a regex to match the PDF files, but I'm interested in any code others have for this sort of thing. I can't be the first one who needs to do this.
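
For the two pieces requested here, the built-in `FileSelectFolder` and the `_FileListToArray` UDF from File.au3 are probably enough, with no regex needed. A hedged sketch follows; the names `_ListPdfs` and `_Example` are hypothetical, and the four-parameter form of `_FileListToArray` (with the return-full-path flag) assumes a reasonably recent AutoIt release.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Returns a 1-based array of full PDF paths ([0] holds the count).
; Sets @error to 1 if the folder is invalid or has no matching files.
Func _ListPdfs($sDir)
    ; $FLTA_FILES = files only (no folders); True = return full paths.
    ; The search is not recursive.
    Local $aFiles = _FileListToArray($sDir, "*.pdf", $FLTA_FILES, True)
    If @error Then Return SetError(1, 0, 0)
    Return $aFiles
EndFunc   ;==>_ListPdfs

; Example flow: standard Browse for Folder dialog, then list the PDFs.
Func _Example()
    Local $sDir = FileSelectFolder("Select the folder containing the PDFs", "")
    If @error Then Return ; user cancelled the dialog

    Local $aPdfs = _ListPdfs($sDir)
    If @error Then
        MsgBox(48, "Warning", "No PDF files found in " & $sDir)
        Return
    EndIf

    For $i = 1 To $aPdfs[0]
        ConsoleWrite($aPdfs[$i] & @CRLF)
    Next
EndFunc   ;==>_Example
```

Calling `_Example()` at the top of a script would show the dialog and print each path to the console; the returned array can be passed straight into the per-file loop.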

 

Thanks


22 minutes ago, distancesprinter said:

Thanks for that suggestion--it's an interesting way to attack the problem, and it would be good if I needed this to run headless.  If you're able to share the code, I might be able to adapt it to my requirements.

Here you go...

#include <Array.au3>
#include <File.au3>
#include <Math.au3>
#include <WinAPI.au3>

Opt("TrayMenuMode",1)
Opt("TrayOnEventMode",1)

Global $INIFile = "DistributePDFs.ini", $PDFSam, $PDFToText, $DMDocuments, $IncomingPDFs

; Create tray menu
SetupTray()

LoadOptions()

VerifyPaths()

VerifyJava()

ProcessFiles()

Exit


Func ProcessFiles()
Local $aFiles, $hFile, $cFilename, $cText, $aContents, $cDetails
Local $i, $j, $nBegin, $nEnd, $cDebtorID, $mpCount = 0

$aFiles = _FileListToArray($IncomingPDFs, "*.pdf", 1)

If @error Then
    MsgBox(48, "Warning", "No files to process!")
    Return
EndIf

ConsoleWrite("Processing " & $aFiles[0] & " PDF files" & @CRLF)

For $i = 1 To $aFiles[0]
    ConsoleWrite($aFiles[$i] & @CRLF)

    $cFilename = @ScriptDir & "\temp\" & $aFiles[$i] & ".txt"

    ; Convert to text file
    ConvertToText($IncomingPDFs & $aFiles[$i], $cFilename)

    $hFile = FileOpen($cFilename, 0)

    ; Check if file opened for reading OK
    If $hFile = -1 Then
        ContinueLoop
    EndIf

    ; Read file into memory
    $cText = FileRead($hFile)

    FileClose($hFile)

    ; Remove temporary text file
    FileDelete($cFilename)

    ; Load into an array
    $aContents = StringSplit($cText, Chr(12))

    ; Adjust for empty cell at bottom
    If $aContents[$aContents[0]] = "" Then
        $aContents[0] -= 1
    EndIf

    For $nBegin = 1 To $aContents[0]
        TrayTip ( $aFiles[$i], "Processing page " & $nBegin & " of " & $aContents[0], 1)

        $cDetails = RetrieveDetails($aContents[$nBegin])

        If $cDetails <> "" Then
            ; Get debtor ID
            $cDebtorID = StringLeft($cDetails, StringInStr($cDetails, "-") - 1)

            ; Find any associated pages
            For $nEnd = $nBegin + 1 To $aContents[0]
                If RetrieveDetails($aContents[$nEnd]) <> "" Then
                    $nEnd = $nEnd - 1
                    ExitLoop
                EndIf
            Next

            ; Special handling for last page
            If $nEnd > $aContents[0] Then
                $nEnd = $aContents[0]
            EndIf

            ; Make sure destination directory exists
            If Not FileExists($DMDocuments & $cDebtorID) Then
                DirCreate($DMDocuments & $cDebtorID)
            EndIf

            ExtractPages($IncomingPDFs & $aFiles[$i], $DMDocuments & $cDebtorID & "\" & $cDetails, $nBegin & "-" & $nEnd)

            ; Track number of multi-page documents
            If $nBegin <> $nEnd Then
                ConsoleWrite($cDebtorID & " - Multipage document" & @CRLF)
                $mpCount += 1
            EndIf

            ; Adjust current page number to prevent
            ; associated pages from being processed again
            $nBegin = $nEnd
        EndIf
    Next

    $cFilename = $IncomingPDFs & "Processed\" & $aFiles[$i]
    $j = 0

    ; Handle duplicate files
    While FileExists($cFilename)
        $j += 1
        $cFilename = $IncomingPDFs & "Processed\" & StringReplace($aFiles[$i], ".pdf", "-" & $j & ".pdf")
        ConsoleWrite("New filename = " & $cFilename & @CRLF)
    WEnd

    ; Move PDF file
    FileMove($IncomingPDFs & $aFiles[$i], $cFilename)
Next

If $mpCount <> 0 Then
    MsgBox(48, "Warning", $mpCount & " multi-page documents detected!")
    Return
EndIf

EndFunc


Func RetrieveDetails($text)
    Local $pos, $result = ''

    $pos = StringInStr($text, ".pdf")

    If $pos <> 0 Then
        $result = StringLeft($text, $pos + 3)
    EndIf

    Return $result
EndFunc


Func ExtractPages($cSource, $cDestination, $cPages)
    Local $result, $cmdString, $workDir, $j = 0
    Local $szDrive, $szDir, $szFName, $szExt, $szOrigFName

    ; Handle duplicate files
    While FileExists($cDestination)
        ; Parse destination filename
        _PathSplit($cDestination, $szDrive, $szDir, $szFName, $szExt)

        ; Remember the original name the first time through
        If $j = 0 Then $szOrigFName = $szFName
        $j += 1

        ; Rebuild the path so only the filename part changes
        $cDestination = $szDrive & $szDir & $szOrigFName & "-" & $j & $szExt
        ConsoleWrite("New filename = " & $cDestination & @CRLF)
    WEnd

    $workDir = JustPath($PDFSam)

    $cmdString = '"' & $PDFSam & '" -f "' & $cSource & '" -o "' & $cDestination & '" -u ' & $cPages & ' -overwrite concat'

    ConsoleWrite($cmdString & @CRLF)

    $result = RunWait($cmdString, $workDir, @SW_HIDE)

    Return $result
EndFunc

Func ConvertToText($cSource, $cDestination)
    Local $result, $cmdString
    ConsoleWrite("Converting " & $cSource & " to text." & @CRLF)

    $cmdString = '"' & $PDFToText & '" "' & $cSource & '"  "' & $cDestination & '"'

    $result = RunWait($cmdString, "", @SW_HIDE)

    Return $result
EndFunc

Func LoadOptions()
    $PDFSam = IniRead($INIFile, "Utilities", "PDFSam", "")
    $PDFToText = IniRead($INIFile, "Utilities", "PDFToText", "")

    $DMDocuments = IniRead($INIFile, "Directories", "DMDocuments", "")
    $IncomingPDFs = IniRead($INIFile, "Directories", "IncomingPDFs", "")
EndFunc

Func VerifyPaths()
    If Not FileExists($PDFSam) Then
        MsgBox(16, "Error", "Incorrect path to PDFSam!")
        Exit
    EndIf

    If Not FileExists($PDFToText) Then
        MsgBox(16, "Error", "Incorrect path to PDFToText!")
        Exit
    EndIf

    If Not FileExists($DMDocuments) Then
        MsgBox(16, "Error", "Incorrect path to DMDocuments!")
        Exit
    EndIf

    If Not FileExists($IncomingPDFs) Then
        MsgBox(16, "Error", "Incorrect path to IncomingPDFs!")
        Exit
    EndIf
EndFunc

Func JustPath($cPath)
    Local $szDrive, $szDir, $szFName, $szExt
    _PathSplit($cPath, $szDrive, $szDir, $szFName, $szExt)
    Return $szDrive & $szDir
EndFunc ;==>JustPath

Func Terminate()
    Exit 0
EndFunc

Func SetupTray()
    ; Display on Right-click
    TraySetClick ( 8 )

    Local $exititem = TrayCreateItem("Exit")
    TrayItemSetOnEvent(-1, "terminate")

    TraySetIcon(@ScriptFullPath, 1)
    TraySetState()
EndFunc

Func VerifyJava()
    Local $aArray = _JavaVersion()

    If $aArray[0] = "" Then
        MsgBox(16, "Error", "Unable to locate Java Runtime Environment!")
        Exit
    EndIf
EndFunc

Func _JavaVersion()
    Local $HKLM = 'HKLM'

    Local $sWow64 = (@AutoItX64) ? ("\Wow6432Node") : ("")

    Local $sCurrentVersion = RegRead($HKLM & '\Software' & $sWow64 & '\JavaSoft\Java Runtime Environment', 'CurrentVersion')

    ConsoleWrite("Current version = " & $sCurrentVersion & @CRLF)

    Local $aReturn[2] = [RegRead($HKLM & '\Software' & $sWow64 & '\JavaSoft\Java Runtime Environment\' & $sCurrentVersion, 'JavaHome')]
    ; Extract the quoted executable path; the back-reference must be written as $1
    $aReturn[1] = StringRegExpReplace(_WinAPI_ExpandEnvironmentStrings(RegRead($HKLM & '\Software\Classes\JNLPFile\Shell\Open\Command', '')), '.*?"((?:[^"\\/]+[\\/])+([^"\\/]+))".*', '$1')
    Return $aReturn
EndFunc   ;==>_JavaVersion

 
