distancesprinter Posted July 14, 2017

Hi, I'm brand new and I'm about to embark on creating a very simple app to delete specific pages from PDF files. I'm going to list my high-level strategy below, and I'm looking for recommendations from more experienced AutoIt developers on which methods I should review in the Help File, code snippets, and potential roadblocks.

High Level Outline & Desired Behavior:

1. Prompt user to select a directory
2. Find all .pdf files in the selected directory (non-recursively) and store them in an array
3. List all .pdf files in the array and prompt user to confirm (proceed or abort)
4. Only if proceeding, launch Acrobat
5. For each file in the array, run a series of actions:
   5.1. Open the file in Acrobat (Ctrl+O, type file name in dialog box, hit Open, wait for file to open)
   5.2. Search file for a specific string (Ctrl+F to open search box, enter string to search, hit Enter)
   5.3. If it's not found, there's a prompt saying the string couldn't be found: identify the prompt, accept it, hit keystrokes to save the file, close the file, and proceed to the next file (step 5.1).
   5.4. If the prompt does not appear, Acrobat found the string, so use a keystroke to delete the page and repeat the search operation (step 5.2).
6. After all files in the array are processed, exit Acrobat and prompt user that the operation completed.

I'm thinking this should be pretty simple to construct, but now that I've sat down to actually do it, I'm not sure where to start. Thoughts?

Thanks!
Danp2 Posted July 14, 2017

There may be a better way to automate this. FWIW, I previously wrote a script that took a large PDF and separated it into individual files. To do this, I used a combination of pdfsam and xpdf.
distancesprinter (Author) Posted July 14, 2017

5 minutes ago, Danp2 said:
    There may be a better way to automate this. FWIW, I previously wrote a script that took a large PDF and separated it into individual files. To do this, I used a combination of pdfsam and xpdf.

I'm familiar with pdfsam and use it in a lot of scripts, but the real challenge is locating the string (which is a series of words and includes punctuation). I got very close using Acrobat JavaScript, but the API for searching will only locate individual words, not phrases in the document, which makes things much more complex.

When I went to do it manually, I discovered that I could do the entire operation with keystrokes, so I figured that would be pretty easy to port to AutoIt, but I'd need to loop through a list of file names. I also wanted to create a GUI that lets a user select the directory, to make it more user-friendly.
Danp2 Posted July 14, 2017

I used pdftotext from xpdf to convert the entire PDF to a text file. I then read the entire file into memory and used StringSplit to create an array with each element containing an individual page. From there, it was easy to process each page, search for specific text, and take action when found.
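For illustration, here is a minimal sketch of that page-splitting idea applied to the phrase search from the first post. It assumes the text came from pdftotext, which ends each page with a form feed (Chr(12)). The function name `_FindPagesWithPhrase` and the whitespace-collapsing step are assumptions added for this example, not part of Danp2's script.

```autoit
; Hypothetical helper (not from the original script).
; Returns a comma-separated list of 1-based page numbers whose text
; contains $sPhrase, or "" if no page matches.
; $sText is assumed to be pdftotext output, one form feed per page.
Func _FindPagesWithPhrase($sText, $sPhrase)
    Local $aPages = StringSplit($sText, Chr(12))
    Local $sHits = "", $sPage

    For $i = 1 To $aPages[0]
        ; Collapse line breaks and runs of whitespace into single spaces so
        ; a phrase that wraps across lines can still be matched
        $sPage = StringRegExpReplace($aPages[$i], "\s+", " ")

        ; StringInStr is case-insensitive by default
        If StringInStr($sPage, $sPhrase) Then
            If $sHits <> "" Then $sHits &= ","
            $sHits &= $i
        EndIf
    Next

    Return $sHits
EndFunc
```

Because the whole phrase is matched against the page text, punctuation is handled the same way as any other character, which sidesteps the word-only limitation mentioned for the Acrobat JavaScript search.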
distancesprinter (Author) Posted July 14, 2017

55 minutes ago, Danp2 said:
    I used pdftotext from xpdf to convert the entire PDF to a text file. I then read the entire file into memory and used StringSplit to create an array with each element containing an individual page. From there, it was easy to process each page, search for specific text, and take action when found.

Thanks for that suggestion--it's an interesting way to attack the problem, and it would be good if I needed this to run headless. If you're able to share the code, I might be able to adapt it to my requirements.

In the meantime, I've been able to operationalize steps 4 through 6 of my strategy above and have confirmed that they work. Now I just need help presenting a normal Windows Browse for Folder dialog, and some help saving a listing of all PDF file paths to an array. I think if I can get a directory listing, I could use a RegEx to match the PDF files, but I'm interested in any code others have for this sort of thing. I can't be the first one who needs to do this.

Thanks
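A minimal sketch of steps 1 through 3, assuming the built-in FileSelectFolder dialog and the _FileListToArray UDF from File.au3 are acceptable here. The function names `_ListPdfFiles` and `_SelectAndConfirmPdfs`, the dialog texts, and the error codes are made up for this example. The "*.pdf" filter makes a separate RegEx pass unnecessary.

```autoit
#include <File.au3>
#include <MsgBoxConstants.au3>

; Hypothetical helper: returns a 1-based array of full paths to the .pdf
; files directly inside $sDir (element [0] holds the count).
; Sets @error if the folder is missing or contains no matching files.
Func _ListPdfFiles($sDir)
    ; $FLTA_FILES = files only; True = return full paths; not recursive
    Local $aFiles = _FileListToArray($sDir, "*.pdf", $FLTA_FILES, True)
    If @error Then Return SetError(1, 0, 0)
    Return $aFiles
EndFunc

; Hypothetical helper: shows the standard Browse for Folder dialog, lists
; the PDFs found, and asks the user to proceed or abort.
; Returns the file array, or sets @error on cancel, no files, or abort.
Func _SelectAndConfirmPdfs()
    Local $sDir = FileSelectFolder("Select the folder containing the PDF files", "")
    If @error Then Return SetError(1, 0, 0) ; dialog was cancelled

    Local $aFiles = _ListPdfFiles($sDir)
    If @error Then
        MsgBox($MB_ICONWARNING, "Nothing to do", "No PDF files found in " & $sDir)
        Return SetError(2, 0, 0)
    EndIf

    ; Build one line per file for the confirmation prompt
    Local $sList = ""
    For $i = 1 To $aFiles[0]
        $sList &= $aFiles[$i] & @CRLF
    Next

    Local $iAnswer = MsgBox($MB_YESNO + $MB_ICONQUESTION, "Confirm", _
            "Process these " & $aFiles[0] & " file(s)?" & @CRLF & @CRLF & $sList)
    If $iAnswer <> $IDYES Then Return SetError(3, 0, 0)

    Return $aFiles
EndFunc
```

Calling `_SelectAndConfirmPdfs()` at the top of a script and checking @error afterwards would then gate the Acrobat steps. For folders with many files, a scrollable list control would probably suit the confirmation better than a message box.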
Danp2 Posted July 14, 2017

22 minutes ago, distancesprinter said:
    Thanks for that suggestion--it's an interesting way to attack the problem, and it would be good if I needed this to run headless. If you're able to share the code, I might be able to adapt it to my requirements.

Here you go...

#include <Array.au3>
#include <File.au3>
#include <Math.au3>
#include <WinAPI.au3>

Opt("TrayMenuMode", 1)
Opt("TrayOnEventMode", 1)

Global $INIFile = "DistributePDFs.ini", $PDFSam, $PDFToText, $DMDocuments, $IncomingPDFs

; Create tray menu
SetupTray()

LoadOptions()
VerifyPaths()
VerifyJava()
ProcessFiles()

Exit

Func ProcessFiles()
    Local $aFiles, $hFile, $cFilename, $cText, $aContents, $cDetails
    Local $i, $j, $nBegin, $nEnd, $cDebtorID, $mpCount = 0

    $aFiles = _FileListToArray($IncomingPDFs, "*.pdf", 1)

    If @error Then
        MsgBox(48, "Warning", "No files to process!")
        Return
    EndIf

    ConsoleWrite("Processing " & $aFiles[0] & " PDF files" & @CRLF)

    For $i = 1 To $aFiles[0]
        ConsoleWrite($aFiles[$i] & @CRLF)

        $cFilename = @ScriptDir & "\temp\" & $aFiles[$i] & ".txt"

        ; Convert to text file
        ConvertToText($IncomingPDFs & $aFiles[$i], $cFilename)

        $hFile = FileOpen($cFilename, 0)

        ; Check if file opened for reading OK
        If $hFile = -1 Then
            ContinueLoop
        EndIf

        ; Read file into memory
        $cText = FileRead($hFile)
        FileClose($hFile)

        ; Remove temporary text file
        FileDelete($cFilename)

        ; Load into an array, one element per page (pages end with a form feed)
        $aContents = StringSplit($cText, Chr(12))

        ; Adjust for empty cell at bottom
        If $aContents[$aContents[0]] = "" Then
            $aContents[0] -= 1
        EndIf

        For $nBegin = 1 To $aContents[0]
            TrayTip($aFiles[$i], "Processing page " & $nBegin & " of " & $aContents[0], 1)

            $cDetails = RetrieveDetails($aContents[$nBegin])

            If $cDetails <> "" Then
                ; Get debtor ID
                $cDebtorID = StringLeft($cDetails, StringInStr($cDetails, "-") - 1)

                ; Find any associated pages
                For $nEnd = $nBegin + 1 To $aContents[0]
                    If RetrieveDetails($aContents[$nEnd]) <> "" Then
                        $nEnd = $nEnd - 1
                        ExitLoop
                    EndIf
                Next

                ; Special handling for last page
                If $nEnd > $aContents[0] Then
                    $nEnd = $aContents[0]
                EndIf

                ; Make sure destination directory exists
                If Not FileExists($DMDocuments & $cDebtorID) Then
                    DirCreate($DMDocuments & $cDebtorID)
                EndIf

                ExtractPages($IncomingPDFs & $aFiles[$i], $DMDocuments & $cDebtorID & "\" & $cDetails, $nBegin & "-" & $nEnd)

                ; Track number of multi-page documents
                If $nBegin <> $nEnd Then
                    ConsoleWrite($cDebtorID & " - Multipage document" & @CRLF)
                    $mpCount += 1
                EndIf

                ; Adjust current page number to prevent
                ; associated pages from being processed again
                $nBegin = $nEnd
            EndIf
        Next

        $cFilename = $IncomingPDFs & "Processed\" & $aFiles[$i]
        $j = 0

        ; Handle duplicate files
        While FileExists($cFilename)
            $j += 1
            $cFilename = $IncomingPDFs & "Processed\" & StringReplace($aFiles[$i], ".pdf", "-" & $j & ".pdf")
            ConsoleWrite("New filename = " & $cFilename & @CRLF)
        WEnd

        ; Move PDF file
        FileMove($IncomingPDFs & $aFiles[$i], $cFilename)
    Next

    If $mpCount <> 0 Then
        MsgBox(48, "Warning", $mpCount & " multi-page documents detected!")
        Return
    EndIf
EndFunc

; Returns the page text up to and including the first ".pdf", or "" if absent
Func RetrieveDetails($text)
    Local $pos, $result = ''

    $pos = StringInStr($text, ".pdf")

    If $pos <> 0 Then
        $result = StringLeft($text, $pos + 3)
    EndIf

    Return $result
EndFunc

Func ExtractPages($cSource, $cDestination, $cPages)
    Local $result, $cmdString, $workDir, $j = 0
    Local $szDrive, $szDir, $szFName, $szExt, $aPath, $szOrigFName

    ; Handle duplicate files
    While FileExists($cDestination)
        ; Parse destination filename
        $aPath = _PathSplit($cDestination, $szDrive, $szDir, $szFName, $szExt)

        If $j = 0 Then $szOrigFName = $szFName

        $j += 1
        $cDestination = StringReplace($cDestination, $szFName, $szOrigFName & "-" & $j)
        ConsoleWrite("New filename = " & $cDestination & @CRLF)
    WEnd

    $workDir = JustPath($PDFSam)
    $cmdString = '"' & $PDFSam & '" -f "' & $cSource & '" -o "' & $cDestination & '" -u ' & $cPages & ' -overwrite concat'
    ConsoleWrite($cmdString & @CRLF)

    $result = RunWait($cmdString, $workDir, @SW_HIDE)
    Return $result
EndFunc

Func ConvertToText($cSource, $cDestination)
    Local $result, $cmdString

    ConsoleWrite("Converting " & $cSource & " to text." & @CRLF)
    $cmdString = '"' & $PDFToText & '" "' & $cSource & '" "' & $cDestination & '"'

    $result = RunWait($cmdString, "", @SW_HIDE)
    Return $result
EndFunc

Func LoadOptions()
    $PDFSam = IniRead($INIFile, "Utilities", "PDFSam", "")
    $PDFToText = IniRead($INIFile, "Utilities", "PDFToText", "")
    $DMDocuments = IniRead($INIFile, "Directories", "DMDocuments", "")
    $IncomingPDFs = IniRead($INIFile, "Directories", "IncomingPDFs", "")
EndFunc

Func VerifyPaths()
    If Not FileExists($PDFSam) Then
        MsgBox(16, "Error", "Incorrect path to PDFSam!")
        Exit
    EndIf

    If Not FileExists($PDFToText) Then
        MsgBox(16, "Error", "Incorrect path to PDFToText!")
        Exit
    EndIf

    If Not FileExists($DMDocuments) Then
        MsgBox(16, "Error", "Incorrect path to DMDocuments!")
        Exit
    EndIf

    If Not FileExists($IncomingPDFs) Then
        MsgBox(16, "Error", "Incorrect path to IncomingPDFs!")
        Exit
    EndIf
EndFunc

Func JustPath($cPath)
    Local $szDrive, $szDir, $szFName, $szExt

    _PathSplit($cPath, $szDrive, $szDir, $szFName, $szExt)
    Return $szDrive & $szDir
EndFunc   ;==>JustPath

Func Terminate()
    Exit 0
EndFunc

Func SetupTray()
    ; Display on Right-click
    TraySetClick(8)

    Local $exititem = TrayCreateItem("Exit")
    TrayItemSetOnEvent(-1, "Terminate")

    TraySetIcon(@ScriptFullPath, 1)
    TraySetState()
EndFunc

Func VerifyJava()
    Local $aArray = _JavaVersion()

    If $aArray[0] = "" Then
        MsgBox(16, "Error", "Unable to locate Java Runtime Environment!")
        Exit
    EndIf
EndFunc

Func _JavaVersion()
    Local $HKLM = 'HKLM'
    Local $sWow64 = (@AutoItX64) ? ("\Wow6432Node") : ("")
    Local $sCurrentVersion = RegRead($HKLM & '\Software' & $sWow64 & '\JavaSoft\Java Runtime Environment', 'CurrentVersion')
    ConsoleWrite("Current version = " & $sCurrentVersion & @CRLF)

    ; [0] = JavaHome of the current JRE, [1] = path taken from the JNLP open command
    Local $aReturn[2] = [RegRead($HKLM & '\Software' & $sWow64 & '\JavaSoft\Java Runtime Environment\' & $sCurrentVersion, 'JavaHome')]
    ; '$1' is the back-reference to the first capture group (the post showed a bare '1')
    $aReturn[1] = StringRegExpReplace(_WinAPI_ExpandEnvironmentStrings(RegRead($HKLM & '\Software\Classes\JNLPFile\Shell\Open\Command', '')), '.*?"((?:[^"/]+[/])+([^"/]+))".*', '$1')
    Return $aReturn
EndFunc   ;==>_JavaVersion
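ExtractPages in the script above passes pdfsam a page range such as "3-5". To delete pages instead of extracting them, one option would be to turn the list of unwanted pages into the ranges to keep. The sketch below is only an illustration: `_KeepRanges` is a hypothetical helper that is not part of the script, and joining several ranges with a comma is an assumption that should be checked against the documentation for the pdfsam version in use (the separator is a parameter for that reason).

```autoit
; Hypothetical helper (not from the original script).
; $iPageCount  total number of pages in the PDF
; $sDropPages  comma-separated 1-based pages to remove, e.g. "2,5"
; $sSeparator  text placed between ranges (assumed to be ",")
; Returns the ranges to keep, e.g. "1-1,3-4,6-6" for 6 pages and "2,5".
Func _KeepRanges($iPageCount, $sDropPages, $sSeparator = ",")
    Local $aDrop = StringSplit($sDropPages, ",")

    ; One flag per page; the extra slot past the last page stays unset
    ; and simply marks where the final range ends
    Local $abDrop[$iPageCount + 2]
    Local $iPage
    For $i = 1 To $aDrop[0]
        $iPage = Number($aDrop[$i])
        If $iPage >= 1 And $iPage <= $iPageCount Then $abDrop[$iPage] = True
    Next

    Local $sRanges = "", $iStart = 0, $bKeep
    For $i = 1 To $iPageCount + 1
        $bKeep = ($i <= $iPageCount) And Not $abDrop[$i]
        If $bKeep Then
            ; Open a new range if none is in progress
            If $iStart = 0 Then $iStart = $i
        ElseIf $iStart <> 0 Then
            ; A dropped page (or the end of the document) closes the range
            If $sRanges <> "" Then $sRanges &= $sSeparator
            $sRanges &= $iStart & "-" & ($i - 1)
            $iStart = 0
        EndIf
    Next

    Return $sRanges
EndFunc
```

An empty return value means every page was marked for removal, which a caller would probably want to treat as a special case before invoking pdfsam at all.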