RickB75 Posted April 27, 2015

Just wondering if you guys can help me with this CMD line param. The function works except for passing the page number into the CMD line. From what I can tell, it's not passing the page number into the CMD line params here:

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)

It's only passing the PDF file name and the txt file I want to save it to. I've tried adding specific page numbers in the syntax and it still processes the entire PDF. Here is the info from the XPDF help file for the pdftotext app:

    SYNOPSIS
    pdftotext [options] [PDF-file [text-file]]

I've tried, I'm just not very good at CMD line params.

    ; #FUNCTION# ====================================================================================================================
    ; Name...........: _XPDF_ToText
    ; Description....: Converts a PDF file to plain text.
    ; Syntax.........: _XPDF_ToText ( "PDFFile" , "TxtFile" [ , FirstPage [, LastPage [, Layout ]]] )
    ; Parameters.....: PDFFile - PDF Input File.
    ;                  TxtFile - Plain text file to convert to
    ;                  FirstPage - First page to convert (default is 1)
    ;                  LastPage - Last page to convert (default is last page of the document)
    ;                  Layout - If true, maintains (as best as possible) the original physical layout of the text
    ;                           If false, the behavior is to 'undo' physical layout (columns, hyphenation, etc.)
    ;                           and output the text in reading order.
    ;                           Default is True
    ; Return values..: Success - 1
    ;                  Failure - 0, and sets @error to :
    ;                            1 - PDF File not found
    ;                            2 - Unable to find the external program
    ; ===============================================================================================================================
    Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
        Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
        Local $sOptions

        If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
        If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

        If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
        If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
        If $bLayout = True Then $sOptions &= " -layout"

        Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
        If $iReturn = 0 Then Return 1

        Return 0
    EndFunc ; ---> _XPDF_ToText
Danp2 Posted April 27, 2015 (edited)

Just took a quick look, but it doesn't appear that you actually posted the code that isn't working.

One suggestion: build the entire parameter string and store it in a variable (like you are doing with $sOptions). Then output this to the console window so that you can examine the entire string for any issues.

P.S. Try it from the Cmd window until you figure out the correct syntax for pdftotext. Once you've got that down, you can try what I suggested above.

Edited April 27, 2015 by Danp2
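A minimal sketch of that suggestion, assuming the -f, -l and -layout options that the function above builds into $sOptions. The file paths and page numbers are hypothetical sample values, and $sParams is a name introduced here for illustration only:

```autoit
; Hypothetical sample values, for illustration only.
Local $sPDFFile = "C:\samplePDF\example.pdf"
Local $sTXTFile = "C:\processedsheets\example_page_2.txt"
Local $iFirstPage = 2, $iLastPage = 2

; Build the whole parameter string first: options, then the two quoted paths.
Local $sParams = "-f " & $iFirstPage & " -l " & $iLastPage & " -layout"
$sParams &= ' "' & $sPDFFile & '" "' & $sTXTFile & '"'

; Write the string to the console so it can be inspected before it is executed.
ConsoleWrite("pdftotext " & $sParams & @CRLF)
; Prints: pdftotext -f 2 -l 2 -layout "C:\samplePDF\example.pdf" "C:\processedsheets\example_page_2.txt"
```

The printed line can be pasted into a Cmd window as-is, which ties the two pieces of advice together: the syntax gets checked by hand before the same string is handed to ShellExecuteWait.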
blakel Posted April 27, 2015

I don't see the $sOptions in your ShellExecuteWait arguments.
RickB75 Posted April 27, 2015 Author

I found this function that was posted a year ago here on the forums. It was originally posted on the French version of the AutoIt forum. It works great for converting the entire PDF to txt. After trying multiple times to specify a page, I noticed the same thing ($sOptions isn't in the ShellExecuteWait call). I'm not very good at all with CMD line params. That's why I posted the entire function here, to see if you guys could give an example of where I need to put the $sOptions var in the params. To me, $sOptions isn't very well defined, so in my main script I changed it a little bit to make the first and last page more explicit:

    $iFirstPage = " -f " & $iFirstPage
    $iLastPage = " -l " & $iLastPage

I tried running it like this but it didn't work:

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $iFirstPage & '" "' & $iLastPage & '"' & '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
Danp2 Posted April 27, 2015

"Try it from the Cmd window until you figure out the correct syntax for pdftotext. Once you've got that down, then you can try what I suggested above."

That's the best advice I can give you so that you can figure out the correct syntax and make sure that pdftotext works as you would expect before introducing your own coding.
RickB75 Posted April 27, 2015 Author

Thank you Dan. I'm checking it now.
JackDinn Posted April 27, 2015

I had similar problems passing params to cmd, but if I remember right, I removed the @SW_HIDE from the command and then you can see exactly what's being executed in the cmd window.

Thx all,
Jack Dinn.
Danp2 Posted April 28, 2015

What do you plan on doing with the results once the text is extracted?
BrewManNH Posted April 28, 2015

I would start from the code posted by RickB75, but don't put quotes around the page number variables; I'm not sure that they're needed. Also, if you do use the quotes, the quotes between $iLastPage and $sPDFFile are wrong: there's no space between them.
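To make the quoting point concrete, here is a small sketch that prints the parameter string from the failed attempt next to one with the quotes around the page options removed. The paths and page numbers are hypothetical, and $sFirst, $sLast, $sQuoted and $sPlain are names used only for this illustration:

```autoit
; Hypothetical sample values, for illustration only.
Local $sPDFFile = "C:\samplePDF\example.pdf"
Local $sTXTFile = "C:\processedsheets\example.txt"
Local $sFirst = " -f 2"
Local $sLast = " -l 2"

; Same concatenation as the failed attempt: the options end up inside quotes,
; and nothing separates the closing quote of the last option from the PDF path.
Local $sQuoted = '"' & $sFirst & '" "' & $sLast & '"' & '"' & $sPDFFile & '" "' & $sTXTFile & '"'

; Options left unquoted, with a space before the quoted PDF path.
Local $sPlain = $sFirst & $sLast & ' "' & $sPDFFile & '" "' & $sTXTFile & '"'

ConsoleWrite($sQuoted & @CRLF)
; Prints: " -f 2" " -l 2""C:\samplePDF\example.pdf" "C:\processedsheets\example.txt"
ConsoleWrite($sPlain & @CRLF)
; Prints:  -f 2 -l 2 "C:\samplePDF\example.pdf" "C:\processedsheets\example.txt"
```

In the first string each option and its value are wrapped in one pair of quotes, so they are no longer passed as a separate option and number. Only the file paths, which may contain spaces, need the quotes.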
RickB75 Posted April 28, 2015 Author (edited)

Sorry for the late reply. Here's my current project. We have two vendors that don't talk with each other, working out of two different databases. One vendor mines their database, and once a customer record is found that might be a good candidate to trade in their current vehicle for a newer one, they send an email to management with a PDF attached containing the customer info. That's great, but all of our management and the reps primarily work out of the other vendor's program. So I'm building a bridge between the two systems. When Vendor (A) sends a PDF, I can extract the customer data from it, build an XML file with this data, and push the data to Vendor (B). Also, Vendor (B) has a ton of reports and analytic data that we can use to see how effective we are being with the customers.

Edited April 28, 2015 by RickB75
RickB75 Posted April 28, 2015 Author (edited)

I got it figured out guys!! It was the quotes around the first and last page. Here's what's working. This grabs the specified pages in the PDFs and extracts the txt:

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , $iFirstPage & ' ' & $iLastPage & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)

Thanks guys for all your help. @BrewManNH Thanks for mentioning the quotes!

Edited April 28, 2015 by RickB75
jguinch Posted April 28, 2015

Here is the original post of the 3 functions _XFDF_Info, _XPDF_Search and _XPDF_ToText: http://www.autoitscript.fr/forum/viewtopic.php?f=21&t=12056#p83720

It seems I forgot to use $sOptions in the command line (sorry). Could you try replacing just the ShellExecuteWait line with this one:

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
RickB75 Posted April 28, 2015 Author

jguinch,

I tried running the script with your revised ShellExecuteWait and I didn't get anything back. Below is the link I copied the function from. The standard function works great for entire PDFs the way it's written. I just needed to break apart each page in the PDF. Also, the XPDF_Info function works perfectly. That's what I use to get the page count. I haven't tried XPDF_Search yet.
jguinch Posted April 28, 2015

It works for me with the ShellExecuteWait modification. Can you show your code?
RickB75 Posted April 28, 2015 Author

    For $i = 0 To UBound($filelist) - 1
        $ipageCt = _XFDF_Info("C:\samplePDF\" & $filelist[$i] & ".pdf","Pages")
        ;MsgBox(0,"Total pages",$ipageCt)
        For $ipg = 1 To $ipageCt
            ;MsgBox(0,"page number",$ipg)
            _XPDF_ToText("C:\samplePDF\" & $filelist[$i] & ".pdf","C:\processedsheets\"& $filelist[$i] &"_page_" & $ipg & ".txt",$ipg,$ipg,True)
        Next
    Next

    ;~ ; #FUNCTION# ====================================================================================================================
    ;~ ; Name...........: _XPDF_ToText
    ;~ ; Description....: Converts a PDF file to plain text.
    ;~ ; Syntax.........: _XPDF_ToText ( "PDFFile" , "TxtFile" [ , FirstPage [, LastPage [, Layout ]]] )
    ;~ ; Parameters.....: PDFFile - PDF Input File.
    ;~ ;                  TxtFile - Plain text file to convert to
    ;~ ;                  FirstPage - First page to convert (default is 1)
    ;~ ;                  LastPage - Last page to convert (default is last page of the document)
    ;~ ;                  Layout - If true, maintains (as best as possible) the original physical layout of the text
    ;~ ;                           If false, the behavior is to 'undo' physical layout (columns, hyphenation, etc.)
    ;~ ;                           and output the text in reading order.
    ;~ ;                           Default is True
    ;~ ; Return values..: Success - 1
    ;~ ;                  Failure - 0, and sets @error to :
    ;~ ;                            1 - PDF File not found
    ;~ ;                            2 - Unable to find the external program
    ;~ ; ===============================================================================================================================
    Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
        Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
        Local $sOptions

        If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
        If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

        If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
        If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
        If $bLayout = True Then $sOptions &= " -layout"

        Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
        If $iReturn = 0 Then Return 1

        Return 0
    EndFunc ; ---> _XPDF_ToText
Danp2 Posted April 28, 2015

@RickB75 Not sure if this will be helpful in your situation, but I'll share just in case. Using the method below, you call the PDFToText conversion routine only once for each PDF file, which can save you a lot of time depending on the size of the files. You end up with an array ($aContents) holding the converted text.

    $INIFile = "myscript.ini"
    $PDFToText = IniRead($INIFile, "Utilities", "PDFToText", "")

    ; Convert to text file
    ConvertToText($cPDFFilename, $cFilename)

    $hFile = FileOpen($cFilename, 0)

    ; Check if file opened for reading OK
    If $hFile = -1 Then
        ContinueLoop
    EndIf

    ; Read file into memory
    $cText = FileRead($hFile)
    FileClose($hFile)

    ; Remove temporary text file
    FileDelete($cFilename)

    ; Load into an array
    $aContents = StringSplit($cText, Chr(12))

    ; Adjust for empty cell at bottom
    If $aContents[$aContents[0]] = "" Then
        $aContents[0] -= 1
    EndIf

    Func ConvertToText($cSource, $cDestination)
        Local $result, $cmdString

        ConsoleWrite("Converting " & $cSource & " to text." & @CRLF)
        $cmdString = '"' & $PDFToText & '" "' & $cSource & '" "' & $cDestination & '"'
        $result = RunWait($cmdString, "", @SW_HIDE)
        Return $result
    EndFunc
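The page-splitting step in that method can be tried on its own. The sketch below assumes, as the StringSplit call above implies, that the converted text separates pages with form feed characters (Chr(12)). The sample text is a made-up stand-in for the contents of a converted file:

```autoit
; Made-up stand-in for converted text: three "pages", each followed by a form feed.
Local $cText = "page one" & Chr(12) & "page two" & Chr(12) & "page three" & Chr(12)

; StringSplit puts the element count in [0] and the pieces in [1]..[n].
Local $aContents = StringSplit($cText, Chr(12))

; The trailing form feed leaves an empty last element, so drop it from the count.
If $aContents[$aContents[0]] = "" Then $aContents[0] -= 1

; Walk the pages one by one; this is where per-page processing would go.
For $i = 1 To $aContents[0]
    ConsoleWrite("Page " & $i & ": " & $aContents[$i] & @CRLF)
Next
```

With this layout the array index doubles as the page number, so a per-page loop does not need a separate pdftotext call for every page.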
RickB75 Posted April 28, 2015 Author (edited)

@Danp2 Thank you so much for the script. I'll see if I can implement that in my main script. My initial thought was that once I convert each page to a txt file, I would use the FileRead or _FileReadToArray function to start grabbing specific data in the txt file. The only issue I can see right now is that the data could be on different lines. I was thinking about maybe a Select Case loop. Not sure yet. Haven't gotten that far yet.

@jguinch From what I can tell, and I haven't tested this, it looks like it redeclares the $sOptions var. Do you think something like this would work?

    Local $sOptions = $iFirstPage & $iLastPage & $bLayout

And then in the "If" statements have it like this:

    If $iFirstPage <> 1 Then $iFirstPage &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $iLastPage &= " -l " & $iLastPage
    If $bLayout = True Then $bLayout &= " -layout"

I haven't tested this so I have no idea if it will work. It was just a thought.

Edited April 28, 2015 by RickB75
jguinch Posted April 28, 2015

@RickB75: you said you didn't get anything back, but can you say more? How many files are created, and how many pages do you have in your PDF?

This code works well for me:

    Local $sPDF = "D:\Mes Documents\myFile.pdf"
    $ipageCt = _XFDF_Info($sPDF,"Pages")
    MsgBox(0,"Total pages",$ipageCt)

    For $ipg = 1 To $ipageCt
        _XPDF_ToText($sPDF, "d:\tmp\myFile_page_" & $ipg & ".txt", $ipg, $ipg, True)
    Next
RickB75 Posted April 28, 2015 Author

I just noticed an error on my part, @jguinch. You're right. Your edited ShellExecuteWait does work. When I copied the function over and placed the new one at the bottom of my main script, I didn't point the $sXPDFToText var to the correct dir to start the application. I changed it to point to the correct dir and it runs through fine. My apologies for not paying closer attention.
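A wrong path to pdftotext.exe is the case the function header documents as @error = 2, so checking @error after the call would surface it. Below is a rough sketch of that check. _XPDF_ErrorText and _FakeConvert are hypothetical helpers written for this illustration and are not part of the original functions; _FakeConvert only imitates the Return SetError(2, 0, 0) line of _XPDF_ToText so the example runs without pdftotext installed:

```autoit
; Hypothetical helper: maps the @error values documented in the _XPDF_ToText
; header to readable messages.
Func _XPDF_ErrorText($iError)
    Switch $iError
        Case 1
            Return "PDF file not found"
        Case 2
            Return "Unable to find the external program (pdftotext.exe)"
        Case Else
            Return "No error code set by the wrapper"
    EndSwitch
EndFunc

; Hypothetical stand-in for a failing _XPDF_ToText call: returns 0 and sets
; @error to 2, as the real function does when pdftotext.exe is missing.
Func _FakeConvert()
    Return SetError(2, 0, 0)
EndFunc

Local $iResult = _FakeConvert()
; Copy @error right away, because the next function call resets it.
Local $iError = @error

If $iResult = 0 Then ConsoleWrite("Conversion failed: " & _XPDF_ErrorText($iError) & @CRLF)
; Prints: Conversion failed: Unable to find the external program (pdftotext.exe)
```

In the page loop posted earlier, the same pattern would go directly after each _XPDF_ToText call, so a missing executable is reported on the first page instead of the loop finishing silently with no output files.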
jguinch Posted April 28, 2015

No problem. It is a pleasure to see that someone uses my code. At the same time, I could see that there was a bug, and I corrected it in the original posts.