RickB75

Help with a function and passing a Cmd Line Parameter.


I'm hoping you can help me with a command-line parameter. The function below works, except that it doesn't pass the page numbers to the command line. From what I can tell, the page numbers never make it into the parameter string here:

Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE) 

It only passes the PDF file name and the text file I want to save to. I've tried adding specific page numbers to the call, and it still processes the entire PDF. Here is the synopsis from the XPDF help file for pdftotext:

SYNOPSIS
       pdftotext [options] [PDF-file [text-file]]

 

I've tried, but I'm just not very good with command-line parameters.

 

; #FUNCTION# ====================================================================================================================
; Name...........: _XPDF_ToText
; Description....: Converts a PDF file to plain  text.
; Syntax.........: _XPDF_ToText ( "PDFFile" , "TxtFile" [ , FirstPage [, LastPage [, Layout ]]] )
; Parameters.....: PDFFile    - PDF Input File.
;                  TxtFile    - Plain text file to convert to
;                  FirstPage  - First page to convert (default is 1)
;                  LastPage   - Last page to convert (default is last page of the document)
;                  Layout     - If true, maintains (as  best as possible) the original physical layout of the text
;                               If false, the behavior is to 'undo'  physical  layout  (columns, hyphenation, etc.)
;                                 and output the text in reading order.
;                               Default is True
; Return values..: Success - 1
;                  Failure - 0, and sets @error to :
;                   1 - PDF File not found
;                   2 - Unable to find the external program
; ===============================================================================================================================
Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions
   
    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)
   
    If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
    If $bLayout = True Then $sOptions &= " -layout"
   
    Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
    If $iReturn = 0 Then Return 1
   
    Return 0
   
EndFunc ; ---> _XPDF_ToText

 


#2 ·  Posted (edited)

Just took a quick look, but it doesn't appear that you actually posted the code that isn't working.

One suggestion: Build the entire parameter string and store it to a variable (like you are doing with $sOptions). Then output this to the console window so that you can examine the entire string for any issues.

P.S. Try it from the Cmd window until you figure out the correct syntax for pdftotext. Once you've got that down, then you can try what I suggested above.
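
As a rough sketch of that suggestion: build the whole parameter string in one variable and write it to the console before handing it to ShellExecuteWait. The file paths and page numbers below are made up for illustration; the -f, -l and -layout switches are the ones the function in the first post already builds.

```autoit
; Hypothetical values, for illustration only.
Local $sPDFFile = "C:\samplePDF\example.pdf"
Local $sTXTFile = "C:\processedsheets\example_page_2.txt"
Local $iFirstPage = 2, $iLastPage = 2

; Build the options first, then the full parameter string.
Local $sOptions = " -f " & $iFirstPage & " -l " & $iLastPage & " -layout"
Local $sParams = $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"'

; Show exactly what would be handed to pdftotext.exe.
ConsoleWrite("pdftotext.exe" & $sParams & @CRLF)
```

The printed line can be pasted into a Cmd window to check that pdftotext accepts it.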

Edited by Danp2


I don't see the $sOptions in your ShellExecuteWait arguments.


I found this function, which was posted here on the forums a year ago. It was originally posted on the French AutoIt forum. It works great for converting an entire PDF to text. After trying several times to specify a page, I noticed the same thing: $sOptions isn't in the ShellExecuteWait call. I'm not very good with command-line parameters, which is why I posted the entire function, hoping you could show me where to put the $sOptions variable in the parameters. To me, $sOptions isn't very well defined, so in my main script I changed it to this to make the first and last page more explicit:

$iFirstPage = " -f " & $iFirstPage
 $iLastPage = " -l " & $iLastPage

I tried running it like this but it didn't work.

Local $iReturn = ShellExecuteWait ( $sXPDFToText , '"' & $iFirstPage & '" "' & $iLastPage & '"' & '"' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)

 


Try it from the Cmd window until you figure out the correct syntax for pdftotext. Once you've got that down, then you can try what I suggested above.

That's the best advice I can give you so that you can figure out the correct syntax and make sure that pdftotext works as you would expect before introducing your own coding.


Thank you Dan. I'm checking it now.


What do you plan on doing with the results once the text is extracted?


I would look at the code posted by RickB75 and simply not put quotes around the page number variables; I'm not sure they're needed. Also, if you do use quotes, the quotes between $iLastPage and $sPDFFile are wrong: there's no space between them.
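
To make the difference visible, here is a small sketch that prints both parameter strings. The values are hypothetical; the page variables already carry their switches, as in RickB75's post above.

```autoit
; Hypothetical values, for illustration only.
Local $iFirstPage = " -f 2"
Local $iLastPage = " -l 2"
Local $sPDFFile = "C:\in.pdf", $sTXTFile = "C:\out.txt"

; With quotes around the page options (the failing attempt):
ConsoleWrite('"' & $iFirstPage & '" "' & $iLastPage & '"' & '"' & $sPDFFile & '" "' & $sTXTFile & '"' & @CRLF)
; prints: " -f 2" " -l 2""C:\in.pdf" "C:\out.txt"

; Without quotes around the page options, and with a space before the file name:
ConsoleWrite($iFirstPage & $iLastPage & ' "' & $sPDFFile & '" "' & $sTXTFile & '"' & @CRLF)
; prints:  -f 2 -l 2 "C:\in.pdf" "C:\out.txt"
```

In the first string each switch and its value are wrapped together in one pair of quotes, and the closing quote of the last option runs straight into the opening quote of the PDF path.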



#10 ·  Posted (edited)

Sorry for the late reply. Here's my current project. We have two vendors that don't talk to each other, working out of two different databases. One vendor mines its database, and when it finds a customer record that might be a good candidate to trade in their current vehicle for a newer one, it sends management an email with a PDF containing the customer info. That's great, but all of our management and reps primarily work out of the other vendor's program. So I'm building a bridge between the two systems. When Vendor (A) sends a PDF, I can extract the customer data from it, build an XML file with this data, and push the data to Vendor (B). Vendor (B) also has a ton of reports and analytic data that we can use to see how effective we are being with the customers.

Edited by RickB75


#11 ·  Posted (edited)

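I got it figured out, guys! It was the quotes around the first and last page. Here's what's working (with $iFirstPage and $iLastPage already carrying their " -f " and " -l " prefixes, as in my earlier post). This grabs the specified pages from the PDF and extracts the text.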

Local $iReturn = ShellExecuteWait ( $sXPDFToText , $iFirstPage & ' ' & $iLastPage & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)

Thanks guys for all your help. @BrewManNH Thanks for mentioning the quotes!

Edited by RickB75


Here is the original post of the 3 functions _XFDF_Info, _XPDF_Search and _XPDF_ToText : http://www.autoitscript.fr/forum/viewtopic.php?f=21&t=12056#p83720

It seems I forgot to use $sOptions in the command line (sorry). Could you try replacing the ShellExecuteWait line with this one:

Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)

 


jguinch,

I tried running the script with your revised ShellExecuteWait and I didn't get anything back. Below is the link I copied the function from. The standard function works great for entire PDFs the way it's written. I just needed to break out each page of the PDF. The XPDF_Info function also works perfectly; that's what I use to get the page count. I haven't tried XPDF_Search yet.


For $i = 0 To UBound($filelist) - 1

    $ipageCt = _XFDF_Info("C:\samplePDF\" & $filelist[$i] & ".pdf", "Pages")
    ;MsgBox(0, "Total pages", $ipageCt)

    For $ipg = 1 To $ipageCt
        ;MsgBox(0, "page number", $ipg)

        _XPDF_ToText("C:\samplePDF\" & $filelist[$i] & ".pdf", "C:\processedsheets\" & $filelist[$i] & "_page_" & $ipg & ".txt", $ipg, $ipg, True)

    Next

Next

;~ ; #FUNCTION# ====================================================================================================================
;~ ; Name...........: _XPDF_ToText
;~ ; Description....: Converts a PDF file to plain  text.
;~ ; Syntax.........: _XPDF_ToText ( "PDFFile" , "TxtFile" [ , FirstPage [, LastPage [, Layout ]]] )
;~ ; Parameters.....: PDFFile    - PDF Input File.
;~ ;                  TxtFile    - Plain text file to convert to
;~ ;                  FirstPage  - First page to convert (default is 1)
;~ ;                  LastPage   - Last page to convert (default is last page of the document)
;~ ;                  Layout     - If true, maintains (as  best as possible) the original physical layout of the text
;~ ;                               If false, the behavior is to 'undo'  physical  layout  (columns, hyphenation, etc.)
;~ ;                                 and output the text in reading order.
;~ ;                               Default is True
;~ ; Return values..: Success - 1
;~ ;                  Failure - 0, and sets @error to :
;~ ;                   1 - PDF File not found
;~ ;                   2 - Unable to find the external program
;~ ; ===============================================================================================================================
Func _XPDF_ToText($sPDFFile, $sTXTFile, $iFirstPage = 1, $iLastPage = 0, $bLayout = True)
    Local $sXPDFToText = @ScriptDir & "\pdftotext.exe"
    Local $sOptions

    If NOT FileExists($sPDFFile) Then Return SetError(1, 0, 0)
    If NOT FileExists($sXPDFToText) Then Return SetError(2, 0, 0)

    If $iFirstPage <> 1 Then $sOptions &= " -f " & $iFirstPage
    If $iLastPage <> 0 Then $sOptions &= " -l " & $iLastPage
    If $bLayout = True Then $sOptions &= " -layout"

    Local $iReturn = ShellExecuteWait ( $sXPDFToText , $sOptions & ' "' & $sPDFFile & '" "' & $sTXTFile & '"', @ScriptDir, "", @SW_HIDE)
    If $iReturn = 0 Then Return 1

    Return 0

EndFunc ; ---> _XPDF_ToText

 


@RickB75 Not sure if this will be helpful in your situation, but I'll share just in case. Using the below method, you call the PDFToText conversion routine only once for each PDF file, which can save you a lot of time depending on the size of the files. You end up with an array ($aContents) holding the converted text.

$INIFile = "myscript.ini"
$PDFToText = IniRead($INIFile, "Utilities", "PDFToText", "")

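    ; Excerpt: this part runs inside a loop over the PDF files (note the ContinueLoop below),
    ; so $cPDFFilename and $cFilename are expected to be set before this point.
    ; Convert to text file
    ConvertToText($cPDFFilename, $cFilename)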
    
    $hFile = FileOpen($cFilename, 0)

    ; Check if file opened for reading OK
    If $hFile = -1 Then
        ContinueLoop
    EndIf

    ; Read file into memory
    $cText = FileRead($hFile)

    FileClose($hFile)

    ; Remove temporary text file
    FileDelete($cFilename)  
    
    ; Load into an array
    $aContents = StringSplit($cText, Chr(12))

    ; Adjust for empty cell at bottom
    If $aContents[$aContents[0]] = "" Then
        $aContents[0] -= 1
    EndIf
    
Func ConvertToText($cSource, $cDestination)
Local $result, $cmdString
    ConsoleWrite("Converting " & $cSource & " to text." & @CRLF)

    $cmdString = '"' & $PDFToText & '" "' & $cSource & '"  "' & $cDestination & '"'

    $result = RunWait($cmdString, "", @SW_HIDE)

    Return $result
EndFunc
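
The page-splitting step relies on pdftotext ending each page with a form feed character, Chr(12). A minimal sketch of just that step, using a made-up string in place of real pdftotext output:

```autoit
; Simulated pdftotext output: three pages, each terminated by a form feed (Chr(12)).
Local $cText = "Page one text" & Chr(12) & "Page two text" & Chr(12) & "Page three text" & Chr(12)

; Element [0] holds the count; elements [1]..[n] hold the pieces.
Local $aContents = StringSplit($cText, Chr(12))

; The trailing form feed leaves an empty last element, so drop it from the count.
If $aContents[$aContents[0]] = "" Then $aContents[0] -= 1

For $i = 1 To $aContents[0]
    ConsoleWrite("Page " & $i & ": " & $aContents[$i] & @CRLF)
Next
```

With this layout, $aContents[$n] is the text of page $n, so no per-page call to the converter is needed.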

 


#17 ·  Posted (edited)

@Danp2 Thank you so much for the script. I'll see if I can implement that in my main script. My initial thought was that once I've converted each page to a text file, I'd use FileRead or _FileReadToArray to start grabbing specific data from the text file. The only issue I can see right now is that the data could be on different lines. I was thinking about maybe a Select...Case block inside a loop. I'm not sure yet; I haven't gotten that far.

@jguinch 

From what I can tell (I haven't tested this), it looks like it redeclares the $sOptions variable. Do you think something like this would work?

Local $sOptions = $iFirstPage & $iLastPage & $bLayout

 

And then write the "If" statements like this:

If $iFirstPage <> 1 Then $iFirstPage &= " -f " & $iFirstPage
If $iLastPage <> 0 Then $iLastPage &= " -l " & $iLastPage
If $bLayout = True Then $bLayout &= " -layout"

I haven't tested this so I have no idea if it will work. It was just a thought. 

Edited by RickB75


@RickB75: you said you didn't get anything back, but can you say more? How many files are created, and how many pages do you have in your PDF?

This code works well for me:

Local $sPDF = "D:\Mes Documents\myFile.pdf"

$ipageCt = _XFDF_Info($sPDF,"Pages")
MsgBox(0,"Total pages",$ipageCt)

For $ipg = 1 To $ipageCt
    _XPDF_ToText($sPDF, "d:\tmp\myFile_page_" & $ipg & ".txt", $ipg, $ipg, True)
Next

 


I just noticed an error on my part, @jguinch. You're right: your edited ShellExecuteWait does work. When I copied the function over and placed the new one at the bottom of my main script, I didn't point the $sXPDFToText variable to the correct directory for the application. I changed it to point to the correct directory and it runs through fine. My apologies for not paying closer attention.
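
For anyone who hits the same thing: the function reports this case through @error, so checking it after the call would likely show the problem right away. A rough sketch, assuming the _XPDF_ToText function from this thread is defined in the same script (the file paths are hypothetical):

```autoit
; Requires the _XPDF_ToText function posted earlier in this thread in the same script.
; The file paths are hypothetical.
Local $iResult = _XPDF_ToText("C:\samplePDF\example.pdf", "C:\processedsheets\example_page_1.txt", 1, 1, True)
Local $iError = @error ; save it before another call resets it

If $iResult = 0 Then
    Switch $iError
        Case 1
            ConsoleWrite("PDF file not found" & @CRLF)
        Case 2
            ConsoleWrite("pdftotext.exe not found at the path in $sXPDFToText" & @CRLF)
        Case Else
            ConsoleWrite("pdftotext.exe ran but did not return 0" & @CRLF)
    EndSwitch
EndIf
```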
