Really? Does it keep table delimitation (cells) intact?

I tried out a freeware one, and although tables still look like tables, the columns are just padded out with a bunch of space characters. I suppose you could delimit on runs of multiple spaces (of course, the forum editor removes them all, ha):

Property Name Property Type Access Type Description
Hostname string GET/PUT Retrieves and sets the name of a server, where Hostname is
the server’s hostname or IP address. If Hostname is not given
or undefined, the authentication is performed on the local
Port integer GET/PUT Retrieves and sets the TCP port to use when connecting to the
server. Its default value is 0 (zero), indicating the default port
number should be used. Otherwise, enter the correct port
number.
A port number set to a negative value is treated as an incorrect
value and the default port number is used instead.
Note: The default port number for ESX Server 3.x is 443; the
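
A minimal sketch of the "split on multiple spaces" idea, assuming the converter pads columns with two or more spaces and that no cell contains a double space itself. The function name _SplitPaddedRow and the sample line are made up for illustration:

```autoit
; Sketch only: split one line of space-padded text into cells.
; Assumption: columns are separated by runs of 2+ spaces.
Func _SplitPaddedRow($sLine)
    ; Trim both ends (flag 3), then turn each run of 2+ spaces into one tab.
    Local $sTabbed = StringRegExpReplace(StringStripWS($sLine, 3), " {2,}", @TAB)
    ; Flag 2 = return a 0-based array without the count element.
    Return StringSplit($sTabbed, @TAB, 2)
EndFunc

Local $aCells = _SplitPaddedRow("Port      integer      GET/PUT      Retrieves and sets the TCP port")
For $i = 0 To UBound($aCells) - 1
    ConsoleWrite($i & ": " & $aCells[$i] & @CRLF)
Next
```

Wrapped description lines (the ones that do not start with a property name) would still have to be appended to the previous row by hand.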

Edited by jdelaney
IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

The input PDF remains unchanged. The tool extracts the text from the PDF and writes it to an output file. You can then easily process the output file, or many of them.
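
A rough sketch of that workflow, assuming a command-line extractor such as Xpdf's pdftotext.exe (the tool meant here is not named in this part of the thread, so treat that as a guess). The function name _ExtractPdfText and all paths are placeholders:

```autoit
; Sketch only: run a command-line extractor, then read its output file.
; Assumption: the tool accepts "<tool> -layout <input.pdf> <output.txt>",
; as pdftotext does; -layout asks it to keep the physical column layout.
Func _ExtractPdfText($sTool, $sPdf, $sTxt)
    Local $sCmd = '"' & $sTool & '" -layout "' & $sPdf & '" "' & $sTxt & '"'
    Local $iExit = RunWait($sCmd, "", @SW_HIDE)
    ; RunWait sets @error if the program could not be started.
    If @error Or $iExit <> 0 Then Return SetError(1, $iExit, "")
    Local $sText = FileRead($sTxt)
    If @error Then Return SetError(2, 0, "")
    Return $sText
EndFunc

Local $sText = _ExtractPdfText("C:\path\to\pdftotext.exe", "C:\path\to\pdfs\pdfname.pdf", @TempDir & "\pdfname.txt")
If @error Then
    ConsoleWrite("Extraction failed, @error = " & @error & @CRLF)
Else
    ConsoleWrite($sText & @CRLF)
EndIf
```

The PDF itself is only read, so the same call can be looped over a whole folder of files.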

That's the plan anyway. But I don't see any alternative to click & drag for acquiring my data, unless you could suggest anything else that might be easier?


Don't know. Give it a try and see how a table is converted to text.

Or use one of the online PDF converters to create a Word document. Then process the Word document using the Word UDF or my WordEX UDF.
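
A minimal sketch of the Word route using the standard Word UDF plus the Word COM object model. It assumes Word is installed and that the converted document has a simple table without merged cells; the function name _DumpFirstTable and the path in the usage line are placeholders:

```autoit
#include <Word.au3>

; Sketch only: write every cell of the first table to the console.
; Returns the number of cells read; sets @error on failure.
Func _DumpFirstTable($sDocPath)
    Local $oWord = _Word_Create(False) ; False = do not show the Word window
    If @error Then Return SetError(1, 0, 0)
    Local $oDoc = _Word_DocOpen($oWord, $sDocPath, Default, Default, True) ; True = read-only
    If @error Then
        _Word_Quit($oWord)
        Return SetError(2, 0, 0)
    EndIf
    Local $iCells = 0
    If $oDoc.Tables.Count > 0 Then
        Local $oTable = $oDoc.Tables(1)
        For $iRow = 1 To $oTable.Rows.Count
            For $iCol = 1 To $oTable.Columns.Count
                ; Word ends each cell's text with CR + Chr(7); strip that marker.
                Local $sCell = StringRegExpReplace($oTable.Cell($iRow, $iCol).Range.Text, "[\r\x07]+$", "")
                ConsoleWrite("[" & $iRow & "," & $iCol & "] " & $sCell & @CRLF)
                $iCells += 1
            Next
        Next
    EndIf
    _Word_DocClose($oDoc)
    _Word_Quit($oWord)
    Return $iCells
EndFunc
```

Usage would be something like _DumpFirstTable("C:\path\to\converted.docx"). Tables with merged cells would need extra handling, since Cell() fails for positions that do not exist.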

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

I wish I could, but I'm not allowed to :(


Well, if anyone is looking at my code, could someone have a look at, say, the first function?

When I copy my values, I send them to the clipboard and store that data in a variable called $row1.

After that I need to clear the clipboard, but when I put in ClipPut(""), it no longer recognises the Ctrl+C in the next function.
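
One possible cause (a guess, since the code in question is not shown here) is that the clipboard is read before the application has finished filling it. A minimal sketch that clears the clipboard first and then polls it with a timeout; the function name _CopySelection is made up, and the target window is assumed to be active with something selected:

```autoit
; Sketch only: copy the current selection and wait for the clipboard to fill.
Func _CopySelection($iTimeoutMs = 3000)
    ClipPut("") ; clear first, so stale data is not mistaken for the new copy
    Send("^c")
    Local $hTimer = TimerInit()
    While TimerDiff($hTimer) < $iTimeoutMs
        Local $sData = ClipGet() ; returns "" while the clipboard is still empty
        If $sData <> "" Then Return $sData
        Sleep(50) ; give the application time to respond to Ctrl+C
    WEnd
    Return SetError(1, 0, "") ; nothing arrived within the timeout
EndFunc
```

With this, each row could be fetched as $row1 = _CopySelection(), checking @error before moving on to the next selection.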


Not allowed to what, access the internet?

If that's not the case, I just found this site, and you can automate it, since it doesn't require you to type in the mangled words (human verification)

http://www.pdfonline.com/pdf-to-word-converter/

It's actually a really good one... going to pass it along to my team. Much easier to loop through table objects.

Edit: you are wasting your time with the copy buffer.

Edited by jdelaney

@jdelaney if he's got sensitive data in the PDFs, an online converter is a big no-no. I'm working on something for him that's a little more... conventional... to say the least.

Spoiler

“Hello, ladies, look at your man, now back to me, now back at your man, now back to me. Sadly, he isn’t me, but if he stopped using ladies scented body wash and switched to Old Spice, he could smell like he’s me. Look down, back up, where are you? You’re on a boat with the man your man could smell like. What’s in your hand, back at me. I have it, it’s an oyster with two tickets to that thing you love. Look again, the tickets are now diamonds. Anything is possible when your man smells like Old Spice and not a lady. I’m on a horse.”

 


Well... try this on for size... Make sure that SumatraPDF is set as your default PDF reader first.

And replace $sReader, $dirPDF, and $sPDF with the correct values, if you would.

Tested on WinXP

_PDFSearch()

Func _PDFSearch()
    Local $sReader, $dirPDF, $sPDF, $pidReader, $hReader, $hTimer, $x, $sStringData

    ; Adjust these three values for your system.
    $sReader = "C:\Program Files\SumatraPDF\SumatraPDF.exe"
    $dirPDF = "C:\path\to\pdfs\"
    $sPDF = "pdfname.pdf"

    ; Launch the reader with the PDF as a parameter (both paths quoted, since they may contain spaces).
    $pidReader = Run('"' & $sReader & '" "' & $dirPDF & $sPDF & '"')
    If Not $pidReader Then
        MsgBox(16 + 262144, @AutoItExe, "Failed to open PDF file.")
        Exit
    EndIf

    ; Wait for the reader window and bring it to the front.
    WinWait($sPDF & " - SumatraPDF")
    $hReader = WinGetHandle($sPDF & " - SumatraPDF")
    WinActivate($hReader)
    If Not WinWaitActive($hReader, "", 5) Then
        MsgBox(16 + 262144, @AutoItExe, "Timed out waiting for application window to gain focus.")
        Exit
    EndIf
    ControlFocus($hReader, "", "[CLASS:SUMATRA_PDF_CANVAS; INSTANCE:1]")
    Sleep(500)

    ; Empty the clipboard, then retry select-all + copy until text arrives or 5 seconds pass.
    ClipPut("")
    $hTimer = TimerInit()
    $x = 500
    While Not ClipGet()
        If TimerDiff($hTimer) > 5000 Then
            MsgBox(16 + 262144, @AutoItExe, "Timed out attempting to get text from document.")
            Exit
        EndIf
        ControlSend($hReader, "", "[CLASS:SUMATRA_PDF_CANVAS; INSTANCE:1]", "^a")
        Sleep($x)
        ControlSend($hReader, "", "[CLASS:SUMATRA_PDF_CANVAS; INSTANCE:1]", "^c")
        Sleep($x)
        $x += 500 ; wait a little longer on each retry
    WEnd
    $sStringData = ClipGet()

    ; Close the reader; kill the process if the window refuses to close.
    If Not WinClose($hReader) Then ProcessClose($pidReader)
    ConsoleWrite($sStringData & @CRLF)
    MsgBox(0, "", "Check your console output. Now you have everything stored in $sStringData for you to manipulate")
EndFunc

EDIT: I just pray you're not dealing with a PDF with multiple columns XD

EDIT2: Replaced shellexecute() of file to Run() of application with the file as a parameter

Edited by Mechaflash
Thanks for your help, mate. I still can't get it to do what I want without click & drag.


So did $sStringData not output the text?


At what point is it failing? I put in quite a few error checks...


You're going to have to post the code you used. I tested it here and it worked very well. If you never ran into any of my error boxes, then I suspect it may be with the way you've re-written it.
