picea892

PDF Line and page numbers

5 posts in this topic

#1 ·  Posted (edited)

I am hoping that one of the innovative people on this forum has developed a way to slay the Adobe PDF beast. What a piece of junk PDFs are: large, cumbersome, and seemingly impossible to automate with AutoIt.

Say I have a large PDF document with line numbers on the left and page numbers at the bottom center (which of course don't match the PDF page numbers). Is there any way to collect this information based on what is highlighted in the document?

So, for example: actual page 201 (PDF page 220), lines 5 to 30.

I suppose there is no easy route. I mean, PDFs are just large pictures, right? So I'd technically have to analyze a picture in order to get the information. I'm guessing this post will be nothing more than my rant, but I have to ask the question because it frustrates me to no end.

If anyone has any solutions, I am sure the children will sing songs of your glory until the end of time.

Picea

Edited by picea892



PDFs are definitely not large pictures. They have a very complicated document structure describing their content. It is possible to get the text from a PDF, provided that the PDF is not made up of a bunch of pictures of text, which often happens with scanned documents. ;)

In AutoIt, I have no idea though. I have used the non-commercial Ghostscript with C# in the past.
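
As a rough illustration of the Ghostscript route mentioned above, here is a minimal Python sketch that asks Ghostscript's `txtwrite` device to dump a PDF's text to a file. The function names `build_gs_command` and `extract_text` and the file names are made up for this example, and the executable name is an assumption (`gs` on Linux/macOS; Windows installs usually ship `gswin64c`). It will not recover anything from scanned, image-only pages.

```python
import subprocess


def build_gs_command(pdf_path, txt_path, first_page=None, last_page=None, gs_exe="gs"):
    """Build a Ghostscript command line that writes the PDF's text to txt_path."""
    # txtwrite is Ghostscript's text-extraction output device.
    command = [gs_exe, "-q", "-dNOPAUSE", "-dBATCH", "-sDEVICE=txtwrite"]
    # Optional page range, counted in PDF pages (not the printed page numbers).
    if first_page is not None:
        command.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        command.append(f"-dLastPage={last_page}")
    command.append(f"-sOutputFile={txt_path}")
    command.append(pdf_path)
    return command


def extract_text(pdf_path, txt_path, **options):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(pdf_path, txt_path, **options), check=True)


if __name__ == "__main__":
    # Hypothetical file names; PDF page 220 is the page from the example in post #1.
    extract_text("document.pdf", "page220.txt", first_page=220, last_page=220)
```

The resulting text file can then be read by any script, AutoIt included.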


Thanks for the link, I'll look into it.

After I cooled down a bit, I realized that you are right. When you copy and paste text out of a PDF, the line numbers come along with it, and there are line breaks at the end of each line of text.

So I'm wondering whether I could search for each hard return and read the number right next to it in order to get the line number.
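
A minimal sketch of that idea in Python, under two loudly stated assumptions that are not confirmed by the thread: each copied line starts with its line number followed by the text, and the printed page number sits alone on the last non-empty line of a page. The helper names `line_range` and `printed_page_number` are invented for this example.

```python
import re

# Assumption: a copied line looks like "<line number> <text>".
LINE_NUMBER = re.compile(r"^\s*(\d+)\s+\S")


def line_range(copied_text):
    """Return (first, last) line numbers found at the start of the copied lines, or None."""
    numbers = []
    for line in copied_text.splitlines():  # split on the hard returns
        match = LINE_NUMBER.match(line)
        if match:
            numbers.append(int(match.group(1)))
    if not numbers:
        return None
    return numbers[0], numbers[-1]


def printed_page_number(page_text):
    """Return the printed page number if the last non-empty line is all digits, else None."""
    for line in reversed(page_text.splitlines()):
        stripped = line.strip()
        if stripped:
            return int(stripped) if stripped.isdigit() else None
    return None


if __name__ == "__main__":
    highlighted = "5 first highlighted line\n6 second line\n7 third line\n"
    print(line_range(highlighted))
```

A line whose own text happens to begin with a number would fool this pattern, so the result is worth sanity-checking (for instance, that the numbers increase by one from line to line).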


You could try "Simpo PDF to txt"; then AutoIt can read the txt file. That is what I use with one of my scripts.



#5 ·  Posted (edited)

you could try "Simpo PDF to txt" then autoit can read the txt. That is what I use with one of my scripts.

If you just want the text, then all you have to do is send Ctrl+A, then Ctrl+C, and save the clipboard to a file, isn't it?

Edited by martin


