picea892 Posted December 9, 2009 (edited) I am hopeful that one of the innovative people on this forum has developed a way to slay the Adobe PDF beast. What a piece of junk PDFs are: large, cumbersome and seemingly impossible to program using AutoIt. Say I have a large PDF document with line numbers on the left and page numbers at the bottom center (which of course don't match the PDF page numbers). Is there any way to collect this information based on what is highlighted in the document? For example: actual page 201 (PDF page 220), lines 5 to 30. I suppose there is no easy route. I mean, PDFs are just large pictures, right? So I'd technically have to analyze a picture in order to get the information. I'm guessing this post will be nothing more than my rant, but I have to ask the question because it frustrates me to no end. If anyone has any solutions, I am sure the children will sing songs of your glory until the end of time. Picea
jvanegmond Posted December 9, 2009 PDFs are definitely not large pictures. They have a very complicated document structure describing their content. It is possible to get the text from a PDF, provided that the PDF is not made up of a bunch of pictures of text, which often happens with scanned documents. How to do it in AutoIt, I have no idea, though. I have used the non-commercial Ghostscript with C# in the past.
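For anyone who wants to try the Ghostscript route, here is a minimal sketch in Python (not the C# setup mentioned above, and not necessarily how it was done there). It assumes Ghostscript is installed and that its `txtwrite` output device is available in your build. The function names, the file names and the `gs` executable name are placeholders; on Windows the console executable is usually `gswin32c` or `gswin64c`. It will only produce useful text for PDFs that actually contain text, not scanned images.

```python
import subprocess
from typing import List, Optional


def build_gs_text_command(pdf_path: str, out_path: str,
                          first_page: Optional[int] = None,
                          last_page: Optional[int] = None,
                          gs_exe: str = "gs") -> List[str]:
    """Build a Ghostscript command line that writes a PDF's text to a file.

    gs_exe is an assumption: adjust it to the executable name on your system.
    """
    cmd = [
        gs_exe,
        "-q",                      # suppress startup messages
        "-dNOPAUSE",               # do not pause between pages
        "-dBATCH",                 # exit when processing is finished
        "-sDEVICE=txtwrite",       # text extraction device
        f"-sOutputFile={out_path}",
    ]
    # Page numbers here are PDF page numbers, not the numbers printed on the page.
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    cmd.append(pdf_path)
    return cmd


def extract_text(pdf_path: str, out_path: str,
                 first_page: Optional[int] = None,
                 last_page: Optional[int] = None,
                 gs_exe: str = "gs") -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(
        build_gs_text_command(pdf_path, out_path, first_page, last_page, gs_exe),
        check=True,
    )
```

Calling `extract_text("transcript.pdf", "page220.txt", first_page=220, last_page=220)` would, under those assumptions, write only PDF page 220 to a text file that a script can then read.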
picea892 Posted December 9, 2009 Author Thanks for the link, I'll look into it. After I cooled down a bit, I realized that you are right. When you copy and paste text out of a PDF, it copies the line numbers with it, and there is a line break at the end of each line of text. So I'm wondering if I could search for a hard return and find the number immediately following it in order to get the line number.
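The hard-return idea can be sketched roughly like this in Python. This is only an illustration under assumptions: it supposes the copied text really does start each line with its line number followed by a space, which depends on how the PDF was produced. The function names and the sample text are made up for the example.

```python
import re
from typing import List, Optional, Tuple

# A line number at the start of a line, followed by the rest of the line.
# Assumption: the copied text puts the number first, as in "5 some text".
LINE_START = re.compile(r"^\s*(\d+)\b\s*(.*)$")


def split_numbered_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, line_text) for every line that starts with a number."""
    result = []
    for raw in text.splitlines():  # handles both \n and \r\n hard returns
        match = LINE_START.match(raw)
        if match:
            result.append((int(match.group(1)), match.group(2).strip()))
    return result


def line_range(highlighted: str) -> Optional[Tuple[int, int]]:
    """First and last line number found in the copied selection, or None."""
    numbers = [number for number, _ in split_numbered_lines(highlighted)]
    if not numbers:
        return None
    return numbers[0], numbers[-1]


if __name__ == "__main__":
    sample = "5 first highlighted line\r\n6 second line\r\n7 third line"
    print(line_range(sample))
```

Lines that do not begin with a number (wrapped text, headers) are simply skipped, so a sentence in the transcript that happens to start with a number would be misread as a line number. Treat the result as a starting point to check against the document.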
kaotkbliss Posted December 9, 2009 You could try "Simpo PDF to txt"; then AutoIt can read the txt file. That is what I use with one of my scripts.
martin Posted December 9, 2009 (edited) Quoting kaotkbliss: "you could try "Simpo PDF to txt" then autoit can read the txt. That is what I use with one of my scripts." If you just want the text, then all you have to do is send Ctrl+A, then Ctrl+C, and save the clipboard to some file, isn't it?