Get web page contents -NOT source code- without using the clipboard


Hello everyone, the title is pretty much what I want to do but I'll elaborate below for details.

I'm currently using a script to ease my workload. It uses StringRegExp to extract certain bits of info from a text file that I prepare, then builds a neat record from whatever it found so I can easily check it later. The only manual labor that I'm forced to do is open a web page in my browser, press Ctrl+A and Ctrl+C, paste it into Notepad, replace every double tab character with a semicolon and save the text file. Unfortunately, sometimes things get too hectic and I don't get the chance to sit down and sort that out, and while I'm busy somewhere else those precious bits of info just go to waste.

That's why I've decided to make the script completely autonomous, but after about two hours of searching through the forums, I've yet to find a definite solution for two reasons:

- I can't rely on _INetGetSource because it would be more than a nightmare to figure out a reliable regular expression to get my info out of the page's source code; it would be impossible, considering how complex and dynamic it is, whereas the displayed text on the browser is perfectly manageable.

- The PC is always in use, even when I'm not around and especially at hectic times, so automatically opening the browser and driving the mouse and keyboard to copy the displayed text into the clipboard would be a very unreliable approach (I tried it once, with BlockInput and everything; never again).

If you could please help me or point me to the right thread where they can help me with my problem, I'd be forever in your debt.

Thanks in advance!

Edited by Cybergon

You need an HTML scraper:

write it yourself or use someone else's.

There are lots of ways to scrape text from HTML.

"it would be more than a nightmare to figure out a reliable regular expression to get my info out of the page's source code; it would be impossible, considering how complex and dynamic it is, whereas the displayed text on the browser is perfectly manageable."

WHO wrote the browser? Don't try to lie to us, it's possible!

Edited by TormentoRobots

#Include-once;TormentoRobots


Thanks TR, I'll be taking the web scraper approach then. I didn't know about that. Wish me luck!

"WHO wrote the browser? Don't try to lie to us, it's possible!"

If it were a simple page there'd be no problem, but in this one there's a lot of JavaScript magic going on, and the info is scattered across div tags that hide some content and show other content. It's pretty random, and the tags use a lot of variables, which makes things worse. I wish I could explain it better, but it is frankly beyond me, sorry.


"The only manual labor that I'm forced to do is open a web page in my browser, press Ctrl+A and Ctrl+C, paste it into Notepad, replace every double tab character with a semicolon and save the text file."

If you do not want to go the screen-scraper route, the sequence you describe can be automated.

There is also a rich complement of _IE functions for interacting with web pages.

What is the URL of the page?

kylomas


"If you do not want to go the screen-scraper route, the sequence you describe can be automated."

"There is also a rich complement of _IE functions for interacting with web pages."

I've looked at all of the _IE functions thoroughly, but none seem to be of any use in my particular situation. Is there one you'd recommend?

"What is the URL of the page?"

The particular one I'm interested in is only accessible from certain authorized places (as far as I understand) so there's no point, sorry.


"Without the URL or an example of the HTML I would only be guessing."

I'm telling you to forget about the source code. Trust me, there's no way to get what I want without opening an IE window, and I already gave you the reasons why that is impractical. Unless there's a way to hide it and do everything behind the scenes with _IEAction or something?


_IEBodyReadText() appears to be what you are looking for. Read about it in the help file.

If this needs to be done without disrupting a user on the PC, set the $f_visible parameter of _IECreate to 0 so the browser window is not displayed (and ensure you call _IEQuit when you are done).

Dale
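
A minimal sketch of that suggestion might look like the following. The URL is only a placeholder and the helper name _GetPageText is made up for illustration; _IECreate, _IEBodyReadText and _IEQuit are the IE.au3 functions mentioned above. It assumes the page has finished rendering by the time _IECreate returns; text that JavaScript fills in later may need an extra wait.

```autoit
#include <IE.au3>

; Placeholder URL (assumption): replace it with the page you need.
Local $sPageText = _GetPageText("http://www.autoitscript.com")
Local $iError = @error
If $iError Then
    ConsoleWrite("Could not read the page, @error = " & $iError & @LF)
Else
    ConsoleWrite($sPageText & @LF)
EndIf

; Hypothetical helper: returns the displayed text of a page using a hidden IE window.
Func _GetPageText($sURL)
    ; Arguments: URL, try attach = 0, visible = 0 (hidden), wait for load = 1, take focus = 0.
    Local $oIE = _IECreate($sURL, 0, 0, 1, 0)
    If @error Then Return SetError(1, 0, "")

    ; Text of the <body> as the browser renders it, not the HTML source.
    Local $sText = _IEBodyReadText($oIE)
    Local $iErr = @error

    ; Always close the hidden window, otherwise it keeps running unseen.
    _IEQuit($oIE)

    If $iErr Then Return SetError(2, 0, "")
    Return SetError(0, 0, $sText)
EndFunc   ;==>_GetPageText
```

Because the window is created hidden and without taking focus, whoever is using the PC should not notice it, and neither the clipboard nor the keyboard is touched.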


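; Print the visible text of a page without opening a browser window.
ConsoleWrite(_INetGetText('http://autoitscript.com') & @LF)

Func _INetGetText($sURL)
    ; Download the raw page source; 19 is a combination of InetRead option flags (see the help file).
    Local $bStr = InetRead($sURL, 19)
    If @error Then
        Return SetError(1, 0, 0)
    EndIf

    ; Create an in-memory HTML document through the "HTMLFILE" COM object.
    Local $oHTML = ObjCreate("HTMLFILE")
    If @error Then Return SetError(2, 0, 0)

    ; Feed the downloaded source to the document so it gets parsed.
    $oHTML.Open()
    $oHTML.Write(BinaryToString($bStr))

    ; $oHTML.... 

    ; Body.InnerText is the text of the page with the markup stripped.
    Return SetError(0, 0, $oHTML.Body.InnerText)
EndFunc ;==>_INetGetText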

Maybe this will help you get started...
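
Once the page text is in a string, the manual Notepad step from the original post (double tabs to semicolons, then save) could be sketched as follows. The sample string, the file name page_text.txt and the helper name _SaveAsRecord are made up for illustration; in practice the input would be whatever _INetGetText or _IEBodyReadText returns.

```autoit
; Sample text standing in for the text read from the page (assumption).
Local $sSample = "Name" & @TAB & @TAB & "Value" & @CRLF & "Foo" & @TAB & @TAB & "42"

If _SaveAsRecord($sSample, @ScriptDir & "\page_text.txt") Then
    ConsoleWrite("Saved." & @LF)
Else
    ConsoleWrite("Could not write the file." & @LF)
EndIf

; Hypothetical helper: replaces every double tab with a semicolon and saves the result.
Func _SaveAsRecord($sText, $sFile)
    ; StringReplace swaps all occurrences by default.
    $sText = StringReplace($sText, @TAB & @TAB, ";")

    ; Mode 2 = open for writing and erase any previous contents.
    Local $hFile = FileOpen($sFile, 2)
    If $hFile = -1 Then Return SetError(1, 0, False)

    FileWrite($hFile, $sText)
    FileClose($hFile)
    Return SetError(0, 0, True)
EndFunc   ;==>_SaveAsRecord
```

The saved file can then be passed to the existing StringRegExp script unchanged, since it has the same shape as the hand-made one.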
