IE.au3 performance issues


Hey guys,

I've been using IE.au3 for just about everything, but for my most recent project I'm having a bit of trouble.

My script needs to read the text from a massive (i.e. 40-50 thousand lines of text) webpage.

It either reads the entire page in less than 5 seconds, or locks the entire machine up. (It sometimes makes it out of the crash after a few minutes... sometimes not)

All I'm using is the basic _IEBodyReadText($oIE)... nothing fancy. Is there something I can do to have it "play" with big webpages "nicer"?

Thanks! :)

Brickoneer


I can't offer any explanation for this from within IE.au3. Two things come to mind as likely sources of your trouble. First, are you certain that _IEBodyReadText is the source of the slowdown? Ensure that AutoIt is not instead waiting for the page to finish loading before it executes _IEBodyReadText (SciTE debug mode is one way to check this, or sprinkle in some ConsoleWrite commands). If that is not the issue, the next most likely cause is a process or system performance bottleneck not directly related to IE.au3. Use the System Monitor to figure out what is consuming CPU or memory during this time. If you have multiple IE windows open, that could also contribute.

Especially if it works quickly sometimes and not others, it is unlikely that IE.au3 is the source of the trouble directly.
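
One way to separate the two costs described above is to time the page load and the text read independently. This is only a minimal sketch: the URL is a placeholder that does not come from this thread, and it assumes the page is opened with _IECreate with its built-in wait turned off.

```autoit
#include <IE.au3>

; Placeholder URL (an assumption); substitute the page being read.
Local $sUrl = "http://example.com/big-page.html"

; Open the page with $iWait = 0 so the load can be timed on its own.
Local $oIE = _IECreate($sUrl, 0, 1, 0)

; Time how long the page takes to finish loading.
Local $hTimer = TimerInit()
_IELoadWait($oIE)
ConsoleWrite("Page load: " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)

; Time the text read by itself.
$hTimer = TimerInit()
Local $sText = _IEBodyReadText($oIE)
ConsoleWrite("_IEBodyReadText: " & Round(TimerDiff($hTimer)) & " ms, " & StringLen($sText) & " chars" & @CRLF)
```

If the first number dominates, the wait is in the page load and not in the read.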

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Ok, here's what I've found.

The part that is the actual slowdown is _IEBodyReadText... particularly the line:

Return $o_object.document.body.innerText

(of course that is basically the entire function right there... but that is the slowdown.)

By monitoring the machine, as soon as it goes to the _IEBodyReadText function the IE process hits a constant 99% CPU load, and memory for the IE browser hits 60,000 k.

Now it is semi-reliably reading the text in about 8.5 seconds.

If I can get it to "break" again I guess I'll be back.

Brick

[edit] It failed again. I finished typing out this post, clicked over to the IE window my script was trying to read, and it was frozen, not responding, and locked at 99% CPU. I gave it a few minutes to see if it would recover and it never did. I'm really not sure what the problem is... maybe I should try to do it at a lower level, like with the HTTP UDF. (I even failed miserably at that... I managed to get the first 2-7 bytes of the webpage HTML and that's it.)

[edit 2] I was wrong. It did manage to pull out. The _IEBodyReadText function took just over 10 minutes to read about 30-40,000 lines of text.

Edited by Brickoneer

Just because you found an instruction that takes a long time to complete does not mean that you have identified root cause. As I suggested, there is likely some other performance bottleneck that you've hit that is exacerbated by this operation. In particular, look for things that may be consuming a lot of memory -- it may also be that the amount of text you are reading is causing you to deplete the memory on your system.

You can also try starting IE without any Add-ons (Start, Programs, Accessories, System Tools) and see if it is possibly a destructive interaction with something you have added to IE.

If the text you are reading is really huge, there are more efficient ways of accessing it. With _IEBodyReadText you must first get the page loaded into IE and then make a copy of its text in AutoIt, so you consume double the resources. Something like InetGet or TCP communication would only require you to hold it in memory once.
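
As a rough sketch of the direct-download route, the page can be fetched without a browser at all. This assumes an AutoIt version that provides InetRead (a sibling of InetGet that returns the data in memory instead of writing a file), and the URL is a placeholder, not one from this thread. Note that the result is the raw HTML source, not the rendered text that innerText returns, so any tags would still need to be stripped.

```autoit
#include <InetConstants.au3>

; Placeholder URL (an assumption); substitute the page being read.
Local $sUrl = "http://example.com/big-page.html"

; Download the raw bytes directly, bypassing the browser and the local cache.
Local $dData = InetRead($sUrl, $INET_FORCERELOAD)
If @error Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit 1
EndIf

; Convert the binary data to a string (ANSI by default; pass 4 as the second argument for UTF-8).
Local $sHtml = BinaryToString($dData)
ConsoleWrite("Received " & StringLen($sHtml) & " characters of HTML" & @CRLF)
```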

Dale
