'Scrape' Text from a Dynamically Generated Web Page


I hate to just flat out ask, 'can you write this for me please?', but I'm totally stuck on what must be a fairly simple issue!

I'm trying to extract a small amount of text from a large number of dynamically generated web pages. I've got the actual generation of the pages down perfectly, and I can put the source of each page into a variable using _IEBodyReadHTML. Extracting the data I need is where I get stuck. The pages can contain anything, in any position. However, I know that what I'm looking for follows a standard format: one unique string, followed by the text I'm interested in, surrounded by <b> tags. That unique string will not exist anywhere else on the page. Three examples:

<span id="_ctrl_fruit_selected"><b>Orange</b>
<span id="_ctrl_fruit_selected"><b>Apple</b>
<span id="_ctrl_fruit_selected"><b>Strawberry Mango</b>

The data I want to extract would be 'Orange', 'Apple' and 'Strawberry Mango' respectively. To put it in 'human code', I would write:

-Search file for '_ctrl_fruit_selected'
-Find the bold tags to the right of this
-Grab whatever text is between them

Putting this into computer code, though, is baffling me. I know it must have something to do with StringRegExp, but I'm lost on how to do it!


You can do it without RegExp, but it takes more steps:

If StringInStr($Line, "_ctrl_fruit_selected") Then
    ; Keep everything from the unique marker onward
    $String = StringMid($Line, StringInStr($Line, "_ctrl_fruit_selected"))
    ; Skip past the opening tag ("<b>" is 3 characters long)
    $String = StringMid($String, StringInStr($String, "<b>") + 3)
    $String = StringMid($String, StringInStr($String, "</b>") - 1)
EndIf

StringRegExp is the way to go, but wait for someone who likes RegExp.

George
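
For anyone landing here later, the StringRegExp route mentioned above might look roughly like the sketch below. It is only an illustration: the function name _GetSelectedFruit and the exact pattern are assumptions of mine, not something posted in this thread, and the pattern assumes the <b> tag follows the marker's closing quote with nothing but optional attributes and whitespace in between.

```autoit
; Hypothetical helper (name is an assumption): returns the text between the
; first <b>...</b> pair that follows the unique marker, or "" with @error set.
Func _GetSelectedFruit($sHTML)
    ; (?i) = case-insensitive (so <B> also matches), (?s) = dot matches newlines.
    ; (.*?) is lazy, so the capture stops at the first closing </b>.
    ; Flag 1 asks StringRegExp for an array of the captured groups.
    Local $aMatch = StringRegExp($sHTML, '(?is)_ctrl_fruit_selected"[^>]*>\s*<b>(.*?)</b>', 1)
    If @error Then Return SetError(1, 0, "") ; marker or tags not found
    Return $aMatch[0]
EndFunc

Local $sSample = '<span id="_ctrl_fruit_selected"><b>Strawberry Mango</b>'
ConsoleWrite(_GetSelectedFruit($sSample) & @CRLF)
```

In the original scenario, the string returned by _IEBodyReadHTML would be passed in place of $sSample.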

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

GEOSoft, thank you! I hadn't thought of StringMid. I didn't quite do it like your suggestion, but I got it to work. Now to work on the next step!


Glad it's working, but I just looked and realized that the last StringMid should have been a StringLeft.

George
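
With that correction applied, the non-RegExp approach could be written out as the sketch below. The wrapper function _ExtractAfterMarker and its error codes are hypothetical additions for illustration; the original post worked directly on $Line and $String.

```autoit
; Hypothetical wrapper (name and error codes are assumptions): returns the
; text between <b> and </b> after the marker, or "" with @error set.
Func _ExtractAfterMarker($sLine, $sMarker = "_ctrl_fruit_selected")
    Local $iPos = StringInStr($sLine, $sMarker)
    If $iPos = 0 Then Return SetError(1, 0, "") ; marker not present

    ; Keep everything from the marker onward
    Local $sRest = StringMid($sLine, $iPos)

    ; StringInStr is case-insensitive by default, so <B> is found too
    $iPos = StringInStr($sRest, "<b>")
    If $iPos = 0 Then Return SetError(2, 0, "")
    $sRest = StringMid($sRest, $iPos + 3) ; "<b>" is 3 characters long

    $iPos = StringInStr($sRest, "</b>")
    If $iPos = 0 Then Return SetError(3, 0, "")

    ; StringLeft, not StringMid: keep only what precedes the closing tag
    Return StringLeft($sRest, $iPos - 1)
EndFunc

ConsoleWrite(_ExtractAfterMarker('<span id="_ctrl_fruit_selected"><b>Orange</b>') & @CRLF)
```

The explicit zero checks matter because StringInStr returns 0 when nothing is found, and the offsets would otherwise silently produce the wrong substring.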

