
Is There a Way to Extract Specific Elements From an HTML Web Page?


leegold

Hi,

I'm using FF.au3. I want to extract from a web page all HTML elements of the general form

<p class="row"...>...</p>

There are many elements like this on the page, and I want to get them all.

I was thinking of using _FFXPath, but the page is not well-formed XHTML/XML. I also see _FFReadHTML, but AFAIK it only copies all children of the html or body tags(?)

I could use _FFReadHTML and then use a regex to get all the elements I want. But is there an easier way?

Thanks,

Lee G.
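
For reference, the regex route mentioned in the question could look something like the sketch below. It assumes the page source is already in a string (the $sHTML sample here is made up and only stands in for whatever the HTML read returns) and that class="row" comes first in the tag, as in the form quoted above. A regex is not a real HTML parser, so nested or unusual markup may trip it up.

```autoit
; Made-up sample standing in for the page source (assumption, not real page data).
Local $sHTML = '<div><p class="row" id="a">First</p>' & @CRLF & _
        '<p class="other">Skip me</p>' & @CRLF & _
        '<p class="row">Second' & @CRLF & 'line</p></div>'

; (?i) = case-insensitive, (?s) = dot also matches line breaks.
; Flag 3 asks StringRegExp for an array of all matches in the string.
Local $aRows = StringRegExp($sHTML, '(?is)<p class="row"[^>]*>.*?</p>', 3)
If @error Then
    ConsoleWrite("No matching elements found." & @CRLF)
Else
    For $i = 0 To UBound($aRows) - 1
        ConsoleWrite("[" & $i & "] " & $aRows[$i] & @CRLF)
    Next
EndIf
```

The lazy .*? stops at the first </p> after each opening tag, so each array entry holds one complete element.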


I have always used _FFReadHTML and then StringInStr along with StringMid.

That lets you pull everything between <p class="row"...> and </p> and store it in an array.

The step that takes the longest is the _FFReadHTML call.
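
A minimal sketch of that StringInStr plus StringMid loop might look like this. The sample string and the variable names are made up for illustration; in a real script $sHTML would hold the page source that was read from the browser.

```autoit
; Made-up sample standing in for the page source (assumption).
Local $sHTML = '<p class="row">One</p><p class="x">No</p><p class="row" id="b">Two</p>'
Local $sOpen = '<p class="row"', $sClose = '</p>'

Local $aFound[1], $iCount = 0
Local $iPos = 1, $iStart, $iEnd

While 1
    ; Find the next opening tag at or after $iPos (case-insensitive, first occurrence).
    $iStart = StringInStr($sHTML, $sOpen, 0, 1, $iPos)
    If $iStart = 0 Then ExitLoop
    ; Find the closing tag that follows it.
    $iEnd = StringInStr($sHTML, $sClose, 0, 1, $iStart)
    If $iEnd = 0 Then ExitLoop
    $iEnd += StringLen($sClose) ; first position after </p>

    ; Grow the array by one slot and store the whole element.
    ReDim $aFound[$iCount + 1]
    $aFound[$iCount] = StringMid($sHTML, $iStart, $iEnd - $iStart)
    $iCount += 1
    $iPos = $iEnd ; continue searching after this element
WEnd

For $i = 0 To $iCount - 1
    ConsoleWrite($aFound[$i] & @CRLF)
Next
```

ReDim is used instead of _ArrayAdd here because _ArrayAdd splits its input on "|" by default, which could break up HTML that happens to contain that character.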

 

I would also suggest writing the HTML to a text file for debugging. Once you get the script working 100%, cut out that part of the code.
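
One way to do that debug dump is sketched below. The $bDebug switch and the file name are hypothetical choices, and the sample string again stands in for the real page source.

```autoit
; Set to False (or delete this block) once the script works.
Local $bDebug = True
; Made-up sample standing in for the page source (assumption).
Local $sHTML = '<html><body><p class="row">One</p></body></html>'

If $bDebug Then
    ; Hypothetical file name; mode 2 = open for writing and erase old contents.
    Local $sDumpFile = @TempDir & "\page_dump.html"
    Local $hFile = FileOpen($sDumpFile, 2)
    If $hFile = -1 Then
        ConsoleWrite("Could not open " & $sDumpFile & @CRLF)
    Else
        FileWrite($hFile, $sHTML)
        FileClose($hFile)
        ConsoleWrite("Wrote page source to " & $sDumpFile & @CRLF)
    EndIf
EndIf
```

Opening the dumped file in an editor makes it easy to check exactly what markup the string functions are searching through.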


I can't speak to any third-party UDFs you may be using, but when I need to extract elements from web pages, I break the page down gradually into chunks using _StringBetween() until I reach the information I need. An example of how I do this is here: [embedded link not preserved]

I don't know if that is of any use to you but I thought I would share that just in case.
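
As a rough illustration of that chunk-by-chunk idea (not the linked example, which is not available here), the sketch below first narrows the page down to a container and then pulls the rows out of it. The sample markup and the div id="list" container are made up.

```autoit
#include <String.au3>

; Made-up sample standing in for the page source (assumption).
Local $sHTML = '<div id="list"><p class="row">One</p><p class="row">Two</p></div>' & _
        '<p class="row">Outside the list</p>'

; Step 1: cut the page down to the chunk that holds the rows.
Local $aList = _StringBetween($sHTML, '<div id="list">', '</div>')
If @error Then
    ConsoleWrite("Container not found." & @CRLF)
    Exit
EndIf

; Step 2: pull every row out of that chunk only.
Local $aRows = _StringBetween($aList[0], '<p class="row">', '</p>')
If @error Then
    ConsoleWrite("No rows found." & @CRLF)
    Exit
EndIf

For $i = 0 To UBound($aRows) - 1
    ConsoleWrite($aRows[$i] & @CRLF)
Next
```

Note that _StringBetween returns only the text between the two markers, so the opening string has to match the tag exactly. If the tags carry extra attributes, a shorter start marker followed by some trimming would be needed.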

Edited by Morthawt
