Aeterna

Parsing HTML with StringRegExp?

Aeterna

I have no idea what I'm doing with StringRegExp yet. I've looked at the help files, but the pieces aren't fitting together for me.

Could anybody give me an example of how to get an array of all matches for the string "><b>TEXT I WANT TO GET</b></a>"?

This was my guess, but it's wrong.

$aBoldItems = StringRegExp($oBody, "(?i><b>\w</b></a>)", 3)

Authenticity

Could you give an HTML example of the page, if you don't mind? ;]

Authenticity

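If you're using the IE.au3 library, you just need to call _IELinkGetCollection() and walk through the links, checking each one with If StringInStr(_IEPropertyGet($oLink, 'outerhtml'), '<b>') Then .... Note that it has to be 'outerhtml', not 'outertext': the text properties come back with the tags stripped, so '<b>' would never be found in them.

A snippet:

#include <IE.au3>
#include <String.au3>

Local $oLinks = _IELinkGetCollection($oIE) ; assumes $oIE is an existing browser object, e.g. from _IECreate()
Local $avNames[1] = [0] ; element 0 holds the number of names stored so far
For $oLink In $oLinks
    Local $sOuterHtml = _IEPropertyGet($oLink, 'outerhtml')
    If StringInStr($sOuterHtml, '<b>') Then
        Local $aTmp = _StringBetween($sOuterHtml, '<b>', '</b>')
        If Not IsArray($aTmp) Then ContinueLoop ; no closing </b> found
        $avNames[0] += 1
        ReDim $avNames[$avNames[0] + 1]
        $avNames[$avNames[0]] = $aTmp[0] ; first bold run inside this link
    EndIf
Next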

StringRegExp is useful too, but it's not required if you use the IE.au3 library.

andybiochem

As Authenticity said, StringRegExp is great, but often things can be done more simply.

Try _StringBetween()

e.g.

#include <String.au3>
#include <Array.au3>

$HTML = "asdasd....><b>TEXT I WANT TO GET</b></a>......dfsghsgj......><b>More TEXT I WANT TO GET</b></a>"

$array = _StringBetween($HTML,"><b>","</b></a>")

_ArrayDisplay($array)
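
For comparison, here is a minimal sketch of the same extraction with StringRegExp, run against the same sample string. The original guess fails for two reasons: "(?i" is never closed, so the pattern is invalid, and \w matches only a single word character, so it cannot span a phrase with spaces. The variable names are just illustrative.

```autoit
#include <Array.au3>

; Same sample string as in the _StringBetween example
Local $sHTML = "asdasd....><b>TEXT I WANT TO GET</b></a>......dfsghsgj......><b>More TEXT I WANT TO GET</b></a>"

; (?i) makes the match case-insensitive; (.*?) lazily captures
; everything up to the next </b></a>
Local $aBoldItems = StringRegExp($sHTML, "(?i)><b>(.*?)</b></a>", 3)

; With flag 3 a failed match sets @error and returns no array, so check first
If IsArray($aBoldItems) Then _ArrayDisplay($aBoldItems)
```

Flag 3 returns every capture group from every match as one flat array, which is why only the text inside the parentheses ends up in the result.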

[EDIT] Oops, sorry Authenticity I didn't notice your _StringBetween bit. ^_^

Authenticity

No problema ;]

After reviewing the page I think RegExp is the way to go because otherwise you'll need to add unnecessary code to remove the last 2 elements from the array, and maybe the first also.

#include <Array.au3>
#include <IE.au3>

Dim $o_IE = _IECreate('http://www.kleimo.com/random/name.cfm')
Dim $oSelect = _IEGetObjByName($o_IE, 'number')
    $oSelect.selectedIndex = 5

Dim $oSubmit = _IEGetObjByName($o_IE, 'Go')
_IEAction($oSubmit, 'click')
_IELoadWait($o_IE)

Dim $sPattern = '(?i)\d+\..*?<b>(.*?)</b>'
Dim $avNames = StringRegExp(_IEDocReadHTML($o_IE), $sPattern, 3)
If IsArray($avNames) Then _ArrayDisplay($avNames)
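
To see what that pattern does without opening a browser, here is a small sketch that runs it against a made-up string. The markup in $sSample is hypothetical and only guesses at the "number, dot, bold name" layout the pattern implies; the real page may differ.

```autoit
#include <Array.au3>

; Hypothetical markup, made up for illustration; the real page may differ
Local $sSample = '1. <a href="#"><b>John Smith</b></a><br>2. <a href="#"><b>Jane Doe</b></a>'

; \d+\. matches the list number and its dot, .*? skips lazily to the next <b>,
; and (.*?) captures the text up to the closing </b>
Local $sPattern = '(?i)\d+\..*?<b>(.*?)</b>'
Local $avNames = StringRegExp($sSample, $sPattern, 3)

If IsArray($avNames) Then _ArrayDisplay($avNames) ; shows "John Smith" and "Jane Doe"
```

Anchoring on the number and dot is what keeps unrelated bold text elsewhere on the page out of the result. By default "." does not match a newline, so if the number and the <b> tag can sit on different lines, adding (?s) to the pattern may be needed.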

If you need an example using trancexx's HTTP library, I'd be glad to help. Just say so. ;]
