Parsing HTML with StringRegExp?


I have no idea what I'm doing with StringRegExp yet. I've looked at the help files, but the pieces aren't fitting together for me.

Could anybody give me an example of how to get an array of all the matches for the string "><b>TEXT I WANT TO GET</b></a>"?

This was my guess, but it's wrong.

$aBoldItems = StringRegExp($oBody, "(?i><b>\w</b></a>)", 3)


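If you're using the IE.au3 library, you just need to call _IELinkGetCollection() and walk through the links, checking each one with If StringInStr(_IEPropertyGet($oLink, 'outerhtml'), '<b>') Then .... Use 'outerhtml' rather than 'outertext', since 'outertext' returns the text with the tags stripped.

A snippet:

; Assumes $oLinks = _IELinkGetCollection($oIE) and Local $avNames[1] = [0] (count kept in element 0).
For $oLink In $oLinks
    ; outerhtml keeps the tags; outertext would strip the <b> we search for
    Local $sOuterHtml = _IEPropertyGet($oLink, 'outerhtml')
    If StringInStr($sOuterHtml, '<b>') Then
        Local $aTmp = _StringBetween($sOuterHtml, '<b>', '</b>')
        If IsArray($aTmp) Then
            $avNames[0] += 1
            ReDim $avNames[$avNames[0] + 1]
            $avNames[$avNames[0]] = $aTmp[0] ; _StringBetween returns a 0-based array
        EndIf
    EndIf
Next

StringRegExp is useful too, but it's not required if you use the IE.au3 library.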
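
For reference, here is a minimal sketch of how the pattern from the first post could be repaired. The original guess never closes the "(?i" option, matches only a single character with \w, and has no capture group for the text itself. The sample string in $sHTML is made up for illustration; a real page would come from _IEDocReadHTML() or similar.

```autoit
#include <Array.au3>

; Hypothetical sample markup, not taken from the real page
Local $sHTML = '<a href="#"><b>TEXT I WANT TO GET</b></a> ... <a href="#"><b>More TEXT</b></a>'

; (?i)   case-insensitive option, closed before the rest of the pattern
; (.*?)  lazy capture group: everything up to the next </b></a>
; Flag 3 returns an array holding every captured match
Local $aBoldItems = StringRegExp($sHTML, '(?i)><b>(.*?)</b></a>', 3)

If @error Then
    ConsoleWrite('No matches found' & @CRLF)
Else
    _ArrayDisplay($aBoldItems)
EndIf
```

The lazy (.*?) matters here: a greedy (.*) would run from the first <b> to the last </b></a> on the line and return one long match instead of one per link.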

Link to comment
Share on other sites

As Authenticity said, StringRegExp is great, but often things can be done more simply.

Try _StringBetween()

e.g.

#include <String.au3>
#include <Array.au3>

$HTML = "asdasd....><b>TEXT I WANT TO GET</b></a>......dfsghsgj......><b>More TEXT I WANT TO GET</b></a>"

$array = _StringBetween($HTML,"><b>","</b></a>")

_ArrayDisplay($array)
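
If you want to process the results rather than just display them, something like the following sketch should work. It assumes the same delimiters as above; the sample string and the variable names are only placeholders. _StringBetween sets @error when nothing is found, so it is worth checking before looping.

```autoit
#include <String.au3>

; Placeholder input; substitute the real page source
Local $sHTML = "...><b>First</b></a>...><b>Second</b></a>"
Local $aNames = _StringBetween($sHTML, "><b>", "</b></a>")

If @error Then
    ; No start/end pair was found, so $aNames is not an array
    ConsoleWrite("No delimited text found" & @CRLF)
Else
    ; The returned array is 0-based, with no count in element 0
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($i & ": " & $aNames[$i] & @CRLF)
    Next
EndIf
```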

[EDIT] Oops, sorry Authenticity I didn't notice your _StringBetween bit. ^_^

Edited by andybiochem

No problema ;]

After reviewing the page, I think RegExp is the way to go, because otherwise you'll need extra code to remove the last two elements from the array, and maybe the first as well.

#include <Array.au3>
#include <IE.au3>

Dim $o_IE = _IECreate('http://www.kleimo.com/random/name.cfm')
Dim $oSelect = _IEGetObjByName($o_IE, 'number')
$oSelect.selectedIndex = 5

Dim $oSubmit = _IEGetObjByName($o_IE, 'Go')
_IEAction($oSubmit, 'click')
_IELoadWait($o_IE)

Dim $sPattern = '(?i)\d+\..*?<b>(.*?)</b>'
Dim $avNames = StringRegExp(_IEDocReadHTML($o_IE), $sPattern, 3)
If IsArray($avNames) Then _ArrayDisplay($avNames)
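
To try the pattern without loading the page, a rough offline test could look like this. The sample line is an assumption about what the page's numbered list might look like, not a copy of it. The pattern reads as: one or more digits, a literal dot, the shortest possible run of characters up to <b>, then the captured name up to </b>.

```autoit
; Hypothetical single-line sample resembling a numbered list of bold names
Local $sSample = '1. <a href="#"><b>John Smith</b></a><br>2. <a href="#"><b>Jane Doe</b></a>'

; Same pattern as above; flag 3 collects every captured group into one array
Local $aNames = StringRegExp($sSample, '(?i)\d+\..*?<b>(.*?)</b>', 3)

If Not @error Then
    ; Write each captured name to the console
    For $sName In $aNames
        ConsoleWrite($sName & @CRLF)
    Next
EndIf
```

Requiring the leading number is what skips any bold text that is not part of the numbered list. Note that . does not match line breaks by default, so if a list entry spans several lines in the real source, the (?s) option may be needed as well.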

If you need an example using trancexx's HTTP library, I'd be glad to help. Just say so. ;]

Edited by Authenticity
