jackyyll

Regexp Help..

5 posts in this topic

Okay, so I have a website whose HTML I need to parse. I need to find:

1. <b>- TEXT -</b> The text between the two markers. (I think I already have this one down with (<b>- )(.*)( -</b>), but I don't know whether multiple <b>'s will affect it. I don't think they will.)

2. URLS:

<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>

<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>

I'm having a lot of trouble with the URLs, because I need to find the room=*, the h=* (different on both/all links), the id=*, the lastroom=*, and the link text. I tried this:

<a href="mob.php\?id=(.*)">(.*)</a>

and it just gives me this:

0 => 1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>
<a href="mob.php?id=6157&h=a71ec30f6b3b50ecd15be71b3ef1270e">Man in dark gray</a>
    
<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1
1 =>  East

Any ideas? :/


These seem to work:

$a = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>'

$b = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'


MsgBox(0, "$a", "id: " & StringMid($a, StringInStr($a, 'id=')+3, StringInStr($a, '&')-(StringInStr($a, 'id=')+3)))

MsgBox(0, "$a", "h: " & StringMid($a, StringInStr($a, 'h=')+2, StringInStr($a, '">')-(StringInStr($a, 'h=')+2)))

MsgBox(0, "$b", "room: " & StringMid($b, StringInStr($b, 'room=')+5, StringInStr($b, '&')-(StringInStr($b, 'room=')+5)))

MsgBox(0, "$b", "h: " & StringMid($b, StringInStr($b, 'h=')+2, StringInStr($b, '&', 0, 2)-(StringInStr($b, 'h=')+2)))

MsgBox(0, "$b", "lastroom: " & StringMid($b, StringInStr($b, 'lastroom=')+9, StringInStr($b, '">')-(StringInStr($b, 'lastroom=')+9)))


I use something like this for my RSS readers:

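Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    ; $s_Start and $s_End are inserted into the pattern as-is, so escape any regex metacharacters in them.
    ; Flag 3 returns an array holding the captured text of every match.
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0 ; no match, or the pattern was invalid
EndFunc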
I use FileRead() to get all the info originally, but you could do it a different way... It just needs a string.
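
To tie this back to the first question (whether several <b>'s on the page get in the way), here is a small sketch that calls StringRegExp directly with the pattern the function would build for '<b>- ' and ' -</b>'. The $sHtml string is invented sample markup, not the real page, and the comparison with the greedy (.*) version is only an illustration.

```autoit
; Made-up sample with two bold entries on one line (an assumption, not the real page).
Local $sHtml = '<b>- First -</b> filler <b>- Second -</b>'

; Lazy (.*?) stops at the nearest ' -</b>', so each entry is captured separately.
Local $aLazy = StringRegExp($sHtml, '(?:<b>- )(.*?)(?: -</b>)', 3)
If @error = 0 Then
    For $i = 0 To UBound($aLazy) - 1
        ConsoleWrite('lazy:   ' & $aLazy[$i] & @CRLF)
    Next
EndIf

; Greedy (.*) runs on to the LAST ' -</b>' on the line, so both entries
; (and everything between them) end up in a single capture.
Local $aGreedy = StringRegExp($sHtml, '<b>- (.*) -</b>', 3)
If @error = 0 Then
    ConsoleWrite('greedy: ' & $aGreedy[0] & @CRLF)
EndIf
```

So multiple <b>'s do matter with the greedy form when they share a line; the lazy form should keep them apart.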



Something like this:

room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<
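
A minimal sketch of how that pattern might be used, assuming a current AutoIt build where StringRegExp with flag 1 returns the captured groups of the first match. The negated classes such as [^&]+ stop at the next delimiter instead of running on the way (.*) does. The variable names are only for illustration, and the mob.php pattern is my own guess by analogy, not something posted above.

```autoit
Local $sWorld = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
Local $sMob = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>'

; Pattern from the post: each [^x]+ group ends at the next delimiter character.
Local $aWorld = StringRegExp($sWorld, 'room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<', 1)
If @error = 0 Then
    ConsoleWrite('room: ' & $aWorld[0] & @CRLF)
    ConsoleWrite('h: ' & $aWorld[1] & @CRLF)
    ConsoleWrite('lastroom: ' & $aWorld[2] & @CRLF)
    ; The link text has a leading space (" East"); flag 3 strips both ends.
    ConsoleWrite('text: ' & StringStripWS($aWorld[3], 3) & @CRLF)
EndIf

; Assumed analogue for the mob.php links (not from the original post).
Local $aMob = StringRegExp($sMob, 'id=([^&]+)&h=([^"]+)">([^<]+)<', 1)
If @error = 0 Then
    ConsoleWrite('id: ' & $aMob[0] & ', h: ' & $aMob[1] & ', text: ' & $aMob[2] & @CRLF)
EndIf
```

Switching the flag from 1 to 3 should return the groups of every matching link on the page in one flat array.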



This puts all the records into an array in which the elements alternate between "var" and "value" (taken from each "var"="value" pair), and the URLs are separated from each other by an element containing "**end of url**":

#include <Array.au3>

$string = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a><a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
Dim $infoArr[1]

While StringInStr($string, "?")
    $results = StringRegExp($string, '(?:\?)(.*?)(\#)(?:=)', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    While 1
        $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:&)', 1)
        If @extended == 1 Then
            If StringInStr($results[0], ">") == 0 Then
                _ArrayAdd($infoArr, $results[0])
                $string = StringTrimLeft($string, $results[1])
            Else
                ExitLoop
            EndIf
        Else
            ExitLoop
        EndIf
        $results = StringRegExp($string, '(?:&)(.*?)(\#)(?:=)', 1)
        If @extended == 1 Then
            _ArrayAdd($infoArr, $results[0])
            $string = StringTrimLeft($string, $results[1])
        Else
            ExitLoop
        EndIf
    WEnd
    $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:")', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    $infoArr[0] = "**beginning of html**"
    _ArrayAdd($infoArr, "**end of url**")
WEnd
_ArrayDisplay($infoArr, "")

Hope this helps.
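
For comparison, the same var/value idea can probably be done in one pass with a global match. This is only a sketch under the assumption of a current AutoIt build, where StringRegExp with flag 3 returns the groups of all matches in one flat array. It does not add the "**end of url**" separators, and the pattern is mine, not taken from the script above.

```autoit
Local $sHtml = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>' & _
        '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; A name starts after ? or & and runs up to =; its value runs up to the next & or ".
; With two groups per match, the result alternates name, value, name, value, ...
Local $aPairs = StringRegExp($sHtml, '[?&]([^=&"]+)=([^&"]*)', 3)
If @error = 0 Then
    For $i = 0 To UBound($aPairs) - 2 Step 2
        ConsoleWrite($aPairs[$i] & ' = ' & $aPairs[$i + 1] & @CRLF)
    Next
EndIf
```

On the two sample links this should list id, h, room, h and lastroom with their values, in that order.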

