
Regexp Help..



Okay, so I have a website whose HTML I need to parse. I need to find:

1. <b>- TEXT -</b> The text between the two tags. (I think I already have this one down with (<b>- )(.*)( -</b>), but I don't know whether multiple <b>'s will affect it. I don't think they will.)

2. URLS:

<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>

<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>

I'm having a lot of trouble with the URLs, because I need to find the room=*, the h=* (different on every link), the id=*, the lastroom=*, and the link text. I tried this:

<a href="mob.php\?id=(.*)">(.*)</a>

and it just gives me this:

0 => 1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>
<a href="mob.php?id=6157&h=a71ec30f6b3b50ecd15be71b3ef1270e">Man in dark gray</a>
    
<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1
1 =>  East

Any ideas? :/


  • Moderators

These seem to work:

; The two sample links from the question
$a = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>'
$b = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; Each value starts right after "name=" and runs up to the next "&" or the closing '">'
MsgBox(0, "$a", "id: " & StringMid($a, StringInStr($a, 'id=')+3, StringInStr($a, '&')-(StringInStr($a, 'id=')+3)))
MsgBox(0, "$a", "h: " & StringMid($a, StringInStr($a, 'h=')+2, StringInStr($a, '">')-(StringInStr($a, 'h=')+2)))
MsgBox(0, "$b", "room: " & StringMid($b, StringInStr($b, 'room=')+5, StringInStr($b, '&')-(StringInStr($b, 'room=')+5)))
; For h in $b the value ends at the second "&" (casesense = 0, occurrence = 2)
MsgBox(0, "$b", "h: " & StringMid($b, StringInStr($b, 'h=')+2, StringInStr($b, '&', 0, 2)-(StringInStr($b, 'h=')+2)))
MsgBox(0, "$b", "lastroom: " & StringMid($b, StringInStr($b, 'lastroom=')+9, StringInStr($b, '">')-(StringInStr($b, 'lastroom=')+9)))
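
If you end up doing this for a lot of links, the same StringInStr/StringMid idea could probably be wrapped in a small helper. This is only a sketch: _GetQueryParam is a made-up name (not a built-in), and it assumes every parameter is introduced by "?" or "&" and ends at the next "&" or double quote.

```autoit
; Hypothetical helper: returns the value of one query parameter from a link,
; or "" with @error = 1 if the parameter is not present.
Func _GetQueryParam($sLink, $sName)
    ; Look for "?name=" first, then "&name=", so "room" never matches inside "lastroom"
    Local $iPos = StringInStr($sLink, '?' & $sName & '=')
    If $iPos = 0 Then $iPos = StringInStr($sLink, '&' & $sName & '=')
    If $iPos = 0 Then Return SetError(1, 0, '')

    ; The value starts right after the separator, the name, and the "="
    Local $iStart = $iPos + StringLen($sName) + 2

    ; The value ends at the next "&" or the closing double quote, whichever comes first
    Local $iAmp = StringInStr($sLink, '&', 0, 1, $iStart)
    Local $iQuote = StringInStr($sLink, '"', 0, 1, $iStart)
    Local $iEnd = $iQuote
    If $iAmp > 0 And ($iQuote = 0 Or $iAmp < $iQuote) Then $iEnd = $iAmp
    If $iEnd = 0 Then $iEnd = StringLen($sLink) + 1

    Return StringMid($sLink, $iStart, $iEnd - $iStart)
EndFunc

; Sample link taken from the question
Local $sLink = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
ConsoleWrite("room: " & _GetQueryParam($sLink, 'room') & @CRLF)
ConsoleWrite("h: " & _GetQueryParam($sLink, 'h') & @CRLF)
ConsoleWrite("lastroom: " & _GetQueryParam($sLink, 'lastroom') & @CRLF)
```

Searching for the separator together with the name is what keeps "room" and "lastroom" apart, so the order of the parameters in the link should not matter.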

  • Moderators

I use something like this for my RSS readers:

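Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    ; $s_Start and $s_End go into the pattern as-is, so escape any regex metacharacters in them.
    ; Flag 3 returns an array holding the (.*?) capture of every match.
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0 ; no match, or the pattern was invalid
EndFunc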
I use FileRead() to get all the info originally, but you could do it a different way... It just needs a string.
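
For the <b>- TEXT -</b> case from the question, a call might look roughly like the sketch below. The sample HTML is made up for illustration, and the function is repeated only so the snippet runs on its own. Because the capture is non-greedy, (.*?), two <b> blocks on the same line should come back as two separate results instead of one long match.

```autoit
; Helper repeated from the post so this snippet is self-contained
Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0
EndFunc

; Made-up sample HTML with two bold headings on one line
Local $sHtml = '<b>- Town Square -</b><br><b>- Market -</b>'

Local $aTitles = _StringBetweenCodeTags($sHtml, '<b>- ', ' -</b>')
If IsArray($aTitles) Then
    ; One array element per heading found
    For $i = 0 To UBound($aTitles) - 1
        ConsoleWrite($aTitles[$i] & @CRLF)
    Next
Else
    ConsoleWrite("No headings found" & @CRLF)
EndIf
```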

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Something like this:

room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<
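
Here is a rough sketch of how that pattern could be used with StringRegExp and flag 3 (return all matches as one flat array). The second pattern, for the mob.php links, is my own assumption built the same way, and the sample string is just the two links from the question joined together.

```autoit
Local $sHtml = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>' & _
        '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; Pattern from the post: room, h, lastroom, link text
Local $aWorld = StringRegExp($sHtml, 'room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<', 3)
If @error Then
    ConsoleWrite("No world.php links found" & @CRLF)
Else
    ; Flag 3 gives one flat array: four elements per matched link
    For $i = 0 To UBound($aWorld) - 1 Step 4
        ConsoleWrite("room=" & $aWorld[$i] & " h=" & $aWorld[$i + 1] & _
                " lastroom=" & $aWorld[$i + 2] & " text=" & $aWorld[$i + 3] & @CRLF)
    Next
EndIf

; Assumed companion pattern for mob.php links: id, h, link text
Local $aMob = StringRegExp($sHtml, 'id=([^&]+)&h=([^"]+)">([^<]+)<', 3)
If @error Then
    ConsoleWrite("No mob.php links found" & @CRLF)
Else
    ; Three elements per matched link
    For $i = 0 To UBound($aMob) - 1 Step 3
        ConsoleWrite("id=" & $aMob[$i] & " h=" & $aMob[$i + 1] & " text=" & $aMob[$i + 2] & @CRLF)
    Next
EndIf
```

The link text keeps whatever whitespace is in the page (" East" has a leading space), so you may want to run it through StringStripWS afterwards.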

“Give a man a script; you have helped him for today. Teach a man to script; and you will not have to hear him whine for help.”
AutoIt4UE - Custom AutoIt toolbar and wordfile for UltraEdit/UEStudio users.
AutoIt Graphical Debugger - A graphical debugger for AutoIt.
SimMetrics COM Wrapper - Calculate string similarity.


This puts every query parameter into an array in which the elements alternate between a name ("var") and its value ("value"). The URLs are separated from each other by an element containing "**end of url**".

#include <Array.au3>

$string = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a><a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
Dim $infoArr[1]

While StringInStr($string, "?")
    $results = StringRegExp($string, '(?:\?)(.*?)(\#)(?:=)', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    While 1
        $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:&)', 1)
        If @extended == 1 Then
            If StringInStr($results[0], ">") == 0 Then
                _ArrayAdd($infoArr, $results[0])
                $string = StringTrimLeft($string, $results[1])
            Else
                ExitLoop
            EndIf
        Else
            ExitLoop
        EndIf
        $results = StringRegExp($string, '(?:&)(.*?)(\#)(?:=)', 1)
        If @extended == 1 Then
            _ArrayAdd($infoArr, $results[0])
            $string = StringTrimLeft($string, $results[1])
        Else
            ExitLoop
        EndIf
    WEnd
    $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:")', 1)
    If @extended == 1 Then
        _ArrayAdd($infoArr, $results[0])
        $string = StringTrimLeft($string, $results[1])
    Else
        ExitLoop
    EndIf
    $infoArr[0] = "**beginning of html**"
    _ArrayAdd($infoArr, "**end of url**")
WEnd
_ArrayDisplay($infoArr, "")

Hope this helps.

My UDFs: Coroutine Multithreading UDF Library, StringRegExp Guide, Random Encryptor, ArrayToDisplayString
"The Brain, expecting disaster, fails to find the obvious solution." -- neogia
