jackyyll Posted April 11, 2006

Okay, so I have a website whose HTML I need to parse. I need to find:

1. <b>- TEXT -</b>
The text between the two tags. (I think I already have this one down with (<b>- )(.*)( -</b>), but I don't know if multiple <b>'s will affect it. I don't think they will.)

2. URLs:
    <a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>
    <a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>

I'm having a lot of trouble with the URLs, because I need to find the room=*, the h=* (different on every link), the id=*, the lastroom=*, and the link text.

I tried this:
    <a href="mob.php\?id=(.*)">(.*)</a>
and it just gives me this:
    0 => 1723&h=3419142165fc215ce0250faa75b35b01">Bob</a> <a href="mob.php?id=6157&h=a71ec30f6b3b50ecd15be71b3ef1270e">Man in dark gray</a> <a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1
    1 => East

Any ideas? :/

Moderators big_daddy Posted April 11, 2006

These seem to work:

    $a = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a>'
    $b = '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

    ; Each value starts right after its "name=" marker and runs up to the next delimiter ('&' or '">').
    MsgBox(0, "$a", "id: " & StringMid($a, StringInStr($a, 'id=')+3, StringInStr($a, '&')-(StringInStr($a, 'id=')+3)))
    MsgBox(0, "$a", "h: " & StringMid($a, StringInStr($a, 'h=')+2, StringInStr($a, '">')-(StringInStr($a, 'h=')+2)))

    MsgBox(0, "$b", "room: " & StringMid($b, StringInStr($b, 'room=')+5, StringInStr($b, '&')-(StringInStr($b, 'room=')+5)))
    ; The h value in $b ends at the second '&', so ask StringInStr for occurrence 2 (case-insensitive search).
    MsgBox(0, "$b", "h: " & StringMid($b, StringInStr($b, 'h=')+2, StringInStr($b, '&', 0, 2)-(StringInStr($b, 'h=')+2)))
    MsgBox(0, "$b", "lastroom: " & StringMid($b, StringInStr($b, 'lastroom=')+9, StringInStr($b, '">')-(StringInStr($b, 'lastroom=')+9)))

Moderators SmOke_N Posted April 11, 2006

I use something like this for my RSS readers:

    ; Returns an array of every substring found between $s_Start and $s_End, or 0 if there is no match.
    Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
        ; (.*?) is non-greedy, so each match stops at the nearest $s_End.
        Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
        If @error = 0 Then Return $a_Array
        Return 0
    EndFunc

I use FileRead() to get all the info originally, but you could do it a different way. It just needs a string.

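As a rough illustration of how the helper above could be applied to the first part of the question (the <b>- TEXT -</b> markers), here is a minimal sketch. It assumes a current AutoIt release in which StringRegExp with flag 3 returns all matches as a flat array. The sample string and the names $sSample and $aNames are hypothetical, not from the thread. Note that $s_Start and $s_End are pasted into the pattern unescaped, so delimiters containing regex metacharacters would need escaping first.

```autoit
; Same helper as in the post above, repeated so this sketch runs on its own.
Func _StringBetweenCodeTags($s_String, $s_Start, $s_End)
    Local $a_Array = StringRegExp($s_String, '(?:' & $s_Start & ')(.*?)(?:' & $s_End & ')', 3)
    If @error = 0 Then Return $a_Array
    Return 0
EndFunc

; Hypothetical sample with two bold markers on the same line.
Local $sSample = '<b>- North Gate -</b> some text <b>- Market -</b>'

; The non-greedy (.*?) stops at the nearest ' -</b>', so each marker is captured separately.
Local $aNames = _StringBetweenCodeTags($sSample, '<b>- ', ' -</b>')

If IsArray($aNames) Then
    For $i = 0 To UBound($aNames) - 1
        ConsoleWrite($aNames[$i] & @CRLF)
    Next
Else
    ConsoleWrite("No markers found" & @CRLF)
EndIf
```

With a greedy (.*) in place of (.*?), the two markers in this sample would be swallowed as a single match running from the first opening tag to the last closing tag.
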
Stumpii Posted April 12, 2006

Something like this:

    room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<

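The pattern above avoids the runaway match from the original attempt because each negated class ([^&], [^"], [^<]) stops at the first delimiter instead of the last one. A minimal sketch of using it follows, assuming a current AutoIt release where StringRegExp flag 3 returns every captured group in one flat array. The variable names are hypothetical, and the mob.php pattern is an adaptation along the same lines, not something posted in the thread.

```autoit
; Sample HTML taken from the original question.
Local $sHtml = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a> ' & _
        '<a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'

; Pattern from the post above: four capture groups per world.php link.
Local $sRoomPattern = 'room=([^&]+)&h=([^&]+)&lastroom=([^"]+)">([^<]+)<'

; Assumed adaptation for mob.php links: three capture groups per link.
Local $sMobPattern = 'mob\.php\?id=([^&]+)&h=([^"]+)">([^<]+)<'

; Flag 3 returns all groups of all matches in a flat array, so step through it one link at a time.
Local $aRoom = StringRegExp($sHtml, $sRoomPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aRoom) - 4 Step 4
        ConsoleWrite("room=" & $aRoom[$i] & " h=" & $aRoom[$i + 1] & " lastroom=" & $aRoom[$i + 2] & _
                " text=" & StringStripWS($aRoom[$i + 3], 3) & @CRLF)
    Next
EndIf

Local $aMob = StringRegExp($sHtml, $sMobPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aMob) - 3 Step 3
        ConsoleWrite("id=" & $aMob[$i] & " h=" & $aMob[$i + 1] & " text=" & StringStripWS($aMob[$i + 2], 3) & @CRLF)
    Next
EndIf
```

StringStripWS with flag 3 trims the leading space in link text such as " East". The room pattern assumes the parameters always appear in the order room, h, lastroom, as they do in the sample link.
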
neogia Posted April 12, 2006

This puts all the records into an array ("var"="value") where the elements alternate between "var" and "value", and each URL is separated by an element containing "**end of url**":

    #include <Array.au3>

    $string = '<a href="mob.php?id=1723&h=3419142165fc215ce0250faa75b35b01">Bob</a><a href="world.php?room=2&h=837ea8cec75449736acad26628ded10f&lastroom=1"> East</a>'
    Dim $infoArr[1]

    While StringInStr($string, "?")
        $results = StringRegExp($string, '(?:\?)(.*?)(\#)(?:=)', 1)
        If @extended == 1 Then
            _ArrayAdd($infoArr, $results[0])
            $string = StringTrimLeft($string, $results[1])
        Else
            ExitLoop
        EndIf
        While 1
            $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:&)', 1)
            If @extended == 1 Then
                If StringInStr($results[0], ">") == 0 Then
                    _ArrayAdd($infoArr, $results[0])
                    $string = StringTrimLeft($string, $results[1])
                Else
                    ExitLoop
                EndIf
            Else
                ExitLoop
            EndIf
            $results = StringRegExp($string, '(?:&)(.*?)(\#)(?:=)', 1)
            If @extended == 1 Then
                _ArrayAdd($infoArr, $results[0])
                $string = StringTrimLeft($string, $results[1])
            Else
                ExitLoop
            EndIf
        WEnd
        $results = StringRegExp($string, '(?:=)(.*?)(\#)(?:")', 1)
        If @extended == 1 Then
            _ArrayAdd($infoArr, $results[0])
            $string = StringTrimLeft($string, $results[1])
        Else
            ExitLoop
        EndIf
        $infoArr[0] = "**beginning of html**"
        _ArrayAdd($infoArr, "**end of url**")
    WEnd
    _ArrayDisplay($infoArr, "")

Hope this helps.