Lej Posted August 2, 2009 (edited)

Hello! I have HTML code in the following format:

    <font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>

Basically, it is a number of <font> tags with @'s in them, followed by a <br> tag, and then repeating. It never starts with a <br> tag and always ends with one. I want to extract the data from this into an array. In this particular case it would become

    [666666, @, a7b767, @@, 123456, @, <br>, a11167, @@, 3636a6, @@@, <br>]

I had heard of RegEx before but never used it, so I looked it up in the help file, and it seems to be exactly what I need. However, I'm having problems making the pattern. This is what I've got:

    '(?:(?:(?:<font color=#)([[:xdigit:]]{6})(?:>)([@]{1,})(?:</font>)){1,}(<br>)){1,}'

I don't get why this hideous pattern only catches [3636a6, @@@, <br>]. Help appreciated.

Attachment: kladd.au3

Valuater Posted August 2, 2009

Possibly an easier approach...

    #include <array.au3>
    #include <string.au3>
    $String = '<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>'
    $String_Array = _StringBetween($String, '<font color=#', '>@')
    _ArrayDisplay($String_Array)

8)

Lej (author) Posted August 2, 2009

This only picks out the color codes. I need the color codes, the strings, and the positions of the line breaks, ordered by when they appear. I don't see how I can use _StringBetween to do that easily.

AMp Posted August 2, 2009 (edited)

I think you're looking for something like this:

    Global $String = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"
    ; Strip the tag letters and punctuation, protect the line breaks with a placeholder,
    ; remove the remaining angle brackets, then restore the <br> tags.
    ; Caveat: the first character class also removes lowercase c and f,
    ; so color codes containing those hex digits would be altered.
    MsgBox(0, "", StringReplace(StringRegExpReplace(StringReplace(StringRegExpReplace($String, _
            "[font color\#\/\=]", ""), "<b>", "<LINEBREAKER>"), "[<>]", ""), "LINEBREAKER", "<br>"))

It's kinda messy, but it's 7:51 AM and I didn't sleep. I made it in about 30 seconds, so sorry if it's too messy. But hey! It strips everything but the <br>'s, the @'s, and the color codes. Enjoy.

Lej (author) Posted August 2, 2009

I guess explaining over the internet is not my forte ^^ That's close, though. If the different parts were elements in an array, it'd be perfect. Your code produces:

    "666666@a7b767@@123456@<br>a11167@@3636a6@@@<br>"

While this would be perfect:

    ["666666", "@", "a7b767", "@@", "123456", "@", "<br>", "a11167", "@@", "3636a6", "@@@", "<br>"]

GEOSoft Posted August 2, 2009 (edited)

I'm too tired to fix this right now (it's adding blank elements), so I'll just make it work for now.

    $sStr = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"
    $sRegExp = "(?i)([[:xdigit:]]{3,}|@{1,}|<br>)"
    $aRtn = StringRegExp($sStr, $sRegExp, 3)
    If NOT @Error Then
        $sRtn = ""
        For $i = 0 To Ubound($aRtn)-1
            $sRtn &= $aRtn[$i] & ","
        Next
        MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & ']')
    EndIf

Edit: Code fix because of the broken code box height again.

EDIT 2: Don't know where my head was at. I changed the RegExp so it works without blank array elements.

AMp Posted August 2, 2009 (edited)

I can find one small error, but I'm sure you can fix it. I'm really too tired for it, so here's my last try:

    Global $HTML = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"

    ; Main
    MsgBox(0, "", FormatString($HTML))

    Func FormatString($String)
        Local $buffer = ""
        $NewString = StringReplace(StringRegExpReplace(StringReplace(StringRegExpReplace($String, _
                "[font color\#\/\=]", ""), "<b>", "<LINEBREAKER>"), "[<>]", ""), "LINEBREAKER", "<br>")
        $Split = StringSplit(StringReplace(StringReplace($NewString, "@", " "), "<br>", "LINEBREAKER "), " ")
        For $x = 1 To $Split[0] - 1
            If $Split[$x] = "LINEBREAKER" Then
                $buffer &= '"<br/>", '
            ElseIf $Split[$x] = "" Then
                $buffer &= '"@@", '
            Else
                $buffer &= '"' & $Split[$x] & '", '
            EndIf
        Next
        Return "[" & StringTrimRight($buffer, 2) & "]"
    EndFunc ;==>FormatString

Again, sorry if it's messy; it's 8:35 AM now.

Also (for GEOSoft's code), change

    MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & ']')

to

    MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & "]")

I hope either one works well enough for you. I'm sure you can make changes to mine; it's really badly written. Good night >_<

Lej (author) Posted August 2, 2009

Thanks for all the help. Combining and modifying some of the solutions solved my problem >_<

Malkey Posted August 2, 2009

Quoting Lej:

    I guess explaining over the internet is not my forte ^^ That's close though, if the different parts were elements in an array it'd be perfect. Your code produces: "666666@a7b767@@123456@<br>a11167@@3636a6@@@<br>" While this would be perfect: ["666666", "@", "a7b767", "@@", "123456", "@", "<br>", "a11167", "@@", "3636a6", "@@@", <br>]

One more.

    #include <array.au3>
    $sStr = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"
    $sRegExp = "(?i)<font color=#([[:xdigit:]]{3,})>(@+)</font>(<br>){0,1}"
    $aRtn = StringRegExp($sStr, $sRegExp, 3)
    _ArrayDisplay($aRtn)
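
A note on the question in the first post, offered as a sketch rather than a definitive answer: in PCRE, a capturing group that sits inside a repeated group keeps only the text from its last iteration. The first pattern matches the whole string in one go, so each of its three capturing groups ends up holding only its final value, which is consistent with the [3636a6, @@@, <br>] result reported. The replies avoid this by leaving the pattern unrepeated and letting flag 3 of StringRegExp collect every match. The variant below follows the same idea; the variable names and the lookbehind on "color=#" are my own assumptions, not something taken from the posts above.

```autoit
#include <Array.au3>

; Sample input copied from the thread.
Local $sHtml = '<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br>' & _
        '<font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>'

; One alternative per token type; nothing is wrapped in a repeated group.
;   (?<=color=#)[[:xdigit:]]{6}  six hex digits preceded by "color=#"
;   @+                           a run of @ characters
;   <br>                         a line break tag
Local $sPattern = '(?i)(?<=color=#)[[:xdigit:]]{6}|@+|<br>'

; Flag 3 asks for an array of all matches across the whole string.
Local $aTokens = StringRegExp($sHtml, $sPattern, 3)
If @error Then
    MsgBox(0, 'Result', 'No tokens found')
Else
    _ArrayDisplay($aTokens)
EndIf
```

Because the pattern has no capturing groups, each array element should be one whole match, in the order the tokens appear in the string. Anchoring the color code to "color=#" is a design choice that avoids relying on a minimum length to skip the hex-looking letters inside the tag names.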