Lej

RegEx matching parts of HTML code

9 posts in this topic

#1 ·  Posted (edited)

Hello! I have HTML code in the following format:

<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>

Basically, a run of <font> tags with @'s in them, followed by a <br> tag, and then the whole thing repeats. The string never starts with a <br> tag and always ends with one.

I want to extract data from this to an array.

In this particular case it would become

[666666, @, a7b767, @@, 123456, @, <br>, a11167, @@, 3636a6, @@@, <br>]

Having heard of RegEx before but never used it I looked it up in the help and it seems to be exactly what I need.

However I'm having problems making the pattern.

This is what I got:

'(?:(?:(?:<font color=#)([[:xdigit:]]{6})(?:>)([@]{1,})(?:</font>)){1,}(<br>)){1,}'

I don't get why this hideous pattern only catches [3636a6, @@@, <br>].

Help appreciated.

kladd.au3

Edited by Lej
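
A likely explanation for the single result, sketched under the assumption that AutoIt's regex engine follows standard PCRE behavior here: the outer `{1,}` lets the pattern consume the whole string as one match, and a capturing group inside a repetition keeps only the text from its last iteration. Dropping the outer repetition and letting flag 3 repeat the match is one way around that. The variable names below are illustrative, not from the attached script.

```autoit
#include <Array.au3>

; Sample input copied from the question.
Local $sHtml = '<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br>' & _
        '<font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>'

; Original pattern: the outer {1,} consumes the whole string in ONE match, so each
; capturing group is overwritten on every repetition and only the last values remain.
Local $sNested = '(?:(?:(?:<font color=#)([[:xdigit:]]{6})(?:>)([@]{1,})(?:</font>)){1,}(<br>)){1,}'
Local $aNested = StringRegExp($sHtml, $sNested, 3)
If Not @error Then ConsoleWrite(_ArrayToString($aNested, ', ') & @CRLF) ; 3636a6, @@@, <br> (as reported in the question)

; Same inner pattern without the outer repetition: flag 3 restarts the search after
; each match, so every color/text pair is captured (the <br> tags are not handled here).
Local $aPairs = StringRegExp($sHtml, '<font color=#([[:xdigit:]]{6})>(@+)</font>', 3)
If Not @error Then ConsoleWrite(_ArrayToString($aPairs, ', ') & @CRLF)
```

The second call only shows the principle; it leaves out the line breaks that the question also asks for.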

Possibly an easier approach...

#include <array.au3>
#include <string.au3>

$String = '<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>'


$String_Array = _StringBetween($String, '<font color=#', '>@')

_ArrayDisplay($String_Array)

8)


This only picks out the color codes. I need the color codes, the strings, and the positions of the line breaks, in the order they appear. I don't see how I can use _StringBetween to do that easily.

#4 ·  Posted (edited)

I think you're looking for something like this:

Global $String = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"



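; The first character class deletes every f, o, n, t, c, l, r, space, #, / and =,
; which also turns "<br>" into "<b>" (hence the "<b>" placeholder swap).
; Caveat: it would also delete f and c from a color code such as ffcc00.
MsgBox(0, "", StringReplace(StringRegExpReplace(StringReplace(StringRegExpReplace($String, _
        "[font color\#\/\=]", ""), "<b>", "<LINEBREAKER>"), "[<>]", ""), "LINEBREAKER", "<br>"))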

It's kinda messy but it's 7:51AM and I didn't sleep.

I kinda made it in 30 seconds so sorry if it's too messy.

But hey! It strips everything but the <br>'s and @'s and colorcodes.

Enjoy.

Edited by AMp

I guess explaining over the internet is not my forte ^^

That's close though, if the different parts were elements in an array it'd be perfect.

Your code produces:

"666666@a7b767@@123456@<br>a11167@@3636a6@@@<br>"

While this would be perfect:

["666666", "@", "a7b767", "@@", "123456", "@", "<br>", "a11167", "@@", "3636a6", "@@@", "<br>"]

#6 ·  Posted (edited)

I'm too tired to fix this right now (it's adding blank elements) so I'll just make it work for now

;
$sStr = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"

$sRegExp = "(?i)([[:xdigit:]]{3,}|@{1,}|<br>)"
$aRtn = StringRegExp($sStr, $sRegExp, 3)
If NOT @Error Then
    $sRtn = ""
    For $i = 0 To Ubound($aRtn)-1
        $sRtn &= $aRtn[$i] & ","
    Next
    MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & ']")
EndIf
;
Edit: Code fix because of the broken code box height again

EDIT 2: Don't know where my head was at. I changed the RegExp so it works without blank array elements

Edited by GEOSoft
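
One caveat with the alternation pattern above: `[[:xdigit:]]{3,}` accepts any run of three or more hex digits, so in other input it could also pick up ordinary text such as the word "face" in a `face=` attribute. A tighter variant is sketched below, assuming the color is always written as `color=#` followed by exactly six hex digits, as in the sample. The shortened sample string and the variable names are illustrative.

```autoit
#include <Array.au3>

; Shortened sample in the same format as the thread's input.
Local $sStr = '<font color=#666666>@</font><font color=#a7b767>@@</font><br>'

; (?<=color=#) is a lookbehind: the hex digits only count when they directly follow "color=#".
; No capturing groups are used, so flag 3 returns each whole match as one array element.
Local $aTokens = StringRegExp($sStr, '(?i)(?<=color=#)[[:xdigit:]]{6}|@+|<br>', 3)
If Not @error Then ConsoleWrite(_ArrayToString($aTokens, ', ') & @CRLF) ; 666666, @, a7b767, @@, <br>
```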

#7 ·  Posted (edited)

I can find one small error, but I'm sure you can fix it. I'm really too tired for it, so here's my last try:

Global $HTML = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"



; Main
MsgBox(0, "", FormatString($HTML))



Func FormatString($String)
    Local $buffer = ""

    $NewString = StringReplace(StringRegExpReplace(StringReplace(StringRegExpReplace($String, _
            "[font color\#\/\=]", ""), "<b>", "<LINEBREAKER>"), "[<>]", ""), "LINEBREAKER", "<br>")
    $Split = StringSplit(StringReplace(StringReplace($NewString, "@", " "), "<br>", "LINEBREAKER "), " ")

    For $x = 1 To $Split[0] - 1
        If $Split[$x] = "LINEBREAKER" Then
            $buffer &= '"<br/>", '
        ElseIf $Split[$x] = "" Then
            $buffer &= '"@@", '
        Else
            $buffer &= '"' & $Split[$x] & '", '
        EndIf
    Next

    Return "[" & StringTrimRight($buffer, 2) & "]"
EndFunc   ;==>FormatString

Again sorry if it's messy, 8:35AM now. Also (for GEOSoft's code);

MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & ']")

Change to

MsgBox(0, "Result", "[" & StringTrimRight($sRtn, 1) & "]")

Hope either one works well enough for you. I'm sure you can make changes to mine; it's really badly written. Goodnight >_<

Edited by AMp

Thanks for all the help, combining/modifying some of the solutions solved my problem >_<

One more.

;
#include <array.au3>

$sStr = "<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br><font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>"

$sRegExp = "(?i)<font color=#([[:xdigit:]]{3,})>(@+)</font>(<br>){0,1}"

$aRtn = StringRegExp($sStr, $sRegExp, 3)
_ArrayDisplay($aRtn)
;
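
Once the flat array exists, it can be walked to recover the lines. This is only a sketch: it assumes the array has the shape `_ArrayDisplay` shows for the pattern above, that is, a color, then its @ string, then an optional "<br>" element, with no blank placeholder when the `<br>` is absent. The `color=text` output format is made up for illustration.

```autoit
#include <Array.au3>

Local $sStr = '<font color=#666666>@</font><font color=#a7b767>@@</font><font color=#123456>@</font><br>' & _
        '<font color=#a11167>@@</font><font color=#3636a6>@@@</font><br>'
Local $aRtn = StringRegExp($sStr, '(?i)<font color=#([[:xdigit:]]{3,})>(@+)</font>(<br>){0,1}', 3)
If @error Then Exit

Local $i = 0, $sLine = ''
While $i < UBound($aRtn)
    If $aRtn[$i] = '<br>' Then
        ; A line break closes the current line: print it and start a new one.
        ConsoleWrite(StringStripWS($sLine, 2) & @CRLF)
        $sLine = ''
        $i += 1
    Else
        ; Otherwise the element is a color code and the next element is its @ string.
        $sLine &= $aRtn[$i] & '=' & $aRtn[$i + 1] & ' '
        $i += 2
    EndIf
WEnd
```

A While loop is used instead of For so the index can advance by one or two elements depending on the token type.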
