What's wrong with my regex pattern?

LondonNDIB · November 2, 2012

I have a multi-line string ($oHTML) that includes the following:

[blah blah]<!--Price and info-->
<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">

<tr>
<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp('postal_weight_info.asp')" style="cursor:pointer"></td>
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>
<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>
<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>
[blah blah]

I want to isolate the "29" in there. I tried this:

$_usweight = StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

But, as evidenced by my being here... it doesn't work. I don't understand why it doesn't work. I've narrowed it down to this part:

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">

because it works if I strip that down to just "14px". I figured eventually that it had to do with the colons, but escaping them didn't solve anything. Thanks for your help.

Edited November 2, 2012 by LondonNDIB
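On the colon question: in the PCRE syntax that StringRegExp uses, a colon on its own is an ordinary literal character (it only carries meaning inside constructs such as `(?:...)`), so escaping it should make no difference. A minimal sketch to check this, using an illustrative string that is not taken from the page in question:

```autoit
; Illustrative text only; $sText is a made-up sample, not the real page.
Local $sText = '<font style="font-family:Verdana; font-size:14px">29g</font>'

; Flag 0 returns 1 when the pattern matches and 0 when it does not.
; The first pattern uses a bare colon, the second an escaped one.
ConsoleWrite(StringRegExp($sText, 'font-family:Verdana', 0) & @CRLF)
ConsoleWrite(StringRegExp($sText, 'font-family\:Verdana', 0) & @CRLF)
```

If both lines report a match, the colons can be ruled out and the mismatch has to come from somewhere else in the text being searched.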

iamtheky · November 2, 2012

provided that the section has the same tags each time

stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">](.*?g)</font></td>' , 3)

LondonNDIB · November 2, 2012

What's the square bracket for?

iamtheky · November 2, 2012

It's copied straight out of the first post, so that would be a question for you.

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">]29g</font></td>

Edited November 2, 2012 by boththose

LondonNDIB · November 2, 2012

Lol... right. It was left over from formatting glitches on the post. Sorry.

LondonNDIB · November 2, 2012

Alright, that didn't work... so is there something up with the colons or not?

LondonNDIB · November 2, 2012

It's behaving like the colon is a special character, but escaping it doesn't help. I don't get it.

iamtheky · November 2, 2012

Be more specific: what doesn't work? Because this returns 29g

edit: without the errant closing bracket

#include <Array.au3>

; Sample HTML from the first post, joined into a single string (no newlines).
; Single quotes inside a single-quoted AutoIt string are doubled.
$string = '[blah blah]<!--Price and info-->' & _
'<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">' & _
'<tr>' & _
'<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp(''postal_weight_info.asp'')" style="cursor:pointer"></td>' & _
'<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & _
'<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>' & _
'<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>' & _
'[blah blah]'

;~ MsgBox(0, '', $string)

; Flag 3 returns an array of every captured group found in the string.
$reg = StringRegExp($string, '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">(.*?g)</font></td>', 3)

_ArrayDisplay($reg)

Edited November 2, 2012 by boththose

LondonNDIB · November 2, 2012

First, my string is multiline (newlines exist)... I don't know if that makes a difference but that's why I used the ?s

What "doesn't work" is I'm not getting a match. Your simplified example doesn't cover the same scope.

In English, what I want is this:

after the first instance of
<!--Price and info-->
find the first instance of
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
and return the following characters until you run into a 'g'

Why doesn't this work?

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

It does if I change it to:

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:14px">)(.*?)(?:g)', 1 )

(i.e., remove the colons from the equation) but then it may not return the result I actually want.

To me, that's saying:

find the first bit we're looking for, followed by any number of characters including newlines up until the next bit we're looking for, then return the next characters until we find a 'g'

Edited November 2, 2012 by LondonNDIB

LondonNDIB · November 2, 2012

When I use a string like you did, it doesn't work.

So is there a problem with using the return from a _IEBodyReadHTML then? It works in every case I've tried except when there's a colon in the test. Bizarre.
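One way to narrow this down is to test the literal fragments of the pattern one at a time against the string that is actually being searched. The sketch below assumes a made-up stand-in for $oHTML (the real return value of _IEBodyReadHTML is not shown in this thread), and the fragment list is only an example:

```autoit
; Hypothetical stand-in for the string returned by _IEBodyReadHTML.
Local $oHTML = '<!--Price and info-->' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px">29g</FONT></TD>'

; Literal fragments taken from the original pattern, checked individually.
Local $aParts[4] = ['<!--Price and info-->', '<td width="54">', 'font-family:Verdana', '14px">']

For $i = 0 To UBound($aParts) - 1
    ; \Q...\E makes the fragment literal text; flag 0 returns 1 (found) or 0 (not found).
    Local $iFound = StringRegExp($oHTML, '(?i)\Q' & $aParts[$i] & '\E', 0)
    ConsoleWrite($aParts[$i] & ' -> ' & $iFound & @CRLF)
Next
```

Any fragment that reports 0 points at the part of the markup that differs from what was expected. Writing the whole of $oHTML to the console with ConsoleWrite and comparing it with the page source by eye is a simpler alternative.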

LondonNDIB · November 2, 2012

Please forgive me... I erroneously made the assumption that the _IEBodyReadHTML function returned the same string that one sees as right-click/view source. It doesn't. There are differences in case, whitespace, and even quotes.

So this wasn't a problem with RegEx at all. Sorry for wasting your time.
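For anyone landing here with the same symptom, a pattern that tolerates those differences might look like the sketch below. The $sBodyHtml sample is hypothetical (upper-case tags, unquoted attribute values, reordered style properties); it only guesses at what _IEBodyReadHTML might return and is not taken from the real page.

```autoit
; Hypothetical sample of normalized markup; not copied from the actual page.
Local $sBodyHtml = '<!--Price and info-->' & @CRLF & _
        '<TABLE width=565><TBODY><TR>' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif">29g</FONT></TD>' & @CRLF & _
        '<TD width=47><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif">10+</FONT></TD>'

; (?is)        case-insensitive, and . also matches newlines
; ["']?        optional quote around the width value ('' is an escaped ' in AutoIt)
; <font[^>]*>  any font tag, whatever its style attribute contains
; (\d+)\s*g    capture the digits that precede the "g"
Local $sPattern = '(?is)<!--Price and info-->.*?<td\s+width=["'']?54["'']?\s*>\s*<font[^>]*>\s*(\d+)\s*g'

; Flag 1 returns an array of captured groups and sets @error when nothing matches.
Local $aMatch = StringRegExp($sBodyHtml, $sPattern, 1)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    ConsoleWrite('Weight: ' & $aMatch[0] & @CRLF)
EndIf
```

The idea is to anchor on the pieces that are least likely to change (the comment marker, the cell width, the trailing "g") and to leave the style attribute unmatched, since that is where case, spacing and quoting varied.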
