
What's wrong with my regex pattern?



I have a multi-line string ($oHTML) that includes the following:

[blah blah]<!--Price and info-->
<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">

<tr>
<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp('postal_weight_info.asp')" style="cursor:pointer"></td>
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>
<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>
<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>
[blah blah]

I want to isolate the "29" in there. I tried this:

$_usweight = StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

But, as evidenced by my being here... it doesn't work. I don't understand why it doesn't work. I've narrowed it down to this part:

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
because it works if I strip that down to just "14px". Eventually I figured that it had to do with the colons, but escaping them didn't solve anything. Thanks for your help.

Edited by LondonNDIB

Provided that the section has the same tags each time:

stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">](.*?g)</font></td>' , 3)

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)


It's copied straight out of the first post, so that would be a question for you.

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">]29g</font></td>

Edited by boththose


Be more specific: what doesn't work? Because this returns 29g.

edit: without the errant closing bracket

#include <Array.au3>

; Sample input rebuilt from the first post. A single quote inside a
; single-quoted AutoIt string is written doubled ('').
$string = '[blah blah]<' & _
'!--Price and info-->' & _
'<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">' & _
'<tr>' & _
'<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp(''postal_weight_info.asp'')" style="cursor:pointer"></td>' & _
'<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & _
'<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>' & _
'<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>' & _
'[blah blah]'

;~ MsgBox(0, '', $string)

; Flag 3 returns an array of all matches; the capture group includes the trailing "g".
$reg = StringRegExp($string, '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">(.*?g)</font></td>', 3)

_ArrayDisplay($reg)
Edited by boththose


First, my string is multiline (it contains newlines). I don't know if that makes a difference, but that's why I used (?s:...).

What "doesn't work" is I'm not getting a match. Your simplified example doesn't cover the same scope.

In English, what I want is this:

After the first instance of

<!--Price and info-->

find the first instance of

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">

and return the following characters until you run into a 'g'.

Why doesn't this work?
StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

It does if I change it to:

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:14px">)(.*?)(?:g)', 1 )
(i.e. take the colons out of the equation), but then it may not return the result I actually want.

To me, that's saying:

find the first bit we're looking for, followed by any number of characters including newlines up until the next bit we're looking for, then return the next characters until we find a 'g'

Edited by LondonNDIB
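
For reference, the pattern described above can be tried on its own against the markup as it appears in the first post. This is only a minimal sketch: $sSource is a hand-built stand-in for the view-source text (shortened, with @CRLF newlines), not the real contents of $oHTML, and the variable names are made up for the illustration.

```autoit
; Stand-in for the view-source text from the first post, newlines included.
; This is an assumption: it is NOT the actual value of $oHTML.
Local $sSource = '[blah blah]<!--Price and info-->' & @CRLF & _
        '<table width="565" border="0" cellspacing="0" cellpadding="0">' & @CRLF & _
        '<tr>' & @CRLF & _
        '<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & @CRLF & _
        '[blah blah]'

; The pattern exactly as posted. The colons need no escaping here.
Local $sPattern = '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)'

; Flag 1 returns an array holding the captured groups of the first match.
Local $aMatch = StringRegExp($sSource, $sPattern, 1)
Local $iError = @error ; save @error before any other call can reset it

If $iError Then
    ConsoleWrite('No match, @error = ' & $iError & @CRLF)
Else
    ConsoleWrite('Captured: ' & $aMatch[0] & @CRLF)
EndIf
```

If the pattern matches this stand-in but not $oHTML, the difference is more likely in the input string than in the pattern, so it is worth dumping $oHTML (for example with ConsoleWrite) and comparing it character by character with the view-source text.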
Link to comment
Share on other sites

Please forgive me... I erroneously made the assumption that the _IEBodyReadHTML function returned the same string that one sees as right-click/view source. It's not. There are differences in case, whitespace, and even quotes.

So this wasn't a problem with RegEx at all. Sorry for wasting your time.
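
For anyone who lands here with the same symptom, a pattern that tolerates those differences might look something like the sketch below. Everything in it is an assumption made for illustration: $sHTML only imitates the kind of string _IEBodyReadHTML could return (upper-case tags, unquoted attribute values, reordered style properties), and the pattern relies on the width="54" cell staying the weight cell.

```autoit
; Hypothetical sample imitating browser-normalised HTML.
; It is NOT real _IEBodyReadHTML output.
Local $sHTML = '<!--Price and info-->' & @CRLF & _
        '<TABLE width=565>' & @CRLF & _
        '<TR>' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana">29g</FONT></TD>' & @CRLF & _
        '</TR></TABLE>'

; (?is)       case-insensitive, and "." also matches newlines
; ["'']?      optional quote around 54 ('' is one literal single quote in AutoIt)
; [^>]*       skip any further attributes of the td tag
; \s*         tolerate whitespace between the tags
; <font[^>]*> do not depend on the exact style string
; (\d+)\s*g   capture the digits that come before the "g"
Local $sPattern = '(?is)<!--Price and info-->.*?<td\s+width=["'']?54["'']?[^>]*>\s*<font[^>]*>\s*(\d+)\s*g'

; Flag 1 returns an array holding the captured groups of the first match.
Local $aWeight = StringRegExp($sHTML, $sPattern, 1)

If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    ConsoleWrite('Weight: ' & $aWeight[0] & @CRLF)
EndIf
```

The idea is to anchor on the parts of the markup that are least likely to change (the comment, the cell width, the digits followed by "g") rather than on the full style string, since that is where case, spacing and quoting tend to differ between view-source and the string read back from the browser.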
