Posted (edited)

I have a multi-line string ($oHTML) that includes the following:

[blah blah]<!--Price and info-->
<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">

<tr>
<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp('postal_weight_info.asp')" style="cursor:pointer"></td>
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>
<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>
<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>
[blah blah]

I want to isolate the "29" in there. I tried this:

$_usweight = StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

But, as evidenced by my being here... it doesn't work. I don't understand why it doesn't work. I've narrowed it down to this part:

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
because it works if I strip that down to just "14px". I eventually figured that it had to do with the colons, but escaping them didn't solve anything. Thanks for your help.

Edited by LondonNDIB
Posted

Provided that the section has the same tags each time:

stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">](.*?g)</font></td>' , 3)

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

Posted (edited)

It's copied straight out of the first post, so that would be a question for you.

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">]29g</font></td>

Edited by boththose


Posted (edited)

Be more specific: what doesn't work? Because this returns 29g.

Edit: without the errant closing bracket.

#include <Array.au3>

; Rebuild the sample HTML from the first post as one string.
; Single quotes inside a single-quoted AutoIt string are written doubled ('').
$string = '[blah blah]<!--Price and info-->' & _
'<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">' & _
'<tr>' & _
'<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp(''postal_weight_info.asp'')" style="cursor:pointer"></td>' & _
'<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & _
'<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>' & _
'<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>' & _
'[blah blah]'

;~ MsgBox(0, '', $string)

; Flag 3 returns an array of all matches; the capture group includes the trailing "g".
$reg = StringRegExp($string, '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">(.*?g)</font></td>', 3)

_ArrayDisplay($reg)
Edited by boththose


Posted (edited)

First, my string is multiline (newlines exist). I don't know if that makes a difference, but that's why I used the (?s:...) group.

What "doesn't work" is I'm not getting a match. Your simplified example doesn't cover the same scope.

In English, what I want is this:

After the first instance of

<!--Price and info-->
find the first instance of
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
and return the following characters until you run into a 'g'.

Why doesn't this work?
StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

It does if I change it to:

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:14px">)(.*?)(?:g)', 1 )
(i.e. remove the colons from the equation), but then it may not return the result I actually want.

To me, that's saying:

find the first bit we're looking for, followed by any number of characters including newlines up until the next bit we're looking for, then return the next characters until we find a 'g'

Edited by LondonNDIB
Posted

When I use a string like you did, it does work.

So is there a problem with using the return from _IEBodyReadHTML, then? It works in every case I've tried except when there's a colon in the test. Bizarre.

Posted

Please forgive me... I erroneously assumed that the _IEBodyReadHTML function returned the same string that one sees via right-click/View Source. It doesn't. There are differences in case, whitespace, and even quotes.

So this wasn't a problem with RegEx at all. Sorry for wasting your time.
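
For anyone landing here later, a minimal sketch of a pattern that tolerates those differences. The sample markup in $sHTML is hypothetical (an assumption about what normalized output might look like, not the actual return from _IEBodyReadHTML for this page), and the variable names are made up for illustration. The idea is to match case-insensitively, make attribute quotes optional, and skip the style attribute entirely instead of matching it character for character.

```autoit
; Hypothetical sample: upper-case tags, unquoted attribute values,
; and a reordered style attribute. Your real string may differ.
Local $sHTML = '<!--Price and info-->' & @CRLF & _
        '<TABLE width=565><TBODY><TR>' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana">29g</FONT></TD>'

; (?is)     case-insensitive, and "." also matches newlines
; "?54"?    the quotes around the width value are optional
; [^>]*     skip whatever other attributes the tag carries
; (\d+)\s*g capture the digits that come right before the "g"
Local $sPattern = '(?is)<!--Price and info-->.*?<td\s+width="?54"?[^>]*>\s*<font[^>]*>\s*(\d+)\s*g'

; Flag 1 returns an array of captures and sets @error when nothing matches.
Local $aMatch = StringRegExp($sHTML, $sPattern, 1)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    ConsoleWrite('Weight: ' & $aMatch[0] & @CRLF)
EndIf
```

It is worth writing the string you actually get back to the console (or a file) first and building the pattern against that, rather than against the page source shown in the browser.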
