LondonNDIB Posted November 2, 2012

I have a multi-line string ($oHTML) that includes the following:

[blah blah]
<!--Price and info-->
<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">
<tr>
<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp('postal_weight_info.asp')" style="cursor:pointer"></td>
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>
<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>
<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>
[blah blah]

I want to isolate the "29" in there. I tried this:

$_usweight = StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

But, as evidenced by my being here, it doesn't work, and I don't understand why. I've narrowed it down to this part:

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">

because it works if I strip that down to just "14px". I figured eventually that it had to do with the colons, but escaping them didn't solve anything.

Thanks for your help.

Edited November 2, 2012 by LondonNDIB
iamtheky Posted November 2, 2012

Provided that the section has the same tags each time:

stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">](.*?g)</font></td>' , 3)
iamtheky Posted November 2, 2012

It's copied straight out of the first post, so that would be a question for you.

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">]29g</font></td>

Edited November 2, 2012 by boththose
LondonNDIB (Author) Posted November 2, 2012

Lol... right. It was left over from formatting glitches on the post. Sorry.
LondonNDIB (Author) Posted November 2, 2012

Alright, that didn't work... so is there something up with the colons or not?
LondonNDIB (Author) Posted November 2, 2012

It's behaving like the colon is a special character, but escaping it doesn't help. I don't get it.
iamtheky Posted November 2, 2012

Be more specific: what doesn't work? Because this returns 29g.

Edit: without the errant closing bracket.

#include <Array.au3>

; Sample HTML from the first post, joined into a single string.
; Inside a single-quoted AutoIt string, '' is a literal single quote.
$string = '[blah blah]<!--Price and info-->' & _
        '<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">' & _
        '<tr>' & _
        '<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp(''postal_weight_info.asp'')" style="cursor:pointer"></td>' & _
        '<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & _
        '<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>' & _
        '<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>' & _
        '[blah blah]'

;~ msgbox (0, '' , $string)

; Flag 3 returns an array of all matches of the capture group
$reg = stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">(.*?g)</font></td>' , 3)
_ArrayDisplay($reg)

Edited November 2, 2012 by boththose
LondonNDIB (Author) Posted November 2, 2012

First, my string is multiline (newlines exist). I don't know if that makes a difference, but that's why I used the ?s.

What "doesn't work" is that I'm not getting a match. Your simplified example doesn't cover the same scope. In English, what I want is this:

after the first instance of
<!--Price and info-->
find the first instance of
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
and return the following characters until you run into a 'g'.

Why doesn't this work?

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

It does if I change it to:

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:14px">)(.*?)(?:g)', 1 )

(i.e. remove the colons from the equation), but then it may not return the result I actually want.

To me, that's saying: find the first bit we're looking for, followed by any number of characters including newlines up until the next bit we're looking for, then return the next characters until we find a 'g'.

Edited November 2, 2012 by LondonNDIB
LondonNDIB (Author) Posted November 2, 2012

When I use a string like you did, it doesn't work. So is there a problem with using the return from _IEBodyReadHTML then? It works in every case I've tried except when there's a colon in the test. Bizarre.
LondonNDIB (Author) Posted November 2, 2012

Please forgive me... I erroneously assumed that the _IEBodyReadHTML function returned the same string that one sees with right-click/view source. It's not. There are differences in case, whitespace, and even quotes.

So this wasn't a problem with RegEx at all. Sorry for wasting your time.
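
For anyone landing here with the same symptom, here is a minimal sketch of a pattern that tolerates the kinds of differences described above (case, whitespace, quotes). The $sHTML value is a made-up stand-in, not actual _IEBodyReadHTML output, and the variable names are hypothetical; dump your own string with ConsoleWrite first to see what you are really matching against. The pattern anchors on the cell width and the font tag rather than on the exact style text, which is an assumption about what stays stable on the page.

```autoit
; Hypothetical stand-in for a browser-normalized version of the page source:
; upper-case tags, unquoted attributes, reordered and re-spaced style text.
Local $sHTML = '<!--Price and info-->' & @CRLF & _
        '<TABLE width=565 border=0>' & @CRLF & _
        '<TBODY><TR>' & @CRLF & _
        '<TD width=70><IMG src="images/new04/eng/weight.jpg"></TD>' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif">29g</FONT></TD>' & @CRLF & _
        '</TR></TBODY></TABLE>'

; (?i)  ignore case, (?s) let . match newlines
; ["\x27]?  optional double or single quote around the attribute value
; [^>]*     skip whatever other attributes or style text the tag carries
; (\d+)     capture only the digits that precede the trailing "g"
Local $sPattern = '(?is)<!--Price and info-->.*?' & _
        '<td\s+width=["\x27]?54["\x27]?[^>]*>\s*<font[^>]*>\s*(\d+)\s*g'

; Flag 1 returns an array of captured groups; @error is non-zero if nothing matched
Local $aMatch = StringRegExp($sHTML, $sPattern, 1)
If @error Then
    ConsoleWrite('No match; inspect the raw string: ' & @CRLF & $sHTML & @CRLF)
Else
    ConsoleWrite('Weight: ' & $aMatch[0] & @CRLF)
EndIf
```

The trade-off is that a looser pattern can match more than intended, so keep whichever anchors (the comment marker, the width value) are actually stable in your real input.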