LondonNDIB

What's wrong with my regex pattern?

LondonNDIB

I have a multi-line string ($oHTML) that includes the following:

[blah blah]<!--Price and info-->
<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">

<tr>
<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp('postal_weight_info.asp')" style="cursor:pointer"></td>
<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>
<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>
<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>
[blah blah]

I want to isolate the "29" in there. I tried this:

$_usweight = StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

But, as evidenced by my being here... it doesn't work. I don't understand why it doesn't work. I've narrowed it down to this part:

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">
because it works if I strip that down to just "14px". I eventually figured that it had to do with the colons, but escaping them didn't solve anything. Thanks for your help.

iamtheky

Provided that the section has the same tags each time, this should do it:

stringregexp ($string , '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">](.*?g)</font></td>' , 3)

,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

LondonNDIB

What's the square bracket for?

iamtheky

It's copied straight out of the first post, so that would be a question for you.

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">]29g</font></td>

LondonNDIB

Lol... right. It was left over from formatting glitches on the post. Sorry.

LondonNDIB

Alright, that didn't work... so is there something up with the colons or not?

LondonNDIB

It's behaving like the colon is a special character, but escaping it doesn't help. I don't get it.

iamtheky

Be more specific: what doesn't work? Because this returns 29g.

Edit: without the errant closing bracket.

#include <Array.au3>

; Rebuild the HTML from the first post as one string (no line breaks in this test).
; Inside a single-quoted string, '' stands for one literal single quote.
$string = '[blah blah]<!--Price and info-->' & _
'<table width="565" border="0" cellspacing="0" cellpadding="0" background="images/new04/eng/add_cart_bgd01.jpg">' & _
'<tr>' & _
'<td width="70"><img src="images/new04/eng/weight.jpg" onClick="javascript:popUp(''postal_weight_info.asp'')" style="cursor:pointer"></td>' & _
'<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & _
'<td width="78"><img src="images/new04/eng/stock04.jpg" ></td>' & _
'<td width="47"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">10+</font></td>' & _
'[blah blah]'

;~ msgbox (0, '' , $string)

; Flag 3 returns an array of all matches; the capture group keeps the trailing "g"
$reg = StringRegExp($string, '<font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">(.*?g)</font></td>', 3)

_ArrayDisplay($reg)
LondonNDIB

First, my string is multiline (newlines exist)... I don't know if that makes a difference but that's why I used the ?s

What "doesn't work" is I'm not getting a match. Your simplified example doesn't cover the same scope.

In English, what I want is this:

After the first instance of

<!--Price and info-->

find the first instance of

<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">

and return the following characters until you run into a 'g'.

Why doesn't this work?
StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)', 1 )

It does if I change it to:

StringRegExp( $oHTML, '(?i)(?:<!--Price and info-->)(?s:.*?)(?:14px">)(.*?)(?:g)', 1 )
(i.e. remove the colons from the equation), but then it may not return the result I actually want.

To me, that's saying:

find the first bit we're looking for, followed by any number of characters including newlines up until the next bit we're looking for, then return the next characters until we find a 'g'
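
For anyone wanting to check that reading of the pattern in isolation, here is a minimal sketch. It assumes a hand-built test string (the markup from the first post, shortened and joined with @CRLF) rather than real page output, and the variable names are made up for illustration. The pattern is the one from this post, unchanged; the colons in it are matched literally and need no escaping.

```autoit
; Hypothetical test string: markup from the first post, shortened, with real line breaks
Local $sHTML = '[blah blah]<!--Price and info-->' & @CRLF & _
        '<table width="565" border="0">' & @CRLF & _
        '<tr>' & @CRLF & _
        '<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">29g</font></td>' & @CRLF & _
        '[blah blah]'

; Same pattern as in the post: (?s:.*?) lets the dot cross line breaks,
; and (.*?) captures lazily up to the first "g"
Local $sPattern = '(?i)(?:<!--Price and info-->)(?s:.*?)(?:<td width="54"><font style="font-family:Verdana, Arial, Helvetica, sans-serif; font-size:14px">)(.*?)(?:g)'

Local $aMatch = StringRegExp($sHTML, $sPattern, 1)
Local $iErr = @error ; 1 = no match, 2 = bad pattern

If $iErr Then
    ConsoleWrite('No match, @error = ' & $iErr & @CRLF)
Else
    ConsoleWrite('Captured: ' & $aMatch[0] & @CRLF)
EndIf
```

If this matches on the literal string but the same pattern fails on $oHTML, the difference is more likely in the input than in the regex.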

LondonNDIB

When I use a string like you did, it does work.

So is there a problem with using the return from _IEBodyReadHTML then? It works in every case I've tried except when there's a colon in the test. Bizarre.
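
One way to answer that kind of question is to look at the text the regex is actually given. The sketch below is only an illustration: `_ShowFromMarker` is a hypothetical helper (not part of any UDF), and `$sSample` is a stand-in string; in the script from this thread the first argument would be $oHTML.

```autoit
; Hypothetical helper: return up to $iCount characters starting at the first
; occurrence of $sMarker, or set @error to 1 if the marker is not found
Func _ShowFromMarker($sText, $sMarker, $iCount = 300)
    Local $iPos = StringInStr($sText, $sMarker) ; case-insensitive by default
    If $iPos = 0 Then Return SetError(1, 0, "")
    Return StringMid($sText, $iPos, $iCount)
EndFunc

; Stand-in for the real page text (made up for this example)
Local $sSample = '<P>intro</P><!--Price and info-->' & @CRLF & '<TD width=54>29g</TD>'

; Print what follows the marker so it can be compared character by character with the pattern
ConsoleWrite(_ShowFromMarker($sSample, '<!--Price and info-->') & @CRLF)
```

Comparing that output with the literal text in the pattern shows quickly whether case, spacing or quoting differs.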

LondonNDIB

Please forgive me... I erroneously assumed that the _IEBodyReadHTML function returned the same string that one sees via right-click/view source. It doesn't. There are differences in case, whitespace, and even quotes.

So this wasn't a problem with RegEx at all. Sorry for wasting your time.
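
Given those differences, a pattern that does not depend on exact case, spacing or quoting may hold up better. The following is a rough sketch, not a tested fix for the original page: `$sNormalized` is an invented example of what rewritten markup might look like, and the pattern assumes the weight is always digits followed by "g" inside the first 54-wide cell after the comment.

```autoit
; Invented sample: upper-case tags, an unquoted attribute, reordered style text
Local $sNormalized = '<!--Price and info-->' & @CRLF & _
        '<TD width=54><FONT style="FONT-SIZE: 14px; FONT-FAMILY: Verdana, Arial, Helvetica, sans-serif">29g</FONT></TD>'

; (?is)      case-insensitive, dot also matches line breaks
; "?54"?     the quotes around 54 are optional
; [^>]*      skip whatever other attributes the tag carries
; (\d+)\s*g  capture the digits that come right before "g"
Local $sPattern = '(?is)<!--Price and info-->.*?<td\s+width="?54"?[^>]*>\s*<font[^>]*>\s*(\d+)\s*g'

Local $aMatch = StringRegExp($sNormalized, $sPattern, 1)
If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    ConsoleWrite('Weight: ' & $aMatch[0] & @CRLF)
EndIf
```

Skipping the style attribute with `[^>]*` instead of spelling it out avoids relying on the exact font list and its colons.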
