martijn

StringRegExp issue


I use the following script to parse an HTML file:

$arr = StringRegExp($text,"(<td.*?>|<td>)",3)

and that works. But if I use

$arr = StringRegExp($text,"(<td.*?>)",3)

it does not work: the plain <td> tags are left out of the result.

Can anyone explain?

#2 ·  Posted (edited)

#Include <array.au3>
$arr = _SRE_BetweenEX($text, '<td', '>')
_ArrayDisplay($arr, 'Array')

Func _SRE_BetweenEX($s_String, $s_Start, $s_End, $iCase = 'i')
    If $iCase <> 'i' Then $iCase = ''
    $a_Array = StringRegExp ($s_String, '(?' & $iCase & _
            ':' & $s_Start & ')(.*?)(?' & $iCase & _
            ':' & $s_End & ')', 3)
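    ; Return the matches only if StringRegExp succeeded ('&' would concatenate, not test both conditions)
    If Not @error And IsArray($a_Array) Then Return $a_Array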
    Return SetError(1, 0, 0)
EndFunc   ;==>_SRE_BetweenEX

Edit:

Forgot Code Tags

Edited by SmOke_N



#3 ·  Posted (edited)

Thank you for your reply, but my question was about the use of StringRegExp. I need more complex regular expressions, but to start with I used a simple script. This 'simple script' already gives me headaches.

Can someone please explain why it does not match a simple <td> in my second code sample? I thought that a ? would limit the 'greediness' and return the smallest match possible. In combination with a * (zero or more matches), I expect it to return items with zero extra characters as well, so both <td class="..."> and <td>.

Many thanks in advance :)
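For reference, here is a minimal sketch of the behaviour expected above. It assumes a StringRegExp build whose lazy quantifiers follow PCRE semantics, as current AutoIt releases do. The sample string and the variable names are made up for illustration.

```autoit
#include <Array.au3>

; Hypothetical sample input: one <td> with an attribute and one bare <td>.
Local $sText = '<td class="a"><td><table>'

; .*? is lazy and may match zero characters, so a bare <td> should qualify too.
Local $aTags = StringRegExp($sText, '(<td.*?>)', 3)

If @error Then
    ConsoleWrite('No match or bad pattern, @error = ' & @error & @CRLF)
Else
    ; Under PCRE semantics both tags are expected in the array.
    _ArrayDisplay($aTags, 'Matches for (<td.*?>)')
EndIf
```

If the array shows only the tag with an attribute, the build in use is not treating .*? as "zero or more", which is the behaviour this thread is about.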

Edit: I've tried your script, but that doesn't work either. Here's the code I used:

#Include <array.au3>
$text = "<td remark=this one shows up><td><td remark=and this one too><td><table><td question=but the empty td's don't show>"
$arr = _SRE_BetweenEX($text, '<td', '>')
_ArrayDisplay($arr, 'Array')

Func _SRE_BetweenEX($s_String, $s_Start, $s_End, $iCase = 'i')
    If $iCase <> 'i' Then $iCase = ''
    $a_Array = StringRegExp ($s_String, '(?' & $iCase & _
            ':' & $s_Start & ')(.*?)(?' & $iCase & _
            ':' & $s_End & ')', 3)
    If Not @error And IsArray($a_Array) Then Return $a_Array
    Return SetError(1, 0, 0)
EndFunc   ;==>_SRE_BetweenEX

Edited by martijn


It should still give the dollar sign. Yes, the .*? should match 0 characters, but that part of the StringRegExp function is not working properly. I am in the midst of tracking down the problem. If all else fails, I may just rewrite the repeater/predictor code.

This seems to be a bug in the StringRegExp function :)
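Until that is resolved, one common alternative to a lazy dot is a negated character class, which does not rely on .*? matching zero characters. The sketch below reuses the sample string from the earlier post. It has not been tested against the affected build, so it may not avoid the problem there.

```autoit
#include <Array.au3>

; Sample string taken from the earlier post in this thread.
Local $sText = "<td remark=this one shows up><td><td remark=and this one too><td><table><td question=but the empty td's don't show>"

; [^>]* consumes zero or more characters that are not '>', so the match
; stops at the first closing bracket without needing a lazy quantifier.
Local $aTags = StringRegExp($sText, '(<td[^>]*>)', 3)

If @error Then
    ConsoleWrite('No match or bad pattern, @error = ' & @error & @CRLF)
Else
    _ArrayDisplay($aTags, 'Matches for (<td[^>]*>)')
EndIf
```

Note that this pattern also accepts any tag whose name merely starts with "td". Something like <td\b[^>]*> would tighten it, assuming \b is supported by the build in use.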
