martijn

StringRegExp issue

martijn

I use the following script to parse an HTML file:

$arr = StringRegExp($text,"(<td.*?>|<td>)",3)

and that works. But if I use

$arr = StringRegExp($text,"(<td.*?>)",3)

it does not work. It leaves out the plain <td> tags.

Can anyone explain?

SmOke_N

#Include <array.au3>
$arr = _SRE_BetweenEX($text, '<td', '>')
_ArrayDisplay($arr, 'Array')

Func _SRE_BetweenEX($s_String, $s_Start, $s_End, $iCase = 'i')
    If $iCase <> 'i' Then $iCase = ''
    $a_Array = StringRegExp ($s_String, '(?' & $iCase & _
            ':' & $s_Start & ')(.*?)(?' & $iCase & _
            ':' & $s_End & ')', 3)
    ; & is string concatenation in AutoIt, so test the result directly
    If IsArray($a_Array) Then Return $a_Array
    Return SetError(1, 0, 0)
EndFunc   ;==>_SRE_BetweenEX

Edit:

Forgot Code Tags

martijn

Thank you for your reply, but my question was about the use of StringRegExp. I need more complex regular expressions, but to start with I used a simple script. This 'simple script' already gives me headaches.

Can someone please explain why it does not match a simple <td> in my second code sample? I thought that a ? would limit the 'greediness' and return the smallest match possible. In combination with a * (zero or more matches), I expect it to return items with zero matched characters as well, so both <td class="..."> and <td>.

Many thanks in advance :)
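
For reference, here is a minimal sketch of the behaviour the question expects. It assumes a StringRegExp backed by a PCRE-style engine, which may not describe the build discussed in this thread. The sample string and variable names are illustrative only.

```autoit
#include <Array.au3>

; Illustrative sample string, not taken from a real page
Local $sSample = '<td class="a"><td><td id="b">'

; Lazy: .*? takes as few characters as possible, including zero,
; so the bare <td> is expected to be matched on its own
Local $aLazy = StringRegExp($sSample, '<td.*?>', 3)

; Greedy: .* runs on to the last > it can reach, giving one long match
Local $aGreedy = StringRegExp($sSample, '<td.*>', 3)

_ArrayDisplay($aLazy, 'Lazy')
_ArrayDisplay($aGreedy, 'Greedy')
```

If the lazy array is missing the bare <td> entry, the engine is not treating .*? as "zero or more, as few as possible".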

Edit: I've tried your script, but that doesn't work either. Here's the code I used

#Include <array.au3>
$text = "<td remark=this one shows up><td><td remark=and this one too><td><table><td question=but the empty td's don't show>"
$arr = _SRE_BetweenEX($text, '<td', '>')
_ArrayDisplay($arr, 'Array')

Func _SRE_BetweenEX($s_String, $s_Start, $s_End, $iCase = 'i')
    If $iCase <> 'i' Then $iCase = ''
    $a_Array = StringRegExp ($s_String, '(?' & $iCase & _
            ':' & $s_Start & ')(.*?)(?' & $iCase & _
            ':' & $s_End & ')', 3)
    If IsArray($a_Array) Then Return $a_Array
    Return SetError(1, 0, 0)
EndFunc  ;==>_SRE_BetweenEX
martijn

It should still give the dollar sign. Yes, the .*? should give 0 characters, but that part of the StringRegExp function is not working properly. I am in the midst of tracking down the problem. If all else fails, I may just rewrite the repeater/predictor code.

This seems to be a bug in the StringRegExp function :)
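
Until the lazy quantifier is fixed, one possible workaround is a negated character class, which does not rely on .*? at all. This is a sketch only: it assumes the engine handles [^>]* correctly, and whether it sidesteps the problem in the build discussed here is untested. The variable names are made up for the example; the test string is the one posted above.

```autoit
#include <Array.au3>

Local $sText = "<td remark=this one shows up><td><td remark=and this one too><td><table><td question=but the empty td's don't show>"

; [^>]* matches zero or more characters that are not '>',
; so both <td ...> and the bare <td> should qualify
Local $aTags = StringRegExp($sText, '<td[^>]*>', 3)
Local $iErr = @error ; copy @error before any other function call resets it

If $iErr Then
    ConsoleWrite('No match or bad pattern, @error = ' & $iErr & @CRLF)
Else
    _ArrayDisplay($aTags, 'td tags')
EndIf
```

Because the class can never run past a closing >, the pattern stays inside one tag without needing a lazy quantifier.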
