Tutorial: Simple regular expression multiple result handling


How often have you had some input like this:

<option value=1>Apple</option>
<option value=2>Pear</option>
<option value=3>Banana</option>
<option value=4>Orange</option>

You write a regexp to get the values and fruit names like so:

$ar = StringRegExp($in, "<option value=(.*?)>(.*?)</option>", 3)

And end up with this:

(Image: a one-dimensional array containing 1, Apple, 2, Pear, 3, Banana, 4, Orange)

While you really just wanted:

(Image: a two-dimensional array with one row per option, holding the value and the fruit name)

Now you either have to adjust your code so it works with the one-dimensional array, or you have to loop through all the elements and build a new array. But that is just everyone reinventing the same wheel all over again.
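For comparison, the first option (working directly with the flat array) might look like the minimal sketch below. The input is shortened to two options for illustration, and the IsArray guard is an addition that is not part of the original example.

```autoit
; Sketch: walk the flat flag-3 result two elements at a time.
Local $in = "<option value=1>Apple</option>" & _
        "<option value=2>Pear</option>"
Local $ar = StringRegExp($in, "<option value=(.*?)>(.*?)</option>", 3)

If IsArray($ar) Then
    ; Elements alternate: value, name, value, name, ...
    For $i = 0 To UBound($ar) - 2 Step 2
        ConsoleWrite($ar[$i] & " = " & $ar[$i + 1] & @CRLF)
    Next
EndIf
```

This works, but every consumer of the array has to remember the stride of 2, which is the bookkeeping the helper function is meant to hide.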

Enter _ArrayCombineElements:

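Func _ArrayCombineElements($arr, $num)
    ; Reshape a flat array into rows of $num columns each.
    Local $newArr[Ceiling(UBound($arr) / $num)][$num]
    Local $m = 0 ; current column
    Local $n = 0 ; current row
    For $i = 0 To UBound($arr) - 1
        $newArr[$n][$m] = $arr[$i]
        $m += 1
        If $m >= $num Then ; row is full, move to the next one
            $m = 0
            $n += 1
        EndIf
    Next
    Return $newArr
EndFunc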

You have never seen a function dirtier, but you like getting dirty sometimes.

Example:

#include <Array.au3>

$in = "<option value=1>Apple</option>" & _
"<option value=2>Pear</option>" & _
"<option value=3>Banana</option>" & _
"<option value=4>Orange</option>"

$ar = StringRegExp($in, "<option value=(.*?)>(.*?)</option>", 3)

$br = _ArrayCombineElements($ar, 2)

_ArrayDisplay($br)

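Func _ArrayCombineElements($arr, $num)
    ; Reshape a flat array into rows of $num columns each.
    Local $newArr[Ceiling(UBound($arr) / $num)][$num]
    Local $m = 0 ; current column
    Local $n = 0 ; current row
    For $i = 0 To UBound($arr) - 1
        $newArr[$n][$m] = $arr[$i]
        $m += 1
        If $m >= $num Then ; row is full, move to the next one
            $m = 0
            $n += 1
        EndIf
    Next
    Return $newArr
EndFunc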

Script output: What you expected in the first place.
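One thing the example does not cover is input with no matches at all. In that case StringRegExp sets @error instead of returning an array, so it may be worth checking before reshaping. A minimal sketch, assuming a made-up input string and a simple message on failure:

```autoit
; Sketch: check @error before handing the result to a reshaping helper.
; The input string here is hypothetical and contains no <option> tags.
Local $ar = StringRegExp("no options here", "<option value=(.*?)>(.*?)</option>", 3)
Local $iErr = @error ; save it right away, later calls reset @error

If $iErr Then
    ConsoleWrite("No usable result, @error = " & $iErr & @CRLF)
Else
    ConsoleWrite("Got " & UBound($ar) & " elements" & @CRLF)
EndIf
```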

Edited by Manadar

I would have used option 4 like this:

#include<Array.au3>

$in = "<option value=1>Apple</option>" & _
        "<option value=2>Pear</option>" & _
        "<option value=3>Banana</option>" & _
        "<option value=4>Orange</option>"

$ar = StringRegExp($in, "<option value=(.*?)>(.*?)</option>", 4)

Local $aRet[UBound($ar)][UBound($ar[0]) - 1]
Local $aTemp
For $i = 0 To UBound($ar) - 1
    $aTemp = $ar[$i]

    For $n = 0 To UBound($ar[$i]) - 2
        $aRet[$i][$n] = $aTemp[$n + 1]
    Next
Next
$aTemp = 0

_ArrayDisplay($aRet)

Mat


I prefer your method. No need to specify the number of groups beforehand.

#include<Array.au3>

$in = "<option value=1>Apple</option>" & _
        "<option value=2>Pear</option>" & _
        "<option value=3>Banana</option>" & _
        "<option value=4>Orange</option>"

$arr = _WinRegExp($in, "<option value=(.*?)>(.*?)</option>")
_ArrayDisplay($arr)

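Func _WinRegExp($test, $pattern)
    ; Flag 4 returns an array of arrays: one sub-array per match,
    ; holding the full match at [0] followed by the captured groups.
    Local $arr = StringRegExp($test, $pattern, 4)

    Local $newArr[UBound($arr)][UBound($arr[0])]
    Local $aTemp
    For $i = 0 To UBound($arr) - 1
        $aTemp = $arr[$i] ; copy the sub-array out so it can be indexed

        For $n = 0 To UBound($arr[$i]) - 1
            $newArr[$i][$n] = $aTemp[$n]
        Next
    Next
    $aTemp = 0
    Return $newArr
EndFunc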

Maybe I'll change the tutorial on the first page. This depends on why someone would prefer option 3 over 4.

Oh, and I also kept the full match in column 0 of the result... why not.

Edited by Manadar

I never understood why 4 returns a jagged array rather than a normal multidimensional array... But I think the sizes of the sub-arrays are not guaranteed to be the same as the first.

In your example, there will always be exactly 2 matched groups, even if they are blank. What happens if you edit the expression to allow for the possibility that "value" is not set? With a normal regex I would use:

"<option(?:\s+value=(.*?))?>(.*?)</option>"
Or:
"<option>(.*?)</option>|<option value=(.*?)>(.*?)</option>"

I imagine the sub-arrays would be of different sizes. Then there is a problem. See the edit below.

Unfortunately I can't test right now... I was trying the other day to see if I could get a version of Au3Int online, like Haskell and Ruby have, so you can play with AutoIt in the browser... but I didn't have much success. I'll have to try again, as I could really use it right now.

Mat

Edit: I was wrong in my assumptions... All the arrays appear to have the same length. Furthermore, they reserve a space for matches even when they cannot be matched. So the question is... Why a jagged array in the first place?
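For anyone who wants to check this on their own install, a small sketch along these lines prints the size of each sub-array. It uses the first of the two patterns above; the two-option input (one with "value", one without) is made up for the test.

```autoit
; Sketch: report how many elements each flag-4 sub-array holds
; when the "value" attribute is optional.
Local $in = "<option value=1>Apple</option>" & _
        "<option>Pear</option>"
Local $ar = StringRegExp($in, "<option(?:\s+value=(.*?))?>(.*?)</option>", 4)

If IsArray($ar) Then
    Local $aTemp
    For $i = 0 To UBound($ar) - 1
        $aTemp = $ar[$i] ; copy the sub-array out before measuring it
        ConsoleWrite("Match " & $i & ": " & UBound($aTemp) & " elements" & @CRLF)
    Next
EndIf
```

If the counts come out equal for both matches, a plain two-dimensional copy sized from the first sub-array is safe for this pattern.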

Edited by Mat