StringRegexp headscratching


benners

I have written a small function to check a string for the existence of a word or words. The function has a flag, $i_AllWords, that can be set to 1 to check that the string contains all the words. When I run it with this flag set to 1 and $i_Return set to 0, I get the return I am expecting: that there has been a match. When I change $i_Return to return an array of the matches, an empty array is returned. No @error is set within the function. I must be missing or misunderstanding something, but I can't figure out what.

Can anyone enlighten me?

#include <array.au3>

$str = 'http://www.google.com'
$s_Search = '.com|WWW|google'
$v_Ret = _String_SearchForWords($str, $s_Search, 1)

If IsArray($v_Ret) Then
    _ArrayDisplay($v_Ret)
Else
    MsgBox(0, 'Positive Match', $v_Ret)
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _String_SearchForWords
; Description ...: Check a string for the existence of a word or words
; Syntax ........: _String_SearchForWords($s_Test, $s_Search[, $i_AllWords = 0[, $i_Return = 0]])
; Parameters ....: $s_Test     - A string value. The string to check for the words
;                  $s_Search   - A string value. The regular expression to match.
;                  $i_AllWords - [optional] An integer value. Default is 0.
;                                0 - any words can be found
;                                1 - All the words must be found
;                  $i_Return     - [optional] An integer value. Default is 0.
;                                0 - Returns 1 (match) or 0 (no match)
;                                1 - Return array of matches.
;                                2 - Return array of matches including the full match
;                                3 - Return array of global matches
;                                4 - Return an array of arrays containing global matches including the full match
; Return values .: Returns either an array or 1/0 (match/no match), depending on the $i_Return value
; Author ........: Benners
; Modified ......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: $v_Ret = _String_SearchForWords($str, $s_Search, 1)
; ===============================================================================================================================
Func _String_SearchForWords($s_Test, $s_Search, $i_AllWords = 0, $i_Return = 0)
    ; search for any of the words and any matches with case sense
    Local $s_Pattern = '(' & $s_Search & ')'

    ; change the pattern to search for all the words and give an exact match without case sense
    If $i_AllWords Then $s_Pattern = '(?i)^(?=.*\b' & StringReplace($s_Search, '|', '\b)(?=.*\b') & '\b)'

    Local $v_Ret = StringRegExp($s_Test, $s_Pattern, $i_Return)
    if @error then ConsoleWrite('Error: ' & @error & @crlf)
    Return $v_Ret
EndFunc   ;==>_String_SearchForWords

 


You forgot the capturing group. Try this regex:

If $i_AllWords Then $s_Pattern = "(?i)^" & StringRegExpReplace($s_Search, "([^|]+)(?:\||$)", "(?=.*\\b(\\Q$1\\E)\\b)")

Edit: I added \Q and \E to allow the use of regex-reserved characters (such as . \ [ ] ( ) and so on).
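
To illustrate why the capturing groups matter, here is a minimal sketch, assuming the test string from the first post and two hard-coded words in place of the generated pattern. A lookahead consumes no characters, so without groups the overall match is zero-length and the returned array should hold only an empty string. Wrapping each word in a group inside its lookahead gives StringRegExp something to return. The variable names $a_NoGroups and $a_Groups are made up for this example.

```autoit
#include <Array.au3>

Local $s_Test = 'http://www.google.com'

; Lookaheads only, no capturing groups: the overall match is zero-length,
; so there is no text to hand back in the array.
Local $a_NoGroups = StringRegExp($s_Test, '(?i)^(?=.*\bwww\b)(?=.*\bgoogle\b)', 1)

; Same lookaheads, but each word sits inside a capturing group,
; so each group's text can be returned as an array element.
Local $a_Groups = StringRegExp($s_Test, '(?i)^(?=.*\b(www)\b)(?=.*\b(google)\b)', 1)

; Compare the two results side by side.
_ArrayDisplay($a_NoGroups, 'Without capturing groups')
_ArrayDisplay($a_Groups, 'With capturing groups')
```

With $i_Return = 0 both patterns only report whether a match exists, which would explain why the original function looked correct until an array was requested.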

 

Edited by jguinch

Thanks jguinch, it took me ages to work out that little bit of regexp. No matter how often I try, I never fully understand the syntax. I did originally have the \Q and \E options in (I used _StringBetween, which has them, as a starting point), but I removed them because for my purpose they didn't make any difference.

Now that it's working, I have finished it off with a case option.

#include <array.au3>

$s_Test = 'WWW.google.com'
$s_Search = '.com|google|www'
$v_Ret = _String_SearchForWords($s_Test, $s_Search, 0, True, 3)

If @error Then
    MsgBox(0, 'Error Occurred', 'error : ' & @error & @CRLF & 'extended: ' & @extended & @CRLF & 'Return : ' & $v_Ret)
Else
    If IsArray($v_Ret) Then
        _ArrayDisplay($v_Ret)
    Else
        MsgBox(0, 'Positive Match', $v_Ret)
    EndIf
EndIf

; #FUNCTION# ====================================================================================================================
; Name ..........: _String_SearchForWords
; Description ...: Check a string for the existence of a word or words
; Syntax ........: _String_SearchForWords($s_Test, $s_Search[, $i_AllWords = 0[, $b_Case = False[, $i_Return = 0]]])
; Parameters ....: $s_Test     - A string value. The string to check for the words
;                  $s_Search   - A string value. The regular expression to match.
;                  $i_AllWords - [optional] An integer value. Default is 0.
;                                  0 - any words can be found
;                                  1 - All the words must be found
;                  $b_Case     - [optional] A boolean value. Default is False.
;                                  False - search case insensitive
;                                  True  - search matching case sense
;                  $i_Return   - [optional] An integer value. Default is 0.
;                                  0 - Returns 1 (match) or 0 (no match)
;                                  1 - Return array of matches.
;                                  2 - Return array of matches including the full match
;                                  3 - Return array of global matches
;                                  4 - Return an array of arrays containing global matches including the full match
; Return values .: Returns either an array or 1/0 (match/no match), depending on the $i_Return value
; Author ........: Benners
; Modified ......:
; Remarks .......:
; Related .......:
; Link ..........:
; Example .......: $v_Ret = _String_SearchForWords($s_Test, $s_Search, 1)
; ===============================================================================================================================
Func _String_SearchForWords($s_Test, $s_Search, $i_AllWords = 0, $b_Case = False, $i_Return = 0)
    ; decide if case sensitive searching
    Local $s_Case = $b_Case ? "(?s)" : "(?is)"

    ; search for any of the words and any matches
    Local $s_Pattern = '(' & $s_Search & ')'

    ; change the pattern to search for all the words and give an exact match
    If $i_AllWords Then $s_Pattern = "^" & StringRegExpReplace($s_Search, "([^|]+)(?:\||$)", "(?=.*\\b(\\Q$1\\E)\\b)")

    Local $v_Ret = StringRegExp($s_Test, $s_Case & $s_Pattern, $i_Return)

    If @error Then
        ; save the macros straight away, any later function call resets them
        Local $i_Error = @error, $i_Extended = @extended
        Local $s_Return = ''

        Switch $i_Error
            Case 1 ; only set for the array return types (1 to 4)
                $s_Return = 'Array is invalid. No matches.'
            Case 2
                $s_Return = 'Bad pattern'
                If $i_Return Then $s_Return &= ', array is invalid'
                $s_Return &= '. @extended = offset of error in pattern.'
        EndSwitch

        Return SetError($i_Error, $i_Extended, $s_Return)
    EndIf

    Return $v_Ret
EndFunc   ;==>_String_SearchForWords

 

Edited by benners
As mikell asked: is it better to keep the \Q and \E options, or are they indeed redundant?