Kiai

Regular expression, exclude word if near other word

28 posts in this topic

I want to make a regular expression that only matches on a word if it is not preceded by another word.

For example, I want to find the word 'up' in "I am feeling up" but not in the phrase "shut the *@## up".

It seems like this should work:

(?<!shut)(.*)(up)

or this

(?!shut)(.*)(up)

but both still match 'up'.
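
A small check of why both attempts still succeed (an illustrative sketch; the test line is made up for this example). A lookaround only tests the single position where it stands, and the (.*) that follows it is free to swallow 'shut':

```autoit
; Test line that should NOT count as a hit for 'up'.
Local $sLine = "shut the heck up"

; The lookbehind is checked at offset 0, where nothing precedes it, so it passes;
; (.*) then consumes "shut the heck " and (up) matches the last word.
ConsoleWrite(StringRegExp($sLine, "(?<!shut)(.*)(up)") & @LF)

; The lookahead fails at offset 0, so the engine simply retries at offset 1
; ("hut the heck up"), where it passes and the rest matches as before.
ConsoleWrite(StringRegExp($sLine, "(?!shut)(.*)(up)") & @LF)
```

With the default flag, StringRegExp returns 1 for a match, so both lines should print 1 even though 'up' is preceded by 'shut'.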

I managed to make this work fine for the positive matching -- to find 'up' only when preceded by 'shut' -- just need to be able to exclude 'up's preceded by shut.

\bshut\W+(?:\w+\W+){0,6}?up\b

Thanks from a regular expression novice.

#2 ·  Posted (edited)

So you want to match "this" in "I like this" but not match "this" in "This is nice" or "Don't |>o this"?

What about capturing?

Edit: corrected a wrong parse.

Edited by Authenticity

Not sure I understand but do you mean, do I want to capture the output? In this case I just want a positive or negative result -- want to find the word 'up' any time it is not preceded on the same line by the word 'shut'. So "I'm feeling up" matches, "Shut the hell up" wouldn't match.

Dim $sStr1 = "I'm feeling up", $sStr2 = "Shut up", $sStr3 = "shut your mouth, I'm feeling up"
Dim $sStr4 = "Shut the hell up" & @CRLF &  "Please shut up" & @CRLF & "Oh my god, shut up" & @CRLF & _
    "I was feeling upset, but now I'm feeling good" & @CRLF & "UP shut UP shot"

Dim $sPattern = "\G(?i)(?>[^su]*(?:\bshut\b(?!)|(?>\bup\b.*)))+"
Dim $aLines

For $i = 1 To 4
    $aLines =  StringSplit(Eval('sStr' & $i), @CRLF, 1)
    If @error Then
        ConsoleWrite($aLines[$aLines[0]] & @TAB)
        If StringRegExp($aLines[$aLines[0]], $sPattern) Then
            ConsoleWrite('| Match' & @LF)
        Else
            ConsoleWrite('| Mismatch' & @LF)
        EndIf
    Else
        For $j = 1 To $aLines[0]
            ConsoleWrite($aLines[$j] & @TAB)
            If StringRegExp($aLines[$j], $sPattern) Then
                ConsoleWrite('| Match' & @LF)
            Else
                ConsoleWrite('| Mismatch' & @LF)
            EndIf
        Next
    EndIf
Next

Let me know if you can find situations in which the pattern gives the wrong result, or if you find something more efficient.

This works great so long as no word containing the letter s or u comes before "up". So it fails on "I am feeling super up." It seems like there should be a way to do this...

#6 ·  Posted (edited)

Dim $sStr1 = "I'm feeling up", $sStr2 = "Shut up", $sStr3 = "shut your mouth, I'm feeling up"
Dim $sStr4 = 'I am feeling super up.'
Dim $sStr5 = "Shut the hell up" & @CRLF &  "Please shut up" & @CRLF & "Oh my god, shut up" & @CRLF & _
    "I was feeling upset, but now I'm feeling good" & @CRLF & "UP shut UP shot"

Dim $sPattern = "\G(?i)(?>[^su]*(?>(?>\bshut\b.*+)(?!)|(?>\bup\b.*+))+)"
Dim $aLines

For $i = 1 To 5
    $aLines =  StringSplit(Eval('sStr' & $i), @CRLF, 1)
    If @error Then
        ConsoleWrite($aLines[$aLines[0]] & @TAB)
        If StringRegExp($aLines[$aLines[0]], $sPattern) Then
            ConsoleWrite('| Match' & @LF)
        Else
            ConsoleWrite('| Mismatch' & @LF)
        EndIf
    Else
        For $j = 1 To $aLines[0]
            ConsoleWrite($aLines[$j] & @TAB)
            If StringRegExp($aLines[$j], $sPattern) Then
                ConsoleWrite('| Match' & @LF)
            Else
                ConsoleWrite('| Mismatch' & @LF)
            EndIf
        Next
    EndIf
Next

Edit: ... which now makes me wonder whether "UP shut UP shot" should be matched, because it has an "UP" preceded by "shut".

Edited by Authenticity

I have a bit of a headache trying to figure out why this solution works most of the time (thanks for your help). This phrase fails: "I can't seem to fill up" ... not sure why.

I did learn a bit more about look behind -- I think what I was trying before was failing because I didn't recognize that lookbehind has to have a set length.

From http://www.autoitscript.com/autoit3/pcrepattern.html

(?<!foo)bar

does find an occurrence of "bar" that is not preceded by "foo". The contents of a lookbehind assertion are restricted such that all the strings it matches must have a fixed length.

I am essentially trying to find an occurrence of 'bar' that is not preceded by 'foo' anywhere earlier, with an arbitrary number of characters in between. I thought this would work:

(?<!foo).*bar --- matches both 'foobar' and 'need to go to the bar'

or

(?<!foo.*)bar --- causes an error
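
One way around the fixed-length restriction, offered here as a sketch rather than something taken from the PCRE page quoted above, is to move the negative test inside the repetition: anchor at the start of the line and let every consumed character first confirm that the word 'shut' does not begin at that spot. The pattern and the test lines below are assumptions chosen for illustration.

```autoit
; Sketch: 'up' as a whole word with no whole word 'shut' earlier on the line.
; (?:(?!\bshut\b).)* advances one character at a time and refuses to step
; onto the start of the word "shut"; ^ keeps the engine from restarting later.
Local $sPattern = "(?i)^(?:(?!\bshut\b).)*\bup\b"

Local $aTests[6] = ["I'm feeling up", "Shut the hell up", "I am feeling super up.", _
        "I can't seem to fill up", "I was feeling upset", "up shut up"]

For $i = 0 To UBound($aTests) - 1
    If StringRegExp($aTests[$i], $sPattern) Then
        ConsoleWrite($aTests[$i] & " | MATCH" & @LF)
    Else
        ConsoleWrite($aTests[$i] & " | NOT a match" & @LF)
    EndIf
Next
```

Under this reading of the requirement, "up shut up" counts as a match because its first 'up' has no 'shut' in front of it. The sketch handles one line at a time; multi-line input would need to be split first, as the scripts in this thread do.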

Heh, as Jon said, anyone who understands the heck of it is clinically insane. The only thing I can think of right now that is supposed to work 100% is splitting the line into words, checking whether "up" and/or "shut" are in the array, and comparing their subscripts (is the subscript of "shut" greater than that of "up", or is only one of them present?), maybe using two flags and a simple logic loop.
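
A minimal sketch of that word-by-word idea, assuming the rule is "the first 'up' must come before any 'shut'". The helper name _UpBeforeShut is hypothetical and not part of any AutoIt library:

```autoit
; Hypothetical helper: True when the word "up" occurs before any word "shut"
; on the line, False otherwise (including when "up" is absent).
Func _UpBeforeShut($sLine)
    ; Flag 3 returns every match of \w+ (i.e. every word) as a 0-based array.
    Local $aWords = StringRegExp($sLine, "\w+", 3)
    If @error Then Return False ; no words at all

    For $i = 0 To UBound($aWords) - 1
        ; The = operator compares strings case-insensitively in AutoIt.
        If $aWords[$i] = "shut" Then Return False ; taboo word reached first
        If $aWords[$i] = "up" Then Return True    ; "up" reached first
    Next
    Return False ; "up" never appeared
EndFunc

ConsoleWrite(_UpBeforeShut("I'm feeling super up") & @LF)
ConsoleWrite(_UpBeforeShut("Shut the hell up") & @LF)
```

Because the loop stops at whichever of the two words shows up first, no flags or index comparison are needed for this particular rule; a stricter rule (for example, rejecting any later "shut ... up" pair) would need the fuller bookkeeping described above.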

This appears to give a match when the two key words "shut" and "up" are separated by three or more words.

;
; Only matches on a word if it is not preceded by another word. 
Dim $sStr[6] = ["I'm feeling super up", "Shut the door", "Shut up", _
        " shut word up ", "Shut the hell up. ", "Shut word word word up."]

For $i = 0 To 5
    $Res = Not StringRegExp($sStr[$i], "(?i)(shut)(\W\w*){0,3}(up)", 0)
    If $Res Then
        MsgBox(0, "", $sStr[$i] & " | " & "MATCH")
    Else
        MsgBox(0, "", $sStr[$i] & " | " & "NOT a match")
    EndIf
Next
;

try also this:

#include <Array.au3>
$string = 'I''m feeling up up shut up'&@CRLF&'up feel'&@CRLF&'hi up'&@CRLF&@LF&'shuttt'&@LF
$a = StringRegExp($string, '(?:^)?.*(\bshut\b)[^\n\r]*|(?(1)|(?:^)?.*\bup\b[^\n\r]*)', 3)

_ArrayDisplay($a)

Heh, very nice trick Malkey. :)

Just another patch to make it independent of word limits and to match things like "Shut the soup" as well.

;
; Only matches on a word if it is not preceded by another word.
Dim $sStr[6] = ["I'm feeling super up", "Shut the door", "Shut up", _
        "shut            word word up word word ", "shut the soup.", "Shut word word word up."]

For $i = 0 To 5
    $Res = Not StringRegExp($sStr[$i], "(?i)\bshut\b(\W\w*)*?\bup\b", 0)
    If $Res Then
        ConsoleWrite($sStr[$i] & " | " & "MATCH" & @LF)
    Else
        ConsoleWrite($sStr[$i] & " | " & "NOT a match" & @LF)
    EndIf
Next

Good pick up on "Shut the soup", Authenticity.

The word count was there to allow a match on "Shut the refrigerator door before I throw up."

Here's a modification.

;
; Only matches on a word if it is not preceded by another word.
Dim $sStr[7] = ["I'm feeling super up", "Shut the soup", "Shut up", _
        " shut word up ", "Shut the hell up. ", " Shut word word word up.", _
        "Shut the refrigerator door before I throw up."]

For $i = 0 To 6
    $Res = Not StringRegExp($sStr[$i], "(?i)(shut)(\W\w*){0,2}( up)", 0)
    If $Res Then
        ConsoleWrite($sStr[$i] & " | " & "MATCH" & @CRLF)
    Else
        ConsoleWrite($sStr[$i] & " | " & "NOT a match" & @CRLF)
    EndIf
Next
;

Clinically insane you reckon.

You just helped me find a cleaner way of doing what I thought I knew how to do -- how to find 2 words near one another. But I was actually trying to find 'up' only if it isn't preceded by 'shut'. (I don't want to match on 'shut up' or 'shut the hell up'; I want to match on 'feeling up'.)

Hi,

If you don't have the guts/time to read the whole body of this post, then you should definitely read the very bottom of it: there's something for you (and for others as well).

I found your problem an interesting one and devoted some time to working on it. It seems it hasn't received a correct answer so far.

Of course it can be objected that the solution below is possibly suboptimal from a strictly computational point of view, but I nonetheless believe it answers the problem with only one regex and a little glue code around it, which was implicitly the challenge offered. I also generally warn against routine use of "clever" regexes (or any tricky code) without heavy commenting, for ease of maintenance. People maintaining "clever" Perl code know what I'm talking about :)

#include <String.au3> ; needed for _StringReverse()

Local $wordleft = "shut"
Local $wordright = "up"

Local $sStr[18] = ["I'm feeling super up", "Shut the soup", "Shut up", _
        " shut word up ", "Shut the hell up. ", " Shut word word word up.", _
        "Shut the refrigerator door before I throw up.", "He is upstairs.", _
        "She was shut upstairs.", "He is working upstairs.", "upon", "up", _
        "uP", "Up", " up shut UP", "shutup", "shutdown up", "up shut" ]
        
Local $leftw = _StringReverse($wordleft & ' '), $rightw = _StringReverse($wordright), $magic = "@"
Local $badstr = $rightw & $leftw

For $i = 0 To UBound($sStr) - 1
    If StringRegExpReplace(_StringReverse($sStr[$i]) & $magic & $leftw, "(?i)(?:.*?)(?:\b"&$badstr&"\b|(?:\b"&$rightw&"\b)(?:.*?)(.)(?:"&$leftw&"\b))(?:.*)", "$1") = $magic Then
        ConsoleWrite($sStr[$i] & " <<== " & "MATCH " & @LF)
    Else
        ConsoleWrite($sStr[$i] & " <--- " & "NOT a match " & @LF)
    EndIf
Next

Let's look at the basic idea. Your problem boils down to matching this:

INITIAL  <unknown1> <optional taboo> <unknown2> <mandatory word> <unknown3>

Regexes aren't very good at matching <something optional> surrounded by <something unknown> and then doing some further matching in the same run. PCRE (the implementation inside AutoIt) is no exception. The problem here is that you have to match the <optional LEFT taboo word> if it is present at all. From there on, the rest of the expression is straightforward. I read that you tried anchoring on the mandatory word and then looking back for the optional word with a lookbehind. Unfortunately this doesn't work with strings of non-fixed length. So we have to approach the problem from another angle (if we still insist on using only one regex).

The trick is to first transform the problem so that we look for the mandatory part before looking for the optional one. This can be done simply by reversing all strings. This way, if the mandatory match fails, it fails first and we're done. But if the mandatory match succeeds, we can examine the rest of the input string with common regex operators, looking for the optional "taboo" word. From now on, all strings mentioned are reversed character by character.

VERSION 1   <unknown3> <mandatory word> <unknown2> <optional taboo> <unknown1>

Here I've chosen to run StringRegExpReplace on the whole string and leave only a result that can be compared to a fixed string. To do so, I found it easiest to append the "taboo" word (reversed) so we are certain to always find it. Since it must be a word, we put an extra space in the right place.

VERSION 2   <unknown3> <mandatory word> <unknown2> <optional taboo> <unknown1> <space> <dummy taboo>

We now have to devise a way to discriminate between an occurrence of "taboo" in the actual input string and the occurrence we appended to it. This can be done by inserting a single character ($magic) that we know is never found in the input string. Here I've chosen @, but you're free to pick anything else (even most control characters) as long as this character never appears in any practical input string and doesn't interfere with the matching process.

VERSION 3   <unknown3>  <mandatory word>   <unknown2> <optional taboo> <unknown1> <magic> <space> <dummy taboo>
                  (?:.*?)  (?:\b"&$rightw&"\b) (?:.*?) ^---------------------------------^--(.)(?:"&$leftw&"\b))(?:.*)

Now we can try to match each group of this puzzle and capture only the significant part; StringRegExpReplace will help us there.

It turns out that a single group (containing a single character) tells us whether the input string has the form we want, so let's make it the only capturing group (.) while all others are non-capturing (?: ).

Now either there is no match in the initial string for the "taboo" word, and the (.) matches and captures "magic", or there is a previous significant occurrence of the "taboo" word, and then the (.) captures the last character of <unknown2>. This is represented by the caret positions in the previous figure. The final (?:.*) group eats up what's left of the string so that this part doesn't show in the result.

There is a subtlety with this approach that must be guarded against. Any input string where <unknown2> = <word separator>, i.e. containing the sequence:

<optional taboo> <space> <mandatory word>,

would give a match, which is a blatant violation of the rules of this game. That's because there's no room for any character to be captured in place of "magic": we have already gone too far.

A simple outer alternation takes care of this for us. Remove the outer "badstr" alternative if your application can never produce that particular kind of string.

I hope all this makes some sense.
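
To make the VERSION 3 layout concrete, here is a small sketch that only builds the string the regex gets to see for one sample input. The helper _Rev is a hypothetical stand-in for _StringReverse, written out so the snippet does not depend on a particular AutoIt release:

```autoit
; Hypothetical stand-in for _StringReverse(): reverse a string character by character.
Func _Rev($s)
    Local $r = ""
    For $i = StringLen($s) To 1 Step -1
        $r &= StringMid($s, $i, 1)
    Next
    Return $r
EndFunc

Local $sMagic = "@"
; Reversed input, then the magic character, then the reversed dummy taboo word "shut ".
Local $sSubject = _Rev("Shut the hell up") & $sMagic & _Rev("shut ")
ConsoleWrite($sSubject & @LF) ; pu lleh eht tuhS@ tuhs
```

Reading the output left to right gives the mandatory word ("pu"), then <unknown2> (" lleh eht "), the real taboo word ("tuhS"), the magic character, the space and the dummy taboo word.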

-------------------------

I would like to somehow hijack the end of this post to give two extremely useful references to anyone having to deal with regexes:

Mastering Regular Expressions

Jeffrey E.F. Friedl

O'Reilly & Associates

ISBN 1-1565922573

This book is the regex bible and a pleasant read, but you need some motivation to dig into it seriously.

Last, maybe the most useful Windows tool ever for PCRE regexes for anyone, novice or experienced: Regex Coach by Dr. Edmund Weitz.

It can be freely downloaded here and at a large number of other places. Just make sure you get the latest version.

This free tool is a real-time CL-PPCRE test box, but it is also extremely useful for debugging your regex, possibly down to step-by-step operation! Anyone using regexes will learn a lot just by watching simple or more complex regexes while they work. It also shows why some seemingly simple regexes are slow as hell under NFA engines like PCRE, and it pushes the user to discover simple optimizations. As a side note, code geeks will certainly appreciate that it was written without changing a single line of code of the CL-PPCRE library, which I find completely amazing!

Given the number of recurrent questions about regexes in this forum, I really hope the reference to Regex Coach can be made sticky some day. Don't ask: I have no affiliation with its author.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

#15 ·  Posted (edited)

lol, thanks Kiai, your exercise helped me understand it even more:

Dim $aStr[11] = _
    ['shut the hell up', 'up hell the shut up', 'shutup', 'please, shut the window', _
    'shut the soup', "I'm feeling up", "I'm feeling up so shut up", "I'm feeling shut so up shut", _
    'shut word word shut up', 'shuton on upload', 'shut the upload']
    
Dim $sPatt = '^(?i)(?>.*?\b(?:(up)|(shut)\b))(?(2).*?\bup\b|(*FAIL))'

For $i = 0 To 10
    If Not StringRegExp($aStr[$i], $sPatt) Then
        ConsoleWrite('+' & $aStr[$i] & @LF)
    Else
        ConsoleWrite('!' & $aStr[$i] & @LF)
    EndIf
Next

This assumes that when "up" comes before any "shut", the line is valid even if it goes on to contain "shut up". If not, then anchoring is not the way to go and something else is needed.

Edited by Authenticity

This assumes that when "up" comes before any "shut", the line is valid even if it goes on to contain "shut up". If not, then anchoring is not the way to go and something else is needed.

Why not, but then you're answering a different question asked by yourself, at least as I understand it.

want to find the word 'up' any time it is not preceded on the same line by the word 'shut'.

[Double emphasis mine]

The mere fact that the OP wanted to find what I would call a conditional look behind made the original problem much more challenging.


#17 ·  Posted (edited)

Yeah, when you rephrase it this way then it's sufficient to seek everything in reverse order. Then you'll just need to search for anchoring "\bpu\b" and lock it and then see if you can match "\btuhs\b". Hope you understand what I mean.

#include <String.au3>

Dim $aStr[11] = _
    ['shut the hell up', 'up hell the shut up', 'shutup', 'please, shut the window', _
    'shut the soup', "I'm feeling up", "I'm feeling up so shut up", "I'm feeling shut so up shut", _
    'shut word word shut up', 'shuton on upload', 'shut the upload']
   
Dim $sPatt = '^(?i)(?>.*?\b(?:(up)|(shut))\b)(?(2)(?>.*?\bup\b)(*FAIL)|(*ACCEPT))'

For $i = 0 To 10
    If StringRegExp($aStr[$i], $sPatt) Then
        ConsoleWrite('+' & $aStr[$i] & @LF)
    Else
        ConsoleWrite('!' & $aStr[$i] & @LF)
    EndIf
Next

Without _StringReverse(). Tell me if I understand correctly.
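
For comparison, the reversed-order idea from the top of this post can be sketched quite compactly. This is an illustration under assumptions, not the code either poster used: once the line is reversed, "up not preceded by shut" turns into "pu not followed by tuhs", and a lookahead (unlike a lookbehind) may span a variable number of characters. The helper _Rev is hypothetical and written out to keep the snippet self-contained.

```autoit
; Hypothetical helper: reverse a string character by character.
Func _Rev($s)
    Local $r = ""
    For $i = StringLen($s) To 1 Step -1
        $r &= StringMid($s, $i, 1)
    Next
    Return $r
EndFunc

; Applied to the REVERSED line: the word "pu" with no word "tuhs" anywhere after it.
Local $sPattern = "(?i)\bpu\b(?!.*\btuhs\b)"

Local $aTests[4] = ["I'm feeling up", "shut the hell up", _
        "up hell the shut up", "shut the upload"]

For $i = 0 To UBound($aTests) - 1
    If StringRegExp(_Rev($aTests[$i]), $sPattern) Then
        ConsoleWrite('+' & $aTests[$i] & @LF)
    Else
        ConsoleWrite('!' & $aTests[$i] & @LF)
    EndIf
Next
```

Under this rule a line is accepted as soon as at least one "up" has no "shut" before it, so "up hell the shut up" is accepted because of its first "up". Whether that is the wanted behaviour depends on how the original question is read.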

Edited by Authenticity

Yeah, when you rephrase it this way then it's sufficient to seek everything in reverse order. Then you'll just need to search for anchoring "\bpu\b" and lock it and then see if you can match "\btuhs\b". Hope you understand what I mean.

Yep, I think that's correct.

Without _StringReverse(). Tell me if I understand correctly.

I see. I didn't succeed in getting numbered or named conditionals to work, but that must be because I was using an old version the last time I tried (quite some time ago).

I have never used any backtracking verbs, i.e. any (*xxx). They are new experimental perlisms I've not really looked at.

I still have to install yesterday's AutoIt beta with its PCRE 7.9 support (and modify a large number of InetGet* calls in my production code, which should keep me busy for "some" time).

As it is, your "forward" code fails to report "NO MATCH" for the instances marked with an arrow below:

!shut the hell up
+up hell the shut up          <--
!shutup
!please, shut the window
!shut the soup
+I'm feeling up
+I'm feeling up so shut up    <--
!I'm feeling shut so up shut
!shut word word shut up
!shuton on upload
!shut the upload

It can most probably be made to work this way (without any string reversal) using the most advanced functions now available. I didn't look at it closely, but it seems like the first match of the alternative (up) causes the conditional to *ACCEPT blindly, so that any string like "<unknown1><up><unknown2>" is a match without the pattern ever looking at what's inside <unknown2>, even if it contains the taboo sequence "<shut><whatever><up>". It could be solved by recursing the process inside <unknown2> or by other simpler means, but the new functions are a bit ... new (what else?) to me.


#19 ·  Posted (edited)

Those appear to be matches, if I understood what the OP is asking for. It seems like "up shut up" should be matched. If the OP could post a few lines or sentences that the regexp should and shouldn't match, that would be much easier than guessing what it should or should not match.

Edit:

It can most probably be made to work this way (without any string reversal) using the most advanced functions now available. I didn't look at it closely, but it seems like the first match of the alternative (up) causes the conditional to *ACCEPT blindly, so that any string like "<unknown1><up><unknown2>" is a match without the pattern ever looking at what's inside <unknown2>, even if it contains the taboo sequence "<shut><whatever><up>". It could be solved by recursing the process inside <unknown2> or by other simpler means, but the new functions are a bit ... new (what else?) to me.

You should look again to see why it's captured and then used in a conditional structure. If you can match "up" first, then it doesn't matter what comes after it, so it's a match. If it matches "shut" first, then it can just fail, because it's a failure whether or not it's followed by an "up".

You could change the second part of the regexp pattern to (?(2)(*FAIL)|(*ACCEPT)).

Edited by Authenticity

#include <String.au3>
#include<array.au3>

Dim $aStr[11] = _
    ['shut the hell up', 'up hell the shut up', 'shutup', 'please, shut the window', _
    'shut the soup', "I'm feeling up", "I'm feeling up so shut up", "I'm feeling shut so up shut", _
    'shut word word shut up', 'shuton on upload', 'shut the upload']

$str=_ArrayToString($aStr," ")

$ss=StringSplit($str," ")
for $c=1 to $ss[0]
    if $ss[$c]="up" and $ss[$c-1]="shut" then MsgBox(0,"","Match "&$ss[$c]&" "&$ss[$c-1],0)

Next
