genius257

[Solved] StringRegExp with offset and start-of-string anchor

15 posts in this topic

#1 ·  Posted (edited)

I'm having an issue with StringRegExp when combining the offset parameter with the start-of-string anchor (^): the match fails whenever the offset is greater than 1.

Is this a bug, or is it supposed to work like that?

See example below

StringRegExp("abc", "^[a-z]", 1, 1)
ConsoleWrite(@error&@CRLF);success
StringRegExp("abc", "^[a-z]", 1, 2)
ConsoleWrite(@error&@CRLF);failure

Thanks in advance

Edited by genius257




#2 ·  Posted

They should both error; the caret goes inside the brackets.

StringRegExp("abc", "[^a-z]", 1, 1)
ConsoleWrite(@error&@CRLF);failure
StringRegExp("abc", "[^a-z]", 1, 2)
ConsoleWrite(@error&@CRLF);failure

 

StringRegExp("abc", "abc", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "bc", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "c", 1, 3)
ConsoleWrite(@error&@CRLF)

;errors

StringRegExp("abc", "ab", 1, 3)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "a", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^abc]", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^bc]", 1, 2)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "[^c]", 1, 3)
ConsoleWrite(@error&@CRLF)

 


,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)


#3 ·  Posted

NO.

First RegEx is to get the "a", second RegEx is to get the "b"

From the documentation:

Quote
Outside a character class, the caret matches at the start of the subject text, and also just after a non-final newline sequence if option (?m) is active. By default the newline sequence is @CRLF.
Inside a character class, a leading ^ complements the class (excludes the characters listed there).
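As a quick illustration of the two caret meanings quoted above, here is a minimal sketch. It relies on the default flag (0), with which StringRegExp returns 1 for a match and 0 for no match; the test strings are made up for the example.

```autoit
; Outside a class, ^ anchors the match to the start of the subject.
ConsoleWrite(StringRegExp("abc", "^[a-z]") & @CRLF) ; 1: "a" is a lowercase letter at the start

; Inside a class, a leading ^ negates the class.
ConsoleWrite(StringRegExp("abc", "[^a-z]") & @CRLF) ; 0: every character is a lowercase letter
ConsoleWrite(StringRegExp("ab1", "[^a-z]") & @CRLF) ; 1: "1" is not in a-z
```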

 


#4 ·  Posted (edited)

Ahh, I reversed it. Context-free is tough, but that's a start point, so it gets 'abc' and then 'bc'.

Edit: tested real quick, and with dashes I'm getting the largest subset, not the smallest subset of the group. Running more tests.

Edited by iamtheky


#6 ·  Posted (edited)

Where'd you get the quote from? If you put that caret there, you only get the first character, and only if it's a letter, and only if it's lowercase. What is the desired end goal?

 

#include<array.au3>

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 1)
_ArrayDisplay($aMatch)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 2)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 3)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("a0bc", "^[a-z]", 3, 4)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("A0bc", "^[a-z]", 3, 1)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

$aMatch = StringRegExp("A0bc", "^[a-z]", 3, 4)
_ArrayDisplay($aMatch)
ConsoleWrite(@error&@CRLF)

 

Edited by iamtheky


#7 ·  Posted (edited)

From the StringRegExp documentation in the Anchors table in Remarks.

I'm iterating through a string, looking for exact matches:

Global $Types[][] = [ _
    ['^("[^"]*"|''''[^'''']*'''')',"String"], _
    ["^\$[_a-zA-Z0-9]+","Variable"] _
]

$sOutput = ""

$sInput = '$var = "this is a test"'
$iOffset = 1

#include <Array.au3>

While 1
    StringRegExp($sInput, "^\s*(\S)", 1, $iOffset)
    If @error<>0 Then ExitLoop
    $iOffset = @extended

    For $i=0 To UBound($Types, 1)-1
        $a = StringRegExp($sInput, $Types[$i][0], 1, $iOffset-1)
        If @error=0 Then
            $iOffset=@extended
            $sOutput&=$Types[$i][1]&";"
            ExitLoop
        EndIf
    Next
WEnd

I do know there are better ways of doing this; I'm just wondering whether it's supposed to fail when using "^" with an offset greater than 1.

Edited by genius257


#8 ·  Posted

5 minutes ago, genius257 said:

I'm just wondering if it's supposed to fail when using "^" and offset greater than 1

Obviously yes!
^ matches at the start of the subject text, while offset is the string position at which to start the match.
The first position (just after ^) is offset 1, so any other offset (> 1) won't match if the ^ anchor is used, unless you use a workaround  :)


#9 ·  Posted (edited)

Thanks @mikell.

It seems silly to me. As I see it, the offset would define where the string gets trimmed and then matched, but I guess not.

Guess I'll have to take the substring myself and just add @extended to the offset... >.>
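A minimal sketch of that substring approach, assuming (as the examples in this thread rely on) that @extended reports the 1-based position right after the match. Because the match runs against the substring, @extended is relative to it and has to be converted back to an absolute position. The variable names are only for illustration.

```autoit
Local $sInput = "abc"
Local $iOffset = 2 ; absolute, 1-based position to match at

; ^ now anchors at the start of the substring, i.e. at $iOffset in $sInput.
Local $aMatch = StringRegExp(StringMid($sInput, $iOffset), "^[a-z]", 1)
If @error = 0 Then
    ; @extended counts from the start of the substring, so subtract 1
    ; when adding it to the absolute offset.
    $iOffset += @extended - 1
    ConsoleWrite($aMatch[0] & " -> next offset " & $iOffset & @CRLF)
Else
    ConsoleWrite("no match at offset " & $iOffset & @CRLF)
EndIf
```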

Edited by genius257


#10 ·  Posted (edited)

14 minutes ago, genius257 said:

guess I'll have to sub string myself and just add the return to the offset...

This is the workaround indeed  :)
With offset you force the position where the match starts, so you'll run into trouble if you combine it with the ^ anchor in the pattern.

$offset = 1
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
$offset = @extended
ConsoleWrite($res[0]&@CRLF)
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
$offset = @extended
ConsoleWrite($res[0]&@CRLF)
$res = StringRegExp("a123b456c", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)

 

Edited by mikell


#11 ·  Posted (edited)

14 minutes ago, mikell said:

This is the workaround indeed  :)
With offset you force the position where the match starts, so you'll run into trouble if you combine it with the ^ anchor in the pattern.

$offset = 1
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)
$offset += StringLen($res[0])
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)
$offset += StringLen($res[0])
$res = StringRegExp("abc", "[a-z]", 1, $offset)
ConsoleWrite($res[0]&@CRLF)

 

kinda.

more like:

StringRegExp(StringMid($sInput, $offset), "^[a-z]", 1)

but it works now i guess..

Edited by genius257
forgot the anchor in the pattern


#12 ·  Posted (edited)

Sorry, I edited my previous example, not sure you saw it...  much better anyway...

Edit
.... because it's easy to use in a loop  :)

Edited by mikell


#13 ·  Posted

The main problem with your solution is that, without the anchor, the pattern will match anywhere in the string. That makes it useless if the purpose is to iterate through the string and process every character, or to do something else should the match(es) fail.

I appreciate all the help ;)

Anyway this is my result (I think my offset calculation will be wrong in some cases and should be adjusted at a later time, but it works for now :))

Global $Types[][] = [ _
    ['^("[^"]*"|''''[^'''']*'''')',"String"], _
    ["^\$[_a-zA-Z0-9]+","Variable"] _
]

$sOutput = ""

$sInput = FileRead(@ScriptFullPath)
$sInput = '$var="this is a test" &"test"'
$iOffset = 1

While 1
    StringRegExp(StringMid($sInput, $iOffset), "^\s*(\S)", 1)
    If @error<>0 Then ExitLoop
    $iOffset += @extended-1

    ConsoleWrite(StringMid($sInput, $iOffset-1)&@CRLF)

    $bMatch=False
    For $i=0 To UBound($Types, 1)-1
        $a = StringRegExp(StringMid($sInput, $iOffset-1), $Types[$i][0], 1)
        If @error=0 Then
            $iOffset+=@extended-2
            $sOutput&=$Types[$i][1]&";"
            $bMatch=True
            ExitLoop
        EndIf
    Next
    If Not $bMatch Then $sOutput&="Unknown"&";"
WEnd

MsgBox(0, "", $sOutput)

 


#14 ·  Posted

Maybe I misunderstood something, but if you use an offset in StringRegExp and want to match from the beginning of the current position, then you have to use \G instead of ^:

StringRegExp("abc", "\G[a-z]", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "\G[a-z]", 1, 2)
ConsoleWrite(@error&@CRLF)
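For the iteration use case described earlier in the thread, \G can be combined with @extended in a loop. The following is only a sketch, not the original poster's final code: the token table $aTypes is a simplified, hypothetical adaptation of the $Types table above, and it assumes @extended holds the 1-based position right after the match, as the earlier examples rely on.

```autoit
; Hypothetical token table: each pattern starts with \G so it can only
; match at the offset passed to StringRegExp.
Local $aTypes[2][2] = [ _
    ['\G"[^"]*"', "String"], _
    ['\G\$[_a-zA-Z0-9]+', "Variable"] _
]

Local $sInput = '$var = "this is a test"'
Local $sOutput = ""
Local $iOffset = 1
Local $bMatch

While $iOffset <= StringLen($sInput)
    ; Skip whitespace at the current position, if there is any.
    StringRegExp($sInput, '\G\s+', 1, $iOffset)
    If @error = 0 Then $iOffset = @extended
    If $iOffset > StringLen($sInput) Then ExitLoop

    $bMatch = False
    For $i = 0 To UBound($aTypes) - 1
        StringRegExp($sInput, $aTypes[$i][0], 1, $iOffset)
        If @error = 0 Then
            $iOffset = @extended ; continue right after this token
            $sOutput &= $aTypes[$i][1] & ";"
            $bMatch = True
            ExitLoop
        EndIf
    Next

    ; Nothing matched here: record it and step over one character.
    If Not $bMatch Then
        $sOutput &= "Unknown;"
        $iOffset += 1
    EndIf
WEnd

ConsoleWrite($sOutput & @CRLF)
```

Since the subject string is never cut, the offsets stay absolute and no substring arithmetic is needed.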

 


#15 ·  Posted

2 minutes ago, AspirinJunkie said:

Maybe I misunderstood something, but if you use an offset in StringRegExp and want to match from the beginning of the current position, then you have to use \G instead of ^:

StringRegExp("abc", "\G[a-z]", 1, 1)
ConsoleWrite(@error&@CRLF)
StringRegExp("abc", "\G[a-z]", 1, 2)
ConsoleWrite(@error&@CRLF)

 

Ah! you are right!

Thank you ^^' Totally missed that.
