Jump to content
Sign in to follow this  
jchd

RegExps and offest

Recommended Posts

jchd

In StringRegExp the fourth parameter is the offset to consider in the test string. This works well as far as you don't need to anchor the pattern to the offset part:

ConsoleWrite(StringRegExp("abcdef", "c", 0, 3) & @LF)
ConsoleWrite(StringRegExp("abcdef", "^c", 0, 3) & @LF)
ConsoleWrite(StringRegExp("abcdef", "\Ac", 0, 3) & @LF)

I would expect ^ or \A to refer to the offset, not the first character of test string.

Of course it's possible to work around but I believe the current behavior essentially defeats the purpose of offset.

What do others think?


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Fulano

It's not as clear, but if you absolutely have to anchor it to the offset you might try this:

StringRegExp (StringTrimLeft ($string, $offset), $pattern, $flag)

#fgpkerw4kcmnq2mns1ax7ilndopen (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);[code] tag ninja!

Share this post


Link to post
Share on other sites
Melba23

jchd,

Reading the Help file:

offset [optional] The string position to start the match

I have always assumed, as you obviously did, that it would work as if the early part of the string did not exist. And when I have used the offset parameter to find multiple locations of a value within the original string, it works as advertised - like this:

#include <Array.au3>

Global $aArray[3]

$sString = "Bla bla bla"

$iOffset = 1

For $i = 0 To 2
    $aTemp = StringRegExp($sString, "a", 1, $iOffset)
    $iOffset = @extended
    $aArray[$i] = $iOffset - 1
Next

_ArrayDisplay($aArray)

So I can only imagine that positional "metadata" like \A and ^ override the offset. Whether this is due to the original PCRE engine or AutoIt's implementation I leave to the Devs - but my money would be on the former. :mellow:

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites
jchd

Hi Melba,

I don't see how PCRE by itself would impose an offset behavior since there is no such thing in the PCRE interface, AFAIK. I believe it's due to AutoIt implementation.

@Fulano: yes, obviously, or rather prefix the pattern with (?i)\A.{<offset - 1>} to avoid replicating a possibly long string each time, but again that essentially violates the "natural" use of offset.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Fulano

::shrugs:: the 'natural' use of broken code is to work around it :mellow:

I'm actually not sure if the overhead involved in storing the ignored stuff in (?i)\A.{<offset - 1>} would be worse than the function overhead involved in cutting the string apart...

Edited by Fulano

#fgpkerw4kcmnq2mns1ax7ilndopen (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);[code] tag ninja!

Share this post


Link to post
Share on other sites
jchd

The pattern does store nothing, just skip characters.

Of course I'm running pretty loops around it but I see that as an implementation overlook.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Fulano

Good point :mellow:


#fgpkerw4kcmnq2mns1ax7ilndopen (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);[code] tag ninja!

Share this post


Link to post
Share on other sites
jchd

OK I admit I'm a dumbass. <-- I don't know what that word exactly means, but it sounds pejorative enough to me.

PCRE indeed specifies an offset possibility, like the 4th parameter in StringRegExp.

In its great wisdom, PCRE also specifies that \G can be used to anchor the match to the tail of what was previously matched. In our my case, \G anchors the rest of the pattern right after offset part of the string.

For instance:

ConsoleWrite(StringRegExp("abcdef", "(?m)\Gc", 0, 3) & @LF)

gives True (1).

The PCRE docs makes it clear, in fact I just had to read it!

The \G assertion is true only when the current matching position is at the start point of the match, as specified by the startoffset argument of pcre_exec(). It differs from \A when the value of startoffset is non-zero. By calling pcre_exec() multiple times with appropriate arguments, you can mimic Perl's /g option, and it is in this kind of implementation where \G can be useful.

Note, however, that PCRE's interpretation of \G, as the start of the current match, is subtly different from Perl's, which defines it as the end of the previous match. In Perl, these can be different when the previously matched string was empty. Because PCRE does just one match at a time, it cannot reproduce this behaviour.

If all the alternatives of a pattern begin with \G, the expression is anchored to the starting match position, and the "anchored" flag is set in the compiled regular expression.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Fulano

Dumbass = Dumb Ass = Stupid Donkey <- all pejoratives sound silly when you break them down far enough. :mellow:


#fgpkerw4kcmnq2mns1ax7ilndopen (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);[code] tag ninja!

Share this post


Link to post
Share on other sites
Melba23

jchd,

Are you going to put in a feature request to add "\G" to the StringRegExp documentation? :mellow:

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites
jchd

I don't know: sounds like a not-so-frequent(*) anchor style. There are so many PCRE features which need more or less extended discussion. It would be nonsensical to duplicate too much of the PCRE docs, so leaving out such thing doesn't choke. Honest, I'd rather revive ticket #844 which asks for ^ and $ to be listed in the help table of most common pattern tokens. Jean-Paul rejected this request under doubtful reason, in my view without thinking twice.

It's a shame that I didn't remember \G, as I translated a fair part of authoritative book by Jeffrey Freidl "Mastering Regular Expression" in French and I (now) remember how loudly he insisted on \G being unavoidable in some situations. It was about 14 years ago, but a little flag ought to have raised in my head. Well, I'm gradually getting liquid, you know but don't be jealous: it'll be your turn anytime soon :mellow:

I'm compiling a bunch of errors in the documentation. Some day I'll put that all together and submit the whole lot.

EDIT: (*) if it was of daily use, then I'm sure several good practitionners would have called me dumbass before I could do it myself.

BTW, I didn't recall that ass also is donkey and it's rather funny since we sell horse and ... donkey tack.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites
Melba23

jchd,

Bien compris.

M23

P.S. I rather think the "ass" referred to in "dumbass" is not of the equine variety, but more related to the "body-chair interface" part of human anatomy - although as a speaker of the Queen's English my knowledge of colonial slang is less than perfect! :mellow:


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites
Fulano

P.S. I rather think the "ass" referred to in "dumbass" is not of the equine variety, but more related to the "body-chair interface" part of human anatomy - although as a speaker of the Queen's English my knowledge of colonial slang is less than perfect! :(

I think the definition related to donkeys is older, not that it really matters. This is English we are talking about, after all :mellow:.


#fgpkerw4kcmnq2mns1ax7ilndopen (Q, $0); while ($l = <Q>){if ($l =~ m/^#.*/){$l =~ tr/a-z1-9#/Huh, Junketeer's Alternate Pro Ace /; print $l;}}close (Q);[code] tag ninja!

Share this post


Link to post
Share on other sites
Melba23

Fulano,

This is English we are talking about, after all

Not so sure, which is why I made the disclaimer about speakers of the Queen's English versus those who use "Murican"! :mellow:

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

×

Important Information

We have placed cookies on your device to help make this website better. You can adjust your cookie settings, otherwise we'll assume you're okay to continue.