Regular expression - capture a character when "escaped", split otherwise

therks
Posted (edited)

I'm looking for a regex genius, because I'm stumped when it comes to assertions.

What I have now is this regular expression: ([^|=]+)=([^|]+)
It takes a string (user input) of key=value pairs separated by pipes (e.g. "param=value|param=value") and splits them into an array.

Example:

$vParamData = 'example=value|fruit=apple|phrase=Hello world'
$aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3)

; Result
;   [0] => example
;   [1] => value
;   [2] => fruit
;   [3] => apple
;   [4] => phrase
;   [5] => Hello world

So that's working fine, but I'm wondering if there's also a way to have it capture escaped pipes instead of splitting on them.

For example:

$vParamData = 'pipe test=this \| is a pipe|example=value'
$aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3)

; I'm getting this:
;   [0] => pipe test
;   [1] => this \
;   [2] => example
;   [3] => value

; But I'd like a result like this:
;   [0] => pipe test
;   [1] => this \| is a pipe
;   [2] => example
;   [3] => value

Is there some pattern that would accomplish this, or am I better off parsing it some other way?

Edited by therks

iamtheky

There are more efficient ways to do the individual pieces, but here is a replacement method that simply masks the escaped pipes, splits, and then slaps it all back together:

#include <Array.au3>

$vParamData = 'pipe test=this \| is a pipe|example=value'
; Mask escaped pipes as "\*", split on the remaining pipes, rejoin the pairs with "=",
; restore the escaped pipes, then split on "=" (flag 2: no count in element 0)
$aData = stringsplit(stringreplace(_ArrayToString(stringsplit(stringreplace($vParamData , "\|" , "\*") , "|" , 2) , "=") , "\*" , "\|") , "=" , 2)
_ArrayDisplay($aData)

 


,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)

BugFix
; Mask the pipe with a character combination, e.g. "@#"

#include <Array.au3>

$vParamData = 'pipe test=this @# is a pipe|example=value'
$aRegEx = StringRegExp($vParamData, '([^|=]+)=([^|]+)', 3)
For $i = 0 To UBound($aRegEx) - 1
    $aRegEx[$i] = StringReplace($aRegEx[$i], '@#', '|') ; now replace the mask with the pipe
Next

_ArrayDisplay($aRegEx)

 


Best Regards BugFix  

iamtheky
Posted (edited)

Might be able to pull off reusing the capture....

$vParamData = 'pipe test=this \| is a pipe|example=value'
msgbox(0, '' , stringreplace(stringregexpreplace($vParamData , "(\\\|)|(=)|(\|)" , "$1" & @LF  ) , "\|" & @LF , "\|"))

 

And without the additional StringReplace, though it becomes much more fragile (it only works for this specific case):

$vParamData = 'pipe test=this \| is a pipe|example=value'

msgbox(0, '' , stringregexpreplace($vParamData , "(\\\|.*)\||=|(=)|(\|)" , "$1" & @LF ))

 

Edited by iamtheky

jchd

Another skin:

#include <Array.au3>
$vParamData = 'pipe test=this \| is a pipe|example=value|another\|example=five|last \|one = one\|two\|three'
$aRegEx = StringRegExp($vParamData, '(.+?)=((?:\\\||[^\\|])+)(?:\||$)', 3)
_ArrayDisplay($aRegEx)
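
For anyone wondering how the single-call pattern above works, here is a commented sketch of the same expression. The variable names ($sData, $aPairs) and the final unescaping loop are my own additions for illustration and are not part of the original post; the loop assumes you want a plain "|" in the final values.

```autoit
#include <Array.au3>

; Pattern pieces (same expression as in the post above):
;   (.+?)=               key: the shortest run of characters up to an "="
;   ((?:\\\||[^\\|])+)   value: repeated "\|" pairs, or any character that is
;                        neither a backslash nor a pipe
;   (?:\||$)             terminator: an unescaped pipe or the end of the string
Local $sData = 'pipe test=this \| is a pipe|example=value'
Local $aPairs = StringRegExp($sData, '(.+?)=((?:\\\||[^\\|])+)(?:\||$)', 3)

; Illustrative post-processing (an assumption, not in the original):
; turn each escaped "\|" back into a plain "|".
For $i = 0 To UBound($aPairs) - 1
    $aPairs[$i] = StringReplace($aPairs[$i], '\|', '|')
Next

_ArrayDisplay($aPairs)
```

One thing to keep in mind: because the value part excludes every backslash that is not immediately followed by a pipe, a pair whose value contains a lone backslash (a Windows path, say) will not be matched by this pattern as written.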

 


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

mikell

:)

#Include <Array.au3>
$vParamData = 'pipe test=this \| is a pipe|example=value|another\|example=five|last \|one = one\|two\|three'
$aRegEx = StringSplit(StringRegExpReplace($vParamData, '(?<!\\)\|', '='), "=", 3)
_ArrayDisplay($aRegEx)
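
The lookbehind (?<!\\) in the snippet above matches only pipes that are not preceded by a backslash. Since that snippet then splits everything on "=", a value that itself contains "=" would produce extra elements. Here is a minimal sketch of a variant that splits on unescaped pipes first and then on the first "=" of each pair. The Chr(1) placeholder and the two-column result array are assumptions made for illustration; the placeholder must never occur in the real input.

```autoit
#include <Array.au3>

Local $sData = 'url=a=b|pipe test=this \| is a pipe'

; Replace only the unescaped pipes with a placeholder character, then split on it.
; Flag 2 makes StringSplit return the pieces without a count in element 0.
Local $sSep = Chr(1)
Local $aPairs = StringSplit(StringRegExpReplace($sData, '(?<!\\)\|', $sSep), $sSep, 2)

; Split each pair on its FIRST "=" so later "=" characters stay in the value.
Local $aResult[UBound($aPairs)][2]
Local $iPos
For $i = 0 To UBound($aPairs) - 1
    $iPos = StringInStr($aPairs[$i], '=')
    $aResult[$i][0] = StringLeft($aPairs[$i], $iPos - 1) ; key
    $aResult[$i][1] = StringMid($aPairs[$i], $iPos + 1)  ; value
Next

_ArrayDisplay($aResult)
```

With the sample string, the first row should hold "url" and "a=b", and the second "pipe test" and "this \| is a pipe". Keys that contain an escaped "=" are not handled by this sketch.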

 

jchd

:evil:

Real programmers use a single function call, quiche eaters use two.
© Ed Post. https://www.ee.ryerson.ca/~elf/hack/realmen.html

mikell

Real Programmers aren't afraid to use GOTOs. :P
So I assume those 1982 scientists would, in AutoIt, actually use function calls instead of GOTOs.

BTW I'm a fake programmer indeed. Does \Q..\E mean literal quiche eater ? :huh2:

jchd

Computed indirect GOTOs, preferably.

