qwert

Request for help formulating regular expression

9 posts in this topic

I’m a complete newbie when it comes to using StringRegExp. Although I’ve successfully modified a couple of patterns in existing scripts, I’ve never been sure where to start in formulating a new pattern. Thus, I’ve avoided using them. But now I need to.

 

The string I’d like to recognize and replace is of the form:

 

\pre\key\key<any characters>\post

 

To put it in words, I’m looking for any occurrence of two or more consecutive \key strings that are embraced by \pre and \post strings, regardless of any follow-on characters. The backslash characters are part of the individual elements, but could possibly occur on their own. (What makes this doubly confusing is that the backslash character is an element of the RegExp syntax.)

 

Examples of strings to be recognized and replaced:

\pre\key\key\key\key\ \post

\pre\key\key\key\keyabcd\post

\pre\key\key\post

 

But neither of the following should be “found”, since they have only one key:

\pre\key\post

\pre\keyabcd\post

 

If someone versed in RegExp’s would be so kind as to provide me with a nudge in the form of a suitable pattern, I might be able to make my way further along the path toward a working knowledge of these things. So far, I’ve looked at 100 examples and can’t get a toehold.  Detecting "two consecutive" appears to be a rare requirement.

 

Thanks in advance for any help.




Something like this could do the job :

If StringRegExp($string, "^\\pre(\\key){2,}.*?\\post$") Then
    ConsoleWrite("Match")
Else
    ConsoleWrite("Not match")
EndIf

^             starts with
\\pre         \pre
(\\key){2,}   \key two or more times
.*?           something or not, till next item
\\post        \post
$             ends with
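
For anyone who wants to check the pattern against the samples from the first post, here is a minimal sketch. The array name $aSamples is only an illustrative choice. With the default flag, StringRegExp returns 1 for a match and 0 otherwise, so this should print 1 for the first three strings and 0 for the last two.

```autoit
; Sample strings taken from the first post ($aSamples is an illustrative name)
Local $aSamples[5] = ["\pre\key\key\key\key\ \post", _
        "\pre\key\key\key\keyabcd\post", _
        "\pre\key\key\post", _
        "\pre\key\post", _
        "\pre\keyabcd\post"]

For $i = 0 To UBound($aSamples) - 1
    ; Default flag: the return value is 1 (match) or 0 (no match)
    ConsoleWrite(StringRegExp($aSamples[$i], "^\\pre(\\key){2,}.*?\\post$") & " - " & $aSamples[$i] & @CRLF)
Next
```

AutoIt strings have no escape sequences, so the doubled backslashes in the pattern are there only for the regex engine.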


#3 ·  Posted (edited)

qwert,

This seems to work: :)

Global $aTest[5] = ["\pre\key\key\key\key\ \post", _
"\pre\key\key\key\keyabcd\post", _
"\pre\key\key\post", _
"\pre\key\post", _
"\pre\keyabcd\post"]

For $i = 0 To 4
    ConsoleWrite($aTest[$i] & " - " & StringRegExpReplace($aTest[$i], "(?U)^(\\pre.*)(\\)(key.*)(\g2)(\g3)(\g2)(.*)$", "$1\\FRED\\FRED\\$7") & @CRLF)
Next
SRE decode:

(?U)      - Not greedy so as few characters as possible
^         - Start of string
(\\pre.*) - Capture "\pre" and any characters up to \ (save as group 1)
(\\)      - Capture a backslash - save as Group 2
(key.*)   - Capture key followed by any characters (save as Group 3) until we meet
(\g2)     - Another group 2 followed by
(\g3)     - Another group 3 and
(\g2)     - yet another group 2
(.*)      - Capture all (save as group 7) until
$         - end of string

Replace with
$1        - Group 1
\\FRED\\FRED\\ ; The replacement for the found double key
$7        - Group 7
This is the first time I have used backreferenced groups - seems quite a powerful thing to have in the armoury. :graduated:
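
As a stripped-down illustration of the backreference idea (a sketch, not the full pattern above): a group captures some text, and \g1 then demands exactly the same text again. The two test strings are taken from the thread's examples.

```autoit
; (\\key) captures "\key" as group 1; \g1 requires the identical text to follow
ConsoleWrite(StringRegExp("\pre\key\key\post", "(\\key)\g1") & @CRLF) ; doubled key: prints 1
ConsoleWrite(StringRegExp("\pre\key\post", "(\\key)\g1") & @CRLF)     ; single key: prints 0
```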

M23

Edited by Melba23
Realised it was SRER and not SRE

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind._______My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#4 ·  Posted (edited)

My 2 cents  :D

This one allows special characters in the test substring

Global $aTest[6] = ["\pre\key\key\key\key\ \post", _
        "\pre\key\key\key\keyabcd\post", _
        "\pre\key\key\post", _
        "\pre\key\keyabcd\post", _
        "\pre\key\post", _
        "\pre\keyabcd\post"]

$test = "key"
For $i = 0 To 5
  ConsoleWrite(StringRegExp($aTest[$i], '((?<=\\)\Q' & $test & '\E.*?\\){2,}') & " - " & $aTest[$i] & @CRLF)
Next

\Q...\E  for possible special characters

(?<=\\)  preceded by a \

{2,}  2 or more times

Edit

Melba, :thumbsup:

But it will fail on "\pre\key\keyabcd\post" because the content of the backref is not the same

Though I don't know if "\pre\keyab\keycd\post" should be matched or not

My code matches this, but if it shouldn't just remove the .*?
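
To make the difference concrete, here is a rough side-by-side of the pattern with and without the .*? part. The names $sLoose, $sStrict and $aCheck are just for illustration, and the third test string is a sample made up for this comparison.

```autoit
Local $sKey = "key"
; Loose: allows extra characters between the key and the next backslash
Local $sLoose = '((?<=\\)\Q' & $sKey & '\E.*?\\){2,}'
; Strict: the key must be followed directly by a backslash
Local $sStrict = '((?<=\\)\Q' & $sKey & '\E\\){2,}'

; The third string is a made-up sample, not one from the first post
Local $aCheck[3] = ["\pre\key\key\post", "\pre\key\keyabcd\post", "\pre\keyab\keycd\post"]

For $i = 0 To UBound($aCheck) - 1
    ; First column: loose result, second column: strict result
    ConsoleWrite(StringRegExp($aCheck[$i], $sLoose) & " " & StringRegExp($aCheck[$i], $sStrict) & " - " & $aCheck[$i] & @CRLF)
Next
```

Both versions should accept the first string. Only the loose version should accept the other two, because there the second key is not followed directly by a backslash.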

Edited by mikell


I see people at work here.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Some short feedback (as I have more testing to do):

I have "^\\pre(\\key){2,}.*?\\post$" working for my simplest case (the one I described). I dropped the $ ("ends with") because there can be other characters after the \post element, which I didn't mention.
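
A quick sketch of what dropping the $ changes, using a made-up string with trailing text ($sTail is a hypothetical name):

```autoit
; Hypothetical sample: a valid sequence followed by extra characters
Local $sTail = "\pre\key\key\post and more text"

; With $, \post must sit at the very end of the string: prints 0
ConsoleWrite(StringRegExp($sTail, "^\\pre(\\key){2,}.*?\\post$") & @CRLF)
; Without $, anything may follow \post: prints 1
ConsoleWrite(StringRegExp($sTail, "^\\pre(\\key){2,}.*?\\post") & @CRLF)
```

The leading ^ still requires the string to begin with \pre. If the sequence can also appear mid-string, that anchor would presumably have to go as well.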

I'll look at each method in detail over the next couple of days.  Already, I'm optimistic that these will give me pretty good starting points for other uses. 

Thanks to each of you for the responses.



mikell,

 

it will fail on "\pre\key\keyabcd\post"

The OP's requirements were not altogether clear, but I assumed that the successive keys needed to be identical. Perhaps the OP could tell us so we can modify our respective patterns. :)

M23



Yes, identical.  They are repetitions of the same "key" string.  Sorry that that wasn't clear.


Here are a few more guesses at what your required return string is supposed to be.

Maybe giving an example input text together with an example of the expected output in the opening post would have been helpful to the helpers.

Global $sTest = "In between \pre\key\key\key\key\ \post some text" & @CRLF & _
        "\pre\key\key\key\keyabcd\post" & @CRLF & _
        "\pre\key\key\post" & @CRLF & _
        "\pre\key\post" & @CRLF & _
        "\pre\keyabcd\post"

ConsoleWrite(StringRegExpReplace($sTest, "(\\pre.*?)(\\key[^\\]*)\2+(.*?\\post)", "$1\\FRED\\FRED$3") & @CRLF)
#cs Returns:-
    In between \pre\FRED\FRED\ \post some text
    \pre\FRED\FREDabcd\post
    \pre\FRED\FRED\post
    \pre\key\post
    \pre\keyabcd\post
#ce

ConsoleWrite(@CRLF & " ------ " & @CRLF & StringRegExpReplace($sTest, "(\\pre.*?\\)(key\\)\2+(.*?\\?post)", "$1FRED\\FRED\\$3") & @CRLF & @CRLF)
#cs Returns:-
    In between \pre\FRED\FRED\ \post some text
    \pre\FRED\FRED\keyabcd\post
    \pre\FRED\FRED\post
    \pre\key\post
    \pre\keyabcd\post
#ce

; Post#1: The string I’d like to recognize and replace is of the form: "\pre\key\key<any characters>\post"
ConsoleWrite(@CRLF & " ------ " & @CRLF & StringRegExpReplace($sTest, "(\\pre.*?)(\\key[^\\]*)\2+(.*?\\post)", "\\FRED\\FRED") & @CRLF)
#cs Returns:-
    In between \FRED\FRED some text
    \FRED\FRED
    \FRED\FRED
    \pre\key\post
    \pre\keyabcd\post
#ce
