gruntydatsun

Cannot get simple regex to work

12 posts in this topic

Can anyone help with why the regex below isn't working, please?

#include <Array.au3>
$text = "worm4,3snake,8maggot,tapeworm9,politician"
$array = StringRegExp($text,'(.*?),(.*?),(.*?),(.*?),(.*?)',3)
_ArrayDisplay($array)

The regex can digest the first 4 words but it seems like the regex is gagging on or repulsed by the 5th word?

Seriously though, it's not working for me and I don't know why.

I'm getting an array with 5 elements, with the fifth element being empty (i.e. with no discernible value. Coincidence? I think not.)

#2 ·  Posted (edited)

I would put your problem down to the non-greediness of ".*?".  The minimum that ".*" can match is nothing.  The minimum that ".+" can match is one character.

#include <Array.au3>

local $text, $array
$text = "worm4,3snake,8maggot,tapeworm9,politician"

$array = StringRegExp($text,'(.*?),(.*?),(.*?),(.*?),(.*?)', 3) ; Does not work. The last question mark
;                                   after the "*" makes the last capture group so non-greedy that nothing is matched.

$array = StringRegExp($text,'(.*?),(.*?),(.*?),(.*?),(.+?)', 3) ; Does not work. The last question mark
;           after the "+" makes the last capture group so non-greedy that one character only is matched.

;$array = StringRegExp($text,'(.*?),(.*?),(.*?),(.*?),(.*?)$', 3) ; Works fine. Anchors end of string.

;$array = StringRegExp($text,'(.*?),(.*?),(.*?),(.*?),(.*)', 3) ; Works fine. Dropping the last question mark
;                                               makes the last capture group greedy to end of string.

;$array = StringRegExp($text,'(.*),(.*),(.*),(.*),(.*)', 3) ; Works fine. Dropping the question marks
;                                                    makes each capture group greedy up to the commas.

;$array = StringRegExp($text,'[^,]+', 3) ; Works fine. Captures all (sequences of) non-comma characters.
_ArrayDisplay($array)
Edited by Malkey

Thanks for all the examples Malkey.  I got a good laugh out of the lack of greediness making the politician invisible.

I tried this and it worked:

$array = StringRegExp($text,'(?U)(.*?),(.*?),(.*?),(.*?),(.*?)', 3)

I've seen inverting greediness in the manual and still don't understand why this works.   I feel like a monkey who lit a fire by banging rocks together.  I made a fire but have no idea how I did it. lol

#4 ·  Posted (edited)

".*?" means "the smallest matching string".

For the first four groups you have defined what comes after "(.*?)": a comma.
But you have not defined the end of the range for the last group, i.e. after the last group there is nothing that limits its scope.

Therefore, the smallest matching string is an empty string.

Try this:
$array = StringRegExp($text,'(?s)(.*?),(.*?),(.*?),(.*?),(.*?)$', 3)
Edited by mlipok

#5 ·  Posted (edited)

I got a good laugh out of the lack of greediness making the politician invisible.

$array = StringRegExp($text,'(?U)(.*?),(.*?),(.*?),(.*?),(.*?)', 3)

I didn't realize the pun with the politician and greediness! Very funny indeed.

(?U) makes your pattern work, and here is why. Greediness only affects how far to the right a sub-pattern extends; it doesn't change where the match starts. Since each of the first four (.*?) is followed by a hardwired comma, they are insensitive to greediness for this subject: each one has to stop at a comma either way. Hence greediness only affects what happens in the final sub-pattern. Either inverting it (pattern-wide with (?U), or locally by dropping the laziness ?) or following the pattern with a $ anchor is enough to make the politician reappear in the picture.

Whether that is a good thing or not is a separate question.
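
To see the effect side by side, here is a small sketch that runs the three variants discussed so far against the same subject. The variable names and the use of _ArrayToString() with ConsoleWrite() are my own choices for illustration, not something from the posts above.

```autoit
#include <Array.au3>

Local $sText = "worm4,3snake,8maggot,tapeworm9,politician"

; Lazy final group with nothing after it: the fifth capture can stay empty.
Local $aLazy = StringRegExp($sText, '(.*?),(.*?),(.*?),(.*?),(.*?)', 3)

; (?U) inverts greediness for the whole pattern, so each ".*?" now behaves like ".*".
Local $aInverted = StringRegExp($sText, '(?U)(.*?),(.*?),(.*?),(.*?),(.*?)', 3)

; Same lazy groups, but the $ anchor forces the last one to run to the end of the string.
Local $aAnchored = StringRegExp($sText, '(.*?),(.*?),(.*?),(.*?),(.*?)$', 3)

; Print each result on one line so the fifth element is easy to compare.
ConsoleWrite("lazy:     " & _ArrayToString($aLazy, " | ") & @CRLF)
ConsoleWrite("(?U):     " & _ArrayToString($aInverted, " | ") & @CRLF)
ConsoleWrite("anchored: " & _ArrayToString($aAnchored, " | ") & @CRLF)
```

Going by the results reported earlier in this topic, the first line should end with an empty fifth element and the other two should end with "politician".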

Edited by jchd

Thanks for the explanations :) .  I think the monkey might be able to make sparks deliberately now.

In general, avoid using .* and instead use a negated character class that excludes your delimiter. In this case, your delimiter is a comma, so the class is [^,]:

$array = StringRegExp($text,'([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)',3)
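
A rough sketch of why this tends to be more robust: once the subject has more fields than the pattern expects, ".*" is free to run across commas while "[^,]*" cannot. The sixth field "lobbyist" and the variable names are made up for this illustration.

```autoit
#include <Array.au3>

; Hypothetical subject with six fields instead of five.
Local $sText = "worm4,3snake,8maggot,tapeworm9,politician,lobbyist"

; Greedy dot: the first group is allowed to contain a comma, so fields can merge.
Local $aDot = StringRegExp($sText, '(.*),(.*),(.*),(.*),(.*)', 3)

; Negated class: no group can ever contain a comma, so each capture is one field.
Local $aClass = StringRegExp($sText, '([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)', 3)

ConsoleWrite("dot:   " & _ArrayToString($aDot, " | ") & @CRLF)
ConsoleWrite("class: " & _ArrayToString($aClass, " | ") & @CRLF)
```

With the dot version I would expect the first element to come back as "worm4,3snake", whereas the negated-class version should return the first five fields cleanly and simply leave the sixth unmatched.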

Gerard J. Pinzone
gpinzone AT yahoo.com

I've been doing some reading and playing with this more and came up with:

#include <Array.au3>
local $text = "worm4,3snake,8maggot,tapeworm9,politician"
$array = StringRegExp($text,'([^,]+)(?:,?)',3)
_ArrayDisplay($array)

I was trying to learn how to do it with repeating {4} and eventually felt myself slipping into insanity. 

Could someone please show me an example of how to do this using the repeating {} way?
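
One possible way to bring {} into it, offered as a sketch rather than the definitive answer: a repeated capturing group only keeps its last iteration, so {4} is better suited to checking the shape of the string than to pulling out all five words. The pattern and variable names below are my own.

```autoit
#include <Array.au3>

Local $sText = "worm4,3snake,8maggot,tapeworm9,politician"

; (?:[^,]+,){4} means four "field plus comma" units; a fifth field must follow.
; Flag 0 only reports whether the pattern matches (1) or not (0).
Local $iIsFiveFields = StringRegExp($sText, '^(?:[^,]+,){4}[^,]+$', 0)
ConsoleWrite("Exactly five fields: " & $iIsFiveFields & @CRLF)

; With a capturing group under {4}, only the last repetition is remembered,
; so this yields two captures instead of five.
Local $aLastOnly = StringRegExp($sText, '^([^,]+,){4}([^,]+)$', 3)
ConsoleWrite("Repeated group: " & _ArrayToString($aLastOnly, " | ") & @CRLF)

; So: validate with {4}, then extract with the simple global pattern.
If $iIsFiveFields Then
    Local $aFields = StringRegExp($sText, '[^,]+', 3)
    ConsoleWrite("Fields: " & _ArrayToString($aFields, " | ") & @CRLF)
EndIf
```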

Yes, I know this is an older post, but I have been trying to get my brain around this and.....ach what fun.

An insane monkey that can build a fire pretty much says it all ;-)

Well, to use the repeater you need something that repeats ( I think :huh2:  ), so I changed your $text a bit so this would work:

Obviously I'm new at this, so this is just a stab:

#include <Array.au3>
local $text = "worm4,3snake,8maggot,tape-worm9,politician"
$array = StringRegExp($text,'\bw\w{4}\b',3) ; 4 letter word that begins with "w"
_ArrayDisplay($array)

That returns words that begin with w and have 4 word characters after it i.e. 5 chrs in total.


#11 ·  Posted (edited)

Or using positive look ahead (?=,)

$sInput = "worm4,3snake,8maggot,tapeworm9,politician"

; \w+ rather than \w*: \w* can also match an empty string right in front of each comma
$aWords = StringRegExp($sInput, '(\w+)(?=,|\Z)', 3)

For $i = 0 To UBound($aWords) - 1
    ConsoleWrite($aWords[$i] & @CRLF)
Next
Edited by Jury

That returns words that begin with w and have 4 word characters after it i.e. 5 chrs in total.

So it is...

I was wondering why the output included the trailing 5th character. I assumed the \b excluded anything but letters?
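
For what it's worth, \b is only a zero-width position between a word character and a non-word character, and \w covers letters, digits and the underscore, so the digit counts as part of the word. A small sketch of the difference, where the second pattern is just one way (my own suggestion) to restrict the match to letters:

```autoit
#include <Array.au3>

Local $sText = "worm4,3snake,8maggot,tape-worm9,politician"

; \w{4} happily accepts the digit, so the matches are five characters long.
Local $aWithDigit = StringRegExp($sText, '\bw\w{4}\b', 3)
ConsoleWrite("\w version:     " & _ArrayToString($aWithDigit, " | ") & @CRLF)

; Letters only: "w" plus three lowercase letters, not followed by another letter.
; A trailing \b would fail here, because "m" and the digit are both word characters.
Local $aLettersOnly = StringRegExp($sText, '\bw[a-z]{3}(?![a-z])', 3)
ConsoleWrite("letter version: " & _ArrayToString($aLettersOnly, " | ") & @CRLF)
```

The first line should list "worm4" and "worm9"; the second should list "worm" twice.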
