Jump to content
IanN1990

StringRegExp but replace non-matching lines

Recommended Posts

Hi all,

I am trying to read a csv and remove any line that does not contain "File Location" or "Date Submitted" in column 4

Using the helpfile I was about to produce this;

$string = StringRegExpReplace($String, '\r\n.*,.*,.*,("File Location"|Date Submitted").*', "")

Irony is, it works perfectly in the opposite way i need and removes any line with "File Location" / "Date Submitted" in colum 4.  I can't figure out how to invert it.
Reading the helpfile it mentions ?! but no matter of combination I try the outcome is the same.

Any help is appreciated!

Ian

Edited by IanN1990

Share this post


Link to post
Share on other sites

@IanN1990

You could use a pattern like this:

'^.*(?<!File Location|Date Submitted)"$'

:)


Click here to see my signature:

Spoiler

Thoughts:

  • I will always thank you for the time you spent for me.
    I'm here to ask, and from your response, I'd like to learn.
    By my knowledge, I can help someone else, and "that someone" could help in turn another, and so on.

/*--------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/

ALWAYS GOOD TO READ:

 

Share this post


Link to post
Share on other sites

Look-behind needs to have a fixed number of characters in the first-level alternatives, so that won't work as is.

'(?m)(^(?:(?!"File Location"|"Date Submitted").)*(?:\R|$))'

should work (untested)

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

Always amazes me how people come up with these regular expressions! I wish a had a mind for it :(

@FrancescoDiMuro Unfortunately i couldn't get yours to work :(

@jchd and @mikell both worked though jcdh was slightly faster, processing 150k lines in <4 seconds vs 4.5! a photo finish :)

Though i noticed it was picking up some false positives. 

08/01/2019 17:08:54,"Test","IanN.1990","File Location","C:\temp\121312.log","File Location" <-- Correct
08/01/2019 17:08:54,"Test","IanN.1990","Date Reviewed","C:\temp\121312.log","File Location" <-- Incorrect as column 4 = Date Reviewed

I attempted to correct this by adding in my earlier code .*,.*, with my logic being;

Match anything until , then match anything until , then look for my phase but that would en-scope the false positive. 

I am guessing i need a way for it only to look between the second and third , ?

Ian

Edited by IanN1990

Share this post


Link to post
Share on other sites

Of course the pattern posted using .* doesn't fit a precise bill.  If you have text fields in between (before the 4th) then it complicate matters significantly.  One can't rely on comma separators (text may contain commas) nor double quotes (text may contain escaped double quotes.

OTOH if your data always has a fixed number of fields, type and structure --somehow like the two lines above-- then it's possible to filter false positives out.

Beware that by default, .* is greedy!  Make that .*? to turn it lazy or use (?U) wisely.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

@mikell Your code works perfectly! and it completes all 15k rows in 500ms! Having the power of RegEx is like being a wizard!  

@jchd You are correct in everything you said, in this example i am quite lucky where the first four columns are generated pragmatically . So there wouldn't be a risk of the issues you mentioned.

Thank you all for your help on this issue! It's much appricated

Edited by IanN1990

Share this post


Link to post
Share on other sites

@mikell

Your code has served me faithfully but has recently run into a slight issue.
The lastest CSV sent by management can now have dual-roles. Where each role is separated by a ,

Is there a way the Regex could be changed to ignore , inside of quotes?
The example below shows when i am looking match column 4 but an item is missed.

 

$CSVString = '1,Enabled,"Mr,Test",Location' & @CRLF & '2,Enabled,"Mr Test",Location' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 3 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 4 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result2 & @CRLF)

 

 

Edited by IanN1990

Share this post


Link to post
Share on other sites

Here are a couple of RE patterns that appear to work.

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _
             '2,Enabled,"Mr,Test",Location,Test'

;$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}(Location).*(*SKIP)(*F)|^.*\R?', "")
$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}+(Location).*(*SKIP)(*F)|^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down.
ConsoleWrite($Result1 & @CRLF & @CRLF)

;$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}(Location).*)^.*\R?', "")
$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}+(Location).*)^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down.
ConsoleWrite($Result2 & @CRLF & @CRLF)

 

Edited by Malkey
Updated RE pattern to overcome unique trait.

Share this post


Link to post
Share on other sites

@Malkey

Hi, both of your examples work perfectly. I am very jealous of the wizardry others have in RegExp, it is a hard art! :)

Share this post


Link to post
Share on other sites
Posted (edited)

@Malkey

After a month of reliable work, i have discovered an unique trait! 

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & '2,Enabled,"Mr,Test",Location,Test' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){4}(Location).*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){4}(Location).*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)

It is returning both lines when only one has "Location" in column four? 

 

Edited by IanN1990

Share this post


Link to post
Share on other sites

Added a "possessive quantifier", "+".

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _
             '2,Enabled,"Mr,Test",Location,Test' & @CRLF & _
             '3,Enabled,Test,"Mr,Test",Location' & @CRLF & _
             '4,"Mr,Test",Enabled,Location,Test' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){4}+Location.*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){4}+Location.*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)
ConsoleWrite("-------------------------------------------" & @CRLF)

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){3}+Location.*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){3}+Location.*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)

 

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Recently Browsing   0 members

    No registered users viewing this page.

×
×
  • Create New...