Jump to content

StringRegExp but replace non-matching lines


IanN1990
 Share

Recommended Posts

Hi all,

I am trying to read a csv and remove any line that does not contain "File Location" or "Date Submitted" in column 4

Using the helpfile I was about to produce this;

$string = StringRegExpReplace($String, '\r\n.*,.*,.*,("File Location"|Date Submitted").*', "")

Irony is, it works perfectly in the opposite way i need and removes any line with "File Location" / "Date Submitted" in colum 4.  I can't figure out how to invert it.
Reading the helpfile it mentions ?! but no matter of combination I try the outcome is the same.

Any help is appreciated!

Ian

Edited by IanN1990
Link to comment
Share on other sites

Look-behind needs to have a fixed number of characters in the first-level alternatives, so that won't work as is.

'(?m)(^(?:(?!"File Location"|"Date Submitted").)*(?:\R|$))'

should work (untested)

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Always amazes me how people come up with these regular expressions! I wish a had a mind for it :(

@FrancescoDiMuro Unfortunately i couldn't get yours to work :(

@jchd and @mikell both worked though jcdh was slightly faster, processing 150k lines in <4 seconds vs 4.5! a photo finish :)

Though i noticed it was picking up some false positives. 

08/01/2019 17:08:54,"Test","IanN.1990","File Location","C:\temp\121312.log","File Location" <-- Correct
08/01/2019 17:08:54,"Test","IanN.1990","Date Reviewed","C:\temp\121312.log","File Location" <-- Incorrect as column 4 = Date Reviewed

I attempted to correct this by adding in my earlier code .*,.*, with my logic being;

Match anything until , then match anything until , then look for my phase but that would en-scope the false positive. 

I am guessing i need a way for it only to look between the second and third , ?

Ian

Edited by IanN1990
Link to comment
Share on other sites

Of course the pattern posted using .* doesn't fit a precise bill.  If you have text fields in between (before the 4th) then it complicate matters significantly.  One can't rely on comma separators (text may contain commas) nor double quotes (text may contain escaped double quotes.

OTOH if your data always has a fixed number of fields, type and structure --somehow like the two lines above-- then it's possible to filter false positives out.

Beware that by default, .* is greedy!  Make that .*? to turn it lazy or use (?U) wisely.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@mikell Your code works perfectly! and it completes all 15k rows in 500ms! Having the power of RegEx is like being a wizard!  

@jchd You are correct in everything you said, in this example i am quite lucky where the first four columns are generated pragmatically . So there wouldn't be a risk of the issues you mentioned.

Thank you all for your help on this issue! It's much appricated

Edited by IanN1990
Link to comment
Share on other sites

  • 1 month later...

@mikell

Your code has served me faithfully but has recently run into a slight issue.
The lastest CSV sent by management can now have dual-roles. Where each role is separated by a ,

Is there a way the Regex could be changed to ignore , inside of quotes?
The example below shows when i am looking match column 4 but an item is missed.

 

$CSVString = '1,Enabled,"Mr,Test",Location' & @CRLF & '2,Enabled,"Mr Test",Location' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 3 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^([^,]*,){' & 4 & '}(' & 'Location' & ').*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result2 & @CRLF)

 

 

Edited by IanN1990
Link to comment
Share on other sites

Here are a couple of RE patterns that appear to work.

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _
             '2,Enabled,"Mr,Test",Location,Test'

;$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}(Location).*(*SKIP)(*F)|^.*\R?', "")
$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){3}+(Location).*(*SKIP)(*F)|^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down.
ConsoleWrite($Result1 & @CRLF & @CRLF)

;$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}(Location).*)^.*\R?', "")
$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){3}+(Location).*)^.*\R?', "") ; Updated to overcome "unique trait", 2 posts down.
ConsoleWrite($Result2 & @CRLF & @CRLF)

 

Edited by Malkey
Updated RE pattern to overcome unique trait.
Link to comment
Share on other sites

  • 3 weeks later...

@Malkey

After a month of reliable work, i have discovered an unique trait! 

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & '2,Enabled,"Mr,Test",Location,Test' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]*,)){4}(Location).*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]*,)){4}(Location).*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)

It is returning both lines when only one has "Location" in column four? 

 

Edited by IanN1990
Link to comment
Share on other sites

Added a "possessive quantifier", "+".

$CSVString = '1,Enabled,"Mr,Test",Test,Location' & @CRLF & _
             '2,Enabled,"Mr,Test",Location,Test' & @CRLF & _
             '3,Enabled,Test,"Mr,Test",Location' & @CRLF & _
             '4,"Mr,Test",Enabled,Location,Test' & @CRLF

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){4}+Location.*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){4}+Location.*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)
ConsoleWrite("-------------------------------------------" & @CRLF)

$Result1 = StringRegExpReplace($CSVString, '(?m)^(("[^"]+",)|([^,]+,)){3}+Location.*(*SKIP)(*F)|^.*\R?', "")
ConsoleWrite($Result1 & @CRLF & @CRLF)

$Result2 = StringRegExpReplace($CSVString, '(?m)^(?!(("[^"]+",)|([^,]+,)){3}+Location.*)^.*\R?', "")
ConsoleWrite($Result2 & @CRLF & @CRLF)

 

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...