sdrs Posted December 7, 2009 (edited)

I need help with regular expressions.

The quick way to ask the question: find every occurrence of a number and insert a dollar sign in front of its first digit, without replacing the number.

Original: "asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf"
Final: "asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf"

The actual problem I have: I have a long text I need to reformat. It is originally in this form: rows of data under a date and time heading. The number of date/time headings is variable. For each date/time, the number of data points is variable, but there is at least one.

The date and time format is always: [0-9]{2}/[0-9]{2}( )[0-9]{4}

There is a CR and LF at the end of each line.

ORIGINAL:
12/06 1325
data1
data2
data3
...
datan
12/05 1215
data1
data2
data3
...
datan
12/05 0930
data1
data2
data3
...
datan

I need to take the long form and compact it: convert the rows of data under each heading into a single line, with the date/time as a header.

FINAL:
12/06 1325
data1, data2, ... datan
12/05 1215
data1, data2, ... datan
12/05 0930
data1, data2, ... datan

The first thing I have done is:

    $rawtext = StringRegExpReplace($rawtext, "[\r][\n]", ", ")

This turns the text into one single line:

12/06 1325, data1, data2, ... datan, 12/05 1215, data1, data2, ... datan, 12/05 0930, data1, data2, ... datan

Now I need to find each occurrence of a comma and space followed by a date/time with ((, )[0-9]{2}/[0-9]{2}( )[0-9]{4}) and (here is my problem) insert a Chr(13) before the date without replacing the date.

I know I can use a loop and the @extended value to find the offsets, and then use another loop to insert the Chr(13). Is there a way to do it with StringRegExpReplace instead? I would think that would be faster, since the list can be long.

Thanks for any help.
SDRS

Edited December 7, 2009 by sdrs
dantay9 Posted December 7, 2009 (edited)

I am not a pro at StringRegExp, but here is what I got. It isn't based fully on StringRegExp, but seems to be fast enough.

    #include <File.au3>

    $File1 = @ScriptDir & "\Test.txt" ;source file
    $File2 = @ScriptDir & "\Test2.txt" ;output file
    $NextLine = ""

    For $i = 0 To _FileCountLines($File1)
        $ReadLine = FileReadLine($File1, $i)
        If StringRegExp($ReadLine, "[0-9]{2}/[0-9]{2}\s[0-9]{4}") = 1 Then
            FileWrite($File2, StringTrimRight($NextLine, 1) & @CRLF & $ReadLine & @CRLF)
            $NextLine = ""
        Else
            $NextLine &= $ReadLine & ","
        EndIf
    Next
    FileWrite($File2, StringTrimRight($NextLine, 1)) ;write whatever is left

Edit: The answer to your first question is the use of backreferences. It is documented in the help file, but isn't easy to understand without an example. The "\0" is the backreference to the string matched by StringRegExp.

    StringRegExpReplace($Orignial, "\d+", "$\0")

Edited December 7, 2009 by dantay9
sdrs (Author) Posted December 7, 2009 (edited)

Thanks very much, dantay9. That will get the job done.

It will be slow if the list is long, as you have to use a loop. I am trying to do it without using a loop.

Is there a way to use regular expressions to find a target string and insert a character in front of the found string? For example: find every occurrence of a number and insert a dollar sign in front of its first digit, without replacing the number.

Original: "asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf"
Final: "asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf"

Your option will be my backup.
SDRS

EDIT: Just saw your backreference example. That will probably work much faster; I just need to figure it out.

Thanks again so much.
SDRS

Edited December 7, 2009 by sdrs
sdrs (Author) Posted December 7, 2009

Thanks for the help again. Here is my solution based on backreferences, in case anyone else has questions.

    $rawtext = StringRegExpReplace($rawtext, "([0-9]{2}/[0-9]{2}( )[0-9]{4})", "!x!\0")
    $rawtext = StringRegExpReplace($rawtext, "!x!", Chr(13))

The first line inserts a !x! before each date/time, and the second line replaces each !x! with a carriage return. I chose !x! as it is unlikely to be found in the data.

SDRS
BugFix Posted December 7, 2009 (edited)

Needs only one step:

    $string = 'asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf'
    ConsoleWrite(StringRegExpReplace($string, '\d+', '$\0') & @CRLF)

Result:

asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf

Edited December 7, 2009 by BugFix

Best Regards BugFix
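
The same one-step idea can be applied to the date/time case from the original question. What follows is only a minimal sketch under stated assumptions: the sample string is made up, the text is assumed to already be in the single-line ", "-separated form sdrs describes, and a CRLF is used as the line break (sdrs used Chr(13) alone). Because the replacement argument is an ordinary AutoIt string, the line break can be concatenated in front of the "\1" backreference, so no placeholder such as !x! is needed.

```autoit
; Hypothetical sample input in the single-line form described in the thread.
Local $rawtext = "12/06 1325, data1, data2, 12/05 1215, data1, data2, data3"

; Match ", " followed by a date/time. Only the date/time is captured (group 1),
; so the ", " in front of it is dropped and a CRLF is written in its place.
Local $result = StringRegExpReplace($rawtext, ", ([0-9]{2}/[0-9]{2} [0-9]{4})", @CRLF & "\1")

ConsoleWrite($result & @CRLF)
```

The first date/time has no ", " in front of it, so it is left untouched and the result does not start with an empty line.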
jvanegmond Posted December 7, 2009

dantay9 wrote:

    I am not a pro at StringRegExp, but here is what I got. It isn't based fully on StringRegExp, but seems to be fast enough.

        #include <File.au3>

        $File1 = @ScriptDir & "\Test.txt" ;source file
        $File2 = @ScriptDir & "\Test2.txt" ;output file
        $NextLine = ""

        For $i = 0 To _FileCountLines($File1)
            $ReadLine = FileReadLine($File1, $i)
            If StringRegExp($ReadLine, "[0-9]{2}/[0-9]{2}\s[0-9]{4}") = 1 Then
                FileWrite($File2, StringTrimRight($NextLine, 1) & @CRLF & $ReadLine & @CRLF)
                $NextLine = ""
            Else
                $NextLine &= $ReadLine & ","
            EndIf
        Next
        FileWrite($File2, StringTrimRight($NextLine, 1)) ;write whatever is left

    Edit: The answer to your first question is the use of backreferences. It is documented in the help file, but isn't easy to understand without an example. The "\0" is the backreference to the string matched by StringRegExp.

        StringRegExpReplace($Orignial, "\d+", "$\0")

dantay, to read all the lines out of one file you should never use FileReadLine. The help file specifically says:

    From a performance standpoint it is a bad idea to read line by line specifying "line" parameter whose value is incrementing by one. This forces AutoIt to reread the file from the beginning until it reach the specified line.

github.com/jvanegmond
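
One way to avoid the repeated rereads is to load the file once with FileRead and split it into lines in memory. The sketch below is an illustration, not anyone's posted solution: the file names are taken from dantay9's example, the variable names are made up, and it assumes every line ends in CR+LF, as sdrs stated. It joins the data rows with ", " and puts each date/time heading on its own line.

```autoit
; Assumed file names, borrowed from dantay9's post.
Local $sSource = @ScriptDir & "\Test.txt"
Local $sOutput = @ScriptDir & "\Test2.txt"

; Read the whole file in one call, then split it in memory.
; Flag 1 makes StringSplit treat @CRLF as one delimiter instead of two.
Local $sText = FileRead($sSource)
Local $aLines = StringSplit($sText, @CRLF, 1)

Local $sResult = "", $sData = ""
For $i = 1 To $aLines[0] ; element 0 holds the number of pieces
    If $aLines[$i] = "" Then ContinueLoop ; e.g. the empty piece after a trailing CRLF
    If StringRegExp($aLines[$i], "^[0-9]{2}/[0-9]{2}\s[0-9]{4}$") Then
        ; A heading: flush the data collected so far, then write the heading.
        If $sData <> "" Then $sResult &= $sData & @CRLF
        $sResult &= $aLines[$i] & @CRLF
        $sData = ""
    ElseIf $sData = "" Then
        $sData = $aLines[$i]
    Else
        $sData &= ", " & $aLines[$i]
    EndIf
Next
$sResult &= $sData ; data rows of the last heading

; One write at the end instead of one write per heading.
FileWrite($sOutput, $sResult)
```

The loop is still there, but it runs over an array in memory, so the file is read once and written once.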
skyboy Posted December 7, 2009 (edited)

I'm going to make the point that regular expressions are far more powerful than people think, so programs end up overcomplicated and slow. For instance, you can match each line individually using ^ and $, which (with the (?m) multi-line flag) match just after a \n and just before a \n, respectively.

http://www.regular-expressions.info/

This gives you some good information on how to use regular expressions, and you can test some JavaScript regular expressions there, live, though you need to test the rest on your own.

sdrs wrote:

    Now I need to find each occurrence of a comma space followed by a date/time with: ((, )[0-9]{2}/[0-9]{2}( )[0-9]{4}) and (here is my problem) insert a chr(13) before the date and not replace the date.

    \d\d/\d\d(?:/\d{2,4})?

will match any date with an optional 2- to 4-digit year, and the year, if found, will not be in a back-reference group.

    StringRegExpReplace(StringRegExpReplace($string, "((?:\r?\n)+)", ", ", 0), "(?:,\s*)*(\d\d/\d\d\s+\d{4})(?:,\s*)*", @LF & "\1", 0)

That will return a line just how you would like for the second part.

Edited December 7, 2009 by skyboy
dantay9 Posted December 7, 2009 (edited)

@Mandar
Thanks. I will keep that in mind for future code and maybe even change a few of my completed scripts.

Here is a tool that I use to get the basis of my expressions. It really helps because everything is right there.

Edited December 7, 2009 by dantay9