sdrs

Help with StringRegExpReplace

8 posts in this topic

#1 ·  Posted (edited)

I need help with regular expressions.

The quick way to ask the question:

Find every occurrence of a number and insert a dollar sign in front of the first digit, not replace it:

Original: "asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf"

Final: "asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf"

The actual problem I have:

I have a long text I need to reformat. It is originally in this form: rows of data with a date and time heading.

The number of date/time units is variable. For each date/time, the number of data points is variable, but there is at least one.

The date and time format is always: [0-9]{2}/[0-9]{2}( )[0-9]{4}

There is a CR and LF at the end of each line.

ORIGINAL:

12/06 1325

data1

data2

data3

...

datan

12/05 1215

data1

data2

data3

...

datan

12/05 0930

data1

data2

data3

...

datan

I need to take the long form and compact it as follows: convert each block of data rows into a single line, with the date/time as a header.

FINAL:

12/06 1325

data1, data2, ... datan

12/05 1215

data1, data2, ... datan

12/05 0930

data1, data2, ... datan

The first thing I have done is:

$rawtext = StringRegExpReplace($rawtext, "[\r][\n]", ", ")

This turns the text into one single line:

12/06 1325, data1, data2, ... datan, 12/05 1215, data1, data2, ... datan, 12/05 0930, data1, data2, ... datan

Now I need to find each occurrence of a comma space followed by a date/time with: ((, )[0-9]{2}/[0-9]{2}( )[0-9]{4}) and (here is my problem) insert a chr(13) before the date and not replace the date.

I know I can use a loop and the @extended value to find the offsets, and then another loop to insert the chr(13). Is there a way to do it with StringRegExpReplace instead? I would expect that to be faster, since the list can be long.

Thanks for any help.

SDRS

Edited by sdrs

#2 ·  Posted (edited)

I am not a pro at StringRegExp, but here is what I got. It isn't based fully on StringRegExp, but seems to be fast enough.

#include <File.au3>

$File1 = @ScriptDir & "\Test.txt" ;source file
$File2 = @ScriptDir & "\Test2.txt" ;output file
$NextLine = "" ;data rows collected since the last date/time header

;FileReadLine counts lines from 1, so start at 1 rather than 0
For $i = 1 To _FileCountLines($File1)
    $ReadLine = FileReadLine($File1, $i)

    If StringRegExp($ReadLine, "[0-9]{2}/[0-9]{2}\s[0-9]{4}") = 1 Then
        ;header line: flush the collected data, then write the header
        FileWrite($File2, StringTrimRight($NextLine, 1) & @CRLF & $ReadLine & @CRLF)
        $NextLine = ""
    Else
        ;data line: append it to the current comma-separated row
        $NextLine &= $ReadLine & ","
    EndIf
Next

FileWrite($File2, StringTrimRight($NextLine, 1)) ;write whatever is left

Edit: The answer to your first question is the use of backreferences. It is documented in the help file, but isn't easy to understand without an example. The "\0" is the backreference to the string matched by StringRegExp.

StringRegExpReplace($Original, "\d+", "$\0")
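
To make the back-reference numbering a little more concrete, here is a small sketch. The sample string and the variable names are made up for this illustration; only StringRegExpReplace itself comes from the thread. It shows "\0" standing for the whole match and "\1", "\2", "\3" standing for the parenthesised groups, counted left to right.

```autoit
; Hypothetical sample string, used only for this illustration.
Local $sSample = "12/06 1325"

; \0 is the whole match; \1, \2, \3 are the parenthesised groups, left to right.
Local $sWhole = StringRegExpReplace($sSample, "(\d\d)/(\d\d) (\d{4})", "[\0]")
Local $sParts = StringRegExpReplace($sSample, "(\d\d)/(\d\d) (\d{4})", "\2/\1 at \3")

ConsoleWrite($sWhole & @CRLF) ; [12/06 1325]
ConsoleWrite($sParts & @CRLF) ; 06/12 at 1325
```

Because the matched text is written back through the back-reference, nothing is lost: the replacement only adds characters around it.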
Edited by dantay9


#3 ·  Posted (edited)

Thanks very much, dantay9.

That will get the job done.

It will be slow if the list is long as you have to use a loop. I am trying to do it without using a loop.

Is there a way to use regular expressions to find a target string and insert a character in front of the found string?

eg:

Find every occurrence of a number and insert a dollar sign in front of the first digit, not replace it:

Orignial: "asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf"

Final: "asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf"

Your option will be my backup.

SDRS

EDIT: just saw your backreference example. That will probably work much faster...just need to figure it out.

Thanks again so much.

SDRS

Edited by sdrs


Thanks for the help again.

Here is my solution based on backreferences, in case anyone else has questions.

$rawtext=StringRegExpReplace($rawtext, "([0-9]{2}/[0-9]{2}( )[0-9]{4})", "!x!\0")

$rawtext=StringRegExpReplace($rawtext, "!x!", chr(13))

This will insert a !x! before the date/time and the next line will replace the !x! with a carriage return. I chose the !x! as it is unlikely to be found in the data.
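
The two passes above can probably be folded into one, since the replacement argument is an ordinary AutoIt string and @CR can be concatenated in front of the back-reference. The following is only a sketch: the sample input is invented to match the flattened shape described in post #1, and $sVisible is a helper name introduced here for display.

```autoit
; Hypothetical flattened input in the shape described in post #1.
Local $rawtext = "12/06 1325, data1, data2, 12/05 1215, data1, data2"

; The replacement is a normal AutoIt string, so @CR can be concatenated
; directly in front of the \0 back-reference; no placeholder pass is needed.
$rawtext = StringRegExpReplace($rawtext, "[0-9]{2}/[0-9]{2} [0-9]{4}", @CR & "\0")

; Make the inserted carriage returns visible for checking.
Local $sVisible = StringReplace($rawtext, @CR, "<CR>")
ConsoleWrite($sVisible & @CRLF) ; <CR>12/06 1325, data1, data2, <CR>12/05 1215, data1, data2
```

This also removes the need to pick a marker such as !x! that must never occur in the data.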

SDRS

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

Needs only one step:

$string = 'asdfasdf32asdfasdfasdfasdfasdf456asdfasdfasdfadf262adsfasdfaasdf123sdddf'
ConsoleWrite(StringRegExpReplace($string, '\d+', '$\0') & @CRLF)

result: asdfasdf$32asdfasdfasdfasdfasdf$456asdfasdfasdfadf$262adsfasdfaasdf$123sdddf

Edited by BugFix

Best Regards BugFix  


dantay, to read all the lines out of one file you should never use FileReadLine. The help file specifically says:

From a performance standpoint it is a bad idea to read line by line specifying "line" parameter whose value is incrementing by one. This forces AutoIt to reread the file from the beginning until it reach the specified line.
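
One way to follow that advice is to read the file once and loop over the lines in memory. The sketch below assumes a recent AutoIt, where FileReadToArray is a built-in function (older releases would need _FileReadToArray from File.au3 instead), and reuses the Test.txt and Test2.txt paths from post #2. The variable names are invented for the example.

```autoit
; Assumes a recent AutoIt with the built-in FileReadToArray and the
; same Test.txt / Test2.txt paths used in post #2.
Local $sSource = @ScriptDir & "\Test.txt"
Local $sOutput = @ScriptDir & "\Test2.txt"

; Read every line in one pass instead of reopening the file per line.
Local $aLines = FileReadToArray($sSource)
If @error Then Exit 1 ; file could not be read, or it is empty

Local $sResult = "", $sData = ""
For $i = 0 To UBound($aLines) - 1
    If StringRegExp($aLines[$i], "^[0-9]{2}/[0-9]{2}\s[0-9]{4}$") Then
        ; Header line: flush the data collected for the previous header.
        If $sData <> "" Then $sResult &= StringTrimRight($sData, 2) & @CRLF
        $sResult &= $aLines[$i] & @CRLF
        $sData = ""
    ElseIf $aLines[$i] <> "" Then
        ; Data line: append to the current comma-separated row.
        $sData &= $aLines[$i] & ", "
    EndIf
Next
If $sData <> "" Then $sResult &= StringTrimRight($sData, 2)

; Write the result once at the end.
Local $hOut = FileOpen($sOutput, 2) ; 2 = open for writing, erase previous contents
FileWrite($hOut, $sResult)
FileClose($hOut)
```

The file is opened once for reading and once for writing, so the cost no longer grows with the square of the line count.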


#7 ·  Posted (edited)

i'm going to make the point that regular expressions are far more powerful than people think; when they are underused, programs end up overcomplicated and slow (for instance, you can process each line individually using ^ and $ in multiline mode, (?m), where they match just after and just before a line break, respectively)

http://www.regular-expressions.info/

this gives you some good information on how to use regular expressions, and you can test some javascript regular expressions there, live; though you need to test the rest on your own
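
As a rough illustration of the ^ and $ remark above, the sketch below pulls out every date/time header from a multi-line string in a single call. The sample text and variable names are invented, and the lines are joined with @LF only to keep the demo independent of newline settings. Flag 3 asks StringRegExp for an array of all matches.

```autoit
; Hypothetical sample; lines are separated with @LF only to keep the demo simple.
Local $sText = "12/06 1325" & @LF & "data1" & @LF & "data2" & @LF & "12/05 1215" & @LF & "data1"

; (?m) switches on multiline mode so ^ and $ match at every line boundary.
; Flag 3 returns an array holding all matches.
Local $aHeaders = StringRegExp($sText, "(?m)^\d\d/\d\d \d{4}$", 3)
If Not @error Then
    For $i = 0 To UBound($aHeaders) - 1
        ConsoleWrite($aHeaders[$i] & @CRLF)
    Next
EndIf
```

With this sample the loop prints the two header lines, 12/06 1325 and 12/05 1215, and skips the data rows.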

Now I need to find each occurrence of a comma space followed by a date/time with: ((, )[0-9]{2}/[0-9]{2}( )[0-9]{4}) and (here is my problem) insert a chr(13) before the date and not replace the date.

\d\d/\d\d(?:/\d{2,4})? will match any date, with an optional 2- to 4-digit year; the year, if found, will not be captured in a back-reference group

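StringRegExpReplace(StringRegExpReplace($string, "((?:\r?\n)+)", ", ", 0), "(?:,\s*)*(\d\d/\d\d\s+\d{4})(?:,\s*)*", @LF & "\1" & @LF, 0)

that puts each date/time on its own line with its data on the following line, which is the layout you asked for in the second part (the result starts with one leading @LF)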

Edited by skyboy


#8 ·  Posted (edited)

@Mandar

Thanks. I will keep that in mind for future code and maybe even change a few of my completed scripts.

Here is a tool that I use to get the basis of my expressions. It really helps because everything is right there.

Edited by dantay9

