
StringRegExpReplace in a file.



I have a file (see the attached file) in which all the text is on a single line. I want to split it at every $00:, $03:, $10:, $20:, $25:, $30:, $40:, $45:, $110:, $115:, $120: and $T, meaning that each $ marker with its value should start a new line (a new paragraph). I tried regular expressions in Notepad++, e.g.:

Find $00:, $01:, $03: and so on with the regex (\$)([0-9]+): and replace with \r\n\1\2 (I think \r\n corresponds to @CRLF, but I'm not sure)

Find $T with the regex (\$T)(.*?)(\$T) and replace with \1\2\r\n\3

When I use these regexes with StringRegExpReplace in AutoIt instead of Notepad++, the results are incorrect :(. I have read some simple regex examples. Please advise me how to do this in AutoIt, with an example if possible. The expected result is shown in the attached image. Thanks

ahihi.txt

result.png
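A minimal sketch of the first Notepad++ replacement above, rewritten for StringRegExpReplace. Rather than relying on how "\r\n" is treated in an AutoIt replacement string, it concatenates the @CRLF macro; it also puts the colon back after \1\2, since the Notepad++ replacement drops it. The sample string is invented for illustration.

```autoit
; Sketch: a Notepad++-style find/replace done with StringRegExpReplace.
; The sample text is made up; in practice it would come from FileRead().
Local $sText = 'intro$00:aaa$03:bbb$110:ccc'

; Break the line before every "$<digits>:" marker.
; \1 = "$", \2 = the digits; the colon is re-added literally after them.
Local $sOut = StringRegExpReplace($sText, '(\$)([0-9]+):', @CRLF & '\1\2:')

ConsoleWrite($sOut & @CRLF)
```

Note that this also breaks before a marker sitting at the very start of the text, which a lookbehind such as (?<!^|:) avoids.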


  • Moderators

@tezhihi please wait 24 hours before bumping your thread. All of our forum members assist as they are able, and not all are in the same time zone; the person best able to assist you may not be online right now.



Sorry about that. Thank you.


What about this?

$txt = FileRead("ahihi.txt")

$output = StringRegExpReplace($txt, '(?<!^|:)(?=\$(?:00|03|10|20|25|30|40|45|110|115|120|T))', @crlf)
FileWrite("output.txt", $output)

Meaning:
Search for positions that are (not preceded by the start of the text or by a colon) and (followed by $00, $03 and so on),
and insert a CRLF at each of these positions
:)
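To see where the pattern above fires, here is a small sketch that runs it on a short invented string: the marker at the very start and the $T right after a colon are left alone, while the other two markers get a line break in front of them.

```autoit
; Sketch: show where the lookaround pattern inserts line breaks (invented sample).
Local $sSample = '$00:$Tintro$10:def$Tghi'
Local $sPattern = '(?<!^|:)(?=\$(?:00|03|10|20|25|30|40|45|110|115|120|T))'

; The match is zero-width: nothing is consumed, a CRLF is inserted at each position.
Local $sResult = StringRegExpReplace($sSample, $sPattern, @CRLF)
; @extended holds the number of replacements performed.
Local $iCount = @extended

ConsoleWrite('Breaks inserted: ' & $iCount & @CRLF & $sResult & @CRLF)
```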



Awesome. I have one more question.

I want to join two $ markers in the text below:

$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - -
$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End
 Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3

That is, join $F with $Tn1 and delete $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -, so that the result is:

$F$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy's attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$E$=P1298*3

Kindly check and advise me. Thank you.


Here it is

$str = '$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - -' & @crlf & _ 
'$Tn1 Subsequent to oral argument by letter, dated December 16, 2005 Scrushy''s attorney submitted to the Court a "Memo" dated 7/19/04 and apparently prepared by a Pamela Anderson. This "Memo" contains unverified, unsworn rank hearsay and the Court has not considered the contents of this document in reaching its decision.$%$?$%- - - - - - - - - - - - - - - - -End' & @crlf & _ 
' Footnotes- - - - - - - - - - - - - - - - -$E$=P1298*3'

;msgbox(0,"", $str)

$str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")
msgbox(0,"", $str2)

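This removes "$%$?$%" together with all the non-$ characters that follow it, up to the next $ character
Please note that you first have to handle any single quote(s) included in the string (inside a '...' string a single quote is written twice, as in Scrushy''s)
:)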
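A short sketch of the same call on a shortened, invented copy of the footnote sample. It shows that [^\$]+ also matches CR and LF, so each removal runs across the line break and stops right before the next "$", which is what joins $F to $Tn1.

```autoit
; Sketch: footnote-marker removal on a shortened, made-up copy of the sample.
; The doubled quote in Scrushy''s is how a ' is written inside a '...' string.
Local $sStr = '$F$%$?$%- - -Footnotes- - -' & @CRLF & _
        '$Tn1 Scrushy''s text.$%$?$%- - -End' & @CRLF & _
        ' Footnotes- - -$E$=P1298*3'

; [^\$]+ matches any character except "$", line breaks included.
Local $sStr2 = StringRegExpReplace($sStr, '\$%\$\?\$%[^\$]+', "")
ConsoleWrite($sStr2 & @CRLF)
```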



Can I match the string "$=<$=T3*1 $=L00180000845001096*001106" with either of these regexes?

1.  "\$=<\$=T([0-9]+)\*([0-9]+)\s\$=L.+?\*(([0-9]+){6})"

or

2. "\$=<\$=T[^\$]+\$=L[^\$]+"

I want to remove it :)


28 minutes ago, mikell said:

I saw that this string exists (in various flavours) several times in the file. To remove them all, you might use a more selective expression to limit the risk of errors:

StringRegExpReplace($str, '\$=<\$=T3\*1\s+\$=L\d+\*\d+', "")

:)
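A quick way to see why the selective pattern is safer is to run it next to the second candidate from the question (\$=<\$=T[^\$]+\$=L[^\$]+) on a made-up line. The broad version keeps eating text after the number until it reaches the next "$".

```autoit
; Sketch: selective vs. broad pattern on an invented line of text.
Local $s = 'abc $=<$=T3*1 $=L00180000845001096*001106 next words$X'

; Selective: only the marker itself is removed.
Local $sSelective = StringRegExpReplace($s, '\$=<\$=T3\*1\s+\$=L\d+\*\d+', "")
; Broad: the final [^\$]+ runs on to the next "$", swallowing " next words".
Local $sBroad = StringRegExpReplace($s, '\$=<\$=T[^\$]+\$=L[^\$]+', "")

ConsoleWrite($sSelective & @CRLF & $sBroad & @CRLF)
```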

Oh, thank you so much!

He received the following bonuses for the years shown:$M05,06,13,13,13,33$GFOR$JANNUAL$JTARGET$JTOTAL$JDESCRIPTION$GYEAR$JBONUS$JBONUS$JINCENTIVE$G$J$J$JBONUS$D$Q2002$B$ 10,000,000$B$ 1,200,000$Y$ 11,200,000$BAnnual Bonus in and for 2002$Q$B$B$Y$Balso paid in 2002. Net income$Q$B$B$Y$Bwas negative $ (467 million.)$Q$B$B$Y$BApproved by the Compensation$Q$B$B$Y$BCommittee on April 29, 2002.$Q$Q2001$B$ 6,500,000$B$ 2,400,000$Y$ 8,900,000$BNet income for 2001 was negative$Q$B$B$Y$B$ (191 million). Bonuses$Q$B$B$Y$Breported in the proxy filed$Q$B$B$Y$BApril 12, 2002.$Q$Q2000$BNone.$B$ 2,154,849$Y$ 2,154,849$BNet income for 2000 was a$Q$B$B$Y$Bnegative $ (364 million).$Q$Q1999$BNone.$B$ 134,031$Y$ 134, 031$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1998$BNone.$B$ 1,577,829$Y$ 1,577,829$BFinancials fraudulent but no$Q$B$B$Y$Baudited restatement.$Q$Q1997$B$ 10,000,000$B$ 2,400,000$Y$ 12,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 17, 1998, as$Q$B$B$Y$Bbeing for and earned in 1997.$Q$Q1996$B$ 8,000,000$B$ 2,400,000$Y$ 10,400,000$BAnnual Bonus was reported in$Q$B$B$Y$Bproxy dated April 9, 1997, as$Q$B$B$Y$Bbeing for, and earned in, 1996.$Q$QTotal$B$ 34,500,000$B$ 12,266,709$Y$ 46,766,709$BBefore Prejudgment Interest$X$=P1298*7 $T5. In each annual proxy on Form 14A, HealthSouth disclosed the following criteria for Incentive Bonuses paid to its executives:$=S$%$?$%Incentive Compensation: In addition to base salary, the $(Compensation$) Committee recommends to the Board of Directors cash incentive compensation for HealthSouth's executives, based on each executive's success in meeting qualitative and quantitative performance goals on an annual basis.

Am I correct if I use the code below?

StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$(?:M))', @crlf)     ; start a new paragraph at each $M (e.g. $M05,06,13,13,13,33$GFOR$JA ......)
StringRegExpReplace(FileRead("ahihi.txt"), '\$=P\d+\*\d+', '')       ; remove all $=P1298*x
StringRegExpReplace(FileRead("ahihi.txt"), '\$=>', '')               ; remove all $=>

 


Yes, you are. They will all work correctly; a few remarks though:

The first one inserts a CRLF just before every $M encountered, so it could be written a little more simply:

StringRegExpReplace(FileRead("ahihi.txt"), '(?=\$M)', @crlf)

If the second one is meant to remove precisely all "$=P1298*x", then it could be made more selective, which is always a good idea when using regex:

StringRegExpReplace(FileRead("ahihi.txt"), '\$=P1298\*\d+', '')

Nothing to say about the third :)
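The three lines in the question each re-read the file and discard the result, since nothing is assigned. A minimal sketch of chaining the suggested versions on one variable, with an invented sample string standing in for FileRead:

```autoit
; Sketch: apply the three replacements one after another to the same string.
; In a real script $s would come from FileRead(@ScriptDir & '\ahihi.txt').
Local $s = 'Intro.$=P1298*7 $T5. Text$=>$M05,06,13$GFOR'

$s = StringRegExpReplace($s, '(?=\$M)', @CRLF)      ; new paragraph before each $M
$s = StringRegExpReplace($s, '\$=P1298\*\d+', '')   ; remove $=P1298*x
$s = StringRegExpReplace($s, '\$=>', '')            ; remove $=>

ConsoleWrite($s & @CRLF)
```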


Replying to mikell's post of 5/6/2017 at 1:49 AM:

Case 1:

I ran the code below:

#include <File.au3>
$a = FileRead(@ScriptDir & '\ahihi.txt')
$b = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF) ; (?=...) captures nothing, so there is no \2
$c = StringRegExpReplace($b, '\$%\$\?\$%[^\$]+', "")
$d = StringRegExpReplace($c, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @crlf)
$e = StringRegExpReplace($d, '(?=\$M)', @crlf)
$f = StringRegExpReplace($e, '(\$F)(\R)(\$Tn)', '\1\3')
$g = StringRegExpReplace($f, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
$h = StringRegExpReplace($g, '\$=P\d+\*\d+', '')
$i = StringRegExpReplace($h, '(\$E)\$%\$\?\$%(\$=B)', '')
$j = StringRegExpReplace($i, '\$I\s+\$U', '$I$U')
$k = StringRegExpReplace($j, '\$=>', '')
FileWrite("output.txt", $k)

The output file has a problem (please see the red box in the image: every $T that follows ' : '). I think this problem will be solved if I use the code below:

Untitled-2.png

StringRegExpReplace(FileRead("ahihi.txt"), '(\:)(\$T)(\$=B)', '\1' & @CRLF & '\2\3')

StringRegExpReplace(FileRead("ahihi.txt"), '([a-z]+)(\:)(\$T)', '\1\2' & @CRLF & '\3')

If not, please advise me.

 

Case 2:

$str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E'   ; a single line, with no @CRLF

$str2 = StringRegExpReplace($str, '\$%\$\?\$%[^\$]+', "")

The result is '$T$F$E', without the text data in between.

I would solve it with the code below:

$str = '$T$F$%$?$%- - - - - - - - - - - - - - - - - -Footnotes- - - - - - - - - - - - - - - - - - n1 Liggett has been rewarded handsomely in the settlement with the Attorneys General for its historic cooperation. $%$?$%- - - - - - - - - - - - - - - - -End Footnotes- - - - - - - - - - - - - - - - -$E'

$str2 = StringRegExpReplace($str, '(\$%\$\?\$%[^\$]+)(n\d+)', '\2')

If not, please advise me.


Case 2 could be done like this:

StringRegExpReplace($str, '\$%\$\?\$%.+?(?=n\d|\$)', "")
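A sketch of the case-2 pattern on a shortened, invented version of the one-line sample. The lazy .+? stops at the first position followed by "n<digit>" (the footnote number) or by "$" (the next marker), so the footnote text itself is kept.

```autoit
; Sketch: lazy removal of the footnote header/trailer (made-up, shortened sample).
Local $str = '$T$F$%$?$%- - -Footnotes- - - n1 Liggett text. $%$?$%- - -End Footnotes- - -$E'

; .+? grows one character at a time until the lookahead sees "n<digit>" or "$".
; By default "." does not match line breaks, so this works within one line.
Local $str2 = StringRegExpReplace($str, '\$%\$\?\$%.+?(?=n\d|\$)', "")
ConsoleWrite($str2 & @CRLF)
```

One assumption about the data worth checking: if the dashed header itself contained "n" followed by a digit, the match would stop there.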

For case 1 this is possible

StringRegExpReplace($str, '(?<=:)(?=\$T\$=B)', @CRLF)
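A small sketch of the case-1 pattern on invented text. Both lookarounds are zero-width, so nothing is removed; a CRLF is inserted only where a colon is directly followed by "$T$=B".

```autoit
; Sketch: break the line between ":" and a following "$T$=B" (made-up text).
Local $str = 'the following criteria:$T$=BIncentive text.$T$=BMore text.'

; (?<=:) = preceded by a colon, (?=\$T\$=B) = followed by "$T$=B".
Local $sOut = StringRegExpReplace($str, '(?<=:)(?=\$T\$=B)', @CRLF)
ConsoleWrite($sOut & @CRLF)
```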

But it gets a little difficult for me, as I don't know exactly which "$x" markers you want to keep and which ones you want to remove :sweating:
Maybe all this could be done more simply, depending on what the final result should be after the complete processing of the initial file


@tezhihi

6 hours ago, mikell said:

Maybe all this could be done simpler depending on what should be the final result

For instance, could this be correct?

$s = FileRead(@ScriptDir & '\ahihi.txt')
$s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "")  ; footnotes
$s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)
$s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "")
$s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ")

$s = StringRegExpReplace($s, '\$T', "")
FileWrite("output.txt", $s)
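A sketch of what the third and fourth lines of the script above do, run on a short invented fragment shaped like the bonus table in the file. (?|...) is a branch-reset group; with no capturing groups inside, it behaves like a plain (?:...) group here.

```autoit
; Sketch: marker removal (step 3) and marker-to-space (step 4) on made-up text.
Local $s = '$GYEAR$JBONUS$D$Q2002$B$ 10,000,000$X$=P1298*7 $T5. Next'

; Step 3: delete runs of markers such as $D, $X, $=B, $=P1298*7.
$s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "")
; Step 4: turn each run of $M $B $J $G $Q $Y markers into a single space.
$s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ")

ConsoleWrite($s & @CRLF)
```

The "$ " before the amount and the $T marker survive both steps, since neither pattern covers them.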

 


13 hours ago, mikell said:

@tezhihi

For instance, could this be correct?

$s = FileRead(@ScriptDir & '\ahihi.txt')
$s = StringRegExpReplace($s, '\R|(\$F)?\$%\$\?\$%[^\$]*', "")  ; using this line deletes some text data
$s = StringRegExpReplace($s, '(?<!^|:)(?=\$(?|00|01|03|10|20|25|30|40|45|110|115|120|200|220|T|M))', @crlf)
$s = StringRegExpReplace($s, '(\$(?|[DXOENUI\?%]|=[BRIS<>]|=[PLT]\d+\*\d+))+', "") ; this deletes all $ markers with their values, but each $ marker stands for a font, a paragraph-start indent or some other data
$s = StringRegExpReplace($s, '(\$[MBJGQY])+', " ") ; just start a new paragraph at $M and end it at $X; I will adjust the rest after all replacements are done

$s = StringRegExpReplace($s, '\$T', "") ; $T is a required segment in the text file (= indent at the start of a paragraph)
FileWrite("output.txt", $s)

 

@mikell oh no :( the output is the same as the output.txt produced by my code :(. Please see my remarks in your code.

 

I will send you the list of data that needs to be deleted from the text file. Please see new 2.txt

I have multiple files like that to modify, so I need to make a tool to process all of them. Please check whether the code below is correct:

#include <File.au3>
$a = FileRead(@ScriptDir & '\ahihi.txt')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(?=\$=B)', '\1' & @CRLF) ; (?=...) captures nothing, so there is no \2
$a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+Footnotes[^\$]+)(n\d+)', '\2')
$a = StringRegExpReplace($a, '\$%\$\?\$%[^\$]+Footnotes[^\$]+', '')
$a = StringRegExpReplace($a, '(\$%\$\?\$%[^\$]+)(\$E)', '\2')
$a = StringRegExpReplace($a, '(?<!^|:)(?=\$(?:00|01|03|10|20|25|30|40|45|110|115|120|T|F|200|220))', @CRLF)
$a = StringRegExpReplace($a, '(?=\$M)', @CRLF)
$a = StringRegExpReplace($a, '(?=\$=S)', @CRLF)
$a = StringRegExpReplace($a, '(?=\$%\$\?\$%)', @CRLF)
$a = StringRegExpReplace($a, '(\$F)(\R)(\$Tn)', '\1\3')
$a = StringRegExpReplace($a, '(\$T)(\R)(\$F)', '\3\1')
$a = StringRegExpReplace($a, '(\s)(\$E)', '\2')
$a = StringRegExpReplace($a, '\$=<\$=T\d+\*\d+\s+\$=L\d+\*\d+', '')
$a = StringRegExpReplace($a, '\$=<\$=T\d+\*[^\$]+\$=L\d+\*\d+', '')
$a = StringRegExpReplace($a, '\$=P\d+\*\d+', '')
$a = StringRegExpReplace($a, '(\$E)\$%\$\?\$%(\$=B)', '')
$a = StringRegExpReplace($a, '\$I\s+\$U', '$I$U')
$a = StringRegExpReplace($a, '\$=>', '')
$a = StringRegExpReplace($a, '(?<=:)(?=\$T\$=B)', @CRLF)
$a = StringRegExpReplace($a, '([A-Z][a-z].+\:)(\$T)', '\1' & @CRLF & '\2')
$a = StringRegExpReplace($a, '(\$E)(\$=B)', '\1' & @CRLF & '\2')
$a = StringRegExpReplace($a, '(\$00:)(.+)', '\1')
$a = StringRegExpReplace($a, '\$03:.+\R', '')
$a = StringRegExpReplace($a, '\$30:.+\R', '')
$a = StringRegExpReplace($a, '(\$200:)(.+)', '\1')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(\R)(\$=B)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:)(\R)(\$T)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:)(\R)(\$%\$\?\$%)', '\1\3')
$a = StringRegExpReplace($a, '(\$120:\$%\$\?\$%)(\R)(\$=B)', '\1\3')
$a = StringRegExpReplace($a, '(\$=S)(\R)(\$%\$\?\$%)', '\1\3')
$a = StringRegExpReplace($a, '(\$%\$\?\$%)(\s)', '\1')
$a = StringRegExpReplace($a, '(\$01:.+)(\R\$%\$\?\$%.+)', '\1')
$a = StringRegExpReplace($a, '\$(?|=L\d+\*\d+)', '')
FileWrite("output.txt", $a)
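Since the goal is a tool for many files, one possible layout (a sketch, not the poster's method) keeps the pattern/replacement pairs in a table and loops over a folder. The table below holds only a few rules from the script above, and the function names _ApplyRules and _ProcessFolder are hypothetical.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical rule table: one row per [pattern, replacement], applied in order.
; Only a few rules from the script above are listed; add the rest as extra rows.
Global $aRules[4][2] = [ _
        ['\$=P\d+\*\d+', ''], _
        ['\$=>', ''], _
        ['(?=\$M)', @CRLF], _
        ['(?<=:)(?=\$T\$=B)', @CRLF]]

; Apply every rule in the table, top to bottom, to one string.
Func _ApplyRules($sText, ByRef $aTable)
    For $i = 0 To UBound($aTable) - 1
        $sText = StringRegExpReplace($sText, $aTable[$i][0], $aTable[$i][1])
    Next
    Return $sText
EndFunc   ;==>_ApplyRules

; Run the rules on every .txt file in $sInDir and write each result, under the
; same name, into $sOutDir. Returns the number of files processed (0 if none).
Func _ProcessFolder($sInDir, $sOutDir)
    Local $aFiles = _FileListToArray($sInDir, "*.txt", $FLTA_FILES)
    If @error Then Return 0
    DirCreate($sOutDir)
    Local $sResult, $hOut
    For $i = 1 To $aFiles[0]
        $sResult = _ApplyRules(FileRead($sInDir & "\" & $aFiles[$i]), $aRules)
        ; $FO_OVERWRITE empties the output file first, so reruns do not pile up text.
        $hOut = FileOpen($sOutDir & "\" & $aFiles[$i], $FO_OVERWRITE)
        FileWrite($hOut, $sResult)
        FileClose($hOut)
    Next
    Return $aFiles[0]
EndFunc   ;==>_ProcessFolder
```

A call such as _ProcessFolder(@ScriptDir & "\input", @ScriptDir & "\output") would then handle a whole folder (the folder names are made up). FileOpen with $FO_OVERWRITE is used because FileWrite given a file name appends, so running the original script twice would add a second copy to output.txt.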

 


I do follow... my previous post was just me playing around, because obviously you need a final file with a very particular formatting
I tried to guess the meaning of the various $x markers (e.g. $F = Footnotes, and so on) but I quickly gave up :sweating:
Actually you have a bunch of SRERs (StringRegExpReplace calls) and some of them are redundant, but they are not so bad and they do the job
To help you I need precise info... the best would be to build a final file manually, so that by comparing I can know exactly what you want :)

 



@mikell OK, I will send you the completed file for comparison when I'm in the office. This file was completed by the best person. I will send you the full information for the regex processing, with all the notes. :)

 

 


6 hours ago, mikell said:

Nice. The more info I have, the more suggestions I can make :)

Hi @mikell, can you check the For Mikell.xls file with my remarks, check EXAMPLE.TXT (this is the result for ahihi.txt) and help me? If you need more information, please let me know.

Oh, I also have a new file for you to try processing: New For Process.txt

