ThomasPowers Posted October 12, 2012

Hello All,

I have an interesting one that I hope you guys may have some insight on. I have text files that are generated automatically by one of our systems. I need to write a script that looks for a line in the text file that starts with a certain sequence, namely 799R. Once found, I need to delete the whole line, the 2 lines above it, and the line below it.

The line two above it will start with a 5, and the line directly above it will start with a 6. The line below will start with an 8.

So the sample is as such:

    5225HTY 3383693141WEBINSTALLATION 12073100010759011651425785
    6261220000123456789 0000001747N/A TOUCH 1075901135123456
    799R1012345678910123456789 50523569874587859
    822512546589754879658965874587458 666999874587445878584

The script would find the 799R in the 3rd line and delete that line, the lines above it, and the one below it.

Now...here's the catch. The lines below it may repeat the 6X and 7X lines, and the final 8 line may be a couple of lines below, like this:

    5225HTY 3383693141WEBINSTALLATION 12073100010759011651425785
    6261220000123456789 0000001747N/A TOUCH 1075901135123456
    799R1012345678910123456789 50523569874587859
    6261220000155555555 0000001747N/A TOUCH 1075901135125555
    799R101234567891012345555 50523569874585555
    822512546589754879658965874587458 666999874587445878584

So...the goal is this:

1. Find the 799R line
2. Delete the 2 lines above (starting with 5 and 6)
3. Delete the line or lines below (could be 6, 7, or 8 to start with)

The final deleted line must NOT start with a 5 (this denotes a new record entry).

All help is appreciated...and I am willing to PayPal $ as payment for help with this issue.

Thanks
Tom P
jchd Posted October 13, 2012 (edited)

Your specs are a little contradictory. You say first:

"Once found, I need to delete the whole line, the 2 lines above it, and the line below it."

Then you go on to explain that a record consists of a 5* line, a series of {6*, 7*} lines and an 8* line, all of which you delete if we go by the sentence above. Or do you keep the 799R* line?

Also, we have no control over what the "final line" will be: only the program producing the data has. So "Final line must NOT start with a 5" seems irrelevant.

Can you post a sample of all possible "record" configurations and say what you want to keep? Only the line headers matter.

Do you have 7* lines (records) which are not 799R*?

Are multiple series of 6* and 7* in a record always in ordered pairs (e.g. 5*, then one or more groups of {6* then 7*}, then 8*), or can we have 5*, 6*, 6*, 6*, 7*, 6*, 6*, 7*, 8*?

I'm wild guessing that a suitable regular expression could do the job in one shot, but we have to settle on very precise rules for that.

EDIT: forgot to mention that forum rules preclude offering money for code.

Edited October 13, 2012 by jchd

Signature:
This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.
SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)
ThomasPowers (Author) Posted October 13, 2012

Sorry about that...didn't mean to violate the rules of the forum. I apologize for not knowing the rules better before posting.

To answer the questions:

"Then, you go on explaining that a record consists of a 5* line, a series of {6*, 7*} lines and a 8* line that you all delete if we refer to the sentence above. Or do you keep the 799R* line?"

- Each record in this file that we are concerned with starts with a 5* line. We wish to keep all records intact unless their 3rd line begins with 799R. If the record has that line, then we wish to delete all lines associated with that record (from the 5* that started the record to the ending 8* line below the 799R line, including the 799R line). We want all of the record gone.

"Only the line headers matter. Do you have 7* lines (records) which are not 799R* ?"

- Yes. There may be lines in the file that start with 7 but are not 799R lines, so we would want those to stay.

"Are multiple series of 6* and 7* in a record always in ordered pairs (e.g. 5* then one or more group of {6* then 7*} then 8*, or can we have 5*, 6*, 6*, 6*, 7*, 6*, 6*, 7*, 8* ?"

- The order would be 5*, 6*, 799R*, then it could be 6*, 7* or 8*. So we could see 6* and 7* a few times before the final 8* line of the record, but we will always see the record start with a 5* line, followed by a 6* line, then the 799R line, then any combination of 6* or 7* lines, then the closing 8* line.

I concur with your idea that a pretty wild regular expression will be the answer. It's just that I am in the dark as to the specifics to make it work, especially against the whole file.

All help is greatly appreciated. If a sample of the file would be helpful, I could take parts of one and paste them up here. I will edit one up tonight and copy it into the forum post.

Thank you for all your insight and help.

TP
Spiff59 Posted October 13, 2012 (edited)

An old-school method (untested):

    #include <Array.au3>
    #include <File.au3>

    Global $file = "logs.txt" ; placeholder: path to the input file
    Global $array, $recordstart, $recordend, $deleteflag
    Global $idx = 0

    _FileReadToArray($file, $array) ; $array[0] receives the line count
    _ArrayDisplay($array)

    For $x = 1 To $array[0]
        Switch StringLeft($array[$x], 1)
            Case "5" ; first line of a record
                $recordstart = $x
            Case "7"
                If StringLeft($array[$x], 4) = "799R" Then $deleteflag = 1
            Case "8" ; last line of a record
                $recordend = $x
        EndSwitch
        If $recordend Then
            If Not $deleteflag Then
                ; keep this record: move it toward the front of the array
                For $y = $recordstart To $recordend
                    $idx += 1
                    $array[$idx] = $array[$y]
                Next
            Else
                $deleteflag = 0
            EndIf
            $recordend = 0
        EndIf
    Next
    ; note: lines that fall outside a 5*...8* record are not copied

    ReDim $array[$idx + 1]
    $array[0] = $idx
    _ArrayDisplay($array)

Edited October 13, 2012 by Spiff59
jchd Posted October 13, 2012 (edited)

Much clearer now. Does this work as you want in all cases?

    Local $s = FileRead('logs.txt')
    Local $t = StringRegExpReplace($s, "(?im)(^5.*\R6.*\R799R.*\R(?:[67].*\R)*8.*\R)", "")
    ConsoleWrite($t)

EDIT before I forget: if you're unsure that the last line of a 799R* record ends with a line break, append @CRLF to the data read or make that precise \R optional, e.g. \R?

Edited October 13, 2012 by jchd
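The one-line pattern above can be easier to check when it is split into commented pieces. The following is only a sketch of that idea: the variable name `$sPattern` is made up for illustration, `logs.txt` is the placeholder file name from the post, and the final line break is written as `\R?` to follow the suggestion about a last record that may not end with a line break.

```autoit
; Sketch only: 'logs.txt' is the placeholder file name used in the post above.
Local $s = FileRead('logs.txt')

; AutoIt strings have no backslash escapes, so "\R" reaches the regex engine as written.
Local $sPattern = "(?im)"      ; i: caseless; m: ^ matches at the start of every line
$sPattern &= "^5.*\R"          ; 5* line that opens the record
$sPattern &= "6.*\R"           ; 6* line directly above the 799R line
$sPattern &= "799R.*\R"        ; the 799R line that marks the record for removal
$sPattern &= "(?:[67].*\R)*"   ; zero or more further 6* or 7* lines
$sPattern &= "8.*\R?"          ; closing 8* line; its line break is optional here

Local $t = StringRegExpReplace($s, $sPattern, "")
ConsoleWrite($t)
```

Building the pattern with `&=` keeps each record rule next to its comment, so a rule change (for example, allowing other line types after the 799R line) touches a single line.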
ThomasPowers (Author) Posted October 15, 2012 (edited)

This appears to be working as we hoped, and I appreciate all your insight. I was able to get my regexp to handle the simple 4-line records, but was stumped by the possible extra 6 and 7 lines after the 799R. We are testing this on multiple records and files now.

What would I add to this to get the records that we are removing dumped off to another file, so I could easily see what is being removed? Right now it's a line-by-line comparison of the old file to the new file, which can get cumbersome as some of these files have 10000 lines.

You have really saved us on this and I cannot thank you enough.

TP

Edited October 15, 2012 by ThomasPowers
ThomasPowers (Author) Posted October 15, 2012

OK...while reading through some other forum posts, I have gotten closer to the finished product. Many thanks to jchd.

My goal has shifted a bit now that we have a regexp string that works. I am looking to load all matches of the RegExp "(?im)(^5.*\R6.*\R799R.*\R(?:[67].*\R)*8.*\R)" into an array and save them off to another file, then use the StringRegExpReplace that jchd created to output to a different file (leaving the parent file untouched).

So far, I have accomplished that with this code:

    Local $s = FileRead('c:\testfile.txt')
    Local $array = StringRegExp($s, "(?im)(^5.*\R6.*\R799R.*\R(?:[67].*\R)*8.*\R)", 3)
    For $i = 0 To UBound($array) - 1
        FileWrite('c:\output\output1.txt', $array[$i])
    Next
    Local $t = StringRegExpReplace($s, "(?im)(^5.*\R6.*\R799R.*\R(?:[67].*\R)*8.*\R)", "")
    FileWrite('c:\output\output2.txt', $t)

This is working great, except for one small thing. The FileWrite command for output2.txt is appending a line at the end of the file and putting a 1 in it. An example is here.

The last lines of the adjusted file in output2.txt should be:

    5271ABCDEFDD 0075901134OUTPUTNOTREAL 1512101500010759011111111111
    6220712324543234455555 000001234567SAMPLETEXT 0075901157898733

But instead the output2 file has this:

    5271ABCDEFDD 0075901134OUTPUTNOTREAL 1512101500010759011111111111
    6220712324543234455555 000001234567SAMPLETEXT 0075901157898733
    1

All help is appreciated....we are almost there!!

TP
kylomas Posted October 15, 2012

ThomasPowers,

One thing to be aware of is that when you use FileWrite in this fashion (with a file name rather than a file handle), it APPENDS whatever you are writing to the end of the file. To make sure that you are getting what you expect from the StringRegExpReplace, use a ConsoleWrite at the end of your code like this:

    ConsoleWrite('+> value of $t = ' & $t & @LF)

kylomas

Signature:
Forum Rules | Procedure for posting code
"I like pigs. Dogs look up to us. Cats look down on us. Pigs treat us as equals." - Sir Winston Churchill
ThomasPowers (Author) Posted October 15, 2012

Kylomas,

Thank you for pointing that out. I can see where I could run into trouble in the future on that. I'll make sure to use a FileOpen command with mode 2 to denote overwrite.

Following your suggestion, when we put

    ConsoleWrite($t)

at the end of the code (instead of the FileWrite), the last lines of the output are:

    5271ABCDEFDD 0075901134OUTPUTNOTREAL 1512101500010759011111111111
    6220712324543234455555 000001234567SAMPLETEXT 0075901157898733
    1>Exit code: 0 Time: 0.230

Since it puts the exit code right after the 1, I didn't see that at first. So it's the StringRegExpReplace line that is adding the 1 on the last line.

Any idea why?

TP
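The overwrite approach mentioned above could look roughly like the following. This is a minimal sketch, not the code that was actually run: the paths are the ones used earlier in the thread (written here with backslashes as an assumption), the variable names are invented for illustration, and the `c:\output` folder is assumed to exist already. It also reads `@extended` after StringRegExpReplace to report how many records were removed, which may save some of the line-by-line comparison.

```autoit
; Sketch only: paths follow the earlier posts; variable names are illustrative.
Local $sPattern = "(?im)(^5.*\R6.*\R799R.*\R(?:[67].*\R)*8.*\R)"
Local $s = FileRead('c:\testfile.txt')

; Mode 2 opens a file for writing and erases any previous contents.
Local $hRemoved = FileOpen('c:\output\output1.txt', 2)
Local $hCleaned = FileOpen('c:\output\output2.txt', 2)
If $hRemoved = -1 Or $hCleaned = -1 Then
    ConsoleWrite('Could not open an output file' & @CRLF)
    Exit 1
EndIf

; Flag 3 returns an array of all matches; @error is set when nothing matched.
Local $aRemoved = StringRegExp($s, $sPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aRemoved) - 1
        FileWrite($hRemoved, $aRemoved[$i])
    Next
EndIf
FileClose($hRemoved)

; Write the cleaned text; @extended holds the number of replacements performed.
Local $t = StringRegExpReplace($s, $sPattern, "")
Local $iCount = @extended
FileWrite($hCleaned, $t)
FileClose($hCleaned)

ConsoleWrite('Records removed: ' & $iCount & @CRLF)
```

Writing through file handles means each run starts from empty output files, so results from an earlier run cannot pile up at the end of output1.txt or output2.txt.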
kylomas Posted October 15, 2012

ThomasPowers,

As I suspected. However, prolonged exposure to regular expressions may cause dizziness or vomiting, so I cannot help you there. You will receive help soon, be patient!

Good Luck,
kylomas
jchd Posted October 16, 2012

I can't reproduce this behavior with a dummy sample. Can you post a short sample input text that behaves the way you mention (adding a line containing '1')?
ThomasPowers (Author) Posted October 16, 2012

You can disregard the line adding a 1 to the end of the output. It turns out that the file we were using was corrupted in testing. When we used fresh files, everything worked great. When we rerun a fresh copy of the file that gave us the 1 on the end, it works great as well. It's just that the original test copy of the file we were using was bad. The program we generate these files from couldn't read our test file anymore either, so the file was the problem.

Everyone here has been great and we really appreciate the help. We will continue to put this through its paces this week, and if anything comes up we'll post back.

Thanks again!!

TP