Posted

I have a text file with 1 million lines, of which I have to remove 800k. I want to remove the 800k lines and keep the 200k lines, but the line order has to remain the same. Removed lines should become empty lines, so there should still be 1 million lines, but only 200k with text.

Posted

You're going to go way over the size limit for variables in AutoIt with that job unless you shuffle things around as you go. If this doesn't need to be automated, it might be less hassle to just do it in a text editor like npp or Sublime with find/replace and a regex.
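One way to read "shuffle things around as you go" is to stream the file line by line instead of holding it all in one variable. This is a minimal sketch under assumptions: the file names `1.txt` and `new.txt` and the marker substring `items4` are placeholders, not anything fixed by the task.

```autoit
#include <FileConstants.au3>

; Placeholder names: adjust the paths and the marker substring to your data.
Local $sMarker = "items4"
Local $hIn = FileOpen("1.txt", $FO_READ)
Local $hOut = FileOpen("new.txt", $FO_OVERWRITE)
If $hIn = -1 Or $hOut = -1 Then Exit MsgBox(0, "Error", "Could not open the files")

Local $sLine
While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file (or a read error)
    ; Blank the line instead of dropping it, so the line count is preserved
    If StringInStr($sLine, $sMarker) Then $sLine = ""
    FileWriteLine($hOut, $sLine) ; appends @CRLF, also for an empty line
WEnd

FileClose($hIn)
FileClose($hOut)
```

Only one line is in memory at a time, so the size of the whole file no longer matters, at the cost of being slower than a single regex pass.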

Posted
49 minutes ago, gruntydatsun said:

you're going to go way over the size limit for variables in autoit

It depends on the size of the lines ;)

;#cs
$txt = ""
For $i = 0 to 1500000
   $txt &= "this is the line of text #" & $i & @crlf
Next
FileWrite("1.txt", $txt)
;#ce

$txt = FileRead("1.txt")
; remove text from lines ending with 12, 14, 16
$new = StringRegExpReplace($txt, '(?m)^.*1[246]$', "")
FileWrite("new.txt", $new)

 

Posted

here's an example of the text file before and after

option: remove lines containing "items4"

There are 200k lines to be removed out of 1 million lines. The removed lines have to stay empty (like in the screenshot), it's a must.

toremovelinefrom.png

linesremoved.png
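For this "items4" case, the whole-file regex approach from the earlier post could be adapted roughly as follows. This is a sketch, assuming the file fits in memory and that `1.txt` and `new.txt` are the input and output names (placeholders). The output is opened in overwrite mode here because `FileWrite` with a plain file name appends to an existing file.

```autoit
#include <FileConstants.au3>

Local $sIn = "items4" ; substring that marks a line for blanking
Local $txt = FileRead("1.txt")

; (?m) makes ^ and $ work per line; \Q...\E treats the substring literally.
; Only the text of a matching line is replaced, its line break is left in place.
Local $new = StringRegExpReplace($txt, '(?m)^.*\Q' & $sIn & '\E.*$', "")

Local $hOut = FileOpen("new.txt", $FO_OVERWRITE)
If $hOut = -1 Then Exit MsgBox(0, "Error", "Could not open new.txt")
FileWrite($hOut, $new)
FileClose($hOut)
```

Because the line breaks are untouched, the number of lines in `new.txt` should equal the number in the input.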

  • 1 year later...
Posted (edited)

If you don't want to grab the matches into an array using the usual StringRegExp, you can get the result as a string with StringRegExpReplace by introducing a kind of negation that says: if a line does NOT contain "items4", remove it.
Here is a way :

$txt = "line1,items1,testtext1" & @crlf & _ 
    "line2,items4,testtext2" & @crlf & _ 
    "line3,items3,testtext3" & @crlf & _ 
    "line4,items4,testtext4" & @crlf & _ 
    "line5,items5,testtext5" & @crlf & _ 
    "line6,items6,testtext6" & @crlf

$in = "items4"

$res = StringRegExpReplace($txt, '(?m)^(.*\Q' & $in & '\E.*(*SKIP)(*F)|.*)$\R?', "")
Msgbox(0,"", $res)

In this alternation, the left side first matches the lines containing "items4", then (*SKIP)(*F) says "No no, I don't want this". All other lines (those not containing "items4") are then matched by the right side of the alternation and replaced by "".

Edit
This example doesn't replace the removed lines with blank lines. To get blank lines, just remove \R? (which means an optional newline sequence).
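Spelled out, the blank-line variant described in the edit might look like this. It is only a sketch reusing the sample data from the post; the variable names `$keep` and `$res` are illustrative.

```autoit
Local $txt = "line1,items1,testtext1" & @CRLF & _
        "line2,items4,testtext2" & @CRLF & _
        "line3,items3,testtext3" & @CRLF & _
        "line4,items4,testtext4" & @CRLF & _
        "line5,items5,testtext5" & @CRLF & _
        "line6,items6,testtext6" & @CRLF

Local $keep = "items4"

; Same pattern as above, minus the trailing \R? :
; lines containing $keep are skipped via (*SKIP)(*F) and survive,
; every other line is emptied but keeps its line break.
Local $res = StringRegExpReplace($txt, '(?m)^(.*\Q' & $keep & '\E.*(*SKIP)(*F)|.*)$', "")
ConsoleWrite($res)
```

With this sample, only the line2 and line4 rows should keep their text, while rows 1, 3, 5 and 6 remain as empty lines.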

:)

Edited by mikell
Posted

Hi to both of you :)
Mikell, to blank all lines except the "items4", I just tried a "negative lookahead" (my 1st one !).  Do you think it's correct  ?
Based on your example :

$txt = "line1,items1,testtext1" & @crlf & _
    "line2,items4,testtext2" & @crlf & _
    "line3,items3,testtext3" & @crlf & _
    "line4,items4,testtext4" & @crlf & _
    "line5,items5,testtext5" & @crlf & _
    "line6,items6,testtext6" & @crlf

$res = StringRegExpReplace($txt, '(?m)^(?!.*items4).*$', "...")
Msgbox(0, "Dots dots", $res)

dotsdots.jpg

The 3 dots "..." are here just to make blank lines clearly visible in the image. Replace with "" when desired

"I think you are searching a bug where there is no bug... don't listen to bad advice."

Posted (edited)

Hi Nine :)
Let's hope Mikell, Jchd or another RegExp guru will bring you the explanation you desire
I was lucky enough to have this "negative lookahead" working, after I read that "you can use any regular expression inside the lookahead (note that this is not the case with lookbehind)"

A complementary question could be, in the preceding example :
Why a negative lookahead (?!.* doesn't return the same results as .*(?!
when a positive lookahead (?=.* returns the same results as .*(?=
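One possible reading of the difference, offered as a sketch rather than a definitive answer: at the start of a line, `(?!.*items4)` inspects the whole line before anything is consumed, so it fails on any line containing "items4". In `.*(?!items4)`, the greedy `.*` first runs to the end of the line, where "items4" certainly does not follow, so the negative lookahead succeeds on every line. A positive lookahead has no such easy way out: `.*(?=items4)` must backtrack to a position where "items4" really follows, which only exists on lines containing it.

```autoit
Local $txt = "line1,items1,testtext1" & @CRLF & _
        "line2,items4,testtext2" & @CRLF & _
        "line3,items3,testtext3" & @CRLF

; Lookahead first: the whole line is checked for "items4" before matching.
Local $a = StringRegExpReplace($txt, '(?m)^(?!.*items4).*$', "...")

; Lookahead after .* : it only needs ONE position not followed by "items4",
; and the end of the line always qualifies, so every line matches.
Local $b = StringRegExpReplace($txt, '(?m)^.*(?!items4).*$', "...")

ConsoleWrite($a & "----" & @CRLF & $b)
```

In `$a` the line2 row should survive and the other two become "...", whereas in `$b` all three rows become "...".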

 

Edited by pixelsearch


Posted

Fun for adding multiple criteria as well (I think; this could be all wrong).

$txt = "line1,items1,testtext1" & @crlf & _
    "line2,items4,testtext2" & @crlf & _
    "line3,items3,testtext3" & @crlf & _
    "line4,items4,testtext4" & @crlf & _
    "line5,items5,testtext5" & @crlf & _
    "line6,items6,testtext6" & @crlf

;~ $in = "(items4)"
$in = "(items4|items5)"

$s = StringRegExpReplace($txt , "(line.*" & $in & ".*?)\s" , " ")

msgbox(0, '' , $s)
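The same multiple-criteria idea can also be combined with the per-line anchoring used earlier in the thread, so that matching lines are blanked and every line break stays in place. A minimal sketch, assuming the criteria are plain substrings joined with `|` (the name `$criteria` is illustrative):

```autoit
Local $txt = "line1,items1,testtext1" & @CRLF & _
        "line2,items4,testtext2" & @CRLF & _
        "line3,items3,testtext3" & @CRLF & _
        "line4,items4,testtext4" & @CRLF & _
        "line5,items5,testtext5" & @CRLF & _
        "line6,items6,testtext6" & @CRLF

; Alternation of substrings; a line containing any of them is emptied.
Local $criteria = "items4|items5"
Local $s = StringRegExpReplace($txt, '(?m)^.*(?:' & $criteria & ').*$', "")

ConsoleWrite($s)
```

With the sample data, the line2, line4 and line5 rows should come out empty while the total number of lines stays at six.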

 


Posted
7 hours ago, pixelsearch said:

Do you think it's correct  ?

Yes it is. Nice catch  :)
And if you know why it works and how it operates, then you can easily answer Nine  ;)
 

 

Posted (edited)

@mikell Thank you!!! Exactly what I was trying to do. Appreciate the explanation too.

Edit: you may be interested to know that, with your help, I was able to create a script that runs through 365 files, each with over 200,000 lines of data, and pinpoints all the key information into a separate file (about 300,000 lines long), with a program execution time of about 5 minutes. Man, I love AutoIt and its community!

Thanks everyone else as well :)

Edited by AnonymousX
