lostandconfused

Remove a line in a text document if it starts with a certain word

11 posts in this topic

Hello,

Here is my situation. Most of the lines in the text document start with a word that is needed ("neededword"). Some lines start with another word ("nonneededword") and I would like to delete all of those lines.

Here is what I have so far:

#include <String.au3>
#include <MsgBoxConstants.au3>
#include <File.au3>
#include <AutoItConstants.au3>
#Include <Array.au3>

Local $read = FileRead(@ScriptDir & "\input.txt") ;read file
Local $string2 = StringRegExpReplace($read, "(?U)(?i)(?s)nonneededword.*neededword", "neededword")
FileWrite(@ScriptDir & '\output.txt', $string2)

At this time, the script above removes most lines that start with "nonneededword", except in two situations:

1. When two lines in a row start with a nonneeded word, only the first one is deleted.

2. If the last line of the text file starts with a nonneeded word, it is not deleted.

Can anyone assist me by pointing in the right direction?

Thank you!

 




My inclination, unless the file is huge, would be to read it in using _FileReadToArray.  Then loop through the array.  You can either delete the array entries with the bad word, then write the file using _FileWriteFromArray or just loop through the array writing to the file a line at a time skipping those array elements that have the bad word.  The first way is probably both more efficient and easier to code.  See help file for examples for those two functions.
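
A minimal sketch of the second variant described above (loop through the array and write only the lines to keep). The helper name _RemoveLinesStartingWith and its parameters are made up for illustration; it assumes the default _FileReadToArray behaviour of storing the line count in element [0], and it uses a plain prefix comparison, so "nonneededwords" would also be treated as a match.

```autoit
#include <File.au3>

; Hypothetical helper: copies $sInFile to $sOutFile, skipping every line
; that starts with $sBadWord (case-insensitive). Returns True on success.
Func _RemoveLinesStartingWith($sInFile, $sOutFile, $sBadWord)
    Local $aLines
    ; With the default flags, $aLines[0] holds the number of lines read.
    If Not _FileReadToArray($sInFile, $aLines) Then Return False

    Local $hOut = FileOpen($sOutFile, 2) ; 2 = open for writing, erase previous contents
    If $hOut = -1 Then Return False

    Local $iLen = StringLen($sBadWord)
    For $i = 1 To $aLines[0]
        ; "<>" compares strings case-insensitively in AutoIt.
        If StringLeft($aLines[$i], $iLen) <> $sBadWord Then
            FileWriteLine($hOut, $aLines[$i])
        EndIf
    Next

    FileClose($hOut)
    Return True
EndFunc

; Example call, using the file names from the first post:
_RemoveLinesStartingWith(@ScriptDir & "\input.txt", @ScriptDir & "\output.txt", "nonneededword")
```

Writing the kept lines directly avoids deleting array elements one by one, and consecutive unwanted lines or an unwanted last line need no special handling.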

 

16 minutes ago, MilesAhead said:

My inclination, unless the file is huge, would be to read it in using _FileReadToArray.  Then loop through the array.  You can either delete the array entries with the bad word, then write the file using _FileWriteFromArray or just loop through the array writing to the file a line at a time skipping those array elements that have the bad word.  The first way is probably both more efficient and easier to code.  See help file for examples for those two functions.

 

Trying this now. I'm not really good with loops. Can someone show me an example please?

13 minutes ago, JohnOne said:

Is that regex removing whole lines? It looks like it's just replacing words.

Might as well use StringReplace.

It is currently removing whole lines that start with "nonneededword" if the next line starts with "neededword". The issue is it is currently also deleting "neededword" from the start of the next line, so I had to tell it to re-add the "neededword" after the delete. Will try your solution as well.

 

Thank you both!


#5 ·  Posted (edited)

Newline syntax in regexp is quite confusing. Anyway this seems to work okay with CRLF as the newline. This can easily be modified.

Local $sText = 'never compromise principles' & @CRLF & _
    'keep this' & @CRLF & _
    'nope get shut' & @CRLF & _
    'nowhere else' & @CRLF & _
    'keep not delete' & @CRLF

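Local $sAvoid = 'no|not|nope|never|nah' ; words that mark a line for deletion
; Builds: ((?m)^no\b.+(\r\n)?|^not\b.+(\r\n)?|...|^nah\b.+(\r\n)?)
Local $sRegExp = '((?m)^' & StringReplace($sAvoid, '|','\b.+(\r\n)?|^') & '\b.+(\r\n)?)'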

MsgBox(0, "", StringRegExpReplace($sText, $sRegExp, ''))

 

Edited by czardas


#6 ·  Posted (edited)

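Similar idea, but this collects the lines that start with the needed word instead of deleting the others:

#include <Array.au3>

Local $sText = 'keep never compromise principles' & @CRLF & _
    'keep this' & @CRLF & _
    'not keep' & @CRLF & _
    'keep' & @CRLF & _
    'keep not delete' & @CRLF

Local $sNeededWord = 'keep'

; Flag 3 returns an array of all captured groups.
; Each line must end with a CR (\r) for the pattern to match it.
Local $aMatch = StringRegExp($sText, "(?:\A|\n)(" & $sNeededWord & "\s*.*?)\r", 3)

MsgBox(0, '', _ArrayToString($aMatch, @CRLF))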

 

Edited by iamtheky


This example deletes all the lines that do not start with any of the words in $sNeededWords.

Local $sText = 'Keep never compromise principles.' & @CRLF & _
        'And delete this line if "and" is not needed word.' & @CRLF & _
        'Keep this' & @CRLF & _
        'not keep this line' & @CRLF & _
        'nor this line' & @CRLF & _
        'keep' & @CRLF & _
        'keeper not keep, so delete this line.'; & @CRLF

Local $sNeededWords = 'keep|and' ; When more than one needed word, separate the words with "|".

$aMatch = StringRegExpReplace($sText, "(?im)^(?!(" & $sNeededWords & ")\b).*\R?", "") ; Delete the lines that do not start with $sNeededWords - "keep" and "and".

MsgBox(0, '"keep" & "and"', $aMatch)

 


If there is only one "nonneededword", the only change needed in your code from post #1 is this:

Local $string2 = StringRegExpReplace($read, "(?im)^nonneededword.*\R?", "")

If there are several, the regex can easily be adapted using an alternation, as shown in the previous examples.
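
A short sketch of that adaptation, assuming a second unwanted word "otherword" that is made up for illustration. The sample text includes two unwanted lines in a row and an unwanted last line without a trailing line break, the two cases from post #1.

```autoit
; Hypothetical word list; separate the alternatives with "|".
Local $sUnwanted = "nonneededword|otherword"

Local $sText = "neededword one" & @CRLF & _
        "nonneededword two" & @CRLF & _
        "otherword three" & @CRLF & _
        "neededword four" & @CRLF & _
        "nonneededword five"

; (?i) = case-insensitive, (?m) = ^ matches at the start of every line.
; \b stops "otherwordly" from matching; \R? also removes the line break, if there is one.
Local $sResult = StringRegExpReplace($sText, "(?im)^(?:" & $sUnwanted & ")\b.*\R?", "")

ConsoleWrite($sResult & @CRLF)
```

Only the two lines starting with "neededword" should remain. If the words can contain regex metacharacters, they would need escaping before being placed in the pattern.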


#9 ·  Posted (edited)

The last two examples are much better than my attempt. :) When I used \R (for the first time, mind), I couldn't get it to work.

Edited by czardas


Yes, \R is very useful as it matches any newline sequence. For the same reason (it has no fixed length), it can't be used in a lookbehind, a limitation that should be pointed out in the help file  :)
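
A small illustration of the "any newline sequence" point, using a made-up test string with three different line-break styles:

```autoit
; One string containing CRLF, a lone LF and a lone CR.
Local $sMixed = "a" & @CRLF & "b" & @LF & "c" & @CR & "d"

; \R treats CRLF as a single line break and also matches a lone LF or CR.
Local $sJoined = StringRegExpReplace($sMixed, "\R", "|")

ConsoleWrite($sJoined & @CRLF) ; a|b|c|d
```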


Thank you everyone for the awesome help! This forum is great!
