
Remove a line in a text document if it starts with certain word



Hello,

Here is my situation. Most of the lines in the text document start with a word that is needed ("neededword"). Some lines start with another word ("nonneededword") and I would like to delete all of those lines.

Here is what I have so far:

#include <String.au3>
#include <MsgBoxConstants.au3>
#include <File.au3>
#include <AutoItConstants.au3>
#include <Array.au3>

Local $read = FileRead(@ScriptDir & "\input.txt") ; read the whole file into one string
; (?U) makes .* lazy, (?i) ignores case, (?s) lets "." match newlines too
Local $string2 = StringRegExpReplace($read, "(?U)(?i)(?s)nonneededword.*neededword", "neededword")
FileWrite(@ScriptDir & '\output.txt', $string2) ; FileWrite appends if output.txt already exists

At the moment, the script above removes most lines that start with "nonneededword", except in two situations:

1. When two consecutive lines start with "nonneededword", only the first one is deleted.

2. If the last line of the file starts with "nonneededword", it is not deleted.

Can anyone assist me by pointing in the right direction?

Thank you!

 


Unless the file is huge, my inclination would be to read it in with _FileReadToArray and then loop through the array. You can either delete the array entries that start with the bad word and then write the file with _FileWriteFromArray, or loop through the array and write the file one line at a time, skipping the elements that start with the bad word. The first way is probably both more efficient and easier to code. See the help file examples for those two functions.
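A minimal sketch of the loop described above, taking the second option (write the kept lines one at a time and skip the rest). It assumes a current AutoIt release, where `_FileReadToArray` accepts the `$FRTA_NOCOUNT` flag. The helper name `_CopyWithoutLinesStartingWith` is made up for illustration, and the regex test (whole word, case-insensitive, at the very start of the line) is an assumption about what "starts with" should mean here.

```autoit
#include <File.au3>          ; _FileReadToArray
#include <FileConstants.au3> ; $FRTA_NOCOUNT, $FO_OVERWRITE

; Hypothetical helper: copy $sInFile to $sOutFile, skipping every line that
; starts with the whole word $sBadWord (case-insensitive).
; Returns the number of lines written, or -1 if a file could not be read or opened.
Func _CopyWithoutLinesStartingWith($sInFile, $sOutFile, $sBadWord)
    Local $aLines
    ; $FRTA_NOCOUNT: the array holds only the lines, with no line count in element [0]
    If Not _FileReadToArray($sInFile, $aLines, $FRTA_NOCOUNT) Then Return -1

    Local $hOut = FileOpen($sOutFile, $FO_OVERWRITE) ; replace any old output
    If $hOut = -1 Then Return -1

    ; ^ anchors at the start of the line, \Q...\E takes the word literally,
    ; and \b stops "word" from also matching "wordy"
    Local $sPattern = "(?i)^\Q" & $sBadWord & "\E\b"
    Local $iWritten = 0
    For $i = 0 To UBound($aLines) - 1
        If Not StringRegExp($aLines[$i], $sPattern) Then
            FileWriteLine($hOut, $aLines[$i]) ; FileWriteLine adds @CRLF after each line
            $iWritten += 1
        EndIf
    Next
    FileClose($hOut)
    Return $iWritten
EndFunc
```

For the files from the original post: `_CopyWithoutLinesStartingWith(@ScriptDir & "\input.txt", @ScriptDir & "\output.txt", "nonneededword")`. The first option (remove elements with `_ArrayDelete`, then save with `_FileWriteFromArray`) works as well; if you go that way, loop backwards with `Step -1` so each deletion does not shift elements that have not been checked yet.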

 


Replying to MilesAhead:

Trying this now. I'm not very good with loops. Can someone show me an example, please?

JohnOne said:

Is that regex removing whole lines? It looks like it's just replacing words.

Might as well use StringReplace.

It currently removes whole lines that start with "nonneededword" when the next line starts with "neededword". The problem is that it also deletes "neededword" from the start of that next line, so I had the replacement put "neededword" back. I will try your solution as well.

 

Thank you both!
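Why the original pattern misses the two cases from the first post, at least with the placeholder words used here: it is not anchored to the start of a line, and `(?s)` lets the lazy `.*` run across line breaks until the next "neededword". Because "nonneededword" itself contains "neededword", a match that starts on one unwanted line stops inside the next unwanted line, and the replacement leaves that line reading "neededword ...". An unwanted last line has no "neededword" after it, so it never matches. A line-anchored pattern avoids both problems. This is a minimal sketch; the helper name `_DeleteLinesStartingWith` is hypothetical, and `(*ANYCRLF)` is added as an assumption so that CRLF, LF and CR all count as line ends.

```autoit
; Hypothetical helper: delete every line that starts with the whole word $sWord.
; (*ANYCRLF): CRLF, LF and CR all count as line endings
; (?i): ignore case; (?m): ^ matches at the start of every line
; \Q...\E: take the word literally; \b: "word" must not continue into "wordy"
; .*: the rest of that line (does not cross a line break without (?s))
; \R?: the line's own line ending, if there is one (the last line may have none)
Func _DeleteLinesStartingWith($sText, $sWord)
    Return StringRegExpReplace($sText, "(*ANYCRLF)(?im)^\Q" & $sWord & "\E\b.*\R?", "")
EndFunc
```

In the original script, `$string2 = _DeleteLinesStartingWith($read, "nonneededword")` would take the place of the existing `StringRegExpReplace` call, with no need to put "neededword" back afterwards.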


Newline syntax in regular expressions can be confusing. Anyway, this seems to work with CRLF as the newline, and it is easy to adapt to other line endings.

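Local $sText = 'never compromise principles' & @CRLF & _
    'keep this' & @CRLF & _
    'nope get shut' & @CRLF & _
    'nowhere else' & @CRLF & _
    'keep not delete' & @CRLF

; Lines starting with any of these whole words are deleted ("|" separates the words)
Local $sAvoid = 'no|not|nope|never|nah'
; Builds ((?m)^no\b.*(\r\n)?|^not\b.*(\r\n)?|...): each branch matches one whole line
; starting with that word plus its CRLF; \b stops "no" from matching "nowhere", and
; .* also covers a line that holds nothing but the word
Local $sRegExp = '((?m)^' & StringReplace($sAvoid, '|', '\b.*(\r\n)?|^') & '\b.*(\r\n)?)'

MsgBox(0, "", StringRegExpReplace($sText, $sRegExp, ''))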
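One way to make this independent of the line ending, offered as a sketch rather than as the code above: `\R` matches CRLF, LF or CR, and `(*ANYCRLF)` at the start of the pattern asks PCRE to treat all three as line ends for `^` and `.`. Listing the words once inside `(?:...)` also removes the need for `StringReplace`. The helper name `_DeleteLinesStartingWithAny` is hypothetical.

```autoit
; Hypothetical helper: delete every line that starts with one of the
; "|"-separated words in $sAvoid, whether lines end in CRLF, LF or CR.
Func _DeleteLinesStartingWithAny($sText, $sAvoid)
    ; (*ANYCRLF): CRLF, LF and CR all count as line endings for ^ and "."
    ; (?m): ^ matches at the start of every line, not only of the whole string
    ; (?:...): the alternatives, listed once; \b keeps "no" from matching "nowhere"
    ; .*: rest of the line; \R?: that line's ending, if any
    Return StringRegExpReplace($sText, '(*ANYCRLF)(?m)^(?:' & $sAvoid & ')\b.*\R?', '')
EndFunc
```

With the sample above, `MsgBox(0, "", _DeleteLinesStartingWithAny($sText, $sAvoid))` should show the same three lines ("keep this", "nowhere else", "keep not delete"), and it keeps doing so if some lines end in @LF only.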

 

Edited by czardas

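A similar approach, this time extracting the lines that start with the needed word:

#include <Array.au3>

Local $sText = 'keep never compromise principles' & @CRLF & _
    'keep this' & @CRLF & _
    'not keep' & @CRLF & _
    'keep' & @CRLF & _
    'keep not delete' & @CRLF

Local $sNeededWord = 'keep'

; Flag 3 returns an array of every match of the capture group. A line qualifies if it
; starts the text or follows a @LF, begins with the whole word $sNeededWord (\b), and
; runs up to, but not including, the next CR/LF (so a last line without @CRLF counts too)
Local $aMatch = StringRegExp($sText, "(?:\A|\n)(" & $sNeededWord & "\b[^\r\n]*)", 3)

MsgBox(0, '', _ArrayToString($aMatch, @CRLF))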

 

Edited by iamtheky


This example deletes all the lines that do not start with any of the words in $sNeededWords.

Local $sText = 'Keep never compromise principles.' & @CRLF & _
        'And delete this line if "and" is not a needed word.' & @CRLF & _
        'Keep this' & @CRLF & _
        'not keep this line' & @CRLF & _
        'nor this line' & @CRLF & _
        'keep' & @CRLF & _
        'keeper not keep, so delete this line.' ; last line deliberately has no trailing @CRLF

Local $sNeededWords = 'keep|and' ; When there is more than one needed word, separate the words with "|".

; Delete every line that does not start with one of $sNeededWords ("keep" or "and"), ignoring case.
; \b stops "keep" from matching "keeper"; \R? removes the deleted line's line ending as well.
Local $sResult = StringRegExpReplace($sText, "(?im)^(?!(" & $sNeededWords & ")\b).*\R?", "")

MsgBox(0, '"keep" & "and"', $sResult)

 
