Removing duplicates from file


covaks

Is there a quicker way to do this? I want to read one file (1.8MB in size, 253k words, one per line) and remove all the words that are found in another file (which has about 12,000 words).

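#include <File.au3>
#include <Array.au3>

Global $In, $Bad

; _FileReadToArray sizes the arrays itself; element [0] holds the line count.
_FileReadToArray("C:\Badwords.txt", $Bad)
_FileReadToArray("C:\wordlist.txt", $In)

Dim $Out[$In[0] + 1]
$Out[0] = 0 ; number of words kept so far

For $y = 1 To $In[0]
    ; A word is dropped only if it matches some bad word, so decide per word, not per pair.
    $found = False
    For $x = 1 To $Bad[0]
        If $Bad[$x] = $In[$y] Then
            $found = True
            ExitLoop
        EndIf
    Next
    If Not $found Then
        $Out[0] += 1
        $Out[$Out[0]] = $In[$y]
    EndIf
Next

; Write only the filled part of $Out, skipping the count in element [0].
_FileWriteFromArray("C:\clean.txt", $Out, 1, $Out[0])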

I wouldn't use an array for the output unless you need that data in an array for something other than writing it to a file.

Concatenate the words into a single string and FileWrite it in one go after the loop. That should be more efficient.

Also, replacing the inner loop with StringRegExpReplace (or, if your data is strictly one word per line as you say, StringInStr plus StringReplace) could be much faster than iterating through the whole of the larger array for each bad word. This means ditching the $In array too.
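
To illustrate just the concatenation tip, here is a rough sketch that keeps the nested loop from the first post but builds one output string instead of an $Out array. The file paths are the ones from the question; the variable names ($aIn, $aBad, $sOut, $bBad, $hOut) are only illustrative, and opening clean.txt in overwrite mode is an assumption about what is wanted.

```autoit
#include <File.au3>

Global $aIn, $aBad, $sOut = "", $bBad, $hOut

_FileReadToArray("C:\wordlist.txt", $aIn)
_FileReadToArray("C:\Badwords.txt", $aBad)

For $y = 1 To $aIn[0]
    ; Check the current word against the bad-word list.
    $bBad = False
    For $x = 1 To $aBad[0]
        If $aIn[$y] = $aBad[$x] Then
            $bBad = True
            ExitLoop
        EndIf
    Next
    ; Append kept words to one string instead of filling an output array.
    If Not $bBad Then $sOut &= $aIn[$y] & @CRLF
Next

; Mode 2 overwrites an existing file (assumption: a fresh clean.txt is wanted).
$hOut = FileOpen("C:\clean.txt", 2)
FileWrite($hOut, $sOut)
FileClose($hOut)
```

This still compares every word against every bad word, so it only saves the array bookkeeping and the per-line writes; the string-replacement approach is what removes the inner loop.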

Edited by Siao

"be smart, drink your wine"


Which would be something like this:

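#include <File.au3>

Global $aBad, $sIn

$sIn = FileRead("wordlist.txt")
_FileReadToArray("badwords.txt", $aBad)

For $x = 1 to $aBad[0]
    ; Match a whole line only; \Q...\E treats the word literally, so regex metacharacters in it are harmless.
    ; Assumes CRLF line endings. Unlike the = comparison, this match is case-sensitive.
    $sIn = StringRegExpReplace($sIn, '(\A|\n)\Q' & $aBad[$x] & '\E(\r|\z)', '')
Next

; Note: FileWrite with a filename appends if clean.txt already exists.
FileWrite("clean.txt", $sIn)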

or

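#include <File.au3>

Global $aBad, $sIn

; Normalise to LF and wrap the text in @LF so every word is delimited on both sides.
$sIn = @LF & StringStripCR(FileRead("wordlist.txt")) & @LF
_FileReadToArray("badwords.txt", $aBad)

For $x = 1 to $aBad[0]
    ; Anchoring on both sides stops "cat" from also eating the tail of "tomcat".
    ; An immediately repeated copy of the same word would survive this single pass.
    $sIn = StringReplace($sIn, @LF & $aBad[$x] & @LF, @LF)
Next

; Drop the @LF added at the front and one trailing @LF, then restore CRLF.
$sIn = StringTrimLeft($sIn, 1)
If StringRight($sIn, 1) = @LF Then $sIn = StringTrimRight($sIn, 1)
$sIn = StringReplace($sIn, @LF, @CRLF)

FileWrite("clean.txt", $sIn)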
