How do I delete redundant elements?

HyperChao · July 13, 2005

Say I have a large text file with a relatively short string on each line. These strings may not be unique. How can I efficiently process the file to delete repeated entries?

LxP · July 13, 2005

You need to read through the file, building an array of unique strings, and then write that array back to the file. Here's some pseudo-code:

open the file

while (there's another line to read)
    read the line
    search the array for that line
    if (the array doesn't contain that line)
        add it
WEnd

close the file
delete original file
write array to that location
; (there's most probably a UDF for this,
; or maybe AutoIt natively supports it via its FileXXX() functions)

This shouldn't be too hard to code. Let us know if you get stuck!
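For anyone who does get stuck, here is a minimal AutoIt sketch of the pseudo-code above. It is only one way to do it: `_DedupeFileLinear` and its parameters are hypothetical names, and it writes the unique lines to a separate output file instead of deleting and rewriting the original, which is safer if the script fails partway through. Lines are compared with `==`, so the match is case-sensitive.

```autoit
; Hypothetical helper: copy the unique lines of $sInPath to $sOutPath,
; keeping the first occurrence of each line in its original order.
; Returns the number of unique lines written, or -1 if a file can't be opened.
Func _DedupeFileLinear($sInPath, $sOutPath)
    Local $aUnique[1], $iCount = 0, $sLine, $bFound, $i
    Local $hIn = FileOpen($sInPath, 0)      ; 0 = read mode
    If $hIn = -1 Then Return -1
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop             ; @error = -1 at end of file
        ; Linear search: compare against every line kept so far.
        $bFound = False
        For $i = 0 To $iCount - 1
            If $aUnique[$i] == $sLine Then  ; == compares strings case-sensitively
                $bFound = True
                ExitLoop
            EndIf
        Next
        If Not $bFound Then
            ; Double the array when it is full instead of growing it one slot at a time.
            If $iCount = UBound($aUnique) Then ReDim $aUnique[$iCount * 2]
            $aUnique[$iCount] = $sLine
            $iCount += 1
        EndIf
    WEnd
    FileClose($hIn)

    Local $hOut = FileOpen($sOutPath, 2)    ; 2 = write mode, erasing previous contents
    If $hOut = -1 Then Return -1
    For $i = 0 To $iCount - 1
        FileWriteLine($hOut, $aUnique[$i])  ; appends @CRLF to each line
    Next
    FileClose($hOut)
    Return $iCount
EndFunc
```

Every new line is compared against every line kept so far, so the total work grows with the square of the number of unique lines. That is the slowdown blindwig points out for a flat array.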

buzz44 · July 13, 2005

Each time you read a line, you could add it to an array, but first check the array to see whether that line has already been read.

blindwig · July 13, 2005

Like the others (Burrup, LxP) said, you need to read each line into an array, then check for its existence before writing it out to an output file.

But instead of using a flat array (which gets slower the bigger the file gets), I'd recommend using my binary tree algorithm to store the strings:

http://www.autoitscript.com/forum/index.php?showtopic=13114
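blindwig's UDF isn't reproduced in this thread, so its API isn't shown here. As a rough, hypothetical illustration of the same idea, the sketch below keeps the lines in a plain (unbalanced) binary search tree stored in a 2-D array, so each lookup walks a single path from the root instead of scanning every stored entry. `_TreeInsert` and `_DedupeFileTree` are made-up names, not part of blindwig's UDF; equality uses `==` and ordering uses `StringCompare(..., 1)`, so the check is case-sensitive.

```autoit
; Hypothetical binary search tree kept in one 2-D array:
;   $aNodes[$i][0] = the line, [$i][1] = left child index, [$i][2] = right child index (-1 = none)
; Returns True if $sKey was added, False if it was already in the tree.
Func _TreeInsert(ByRef $aNodes, ByRef $iCount, $sKey)
    Local $iNode = 0, $iChild
    If $iCount > 0 Then
        While 1
            If $sKey == $aNodes[$iNode][0] Then Return False   ; duplicate: nothing to add
            $iChild = 1                                         ; column 1 = left subtree
            If StringCompare($sKey, $aNodes[$iNode][0], 1) > 0 Then $iChild = 2  ; 1 = case-sensitive
            If $aNodes[$iNode][$iChild] = -1 Then
                $aNodes[$iNode][$iChild] = $iCount              ; link the new node here
                ExitLoop
            EndIf
            $iNode = $aNodes[$iNode][$iChild]                   ; descend one level
        WEnd
    EndIf
    ; Append the new node, doubling the array when it is full.
    If $iCount = UBound($aNodes) Then ReDim $aNodes[$iCount * 2][3]
    $aNodes[$iCount][0] = $sKey
    $aNodes[$iCount][1] = -1
    $aNodes[$iCount][2] = -1
    $iCount += 1
    Return True
EndFunc

; Hypothetical driver: write each line to $sOutPath the first time it is seen.
; Returns the number of unique lines, or -1 if a file can't be opened.
Func _DedupeFileTree($sInPath, $sOutPath)
    Local $aNodes[1][3], $iCount = 0, $sLine
    Local $hIn = FileOpen($sInPath, 0)      ; 0 = read mode
    If $hIn = -1 Then Return -1
    Local $hOut = FileOpen($sOutPath, 2)    ; 2 = write mode, erasing previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return -1
    EndIf
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop             ; @error = -1 at end of file
        If _TreeInsert($aNodes, $iCount, $sLine) Then FileWriteLine($hOut, $sLine)
    WEnd
    FileClose($hIn)
    FileClose($hOut)
    Return $iCount
EndFunc
```

Unique lines come out in the order they were first seen, not sorted. Because this tree is never rebalanced, an input file that is already sorted makes each new node a child of the previous one, and lookups degrade to a linear walk; a self-balancing tree avoids that worst case.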

HyperChao · July 14, 2005

Thanks blindwig - that's what I was looking for. I knew I could just use a normal array, but I'm working with thousands of entries. I had read somewhere that binary trees are faster for sorting. Now I just need to figure out how to implement it...
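A rough count of string comparisons shows why the tree helps at this size. Strictly, what matters for deduplication is lookups rather than sorting. The estimate assumes the worst case for both methods (all n lines are unique) and, for the tree, that it stays roughly balanced; the k-th line (counting from 0) is checked against k stored entries in the flat array:

```latex
\text{flat array: } \sum_{k=0}^{n-1} k = \frac{n(n-1)}{2}, \qquad
\text{balanced tree: } \approx n \log_2 n

n = 10^4: \quad \frac{10^4\,(10^4 - 1)}{2} = 49\,995\,000, \qquad
10^4 \log_2 10^4 \approx 132\,877
```

So for ten thousand lines the flat array needs on the order of fifty million comparisons, against roughly a hundred and thirty thousand for a balanced tree.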

blindwig · July 14, 2005

Well, if you end up using my BTree UDF, you'd do it something like this:

$btLines = _BTreeCreate()
;loop through the input file ($hIn = handle returned by FileOpen):
_BTreeSet($btLines, FileReadLine($hIn))
;loop

;Then treat the tree like a 1-based 2-D array (which it is):
;loop $i from 1 to the number of entries ($hOut = output file handle)
FileWriteLine($hOut, $btLines[$i][0])
;loop

The only issue is that my binary tree routine is currently not case-sensitive, so lines that differ only in case (for example "Apple" and "apple") are treated as duplicates.
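Whether "Apple" and "apple" count as the same line depends entirely on the comparison a routine uses. This short sketch shows the built-in choices in AutoIt: the `=` operator compares strings case-insensitively, `==` compares them case-sensitively, and `StringCompare` takes a third parameter that selects the behaviour. The variable names are illustrative only.

```autoit
Local $s1 = "Apple", $s2 = "apple"
ConsoleWrite(($s1 = $s2) & @CRLF)                        ; True:  "=" ignores case
ConsoleWrite(($s1 == $s2) & @CRLF)                       ; False: "==" is case-sensitive
; StringCompare: 0 (default) = not case-sensitive, 1 = case-sensitive
ConsoleWrite(StringCompare($s1, $s2) & @CRLF)            ; 0 means "equal"
ConsoleWrite((StringCompare($s1, $s2, 1) <> 0) & @CRLF)  ; True: they differ when case matters
```

A deduplication routine that uses the case-sensitive forms keeps both lines; one that uses the case-insensitive forms keeps only the first.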

HyperChao · July 14, 2005

OK. Thanks a bunch both in this thread and your own. Guess my understanding of trees wasn't way off - glad you cleared up some terms for me.
