HyperChao

How do I delete redundant elements?

7 posts in this topic

Say I have a large text file with a relatively short string on each line. These strings may not be unique. How can I efficiently process the file to delete repeated entries?

You would need to read through the file to develop an array of unique strings and then write the array back to that file. Here's some pseudo-code:

open the file

while (there's another line to read)
    read the line
    search the array for that line
    if (the array doesn't contain that line)
        add it
wEnd

delete original file
write array to that location
; (there's most probably a UDF for this,
; or maybe AutoIt natively supports it via its FileXXX() commands)

This shouldn't be too hard to code. Let us know if you get stuck!
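
For reference, the pseudo-code above could be fleshed out roughly as follows. This is only a minimal sketch, not a tested solution: the file names are hypothetical, the result goes to a separate output file rather than replacing the original, and the comparison is case-sensitive because it uses `==`.

```autoit
; Hypothetical file names, for illustration only.
Local $sInput = @ScriptDir & "\input.txt"
Local $sOutput = @ScriptDir & "\output.txt"

Local $hIn = FileOpen($sInput, 0) ; 0 = read mode
If $hIn = -1 Then Exit 1 ; could not open the input file

Local $aUnique[1] = [0] ; element 0 holds the number of stored lines
Local $sLine, $bFound

While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file or read error

    ; Linear search through the lines kept so far.
    $bFound = False
    For $i = 1 To $aUnique[0]
        If $aUnique[$i] == $sLine Then ; == compares strings case-sensitively
            $bFound = True
            ExitLoop
        EndIf
    Next

    ; Store the line only if it has not been seen before.
    If Not $bFound Then
        $aUnique[0] += 1
        ReDim $aUnique[$aUnique[0] + 1]
        $aUnique[$aUnique[0]] = $sLine
    EndIf
WEnd
FileClose($hIn)

Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erases previous contents
If $hOut = -1 Then Exit 1
For $i = 1 To $aUnique[0]
    FileWriteLine($hOut, $aUnique[$i])
Next
FileClose($hOut)
```

The lines come out in the order they first appeared. Each new line is compared against everything stored so far, so the work grows roughly with the square of the number of unique lines.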


Possibly add each line to an array as you read it, but first check the array to see whether that line has already been entered.




As the others (Burrup, LxP) said, you need to read each line, check whether it is already in an array of lines seen so far, and only then write it to an output file.

But instead of using a flat array (which would get slower as the file gets bigger), I'd recommend you use my binary tree algorithm to store the strings:

http://www.autoitscript.com/forum/index.php?showtopic=13114
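
If you would rather not pull in a UDF, the same "have I seen this line already?" check can also be done with a lookup structure that ships with Windows. The sketch below is not the binary tree routine linked above; it swaps in the `Scripting.Dictionary` COM object (hash-based lookup) as a set of seen lines. The file names are hypothetical, and the input and output are assumed to be different files.

```autoit
; Hypothetical file names, for illustration only.
Local $sInput = @ScriptDir & "\input.txt"
Local $sOutput = @ScriptDir & "\output.txt"

; Windows Script Host dictionary, used here as a set of lines already seen.
Local $oSeen = ObjCreate("Scripting.Dictionary")
If Not IsObj($oSeen) Then Exit 1
$oSeen.CompareMode = 0 ; 0 = binary (case-sensitive), 1 = text (case-insensitive)

Local $hIn = FileOpen($sInput, 0) ; 0 = read mode
Local $hOut = FileOpen($sOutput, 2) ; 2 = write mode, erases previous contents
If $hIn = -1 Or $hOut = -1 Then Exit 1

Local $sLine
While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file or read error

    ; Write each line only the first time it appears.
    If Not $oSeen.Exists($sLine) Then
        $oSeen.Add($sLine, True)
        FileWriteLine($hOut, $sLine)
    EndIf
WEnd
FileClose($hIn)
FileClose($hOut)
```

Because each unique line is written as soon as it is first seen, the output keeps the original order and nothing has to be searched linearly. `CompareMode` has to be set while the dictionary is still empty.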


Thanks blindwig - that's what I was looking for. I knew I could just use a normal array, but I'm working with thousands of entries. I had read somewhere that binary trees are faster for sorting. Now I just need to figure out how to implement it...


Well, if you end up using my BTree UDF, you'd do it something like this:

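$btLines = _BTreeCreate()
; loop through the input file ($hIn is assumed to be a handle from FileOpen)
_BTreeSet($btLines, FileReadLine($hIn))
; end loop

; Then treat the tree like a 1-based 2-D array (which it is):
; loop from 1 to the number of entries in $btLines
; ($hOut is assumed to be a handle to the output file)
FileWriteLine($hOut, $btLines[$LoopVar][0])
; end loop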

The only issue is that my binary tree routine is currently not case-sensitive.


OK. Thanks a bunch both in this thread and your own. Guess my understanding of trees wasn't way off - glad you cleared up some terms for me.

