Posted

Say I have a large text file with a relatively short string on each line. These strings may not be unique. How can I efficiently process the file to delete repeated entries?

Posted

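You would need to read through the file, build an array of the unique strings, and then write that array back to the file. Here's that approach in AutoIt (the file name is a placeholder):

Local $sPath = "input.txt", $aUnique[1], $iCount = 0, $sLine, $bFound, $i ; placeholder file name
Local $hFile = FileOpen($sPath, 0) ; open the file (0 = read mode)
If $hFile = -1 Then Exit 1
While 1 ; while there's another line to read
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; @error = -1 at end of file
    $bFound = False ; search the array for that line (== is case-sensitive)
    For $i = 0 To $iCount - 1
        If $aUnique[$i] == $sLine Then
            $bFound = True
            ExitLoop
        EndIf
    Next
    If Not $bFound Then ; the array doesn't contain that line, so add it
        ReDim $aUnique[$iCount + 1]
        $aUnique[$iCount] = $sLine
        $iCount += 1
    EndIf
WEnd
FileClose($hFile)
$hFile = FileOpen($sPath, 2) ; mode 2 erases the old contents, replacing the delete step
For $i = 0 To $iCount - 1 ; write the array back to the same file
    FileWriteLine($hFile, $aUnique[$i])
Next
FileClose($hFile)

There's most probably a UDF that handles some of this for you. Either way, it shouldn't be too hard to code. Let us know if you get stuck!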

Posted

Possibly: each time you read a line, check the array to see whether that line has already been entered, and add it to the array only if it hasn't.

Posted

As the others (Burrup, LxP) said, you need to read each line into an array and check whether it already exists before writing it out to an output file.

But instead of a flat array (which gets slower as the file grows, because every lookup has to scan the whole array), I'd recommend using my binary tree algorithm to store the strings:

http://www.autoitscript.com/forum/index.php?showtopic=13114
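To show why a tree helps, here's a hypothetical, self-contained AutoIt sketch of an unbalanced binary search tree used as a set of "lines already seen". It is not blindwig's UDF; the names `_NodeNew`, `_TreeInsert` and `_DedupeFile` are made up for this example. Each lookup compares against one node per tree level instead of against every stored line, and because lines are written as they are first seen, the output keeps the original order. This sketch compares case-sensitively (`StringCompare` with casesense = 1).

```autoit
; Hypothetical sketch (not blindwig's UDF): an array-backed binary search tree.
; Nodes live in parallel arrays indexed 0..$g_iNodes-1; -1 means "no child".
Global $g_aKey[1], $g_aLeft[1], $g_aRight[1], $g_iNodes = 0

; Append a node holding $sKey and return its index.
Func _NodeNew($sKey)
    ReDim $g_aKey[$g_iNodes + 1]
    ReDim $g_aLeft[$g_iNodes + 1]
    ReDim $g_aRight[$g_iNodes + 1]
    $g_aKey[$g_iNodes] = $sKey
    $g_aLeft[$g_iNodes] = -1
    $g_aRight[$g_iNodes] = -1
    $g_iNodes += 1
    Return $g_iNodes - 1
EndFunc

; Insert $sKey. Returns True if it was new, False if it was already in the tree.
Func _TreeInsert($sKey)
    If $g_iNodes = 0 Then
        _NodeNew($sKey)
        Return True
    EndIf
    Local $i = 0, $iCmp, $iNew
    While 1
        $iCmp = StringCompare($sKey, $g_aKey[$i], 1) ; 1 = case-sensitive
        If $iCmp = 0 Then Return False ; duplicate found
        If $iCmp < 0 Then ; smaller keys go left
            If $g_aLeft[$i] = -1 Then
                $iNew = _NodeNew($sKey)
                $g_aLeft[$i] = $iNew
                Return True
            EndIf
            $i = $g_aLeft[$i]
        Else ; larger keys go right
            If $g_aRight[$i] = -1 Then
                $iNew = _NodeNew($sKey)
                $g_aRight[$i] = $iNew
                Return True
            EndIf
            $i = $g_aRight[$i]
        EndIf
    WEnd
EndFunc

; Copy $sIn to $sOut, keeping only the first occurrence of each line.
Func _DedupeFile($sIn, $sOut)
    $g_iNodes = 0 ; start with an empty tree
    Local $hIn = FileOpen($sIn, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOut, 2) ; 2 = write mode, erases previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf
    Local $sLine
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop ; end of file (or read error)
        If _TreeInsert($sLine) Then FileWriteLine($hOut, $sLine)
    WEnd
    FileClose($hIn)
    FileClose($hOut)
    Return 1
EndFunc
```

Usage would be something like `_DedupeFile("input.txt", "output.txt")` (placeholder file names). One caveat with a plain, unbalanced tree like this: if the file is already sorted, every insert walks down one long branch, which is no faster than the flat array. Self-balancing trees avoid that worst case.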

Posted

Thanks blindwig - that's what I was looking for. I knew I could just use a normal array, but I'm working with thousands of entries. I had read somewhere that binary trees are faster for sorting. Now I just need to figure out how to implement it...

Posted

Well, if you end up using my BTree UDF, you'd do it something like this:

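; requires my BTree UDF (see the link in my earlier post)
Local $hIn = FileOpen("input.txt", 0), $hOut = FileOpen("output.txt", 2), $sLine
Local $btLines = _BTreeCreate()
; loop through the file, adding each line to the tree
While 1
    $sLine = FileReadLine($hIn)
    If @error Then ExitLoop ; end of file
    _BTreeSet($btLines, $sLine)
WEnd
FileClose($hIn)

; Then treat the tree like a 1-based 2-D array (which it is):
For $i = 1 To UBound($btLines, 1) - 1 ; row 0 is not a data row in a 1-based array
    FileWriteLine($hOut, $btLines[$i][0])
Next
FileClose($hOut)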

The only issue is that my binary tree routine currently isn't case-sensitive, so lines that differ only in letter case will be treated as duplicates.
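If that case-insensitivity matters, one swapped-in alternative (a hash table rather than a binary tree, and not part of blindwig's UDF) is the Windows `Scripting.Dictionary` COM object. The helper below, `_DedupeWithDictionary`, is a hypothetical sketch: it sets `CompareMode` to 0 (binary comparison, i.e. case-sensitive) so that "Foo" and "foo" are kept as separate lines, and it returns the number of lines written. It only works on Windows, where that COM object is available.

```autoit
; Hypothetical sketch: keep the first occurrence of each line using a Scripting.Dictionary.
; Returns the number of lines written, or 0 with @error set on failure.
Func _DedupeWithDictionary($sIn, $sOut)
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oSeen) Then Return SetError(1, 0, 0)
    $oSeen.CompareMode = 0 ; 0 = binary compare, so keys are case-sensitive
    Local $hIn = FileOpen($sIn, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(2, 0, 0)
    Local $hOut = FileOpen($sOut, 2) ; 2 = write mode, erases previous contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(3, 0, 0)
    EndIf
    Local $sLine, $iKept = 0
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop ; end of file (or read error)
        If Not $oSeen.Exists($sLine) Then ; first time we've seen this exact line
            $oSeen.Add($sLine, 0)
            FileWriteLine($hOut, $sLine)
            $iKept += 1
        EndIf
    WEnd
    FileClose($hIn)
    FileClose($hOut)
    Return $iKept
EndFunc
```

A hash lookup doesn't depend on the order of the input the way an unbalanced tree does, but unlike a tree it gives you no sorted view of the lines.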

Posted

OK, thanks a bunch, both in this thread and in your own. I guess my understanding of trees wasn't way off; I'm glad you cleared up some terms for me.
