HyperChao Posted July 13, 2005
Say I have a large text file with a relatively short string on each line. These strings may not be unique. How can I efficiently process the file to delete repeated entries?
LxP Posted July 13, 2005
You would need to read through the file to build an array of unique strings and then write the array back to that file. Here's some pseudo-code:

    open the file
    while (there's another line to read)
        read the line
        search the array for that line
        if (the array doesn't contain that line)
            add it
    wEnd
    close the file
    delete original file
    write array to that location
    ; (there's most probably a UDF for this,
    ; or maybe AutoIt natively supports it via its FileXXX() commands)

This shouldn't be too hard to code. Let us know if you get stuck!
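For comparison, LxP's flat-array approach can be sketched in Python (used here only for illustration; `dedupe_flat` and `dedupe_file_flat` are hypothetical names, not AutoIt functions). A Python list plays the role of the array, and `line not in unique` is the "search the array" step, a linear scan over every unique line kept so far. Opening the file with mode `"w"` truncates it, which stands in for "delete original file, write array to that location".

```python
def dedupe_flat(lines):
    """Keep the first occurrence of each line, using a plain list as the 'array'."""
    unique = []
    for line in lines:
        # Linear search: scans every unique line collected so far.
        if line not in unique:
            unique.append(line)
    return unique


def dedupe_file_flat(path):
    """Rewrite the file at `path` in place with repeated lines removed."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    unique = dedupe_flat(lines)
    # Mode "w" truncates the original file before writing the unique lines back.
    with open(path, "w", encoding="utf-8") as f:
        for line in unique:
            f.write(line + "\n")
```

Because each new line is compared against every unique line already stored, the total work grows roughly with the square of the number of lines, which is what makes this approach slow on files with thousands of entries.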
buzz44 Posted July 13, 2005
Possibly, each time you read a line, add it to an array, but first check the array to see whether that string has already been entered.
blindwig Posted July 13, 2005
Like the others (Burrup, LxP) said, you need to read each line into an array, then check for its existence before writing it out to an output file. But instead of using a flat array (which gets slower the bigger the file gets, since every lookup has to scan the whole array), I'd recommend you use my binary tree algorithm to store the strings: http://www.autoitscript.com/forum/index.php?showtopic=13114
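blindwig's UDF is AutoIt code whose internals aren't shown in this thread, so the following is only a generic Python illustration of the idea: an unbalanced binary search tree used as a set of strings, where `add` reports whether a string was new. `Node`, `BSTSet` and `dedupe_tree` are hypothetical names. Each lookup follows a single path from the root, so it takes about log2(n) comparisons while the tree stays reasonably balanced, instead of scanning every stored string.

```python
class Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BSTSet:
    """Unbalanced binary search tree used as a set of strings."""

    def __init__(self):
        self.root = None

    def add(self, key):
        """Insert key; return True if it was new, False if already present."""
        if self.root is None:
            self.root = Node(key)
            return True
        node = self.root
        while True:  # iterative walk, so deep trees can't hit the recursion limit
            if key == node.key:
                return False
            if key < node.key:
                if node.left is None:
                    node.left = Node(key)
                    return True
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(key)
                    return True
                node = node.right


def dedupe_tree(lines):
    """Keep the first occurrence of each line, preserving the original order."""
    seen = BSTSet()
    return [line for line in lines if seen.add(line)]
```

One caveat of this simple version: if the input is already sorted, every insert goes down the same side and the tree degenerates into a linked list, making lookups linear again. Self-balancing trees (AVL, red-black) avoid that.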
HyperChao (Author) Posted July 14, 2005
Thanks blindwig - that's what I was looking for. I knew I could just use a normal array, but I'm working with thousands of entries. I had read somewhere that binary trees are faster for sorting. Now I just need to figure out how to implement it...
blindwig Posted July 14, 2005
Well, if you end up using my BTree UDF, you'd do it something like this:

    ; input.txt and output.txt are placeholder file names
    $hIn = FileOpen("input.txt", 0)     ; 0 = read mode
    $btLines = _BTreeCreate()
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop          ; @error is set at end of file
        _BTreeSet($btLines, $sLine)
    WEnd
    FileClose($hIn)

    ;Then treat the tree like a 1-based 2-d array (which it is):
    $hOut = FileOpen("output.txt", 2)   ; 2 = write mode, erases old contents
    For $i = 1 To UBound($btLines) - 1  ; assumes each row from 1 up holds one stored line
        FileWriteLine($hOut, $btLines[$i][0])
    Next
    FileClose($hOut)

The only issue is that currently my binary tree routine is not case-sensitive, so lines that differ only in case (e.g. "Foo" and "foo") will be treated as duplicates.
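The collect-then-write workflow above can also be sketched in Python. As an assumption for illustration, a built-in `dict` (a hash table, not a binary tree) stands in for the BTree, and the output is sorted to mimic an in-order walk of a search tree; the thread doesn't say what order blindwig's UDF returns its rows in. The hypothetical `case_sensitive` flag shows what the case-insensitivity caveat changes. All names here are hypothetical.

```python
def dedupe_collect_then_write(lines, case_sensitive=False):
    """Collect every line first, then return the unique ones in sorted key order.

    Keys are the line itself, or its casefold() form when case_sensitive is
    False; each key keeps the first spelling seen.
    """
    store = {}
    for line in lines:
        key = line if case_sensitive else line.casefold()
        store.setdefault(key, line)  # later repeats of the same key are ignored
    # Sorting the keys mimics walking a binary search tree in order.
    return [store[k] for k in sorted(store)]


def dedupe_file(in_path, out_path, case_sensitive=False):
    """Read in_path, then write its unique lines to out_path."""
    with open(in_path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    with open(out_path, "w", encoding="utf-8") as f:
        for line in dedupe_collect_then_write(lines, case_sensitive):
            f.write(line + "\n")
```

With the default, `["Foo", "bar", "foo"]` comes out as `["bar", "Foo"]`: "foo" is dropped as a repeat of "Foo", and the first spelling seen is kept. Passing `case_sensitive=True` keeps all three lines.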
HyperChao (Author) Posted July 14, 2005
OK. Thanks a bunch both in this thread and your own. Guess my understanding of trees wasn't way off - glad you cleared up some terms for me.