Quickest method to remove duplicate lines?

April 3, 2011

Hello, I am looking for the quickest way to remove duplicate lines (leaving one instance intact) from a set of text files. I have gleaned and modified code I found on here, but the current result is too slow. I tested with a text file of 100,000 lines, which takes 40 seconds to complete. I also tested with a file of 5 million lines: when it didn't crash with an "out of memory" error, it used 1 GB of memory and 100% of the processor, and I had to stop it after 15 minutes as I had no idea when, or e
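For comparison, a common way to approach this kind of problem is a single pass over the file that remembers which lines have already been seen, so each line is checked once instead of being compared against every other line. The sketch below is in Python, which is an assumption since the post does not name a language. The function names `dedupe_lines` and `dedupe_file` are hypothetical. It stores a 16-byte hash of each line rather than the line itself to keep memory use down on files with millions of lines. This assumes a hash collision is unlikely enough to ignore; store the lines themselves if that is not acceptable.

```python
import hashlib


def dedupe_lines(lines):
    """Yield each distinct line once, keeping the first occurrence and the original order."""
    seen = set()  # 16-byte digests of the lines seen so far
    for line in lines:
        # Ignore the trailing newline so a final line without one still matches.
        text = line.rstrip("\n")
        key = hashlib.blake2b(text.encode("utf-8"), digest_size=16).digest()
        if key not in seen:
            seen.add(key)
            yield line


def dedupe_file(src_path, dst_path):
    """Stream src_path to dst_path, dropping repeated lines."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(dedupe_lines(src))
```

Because the input is read line by line and only the digests are kept, memory grows with the number of distinct lines, not with the file size. Calling `dedupe_file("input.txt", "output.txt")` (both paths are placeholders) would write the de-duplicated copy to a new file and leave the original untouched.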

22 replies