The fastest way to find duplicate lines [solved]

I have a txt file which contains 1,000,000 lines.

I want to delete the duplicate lines.

I have tried the following code, but it runs very slowly:

#include <Array.au3> ; needed for _ArrayUnique
Local $array ; holds the lines read from the text file
Local $aArrayUnique = _ArrayUnique($array)


Can anyone help me with a faster way? (I want the code to produce the result in no more than 20 seconds.)



Edited by fenhanxue
You can probably somehow speed up de-dup of that many lines, but the main point is that a flat text file isn't suitable for a routine task like that. You'd benefit hugely from converting to a database file. FYI SQLite is pretty easy to use and well supported from AutoIt.

Whilst I completely agree with jchd, if you need a quick, one-off solution to this, you will find (quick Google) that most of the more robust text editors, e.g. Notepad++ (free) and UltraEdit (commercial license), have ready-made solutions for this problem.

Problem solving step 1: Write a simple, self-contained, running, replicator of your problem.

$oBuffer = ObjCreate('Scripting.Dictionary')
$h_File_Source = FileOpen("source.txt")
$h_File_Output = FileOpen("output.txt", 2)
While 1
    $sLine = FileReadLine($h_File_Source)
    If @error Then ExitLoop
    ; if in check buffer skip line
    If $oBuffer.Exists($sLine) Then ContinueLoop
    ; write line to output file
    FileWriteLine($h_File_Output, $sLine)
    ; Add to duplicate check buffer
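    $oBuffer.Item($sLine) = 1
WEnd
; close both files so the output is flushed to disk
FileClose($h_File_Source)
FileClose($h_File_Output)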


Here is a PowerShell method of removing duplicate lines from an unsorted file, from within AutoIt.

It may be faster on big files.  On small files, KaFu's example is faster.

; Remove Duplicate Rows From A Text File Using Powershell... unsorted file, where order is important.
; Command from:  http://www.secretgeek.net/ps_duplicates

Local $hTimer = TimerInit()
Local $sFileIn = @ScriptDir & '\temp-6.txt'
Local $sFileOut = @ScriptDir & '\newlist.txt'
Local $sCmd = '$hash = @{}' & @CRLF & _
        'gc ' & $sFileIn & '| % {if ($hash.$_ -eq $null) {$_} $hash.$_ = 1;} > ' & $sFileOut

RunWait('"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe" ' & $sCmd, "", @SW_HIDE, 2 + 4) ; Run command in PowerShell.
ConsoleWrite("Time Taken: " & round(TimerDiff($hTimer)/1000,4) & "Secs" & @CRLF)
ShellExecute($sFileOut) ; See unique file.


41 minutes ago, fenhanxue said:

I wonder whether the code ObjCreate('Scripting.Dictionary') will work on every computer?

AFAIK it's part of WSH and should be available by default from XP SP3 upwards.
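If you would rather not rely on that, a script can test for the object at run time before using it. The following is only a minimal sketch: the 'probe' key and the console messages are made up for illustration, and what to fall back to is left open.

```autoit
; Sketch: check whether the Scripting.Dictionary COM object can be created here.
Local $oDict = ObjCreate('Scripting.Dictionary')
If @error Or Not IsObj($oDict) Then
    ; ObjCreate failed: the object is not registered or not accessible
    ConsoleWrite('Scripting.Dictionary is not available on this machine.' & @CRLF)
Else
    ; 'probe' is an arbitrary test key used only for this check
    $oDict.Item('probe') = 1
    ConsoleWrite('Scripting.Dictionary works, entries: ' & $oDict.Count & @CRLF)
EndIf
```

Running this once at the start of the de-dup script lets it stop with a clear message instead of failing on the first .Exists call.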


