logcomptechs

Find Duplicate Lines in a Text File?

15 posts in this topic

Is there a way to read a text file line by line and delete any duplicate lines? I was looking at _FileReadToArray, but wasn't sure how I would go about comparing, because I do not want to delete the original line, just the duplicates of that line. Any help would be much appreciated, thanks!

How do you want it done? I've got several functions for this. One sorts the array and then looks for duplicates. Another takes each line, searches the whole array for it, and puts the unique lines into a new array. Both have their appeal, but what do you need it for: something fast, or do you have to keep the array in the same order?



I think in this case you can use _ArraySearch: for each line, start the search just after the current reading position (the "index of array to start searching at" parameter), and if the same text is found at another position, delete that entry.
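A minimal sketch of that _ArraySearch idea, assuming a recent AutoIt release; the file names are placeholders for illustration. For each line it searches only the part of the array after the current position and deletes any later copy it finds, so the first occurrence always stays:

```autoit
#include <Array.au3>
#include <File.au3>

; File names are placeholders for illustration.
Local $sOldPath = @ScriptDir & "\oldfile.txt"
Local $sNewPath = @ScriptDir & "\newfile.txt"

Local $aLines, $iDup
; With the default flags, $aLines[0] holds the number of lines read.
If Not _FileReadToArray($sOldPath, $aLines) Then Exit 1

Local $i = 1
While $i < $aLines[0]
    ; Look for another copy of line $i between $i + 1 and the last line.
    $iDup = _ArraySearch($aLines, $aLines[$i], $i + 1, $aLines[0])
    If $iDup = -1 Then
        $i += 1 ; no later copy left, move on to the next line
    Else
        _ArrayDelete($aLines, $iDup) ; drop the later copy, keep the original
        $aLines[0] -= 1 ; keep the stored count in sync
    EndIf
WEnd

; Write elements 1..N, skipping the count in element 0.
_FileWriteFromArray($sNewPath, $aLines, 1)
```

This keeps the original line order, but every deletion resizes the array, so it is probably only practical for small files. With the default arguments the comparison is case-insensitive.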



No need to keep the order or anything like that. I just need something that finds duplicated lines and deletes them. I am going to try _ArraySearch, but I'm still not 100% sure how to search for duplicates with it.

Hi,

_FileReadToArray, then _ArrayUnique, then _FileWriteFromArray. Done.

Mega



#6 ·  Posted (edited)

A brute-force method:

#include <File.au3>
$OldPath = @ScriptDir & "\oldfile.txt"
$NewPath = @ScriptDir & "\newfile.txt"
Dim $a_OldFile, $a_NewFile[1]
; read the file into an array; element [0] holds the line count
If Not _FileReadToArray($OldPath, $a_OldFile) Then Exit 1
; loop through the array of the original file
For $i = 1 To $a_OldFile[0]
    $line = $a_OldFile[$i]
    $count = 0
    ; count how often the current line has appeared so far, itself included
    ; (note: = compares strings case-insensitively; use == for an exact match)
    For $x = 1 To $i
        If $line = $a_OldFile[$x] Then $count += 1
    Next
    ; if this is its first appearance, append it to the new array
    If $count = 1 Then
        $a_NewFile[0] = UBound($a_NewFile)
        ReDim $a_NewFile[$a_NewFile[0] + 1]
        $a_NewFile[$a_NewFile[0]] = $line
    EndIf
Next
; write elements 1..N, skipping the count in [0]
_FileWriteFromArray($NewPath, $a_NewFile, 1)

But I like Xeno's method better.

Edited by SpookMeister


Thank You, that was extremely easy!

#Include <File.au3>
#include <Array.au3>
Dim $oFile,$nFile

_FileReadToArray("old_text.txt",$oFile)
$nFile = _ArrayUnique($oFile)
_FileWriteFromArray("new_text.txt",$nFile)


You're welcome. That's AutoIt :D

Mega



Perhaps you could clear this up: every time I run the script, it finds the number of entries in the file and uses that as a placeholder.



I don't know whether I understood what you mean, but try this:

#include <Array.au3>
#include <File.au3>
; Delete duplicate lines from a file

ConsoleWrite(_delelteDoubleLinesFromFile("test2.txt") & @CRLF)

Func _delelteDoubleLinesFromFile($filePath)
    Local $fileContent[1]
    If Not FileExists($filePath) Then Return -1
    If Not _FileReadToArray($filePath, $fileContent) Then Return -2
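    ; Arguments follow the 2009-era signature (dimension 1, base 1); newer releases
    ; appear to expect a 0-based column index as the second parameter instead.
    $fileContent = _ArrayUnique($fileContent, 1, 1)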
    _FileWriteFromArray($filePath, $fileContent, 1)
    Return 1
EndFunc  ;==>_delelteDoubleLinesFromFile

Mega



Thank you then



#12 ·  Posted (edited)

NM.

Edited by JustDoIt


Probably sort the array and check whether the next line is identical to the current line?
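A rough sketch of that sort-and-compare approach, assuming a recent AutoIt release; the file names are placeholders for illustration. After sorting, identical lines sit next to each other, so each line only has to be compared with the last one that was kept. The original order is lost, which the original poster said was acceptable:

```autoit
#include <Array.au3>
#include <File.au3>

; File names are placeholders for illustration.
Local $sOldPath = @ScriptDir & "\oldfile.txt"
Local $sNewPath = @ScriptDir & "\newfile.txt"

Local $aLines
; With the default flags, $aLines[0] holds the number of lines read.
If Not _FileReadToArray($sOldPath, $aLines) Then Exit 1

; Sort elements 1..N ascending, leaving the count in $aLines[0] untouched.
_ArraySort($aLines, 0, 1)

; Copy each line that differs from the most recently kept line.
Local $aOut[$aLines[0] + 1]
Local $iKept = 0
For $i = 1 To $aLines[0]
    If $iKept = 0 Or $aLines[$i] <> $aOut[$iKept] Then
        $iKept += 1
        $aOut[$iKept] = $aLines[$i]
    EndIf
Next
$aOut[0] = $iKept

; Write only elements 1..$iKept; the rest of $aOut is unused.
_FileWriteFromArray($sNewPath, $aOut, 1, $iKept)
```

Both the sort and the <> comparison treat strings case-insensitively here, so lines that differ only in case count as duplicates.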



#14 ·  Posted

On 26/3/2009 at 0:31 AM, logcomptechs said:

Thank You, that was extremely easy!

#Include <File.au3>
#include <Array.au3>
Dim $oFile,$nFile

_FileReadToArray("old_text.txt",$oFile)
$nFile = _ArrayUnique($oFile)
_FileWriteFromArray("new_text.txt",$nFile)

Good idea!

But I don't like the first and second lines created by your code. How do I delete them? Please help me, thank you!


#15 ·  Posted

20 minutes ago, trandunghcm said:

Good idea!

But I don't like the first and second lines created by your code. How do I delete them? Please help me, thank you!

I've got it working! I edited the last line so that writing starts at index 2:

$iBase = 2
_FileWriteFromArray("new_text.txt", $nFile, $iBase)
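As far as I can tell, the two extra lines come from the element counts: _FileReadToArray stores the line count in element 0, _ArrayUnique with its default base treats that count as data and puts its own count in element 0 of the result, and _FileWriteFromArray writes from element 0 by default. Starting the write at index 2 skips both. An alternative sketch, assuming a recent AutoIt release where the second parameter of _ArrayUnique is a 0-based column index (older releases used a different signature), is to start both calls at index 1 so the file's line count never enters the result:

```autoit
#include <Array.au3>
#include <File.au3>

Local $aOld, $aNew
; $aOld[0] receives the number of lines; the lines start at $aOld[1].
If Not _FileReadToArray("old_text.txt", $aOld) Then Exit 1

; Column 0, base 1: start at $aOld[1] so the count is not treated as a line.
$aNew = _ArrayUnique($aOld, 0, 1)
If @error Then Exit 2

; $aNew[0] is the number of unique lines; write from $aNew[1] onward.
_FileWriteFromArray("new_text.txt", $aNew, 1)
```

The output should be the same as with the index-2 workaround, but the line count is never mixed in with the file's own lines.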

 
