Remove duplicate line in a set of data


tehte

I am writing an AutoIt v3 script to remove duplicate lines from a set of data. Let's say data set A has the format below.

Data Set A

-------------

1253-6856

3101-4011

1827-1356

1822-1157

1822-1157

1000-1410

1000-1410

1822-1231

1822-1231

3101-4011

1822-1157

1822-1231

........

and I want to simplify it into data set B, with no duplicates, as below.

Data set B

------

1253-6856

3101-4011

1827-1356

1822-1157

1000-1410

1822-1231

........

How can I do that in AutoIt v3?


Maybe you could use arrays? For each item, check whether it is already in the array and add it only if it is not. Then write the array to a file.

If it is a big file it might be a bit slow, but that still beats having to remove them manually :)
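
A minimal sketch of that idea, assuming the data sits one entry per line in a text file. The file names `data.txt` and `unique.txt` are placeholders, and `_FileReadToArray` is assumed to store the line count in element 0, which is its default behaviour. Treat it as an untested outline rather than a finished script:

```autoit
#include <File.au3>

Local $aLines
If Not _FileReadToArray("data.txt", $aLines) Then
    MsgBox(4096, "Error", "Could not read data.txt, @error = " & @error)
    Exit
EndIf

; Output array sized for the worst case (no duplicates at all).
Local $aUnique[$aLines[0]]
Local $iKept = 0, $bSeen

For $i = 1 To $aLines[0]
    ; Search only the part of $aUnique that has been filled so far.
    $bSeen = False
    For $j = 0 To $iKept - 1
        If $aUnique[$j] = $aLines[$i] Then
            $bSeen = True
            ExitLoop
        EndIf
    Next
    ; First occurrence: keep it, so the original order is preserved.
    If Not $bSeen Then
        $aUnique[$iKept] = $aLines[$i]
        $iKept += 1
    EndIf
Next

ReDim $aUnique[$iKept] ; trim the unused slots
_FileWriteFromArray("unique.txt", $aUnique)
```

Note that `=` compares strings without regard to case; use `==` if "abc" and "ABC" should count as different entries.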


#include <array.au3>
Local $data[12]
$data[0] = "1253-6856"
$data[1] = "3101-4011"
$data[2] = "1827-1356"
$data[3] = "1822-1157"
$data[4] = "1822-1157"
$data[5] = "1000-1410"
$data[6] = "1000-1410"
$data[7] = "1822-1231"
$data[8] = "1822-1231"
$data[9] = "3101-4011"
$data[10] = "1822-1157"
$data[11] = "1822-1231"
$data2 = RemoveDuplicates($data)
_ArrayDisplay($data2, "Removed Duplicates")

Func RemoveDuplicates($avData)
    Local $avData2 = $avData
    Local $iCount = 0
    For $i = 0 To UBound($avData) - 1
        $iCount = 0
        For $ii = 0 To UBound($avData) - 1
            If $ii > UBound($avData2) - 1 Then ExitLoop
            If $avData2[$ii] = $avData[$i] Then
                If $iCount > 0 Then _ArrayDelete($avData2, $ii)
                $iCount += 1
            EndIf
        Next
    Next
    Return $avData2
EndFunc   ;==>RemoveDuplicates

That should do it.


I just came up with this. It's basically untested.

#include <Array.au3> ; needed for _ArrayDisplay

; Create an array with the data to be checked.
Dim $Data[12] = ["1253-6856", "3101-4011", "1827-1356", "1822-1157", "1822-1157", "1000-1410", "1000-1410", "1822-1231", "1822-1231", "3101-4011", "1822-1157", "1822-1231"]

; Output array (sized for the worst case), number of entries kept, and a duplicate flag.
Global $ModifiedData[12], $i = 0, $Double

; Check every element against the elements that come before it.
For $x = 0 To UBound($Data) - 1
    $Double = False ; reset the flag for each element
    For $y = 0 To $x - 1
        If $Data[$x] = $Data[$y] Then
            ; The same value appeared earlier, so this one is a duplicate.
            $Double = True
            ExitLoop
        EndIf
    Next
    If Not $Double Then
        ; First occurrence of this value: keep it.
        $ModifiedData[$i] = $Data[$x]
        $i += 1
    EndIf
Next

ReDim $ModifiedData[$i] ; trim the unused slots

_ArrayDisplay($ModifiedData, "Return value:")
Exit

Edit: Damn you beat me to it. :)

Edited by Manadar

How will the data be input? Read data file, ini file, array from another part of the script, etc.?

How many elements max in the set? Hundreds, thousands, millions?

The appropriate technique will depend on the data source and scale of the project.

:)


Hi,

I changed it a bit to this:

#include <File.au3>
#include <Array.au3>
Dim $aLines

; _FileReadToArray stores the number of lines in $aLines[0]; the data starts at index 1.
If Not _FileReadToArray("data.txt", $aLines) Then
    MsgBox(4096, "Error", "Error reading file to array, error: " & @error)
    Exit
EndIf

Global $array = RemoveDuplicates($aLines)
_ArrayDisplay($array, "Removed Duplicates")

Func RemoveDuplicates($avData)
    Local $avData2 = $avData
    Local $iCount = 0
    ; Start at 1 so the line count in element 0 is never treated as data.
    For $i = 1 To UBound($avData) - 1
        $iCount = 0
        For $ii = 1 To UBound($avData) - 1
            If $ii > UBound($avData2) - 1 Then ExitLoop
            If $avData2[$ii] = $avData[$i] Then
                ; Keep the first match and delete any later one.
                If $iCount > 0 Then _ArrayDelete($avData2, $ii)
                $iCount += 1
            EndIf
        Next
    Next
    ; Element 0 holds the count, which is one less than the array size.
    $avData2[0] = UBound($avData2) - 1
    Return $avData2
EndFunc   ;==>RemoveDuplicates

So long,

Mega


[Quote of the previous post omitted. The code posted in this reply was garbled in the source and is not recoverable.]

But these array techniques will not be practical if the data set is huge, I'm still curious how large...
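
For a larger set, one option is to let a lookup table remember which values have already been seen, so each line costs a single lookup instead of a scan over the array. The sketch below is untested and rests on assumptions: the file names `data.txt` and `unique.txt` are placeholders, the file is still small enough to read into memory, and the Windows `Scripting.Dictionary` COM object is available:

```autoit
#include <File.au3>

Local $aLines
If Not _FileReadToArray("data.txt", $aLines) Then
    MsgBox(4096, "Error", "Could not read data.txt, @error = " & @error)
    Exit
EndIf

; The dictionary holds every value that has already been written out.
Local $oSeen = ObjCreate("Scripting.Dictionary")
If Not IsObj($oSeen) Then Exit

Local $hOut = FileOpen("unique.txt", 2) ; 2 = open for writing, erase previous contents
If $hOut = -1 Then Exit

For $i = 1 To $aLines[0]
    If Not $oSeen.Exists($aLines[$i]) Then
        ; First occurrence: remember it and write it out in original order.
        $oSeen.Add($aLines[$i], 1)
        FileWriteLine($hOut, $aLines[$i])
    EndIf
Next

FileClose($hOut)
```

By default the dictionary compares keys case-sensitively, unlike AutoIt's `=` operator, so "abc" and "ABC" would both be kept.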

:)
