leuce

Remove duplicate records, based on one field

6 posts in this topic

#1 ·  Posted

Hello everyone

I have a tab-delimited text file.  Each line (i.e. record) has the same number of columns (i.e. fields), tab delimited (shown in this post using pipes). I would like to remove records based on whether the data in one field is a duplicate of that of any other line, so that only one record with that particular data in that field remains.  It does not matter to me whether the first occurrences or last occurrences are deleted, regardless of the data in other fields.

A    |    dog    |    some more text 1
B    |    cat    |    some more text 2
C    |    horse    |    some more text 3
D    |    cat    |    some more text 4
A    |    cat    |    some more text 5
B    |    turkey    |    some more text 6
C    |    dog    |    some more text 7
D    |    chicken    |    some more text 8

...then I want to end up with either this:

A    |    dog    |    some more text 1
B    |    cat    |    some more text 2
C    |    horse    |    some more text 3
B    |    turkey    |    some more text 6
D    |    chicken    |    some more text 8

...or this:

C    |    horse    |    some more text 3
A    |    cat    |    some more text 5
B    |    turkey    |    some more text 6
C    |    dog    |    some more text 7
D    |    chicken    |    some more text 8

...in which you can see that records that have "cat" or "dog" in the second field were reduced to just one instance each.

I suspect I should use _ArrayUnique, but this creates only a 1D array, not a 2D array:

#include <Array.au3>

Local $aArray[8][3] = [["A", "dog", "some more text 1"], ["B", "cat", "some more text 2"], ["C", "horse", "some more text 3"], ["D", "cat", "some more text 4"], ["A", "cat", "some more text 5"], ["B", "turkey", "some more text 6"], ["C", "dog", "some more text 7"], ["D", "chicken", "some more text 8"]]
_ArrayDisplay($aArray, "2D array")
$aArrayUnique = _ArrayUnique($aArray, 1)
_ArrayDisplay($aArrayUnique, "array second column")

What should the last _ArrayDislay line read in order to show the full records and not just the second column?

Thanks

Samuel

 

 

 

 

Share this post


Link to post
Share on other sites



#2 ·  Posted

You could use something like this:

#include <Array.au3>

Global $aArray[8][3] = [["A", "dog", "some more text 1"], ["B", "cat", "some more text 2"], ["C", "horse", "some more text 3"], ["D", "cat", "some more text 4"], ["A", "cat", "some more text 5"], ["B", "turkey", "some more text 6"], ["C", "dog", "some more text 7"], ["D", "chicken", "some more text 8"]]
_ArrayDisplay($aArray, "2D array")
_ArraySearchUnique('dog')
_ArrayDisplay($aArray, "2D array")
_ArraySearchUnique('cat')
_ArrayDisplay($aArray, "2D array")

Func _ArraySearchUnique($sSearch)
    Local $aFindAll = _ArrayFindAll($aArray, $sSearch, 0, 0, 0, 0, 1)
    If UBound($aFindAll) > 1 Then
        For $x = UBound($aFindAll) - 1 To 1 Step - 1
            _ArrayDelete($aArray, $aFindAll[$x])
        Next
    EndIf
EndFunc

You can use _ArraySort to sort the columns first.

Share this post


Link to post
Share on other sites

#3 ·  Posted

Something like this:

#include <Array.au3>

Local $avArray[8][3] = [["A", "dog", "some more text 1"], ["B", "cat", "some more text 2"], ["C", "horse", "some more text 3"], ["D", "cat", "some more text 4"], ["A", "cat", "some more text 5"], ["B", "turkey", "some more text 6"], ["C", "dog", "some more text 7"], ["D", "chicken", "some more text 8"]]

_ArrayDisplay($avArray)
$aArrayUnique = _ArrayUnique_2D($avArray, 1) ; filter on column 1 (0 based)
_ArrayDisplay($aArrayUnique)

Func _ArrayUnique_2D(ByRef $aTableData, $column) ; $column is 0 based
    $aUniques = _ArrayUnique($aTableData, $column) ; unique elements from column $column
    ; create a new array with room for only unique rows
    Local $aResultTable[$aUniques[0]][UBound($aTableData, 2)] ; create a new array for output

    For $i = 1 To $aUniques[0] ; loop all rows of unique elements
        ; search first occurence of element $i in main table ; $ndx is row where is found
        $ndx = _ArraySearch($aTableData, $aUniques[$i], 0, 0, 0, 0, 1, $column)
        ; copy data from all columns from source array to new array
        For $x = 0 To UBound($aResultTable, 2) - 1 ; loop all columns of row $ndx
            $aResultTable[$i - 1][$x] = $aTableData[$ndx][$x] ; copy data of row $ndx from main to new array
        Next
    Next
    Return $aResultTable
EndFunc   ;==>_ArrayUnique_2D

 


small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....

Share this post


Link to post
Share on other sites

#4 ·  Posted

Another method, which will search for all unique items for a particular column:

nb: Use _ArraySearchUnique(Column Index : default = 1) to sort by another column

#include <Array.au3>

Global $aArray[8][3] = [["A", "dog", "some more text 1"], ["B", "cat", "some more text 2"], ["C", "horse", "some more text 3"], ["D", "cat", "some more text 4"], ["A", "cat", "some more text 5"], ["B", "turkey", "some more text 6"], ["C", "dog", "some more text 7"], ["D", "chicken", "some more text 8"]]

_ArrayDisplay($aArray, "2D array")
_ArraySearchUnique()
_ArrayDisplay($aArray, "2D array")

Func _ArraySearchUnique($iColumnSearch = 1)
    Local $aUnique = _ArrayUnique($aArray, $iColumnSearch)
    For $i = 1 To $aUnique[0]
        Local $aFindAll = _ArrayFindAll($aArray, $aUnique[$i], 0, 0, 0, 0, $iColumnSearch)
        If UBound($aFindAll) > 1 Then
            For $j = UBound($aFindAll) - 1 To 1 Step - 1
                _ArrayDelete($aArray, $aFindAll[$j])
            Next
        EndIf
    Next
EndFunc

 

Share this post


Link to post
Share on other sites

#5 ·  Posted

... or something like this (concept by water)

#include <Array.au3>

Local $aArray[8][3] = [["A", "dog", "some more text 1"], ["B", "cat", "some more text 2"], ["C", "horse", "some more text 3"], ["D", "cat", "some more text 4"], ["A", "cat", "some more text 5"], ["B", "turkey", "some more text 6"], ["C", "dog", "some more text 7"], ["D", "chicken", "some more text 8"]]
_ArrayDisplay($aArray, "2D array")

$iCol = 1
_ArraySort($aArray, 0, 0, 0, $iCol)
For $i = UBound($aArray)-1 To 1 Step -1
    If $aArray[$i][$iCol] = $aArray[$i-1][$iCol] Then _ArrayDelete($aArray, $i)
Next
_ArrayDisplay($aArray, "array second column")

 

Share this post


Link to post
Share on other sites

#6 ·  Posted

Thanks for the replies -- I'll check this out, thanks.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now