
Find differences in Arrays, Strings; _Diff


ParoXsitiC

Note: I am in no way the author of this function; I only modified an existing one.

The original function (_GetIntersection) by BugFix is located here: Compare Arrays, Strings; _GetIntersection

Benchmarks:

_Diff vs _ArrayDiff vs _Array1PullCommon

_ArrayDiff (Author: PsaltyDS, Download Here)

_Array1PullCommon (Author: blindwig, Download Here)

Note: _Array1PullCommon was edited to return only an array of the unique values from array 2, and its GUI statements were stripped.

Small vs Big
506  : _Diff
791  : _ArrayDiff
1775 : _Array1PullCommon

Big vs Small
3141 : _Diff
1922 : _ArrayDiff
1685 : _Array1PullCommon

Big vs Big
6797 : _Diff
*    : _ArrayDiff
3296 : _Array1PullCommon

Remarks: "Big" refers to an array with 20,000 elements and "Small" to an array with 20. Each benchmark was run 10 times and the results were averaged.

* Took way too long and I got bored: I waited over a minute per run.
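Averages like the ones above can be collected with a small harness along these lines. This is only a sketch, not the harness used for the numbers above: _AvgMs, _MakeSet and _LinearDiff are hypothetical helpers, and the "Small vs Big" argument order is an assumption. TimerInit()/TimerDiff() measure milliseconds, and Call() runs the routine by name. Because Call() cannot pass ByRef parameters, timing _Diff itself would need a wrapper that takes its arguments by value.

```autoit
; Sketch of a timing harness; _AvgMs, _MakeSet and _LinearDiff are hypothetical.

; Average run time in milliseconds of $sFunc($a1, $a2) over $iRuns calls.
; Call() cannot pass ByRef parameters, so $sFunc must take its arguments by value
; (the time therefore also includes copying the arrays for each call).
Func _AvgMs($sFunc, $a1, $a2, $iRuns = 10)
    Local $fTotal = 0, $hTimer
    For $i = 1 To $iRuns
        $hTimer = TimerInit()
        Call($sFunc, $a1, $a2)
        If @error = 0xDEAD And @extended = 0xBEEF Then Return SetError(1, 0, -1) ; unknown function or bad arguments
        $fTotal += TimerDiff($hTimer)
    Next
    Return $fTotal / $iRuns
EndFunc   ;==>_AvgMs

; 1D array of $iSize random integers between 1 and 1000
Func _MakeSet($iSize)
    Local $aSet[$iSize]
    For $i = 0 To $iSize - 1
        $aSet[$i] = Random(1, 1000, 1)
    Next
    Return $aSet
EndFunc   ;==>_MakeSet

; Stand-in workload: counts the values of $a1 not found in $a2 using nested loops
Func _LinearDiff($a1, $a2)
    Local $iCount = 0, $bFound
    For $i = 0 To UBound($a1) - 1
        $bFound = False
        For $j = 0 To UBound($a2) - 1
            If $a1[$i] = $a2[$j] Then
                $bFound = True
                ExitLoop
            EndIf
        Next
        If Not $bFound Then $iCount += 1
    Next
    Return $iCount
EndFunc   ;==>_LinearDiff

; "Small" = 20 elements, "Big" = 20,000 elements (order of arguments assumed)
Local $aSmall = _MakeSet(20), $aBig = _MakeSet(20000)
ConsoleWrite("Small vs Big: " & Round(_AvgMs("_LinearDiff", $aSmall, $aBig), 1) & " ms" & @CRLF)
```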

#include <array.au3>

Dim $Array1[19] = [1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1]
Dim $Array2[5] = [1,2,3,4,5]
$Difference = _Diff($Array1, $Array2, 1)
_ArrayDisplay($Difference) ; Shows 6,7,8,9,10,9,8,7,6: the count up to 10 and back down, without 1,2,3,4,5 (duplicates kept)


$String1 = "1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1"
$String2 = "6,7,8,9,10"
$Difference = _Diff($String1, $String2, 0, ",")
_ArrayDisplay($Difference) ; Shows 1,2,3,4,5: the count up to 10 without 6,7,8,9,10 (each value once)



;=================================================
; Function Name:   _Diff($Set1, $Set2 [, $GetAll=0 [, $Delim=Default]])
; Description:     Find values in $Set1 that do not occur in $Set2
; Parameter(s):    $Set1    set 1 (1D-array or delimited string)
;                  $Set2    set 2 (1D-array or delimited string)
;      optional:   $GetAll  0 - each difference is shown only once (Default)
;                           1 - all differences are shown, including duplicates
;      optional:   $Delim   Delimiter for strings (Default uses the separator character set by Opt("GUIDataSeparatorChar"), normally "|")
; Return Value(s): Success  1D-array of values from $Set1 that do not occur in $Set2
;                           0 if every value of $Set1 occurs in $Set2
;                  Failure  -1, @error = 1 if a set given as an array is not a 1D-array
; Note:            Comparison is case-sensitive and type-sensitive, i.e. the number 9 is different from the string '9'!
; Author(s):       BugFix (bugfix@autoit.de), modified by ParoXsitiC for a faster _Diff (formerly _GetIntersection)
;=================================================
Func _Diff(ByRef $Set1, ByRef $Set2, $GetAll = 0, $Delim = Default)
    Local $o1 = ObjCreate("System.Collections.ArrayList")
    Local $o2 = ObjCreate("System.Collections.ArrayList")
    Local $oUnion = ObjCreate("System.Collections.ArrayList")
    Local $oDiff1 = ObjCreate("System.Collections.ArrayList")
    Local $oDiff2 = ObjCreate("System.Collections.ArrayList")
    Local $tmp, $i
    If $GetAll <> 1 Then $GetAll = 0
    If $Delim = Default Then $Delim = Opt("GUIDataSeparatorChar")
    If Not IsArray($Set1) Then
        If Not StringInStr($Set1, $Delim) Then
            $o1.Add($Set1)
        Else
            $tmp = StringSplit($Set1, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o1.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set1, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set1) - 1
            $o1.Add($Set1[$i])
        Next
    EndIf
    If Not IsArray($Set2) Then
        If Not StringInStr($Set2, $Delim) Then
            $o2.Add($Set2)
        Else
            $tmp = StringSplit($Set2, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o2.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set2, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set2) - 1
            $o2.Add($Set2[$i])
        Next
    EndIf
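    ; Collect the values that occur in both sets
    For $tmp In $o1
        If $o2.Contains($tmp) And Not $oUnion.Contains($tmp) Then $oUnion.Add($tmp)
    Next
    ; Keep the values of $Set1 that are not in the common set (only once unless $GetAll = 1)
    For $tmp In $o1
        If $GetAll Then
            If Not $oUnion.Contains($tmp) Then $oDiff1.Add($tmp)
        Else
            If Not $oUnion.Contains($tmp) And Not $oDiff1.Contains($tmp) Then $oDiff1.Add($tmp)
        EndIf
    Next

    ; No differences: return 0 instead of an empty array
    If $oDiff1.Count <= 0 Then Return 0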

    Local $aOut[$oDiff1.Count]
    $i = 0
    For $tmp In $oDiff1
        $aOut[$i] = $tmp
        $i += 1
    Next
    Return $aOut
EndFunc   ;==>_Diff
Edited by ParoXsitiC
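As the header note says, comparison is type-sensitive, so the number 9 never matches the string '9'. Because StringSplit() always produces strings, mixing an array of numbers with a delimited string finds no matches at all: _Diff($Array1, $String2, 0, ",") returns every distinct value of $Array1. A minimal sketch of one workaround follows. _ToStringArray is a hypothetical helper, not part of this thread, that converts the array's elements to strings before the comparison.

```autoit
; Hypothetical helper, not part of the original thread:
; returns a copy of a 1D array with every element converted to a string.
; $aIn is passed by value (no ByRef), so the caller's array is left unchanged.
Func _ToStringArray($aIn)
    If UBound($aIn, 0) <> 1 Then Return SetError(1, 0, -1) ; not a 1D array
    For $i = 0 To UBound($aIn) - 1
        $aIn[$i] = String($aIn[$i]) ; 9 becomes "9", matching StringSplit() output
    Next
    Return $aIn
EndFunc   ;==>_ToStringArray

; Example: compare the element types before and after conversion
Local $aNumbers[3] = [8, 9, 10]
Local $aAsStrings = _ToStringArray($aNumbers)
ConsoleWrite(VarGetType($aNumbers[1]) & " -> " & VarGetType($aAsStrings[1]) & @CRLF)
```

With it, `Local $aStr1 = _ToStringArray($Array1)` followed by `_Diff($aStr1, $String2, 0, ",")` compares strings with strings. The result is stored in a variable first because _Diff takes its sets ByRef.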

If you are just chopping up BugFix's routine to return only the elements in set1 that do not exist in set2, in order to speed it up, then I think you were too conservative with your scalpel. Doesn't this return the same result in considerably less time?

#include <array.au3>

Dim $Array1[19] = [1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1]
Dim $Array2[5] = [1,2,3,4,5]
$Difference = _Diff($Array1, $Array2, 1)
_ArrayDisplay($Difference) ; Shows 6,7,8,9,10,9,8,7,6: the count up to 10 and back down, without 1,2,3,4,5 (duplicates kept)


$String1 = "1,2,3,4,5,6,7,8,9,10,9,8,7,6,5,4,3,2,1"
$String2 = "6,7,8,9,10"
$Difference = _Diff($String1, $String2, 0, ",")
_ArrayDisplay($Difference) ; Shows 1,2,3,4,5: the count up to 10 without 6,7,8,9,10 (each value once)



;=================================================
; Function Name:   _Diff($Set1, $Set2 [, $GetAll=0 [, $Delim=Default]])
; Description:     Find values in $Set1 that do not occur in $Set2
; Parameter(s):    $Set1    set 1 (1D-array or delimited string)
;                  $Set2    set 2 (1D-array or delimited string)
;      optional:   $GetAll  0 - each difference is shown only once (Default)
;                           1 - all differences are shown, including duplicates
;      optional:   $Delim   Delimiter for strings (Default uses the separator character set by Opt("GUIDataSeparatorChar"), normally "|")
; Return Value(s): Success  1D-array of values from $Set1 that do not occur in $Set2
;                           0 if every value of $Set1 occurs in $Set2
;                  Failure  -1, @error = 1 if a set given as an array is not a 1D-array
; Note:            Comparison is case-sensitive and type-sensitive, i.e. the number 9 is different from the string '9'!
; Author(s):       BugFix (bugfix@autoit.de), modified by ParoXsitiC for a faster _Diff (formerly _GetIntersection)
;=================================================
Func _Diff(ByRef $Set1, ByRef $Set2, $GetAll = 0, $Delim = Default)
    Local $o1 = ObjCreate("System.Collections.ArrayList")
    Local $o2 = ObjCreate("System.Collections.ArrayList")
    Local $oDiff1 = ObjCreate("System.Collections.ArrayList")
    Local $tmp, $i
    If $GetAll <> 1 Then $GetAll = 0
    If $Delim = Default Then $Delim = Opt("GUIDataSeparatorChar")
    If Not IsArray($Set1) Then
        If Not StringInStr($Set1, $Delim) Then
            $o1.Add($Set1)
        Else
            $tmp = StringSplit($Set1, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o1.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set1, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set1) - 1
            $o1.Add($Set1[$i])
        Next
    EndIf

    If Not IsArray($Set2) Then
        If Not StringInStr($Set2, $Delim) Then
            $o2.Add($Set2)
        Else
            $tmp = StringSplit($Set2, $Delim, 1)
            For $i = 1 To UBound($tmp) - 1
                $o2.Add($tmp[$i])
            Next
        EndIf
    Else
        If UBound($Set2, 0) > 1 Then Return SetError(1, 0, -1)
        For $i = 0 To UBound($Set2) - 1
            $o2.Add($Set2[$i])
        Next
    EndIf

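    ; A value of $Set1 is a difference if $Set2 does not contain it; unless $GetAll = 1,
    ; it is added only if it has not been added already
    For $tmp In $o1
        If Not $o2.Contains($tmp) And ($GetAll Or Not $oDiff1.Contains($tmp)) Then $oDiff1.Add($tmp)
    Next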

    If $oDiff1.Count <= 0 Then Return 0

    Local $aOut[$oDiff1.Count]
    $i = 0
    For $tmp In $oDiff1
        $aOut[$i] = $tmp
        $i += 1
    Next
    Return $aOut
EndFunc   ;==>_Diff


Actually, my understanding was that _ArrayDiff is used a lot, and I wanted _Diff to take similar parameters: the elements of the first array that also exist in the second array are stripped, and a 1D array is returned. The original function did more than that and returned a 2D array; that makes it better suited to widespread use, but if you just want the difference (not two differences and a union), this function may help you out.

Thanks for your contribution; it actually helped quite a bit. Here are updated benchmarks, where _Diff2 is your version.

;~ Small vs Big
;~ 462  : _Diff
;~ 727  : _ArrayDiff
;~ 1730 : _Array1PullCommon
;~ 488  : _Diff2

;~ Big vs Big
;~ 6897 : _Diff
;~ 0    : _ArrayDiff
;~ 3280 : _Array1PullCommon
;~ 4254 : _Diff2

;~ Big vs Small
;~ 3159 : _Diff
;~ 1895 : _ArrayDiff
;~ 1693 : _Array1PullCommon
;~ 2552 : _Diff2
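The slow Big vs Big runs are consistent with how System.Collections.ArrayList.Contains works. It performs a linear search, so every value of $Set1 triggers a scan of $Set2 (and, in the first version, of the intermediate lists as well), which amounts to roughly n × m comparisons. A hash table avoids that cost. The sketch below is a hypothetical _DiffHash, not from this thread and not benchmarked here. It swaps the ArrayList lookups for the Exists() method of the Scripting.Dictionary COM object, assumes both inputs are already 1D, 0-based arrays (no string splitting), and, like _Diff, returns 0 when there is no difference.

```autoit
; Hypothetical _DiffHash: a sketch, not code from this thread.
; Same idea as _Diff, but membership tests use a Scripting.Dictionary (hash table)
; instead of ArrayList.Contains() (linear search).
; Assumes $aSet1 and $aSet2 are 1D, 0-based arrays.
Func _DiffHash(ByRef $aSet1, ByRef $aSet2, $iGetAll = 0)
    If UBound($aSet1, 0) <> 1 Or UBound($aSet2, 0) <> 1 Then Return SetError(1, 0, -1)
    If UBound($aSet1) = 0 Then Return 0 ; nothing to compare

    ; Load every value of $aSet2 into a dictionary for fast lookups
    Local $oExclude = ObjCreate("Scripting.Dictionary")
    For $i = 0 To UBound($aSet2) - 1
        If Not $oExclude.Exists($aSet2[$i]) Then $oExclude.Add($aSet2[$i], 0)
    Next

    ; Remembers values already returned, used when duplicates are not wanted
    Local $oSeen = ObjCreate("Scripting.Dictionary")
    Local $aOut[UBound($aSet1)], $iCount = 0
    For $i = 0 To UBound($aSet1) - 1
        If $oExclude.Exists($aSet1[$i]) Then ContinueLoop ; value occurs in $aSet2
        If $iGetAll <> 1 Then
            If $oSeen.Exists($aSet1[$i]) Then ContinueLoop ; already returned once
            $oSeen.Add($aSet1[$i], 0)
        EndIf
        $aOut[$iCount] = $aSet1[$i]
        $iCount += 1
    Next

    If $iCount = 0 Then Return 0 ; every value of $aSet1 occurs in $aSet2
    ReDim $aOut[$iCount]
    Return $aOut
EndFunc   ;==>_DiffHash

; Example: duplicates removed (default) and kept ($iGetAll = 1)
Local $aA[6] = [1, 2, 3, 4, 3, 5]
Local $aB[2] = [2, 4]
Local $aUnique = _DiffHash($aA, $aB)    ; 1, 3, 5
Local $aAll = _DiffHash($aA, $aB, 1)    ; 1, 3, 3, 5
ConsoleWrite(UBound($aUnique) & " / " & UBound($aAll) & @CRLF)
```

Scripting.Dictionary compares keys with binary (case-sensitive) comparison by default. Whether the swap pays off for your data would need to be measured the same way as the benchmarks above.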
