_ArrayUniqueConcatenate

czardas

After some insightful input by kylomas in this topic, I decided to write a general reusable function based on the same ideas. It handles up to 24 arrays. Zero-based arrays go in, and the target array is returned ByRef (the first array is modified in place). Before passing arrays to this function, delete element 0 if it contains the item count; after calling it, use UBound to get the new size of the target array. When processing large arrays, it is a good idea to delete the arrays you no longer need after the concatenation.

The function returns the number of duplicates removed. Set case sensitivity with the second parameter: 0 = case insensitive, 1 = case sensitive.

;

Func _ArrayUniqueConcatenate(ByRef $aTarget, $iCasesense = 0, _ ; up to 23 more arrays can be included
    $a0 = 0, $a1 = 0, $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, _
    $a12 = 0, $a13 = 0, $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0)

    #forceref $a0, $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22
    If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1)

    Local $iTotalSize = UBound($aTarget), $iItems = 0, $tVarName
    If $iCasesense Then
        For $i = 0 To $iTotalSize -1
            $tVarName = "_" & StringToBinary($aTarget[$i], 2)
            If IsDeclared($tVarName) = -1 Then ContinueLoop

            Assign($tVarName, "", 1)
            $aTarget[$iItems] = $aTarget[$i]
            $iItems += 1
        Next

    Else
        For $i = 0 To $iTotalSize -1
            $tVarName = "_" & StringToBinary(StringLower($aTarget[$i]), 2)
            If IsDeclared($tVarName) = -1 Then ContinueLoop

            Assign($tVarName, "", 1)
            $aTarget[$iItems] = $aTarget[$i]
            $iItems += 1
        Next
    EndIf

    Local $iParams = @NumParams
    If $iParams > 2 Then
        Local $aNextArray, $iBound
        For $i = 0 To $iParams -3
            $aNextArray = Eval('a' & $i)
            If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(2, $i +3) ; Sets @Extended to the parameter which failed
            $iBound = UBound($aNextArray)

            $iTotalSize += $iBound
            ReDim $aTarget[$iItems + $iBound]

            If $iCasesense Then
                For $j = 0 To $iBound -1
                    $tVarName = "_" & StringToBinary($aNextArray[$j], 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop

                    Assign($tVarName, "", 1)
                    $aTarget[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next

            Else
                For $j = 0 To $iBound -1
                    $tVarName = "_" & StringToBinary(StringLower($aNextArray[$j]), 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop

                    Assign($tVarName, "", 1)
                    $aTarget[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next
            EndIf
            Execute('_FreeMemory($a' & $i & ')')
        Next
    EndIf
    ReDim $aTarget[$iItems]

    Return $iTotalSize - $iItems ; Return the number of duplicates removed
EndFunc ; _ArrayUniqueConcatenate

Func _FreeMemory(ByRef $vParam)
    $vParam = 0
EndFunc

;
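As a rough illustration of the preparation steps described above, here is a minimal usage sketch. It assumes _ArrayUniqueConcatenate() and _FreeMemory() from this post have been pasted into the same script; the variable names and sample strings are made up for the example.

```autoit
#include <Array.au3>

; Assumption: _ArrayUniqueConcatenate() and _FreeMemory() from this post
; are defined further down in the same script.

; StringSplit() returns the item count in element 0 by default.
Local $aFruit = StringSplit("apple,Banana,cherry", ",")
Local $aMore = StringSplit("banana,date,APPLE", ",")

; Delete element 0 so that both arrays are 0-based, as the function expects.
_ArrayDelete($aFruit, 0)
_ArrayDelete($aMore, 0)

; 0 = case insensitive. $aFruit is passed ByRef and receives the result.
Local $iRemoved = _ArrayUniqueConcatenate($aFruit, 0, $aMore)
If @error Then
    ConsoleWrite("Error = " & @error & ", failed parameter = " & @extended & @LF)
Else
    ; The new size has to be read with UBound(), as noted above.
    ConsoleWrite("Duplicates removed = " & $iRemoved & @LF)
    ConsoleWrite("Unique items = " & UBound($aFruit) & @LF)
EndIf
```

Only the first array is passed ByRef, so the merged result ends up in $aFruit; $aMore in the calling script is left untouched and can be cleared separately if memory matters.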

In the following test, after randomly filling 24 arrays of 50000 elements (each with 2 ASCII characters), the function searches (case insensitive) through 1200000 elements and removes all duplicates. Filling the arrays takes a few seconds to begin with (watch the SciTE console). It should hit the expected limit of 38416 possible case-insensitive 2-character combinations and remove 1161584 duplicates. It takes roughly 12 to 13 seconds on my machine. It also works with Unicode.

;

#include <Array.au3>
#include <String.au3>

Global $a1[50000], $a2[50000], $a3[50000], $a4[50000], $a5[50000], $a6[50000], $a7[50000], $a8[50000], _
$a9[50000], $a10[50000], $a11[50000], $a12[50000], $a13[50000], $a14[50000], $a15[50000], $a16[50000], _
$a17[50000], $a18[50000], $a19[50000], $a20[50000], $a21[50000], $a22[50000], $a23[50000], $a24[50000]

ConsoleWrite("Populating Arrays" & @LF)
For $i = 1 To 24
    Execute('_Fill($a' & $i & ')')
Next

ConsoleWrite("Starting Timer" & @LF)
Local $iTimer = TimerInit()
Local $ret = _ArrayUniqueConcatenate($a1, 0, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22, $a23, $a24)
ConsoleWrite("Error = " & @error & @lf & "Seconds = " & TimerDiff($iTimer)/1000 & @LF & "Unique Items = " & UBound($a1) & @LF & "Duplicates removed = " & $ret & @LF)

For $i = 2 To 24
    Execute('_FreeMemory($a' & $i & ')')
Next

_ArrayDisplay($a1)

Func _FreeMemory(ByRef $vParam)
    $vParam = 0
EndFunc

Func _Fill(ByRef $aArray)
    For $i = 0 To UBound($aArray) -1
        $aArray[$i] = _HexToString(_RandomHexStr(4))
    Next
EndFunc

Func _RandomHexStr($sLen)
    Local $sHexString = ""
    For $i = 1 To $sLen
        $sHexString &= StringRight(Hex(Random(0, 15, 1)), 1)
    Next
    Return $sHexString
EndFunc ;==> _RandomHexStr

Func _ArrayUniqueConcatenate(ByRef $aTarget, $iCasesense = 0, _ ; up to 23 more arrays can be included
    $a0 = 0, $a1 = 0, $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, _
    $a12 = 0, $a13 = 0, $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0)

    #forceref $a0, $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22
    If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1)

    Local $iTotalSize = UBound($aTarget), $iItems = 0, $tVarName
    If $iCasesense Then
        For $i = 0 To $iTotalSize -1
            $tVarName = "_" & StringToBinary($aTarget[$i], 2)
            If IsDeclared($tVarName) = -1 Then ContinueLoop

            Assign($tVarName, "", 1)
            $aTarget[$iItems] = $aTarget[$i]
            $iItems += 1
        Next

    Else
        For $i = 0 To $iTotalSize -1
            $tVarName = "_" & StringToBinary(StringLower($aTarget[$i]), 2)
            If IsDeclared($tVarName) = -1 Then ContinueLoop

            Assign($tVarName, "", 1)
            $aTarget[$iItems] = $aTarget[$i]
            $iItems += 1
        Next
    EndIf

    Local $iParams = @NumParams
    If $iParams > 2 Then
        Local $aNextArray, $iBound
        For $i = 0 To $iParams -3
            $aNextArray = Eval('a' & $i)
            If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(2, $i +3) ; Sets @Extended to the parameter which failed
            $iBound = UBound($aNextArray)

            $iTotalSize += $iBound
            ReDim $aTarget[$iItems + $iBound]

            If $iCasesense Then
                For $j = 0 To $iBound -1
                    $tVarName = "_" & StringToBinary($aNextArray[$j], 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop

                    Assign($tVarName, "", 1)
                    $aTarget[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next

            Else
                For $j = 0 To $iBound -1
                    $tVarName = "_" & StringToBinary(StringLower($aNextArray[$j]), 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop

                    Assign($tVarName, "", 1)
                    $aTarget[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next
            EndIf
            Execute('_FreeMemory($a' & $i & ')')
        Next
    EndIf
    ReDim $aTarget[$iItems]

    Return $iTotalSize - $iItems ; Return the number of duplicates removed
EndFunc ; _ArrayUniqueConcatenate
Edited by czardas
JohnOne

9.5 seconds here :)

What if, instead of taking a load of arrays as params and checking how many were passed, you took an array of arrays by reference?

It would remove the limit on the number of arrays that can be passed, but put the onus on the caller to create the array of arrays.

I've done something like that before, and I'm certain it sped things up too.


czardas

Well, using an array of arrays is generally not recommended, or at least it didn't use to be. It also looks as if the code can be simplified, but that might well introduce a time penalty. Using a helper function is not feasible, since it would need to test for the existence of local variables created within the main function. For these reasons, part of the code repeats.

Edited by czardas

BrewManNH

The only caveat with putting an array inside an array is that you have to know how to address it correctly; there are generally no problems actually using them if you do know how. It's an advanced feature not recommended for the faint of heart or a newbie.

If you're not using an array of arrays, you should probably use ByRef for all your arrays being passed if you want to limit the amount of memory used by the function. As long as you're not altering the incoming arrays, there shouldn't be any downside to doing it that way.
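For anyone curious what the array-of-arrays approach might look like, here is a rough sketch. The function name _ArrayUniqueFromArrays and its return convention are hypothetical (nothing in this thread defines them), and it has not been timed against the original. It addresses each inner array by copying it into a temporary variable first, and it reuses the same Assign/IsDeclared technique as the function in the first post.

```autoit
; Hypothetical variant (not from this thread): one parameter holding all the arrays.
; Returns a new array of unique items; @extended holds the number of duplicates removed.
Func _ArrayUniqueFromArrays(ByRef $aArrays, $iCasesense = 0)
    If Not IsArray($aArrays) Or UBound($aArrays, 0) <> 1 Then Return SetError(1, 0, 0)

    Local $aInner, $sKey, $iBound, $iItems = 0, $iTotalSize = 0
    Local $aResult[1]

    For $i = 0 To UBound($aArrays) - 1
        $aInner = $aArrays[$i] ; copy the inner array out, then index the copy
        If Not IsArray($aInner) Or UBound($aInner, 0) <> 1 Then Return SetError(2, $i, 0)

        $iBound = UBound($aInner)
        If $iBound = 0 Then ContinueLoop ; nothing to add from an empty array
        $iTotalSize += $iBound
        ReDim $aResult[$iItems + $iBound] ; make room for the worst case (no duplicates)

        For $j = 0 To $iBound - 1
            $sKey = $aInner[$j]
            If Not $iCasesense Then $sKey = StringLower($sKey)
            $sKey = "_" & StringToBinary($sKey, 2)
            If IsDeclared($sKey) = -1 Then ContinueLoop ; -1 = local variable exists, so already seen

            Assign($sKey, "", 1) ; 1 = force creation in local scope
            $aResult[$iItems] = $aInner[$j]
            $iItems += 1
        Next
    Next

    If $iItems = 0 Then Return SetError(3, 0, 0) ; every inner array was empty
    ReDim $aResult[$iItems]
    Return SetExtended($iTotalSize - $iItems, $aResult)
EndFunc

; Usage: the caller builds the outer array.
Local $aA[2] = ["one", "two"], $aB[2] = ["TWO", "three"]
Local $aAll[2]
$aAll[0] = $aA
$aAll[1] = $aB

Local $aUnique = _ArrayUniqueFromArrays($aAll)
Local $iRemoved = @extended ; read immediately, before another call resets it
ConsoleWrite("Unique items = " & UBound($aUnique) & ", duplicates removed = " & $iRemoved & @LF)
```

Note that storing an array inside another array and copying it back out both duplicate the data, so this sketch trades the parameter limit for extra memory use; whether it is faster or slower would need measuring.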


czardas

Thanks for testing this.


There is no way to pass optional ByRef parameters in AutoIt, otherwise I would have done so. I thought of passing an array of arrays but decided that nobody ever does this for a reason. Sure it can be done. If you want to pass arrays of arrays then it's easy enough to modify. I've never needed to use an array of arrays, and I've seldom seen one used as a function parameter. I was under the impression that there are performance related issues when doing this. I've added a line to free up memory as you go. This should allow larger input.

The function could also be modified to return the item count in the first element. I may do this later, however I wanted to keep it simple and practical, so all input and output ended up 0-based. I think 24 arrays are enough for most practical purposes. Add as many extra parameters as you want to the function. It won't break anything. :)


Fixed => I got the case sensitivity working backwards. Updated first post. >_<

After increasing the length of the strings in the arrays, it just processed 146 MB of string data in 60 seconds. :D

Edited by czardas

czardas

After further tests, I must warn anyone using this function that performance will degrade with very large arrays. The limits are not clear because they depend on the number of duplicates and the available RAM. In one test, 2,400,000 elements containing random strings of between 1 and 64 characters (approximately 76,800,000 characters in total) yielded 2,347,389 unique items after removing 52,611 duplicates, in 5 minutes and 15 seconds, using 2GB of RAM. Performance degrades because a local variable is created for each new item (more unique items = lower performance). The number of expected duplicates (more duplicates = better performance) therefore affects the amount of data this function can handle.
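To make the "one local variable per unique item" point concrete, here is a stripped-down sketch of the technique on its own. The function name _CountUnique_Demo and the sample words are made up for illustration; the Assign/IsDeclared calls mirror the ones in the function from the first post.

```autoit
; Illustration only: counts unique strings (case insensitive) using the
; same "variable name as lookup key" trick as _ArrayUniqueConcatenate().
Func _CountUnique_Demo(ByRef $aWords)
    Local $sVarName, $iUnique = 0
    For $i = 0 To UBound($aWords) - 1
        ; Encode the string as binary text so any content becomes a legal variable name.
        $sVarName = "_" & StringToBinary(StringLower($aWords[$i]), 2)

        ; IsDeclared() returns -1 for a local variable: this item was seen before.
        If IsDeclared($sVarName) = -1 Then ContinueLoop

        ; First sighting: create a local variable with that name (1 = force local).
        ; Every unique item adds one more variable, which is where the memory goes.
        Assign($sVarName, "", 1)
        $iUnique += 1
    Next
    Return $iUnique
EndFunc

Local $aWords[5] = ["red", "Blue", "red", "blue", "green"]
ConsoleWrite("Unique items = " & _CountUnique_Demo($aWords) & @LF)
```

The loop has to live inside a function for this to work as written: the check is for local variables (-1), and the variables disappear again when the function returns.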

Edited by czardas

Cravin

:D Fantastic work, czardas/kylomas!  Glad I could inspire such collaborative effort through a single question!

Edited by Cravin
czardas

Initial attempts to simplify the code in the first post produced a 20% reduction in speed. I think this is interesting, because what appears to be an unnecessary duplication of code in the original function (in the 1st post) is noticeably more efficient than the less bulky version below.

Func _ArrayUniqueConcatenate(ByRef $aTarget, $iCasesense = 0, _ ; up to 23 more arrays can be included
    $a0 = 0, $a1 = 0, $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, _
    $a12 = 0, $a13 = 0, $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0)

    #forceref $a0, $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22
    If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1)

    Local $iTotalSize = UBound($aTarget), $iItems = 0, $tVarName, $aExpression[2]
    $aExpression[0] = "StringToBinary(StringLower($aTarget[$i]), 2)"
    $aExpression[1] = "StringToBinary($aTarget[$i], 2)"
    
    If $iCasesense <> 0 Then $iCasesense = 1

    For $i = 0 To $iTotalSize -1
        $tVarName = "_" & Execute($aExpression[$iCasesense])
        If IsDeclared($tVarName) = -1 Then ContinueLoop

        Assign($tVarName, "", 1)
        $aTarget[$iItems] = $aTarget[$i]
        $iItems += 1
    Next

    Local $iParams = @NumParams
    If $iParams > 2 Then
        $aExpression[0] = "StringToBinary(StringLower($aNextArray[$j]), 2)"
        $aExpression[1] = "StringToBinary($aNextArray[$j], 2)"
        
        Local $aNextArray, $iBound
        For $i = 0 To $iParams -3
            $aNextArray = Eval('a' & $i)
            If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(2, $i +3) ; Sets @Extended to the parameter which failed
            $iBound = UBound($aNextArray)

            $iTotalSize += $iBound
            ReDim $aTarget[$iItems + $iBound]

            For $j = 0 To $iBound -1
                $tVarName = "_" & Execute($aExpression[$iCasesense])
                If IsDeclared($tVarName) = -1 Then ContinueLoop
                    
                Assign($tVarName, "", 1)
                $aTarget[$iItems] = $aNextArray[$j]
                $iItems += 1
            Next
            Execute('_FreeMemory($a' & $i & ')')
        Next
    EndIf
    ReDim $aTarget[$iItems]

    Return $iTotalSize - $iItems ; Return the number of duplicates removed
EndFunc ; _ArrayUniqueConcatenate

Func _FreeMemory(ByRef $vParam)
    $vParam = 0
EndFunc

This would appear to question the validity of some good coding practices in certain situations, i.e. the practice of using encapsulation instead of simply repeating the same statements. I don't know of any other ways to encapsulate this function with the methods it uses. I don't believe using recursion will improve it. :unsure:
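One way to check where the extra time goes is to time the two forms of the key expression in isolation. This is only a sketch: the loop count and the sample string are arbitrary, and the numbers will vary by machine, so no results are claimed here.

```autoit
; Rough micro-benchmark: direct call versus the same expression run through Execute().
Local $sValue = "Example String", $vResult, $iTimer
Local Const $iLoops = 100000 ; arbitrary repeat count for the comparison

; Direct call, as in the duplicated loops of the original function.
$iTimer = TimerInit()
For $i = 1 To $iLoops
    $vResult = StringToBinary(StringLower($sValue), 2)
Next
ConsoleWrite("Direct call: " & TimerDiff($iTimer) & " ms" & @LF)

; Same expression stored as a string and evaluated on every pass,
; as in the shorter version of the function.
Local $sExpression = "StringToBinary(StringLower($sValue), 2)"
$iTimer = TimerInit()
For $i = 1 To $iLoops
    $vResult = Execute($sExpression)
Next
ConsoleWrite("Execute():   " & TimerDiff($iTimer) & " ms" & @LF)
```

If the second loop is consistently slower, that would suggest the slowdown comes from evaluating the expression string on every iteration rather than from the encapsulation as such.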

Edited by czardas

czardas

After renaming one or two parameters, I was able to rewrite this function using more compact and better organized code. While the code in my previous post is surprisingly sluggish, this version appears to have a slight edge over the original function.

;

Func _ArrayUniqueConcatenate(ByRef $a1, $iCasesense = 0, _ ; up to 23 more arrays can be included
    $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, $a12 = 0, $a13 = 0, _
    $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0, $a23 = 0, $a24 = 0)
    #forceref $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22, $a23, $a24

    Local $aNextArray, $iBound, $tVarName, $iTotalSize = 0, $iItems = 0, $iParams = @NumParams

    If $iParams = 1 Then $iParams = 2
    For $i = 1 To $iParams -1
        $aNextArray = Eval('a' & $i)
        If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(1, $i + ($i > 1)) ; Sets @Extended to the parameter which failed

        $iBound = UBound($aNextArray)
        If $i > 1 Then ReDim $a1[$iItems + $iBound]

        If $iCasesense Then
            For $j = 0 To $iBound -1
                $tVarName = "_" & StringToBinary($aNextArray[$j], 2)
                If IsDeclared($tVarName) = -1 Then ContinueLoop

                Assign($tVarName, "", 1)
                $a1[$iItems] = $aNextArray[$j]
                $iItems += 1
            Next

        Else
            For $j = 0 To $iBound -1
                $tVarName = "_" & StringToBinary(StringLower($aNextArray[$j]), 2)
                If IsDeclared($tVarName) = -1 Then ContinueLoop

                Assign($tVarName, "", 1)
                $a1[$iItems] = $aNextArray[$j]
                $iItems += 1
            Next
        EndIf
        If $i > 1 Then Execute('_FreeMemory($a' & $i & ')')
        $iTotalSize += $iBound
    Next
    ReDim $a1[$iItems]

    Return $iTotalSize - $iItems ; Return the number of duplicates removed
EndFunc ; _ArrayUniqueConcatenate

Func _FreeMemory(ByRef $vParam)
    $vParam = 0
EndFunc

;

Don't mean to bump threads, just letting you know the code has been improved.

Edited by czardas

Cravin

Nice, I'll have to try this updated version in the data backup script I have written.  Thanks!

czardas

Your question and kylomas' idea inspired me. Having to wait so long for _ArrayUnique() has been an issue for me in the past too, so creating this brings rewards for me also. The new version is just more compact and I think the code is neater. Time for some proper documentation after all this testing. :)
