czardas Posted September 8, 2013 (edited)

After some insightful input by kylomas in this topic, I decided to make a general reusable function using the same ideas. It handles up to 24 arrays. Zero-based arrays go in, and the target array is returned ByRef. Before passing arrays to this function, you need to delete element 0 if it contains the item count, and you need to use UBound to get the new size of the target array after using the function. If processing large arrays, it is a good idea to delete the arrays you no longer need after the concatenation. The function returns the number of removed duplicates. Set case sensitivity using the second parameter: 0 = case insensitive, 1 = case sensitive.

    Func _ArrayUniqueConcatenate(ByRef $aTarget, $iCasesense = 0, _ ; up to 23 more arrays can be included
            $a0 = 0, $a1 = 0, $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, _
            $a12 = 0, $a13 = 0, $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0)
        #forceref $a0, $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22

        If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1)

        Local $iTotalSize = UBound($aTarget), $iItems = 0, $tVarName

        ; Remove duplicates within the target array itself.
        ; Each unique item is remembered as a local variable named after its binary form.
        If $iCasesense Then
            For $i = 0 To $iTotalSize - 1
                $tVarName = "_" & StringToBinary($aTarget[$i], 2)
                If IsDeclared($tVarName) = -1 Then ContinueLoop ; already seen
                Assign($tVarName, "", 1)
                $aTarget[$iItems] = $aTarget[$i]
                $iItems += 1
            Next
        Else
            For $i = 0 To $iTotalSize - 1
                $tVarName = "_" & StringToBinary(StringLower($aTarget[$i]), 2)
                If IsDeclared($tVarName) = -1 Then ContinueLoop ; already seen
                Assign($tVarName, "", 1)
                $aTarget[$iItems] = $aTarget[$i]
                $iItems += 1
            Next
        EndIf

        ; Append the unique items of every additional array.
        Local $iParams = @NumParams
        If $iParams > 2 Then
            Local $aNextArray, $iBound
            For $i = 0 To $iParams - 3
                $aNextArray = Eval('a' & $i)
                If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(2, $i + 3) ; Sets @Extended to the parameter which failed

                $iBound = UBound($aNextArray)
                $iTotalSize += $iBound
                ReDim $aTarget[$iItems + $iBound]

                If $iCasesense Then
                    For $j = 0 To $iBound - 1
                        $tVarName = "_" & StringToBinary($aNextArray[$j], 2)
                        If IsDeclared($tVarName) = -1 Then ContinueLoop
                        Assign($tVarName, "", 1)
                        $aTarget[$iItems] = $aNextArray[$j]
                        $iItems += 1
                    Next
                Else
                    For $j = 0 To $iBound - 1
                        $tVarName = "_" & StringToBinary(StringLower($aNextArray[$j]), 2)
                        If IsDeclared($tVarName) = -1 Then ContinueLoop
                        Assign($tVarName, "", 1)
                        $aTarget[$iItems] = $aNextArray[$j]
                        $iItems += 1
                    Next
                EndIf
                Execute('_FreeMemory($a' & $i & ')') ; release the parameter once it has been processed
            Next
        EndIf

        ReDim $aTarget[$iItems]
        Return $iTotalSize - $iItems ; Return the number of duplicates removed
    EndFunc ; _ArrayUniqueConcatenate

    Func _FreeMemory(ByRef $vParam)
        $vParam = 0
    EndFunc

In the following test, after randomly filling 24 arrays of 50000 elements (each with 2 ASCII characters), the function searches (case insensitive) through 1200000 elements, removing all duplicates in just a few seconds. Filling the arrays takes a few seconds to begin with (watch the SciTE console). It should hit the expected limit of 38416 possible 2-character case-insensitive combinations and remove 1161584 duplicates. It takes about 12 seconds on my machine. Also works with Unicode.

    #include <Array.au3>
    #include <String.au3>

    Global $a1[50000], $a2[50000], $a3[50000], $a4[50000], $a5[50000], $a6[50000], $a7[50000], $a8[50000], _
            $a9[50000], $a10[50000], $a11[50000], $a12[50000], $a13[50000], $a14[50000], $a15[50000], $a16[50000], _
            $a17[50000], $a18[50000], $a19[50000], $a20[50000], $a21[50000], $a22[50000], $a23[50000], $a24[50000]

    ConsoleWrite("Populating Arrays" & @LF)
    For $i = 1 To 24
        Execute('_Fill($a' & $i & ')')
    Next

    ConsoleWrite("Starting Timer" & @LF)
    Local $iTimer = TimerInit()

    Local $ret = _ArrayUniqueConcatenate($a1, 0, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22, $a23, $a24)
    ConsoleWrite("Error = " & @error & @LF & "Seconds = " & TimerDiff($iTimer) / 1000 & @LF & "Unique Items = " & UBound($a1) & @LF & "Duplicates removed = " & $ret & @LF)

    ; Release the source arrays that are no longer needed.
    For $i = 2 To 24
        Execute('_FreeMemory($a' & $i & ')')
    Next

    _ArrayDisplay($a1)

    Func _FreeMemory(ByRef $vParam)
        $vParam = 0
    EndFunc

    ; Fill every element with a random 2-character string.
    Func _Fill(ByRef $aArray)
        For $i = 0 To UBound($aArray) - 1
            $aArray[$i] = _HexToString(_RandomHexStr(4))
        Next
    EndFunc

    ; Return a string of $sLen random hex digits.
    Func _RandomHexStr($sLen)
        Local $sHexString = ""
        For $i = 1 To $sLen
            $sHexString &= StringRight(Hex(Random(0, 15, 1)), 1)
        Next
        Return $sHexString
    EndFunc ;==> _RandomHexStr

    ; _ArrayUniqueConcatenate() is identical to the function listed above; include it in the script before running.

Edited September 9, 2013 by czardas
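As a usage note for the requirements in the first post (drop a count stored in element 0 first, read the new size with UBound afterwards), here is a minimal sketch. It assumes `_ArrayUniqueConcatenate()` and `_FreeMemory()` from the first post are part of the script; the sample strings and the variable names `$aFirst`, `$aSecond`, and `$iRemoved` are made up for illustration.

```autoit
#include <Array.au3>

; Hypothetical sample data: StringSplit() stores the item count in element 0 by default.
Local $aFirst = StringSplit("apple,Banana,cherry", ",")
Local $aSecond = StringSplit("banana,Cherry,date", ",")

; Remove the count so that both arrays are truly 0-based.
_ArrayDelete($aFirst, 0)
_ArrayDelete($aSecond, 0)

; Case-insensitive merge; $aFirst receives the result ByRef.
Local $iRemoved = _ArrayUniqueConcatenate($aFirst, 0, $aSecond)
If @error Then
    ConsoleWrite("Failed: @error = " & @error & ", @extended = " & @extended & @LF)
    Exit 1
EndIf

; The target array has been resized, so ask UBound() for the new size.
ConsoleWrite("Unique items = " & UBound($aFirst) & ", duplicates removed = " & $iRemoved & @LF)
```

Note that the function resets each extra array parameter once it has been processed, but those parameters are copies; `$aSecond` in the caller still has to be released separately if memory matters.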
JohnOne Posted September 8, 2013

9.5 seconds here.

What if, instead of taking a load of arrays as params and checking how many were passed, you took an array of arrays by reference? It would remove the limit on the number of arrays that can be passed, but put the onus on the caller to create the array of arrays. I've done something like that before, and I'm certain it sped things up too.
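JohnOne's suggestion could look roughly like the sketch below. This is not code from the thread: the name `_ArrayUniqueConcatAoA`, its error codes, and the choice to return a new array (with the duplicate count in @extended) are all assumptions made for illustration. It reuses the thread's Assign/IsDeclared technique, and no speed claim is made for it.

```autoit
; Sketch only: merges every 1D array stored in $aArrays (an array of arrays)
; and returns a new 0-based array of unique items.
; @error = 1: $aArrays is not a 1D array
; @error = 2: element number @extended is not a 1D array
; @error = 3: nothing to merge
; On success @extended holds the number of duplicates removed.
Func _ArrayUniqueConcatAoA(ByRef $aArrays, $iCasesense = 0)
    If Not IsArray($aArrays) Or UBound($aArrays, 0) <> 1 Then Return SetError(1, 0, 0)

    ; First pass: validate the inner arrays and count the items,
    ; so the result array only has to be sized once.
    Local $aInner, $iTotal = 0
    For $i = 0 To UBound($aArrays) - 1
        $aInner = $aArrays[$i]
        If Not IsArray($aInner) Or UBound($aInner, 0) <> 1 Then Return SetError(2, $i, 0)
        $iTotal += UBound($aInner)
    Next
    If $iTotal = 0 Then Return SetError(3, 0, 0)

    ; Second pass: copy each item the first time it is seen.
    Local $aResult[$iTotal], $iItems = 0, $sVarName, $vItem
    For $i = 0 To UBound($aArrays) - 1
        $aInner = $aArrays[$i]
        For $j = 0 To UBound($aInner) - 1
            $vItem = $aInner[$j]
            If Not $iCasesense Then $vItem = StringLower($vItem)
            $sVarName = "_" & StringToBinary($vItem, 2)
            If IsDeclared($sVarName) = -1 Then ContinueLoop ; already seen
            Assign($sVarName, "", 1)
            $aResult[$iItems] = $aInner[$j]
            $iItems += 1
        Next
    Next

    ReDim $aResult[$iItems]
    Return SetExtended($iTotal - $iItems, $aResult)
EndFunc

; Usage with hypothetical data
Local $aOne[3] = ["a", "B", "c"], $aTwo[2] = ["b", "d"]
Local $aAll[2]
$aAll[0] = $aOne
$aAll[1] = $aTwo

Local $aUnique = _ArrayUniqueConcatAoA($aAll)
Local $iRemoved = @extended
ConsoleWrite("Unique items = " & UBound($aUnique) & ", duplicates removed = " & $iRemoved & @LF)
```

The case test inside the inner loop keeps the sketch short; the original function deliberately duplicates the loop instead, so a version written for speed would probably do the same.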
czardas (Author) Posted September 8, 2013 (edited)

Well, using an array of arrays is generally not recommended, or at least it didn't use to be. It also looks as if the code can be simplified, but that might well introduce a time penalty. Using a helper function is not feasible, since it needs to test the existence of local variables created within the function. For these reasons part of the code repeats.

Edited September 9, 2013 by czardas
BrewManNH Posted September 9, 2013

The only caveat about putting an array inside an array is that you have to know how to address it correctly; there are generally no problems actually using them if you do know how. It's an advanced feature not recommended for the faint of heart or a newbie.

If you're not using an array of arrays, you should probably use ByRef for all the arrays being passed if you want to limit the amount of memory used by the function. As long as you're not altering the incoming arrays, there shouldn't be any downside to doing it that way.
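To illustrate what "addressing it correctly" involves, here is a small sketch with made-up variable names (`$aInner`, `$aOuter`, `$aTemp`). It uses the approach that does not depend on the AutoIt version: copy the inner array out, work on the copy, and store it back if it was changed.

```autoit
Local $aInner[3] = ["x", "y", "z"]
Local $aOuter[1]
$aOuter[0] = $aInner ; stores a copy of $aInner inside $aOuter

; Reading: copy the inner array out, then index the copy.
Local $aTemp = $aOuter[0]
ConsoleWrite($aTemp[1] & @LF) ; writes: y

; Writing: change the copy, then store it back in the outer array.
$aTemp[1] = "changed"
$aOuter[0] = $aTemp
```

I believe newer AutoIt releases also accept the parenthesised form `($aOuter[0])[1]` for reading an inner element directly, but treat that as an assumption to check against your version; the copy approach above works either way.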
JohnOne Posted September 9, 2013

I just tested with all 20+ array params ByRef individually, and there was no speed increase. I suppose arrays are passed by ref regardless of keyword.
czardas (Author) Posted September 9, 2013 (edited)

Thanks for testing this. There is no way to pass optional ByRef parameters in AutoIt, otherwise I would have done so. I thought of passing an array of arrays but decided that nobody ever does this for a reason. Sure it can be done. If you want to pass arrays of arrays then it's easy enough to modify. I've never needed to use an array of arrays, and I've seldom seen one used as a function parameter. I was under the impression that there are performance-related issues when doing this.

I've added a line to free up memory as you go. This should allow larger input. The function could also be modified to return the item count in the first element. I may do this later; however, I wanted to keep it simple and practical, so all input and output ended up 0-based. I think 24 arrays are enough for most practical purposes. Add as many extra parameters as you want to the function. It won't break anything.

Fixed => I got the case sensitivity working backwards. Updated first post.

After increasing the length of the strings in the arrays, it just processed 146 MB of string data in 60 seconds.

Edited September 9, 2013 by czardas
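For anyone who wants the item count in the first element, a small wrapper applied to the result is one way to get it without touching the function itself. The helper name `_AddCountElement` is hypothetical, and the sample array stands in for real output.

```autoit
; Sketch: shift every item up by one and store the item count in element 0.
; Returns the item count, or sets @error = 1 if $aArray is not a 1D array.
Func _AddCountElement(ByRef $aArray)
    If Not IsArray($aArray) Or UBound($aArray, 0) <> 1 Then Return SetError(1, 0, 0)

    Local $iCount = UBound($aArray)
    ReDim $aArray[$iCount + 1]
    For $i = $iCount To 1 Step -1 ; work backwards so nothing is overwritten
        $aArray[$i] = $aArray[$i - 1]
    Next
    $aArray[0] = $iCount
    Return $iCount
EndFunc

; Usage with a hypothetical result array
Local $aResult[3] = ["alpha", "beta", "gamma"]
_AddCountElement($aResult)
For $i = 1 To $aResult[0]
    ConsoleWrite($aResult[$i] & @LF)
Next
```

The shift costs one extra pass over the array, which is part of the reason for keeping the main function 0-based.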
czardas (Author) Posted September 9, 2013 (edited)

After further tests, I must warn anyone using this function that performance will degrade with very large arrays. The limitations are not clear, because they depend on the number of duplicates and the available RAM. The results of one test showed that 2,400,000 elements containing random strings of between 1 and 64 characters (approximately 76,800,000 characters in total) returned 2,347,389 unique items after removing 52,611 duplicates in 5 minutes and 15 seconds, using 2 GB of RAM.

Performance degrades because a local variable is created for each new item (more unique items = lower performance). Therefore the number of expected duplicates (more duplicates = better performance) affects the amount of data this function can handle.

Edited September 9, 2013 by czardas
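If the one-variable-per-item bookkeeping becomes the bottleneck, a different technique that could be tried is tracking seen items in a Scripting.Dictionary COM object. The sketch below is an alternative, not the method used in this thread; the name `_ArrayUniqueDict` is hypothetical, it handles a single array only, and no claim is made that it is faster or lighter on memory.

```autoit
; Sketch: remove duplicates from one 0-based 1D array in place,
; remembering seen items in a Scripting.Dictionary instead of local variables.
; Returns the number of duplicates removed.
; @error = 1: not a 1D array; @error = 2: the COM object could not be created.
Func _ArrayUniqueDict(ByRef $aTarget, $iCasesense = 0)
    If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1, 0, 0)

    Local $oSeen = ObjCreate("Scripting.Dictionary")
    If Not IsObj($oSeen) Then Return SetError(2, 0, 0)

    ; CompareMode 0 = binary (case sensitive, the default), 1 = text (case insensitive).
    ; It has to be set before any key is added.
    If Not $iCasesense Then $oSeen.CompareMode = 1

    Local $iSize = UBound($aTarget), $iItems = 0, $sKey
    If $iSize = 0 Then Return 0

    For $i = 0 To $iSize - 1
        $sKey = String($aTarget[$i]) ; compare everything as text, as the original does
        If $oSeen.Exists($sKey) Then ContinueLoop
        $oSeen.Add($sKey, "")
        $aTarget[$iItems] = $aTarget[$i]
        $iItems += 1
    Next

    ReDim $aTarget[$iItems]
    Return $iSize - $iItems
EndFunc
```

The dictionary's text comparison may not fold case exactly the way StringLower() does for every Unicode character, so results could differ from the original function on unusual input.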
Cravin Posted September 10, 2013 (edited)

Fantastic work, czardas/kylomas! Glad I could inspire such collaborative effort through a single question!

Edited September 10, 2013 by Cravin
czardas (Author) Posted September 13, 2013 (edited)

Initial attempts to simplify the code in the first post produced a 20% reduction in speed. I think this is interesting, because what appears to be an unnecessary duplication of code is noticeably more efficient than the less bulky code below. The original function (in the first post) is 20% faster.

    Func _ArrayUniqueConcatenate(ByRef $aTarget, $iCasesense = 0, _ ; up to 23 more arrays can be included
            $a0 = 0, $a1 = 0, $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, _
            $a12 = 0, $a13 = 0, $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0)
        #forceref $a0, $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22

        If Not IsArray($aTarget) Or UBound($aTarget, 0) <> 1 Then Return SetError(1)

        ; The key expression is stored as a string and evaluated with Execute(),
        ; so a single loop serves both case modes.
        Local $iTotalSize = UBound($aTarget), $iItems = 0, $tVarName, $aExpression[2]
        $aExpression[0] = "StringToBinary(StringLower($aTarget[$i]), 2)"
        $aExpression[1] = "StringToBinary($aTarget[$i], 2)"
        If $iCasesense <> 0 Then $iCasesense = 1

        For $i = 0 To $iTotalSize - 1
            $tVarName = "_" & Execute($aExpression[$iCasesense])
            If IsDeclared($tVarName) = -1 Then ContinueLoop
            Assign($tVarName, "", 1)
            $aTarget[$iItems] = $aTarget[$i]
            $iItems += 1
        Next

        Local $iParams = @NumParams
        If $iParams > 2 Then
            $aExpression[0] = "StringToBinary(StringLower($aNextArray[$j]), 2)"
            $aExpression[1] = "StringToBinary($aNextArray[$j], 2)"

            Local $aNextArray, $iBound
            For $i = 0 To $iParams - 3
                $aNextArray = Eval('a' & $i)
                If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(2, $i + 3) ; Sets @Extended to the parameter which failed

                $iBound = UBound($aNextArray)
                $iTotalSize += $iBound
                ReDim $aTarget[$iItems + $iBound]

                For $j = 0 To $iBound - 1
                    $tVarName = "_" & Execute($aExpression[$iCasesense])
                    If IsDeclared($tVarName) = -1 Then ContinueLoop
                    Assign($tVarName, "", 1)
                    $aTarget[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next
                Execute('_FreeMemory($a' & $i & ')')
            Next
        EndIf

        ReDim $aTarget[$iItems]
        Return $iTotalSize - $iItems ; Return the number of duplicates removed
    EndFunc ; _ArrayUniqueConcatenate

    Func _FreeMemory(ByRef $vParam)
        $vParam = 0
    EndFunc

This would appear to question the validity of some good coding practices in certain situations, i.e. the practice of using encapsulation instead of simply repeating the same code. I don't know of any more ways to encapsulate this function with the methods it uses. I don't believe using recursion will improve it.

Edited September 13, 2013 by czardas
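The difference between the two versions comes down to writing the key expression inline versus evaluating the same expression from a string with Execute(). A rough way to measure that on your own machine is sketched below; the array size, the test strings, and the variable names are arbitrary choices, and the figures will vary between machines and AutoIt versions.

```autoit
; Build some hypothetical test data.
Local $aData[100000], $sKey, $hTimer
For $i = 0 To UBound($aData) - 1
    $aData[$i] = "Item" & $i
Next

; 1) The key expression written inline, as in the original function.
$hTimer = TimerInit()
For $i = 0 To UBound($aData) - 1
    $sKey = "_" & StringToBinary(StringLower($aData[$i]), 2)
Next
ConsoleWrite("Inline:    " & Round(TimerDiff($hTimer)) & " ms" & @LF)

; 2) The same expression stored as a string and evaluated with Execute(),
;    as in the simplified version.
Local $sExpression = "StringToBinary(StringLower($aData[$i]), 2)"
$hTimer = TimerInit()
For $i = 0 To UBound($aData) - 1
    $sKey = "_" & Execute($sExpression)
Next
ConsoleWrite("Execute(): " & Round(TimerDiff($hTimer)) & " ms" & @LF)
```

This isolates only the key-building step; the Assign/IsDeclared bookkeeping is left out, so the gap it shows will not match the 20% measured for the whole function.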
czardas (Author) Posted September 13, 2013 (edited)

After renaming one or two parameters, I could easily rewrite this function using more compact and better organized code. While the code in the previous post is surprisingly sluggish, this version appears to have a slight edge on the original function.

    Func _ArrayUniqueConcatenate(ByRef $a1, $iCasesense = 0, _ ; up to 23 more arrays can be included
            $a2 = 0, $a3 = 0, $a4 = 0, $a5 = 0, $a6 = 0, $a7 = 0, $a8 = 0, $a9 = 0, $a10 = 0, $a11 = 0, $a12 = 0, $a13 = 0, _
            $a14 = 0, $a15 = 0, $a16 = 0, $a17 = 0, $a18 = 0, $a19 = 0, $a20 = 0, $a21 = 0, $a22 = 0, $a23 = 0, $a24 = 0)
        #forceref $a1, $a2, $a3, $a4, $a5, $a6, $a7, $a8, $a9, $a10, $a11, $a12, $a13, $a14, $a15, $a16, $a17, $a18, $a19, $a20, $a21, $a22, $a23, $a24

        Local $aNextArray, $iBound, $tVarName, $iTotalSize = 0, $iItems = 0, $iParams = @NumParams
        If $iParams = 1 Then $iParams = 2 ; only the target array was passed

        ; $i = 1 is the target array itself; $i > 1 are the additional arrays.
        For $i = 1 To $iParams - 1
            $aNextArray = Eval('a' & $i)
            If Not IsArray($aNextArray) Or UBound($aNextArray, 0) <> 1 Then Return SetError(1, $i + ($i > 1)) ; Sets @Extended to the parameter which failed

            $iBound = UBound($aNextArray)
            If $i > 1 Then ReDim $a1[$iItems + $iBound]

            If $iCasesense Then
                For $j = 0 To $iBound - 1
                    $tVarName = "_" & StringToBinary($aNextArray[$j], 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop
                    Assign($tVarName, "", 1)
                    $a1[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next
            Else
                For $j = 0 To $iBound - 1
                    $tVarName = "_" & StringToBinary(StringLower($aNextArray[$j]), 2)
                    If IsDeclared($tVarName) = -1 Then ContinueLoop
                    Assign($tVarName, "", 1)
                    $a1[$iItems] = $aNextArray[$j]
                    $iItems += 1
                Next
            EndIf

            If $i > 1 Then Execute('_FreeMemory($a' & $i & ')')
            $iTotalSize += $iBound
        Next

        ReDim $a1[$iItems]
        Return $iTotalSize - $iItems ; Return the number of duplicates removed
    EndFunc ; _ArrayUniqueConcatenate

    Func _FreeMemory(ByRef $vParam)
        $vParam = 0
    EndFunc

Don't mean to bump threads, just letting you know the code has been improved.

Edited October 17, 2013 by czardas
Cravin Posted September 13, 2013

Nice, I'll have to try this updated version in the data backup script I have written. Thanks!
czardas (Author) Posted September 13, 2013

Cravin wrote: "Nice, I'll have to try this updated version in the data backup script I have written. Thanks!"

Your question and kylomas' idea inspired me. Having to wait so long for _ArrayUnique() has been an issue for me in the past too, so creating this brings rewards for me also. The new version is just more compact and I think the code is neater. Time for some proper documentation after all this testing.