Nutster

Associative Array functions

78 posts in this topic

#1 ·  Posted (edited)

An associative array is an array where the keys (index numbers) are string instead of integer, as they are in AutoIt. Linux / Unix awk and perl use associative arrays by default. The following functions can be used to manage a version of associative arrays in single AutoIt variables.

A hash table is used to store the values because it is faster to insert than other options I can think of and pretty much as fast as them to retrieve the data. The memory requirements are a little higher than other methods, but I think it is a reasonable trade-off. It uses a simple hash function that can probably be easily improved upon and while it can resize the hash table, the resize is slow so try to make your initial size large enough for all your elements.

Attached please find an Include file and an exerciser that demonstrates how to use my associative array functions. Read the comments at the top of each function for more details on their use.

Put #Include "AssocArray.au3" at the beginning of your script, and then call the functions.

Functions:

AssocArrayCreate: Use to turn an existing variable into an associative array.

AssocArrayAssign: Assign a value into an associative array.

AssocArrayGet: Get a value in an associative array. Sets @Error to 1 if not found.

AssocArrayDelete (New): Removes a key-value pair from the associative array.

AssocArrayExist (New): Determines if a given key exists in an associative array.

AssocArrayVerify (New): Verifies that the given array contains an associative array.

AssocArrayKeys (New): Returns an array with the active keys of the associative array.

AssocArraySave: Save the hash table to a CSV text file.

AssocArrayLoad: Load a CSV file into a hash table.

Support Functions (not called directly, used by the above functions):

HashPos: Hash function that determines where in the hash table to locate the key/value pair.

NotPrime: Used while determining size of the hash table.

Rehash_Grow (New): Causes the associative array to grow according to its growth rate.

_UBitShift: Used by HashPos for bit-shifting.

QuoteString: Put quotes around strings and leave everything else alone.

CSVParse: Converts a line of CSV text into a 1-dimensional array.

Edit: Added more detail and fix spelling arrears.

Edit: Updated files with AssocArraySave and AssocArrayLoad

Edit: Another update, including AssocArrayDelete, AssocArrayExist, AssocArrayVerify, Rehash_Grow

Edit: Bug fixes

AssocArrays.au3

AA_Exerciser.au3

Edited by Nutster

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites



I was just building on my script when I got to the situation when i needed something just like associative arrays and what did I see when I was on my way to the forum??

I saw that you just released this!! ;)

You must be a mind-reader or something cause the timing was precise!!

You saved my script!! :)^_^ this is fu :) ing perfect!!

THANKS!!

Share this post


Link to post
Share on other sites

The topic of associative arrays and hashes is widely covered in numerous posts.

One of which is here: http://www.autoitscript.com/forum/index.php?showtopic=21709 in which the speed measurments were tested a bit :)

Also elTorro's topic from new ones is interesting http://www.autoitscript.com/forum/index.php?showtopic=58982

I didn't yet tested Nutster or elTorro's solution but would be nice to check how they compare (speed wise) + those that were presented in the first thread.


My little company: Evotec (PL version: Evotec)

Share this post


Link to post
Share on other sites

#5 ·  Posted (edited)

An associative array is an array where the keys (index numbers) are string instead of integer, as they are in AutoIt. Linux / Unix awk and perl use associative arrays by default. The following functions can be used to manage a version of associative arrays in single AutoIt variables.

A hash table is used to store the values because it is faster to insert than other options I can think of and pretty much as fast as them to retrieve the data. The memory requirements are a little higher than other methods, but I think it is a reasonable trade-off. It uses a simple hash function that can probably be easily improved upon and while it can resize the hash table, the resize is slow so try to make your initial size large enough for all your elements.

Attached please find an Include file and an exerciser that demonstrates how to use my associative array functions. Read the comments at the top of each function for more details on their use.

Put #Include "AssocArray.au3" at the beginning of your script, and then call the functions.

Functions:

AssocArrayCreate: Use to turn an existing variable into an associative array.

AssocArrayAssign: Assign a value into an associative array.

AssocArrayGet: Get a value in an associative array. Sets @Error to 1 if not found.

Support Functions (not called directly, used by the above functions):

HashPos: Hash function that determines where in the hash table to locate the key/value pair.

NotPrime: Used while determining size of the hash table.

Edit: Added more detail and fix spelling arrears.

The hash idea is a bit complicated and I don't see what the advantage is. Isn't it simpler, and faster, to do something like this. (It's only shown as an idea and obviously not fully developed.)

;AssocArray3.au3 - replaces AssocArray2.au3
Func AssocArrayCreate(ByRef $aArray, Const $nSize, Const $nGrowth = 30)
    If IsNumber($nSize) = 0 Then
; Need a number here
        Return False
    ElseIf $nSize < 1 Then
; Too small
        Return False
    EndIf
    Local $Temp[2][$nSize]

    $aArray = $Temp
    $aArray[0][0] = 1;the next element to be filled
    Local $itic = DllCall("Kernel32.dll", "int", "GetTickCount")
    Local $Prefix = "Assoc1_" & @YEAR & @YDAY & String($itic[0])

;commented out the next lines which should not be there 15th Dec 2007
;   Local $set = 1
;   While IsDeclared($Prefix)
;       $set += 1
;       $Prefix = "Assoc" & $set & "_"
;   WEnd
    $aArray[1][0] = $Prefix
    Return True
EndFunc;==>AssocArrayCreate

Func AssocArrayAssign(ByRef $aArray, Const $sKey, Const $vValue)
;comment out next line untill I work out what's wrong with it
      ;If Not IsDeclared($aArray[1][0]) Then _arrayReset($aArray)
    Local $ConstVal = AssignNum($aArray, $sKey)
    If $ConstVal >= UBound($aArray, 2) - 1 Then
        ReDim $aArray[2][$ConstVal + 100]
    EndIf
    $aArray[1][$ConstVal] = $vValue
    $aArray[0][$ConstVal] = $sKey
    Return True
EndFunc;==>AssocArrayAssign

Func AssocArrayGet(ByRef Const $aArray, Const $sKey)
;comment out next line untill I work out what's wrong with it
      ;If Not IsDeclared($aArray[1][0]) Then _arrayReset($aArray)
    If Not IsDeclared($aArray[1][0] & $sKey) Then
        SetError(1)
        Return False
    EndIf
    Return $aArray[1][Eval($aArray[1][0] & $sKey) ]
EndFunc;==>AssocArrayGet



Func AssignNum(ByRef $aRay, $const)
    Local $sConst = $aRay[1][0] & String($const)
    If Not IsDeclared($sConst) Then
        Assign($sConst, $aRay[0][0], 2);global scope
        $aRay[0][0] += 1
    EndIf
    Return (Eval($sConst))
EndFunc;==>AssignNum

;If the array is saved then opened in a new script we will need this
Func _arrayReset($aArray)
    Local $n
    
    For $n = 1 To $aArray[0][0] - 1
        If $aArray[0][$n] <> '' Then
            If Not IsDeclared($aArray[1][0] & $aArray[0][$n]) Then
                Assign($aArray[1][0] & $aArray[0][$n], $n, 2)
            EndIf
                
        EndIf
    Next

    
EndFunc;==>_arrayInitialise

EDIT 13th Dec:

Replaced by version which doesn't have the initial global variables, and ensured a newly created array will not have keys that clash with another array.

EDIT 15th Dec - commented out some lines left by mistake.

Edited by martin

Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

I've used this option with success in AutoIt2.64 (~2002 year) and AutoIt3, too. You can see current application of this here as basic way to control drag of buttons in AJump game.


The point of world view

Share this post


Link to post
Share on other sites

I've used this option with success in AutoIt2.64 (~2002 year) and AutoIt3, too. You can see current application of this here as basic way to control drag of buttons in AJump game.

So you have. (Nice game too.)


Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

I've used this option with success in AutoIt2.64 (~2002 year) and AutoIt3, too. You can see current application of this here as basic way to control drag of buttons in AJump game.

Well the problem is not with existing Hash Functions or Associative Arrays since we've seen them in the forum since long time. Problem is with speed of those. For 17000 entries it's not as fast as i would love it to be :) So every new solution is welcome ^_^


My little company: Evotec (PL version: Evotec)

Share this post


Link to post
Share on other sites

#9 ·  Posted (edited)

hi,

Thanks..

What am I doing wrong here? (it always stalls at item 62)

Can you please correct this as an example script?

Thanks, Randall

; HashTestAssoc.au3
#include "AssocArrays.au3"
Global $iCount, $iSize, $iSize1=170
Global $nTmp
Global $sName
Global $avArray
Local $aArrayfile[$iSize1], $sStringFile, $aArray1 ;= StringSplit($data, ":")
;======================================================
AssocArrayCreate ($avArray, $iSize1)    ; Create the associative array with storage for $iSize1 elements.
For $i = 1 To $iSize1-1
    ConsoleWrite("Value&$i=" & "Value"&$i & @LF)
    If AssocArrayAssign ($avArray, "Key"&$i,"Value"&$i)= False Then ExitLoop
    ConsoleWrite("Value&$i=" & "Value"&$i & @LF)
Next
;~ ;======================================================
$timer2 = TimerInit()
$sValue = AssocArrayGet ($avArray, "Key13")
ConsoleWrite("$sValue=" & $sValue & @LF)
ConsoleWrite("Just Find AssocArrayGet;" & Round(TimerDiff($timer2), 2) & "msecs" & @LF)
;~ ;======================================================
Edited by randallc

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

It's indeed not working correctly. But i have now checked code and it seems like Nutster is using Ubound every value assign which with more elements will be slowing things down ;|

Edited by MadBoy

My little company: Evotec (PL version: Evotec)

Share this post


Link to post
Share on other sites

#11 ·  Posted (edited)

hi,

Thanks..

What am I doing wrong here? (it always stalls at item 62)

Can you please correct this as an example script?

Thanks, Randall

; HashTestAssoc.au3
#include "AssocArrays.au3"
Global $iCount, $iSize, $iSize1=170
Global $nTmp
Global $sName
Global $avArray
Local $aArrayfile[$iSize1], $sStringFile, $aArray1 ;= StringSplit($data, ":")
;======================================================
AssocArrayCreate ($avArray, $iSize1)    ; Create the associative array with storage for $iSize1 elements.
For $i = 1 To $iSize1-1
    ConsoleWrite("Value&$i=" & "Value"&$i & @LF)
    If AssocArrayAssign ($avArray, "Key"&$i,"Value"&$i)= False Then ExitLoop
    ConsoleWrite("Value&$i=" & "Value"&$i & @LF)
Next
;~ ;======================================================
$timer2 = TimerInit()
$sValue = AssocArrayGet ($avArray, "Key13")
ConsoleWrite("$sValue=" & $sValue & @LF)
ConsoleWrite("Just Find AssocArrayGet;" & Round(TimerDiff($timer2), 2) & "msecs" & @LF)
;~ ;======================================================
I have changed the code in post #5 and that seems to work OK now with your code.

I would expect that the method I used would be faster for AssocArrayGet but I haven't done any test to compare.

Edited by martin

Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

The hash idea is a bit complicated and I don't see what the advantage is. Isn't it simpler, and faster, to do something like this? It's only shown as an idea and obviously not fully developed.

Not everybody is comfortable with hash tables. I can appreciate that. Not all of us have been through computer science classes at university yet.

Hash tables do add a layer of complexity but with a good design, a hash table can be a lot faster than other searches, including other fast searches, like binary search. Hash tables work by using a hash function which takes a given key and generates an index into an array for the location of the data and can jump directly to that location. If the hashing function is fast and distributive enough, search speeds can be comparable with other fast searches, like binary search. In either case, fast searches are much faster than linear searches. For example, on 1000 item list, assuming the hash function takes the time of six direct searches, with no more than 3 hash collisions:

  • Balanced Binary search - Min: 1, Max: 10, Avg: 9
  • Linear search - Min: 1, Max: 1000, Avg: 500
  • Hash table search - Min: 6, Max: 9, Avg: 7

Insertion speed of a random key in a hash table is just about as fast as reading it. The binary search works by having a sorted list to search through. This is why it can quickly find the element, but inserting an element is not a trivial task. For the linear search, just drop the element at the end of the list. The insertion comparison speed with the same conditions as above to add a random element 1001 would be:

  • Balanced Binary search - Min: 1, Max: 1000, Avg: 500
  • Linear search - Min: 1, Max: 1, Avg: 1
  • Hash table search - Min: 6, Max: 9, Avg: 7

As a comparison, hash tables can be a lot faster for a large data set than using all those built-in linear searches that are employed in StringInStr and the AutoIt variable system. I should know, I wrote the variable storage system. Assign and Eval are nice functions to use, but I expect they are actually slower by comparison to what I am doing.

Part of my design strategy was to store all the keys and values in one array (variable). This way you could have several associative arrays running at the same time without having to worry about key collisions between different arrays. You clearly are using several global variables to store your results, which is not what I was hoping to achieve.


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

Not everybody is comfortable with hash tables. I can appreciate that. Not all of us have been through computer science classes at university yet.

Hash tables do add a layer of complexity but with a good design, a hash table can be a lot faster than other searches, including other fast searches, like binary search. Hash tables work by using a hash function which takes a given key and generates an index into an array for the location of the data and can jump directly to that location. If the hashing function is fast and distributive enough, search speeds can be comparable with other fast searches, like binary search. In either case, fast searches are much faster than linear searches. For example, on 1000 item list, assuming the hash function takes the time of six direct searches, with no more than 3 hash collisions:

  • Balanced Binary search - Min: 1, Max: 10, Avg: 9
  • Linear search - Min: 1, Max: 1000, Avg: 500
  • Hash table search - Min: 6, Max: 9, Avg: 7

Insertion speed of a random key in a hash table is just about as fast as reading it. The binary search works by having a sorted list to search through. This is why it can quickly find the element, but inserting an element is not a trivial task. For the linear search, just drop the element at the end of the list. The insertion comparison speed with the same conditions as above to add a random element 1001 would be:

  • Balanced Binary search - Min: 1, Max: 1000, Avg: 500
  • Linear search - Min: 1, Max: 1, Avg: 1
  • Hash table search - Min: 6, Max: 9, Avg: 7

As a comparison, hash tables can be a lot faster for a large data set than using all those built-in linear searches that are employed in StringInStr and the AutoIt variable system. I should know, I wrote the variable storage system. Assign and Eval are nice functions to use, but I expect they are actually slower by comparison to what I am doing.

Part of my design strategy was to store all the keys and values in one array (variable). This way you could have several associative arrays running at the same time without having to worry about key collisions between different arrays. You clearly are using several global variables to store your results, which is not what I was hoping to achieve.

I'm afraid the relevance of going to University or not is lost on me, but no matter.

Yes I used global variables but the obvious ones declared were only done like that to make my example compatible with your test program.

The main idea of my approach was to have a unique global variable to represent each key. So a key "abcd" is used to create a global constant called "Assoc_abcd" which has the value of one more than the last one created. Then when you want to look up the value of the key you only need to use the value of $Assoc_abcd and you have the index. So I expected it to be faster.

I accept that global constants might well be a disadvantage, although I don't think that there needs to be any problem with key clashes.

It looks like your method needs a much bigger array to store the data, although I might have misunderstood what should be happening.

I have no doubt that you have a great deal more expertise than I have, that is not so difficult, but I would be interested to see a time comparison in searching for say 1,000 or 100,000 keys. I have tested for 60 keys and my method is about 40% faster. I would expect it to be faster for large numbers of keys like 100,000 as well, and when you got your UDF fixed I will be interested try it out.


Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

Hi,

Thanks for the detail; but can you please fix the example script I have posted in post #9?

I cannot see why it hangs at item 62?

Thanks, Randall

I will look into this today and report back.

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

#16 ·  Posted (edited)

I'm afraid the relevance of going to University or not is lost on me, but no matter.

Yes I used global variables but the obvious ones declared were only done like that to make my example compatible with your test program.

The main idea of my approach was to have a unique global variable to represent each key. So a key "abcd" is used to create a global constant called "Assoc_abcd" which has the value of one more than the last one created. Then when you want to look up the value of the key you only need to use the value of $Assoc_abcd and you have the index. So I expected it to be faster.

I accept that global constants might well be a disadvantage, although I don't think that there needs to be any problem with key clashes.

It looks like your method needs a much bigger array to store the data, although I might have misunderstood what should be happening.

I have no doubt that you have a great deal more expertise than I have, that is not so difficult, but I would be interested to see a time comparison in searching for say 1,000 or 100,000 keys. I have tested for 60 keys and my method is about 40% faster. I would expect it to be faster for large numbers of keys like 100,000 as well, and when you got your UDF fixed I will be interested try it out.

I mentioned university because most people's first exposure to creating hash tables is in a first-year university computer science course. It was for me as well.

I have never been a fan of creating multiple variables to store an array. That is one of the reasons I did a lot of work on arrays when they first started. I know others have done stuff on arrays too; I am not discounting their work, just giving a reason why I did as much as I did. Also, I am of the school of thought that generally wants to reduce the number of global variables if possible. So your solution is not desired on that principle as well.

I would like to see the time comparison between our methods as well. I believe the larger the number of keys (and subsequently, the array) the faster the AutoIt-written hash table will be compared to using groups of global variables.

Edited by Nutster

David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...

Share this post


Link to post
Share on other sites

I mentioned university because most people's first exposure to creating hash tables is in a first-year university computer science course. It was for me as well.

I have never been a fan of creating multiple variables to store an array. That is one of the reasons I did a lot of work on arrays when they first started. I know other have done stuff on arrays too; I am not discounting their work, just giving a reason I did as much as I did. Also, I am of the school that generally wants to reduce the number of global variables if possible. So your solution is not desired on that principle as well.

I would like to see the time comparison between our methods as well. I believe the larger the number of keys (and subsequently, the array) the faster the AutoIt-written hash table will be compared to using groups of global variables.

I agree that having lots of global variables, or constants as I see it in this case, is undesirable, and in this respect the hash table method is certainly superior. I did remove some global variables from my post with an update today but my method still relies on global values. In my defense I would suggest that it's not so much worse than including guiconstants.au3 and constants.au3 which introduce a huge number of global constants, most of which are probably never used. However, that's not really much of a defense.

If your method is also faster then my approach doesn't stack up and you will have converted me.


Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

David,

How does your implementation compare (in terms of speed) to using something like:

$dict = ObjCreate("Scripting.Dictionary")

$dict.Add( $key, $val )

$val = $dict.Item($key)

Thanks,

-John

I was just wondering something similar with regards to an ADOR.recordset.

Share this post


Link to post
Share on other sites

#20 ·  Posted (edited)

David,

How does your implementation compare (in terms of speed) to using something like:

$dict = ObjCreate("Scripting.Dictionary")

$dict.Add( $key, $val )

$val = $dict.Item($key)

Thanks,

-John

That's an interesting thing to try.

To write 10,000 new keys takes 190 ms using Scripting.Dictionary, with the method I suggested 828ms.

(But I had to make a mod to my post first because it was so much worse I was embarrassed.)

To read each and every one of the 10,000 keys take respectively 139 mS and 338 mS.

The Scripting.Dictionary wins quite easily over my suggestion.

Addendum: The more keys that are generated the more the Scripting.Dictionary method slows down, but AutoIt maintains a constant rate. At 500,000 keys AutoIt can read the keys twice as fast.

Edited by martin

Serial port communications UDF Includes functions for binary transmission and reception.printing UDF Useful for graphs, forms, labels, reports etc.Add User Call Tips to SciTE for functions in UDFs not included with AutoIt and for your own scripts.Functions with parameters in OnEvent mode and for Hot Keys One function replaces GuiSetOnEvent, GuiCtrlSetOnEvent and HotKeySet.UDF IsConnected2 for notification of status of connected state of many urls or IPs, without slowing the script.

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now