Jump to content

UDF: Binary Trees


blindwig
 Share

Recommended Posts

A while ago I wrote a "tables" UDF, for doing look-up tables (unique key=value). As I've worked with them, I've wanted one that are faster, and will be sorted when displayed. I also wanted a quick way to search an array, without having to sort it everytime I add something to it.

If you don't know how Binary Trees work, don't worry about it. Just think of it this way:

instead of having a flat array where the indexes are 0-x, you can have whatever indexes you want. For example:

;Create a binary tree
$btNew = _BTreeCreate()
;assign some values
_BTreeSet($btNew, 1, 'The first item')
_BTreeSet($btNew, 2, 'The second item')
_BTreeSet($btNew, 'three', 'The third item')
_BTreeSet($btNew, 'fourth', 'The fourth item')
_BTreeSet($btNew, 'red', 'A warm color')
_BTreeSet($btNew, 'august', 'The 8th month')
;retireve a stored value
msgbox(0,'RED',_BTreeGet($btNew, 'RED'));keys are not case sensative

After putting a bunch of values into the tree, you'll want to optimize it, so it's as fast as it can be:

;Create the Binary Tree
$btTemp = _BTreeCreate()

;Populate Binary Tree with random key/values
ProgressOn('MWR_BTree UDF Test','Populating Keys...')
for $i = 1 to 50
    $j = Random(1, 1000, 1)
    _BTreeSet($btTemp, $j, '');set a random key to a blank value
    ProgressSet($i * 2, $j)
Next
ProgressOff()

;See how ugly your tree can be
MsgBox(0,'MWR_BTree UDF Test: Raw Tree',_BTreePrintKeys($btTemp))

;See how pretty your tree can be
_BTreeOptimize($btTemp)
MsgBox(0,'MWR_BTree UDF Test: Optimized Tree',_BTreePrintKeys($btTemp))

And here's what the unoptimized tree looks like:

CODE

,-------7

,-------24

| '-------25

| | ,-------54

| '-------116

| | ,-------121

| | | '-------162

| | | | ,-------174

| | | '-------180

| '-------189

,-------237

| | ,-------240

| | ,-------283

| | | | ,-------290

| | | | | '-------291

| | | | | '-------354

| | | | | '-------437

| | | '-------438

| '-------480

| | ,-------489

| '-------522

| | ,-------537

| | | '-------538

| | ,-------547

| '-------549

| | ,-------602

| | | '-------606

| | | '-------624

| '-------655

--------662

| ,-------666

| | '-------667

| ,-------673

| ,-------761

| | | ,-------778

| | '-------808

'-------839

| ,-------843

| ,-------851

| ,-------859

| | | ,-------862

| | '-------910

| | '-------915

| ,-------932

| ,-------938

| | '-------941

| | '-------948

'-------992

'-------998

And the optimized one:

CODE

,-------7

| '-------24

,-------25

| '-------54

| '-------116

,-------121

| | ,-------162

| | | '-------174

| '-------180

| '-------189

| '-------237

,-------240

| | ,-------283

| | | '-------290

| | ,-------291

| | | '-------354

| | | '-------437

| '-------438

| | ,-------480

| | | '-------489

| '-------522

| | ,-------537

| '-------538

| '-------547

--------549

| ,-------602

| | '-------606

| ,-------624

| | '-------655

| | '-------662

| ,-------666

| | | ,-------667

| | | | '-------673

| | '-------761

| | '-------778

| | '-------808

'-------839

| ,-------843

| | '-------851

| ,-------859

| | '-------862

| | '-------910

'-------915

| ,-------932

| | '-------938

'-------941

| ,-------948

'-------992

'-------998

MWR_BTree.zip

Link to comment
Share on other sites

I forgot to say - you have to have the latest beta to run this. To optimize the tree (in the _BTreeOptimize() function), I use the _ArraySort routine in the Array.au3, and it has a bug on some older versions.

Link to comment
Share on other sites

I've added a new function to my library, called _BTreeToTable.

It takes a binary tree as input, as spits out a table (2d array) where:

[x][0]=key

[x][1]=value

[x][2]=original index

And it actually walks the tree, so the keys will be sorted in ascending order.

here's the code:

CODE

;Returns a table of [x][3], where [x][0]=key, [x][1]=value, [x][2]=original index

Func _BTreeToTable(ByRef $BTree)

Dim $aOut[$BTree[0][0] + 1][3]

If $BTree[0][1] <> 0 Then __BTreeToTable($BTree, $BTree[0][1], $aOut)

Return $aOut

EndFunc

;private

Func __BTreeToTable(ByRef $BTree, $Ptr, ByRef $aKey)

If $BTree[$Ptr][1] <> 0 Then __BTreeToTable($BTree, $BTree[$Ptr][1], $aKey)

$aKey[0][0]=$aKey[0][0]+1

$aKey[$aKey[0][0]][0]=$BTree[$Ptr][0]

$aKey[$aKey[0][0]][1]=$BTree[$Ptr][3]

$aKey[$aKey[0][0]][2]=$Ptr

If $BTree[$Ptr][2] <> 0 Then __BTreeToTable($BTree, $BTree[$Ptr][2], $aKey)

EndFunc

And using this new code, I've re-written my optimizing routine. Since the keys are returned in sorted order, I no longer need to rely on _ArraySort, so the routine is now about 4x faster. Here's the new code:

CODE

;This will optimize the binary tree (make sure all the branches are balanced)

Func _BTreeOptimize(ByRef $BTree, $ProgressTitle = @ScriptName)

If $ProgressTitle <> '' Then ProgressOn($ProgressTitle, 'Optimizing Binary Tree...', 'Gathering Keys...')

Local $aKeys[$BTree[0][0]+1][3]

$aKeys = _BTreeToTable($BTree)

If $ProgressTitle <> '' Then ProgressSet(50, 'Restructuring Tree...')

$BTree[0][1]=__BTreeOpt($BTree, 1, $BTree[0][0], $aKeys)

If $ProgressTitle <> '' Then ProgressSet(99)

If $ProgressTitle <> '' Then ProgressOff()

EndFunc

;private

Func __BTreeOpt(ByRef $BTree, $min, $max, ByRef $aKeys)

If $min = $Max Then

$BTree[$aKeys[$min][2]][1] = 0

$BTree[$aKeys[$min][2]][2] = 0

Return $aKeys[$min][2]

EndIf

$mid = int(($min + $max) / 2)

If $min = $mid Then

$BTree[$aKeys[$mid][2]][1]=0

Else

$BTree[$aKeys[$mid][2]][1]=__BTReeOpt($BTree, $Min, $Mid - 1, $aKeys)

EndIf

If $mid = $max Then

$BTree[$aKeys[$mid][2]][2]=0

Else

$BTree[$aKeys[$mid][2]][2]=__BTReeOpt($BTree, $Mid + 1, $Max, $aKeys)

EndIf

Return $aKeys[$mid][2]

EndFunc

The biggest difference is that it no longer "fixes" the tree - if you've tampered with the array manually, this routine won't find orphaned node like the old one would. The solution for that is simple - don't #%$@&* with the array outside of the _BTree* functions! :)

Link to comment
Share on other sites

I've just wrote a routine to remove a key/value pair from the tree.

CODE

;Find the key $Key in Binary Tree $BTree, returns it's parent value (or 0 if $key is at root)

;Also sets @Extended to point to the array index of the parent item.

Func _BTreeFindParentKey(ByRef $BTree, $Key)

Local $BTCur = $BTree[0][1], $BTLast = 0

While $BTCur <> 0 And $BTree[$BTCur][0] <> $Key

Select

Case (IsNumber($Key) = IsNumber($BTree[$BTCur][0]) And $Key < $BTree[$BTCur][0]) Or (IsNumber($Key) <> IsNumber($BTree[$BTCur][0]) And String($Key) < String($BTree[$BTCur][0]))

$BTLast = $BTCur

$BTCur = $BTree[$BTCur][1]

Case (IsNumber($Key) = IsNumber($BTree[$BTCur][0]) And $Key > $BTree[$BTCur][0]) Or (IsNumber($Key) <> IsNumber($BTree[$BTCur][0]) And String($Key) > String($BTree[$BTCur][0]))

$BTLast = $BTCur

$BTCur = $BTree[$BTCur][2]

EndSelect

WEnd

If $BTCur = 0 Then

SetError(1) ;$Key was not found in this tree

Return 'The key "' & $Key & '" was not found in the tree.'

Else

SetExtended($BTLast)

If $BTLast = 0 Then

Return 0

Else

Return $BTree[$BTLast][0]

EndIf

EndIf

EndFunc

;This function will remove a key from the binary tree.

;Note that it does not actually remove it, it only orpahns it by re-routing the links around it.

Func _BTreeDelete(ByRef $BTree, $Key)

;Locate $key in $BTree

Local $RetVal = _BTreeGet($BTree, $Key), $ptr = @extended

If Not @error Then

;Gather a list of sub-branches under this key

Local $aBranch[$BTree[0][0]][3]

If $BTree[$ptr][1]<>0 Then __BTreeToTable($BTree, $BTree[$ptr][1], $aBranch)

If $BTree[$ptr][2]<>0 Then __BTreeToTable($BTree, $BTree[$ptr][2], $aBranch)

_BTreeFindParentKey($BTree, $Key)

;graft the new sub-branch onto the parent branch

Local $ParentPtr = @extended

If $ParentPtr = 0 Then

$BTree[0][1] = __BTreeOpt($BTree, 1, $aBranch[0][0], $aBranch)

ElseIf $BTree[$ParentPtr][1] = $ptr Then

$BTree[$ParentPtr][1] = __BTreeOpt($BTree, 1, $aBranch[0][0], $aBranch)

ElseIf $BTree[$ParentPtr][2] = $ptr Then

$BTree[$ParentPtr][2] = __BTreeOpt($BTree, 1, $aBranch[0][0], $aBranch)

EndIf

;remove branches from the original item

$BTree[$ptr][1] = 0

$BTree[$ptr][2] = 0

Else

SetError(1)

Return $RetVal

EndIf

EndFunc

Link to comment
Share on other sites

Very nice. I am amazed what can be done woth AutoIt3. I wrote a word counting program with C when I was learning it in 1983. Same concept different language.

Can you update the zip file with the new changes?

Thanks for your effort. :)

Link to comment
Share on other sites

Since you don't use _ArraySort anymore, is v3.1.1 good enough? Also, it doesn't seem that the zip file attached in the first post is updated with your new functions...

As for your algorithm, how does it handle adding new elements? Greater than, on the right, less than to the left, but what about when it's equal? Does it leave the tree unaltered? Or am I just totally confused? I'm not sure about this key and value business.

Edited by HyperChao
Link to comment
Share on other sites

Since you don't use _ArraySort anymore, is v3.1.1 good enough? Also, it doesn't seem that the zip file attached in the first post is updated with your new functions...

Correct, I no longer need the lates Beta

no, the ZIP file is still old. Get it, then patch it with the functions I have posted.

As for your algorithm, how does it handle adding new elements? Greater than, on the right, less than to the left, but what about when it's equal? Does it leave the tree unaltered? Or am I just totally confused? I'm not sure about this key and value business.

Yes, higher and lower key values spawn new branches. Similiar key values always over-write older ones.

If you just want to use the tree to store unique values, then just add keys, and don't worry about their values. The purpose of this UDF is to quickly store and retreive information (values) based on a unique key, but if the key *IS* the information, then no additional data is required. You'll notice on the _BTreeSet() function, the $Value parameter is optional.

Link to comment
Share on other sites

I'm very impressed, I don't understand this stuff quite yet... good job :)

"I thoroughly disapprove of duels. If a man should challenge me, I would take him kindly and forgivingly by the hand and lead him to a quiet place and kill him." - Mark TwainPatient: "It hurts when I do $var_"Doctor: "Don't do $var_" - Lar.
Link to comment
Share on other sites

I'm very impressed, I don't understand this stuff quite yet... good job :)

<{POST_SNAPBACK}>

It's not too complicated, my functions wrap up all the nitty-gritty stuff. It works like this:

First thing to do is create a binary tree variable:

$btMyData = _BTreeCreate()

The main purpose for these routines is to associate data (a number, string, or even an array) with a unique key (which is typically a number, but can be a string or even an array):

_BTreeSet($btMyData, 1, 'first bit of info')

Now the key '1' contains the data 'first bit of info'

To retreive the data using the key:

$vData = _BTreeGet($btMyData, 1)

And that will return 'first bit of info'

Now, key must be unique, so if you try to put the same key in with different data:

_BTreeSet($btMyData, 1, 'second bit of info')

Now the key '1' is associated with the data 'second bit of info', and the original data 'first bit of info' has been destroyed.

Now, that's if you want to have a unique set of keys and some associated data. The second way to use my functions is if you want to make a list of data, but want to make sure that you don't duplicate any data. In this case, you would use the keys as data (since they are forced to be unique) and just not associate any data with them:

$btMyData = _BTreeCreate()
_BTreeSet($btMyData, 'some data')
_BTreeSet($btMyData, 'some more data')
_BTreeSet($btMyData, 'even more data')
_BTreeSet($btMyData, 'some more data')

Now in the example above, the tree will only have 3 entries, since 'some more data' was keyed twice. So you can use my routine to store data that you want to keep it unique (for example if you want a list of files, but you want to make sure that each file is only listed once)

The function _BTreeGet() sets @error if the key was not found, so you can use that to check for the existence of keys:

$vRetVal = _BTreeGet($btMyData, 'crazy data')
if @Error then MsgBox(0, 'not found', $vRetVal)

One of the big limitations of binary trees is that they are designed to take random values (keys). They don't do well if you give them values that are sorted in order (either ascending or descnding). For example:

For $i = 1 to 50
    _BTreeSet($btMyData, $i)
Next

This code would give you a very slow tree. Not even a tree really - more like a linked list. If you've worked with those before, you know that they can be very slow.

To fix this, either make sure that you put stuff in in a random order, or run _BTreeOptimize every once in a while.

I'm thinking of a way to write an Array-to-BTree routine, but haven't done it yet (mostly because I haven't needed one yet, and I write my stuff on a when-I-need-it basis :evil: )

Anyway, I hope that gives some insight into how to work with my BTree routines. Any questions, ask away...

Link to comment
Share on other sites

So it's sort of like a hashtable? :)

Thank you very much for your time in explaining it.

"I thoroughly disapprove of duels. If a man should challenge me, I would take him kindly and forgivingly by the hand and lead him to a quiet place and kill him." - Mark TwainPatient: "It hurts when I do $var_"Doctor: "Don't do $var_" - Lar.
Link to comment
Share on other sites

So it's sort of like a hashtable? :)

Thank you very much for your time in explaining it.

<{POST_SNAPBACK}>

Yes, I designed it to work very much like a hash table. infact, if you wanted a true hash table, just write a hash routine to hash the key before submitting it:

_BTreeSet($btMyData, Hash($Key), $Data)
$RetVal = _BTreeGet($btMyData, Hash($Key), $Data)

And like I said, you can also use it to store a unique list (using the keys as values and not providing or checking the associated data values)

Link to comment
Share on other sites

  • 2 weeks later...

I've written 2 more functions for creating a Binary Tree from an existing array.

Use _BTreeFromArray1() for 1-d arrays and _BTreeFromArray2() for 2-d arrays.

These functions use _ArraySort(), so make sure you are using the latest version of that (the 3.1.1 version has a bug in it)

CODE

;Returns a Binary Tree keyed with values found in 1-d array $aIn

Func _BTreeFromArray1(ByRef $aIn, $iBase = 1)

Local $i, $InTop, $Next = 0

;Create the output structure

If $iBase Then

$iBase = 1

$InTop = $aIn[0]

Else

$iBase = 0

$InTop = UBound($aIn, 1) - 1

EndIf

Dim $btOut[$InTop + 2 - $iBase][4]

For $i = $iBase To $InTop

$btOut[0][0] = $btOut[0][0] + 1

$btOut[$btOut[0][0]][0] = $aIn[$i]

Next

_ArraySort($btOut, 0, 1, $btOut[0][0], 2, 0)

;Remove duplicates

$i = $iBase

While $i <= $btOut[0][0] - $Next

While $i + $Next + 1 <= $btOut[0][0] And $btOut[$i][0] = $btOut[$i + $Next + 1][0]

$Next = $Next + 1

WEnd

$i = $i + 1

If $Next And $i + $Next + 1 <= $btOut[0][0] Then

$btOut[$i][0] = $btOut[$i + $Next][0]

EndIf

WEnd

$btOut[0][0] = $i - 1

ReDim $btOut[$i][4]

;Build Tree Structure

$btOut[0][1] = __BTreeFromArray($btOut, 1, $btOut[0][0])

Return $btOut

EndFunc

;Returns a Binary Tree keyed with values found in 2-d array $aIn

;keys are taken from $aIn[x][$iKey] and data is taken from $aIn[x][$iData]

Func _BTreeFromArray2(ByRef $aIn, $iBase = 1, $iKey = 0, $iData = 1)

Local $i, $InTop, $Next = 0

;Create the output structure

If $iBase Then

$iBase = 1

$InTop = $aIn[0][0]

Else

$iBase = 0

$InTop = UBound($aIn, 1) - 1

EndIf

Dim $btOut[$InTop + 2 - $iBase][4]

For $i = $iBase To $InTop

$btOut[0][0] = $btOut[0][0] + 1

$btOut[$btOut[0][0]][0] = $aIn[$i][$iKey]

$btOut[$btOut[0][0]][3] = $aIn[$i][$iData]

Next

_ArraySort($btOut, 0, 1, $btOut[0][0], 4, 0)

;Remove duplicates

$i = 1

While $i <= $btOut[0][0] - $Next

While $i + $Next + 1 <= $btOut[0][0] And $btOut[$i][0] = $btOut[$i + $Next + 1][0]

$Next = $Next + 1

WEnd

$i = $i + 1

If $Next And $i + $Next + 1 <= $btOut[0][0] Then

$btOut[$i][0] = $btOut[$i + $Next][0]

$btOut[$i][3] = $btOut[$i + $Next][3]

EndIf

WEnd

$btOut[0][0] = $i - 1

ReDim $btOut[$i][4]

;Build Tree Structure

$btOut[0][1] = __BTreeFromArray($btOut, 1, $btOut[0][0])

Return $btOut

EndFunc

;private

Func __BTreeFromArray(ByRef $BTree, $min, $max)

If $min = $Max Then

$BTree[$min][1] = 0

$BTree[$min][2] = 0

Return $min

EndIf

$mid = Round(($min + $max) / 2)

If $min = $mid Then

$BTree[$mid][1]=0

Else

$BTree[$mid][1]=__BTreeFromArray($BTree, $Min, $Mid - 1)

EndIf

If $mid = $max Then

$BTree[$mid][2]=0

Else

$BTree[$mid][2]=__BTreeFromArray($BTree, $Mid + 1, $Max)

EndIf

Return $mid

EndFunc

And some examples:

#include <MWR_BTree.au3>

$aTemp = StringSplit('a,b,c,d,e,f,g,h,i,j,k,l,m',',')
$btTemp = _BtreeFromArray1($aTemp)
MsgBox(0,'From 1-d Array',_BTreePrintKeys($btTemp))

$aTemp = IniReadSection(@WindowsDir & '\System.Ini', '386enh')
$btTemp = _BtreeFromArray2($aTemp)
MsgBox(0,'From 2-d Array',_BTreePrintKeys($btTemp))

Edit: Code patch.

Edited by blindwig
Link to comment
Share on other sites

Fixed a bug with the above routines (the _BTreeFromArray() routines) and pasted the new code into the above post.

Link to comment
Share on other sites

  • 2 months later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...