Jump to content

Is it possible to make an insert function for an array in O(log n)?


Mithrandir
 Share

Recommended Posts

Very good post. I had a look at the source code, what the AutoIt source is doing when it encounters a ReDim. Internally AutoIt uses a one dimensional array of pointers to vector types. Even when multidimensional arrays are used in AutoIt.

It creates a new array using the new size with all empty variant types in a similiar fashion to what Dim does. This means that an array is created and for each element in that array the constructor is called. The constructor is empty, so only variables are given default values.

Then it uses ArrayCopy to copy the old array into the new array. For each element in the array it gets a pointer to the element from the old array and puts it in the new array. Because of multi-dimensional arrays defined in AutoIt, and the one dimensional array that is used internally, after every copy element it needs to check if it now needs to copy elements from another dimension.

So, hardly a straight forward ReDim as we seem to think it works.

Disclaimer: All information in this post is taken from the AutoIt source code and falls under the AutoIt source code license. If you use this information for writing new code then it is considered a derivative work based on the AutoIt source code which is not allowed.

Thanks for pointing me to get the autoit source :) I should have thought the one shared was from a previous version. Concerning your post, are you referring to this part of script_parser.cpp aren't you?:

///////////////////////////////////////////////////////////////////////////////

// Parser_Keyword_DIM()

//

// Actually, this routine is used for Dim, Local and Global which are effectively

// the same function just with a difference of scope.

//

// These keywords can be used to pre-declare a variable OR to dimension an array.

//

// JON: REDIM needs splitting out of this function into its own - too crowded

//

///////////////////////////////////////////////////////////////////////////////

void AutoIt_Script::Parser_Keyword_DIM(VectorToken &vLineToks, uint &ivPos, int nReqScope)

{

Variant vTemp;

Variant vOld; // stores the array during resizing

Variant *pvTemp;

bool bConst = false;

int nNumSubscripts;

int nSubScriptDetails[VAR_SUBSCRIPT_MAX];

int nColTemp;

bool bReDim = ((nReqScope & VARTABLE_RESIZE) != 0);

nReqScope &= VARTABLE_ANY | VARTABLE_FORCELOCAL | VARTABLE_FORCEGLOBAL; // leave scope values only

++ivPos; // Skip DIM/LOCAL/GLOBAL/REDIM keyword

// If the next keyword is Const then jump to our CONST routine

if ( vLineToks[ivPos].m_nType == TOK_KEYWORD && vLineToks[ivPos].nValue == K_CONST )

{

Parser_Keyword_CONST(vLineToks, ivPos, nReqScope);

return;

}

for (;;)

{

// Get variable name

if (vLineToks[ivPos].m_nType != TOK_VARIABLE)

{

FatalError(IDS_AUT_E_DIMWITHOUTVAR, vLineToks[ivPos-1].m_nCol);

return;

}

// Get a reference to the variable in the requested scope, if it doesn't exist, then create it.

AString sVarName = vLineToks[ivPos].szValue;

g_oVarTable.GetRef(sVarName, &pvTemp, bConst, nReqScope);

if (pvTemp == NULL)

{

if (bReDim)

{

// The array to redim must already exist

FatalError(IDS_AUT_E_VARNOTFOUND, vLineToks[ivPos].m_nCol);

return;

}

vTemp = ""; // Let the uninitialised value be "" (equates to 0.0 for numbers)

g_oVarTable.Assign(sVarName, vTemp, false, nReqScope);

g_oVarTable.GetRef(sVarName, &pvTemp, bConst, nReqScope);

}

else if (bConst)

{

// Can't Dim a constant!

FatalError(IDS_AUT_E_ASSIGNTOCONST, vLineToks[ivPos].m_nCol);

return;

}

// If redim then the variable must already be an array - otherwise what is the point?

if (bReDim)

{

if ( !(pvTemp->isArray()) )

{

FatalError(IDS_AUT_E_REDIMUSEDONNONARRAY, vLineToks[ivPos].m_nCol);

return;

}

}

++ivPos; // Skip variable name

if (vLineToks[ivPos].m_nType == TOK_LEFTSUBSCRIPT)

{

// Read the subscripts

// Store the old value while the it is resized.

if (bReDim)

vOld = *pvTemp;

nNumSubscripts = 0;

nColTemp = vLineToks[ivPos].m_nCol; // Store for error output

while (vLineToks[ivPos].m_nType == TOK_LEFTSUBSCRIPT)

{

++ivPos; // Skip [

nColTemp = vLineToks[ivPos].m_nCol; // Store for error output

// Parse expression for subscript

if ( AUT_FAILED( Parser_EvaluateExpression(vLineToks, ivPos, vTemp) ) )

{

FatalError(IDS_AUT_E_PARSESUBSCRIPT, nColTemp);

return;

}

// Subscript cannot be less than or equal to 0

if ( vTemp.nValue() <= 0 )

{

FatalError(IDS_AUT_E_PARSESUBSCRIPT, nColTemp);

return;

}

// Next token must be ]

if (vLineToks[ivPos].m_nType != TOK_RIGHTSUBSCRIPT)

{

FatalError(IDS_AUT_E_PARSESUBSCRIPT, nColTemp);

return;

}

++ivPos; // Next token

// Add this subscript and increase the number of subscripts

nSubScriptDetails[nNumSubscripts++] = vTemp.nValue();

} // End While

// Must have at least one subscript for a valid array

if (nNumSubscripts < 1)

{

FatalError(IDS_AUT_E_TOOFEWSUBSCRIPTS, nColTemp);

return;

}

// Now run through and set the various subscripts (must be done after the above to fix

// cases like ReDim $a[$a[0]] )

pvTemp->ArraySubscriptClear(); // Reset the subscript

for (int i=0; i<nNumSubscripts; ++i)

{

if ( pvTemp->ArraySubscriptSetNext( nSubScriptDetails ) == false )

{

FatalError(IDS_AUT_E_TOOMANYSUBSCRIPTS, nColTemp);

return;

}

}

// Ok, valid subscripts, dimension the variant into an array

if ( pvTemp->ArrayDim() == false )

{

FatalError(IDS_AUT_E_ARRAYALLOC, nColTemp);

return;

}

// copy old array values into new array

if (bReDim && vOld.ArrayGetBound(0)>0)

pvTemp->ArrayCopy(vOld);

} // TOK_LEFTSUBSCRIPT

else if (vLineToks[ivPos].m_nType == TOK_EQUAL)

{

++ivPos; // skip = to evaluate and store the result

nColTemp = vLineToks[ivPos].m_nCol; // Store for error output

// Parse expression for value

if ( AUT_FAILED( Parser_EvaluateExpression(vLineToks, ivPos, vTemp) ) )

{

FatalError(IDS_AUT_E_Expression, nColTemp);

return;

}

// initialize the variable

(*pvTemp)=vTemp;

} // TOK_EQUAL

// If the next token is END, then we finish

if (vLineToks[ivPos].m_nType == TOK_END)

break;

else if (vLineToks[ivPos].m_nType != TOK_COMMA)

{

FatalError(IDS_AUT_E_DIMWITHOUTVAR, vLineToks[ivPos].m_nCol);

return;

}

++ivPos; // skip COMMA to check next variable to declare

} // while looping on variable declaration

} // Parser_Keyword_DIM()

(Disclaimer: All information in the previous and following spoiler is taken from the AutoIt source code and falls under the AutoIt source code license. If you use this information for writing new code then it is considered a derivative work based on the AutoIt source code which is not allowed.)

So, in the end is it expensive or not to use redim in autoit? , It is said to be expensive in other languages like Visual Basic:

http://www.xtremevbtalk.com/showthread.php?t=20304

http://forums.devx.com/archive/index.php/t-66485.html

http://objectmix.com/basic-visual/193119-redim-computationally-expensive.html

http://www-10.lotus.com/ldd/nd6forum.nsf/55c38d716d632d9b8525689b005ba1c0/7a4628849304781e862570070058fbec?OpenDocument

And from reading that code (it's C isn't it?) I see that there is a for bucle that go through the array subscripts:

// Now run through and set the various subscripts (must be done after the above to fix

// cases like ReDim $a[$a[0]] )

pvTemp->ArraySubscriptClear(); // Reset the subscript

for (int i=0; i<nNumSubscripts; ++i)

{

if ( pvTemp->ArraySubscriptSetNext( nSubScriptDetails ) == false )

{

FatalError(IDS_AUT_E_TOOMANYSUBSCRIPTS, nColTemp);

return;

}

}

And that to the best of my knowledge is O(n)...But Valik, if I interpreted him right thinks it is not expensive:

This makes me wonder why ReDim is expensive to begin with.

Or maybe he meant relatively expensive, in opposition to other operations which are O(n2) or O(n3) so I will start using instead of 'expensive', the order in execution time.

It seems opinions are mixed about ReDim execution time:

You're still using Redim though and has already been pointed out, that is one of the slowest of all functions and can slow the script down considerably.

So, I did some quick tests -nothing neat, just to have an idea since I have to put some of my projects on hold due to exams but will return and look into this more deeply- to see how much time it takes to perform some of the operations discussed here and others as well:

Global $stringtorepeeat = _StringRepeat('a|', 127)
Global $string = ""
Global $count2 = 0
Global $stringwithpositions2 = ""
Global $positionoflastdelim = 0
$timer = TimerInit()
For $i = 0 To 634999;6349999;50000;750000
    $string &= 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa|';$stringtorepeeat
    ;$positionoflastdelim += StringLen('a|')
    ;$stringwithpositions2 &= $positionoflastdelim & '&'
    $count2 += 1
Next
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to concatenate all the elements in a string in miliseconds', $diff)
MsgBox(0, "count of elements", $count2)
;FileWrite(@ScriptDir&'/stringwithpositions2.txt',$stringwithpositions2)
$timer = TimerInit()
$result2 = StringInStr($string, '|', 2, $count2 / 2)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for the element in the middle in miliseconds', $diff)
$timer = TimerInit()
$result2 = StringInStr($string, '|', 2, $count2 / 4)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for the element in 1/4 in miliseconds', $diff)
$timer = TimerInit()
$result2 = StringInStr($string, '|', 2, $count2 / 8)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for the element in 1/8 of the string in miliseconds', $diff)
$timer = TimerInit()
$result2 = StringInStr($string, '|', 2, $count2 / 16)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for the element in 1/16 of the string in miliseconds', $diff)
$timer = TimerInit()
$string &= 'data1501'
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to concatenate the element to a string in miliseconds', $diff)
$timer = TimerInit()
$result = StringInStr($string, "|c|")
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for a string in the worst case (i.e. the string is not present) using stringinstr in miliseconds', $diff)
Global $array[635000];[50000];[750000]
For $i = 0 To UBound($array) - 1
    $array[$i] = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa';$stringtorepeeat
Next
$timer = TimerInit()
_ArraySearch($array, 'c')
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to search for an element in the worst case (i.e. the element is not present) using _arraysearch in miliseconds', $diff)
$timer = TimerInit()
_ArrayAdd($array, 'data')
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to add the element to the array in miliseconds', $diff)
$timer = TimerInit()
$arrayofstring = StringSplit($string, '|')
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to split the string into an array in miliseconds', $diff)
$timer = TimerInit()
$stringreplaced = StringRegExpReplace($string, 'a', 'b', 1)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to replace elements using regexp in miliseconds', $diff)
$timer = TimerInit()
$stringreplaced = StringReplace($string, 'a', 'b', 1)
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to replace elements using stringreplace in miliseconds', $diff)
$newsize = UBound($array)
MsgBox(0, 'New size of the array to be created', $newsize)
$timer = TimerInit()
ReDim $array[$newsize]
$diff = TimerDiff($timer)
MsgBox(0, 'time it takes to resize an array to hold one more value in miliseconds', $diff)

And performing a ReDim in an array of 650000 elements to hold one more element took 6000 miliseconds which is O(n) for me considering that concatenating the value to a string takes like 0.01 miliseconds.

On the other hand, searching a value in the worst case (ie the value isn't there) using stringinstr took 5879 miliseconds but using _ArraySearch it took 1777 miliseconds. Clearly the array wins here and more if using binary search. Concerning creating a 'StringBinarySearch' the problem is that it takes time to parse all the delimiters (in order to know how many elements there are and know where's the middle to start) so there would be no gain.

So, as a partial conclusion, I would use a string with delimiters for stacks or queues and if I need to store values in an array which I'm going to check frequently, use binary search and sort the array after and only if a new value has been added, like this (Thanks to Melba23 for his ploy!):

$Delim = Chr(29)
$IsSorted = False
While $StringQueue <> ""
        
      $element = _GetFirstElementFromQueue($StringQueue);A function that gets the first element of the queue and deletes it from the string    $StringQueue   
      
      $array2 = _somefunction($element)      
      
      If IsArray($array2) Then         
         For $i = 0 To Ubound($array2) - 1
             If Not $IsSorted Then 
                _ArraySort($arrayofresults)
                $IsSorted = True
             EndIf
             _ArrayBinarySearch($arrayofresults,$array2[$i],1); search whether the element is already in $arrayofresults
             If @error = 3 Then ; if it isn't in $arrayofresults then add it to both $arrayofresults and $StringQueue
                _RFLTA_AddToList($arrayofresults,$array2[$i]);Thanks Melba23 :)
                $StringQueue &= $Delim&$array2[$i]&$Delim
                $IsSorted = False;If an element is added then $arrayofresults will have to be sorted
             Else
                 $IsSorted = True 
             EndIf
         Next
      EndIf
WEnd

Also, if one knows $array2 won't have duplicate values (or if so, using _ArrayUnique) one can add the values that are not in $arrayofresults to a string and then add them all outside the For loop and in case the string isn't blank set $IsSorted to True:

$Delim = Chr(29)
$IsSorted = False
While $StringQueue <> ""
        
      $element = _GetFirstElementFromQueue($StringQueue);A function that gets the first element of the queue and deletes it from the string    ;$StringQueue   
      
      $array2 = _somefunction($element);at this point I could use the size of $array2 to ReDim $arrayofresults to accomodate the likely new ;values      
      
      If IsArray($array2) Then         
         If Not $IsSorted Then 
            _ArraySort($arrayofresults)
            $IsSorted = True
         EndIf 
         $StringToAddToQueue = ""
         For $i = 0 To Ubound($array2) - 1             
             _ArrayBinarySearch($arrayofresults,$array2[$i],1); search whether the element is already in $arrayofresults
             If @error = 3 Then ; if it isn't in $arrayofresults then add it to both $arrayofresults and $StringQueue
                $StringToAddToQueue &= $delim&$array2[i]$delim                
             EndIf
         Next
         If $StringToAddToQueue <> "" Then             
             For $elemtoadd In StringRegExp($StringToAddToQueue,$delim&'[^'&$delim&']*?'&$delim,3)
                 $elemtoadd = StringTrimRight($elemtoadd,1)
                 $elemtoadd = StringTrimLeft($elemtoadd,1)
                 _RFLTA_AddToList($arrayofresults,$elemtoadd);Thanks Melba23 :)
             Next
             $StringQueue &= $StringToAddToQueue
             $IsSorted = False  
         Else 
             $IsSorted = True      
         EndIf         
      EndIf
WEnd

Thus avoiding the _ArraySort O(n) operation inside the For loop and having a While Loop of O(n) (The execution time of last code would be 3n + log(n) in the worst case and 2n + log(n) in case it doesn't have to do a ReDim or it doesn't have to sort $arrayofresults)

Edited by Mithrandir
Link to comment
Share on other sites

  • Replies 42
  • Created
  • Last Reply

Top Posters In This Topic

Top Posters In This Topic

And that to the best of my knowledge is O(n)...But Valik, if I interpreted him right thinks it is not expensive:

No, I just couldn't remember how it was implemented off the top of my head, I couldn't be arsed to look and I can think of ways that it can be less expensive to (almost) free.
Link to comment
Share on other sites

No, I just couldn't remember how it was implemented off the top of my head, I couldn't be arsed to look and I can think of ways that it can be less expensive to (almost) free.

Ok :) Well I'm not so experienced to guess what you have in mind but I learned that -usually- improving execution time/memory use are oposed and the ability is to improve one without increasing the other too much.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...