Jump to content

Big array search slow, how to improve searching time?


Go to solution Solved by kylomas,

Recommended Posts

Yes, if you have access to the LAN DB, just import the other data from the remote DB by query: ODBC will get you a 2D array which you can store in a [temporary] table in the LAN DB then work on the LAN DB.

Yet another possibility, depending on the data structure and computational needs: use a scripting dictionary.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

_ArrayBinarySearch may be the fastest method to search a sorted array, but what is the fastest method to sort the array?

You're pulling the information from a DB, have the information sorted when you pull it out of the DB and put it into the array

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

michaelslamet,

F.W.I.W,

The following will search a 1D or 2D array and return a 2D array (see notes in code)

#include<array.au3>
#include<userudfs.au3>

Local $st = TimerInit()

;=================================================================================
;
; generate array1 as 5000 X 3 random elements (A-Z||0-9)
;
; ucomment the following code to generate a 2D array for testing
;
;=================================================================================

;~ Local $st = TimerInit()
;~ Local $array2[26 * 10], $str

;~ Local $array1[5000][3]
;~ For $1 = 0 To UBound($array1) - 1
;~  For $2 = 0 To ubound($array1,2) - 1
;~      $array1[$1][$2] = Chr(Random(65, 90, 1)) & Random(0, 9, 1)
;~  Next
;~ Next

;=================================================================================
;
; generate array1 as 5000 X 1 random elements (A-Z||0-9)
;
;=================================================================================

Local $array1[5000]
For $1 = 0 To UBound($array1) - 1
        $array1[$1] = Chr(Random(65, 90, 1)) & Random(0, 9, 1)
Next

;=================================================================================
;
; generate array2 as 5000 X 1 random elements (A-Y||0-9)
;
;=================================================================================

;~ Local $ele = 0, $array2[26 * 10]
;~ For $1 = 1 To 25
;~  For $2 = 0 To 9
;~      $array2[$ele] = Chr($1 + 64) & $2
;~      $ele += 1
;~  Next
;~ Next

Local $array2[5000]
For $1 = 0 To UBound($array2) - 1
        $array2[$1] = Chr(Random(65, 90, 1)) & Random(0, 9, 1)
Next

ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Time to generate arrays ', TimerDiff($st) / 1000) & @LF)

;=================================================================================
;
; Initialize PsuedoSearch with array to be searched.  Can be 1D or 2D
;
;=================================================================================

$st = TimerInit()
_PsuedoSearchInit($array1)
If @error <> 0 Then
    ConsoleWrite('PsuedoSearch Initialization Error...Exiting...' & @LF)
    Exit
EndIf
ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Time to initialize PsuedoSearch ', TimerDiff($st) / 1000) & @LF)

;=================================================================================
;
; Search array2 for all elements in array1
;
;=================================================================================

Local $st = TimerInit(), $str

For $1 = 0 To UBound($array2) - 1

    switch ubound($array2,0)
        case 1
            $Ret = _PsuedoSearch($array2[$1])
        case 2
            for $2 = 0 to ubound($array2,2)- 1
                $Ret = _PsuedoSearch($array2[$1][$2])
            next
    endswitch

    If IsArray($Ret) Then

        ; returns a 2D array as
        ;    [0][0] = # of times found
        ;    [0][1] = String searched for
        ;    [n][0] = row found in
        ;    [n][1] = column found in
        ;
        ; if the array to search is a 1D array then [n][1] will be blank

        $str &= $Ret[0][1]
        For $d = 1 To $Ret[0][0]
            $str &= @TAB & StringFormat('found at row [%04i] col [%04i]', $Ret[$d][0], $Ret[$d][1]) & @LF
        Next

    Else

        $str &= $array2[$1] & ' not found' & @LF

    EndIf

Next

ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Total time for search ', TimerDiff($st) / 1000) & @LF)
ConsoleWrite($str & @LF)


; #FUNCTION# ====================================================================================================================
; Name ..........: _PsuedoSearchInit
; Description ...: Creates Global vars to be used to search against.  Vars contain row-column info.
; Syntax ........: _PsuedoSearchInit(Byref $aTargetArray)
; Parameters ....: $aTargetArray        - The array to be searched, either 1D or 2D
; Return values .:                      - @error = 1 parameter is not an array
;                                      |- @error = 2 parameter is not a 1D or 2D array
; Author ........: kylomas
; Modified ......: 5/18/2013
; Remarks .......: Created V1.1.0
; Related .......: _PsuedoSearch
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _PsuedoSearchInit(ByRef $aTargetArray)

    If Not IsArray($aTargetArray) Then Return SetError(1)
    If UBound($aTargetArray, 0) > 2 Then Return SetError(2)

    Local $instance = 0

    Switch UBound($aTargetArray, 0)
        Case 1
            For $1 = 0 To UBound($aTargetArray, 1) - 1
                $instance = 1
                While 1
                    If IsDeclared("a" & $aTargetArray[$1] & '_' & $instance) Then
                        $instance += 1
                    Else
                        Assign("a" & $aTargetArray[$1] & '_' & $instance, $1, 2)
                        ExitLoop
                    EndIf
                WEnd
            Next
        Case 2
            For $1 = 0 To UBound($aTargetArray, 1) - 1
                For $2 = 0 To UBound($aTargetArray, 2) - 1
                    $instance = 1
                    While 1
                        If IsDeclared("a" & $aTargetArray[$1][$2] & '_' & $instance) Then
                            $instance += 1
                        Else
                            Assign("a" & $aTargetArray[$1][$2] & '_' & $instance, $1 & '_' & $2, 2)
                            ExitLoop
                        EndIf
                    WEnd
                Next
            Next
    EndSwitch
EndFunc   ;==>_PsuedoSearchInit

; #FUNCTION# ====================================================================================================================
; Name ..........: _PsuedoSearch
; Description ...: Search the array specified in _PsuedosearchInit()
; Syntax ........: _PsuedoSearch($str)
; Parameters ....: $str                 - The string to search for
; Return values .:                      - "0" = search argument not found
;                                      |- If the search argument is found then return a 2D array as
;                                      |        [0][0] = # of times found
;                                      |        [0][1] = String searched for
;                                      |        [n][0] = row found in
;                                      |        [n][1] = column found in
; Author ........: kylomas
; Modified ......: 5/18/2013
; Remarks .......: Created V1.1.0
; Related .......: _PsuedoSearchInit
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================

Func _PsuedoSearch($str)

    Local $instance = 1, $aRet[1][2] = [[0, 0]]

    While IsDeclared("a" & $str & '_' & $instance)
        $aTmp = StringSplit(Eval("a" & $str & '_' & $instance), '_')
        $aRet[0][0] = $instance
        $aRet[0][1] = $str
        $instance += 1
        ReDim $aRet[UBound($aRet) + 1][2]
        switch ubound($aTmp)
            case 2
                $aRet[UBound($aRet) - 1][0] = $aTmp[1]
            case 3
                $aRet[UBound($aRet) - 1][0] = $aTmp[1]
                $aRet[UBound($aRet) - 1][1] = $aTmp[2]
        endswitch
    WEnd
    If $aRet[0][0] = 0 Then
        Return 0
    Else
        Return $aRet
    EndIf
EndFunc   ;==>_PsuedoSearch

Searching a 5000 element array against a 5000 element array takes appx. 5 secs.

As you increase the dimensions of the array, the run times increase, of course.  E.G.  a 5000X1 array searched against a 5000 x 3 array takes about 25 seconds.

As before, the more work you can get out of the DB engine the better off you'll be (I know it's obvious but if speed is your concern this is the best place to get it).

kylomas

edit: additional info

If you are just dealing with 1D arrays then _arraysort followed by _arraybinarysearch ran the same arrays as above in 3.4 seconds.

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Link to comment
Share on other sites

  • Solution

michaelslamet,

I don't know if this matters to you but _arraybinarysearch returns the first ocurrence of a search argument within an array whereas the code that I posted returns index numbers for all elements.  They both run in appx the same time, see code below for comparison.

#include<array.au3>
#include<userudfs.au3>

Local $st = TimerInit()

local $array1[15] = ['A4','B3','A0','Z9','Q2','A0','A2','D3','A0','A0','Z9','B3','A0','Z9','B3']

local $array2[4] = ['A0','A4','Z9','B3']

ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Time to generate arrays ', TimerDiff($st) / 1000) & @LF)

;=================================================================================
;
; Initialize PsuedoSearch with array to be searched.  Can be 1D or 2D
;
;=================================================================================

$st = TimerInit()
_PsuedoSearchInit($array1)
If @error <> 0 Then
    ConsoleWrite('PsuedoSearch Initialization Error...Exiting...' & @LF)
    Exit
EndIf
ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Time to initialize PsuedoSearch ', TimerDiff($st) / 1000) & @LF)

;=================================================================================
;
; Search array2 for all elements in array1 using PsuedoSearch
;
;=================================================================================

local $str = ''

$st = timerinit()

For $1 = 0 To UBound($array2) - 1

    switch ubound($array2,0)
        case 1
            $Ret = _PsuedoSearch($array2[$1])
        case 2
            for $2 = 0 to ubound($array2,2)- 1
                $Ret = _PsuedoSearch($array2[$1][$2])
            next
    endswitch

    If IsArray($Ret) Then

        ; returns a 2D array as
        ;    [0][0] = # of times found
        ;    [0][1] = String searched for
        ;    [n][0] = row found in
        ;    [n][1] = column found in
        ;
        ; if the array to search is a 1D array then [n][1] will be blank

        $str &= $Ret[0][1]
        For $d = 1 To $Ret[0][0]
            $str &= @TAB & StringFormat('found at row [%04i] col [%04i]', $Ret[$d][0], $Ret[$d][1]) & @LF
        Next

    Else

        $str &= $array2[$1] & ' not found' & @LF

    EndIf

Next

ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Total time for search ', TimerDiff($st) / 1000) & @LF)
ConsoleWrite($str & @LF)

;=================================================================================
;
; Search array2 for all elements in array1 using _arraysort and _arraybinarysearch
;
;=================================================================================

Local $st = TimerInit(), $str

_arraysort($array1)

For $1 = 0 To UBound($array2) - 1

    $Ret = _arraybinarysearch($array1,$array2[$1])

    switch $Ret
        case -1
            ConsoleWrite($array2[$1] & ' not found' & @LF)
        case else
            ConsoleWrite($array2[$1] & ' found at ' & $ret  & @LF)
    endswitch

Next

ConsoleWrite(StringFormat('> %-40s %2.4f seconds', 'Total time for search ', TimerDiff($st) / 1000) & @LF)


; #FUNCTION# ====================================================================================================================
; Name ..........: _PsuedoSearchInit
; Description ...: Creates Global vars to be used to search against.  Vars contain row-column info.
; Syntax ........: _PsuedoSearchInit(Byref $aTargetArray)
; Parameters ....: $aTargetArray        - The array to be searched, either 1D or 2D
; Return values .:                      - @error = 1 parameter is not an array
;                                      |- @error = 2 parameter is not a 1D or 2D array
; Author ........: kylomas
; Modified ......: 5/18/2013
; Remarks .......: Created V1.1.0
; Related .......: _PsuedoSearch
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================
Func _PsuedoSearchInit(ByRef $aTargetArray)

    If Not IsArray($aTargetArray) Then Return SetError(1)
    If UBound($aTargetArray, 0) > 2 Then Return SetError(2)

    Local $instance = 0

    Switch UBound($aTargetArray, 0)
        Case 1
            For $1 = 0 To UBound($aTargetArray, 1) - 1
                $instance = 1
                While 1
                    If IsDeclared("a" & $aTargetArray[$1] & '_' & $instance) Then
                        $instance += 1
                    Else
                        Assign("a" & $aTargetArray[$1] & '_' & $instance, $1, 2)
                        ExitLoop
                    EndIf
                WEnd
            Next
        Case 2
            For $1 = 0 To UBound($aTargetArray, 1) - 1
                For $2 = 0 To UBound($aTargetArray, 2) - 1
                    $instance = 1
                    While 1
                        If IsDeclared("a" & $aTargetArray[$1][$2] & '_' & $instance) Then
                            $instance += 1
                        Else
                            Assign("a" & $aTargetArray[$1][$2] & '_' & $instance, $1 & '_' & $2, 2)
                            ExitLoop
                        EndIf
                    WEnd
                Next
            Next
    EndSwitch
EndFunc   ;==>_PsuedoSearchInit

; #FUNCTION# ====================================================================================================================
; Name ..........: _PsuedoSearch
; Description ...: Search the array specified in _PsuedosearchInit()
; Syntax ........: _PsuedoSearch($str)
; Parameters ....: $str                 - The string to search for
; Return values .:                      - "0" = search argument not found
;                                      |- If the search argument is found then return a 2D array as
;                                      |        [0][0] = # of times found
;                                      |        [0][1] = String searched for
;                                      |        [n][0] = row found in
;                                      |        [n][1] = column found in
; Author ........: kylomas
; Modified ......: 5/18/2013
; Remarks .......: Created V1.1.0
; Related .......: _PsuedoSearchInit
; Link ..........:
; Example .......: Yes
; ===============================================================================================================================

Func _PsuedoSearch($str)

    Local $instance = 1, $aRet[1][2] = [[0, 0]]

    While IsDeclared("a" & $str & '_' & $instance)
        $aTmp = StringSplit(Eval("a" & $str & '_' & $instance), '_')
        $aRet[0][0] = $instance
        $aRet[0][1] = $str
        $instance += 1
        ReDim $aRet[UBound($aRet) + 1][2]
        switch ubound($aTmp)
            case 2
                $aRet[UBound($aRet) - 1][0] = $aTmp[1]
            case 3
                $aRet[UBound($aRet) - 1][0] = $aTmp[1]
                $aRet[UBound($aRet) - 1][1] = $aTmp[2]
        endswitch
    WEnd
    If $aRet[0][0] = 0 Then
        Return 0
    Else
        Return $aRet
    EndIf
EndFunc   ;==>_PsuedoSearch

kylomas

edit: as the searched array gets larger the comparison widens.  At 50000 randomly generated entries the binary search runs in 3.8 seconds and mine runs in 21 seconds.  If you only need the first ocurrence of a match then this is the way to go.

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Link to comment
Share on other sites

@Kylomas and BrewManNH: thanks a lot for the example code!

I modified my code to pull the data from MySQL in a sorted form, then I search it using _ArrayBinarySearch.

Guess what?

Amazing to see the search time change from 136 secs to 1 secs :P

Thanks a lot for helping!

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...