Jump to content

Big array search slow, how to improve searching time?


Go to solution Solved by kylomas,

Recommended Posts

I have array1 that contain about 5000 records and growing.

I have another array (array2) that contain also about 5000 records and keep growing too.

For each record on array2, I search it againts array1. If found that I'm doing some operation on it.

This is the code. Is there any posibilities to improve the searching speed? Sort the array and use _ArrayBinarySearch maybe?

Thank you in advance!

For $n = 0 to Ubound($array1)-1
    $array1_split = StringSplit($array1[$n], "|")
    $what_to_search = $array1_split[1]
    $search_result_index = _ArraySearch($array2, $what_to_search, 0, 0, 0, 1)                ; for each array1, search on array2, not case sensitive
    If Not @error Then                                                                        ; if found then
        ; doing something here
    EndIf
Next


 

Link to comment
Share on other sites

Are the arrays sorted?

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

Would it be a problem to sort them? I think searching would be much faster then.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

@Water: no problem to sort them. Is there anybody here ever compare non-sorted array + _ArraySearch versus sorted-array + _ArrayBinarySearch?

@UEZ: actually I read those Array1 and Array2 data from different MySQL database (one on LAN and another on internet) and then put them into array :P Doing wrong? :P

Link to comment
Share on other sites

If you compare array 1 against a non sorted array 2 then you need O(n^2) whereas searching within a sorted array using _ArrayBinarySearch() should to in O(logn). Sorting an array using Quicksort takes also (n*logn).

More info at Wikipedia (section Orders of common functions)

That means sorting and comparing should give you a benefit.

Br,

UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Link to comment
Share on other sites

michaelslamet,

Could you get the DB to do the work for you?  Something like:

1 - create the array of search arguments

2 - select * from "data to search" where "some column" =/like "search argument"

kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Link to comment
Share on other sites

Even an Excel workbook can be accessed like a database.

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 

Link to comment
Share on other sites

michaelslamet,

Can you post an example of what $array1 and $array2 look like?

kylomas

edit:  You can adapt a technique similar to the following.  This searches a 5000 element array using the first "|" delimited string from another 5000 element array as the search argument.  The search time is appx. .13 seconds (all 5000 search elements against the searched array).

Although, It makes more sense to me to get the DB itself to do the work.

#include<array.au3>

; generate array1 5000 elements with each element containing 1 to 20 "|" delimited strings of 3 to 10 uppercase characters

local $array1[5000], $array2[5000], $str

for $1 = 0 to ubound($array1) - 1
    for $2 = 1 to random(1,20,1)
        for $3 = 1 to random(3,10,1)
            $str &= chr(random(65,90,1))
        Next
        $str &= '|'
    Next
    $array1[$1] = $str
    $str = ''
Next

; generate array2 with 5000 elements with each element containing 3 to 10 uppercase chars

$str = ''
for $1 = 0 to ubound($array2) - 1
    for $2 = 1 to random(3,10,1)
        $str &= chr(random(65,90,1))
    Next
    $array2[$1] = $str
    $str = ''
Next

local $st = timerinit()

; create a varname = to the value of each element in array2 prepended with "a"

for $1 = 0 to ubound($array2) - 1
    assign("a" & $array2[$1],1)
next

; test if the first "|" delimited value of array 1 exists (is declared) and write output

for $1 = 0 to ubound($array1) - 1

    $aTmp = stringsplit($array1[$1],'|')

        if isdeclared(eval("a" & $aTMP[1])) then

            ; you would do your thing here...

            ConsoleWrite(stringformat('Hit at row = %04i value = %-20s',$1,$aTMP[1]) & @LF)
        EndIf

next

ConsoleWrite(stringformat('Time to search array1 against array2 = %2.4f seconds',timerdiff($st)/1000) & @LF)

Good Luck,

kylomas

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Link to comment
Share on other sites

michaelslamet,

This version of what I posted above is a little more "user friendly".  It is also quicker (does all 5000 searches in appx. .06 seconds)

#include<array.au3>
#include<userudfs.au3>

; generate array1 5000 elements with each element containing 1 to 20 "|" delimited strings of 5 uppercase characters

local $st = timerinit()
local $array1[5000], $array2[5000], $str

for $1 = 0 to ubound($array1) - 1
    for $2 = 1 to random(1,20,1)
        for $3 = 1 to 5
            $str &= chr(random(65,90,1))
        Next
        $str &= '|'
    Next
    $array1[$1] = $str
    $str = ''
Next

; generate array2 with 5000 elements with each element containing 5 uppercase chars

$str = ''
for $1 = 0 to ubound($array2) - 1
    for $2 = 1 to 5
        $str &= chr(random(65,90,1))
    Next
    $array2[$1] = $str
    $str = ''
Next

ConsoleWrite(stringformat('> %-40s %2.4f seconds','Time to generate arrays ' ,timerdiff($st)/1000) & @LF)

; initialize PsuedoSearch with array to search

$st = timerinit()
_PsuedoSearchInit($array2)
ConsoleWrite(stringformat('> %-40s %2.4f seconds','Time to initialize PsuedoSearch ' ,timerdiff($st)/1000) & @LF)

; find all occurences of the first '|' delimited string from the source array

local $sSearchArg = '', $st = timerinit()

for $1 = 0 to ubound($array1) - 1

    $sSearchArg = stringleft($array1[$1],stringinstr($array1[$1],'|')-1)

    if _PsuedoSearch($sSearchArg) then

        ; you would do your thing here...

        ConsoleWrite(stringformat('Hit at row = %04i value = %-20s',$1,$sSearchArg) & @LF)

    EndIf

next

ConsoleWrite(stringformat('> %-40s %2.4f seconds','Total time for search ',timerdiff($st)/1000) & @LF)

func _PsuedoSearchInit(byref $aTargetArray)
    for $1 = 0 to ubound($aTargetArray) - 1
        assign("a" & $aTargetArray[$1],1,2)
    next
endfunc

func _PsuedoSearch($str)
    if isdeclared("a" & $str) then
        return true
    Else
        return false
    endif
endfunc

kylomas

edit: corrected isdeclared statement

Edited by kylomas

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

Link to comment
Share on other sites

Another possible route: read the table (or query resultset) from the remote DB into a local array, import it to the local DB in a separate table then work on the lan DB with SQL.

I don't believe you can attach both tables to avoid extracting one of the table.

Might be faster than copying both tables into arrays then work painfully from there, specially if your tables are expected to grow: SQL will scale gracefully and offer all power you may need to process your data.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@kylomas: thanks a lot for the code example. Really appreciated it :)

I have tried that, but on the "doing something here" part is i need to StringSplit a string which is the index I got from _ArraySearch

This is my code:

For $n = 0 to Ubound($array1)-1
    $array1_split = StringSplit($array1[$n], "|")
    $what_to_search = $array1_split[1]
    $search_result_index = _ArraySearch($array2, $what_to_search, 0, 0, 0, 1)                ; for each array1, search on array2, not case sensitive
    If Not @error Then                                                                       ; if found then
        $array1_split = StringSplit($Array1[$n], "|")
        $array2_split = StringSplit($Array2[$search_result_index], "|")
        ; compare the value of $array1_split and $array2_split and doing something here
    EndIf
Next

Your example code doesn't returning $search_result_index so I cant do the compare.

I implemented your example code into my code and it works so fast! (136 seconds versus 0.2 seconds), but I think the one that make it slow is _ArraySearch,

so without your code get the $search_result_index, I cant really compare them.

Thanks again!

Edited by michaelslamet
Link to comment
Share on other sites

@jchd: thanks for offering another solution! :)

This script should be run on a machine that dont have any database service running, so I dont think it's possible for now.

Anyway, does SQLite need to be installed and run as a service?

Or we can just put the file(s) in to the same dir with our script and execute the SQLite exe file everytime our script need it?

Link to comment
Share on other sites

If you don't even have an ODBC (ADO rather) driver on your machine then how do you query/extract remote data and LAN data from both DBs?

I must miss something.

No, SQLite doesn't need any install/setup/maintainance, just a non-registered DLL which you can very well FileInstall within your script if you need a self-contained solution. If your input data is in fact a couple of CSV-style files (depending on which "standard" they use) then SQLite.exe can import them into a permanent or temporary local SQLite DB and perform queries on it.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

ArrayBinarySearch is probably the fastest method you'll find to search sorted arrays. You can cut the time by 100 times over a non-binary search depending on the data.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

I have ODBC installed on the machine :) But I dont have MySQL installed on the machine (well, i have it on development machine, but not in the production machine),

so we cant store the data on the local MySQL database

Or you mean store it on the LAN MySQL database? Oh gosh, I miss your point :sweating:

SQLite really seems great, I should take a time to learn it. Thanks jchd! :)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...