Jump to content

very slow search on long array


Recommended Posts

11 minutes ago, Jos said:

Maybe you downloaded the x64 version?

No, I extracted it from here:
https://sqlite.org/download.html

from the zip file:

sqlite-tools-win32-x86-3380200

it seems it doesn't exist the sqlite3.exe  64bit

 

BTW

I downloaded also the 2 dll both 32 and 64bit

but it works only (_SQLite_Startup) pointing the 32bit one...

 

Local Const $SQLITE_DLL = "C:\AutoitProgs\Sqlite\32bit\sqlite3.dll" ;<-- Change to the location of your sqlite dll
_SQLite_Startup($SQLITE_DLL, true, 1)

..... _SQLite_LibVersion=3.38.2

 

Edited by frank10
Link to comment
Share on other sites

I'm clueless for now. Try setting $Debug = 1 when calling the .EXE and let's see what error code it gives.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

49 minutes ago, jchd said:

I'm clueless for now. Try setting $Debug = 1 when calling the .EXE and let's see what error code it gives.

with this:

#include <File.au3>
    #include <Array.au3>
    #include <SQLite.au3>

    Global $sFileDB  =   "origDB.txt"
    Global $sFileNew =   "newLines.txt"



    Local Const $SQLITE_DLL = "c:\Temp\autoitProgs\sqlite3.dll" ;<-- Change to the location of your sqlite dll
    _SQLite_Startup($SQLITE_DLL, false, 0,Default)
    ConsoleWrite(@error & " _SQLite_LibVersion=" & _SQLite_LibVersion() & @CRLF)

    $tLoop = TimerInit()
    
    Local $files = _FileListToArray ( @scriptdir , "sqlite*", 1)
    Consolewrite(_ArrayToString($files, @crlf) & @CRLF)

    Local $s = "Part1,Part2" & @CRLF & FileRead( $sFileDB )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    Local $h = FileOpen( "origdb1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)

    $s = "Part1,Part2" & @CRLF & FileRead( $sFileNew )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    $h = FileOpen( "newlines1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)


    local $ret = _SQLite_SQLiteExe(  "Merging.sq3", _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _
        ".import " &  "origdb1.txt Origin --csv" & @CRLF & _
        ".import " &  "newlines1.txt Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output " &  "sqlResult.txt" & @CRLF & _
        "select part1 || part2 as "" from Origin;" & @CRLF & _
        ".quit", $s, @scriptdir , true)


   Consolewrite( @scriptdir & @crlf & " err:" & @error & "__" & $ret & @CRLF)

it says:

>Running:(3.3.15.3):C:\Program Files (x86)\AutoIt3\Beta\autoit3.exe "C:\Temp\autoitProgs\testSQL_exe.au3"    
+>Setting Hotkeys...--> Press Ctrl+Alt+Break to Restart or Ctrl+BREAK to Stop.
0 _SQLite_LibVersion=3.38.2
3
sqlite3.def
sqlite3.dll
sqlite3.exe
C:\Temp\autoitProgs
 err:2__21

No other Debug printed...

it creates the merging.sq3 file with dimension 0 KB...

EDIT
$ret = 21 should be:

$SQLITE_MISUSE
/* Library used incorrectly */
??

 

Edited by frank10
Link to comment
Share on other sites

Geez, I don't have a special setup, just run code from Scite and it works like a charm (albeit slower than latest contributions) with 83k lines resulting. I bet it would beat everything else in AutoIt when file sizes increase significantly but this isn't the issue here.

Since you've been provided better alternatives, I suggest to let this dog die. I'm too busy elsewhere to spend more time on this problem.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

  • Developers

I had the same error and think the issue is that when providing a path for sqlite3.exe you need to include the program name as well:
e.g.:

Local $ret = _SQLite_SQLiteExe("Merging.sq3", _
        ".quit", $sout, @ScriptDir & "\sqlite3.exe", True)

 

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Link to comment
Share on other sites

  • Developers

Ok... there were several errors in the syntax: A comma after TEXT:

"CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _

but the most important one was this one:

"select part1 || part2 as "" from Origin;" & @CRLF & _

The "" translates to a single double-quote.

This version seems to work fine:

Local $ret = _SQLite_SQLiteExe("Merging.sq3", _
        "DROP TABLE IF EXISTS Origin ;" & @CRLF & _
        "DROP TABLE IF EXISTS Origin ;" & @CRLF & _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        ".import origdb1.txt Origin --csv" & @CRLF & _
        ".import newlines1.txt Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output sqlResult.txt" & @CRLF & _
        "select part1 || part2 as '' from Origin;" & @CRLF & _
        ".quit", $s, @ScriptDir & "\sqlite3.exe", True)

Jos :) 

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)

Link to comment
Share on other sites

Oh, I added this "" after running because the column header appeared and I wanted to make it vanish without realizing that the double quote was a gross error. Sorry for that. Make that select part1 || part2 as [] from Origin;

About the extra comma, I didn't notice it and fortunately SQLite is smart enough to not bite my ass about it.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

4 hours ago, Jos said:

Ok... there were several errors in the syntax: A comma after TEXT:











"CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT,) WITHOUT ROWID;" & @CRLF & _

but the most important one was this one:











"select part1 || part2 as "" from Origin;" & @CRLF & _

The "" translates to a single double-quote.

This version seems to work fine:

Local $ret = _SQLite_SQLiteExe("Merging.sq3", _
        "DROP TABLE IF EXISTS Origin ;" & @CRLF & _
        "DROP TABLE IF EXISTS Origin ;" & @CRLF & _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        ".import origdb1.txt Origin --csv" & @CRLF & _
        ".import newlines1.txt Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output sqlResult.txt" & @CRLF & _
        "select part1 || part2 as '' from Origin;" & @CRLF & _
        ".quit", $s, @ScriptDir & "\sqlite3.exe", True)

Jos :) 

Thank you Jos!

 

There is another problem if you put a path different from the @scriptdir, as I supposed because of "\" in path: you should change them to slash "/" instead! Otherwise again it does the same error that does not find the sqlite.exe ....

local $myDir = "C:\Temp\autoitProgs\"
must become:
local $myDir = "C:/Temp/autoitProgs/"
......

   local $ret = _SQLite_SQLiteExe(  $myDir & "Merging.sq3", _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
......

And.... another one.... always in path.... the spaces in dir/filenames:

You must enclose the path with single quotes '......' on the cmd part, but NOT in the input one! And also here it will not find the sqlite3.exe! Really annoying to detect all things....

So, this one is good:

 

global $myDir       = "C:\Temp\autoitProgs\___out\"   ; NO on SQL code!!!
    global $myDirSlash  = "C:/Temp/autoitProgs/___out/"   ; OK in SQL code and input BUT without '...' in input!!!
    Global $sFileDB  =   $myDir & "orig.txt"
    Global $sFileNew =   $myDir & "newLines.txt"

    Local $s = "Part1,Part2" & @CRLF & FileRead( $sFileDB )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    Local $h = FileOpen( $myDir & "origdb1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)

    $s = "Part1,Part2" & @CRLF & FileRead( $sFileNew )
    $s = StringRegExpReplace($s, "(?m)((?:[^|]*\|){6})(.*\R)", "$1,$2")
    $h = FileOpen( $myDir & "newlines1.txt", $FO_OVERWRITE)
    FileWrite($h, $s)
    FileClose($h)
    
    local $ret = _SQLite_SQLiteExe(  $myDirSlash & "Merging.sq3", _
        "DROP TABLE IF EXISTS Origin ;" & @CRLF & _
        "DROP TABLE IF EXISTS Extras ;" & @CRLF & _
        "CREATE TABLE Origin (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        "CREATE TABLE Extras (Part1 TEXT PRIMARY KEY, Part2 TEXT) WITHOUT ROWID;" & @CRLF & _
        ".import '" & $myDirSlash & "origdb1.txt' Origin --csv" & @CRLF & _
        ".import '" & $myDirSlash & "newlines1.txt' Extras --csv" & @CRLF & _
        "insert or replace into origin select * from extras;" & @CRLF & _
        ".output '" & $myDirSlash & "Result.txt'" & @CRLF & _
        'select part1 || part2 as "" from origin;' & @CRLF & _
        ".quit", $s, @ScriptDir & "\sqlite3.exe", True)

 

Thank you jchd anyway: your code opened my mind about SQLite...

Actually SQLite IS the winner!!

SQLlite: Total time:1171.2018

Load + 2xSort: 4110.4385
-- getPart added:5555
GetPart: 1226.4129
Total time:5336.8672

Load + 2xSort: 4136.0211
Scripting: 1702.8068
Total time:5838.8379

Load + 2xSort: 4039.9172
-- arrSearch added:5555
arrSearch: 3338.8831
Total time:7378.8284

Not only it's the fastest on the main part, BUT also it does NOT require the preliminary sort part you need to make on both the arr for the other methods and actually that's the most slow part of the process = 4.1s ! And it sorts out also the final result!

So 1.1s vs the other best GetPart at 5.3s !!

Thank you all again :)  

 

Edited by frank10
Link to comment
Share on other sites

2 hours ago, frank10 said:

There is another problem if you put a path different from the @scriptdir, as I supposed because of "\" in path: you should change them to slash "/" instead! Otherwise again it does the same error that does not find the sqlite.exe ....

That's very surprising indeed, because Windows accepts both slashes and backslashes in pathes and has been working this way for very long time.

2 hours ago, frank10 said:

Not only it's the fastest on the main part, BUT also it does NOT require the preliminary sort part you need to make on both the arr for the other methods and actually that's the most slow part of the process = 4.1s ! And it sorts out also the final result!

In fact the "sort" is hidden in the construction of the table's index. One can even fasten this step by inserting rows and only then create the index. This way the underlying B-tree is built in one go, without all the games required with re-balancing the B-tree while inserting.

This behavior is more or less the same as seen in the Scripting.Dictionary version. Internally the object does essentially the same buildup of internal index to be able to lookup members on demand.

Thanks to the speed of crisp C/C++ library code (embedded in SQLite or in Scripting.Dictionary code) the "sort" (not a real sort in fact) is fast.

Also in SQLite case, the option to use the CLI (sqlite.exe) to perform both imports equally benefit of fast C code. If you just read data files into arrays then insert them in SQLite tables using AutoIt code, you'll see runtime skyrocket.

I don't know if keeping the resulting data accessible for later similar merges is important/useful in your use case, but if it is permanent data then yes, SQLite should reveal the best choice in the long run. Of course you might want to split data rows in several columns if you have to lookup data or obtain statistical data about your dataset. SQLite easily handles very large databases:. SQLite database size limit is 281 terabytes, or 256 tebibytes.

BTW, your PC is 3 times faster than mine 😭

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

9 hours ago, jchd said:

That's very surprising indeed, because Windows accepts both slashes and backslashes in pathes and has been working this way for very long time.

In fact the "sort" is hidden in the construction of the table's index. One can even fasten this step by inserting rows and only then create the index. This way the underlying B-tree is built in one go, without all the games required with re-balancing the B-tree while inserting.

This behavior is more or less the same as seen in the Scripting.Dictionary version. Internally the object does essentially the same buildup of internal index to be able to lookup members on demand.

Thanks to the speed of crisp C/C++ library code (embedded in SQLite or in Scripting.Dictionary code) the "sort" (not a real sort in fact) is fast.

Also in SQLite case, the option to use the CLI (sqlite.exe) to perform both imports equally benefit of fast C code. If you just read data files into arrays then insert them in SQLite tables using AutoIt code, you'll see runtime skyrocket.

I don't know if keeping the resulting data accessible for later similar merges is important/useful in your use case, but if it is permanent data then yes, SQLite should reveal the best choice in the long run. Of course you might want to split data rows in several columns if you have to lookup data or obtain statistical data about your dataset. SQLite easily handles very large databases:. SQLite database size limit is 281 terabytes, or 256 tebibytes.

BTW, your PC is 3 times faster than mine 😭

Thanks for the info.

My test rig was a Surface pro 7 (i5)

 

10 hours ago, mikell said:

Hmm... Scripting.Dictionary does NOT require preliminary sorting too  :)

Ohh, good!
So the final winners:

1° SQLite

2° Scripting (but I woud need to sort at least the final result, but should be fast anyway in JS)

3° GetPart

 

Link to comment
Share on other sites

Optimized Array solution :

#include <Array.au3>
#include <File.au3>

Local $hTimer = TimerInit()
FileDelete("TempDB.txt")
Local $aDB = FileReadToArray("OrigDbSort.txt")
Local $aNew = FileReadToArray("newLinesSort.txt")
Local $aAdd[10000]

Local $part, $res, $start = 0, $ind = 0

For $i = 0 to UBound($aNew) - 1
  $part = GetPart($aNew[$i])
  $res = FastSearch($aDB, $part, $start)
  If $res >= 0 Then
    $aDB[$res] = $aNew[$i]
    $start = $res + 1
  Else
    $aAdd[$ind] = $aNew[$i]
    $ind += 1
  EndIf
Next
;ConsoleWrite($ind & @CRLF)

Append($aDB, $aAdd, $ind)
FileWrite("TempDB.txt", _ArrayToString($aDB, @CRLF))

Run(@ComSpec & " /c sort TempDB.txt /o NewDB.txt", "", @SW_HIDE)
MsgBox($MB_SYSTEMMODAL, "", TimerDiff($hTimer))

Func GetPart($string)
  Return StringMid($string, 1 , Stringinstr($string, "|", 1,6))
EndFunc

Func FastSearch(ByRef $array, $str, $begin)
  Local $txt
  For $i = $begin to UBound($array) - 1
    $txt = GetPart($array[$i])
    If $txt == $part Then Return $i
    If $txt > $part then return -1
  Next
  Return -1
EndFunc

Func Append(ByRef $base, ByRef $append, $end)
  Local Const $iBase = UBound($base)
  ReDim $base[$iBase + $end]
  For $i = 0 to $end - 1
    $base[$iBase + $i] = $append[$i]
  Next
EndFunc

In just about 1 sec :)

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...