Jump to content

Need help in optimizing _arraysearch() script


marko001
 Share

Recommended Posts

Hi all,

I have this script, inside my program, that :

a) take a source file (.csv) (around 5.000 lines), internal.csv

b ) converts it so that I can separate each single part (sometimes the first item is i.e. '"Legend of, Zelda",PC,..'. so that there is an extra "," to consider)

b1) "Normalizes" the names found so that they can be compared to a .xml structure

c) create a string for every line in .xml format (<z:row c0='aaaaa' It2='UZ' Avail='No' fold='E...)

d) checks the string with another "arrayed" file (50.000 lines), $mydb

e) if _arraysearch returns OK then add to an array the complete string (all the parts of b ) and 2 fields of the .xml, c7=... and c8=...)

f) creates a file and uploads it to a server

with the code shown below it takes aroud 15 mins to finish.

Is there a way to optimize the code to reduce this time?

Thanks,

Marco

P.s. The code works, so no problem if some var/const is missing, probably they have been called by the main .au3 that has all the include needed.

#include <FTPEx.au3>
 
$exportfile = @ScriptDir & "\DB\internal.csv"
Dim $array2db[1]
Dim $mydb
Global $file2ftp = @TempDir & "\temp2db.txt"
 
Func _exp2db($user)
    Local $user2db
    Dim $csv_list[1][8]
    Dim $csv_array
    If $user= "" Then
        $user2db = "Test_User"
    Else
        $user2db = $user
    EndIf
    _status(0, "Converting now .csv")
    _FileReadToArray($exportfile, $csv_array)
    For $i = 2 To UBound($csv_array) - 1
        If StringInStr($csv_array[$i], ", ,") Then
            $string = StringReplace($csv_array[$i], ", ,", ",@,", 1)
        Else
            $string = $csv_array[$i]
        EndIf
        If StringInStr($string, ", ") Then $string = StringReplace($string, ", ", "^", 1)
        $name_s = _StringBetween($string, "", ",")
        $name = _cleanfordb($name_s[0]) ;name cleaned (remove " and ^ I used before and replace , if exists)
        $string = StringReplace($string, ",", "*", 1) ; comma to get the next item
        $string = StringReplace($string, ",", "#", 1) ; comma to get the next item
        $item2 = _StringBetween($string, "*", "#") ;another item
        $item3 = _StringBetween($string, "#", ",") ;another item
        $string = StringReplace($string, ",", "_", 1) ; comma to get the next item
        $item4 = _StringBetween($string, "_", ",") ;another item
        $item5 = _StringBetween($string, ",", ",") ;another item
        If $item4[0] <> "TRASH" And $item4[0] <> "PC" And $item4[0] <> "PSX" And $item4[0] <> "WII" And $item4[0] <> "DS" Then
            $item5[0] = "TRASH" ;
            $item4[0] = "X"
        EndIf
        If $item4[0] = "TRASH" Then
            $item5[0] = $item4[0]
            $item4[0] = "None"
        EndIf
        $item6= StringRight($string, 2) ;another item
        If $item6= "es" Then $item6= "Yes"
        ReDim $csv_list[$i + 1][8]
        $csv_list[$i][0] = $name
        $csv_list[$i][1] = $item4[0]
        If $item5[0] <> "@" Then $csv_list[$i][2] = $item5[0]
        $csv_list[$i][3] = $item6
        $csv_list[$i][4] = $item2[0]
        $csv_list[$i][5] = $item3[0]
        _FileReadToArray($originalDB, $mydb)
        $string = "<z:row c0='" & _normalize_db($csv_list[$i][0]) & "' c1='" & $csv_list[$i][2] & "' c2='" & $csv_list[$i][3] & "' c3='" & $csv_list[$i][1]
        [b]$check= _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1)[/b]
        if $check <> -1 Then
            $csv_list[$i][6] = _find1($check_1)
            $csv_list[$i][7] = _find2($check_1)
        EndIf
        $string2db = $user2db & "|" & $csv_list[$i][0] & "|" & $csv_list[$i][1] & "|" & $csv_list[$i][2] & "|" & $csv_list[$i][3] & "|" & $csv_list[$i][4] & "|" & $csv_list[$i][5] & "|" & $csv_list[$i][6] & "|" & $csv_list[$i][7]
        _ArrayAdd($array2db, $string2db)
    Next
    _FileWriteFromArray($file2ftp, $array2db)
    _status(0, "Uploading now")
    _upload2server($user2db)
EndFunc   ;==>_exp2db
 
Func _cleanfordb($a)
    If StringInStr($a, "^") Then $a= StringReplace($a, "^", ", ") ; ^
    If StringInStr($a, '"') Then $a= StringMid($a, 2, StringLen($a) - 2)
    Return $a
EndFunc   ;==>_cleanfordb
 
Func _upload2server($us)
    Local $server = "myserver.com"
    Local $username = "mylogin"
    Local $password = "mypw"
    $i_Passive = '1'
    $port = '21'
    $open = _FTP_Open("MyFTP")
    $connect = _FTP_Connect($open, $server, $username, $password, $i_Passive, $port)
    If $connect = 0 Then
        _status(1, "Unable to connect")
    Else
        $send = _FTP_FilePut($connect, $file2ftp, "/in/" & $us & "_db.txt")
        If $send = 0 Then
            _status(1, "Error in uploading file")
        Else
            _status(0, "Updated")
        EndIf
    EndIf
    $close = _FTP_Close($open)
    ;   FileDelete($file2ftp)
EndFunc   ;==>_upload2server

and

Func _normalize_db($name)
    If StringInStr($name, "&") Then $name= StringReplace($name, "&", "&#x26;") ; &
    If StringInStr($name, "'") Then $name= StringReplace($name, "'", "&#x27;") ; '
    If StringInStr($name, '"') Then $name= StringReplace($name, '"', "&#x22;") ; "
    Return $name
EndFunc   ;==>_normalize_db
 
Func _find1($cc)
    $p = _StringBetween($mydb[$cc], "c4='", "'")
    If $p[0] = 0 Then $p = _StringBetween($mydb[$cc], "c5='", "'")
    Return $p[0]
EndFunc   ;==> _find1
 
Func _find2($cc)
    $p = _StringBetween($mydb[$cc], "c6='", "'")
    If $p[0] = 0 Then $p = _StringBetween($mydb[$cc], "c7='", "'")
    Return $p[0]
EndFunc   ;==>_find2
Edited by marko001
Link to comment
Share on other sites

Can the array you're searching be sorted without causing problems with the data in it? If you can sort the array without it causing issues with your program, use _ArrayBinarySearch after running _ArraySort on the array, it searches a lot faster.

Also, don't use _ArrayAdd to add something to your array as it uses ReDim which can cause speed issues. If you are looping through 5000 times and adding one line to the array at a time, this can take a considerable amount of time.

Dimension your array to be able to hold the contents of your csv file, and add the lines you need to one at a time by manually adding them yourself and then ReDim the $mydb array after you're done to make it so that it's just the size of the original information, plus the added strings from the csv file.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

Nope, the DB, in xml format cannot be sorted (at least... don't know what index to use to sort it, it stars with

<z:row c0='...

and the first 39 lines are the usual xml header (that's for i start from 39)

the final array "should" be the same dimension of the initial unless some _arraysearch() return 0. You mean to add with a $array[$i] = $var ?

Link to comment
Share on other sites

The $mydb array can be sorted, you're not writing it back to the disk, you're just using it to compare the strings from the .csv file if I'm reading your script right. _ArrayBinarySearch will definitely help speed up some of the process.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

I'm surprised to not see a StringSplit() in a script processing a comma-delimited file. I'd think you could replace all those StringReplace() and StringBetween() statements with something like "$item = StringSplit($csv_array[$i]

, ",") ?

Do you need to be reading in the original database for every line of the csv file ( _FileReadToArray($originalDB, $mydb)) ?

Edit: oops, quoted the worng line of code :graduated: PS - I'd be interested in hearing how much of the 15 minutes is taken up by the upload portion...

Edited by Spiff59
Link to comment
Share on other sites

Here's a way of doing it that should speed up part of the process. I didn't change anything in regards to the reformatting of the input lines, just the searching and adding to the array parts. I don't know what the lines you're generating look like, or what's in the $mydb array so the _ArrayBinarySearch section probably won't work without tweaking to match the data in both areas so you can search on them.

Trust me in this, once you see how fast ArrayBinarySearch is over ArraySearch on a large array, you will figure out a way to get the search to work for you.

Look for the comments in the code for changes.

#include <FTPEx.au3>
$exportfile = @ScriptDir & "\DB\internal.csv"
;~ Dim $array2db[1]
Dim $mydb
Global $file2ftp = @TempDir & "\temp2db.txt"
Func _exp2db($user)
 Local $user2db, $array2db[10000]
 Dim $csv_list[1][8]
 Dim $csv_array
 If $user = "" Then
  $user2db = "Test_User"
 Else
  $user2db = $user
 EndIf
 _status(0, "Converting now .csv")
 _FileReadToArray($exportfile, $csv_array)
 _FileReadToArray($originalDB, $mydb) ; moved out of the loop
 _ArraySort($mydb) ; sort the array
 Local $iCounter = 0
 For $i = 2 To UBound($csv_array) - 1
  If StringInStr($csv_array[$i], ", ,") Then
   $string = StringReplace($csv_array[$i], ", ,", ",@,", 1)
  Else
   $string = $csv_array[$i]
  EndIf
  If StringInStr($string, ", ") Then $string = StringReplace($string, ", ", "^", 1)
  $name_s = _StringBetween($string, "", ",")
  $name = _cleanfordb($name_s[0]) ;name cleaned (remove " and ^ I used before and replace , if exists)
  $string = StringReplace($string, ",", "*", 1) ; comma to get the next item
  $string = StringReplace($string, ",", "#", 1) ; comma to get the next item
  $item2 = _StringBetween($string, "*", "#") ;another item
  $item3 = _StringBetween($string, "#", ",") ;another item
  $string = StringReplace($string, ",", "_", 1) ; comma to get the next item
  $item4 = _StringBetween($string, "_", ",") ;another item
  $item5 = _StringBetween($string, ",", ",") ;another item
  If $item4[0] <> "TRASH" And $item4[0] <> "PC" And $item4[0] <> "PSX" And $item4[0] <> "WII" And $item4[0] <> "DS" Then
   $item5[0] = "TRASH" ;
   $item4[0] = "X"
  EndIf
  If $item4[0] = "TRASH" Then
   $item5[0] = $item4[0]
   $item4[0] = "None"
  EndIf
  $item6 = StringRight($string, 2) ;another item
  If $item6 = "es" Then $item6 = "Yes"
  ReDim $csv_list[$i + 1][8]
  $csv_list[$i][0] = $name
  $csv_list[$i][1] = $item4[0]
  If $item5[0] <> "@" Then $csv_list[$i][2] = $item5[0]
  $csv_list[$i][3] = $item6
  $csv_list[$i][4] = $item2[0]
  $csv_list[$i][5] = $item3[0]
  $string = "<z:row c0='" & _normalize_db($csv_list[$i][0]) & "' c1='" & $csv_list[$i][2] & "' c2='" & $csv_list[$i][3] & "' c3='" & $csv_list[$i][1]
  $check = _ArrayBinarySearch($mydb, StringLeft($string, 100), 39, 0) ; depending on how many characters you need to check against $mydb, change the 100 appropriately
  If $check <> -1 Then
   $csv_list[$i][6] = _find1($check_1)
   $csv_list[$i][7] = _find2($check_1)
  EndIf
  $string2db = $user2db & "|" & $csv_list[$i][0] & "|" & $csv_list[$i][1] & "|" & $csv_list[$i][2] & "|" & $csv_list[$i][3] & "|" & $csv_list[$i][4] & "|" & $csv_list[$i][5] & "|" & $csv_list[$i][6] & "|" & $csv_list[$i][7]
  $array2db[$iCounter] = $string2db ; add string to the array
  $iCounter += 1 ; increment the counter by 1
  If $iCounter = UBound($array2db) - 2 Then ReDim $array2db[UBound($array2db) * 2) ; if the counter = Ubound($array2db) double the size of the array
 Next
 ReDim $array2db[$iCounter]
 _FileWriteFromArray($file2ftp, $array2db)
 _status(0, "Uploading now")
 _upload2server($user2db)
EndFunc   ;==>_exp2db
Func _cleanfordb($a)
 If StringInStr($a, "^") Then $a = StringReplace($a, "^", ", ") ; ^
 If StringInStr($a, '"') Then $a = StringMid($a, 2, StringLen($a) - 2)
 Return $a
EndFunc   ;==>_cleanfordb
Func _upload2server($us)
 Local $server = "myserver.com"
 Local $username = "mylogin"
 Local $password = "mypw"
 $i_Passive = '1'
 $port = '21'
 $open = _FTP_Open("MyFTP")
 $connect = _FTP_Connect($open, $server, $username, $password, $i_Passive, $port)
 If $connect = 0 Then
  _status(1, "Unable to connect")
 Else
  $send = _FTP_FilePut($connect, $file2ftp, "/in/" & $us & "_db.txt")
  If $send = 0 Then
   _status(1, "Error in uploading file")
  Else
   _status(0, "Updated")
  EndIf
 EndIf
 $close = _FTP_Close($open)
 ;   FileDelete($file2ftp)
EndFunc   ;==>_upload2server

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

Mainly this is the thing:

FIle A: is a .csv that contains (real example:)

Line1: Fifa2011,PC,2011,Y,1,2007

LineN: "Zeldá, Legend of", WII,2009,N,2,2008

And this is the .xml (from line 39 and without the last 2 lines that close the .xml)

Line 45: <z:row c0='Fifa2011' C1='2011' C2='PC' C3='Y' c4='35' c5='43' c6='26' c7='33'/>

Line 133: <z:row c0='Zeld&#xE2;, Legend of' C1='2009' C2='Wii' C3='N' c4='0' c5='11' c6='0' c7='23'/>

What I need to have is a file that contains this:

Line1: Fifa2011|PC,2011|Y|1|35|33

...

LineN: Zeldá, Legend of|WII|2009|N|2|11|23 (if c4=0 i get c5 and if c6=0 i get c7 values)

.csv is around 5/6.000 lines and .xml can be around 45.000

Upload is nothing, around 10 secs, that's not the problem.

M.

Edited by marko001
Link to comment
Share on other sites

That's a lot better result than 15 minutes. :graduated: You probably don't need the binary searching in this case.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

P.s. The code works, so no problem if some var/const is missing, probably they have been called by the main .au3 that has all the include needed.

After reading that and then seeing this, Im wonding if $check_1 is a var in the main .au3? if so, ignore the rest this post.

$check= _ArraySearch($mydb, $string, 39, UBound($mydb), 0, 1)
if $check <> -1 Then
       $csv_list[$i][6] = _find1($check_1)
       $csv_list[$i][7] = _find2($check_1)
EndIf

As an alternative to _arraysearch(), a dictionary object (PERL associative array) might work too. They can be very fast.

After you read your originaldb to an array, build the dictionary with the db strings as keys, and the array index as items.

Global $o_mydb = ObjCreate("Scripting.Dictionary")
_FileReadToArray($originalDB, $mydb)
For $i = 1 to $mydb[0]
   If Not $o_$mydb.Exists($mydb[$i]) Then $o_$mydb.Add($mydb[$i], $i)
Next

Then instead of using _arraysearch():

$string = "<z:row c0='" & _normalize_db($csv_list[$i][0]) & "' c1='" & $csv_list[$i][2] & "' c2='" & $csv_list[$i][3] & "' c3='" & $csv_list[$i][1]
If $o_mydb.exists($string) Then
   $csv_list[$i][6] = _find1($o_mydb.item($string))
   $csv_list[$i][7] = _find2($o_mydb.item($string))
EndIf
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...