cmcb Posted August 7, 2011 (edited)

Hi, I'm working on a script to do the following:

1) Query a SQL database for a filename and ID
2) Store the results somewhere
3) Use the filename obtained in 1 and check it exists
4) If the file doesn't exist, take the ID of that file and obtain the corresponding details

The end product will be a GUI, but I'm struggling with basic functionality just now. What's the best way to store the two database table columns (ID and Filename)? A 2D array? How would I then use only the ID or only the filename? If a 2D array is not the way to go, I'm open to suggestions. I think I'm OK with the SQL, using variables etc. Appreciate any help with this. Thanks!

1) Get the files and IDs:

```autoit
#include <File.au3> ; for _FileReadToArray()

$file = @ScriptDir & "\outputfile.txt"
$conn = ObjCreate("ADODB.Connection")
$DSN = "DRIVER={SQL Server};SERVER=myserver;DATABASE=testdb;UID=myuser;PWD=mypass;"
$conn.Open($DSN)

$RS = ObjCreate("ADODB.RecordSet")
$RS.Open("SELECT ID, Filename 'file' FROM Table", $conn)
If $RS.RecordCount Then
    While Not $RS.EOF
        FileWriteLine($file, "\\myserver\myshare\" & $RS.Fields("file").value) ; just testing
        $RS.MoveNext
    WEnd
EndIf
$conn.close

$file_to_read = @ScriptDir & "\outputfile.txt"
$file_to_write = @ScriptDir & "\docs_out.txt"

Dim $aRecords
If Not _FileReadToArray($file_to_read, $aRecords) Then
    MsgBox(4096, "Error", "Nothing found for specified date range!")
    Exit
EndIf
For $x = 1 To $aRecords[0]
    ;MsgBox(0, 'Record:' & $x, $aRecords[$x])
    If Not FileExists($aRecords[$x]) Then
        FileWriteLine($file_to_write, $aRecords[$x])
    Else
        ;FileWriteLine($files_that_exist, $aRecords[$x])
        ; don't do anything...
    EndIf
Next
```
KaFu Posted August 8, 2011

If you store the info in a simple AutoIt 2D array you would always have to loop through all values to find specific ones. I think what you should look for is an "associative array", for which no AutoIt standard exists. However, there are some UDFs for this in the examples forum; also do a search for the 'Scripting.Dictionary' object, which is my personal favourite for such tasks. Create two different associative arrays, one with the ID as the key and one with the filename as the key. This way you can poll the values with either array(id) = file or array(file) = id.

OS: Win10-22H2 - 64bit - German, AutoIt Version: 3.3.16.1, AutoIt Editor: SciTE, Website: https://funk.eu
AMT - Auto-Movie-Thumbnailer (2022-Nov-26) | BIC - Batch-Image-Cropper (2023-Apr-01) | COP - Color Picker (2009-May-21) | DCS - Dynamic Cursor Selector (2024-Feb-16) | HMW - Hide my Windows (2018-Sep-16) | HRC - HotKey Resolution Changer (2012-May-16) | ICU - Icon Configuration Utility (2018-Sep-16) | SMF - Search my Files (2023-Jun-03) - THE file info and duplicates search tool | SSD - Set Sound Device (2017-Sep-16)
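A minimal sketch of the two-dictionary idea described above; the key/value pairs here are made up, and in the real script they would come from the recordset loop:

```autoit
; Two Scripting.Dictionary objects give O(1) lookup in both directions.
Local $oById = ObjCreate("Scripting.Dictionary")   ; ID -> filename
Local $oByFile = ObjCreate("Scripting.Dictionary") ; filename -> ID

; Inside the recordset loop you would add each row to both maps, e.g.:
; $oById.Add($RS.Fields("ID").value, $RS.Fields("file").value)
; $oByFile.Add($RS.Fields("file").value, $RS.Fields("ID").value)
$oById.Add(42, "report.pdf")
$oByFile.Add("report.pdf", 42)

ConsoleWrite($oById.Item(42) & @CRLF)             ; filename for an ID
ConsoleWrite($oByFile.Item("report.pdf") & @CRLF) ; ID for a filename
```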
cmcb Posted August 8, 2011 (author)

Thanks KaFu, that was very helpful. Not had much time today, but I have queried my database and checked file existence. I write my missing files out to a text file along with my array of document IDs. Not sure if I need to write out the document IDs or just keep them in an array.

```autoit
If Not FileExists($oDictionary.Item($vKey)) Then
    $filesFile = FileWriteLine($file_to_write2, $oDictionary.Item($vKey) & @CRLF)
    $bArray = $bArray + $vKey
    $docIDsFile = FileWriteLine($file_to_write1, $bArray)
Else
    ;ConsoleWrite("exists" & $oDictionary.Item($vKey))
EndIf
Next
```

The next bit is more complicated. I need to use the document IDs stored in $bArray as a variable in SQL. The SQL would be something like:

```autoit
$RS.Open("SELECT * FROM TABLE WHERE DOCID IN (" & $bArray & ")", $conn)
```

I'm not sure how to use the array elements for this. Not sure if I need to store them with a comma either, to format them for SQL, i.e. IN (1,2,3)? Hope that makes sense. Appreciate any feedback.
KaFu Posted August 8, 2011 (edited)

By $bArray you mean $string_missing_IDs? Maybe something like this:

```autoit
$string_missing_IDs = ""
If Not ... Then
    $string_missing_IDs &= $vKey & ","
```

And I guess the SQL call should be something like:

```autoit
If $string_missing_IDs Then
    $string_missing_IDs = StringTrimRight($string_missing_IDs, 1) ; to kill the trailing comma
    $RS.Open("SELECT * FROM TABLE WHERE DOCID IN (" & $string_missing_IDs & ")", $conn)
EndIf
```
cmcb Posted August 9, 2011 (author)

Ah... it was the SQL part that was throwing the error. I hadn't handled the commas correctly, so I had a trailing one. The StringTrimRight($string_missing_IDs, 1) sorted it. Fantastic, thanks!
cmcb Posted August 9, 2011 (author)

Below is my code so far. This will:

1) Get a list of documents from the DB
2) Check that they exist
3) Build an array of document IDs
4) Get details of documents that don't exist

My main concern with this is speed. I've only tested for around 200 docs and that takes over 20 seconds. Going by the code below, are there any obvious performance tweaks? For instance, should I just be opening one SQL connection and re-using it, or am I right to open that second connection? Is there a more efficient method of counting? I will be running this on a production environment using date filters; however, I may try a one-off search with no dates, at which point it could be looking at 100,000 files. Thanks again.

```autoit
; Declare variables
Global $vKey, $sItem, $sMsg, $oDictionary
Local $filesFile, $docIDsFile
Dim $string_missing_IDs
Local $missingCount = 0, $totalCount = 0

; Output filenames
;$file_to_write1 = @ScriptDir & "\my_docids.txt"
;$file_to_write2 = @ScriptDir & "\my_filenames.txt"
$docDetails = @ScriptDir & "\my_DocumentDetails.txt"

; Delete files if already present
;FileDelete($file_to_write1)
;FileDelete($file_to_write2)
FileDelete($docDetails)

; Declare and create scripting dictionary
$oDictionary = ObjCreate("Scripting.Dictionary")

; Begin first SQL connection - get documents and IDs
$conn = ObjCreate("ADODB.Connection")
$DSN = "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;UID=testuser;PWD=testpass;"
$conn.Open($DSN)

; Create record set and run query
$RS = ObjCreate("ADODB.RecordSet")
$RS.Open("SELECT DocID 'docid', filename 'file' FROM mytable WHERE service = '1'", $conn)
If $RS.RecordCount Then
    While Not $RS.EOF
        $docid = $RS.Fields("DocID").value
        $totalCount += 1
        $filename = "\\myserver\c$\Interfaces\" & $RS.Fields("file").value
        $RS.MoveNext
        ; Add to dictionary
        $oDictionary.Add($docid, $filename)
    WEnd
EndIf
$conn.close ; end first SQL connection

For $vKey In $oDictionary
    $sItem &= $vKey & ": " & $oDictionary.Item($vKey) & @CRLF
    ; EXISTENCE CHECK - if missing, write to file and build doc ID string for SQL query
    If Not FileExists($oDictionary.Item($vKey)) Then
        ;$filesFile = FileWriteLine($file_to_write2, $oDictionary.Item($vKey) & @CRLF)
        $string_missing_IDs &= $vKey & ","
        $missingCount += 1
    Else
        ;ConsoleWrite("exists" & $oDictionary.Item($vKey))
    EndIf
Next

; Begin second SQL connection - get details for missing documents
$conn2 = ObjCreate("ADODB.Connection")
$DSN2 = "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;UID=testuser;PWD=testpass;"
$RS2 = ObjCreate("ADODB.RecordSet")
$conn2.Open($DSN2)
$string_missing_IDs = StringTrimRight($string_missing_IDs, 1)
$RS2.Open("SELECT docid 'DocID', name 'name', doctitle 'title', docsubject 'subject', docdate 'date' FROM mytable WHERE docid IN (" & $string_missing_IDs & ")", $conn2)
If $RS2.RecordCount Then
    While Not $RS2.EOF
        FileWrite($docDetails, $RS2.Fields("DocID").value & ", " & $RS2.Fields("name").value & ", " & $RS2.Fields("title").value & ", " & $RS2.Fields("subject").value & ", " & $RS2.Fields("date").value & @CRLF)
        $RS2.MoveNext
    WEnd
EndIf
$conn2.close

MsgBox(0, "Summary", "Total: " & $totalCount & @CRLF & "Missing: " & $missingCount)
```
Spiff59 Posted August 9, 2011 (edited)

Why not change the first query to "SELECT * FROM ...", so you'll have all your columns available, and then process your exceptions in one loop? Throw out the second and third loops, get rid of Scripting.Dictionary and the second query.

Edit: Something like:

```autoit
$oADO = ObjCreate("ADODB.Connection")
If Not IsObj($oADO) Then MsgBox(1, "ADO Error", "Unable to connect to database")
$oADO.Open($DSN) ; open the connection before querying
$oRS = ObjCreate("ADODB.RecordSet")
$oRS.Open("SELECT * FROM mytable WHERE service = '1'", $oADO)
While Not $oRS.EOF
    $totalCount += 1
    $filename = $oRS("filename").value
    If Not FileExists("\\myserver\c$\Interfaces\" & $filename) Then
        FileWriteLine($docDetails, $oRS("DocID").value & ", " & $filename & ", " & $oRS("title").value & ", " & $oRS("subject").value & ", " & $oRS("date").value)
    EndIf
    $oRS.MoveNext
WEnd
$oRS.close
$oADO.close
```
cmcb Posted August 9, 2011 (author)

Thanks for your suggestion, Spiff59. I see your point here, but I just checked and for a particular service I have nearly 2.5 million documents! Looks like I missed a 0 from my initial estimate. SELECT * FROM table WHERE service = '1' would return all the documents as well as their details, which I would not necessarily need. That's why I'm going back to the database to get details for only the missing documents. If anything, would this not slow me down?
Spiff59 Posted August 9, 2011

I don't know how many total columns are in your table; if most are used in the $docDetails report then I'd think "SELECT *" is fine. If there are a bunch of fields in the table that you're not using, then specifying just the ones you want in the SELECT would probably be a better idea. I do know that queries that use the "IN" verb are notoriously slow. It doesn't seem a lot of code to play with; maybe try running a benchmark both ways?
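A rough way to time the two approaches with AutoIt's built-in timer functions; the two function names here are placeholders for the IN-based and SELECT-*-based code paths from earlier in the thread:

```autoit
; Hypothetical wrappers: _RunQueryWithIN() would hold the two-query version,
; _RunSelectAll() the single "SELECT *" version suggested above.
Local $hTimer = TimerInit()
_RunQueryWithIN()
ConsoleWrite("IN-query version: " & Round(TimerDiff($hTimer) / 1000, 2) & " s" & @CRLF)

$hTimer = TimerInit()
_RunSelectAll()
ConsoleWrite("Select-all version: " & Round(TimerDiff($hTimer) / 1000, 2) & " s" & @CRLF)
```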
cmcb Posted August 9, 2011 (author)

Interesting, Spiff59. I removed the second query and grabbed the document details at the same time as the ID/filename. The original app took 16 seconds; the second attempt took 7 seconds. I'll try and get a test server/database set up and see how this runs against a very large amount of data.
cmcb Posted August 10, 2011 (author, edited)

I've tested this against a fair amount of files (40,000). Took 7 minutes. The SQL part only takes 7 seconds via Management Studio. I then tried against 160,000 files; SQL = 29 seconds. My SQL execution plan suggests the query is as optimised as possible, using indexes and so on. I think the part taking the longest is either checking existence or writing details to file. On a live system there will be few (if any) files missing, so this might be faster. Perhaps there's a quicker means of either 1) the existence check or 2) the file writing that I could consider. I'm keen to optimise if possible. Any advice or suggestions appreciated. Thanks! Here's my code (I just call the function from a GUI):

```autoit
Func myfunc()
    Global $missingCount = 0, $totalCount = 0
    Global $startDate, $endDate

    $docDetails = @ScriptDir & "\DocumentDetails.txt"
    $Summary = @ScriptDir & "\Summary.txt"
    FileDelete($docDetails)
    FileDelete($Summary)

    ; Begin SQL connection - get documents and IDs
    $conn = ObjCreate("ADODB.Connection")
    $DSN = "DRIVER={SQL Server};SERVER=myserver;DATABASE=mydb;UID=testuser;PWD=testpass;"
    $conn.Open($DSN)

    ; Create record set and run query
    $RS = ObjCreate("ADODB.RecordSet")
    $RS.Open("SELECT DocID 'docid', filename 'file', title 'title', subject 'subject', date 'date', name 'name' FROM mytable WHERE service = '1'", $conn)
    If $RS.RecordCount Then
        While Not $RS.EOF
            $totalCount += 1
            $filename = "\\myserver\c$\Interfaces\" & $RS.Fields("file").value
            If Not FileExists($filename) Then
                $missingCount += 1
                FileWrite($docDetails, $RS.Fields("DocID").value & ", " & $RS.Fields("name").value & ", " & $RS.Fields("title").value & ", " & $RS.Fields("subject").value & ", " & $RS.Fields("date").value & @CRLF)
            EndIf
            $RS.MoveNext
        WEnd
    EndIf
    $conn.close

    FileWrite($Summary, "Total: " & $totalCount & @CRLF & "Missing: " & $missingCount)
EndFunc
```
KaFu Posted August 10, 2011

Maybe this is a little faster?

```autoit
Local $sBuffer_docDetails
While Not $RS.EOF
    $totalCount += 1
    If Not FileExists("\\myserver\c$\Interfaces\" & $RS.Fields("file").value) Then
        $missingCount += 1
        $sBuffer_docDetails &= $RS.Fields("DocID").value & ", " & $RS.Fields("name").value & ", " & $RS.Fields("title").value & ", " & $RS.Fields("subject").value & ", " & $RS.Fields("date").value & @CRLF
    EndIf
    $RS.MoveNext
WEnd
FileWrite($docDetails, $sBuffer_docDetails)
```
MvGulik Posted August 10, 2011

Not speed related, but you might like to use* "Static Local" for those "Global ..." vars (if you're not using those vars in other parts of your code).

*and test, as Static still has experimental status. (Working pretty well, though.)

"Straight_and_Crooked_Thinking": A "classic guide to ferreting out untruths, half-truths, and other distortions of facts in political and social discussions." | "The Secrets of Quantum Physics": New and excellent 2-part documentary on Quantum Physics by Jim Al-Khalili. (Dec 2014) | "Believing what you know ain't so" ... Knock Knock ...
Spiff59 Posted August 10, 2011 (edited)

Argh! KaFu beat me to it! But, on another note... your FileWrite was opening, appending to, and closing the file each time you added a line. Not good practice. As shown in the help file, you can pass FileWrite() and most I/O routines either a filename or a file handle. For multiple actions against a file, using the handle method (without all the automatic opens and closes) is much faster. You should open the file once up-front, getting a file handle, then use that handle in the read/write functions, then at the end close the file (again by referencing the handle). Alternately, it might be quicker to just build a huge string ($Str &= ...) in the loop and dump it to a file all at once at the end.

PS - There may be sneaky things you can do to save time in the FileExists() area. Are all the files being processed in the one "\\myserver\c$\Interfaces\" folder, with nothing in subfolders? What are the formats of the filenames being processed?
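A minimal sketch of the handle-based approach described above (the output filename is illustrative):

```autoit
; Open once, write many times through the handle, close once.
Local $hFile = FileOpen(@ScriptDir & "\DocumentDetails.txt", 2) ; mode 2 = write (erase previous contents)
If $hFile = -1 Then Exit MsgBox(16, "Error", "Unable to open output file")
For $i = 1 To 1000
    FileWriteLine($hFile, "line " & $i) ; no per-call open/close overhead
Next
FileClose($hFile)
```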
cmcb Posted August 10, 2011 (author)

Ahh, I get ya. I understand the file write suggestion; makes perfect sense, so I'll get that implemented ASAP. I ran the app on my file server and performance was much improved using a local filepath instead of UNC. The files are stored under year -> month -> day subfolders. Perhaps that rules out your next suggestion? :-0 Thanks again for the input, folks. :-)
cmcb Posted August 11, 2011 (author)

I declare the counts as globals here, then use them in my main GUI file. Thanks.
Spiff59 Posted August 11, 2011 (edited)

There is a sneaky way to gain a lot of speed by discarding all of the FileExists() calls. It's playing dirty, and wouldn't be allowed in a shop that enforced standards, but it does work, and is very fast. An adaptation of a Yashied trick:

```autoit
#include <File.au3>

Global $FileList, $SearchList[101] = [100]

For $x = 1 To 5000 ; Create test files
    $str = StringRight("0000" & $x, 4)
    FileWrite("test" & $str & ".log", "")
Next
$FileList = _FileListToArray(@ScriptDir, "test*.log") ; Load files into array
For $x = 1 To UBound($SearchList) - 1 ; Create search list
    $rnd = Random(1, $FileList[0], 1)
    $SearchList[$x] = $FileList[$rnd]
Next
$SearchList[25] = "xyzzy" ; insert bad apples
$SearchList[50] = "plugh"
$SearchList[75] = "plover"

$result1 = Traditional() ; Using FileExists()
$result2 = Sneaky() ; Using Assign()/IsDeclared()
MsgBox(1, "5000 files, 500000 searches", $result1 & @CRLF & $result2)
FileDelete("test*.log") ; delete test files
Exit

;===================================================================================================
Func Traditional()
    $timer = TimerInit()
    For $x = 1 To 5000
        $cnt = 0
        For $y = 1 To $SearchList[0]
            ;If Not FileExists($SearchList[$y]) Then ConsoleWrite("T > $SearchList[" & $y & "]: " & $SearchList[$y] & " not found" & @CRLF)
            If Not FileExists($SearchList[$y]) Then $cnt += 1
        Next
    Next
    Return "T > " & $cnt & " files missing, found in " & Round(TimerDiff($timer) / 1000, 2) & " seconds"
EndFunc

;===================================================================================================
Func Sneaky()
    $timer = TimerInit()
    $FileList = _FileListToArray(@ScriptDir, "test*.log") ; Load files into array (again, to make benchmark more fair)
    For $x = 1 To $FileList[0]
        Assign($FileList[$x], 0, 1) ; Create variables from array
    Next
    For $x = 1 To 5000
        $cnt = 0
        For $y = 1 To $SearchList[0] ; search for missing files
            ;If Not IsDeclared($SearchList[$y]) Then ConsoleWrite("S > $SearchList[" & $y & "]: " & $SearchList[$y] & " not found" & @CRLF)
            If Not IsDeclared($SearchList[$y]) Then $cnt += 1
        Next
    Next
    Return "S > " & $cnt & " files missing, found in " & Round(TimerDiff($timer) / 1000, 2) & " seconds"
EndFunc
```

If not for (disk) caching I'm betting the FileExists() method would be miserably slower, but still, a 350% performance gain ain't too bad! Besides being "dirty" code, it does have the requirement that the filenames be fairly standard, as filenames containing certain special characters will blow the Assign() statement out of the water. But if you are dealing with a million searches, and your filenames are vanilla, then I might consider throwing standards out the window.

Edit: From another test, it appears the performance gap widens considerably (in favor of the "cheating" method) as the number of files and searches increases.
MvGulik Posted August 11, 2011

Don't think it's sneaky/cheating. If the work situation allows for it, it's just a potentially fast alternative. ComSpec with 'dir /b' comes to mind as another one that might shave some points off the time, with the (assumed) benefit over AutoIt that any consecutive dir calls for the same location (if needed) are lightning fast due to Windows cache use. Plus, if the dir-call return data is used as-is (a string block), no array use is needed either.
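A sketch of the ComSpec 'dir /b' idea; the server path is taken from earlier in the thread and the filename being checked is made up:

```autoit
; Pull the whole directory listing in one child process and keep it as a
; string block, as suggested above - no array needed.
Local $iPid = Run(@ComSpec & ' /c dir /b "\\myserver\c$\Interfaces"', "", @SW_HIDE, 0x2) ; 0x2 = $STDOUT_CHILD
ProcessWaitClose($iPid)
Local $sListing = @CRLF & StdoutRead($iPid) & @CRLF

; Existence then becomes a string search instead of one FileExists() per file.
; Wrapping the name in @CRLF avoids matching a substring of a longer name.
If StringInStr($sListing, @CRLF & "somefile.pdf" & @CRLF) Then
    ConsoleWrite("somefile.pdf exists" & @CRLF)
EndIf
```

Note this only covers a single folder per dir call; with the year/month/day subfolder layout mentioned earlier, a recursive listing (dir /b /s) or one call per folder would be needed.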