rudi, posted February 1, 2008

Hello,

I have an array of ~20k file names to be processed (only certain files are sorted into specific folders), which I read into an array using <_FileListToArrayFaster1e.au3>. Before starting the sort process I need to remove all entries from the array that do not need to be sorted.

#include <_FileListToArrayFaster1e.au3>
#include <array.au3>

Const $sPath = "C:\DropSourceFilesHere"
; read file names only, recursively: $sFilter = "*", $iFlag = 1 (files only), $iRecurse = 1, $iBaseDir = 1, $sExclude = "", $i_deleteduplicate = 0
$MyArray = _FileListToArray3($sPath, "*", 1, 1, 1, "", 0)
ConsoleWrite($MyArray[0] & " filenames have to be processed" & @LF)
CleanupArray($MyArray)

Func CleanupArray(ByRef $MyArray)
    Local $Clean
    ; Walk backwards: going upwards would make it necessary to check the same index $Clean again after each deletion.
    For $Clean = $MyArray[0] To 1 Step -1
        If Not CheckFile($MyArray[$Clean]) Then
            _ArrayDelete($MyArray, $Clean) ; this file is not going to be sorted later -> remove it from $MyArray
        EndIf
    Next
    $MyArray[0] = UBound($MyArray) - 1 ; set the element count correctly again after cleanup
EndFunc   ;==>CleanupArray

Func CheckFile($FullPath)
    Local $OK = True
    If stringinstring($FullPath) Then $OK = False
    ; ... several other tests done here.
    Return $OK
EndFunc   ;==>CheckFile

Maybe creating a separate array and copying over the valid values would be the better approach?

Regards, Rudi.

Earth is flat, pigs can fly, and Nuclear Power is SAFE!
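A minimal sketch of the single-pass idea, done in place instead of with a second array: a write index moves each kept entry down, and one ReDim at the end trims the unused tail, so there is no per-deletion shifting. It assumes the count-in-element-0 layout used above; _CompactArray and the filter _IsWanted are hypothetical names, and Call() is used so that any one-argument test function (such as CheckFile) can be plugged in.

```autoit
; Sketch: keep only entries for which the named test function returns True.
; Assumes $aList[0] = count and items in $aList[1..count].
Func _CompactArray(ByRef $aList, $sTestFunc)
    Local $iWrite = 0
    For $iRead = 1 To $aList[0]
        If Call($sTestFunc, $aList[$iRead]) Then
            $iWrite += 1
            $aList[$iWrite] = $aList[$iRead] ; move the kept item down, order is preserved
        EndIf
    Next
    ReDim $aList[$iWrite + 1] ; one resize at the end instead of one per deletion
    $aList[0] = $iWrite
    Return $iWrite
EndFunc   ;==>_CompactArray

; Hypothetical filter: reject anything located in a \Temp\ folder.
Func _IsWanted($sPath)
    Return Not StringInStr($sPath, "\Temp\")
EndFunc   ;==>_IsWanted

; Usage example
Local $aFiles = StringSplit("C:\a.txt|C:\Temp\b.txt|C:\c.doc", "|")
_CompactArray($aFiles, "_IsWanted")
ConsoleWrite($aFiles[0] & " entries kept" & @CRLF)
```

With rudi's code, the call would be `_CompactArray($MyArray, "CheckFile")`.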
The Kandie Man, posted February 1, 2008 (edited)

> Maybe creating a separate array and copying over valid values would be the better approach?

Correct.

Untested string method:

Func CleanupArray(ByRef $MyArray)
    Local $Clean, $sTemp = ""
    For $Clean = 1 To $MyArray[0] Step 1
        If CheckFile($MyArray[$Clean]) Then
            $sTemp &= $MyArray[$Clean] & "|"
        EndIf
    Next
    $MyArray = StringSplit(StringTrimRight($sTemp, 1), "|") ; StringTrimRight removes the last pipe delimiter
EndFunc   ;==>CleanupArray

Array method:

Func CleanupArray(ByRef $MyArray)
    Local $Clean, $asTemp[1]
    For $Clean = 1 To $MyArray[0] Step 1
        If CheckFile($MyArray[$Clean]) Then
            ReDim $asTemp[UBound($asTemp) + 1]
            $asTemp[UBound($asTemp) - 1] = $MyArray[$Clean]
        EndIf
    Next
    $asTemp[0] = UBound($asTemp) - 1
    $MyArray = $asTemp
EndFunc   ;==>CleanupArray

The array method will probably be faster.

- The Kandie Man ;-)

"So man has sown the wind and reaped the world. Perhaps in the next few hours there will be no remembrance of the past and no hope for the future that might have been." & _
"All the works of man will be consumed in the great fire after which he was created." & _
"And if there is a future for man, insensitive as he is, proud and defiant in his pursuit of power, let him resolve to live it lovingly, for he knows well how to do so." & _
"Then he may say once more, 'Truly the light is sweet, and what a pleasant thing it is for the eyes to see the sun.'" - The Day the Earth Caught Fire
rudi, posted February 1, 2008 (edited)

Thanks for your reply. As the string would become extremely long this way, I tried it a different way and came across a surprising behavior of _FileWriteFromArray(). When you execute the following code, you will see that the created TXT file has a leading blank line where none should be:

#include <file.au3>
#include <array.au3>

$MyArray = StringSplit("1,2,3,4,5,6,7,8,9,10,11,12", ",")
Dim $TempArray[$MyArray[0] + 1]
Dim $t = 1
_ArrayDisplay($MyArray)
For $i = 1 To $MyArray[0]
    If Valid($MyArray[$i]) Then
        $TempArray[$t] = $MyArray[$i]
        $t = $t + 1
    EndIf
Next
$TempArray[0] = $t - 1 ; number of valid entries actually stored (not the allocated size)
_FileWriteFromArray("C:\Foo-Bar.txt", $TempArray, 1, $t - 1)
RunWait("notepad C:\Foo-Bar.txt")
_FileReadToArray("C:\Foo-Bar.txt", $MyArray)
_ArrayDisplay($MyArray)

Func Valid($VString)
    If $VString = "5" Then Return False
    If $VString = "7" Then Return False
    Return True
EndFunc   ;==>Valid

So it looks to me as if _FileWriteFromArray() generally puts a leading blank line into its output file. Or what am I missing?

Regards, Rudi.
Siao, posted February 1, 2008 (edited)

> So it looks to me as if _FileWriteFromArray() is generally putting a "leading blank line" into its OutFile, or what do I miss?

It's a bug, which has been reported many times and is fixed in the latest beta.

Anyway, there's no reason to use _FileWriteFromArray at all, because it's pretty inefficient (calling FileWrite that often is a really bad design idea), and anyone past "total newbie" level should be able to write a simple loop that does the same without trouble.

"be smart, drink your wine"
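A minimal sketch of such a loop, assuming the count-in-element-0 layout and Windows line endings: the lines are joined in memory and written with a single FileWrite call, so there is neither a per-element write nor a leading blank line. _WriteArrayOnce is a hypothetical helper name, not part of File.au3.

```autoit
; Sketch: write $aList[$iStart..$iEnd] to $sFile with one FileWrite call.
; _WriteArrayOnce is a hypothetical helper, not a standard UDF.
Func _WriteArrayOnce($sFile, ByRef $aList, $iStart = 1, $iEnd = -1)
    If $iEnd < 0 Then $iEnd = $aList[0] ; default: up to the count stored in [0]
    Local $sOut = ""
    For $i = $iStart To $iEnd
        $sOut &= $aList[$i] & @CRLF ; build the whole text in memory first
    Next
    $sOut = StringTrimRight($sOut, 2) ; drop the line break after the last item
    Local $hFile = FileOpen($sFile, 2) ; 2 = write mode, erase previous contents
    If $hFile = -1 Then Return SetError(1, 0, 0)
    FileWrite($hFile, $sOut) ; a single write, no leading blank line
    FileClose($hFile)
    Return 1
EndFunc   ;==>_WriteArrayOnce
```

For rudi's test script, `_WriteArrayOnce("C:\Foo-Bar.txt", $TempArray, 1, $t - 1)` would replace the _FileWriteFromArray call.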
Uten, posted February 1, 2008

I would use a linked-list approach. It should be considerably faster (10-1000 times) than the code you have provided. It would go something like this:

1: Use a two-dimensional array.
2: One field for the string and one for an identifier.
3: The first entry, array[0][0], holds the starting point of the linked list, and array[0][1] is the starting point of a linked list of free slots.
4: The last item, array[UBound(array)-1][0], could also be given a special meaning: last item in list, number of items in list, or similar.
5: Each entry array[n][0] is the index of the next item in the list (this applies whether we are pointing at the data or at the free slots).
6: Make functions for listadd, listremove, initlist and so on. Those functions just alter array[n][0] and array[0][1].

I don't have the time to provide the code at the moment. Maybe someone else has the time, or you could find samples in the example scripts forum; I'm sure I have seen samples there.

Best of luck.

Please keep your sig. small! Use the help file. Search the forum. Then ask unresolved questions :)
Script plugin demo, Simple Trace udf, TrayMenuEx udf, IOChatter demo, freebasic multithreaded dll sample, PostMessage, Aspell, Code profiling
ame1011, posted February 1, 2008

I would recommend altering the code that creates the array in the first place, so that the array it creates contains only files that match your criteria.

I always thought dogs laid eggs, and I learned something today.
rudi, posted February 1, 2008

> It's a bug, which has been noted many times, and fixed in the latest beta.

Ah, good to know, so it will be fixed in the next production release too. I'll try to keep that in mind. Currently I call _ArrayDelete($MyArray, 1) after reading the file back in, to get rid of that blank line.

> Anyway, there's no reason to use _FileWriteFromArray anyway, because it's pretty ineffective (calling FileWrite that often is a really bad design idea), and anyone past "total newbie" level should be able to write a simple loop to do that without trouble anyway.

Why do you write "calling ... that often"? I just use one write and one read to get rid of the empty array slots at its end. That was much faster than multiple _ArrayDelete calls. Well, it might be even faster to ReDim $MyArray to the required number of values and then copy these over from $TempArray.

Thanks, Rudi.
rudi, posted February 1, 2008

> I would recommend altering this code that you got that creates the array in the first place so that the first array created contains only files that match your criteria.

How? Currently I recursively read in all files starting from a certain directory (some 19,000 files). Then I pick out from this list the files I will need to touch later on.

Regards, Rudi.
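One way to do what ame1011 suggests, sketched under assumptions: walk the folder tree yourself with FileFindFirstFile/FileFindNextFile and apply the test while collecting, so rejected files never enter the array and no cleanup pass is needed. _CollectWanted, __CollectRecurse and the filter passed by name are hypothetical; the result uses the same count-in-element-0 layout as StringSplit, and "|" is safe as a separator because Windows file names cannot contain it.

```autoit
; Sketch: recursively collect only the files accepted by the named test function.
Func _CollectWanted($sDir, $sTestFunc)
    Local $sAcc = ""
    __CollectRecurse($sDir, $sTestFunc, $sAcc)
    If $sAcc = "" Then
        Local $aEmpty[1] = [0] ; nothing matched: count only
        Return $aEmpty
    EndIf
    Return StringSplit(StringTrimRight($sAcc, 1), "|")
EndFunc   ;==>_CollectWanted

Func __CollectRecurse($sDir, $sTestFunc, ByRef $sAcc)
    Local $hSearch = FileFindFirstFile($sDir & "\*")
    If $hSearch = -1 Then Return ; empty folder or no access
    Local $sName, $sFull
    While 1
        $sName = FileFindNextFile($hSearch)
        If @error Then ExitLoop ; no more entries
        If $sName = "." Or $sName = ".." Then ContinueLoop ; defensive guard
        $sFull = $sDir & "\" & $sName
        If StringInStr(FileGetAttrib($sFull), "D") Then
            __CollectRecurse($sFull, $sTestFunc, $sAcc) ; descend into the subfolder
        ElseIf Call($sTestFunc, $sFull) Then
            $sAcc &= $sFull & "|" ; keep accepted files only
        EndIf
    WEnd
    FileClose($hSearch)
EndFunc   ;==>__CollectRecurse
```

With rudi's names this would be called as `$MyArray = _CollectWanted($sPath, "CheckFile")`.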
rudi, posted February 1, 2008

> I would use a linked-list approach. It should be considerably faster (10-1000 times) than the code you have provided. It would go something like this:
> 1: Use a two-dimensional array.
> 2: One field for the string and one for an identifier.
> 3: The first entry, array[0][0], holds the starting point of the linked list, and array[0][1] is the starting point of a linked list of free slots.
> 4: The last item, array[UBound(array)-1][0], could also be given a special meaning: last item in list, number of items in list, or similar.
> 5: Each entry array[n][0] is the index of the next item in the list (this applies whether we are pointing at the data or at the free slots).
> 6: Make functions for listadd, listremove, initlist and so on. Those functions just alter array[n][0] and array[0][1].
> I don't have the time to provide the code at the moment. Maybe someone else has the time, or you could find samples in the example scripts forum; I'm sure I have seen samples there.
> Best of luck.

Interesting advice. But why should building a linked list be so much faster than searching $MyArray for valid values and copying these over to a second array, my $TempArray? In that case it's just one write to the second array for every valid entry in the first one. With a linked list it's two: the pointer from the previously found valid entry to the newly found one, and [0][1] pointing to the newly found one. Probably I misunderstood something... Is this what you meant?

$MyArray[19001][2]
; fill in the data...
    0: 19000  1
    1: fault  0
    2: valid  0
    3: fault  0
    4: fault  0
    5: valid  0
    6: fault  0
    7: valid  0
    8: valid  0
    9: fault  0
  ...
18995: fault  0
18996: valid  0
18997: fault  0
18998: valid  0
18999: fault  0
19000: fault  0

; would be after doing the linked-list processing:
    0: 2      18998
    1: fault  0
    2: valid  5
    3: fault  0
    4: fault  0
    5: valid  7
    6: fault  0
    7: valid  8
    8: valid  <next valid>
    9: fault  0
  ...
18995: fault  0
18996: valid  18998
18997: fault  0
18998: valid  <EOV>
18999: fault  0
19000: fault  0

Thanks, Rudi.
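A sketch of the layout drawn above, assuming column 0 holds the file name, column 1 the index of the next valid row (0 = end of chain), and row 0 holds head and tail. It also bears out rudi's doubt: building the chain costs one link write per valid entry, much like copying to a second array. A linked list mainly saves work when items are later removed one at a time, because unlinking changes one pointer instead of shifting the rest of the array. _LinkValid and _WalkValid are hypothetical names.

```autoit
; Sketch: chain all valid rows of a 2D array in one pass.
; Column 0 = file name, column 1 = next valid row (0 = end of chain).
; Row 0: [0][0] = first valid row (head), [0][1] = last valid row (tail).
Func _LinkValid(ByRef $aList, $sTestFunc)
    Local $iPrev = 0
    $aList[0][0] = 0 ; head (overwrites the original count)
    $aList[0][1] = 0 ; tail
    For $i = 1 To UBound($aList) - 1
        $aList[$i][1] = 0
        If Call($sTestFunc, $aList[$i][0]) Then
            If $iPrev = 0 Then
                $aList[0][0] = $i ; first valid row becomes the head
            Else
                $aList[$iPrev][1] = $i ; previous valid row points here
            EndIf
            $iPrev = $i
        EndIf
    Next
    $aList[0][1] = $iPrev ; last valid row (0 if none)
EndFunc   ;==>_LinkValid

; Follow the chain from the head and return the valid names joined by "|".
Func _WalkValid(ByRef $aList)
    Local $sOut = ""
    Local $i = $aList[0][0]
    While $i <> 0
        $sOut &= $aList[$i][0] & "|"
        $i = $aList[$i][1]
    WEnd
    Return StringTrimRight($sOut, 1)
EndFunc   ;==>_WalkValid
```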
Siao, posted February 1, 2008

> Why do you write "calling ... that often"? I just use one write and one read to get rid of the empty array parts at its end. It was much faster than multiple ArrayDelete calls?

I thought I made it pretty clear I was talking about _FileWriteFromArray. It calls FileWrite for each array element, which is bad programming no matter how you look at it, and it makes the function pretty damn slow for big (or even decently sized) arrays. That could easily be avoided by concatenating the string in memory and writing to the output file just once (or in big chunks, if it has to handle really huge arrays).
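The "big chunks" variant could look like this minimal sketch: the buffer is flushed every $iChunk items, so memory use stays bounded while the number of FileWrite calls is still only about count / $iChunk. The helper name and the default chunk size of 1000 are arbitrary assumptions, not values from the thread.

```autoit
; Sketch: write $aList[1..$aList[0]] to $sFile, flushing every $iChunk items.
; _WriteArrayChunked and the default chunk size are hypothetical choices.
Func _WriteArrayChunked($sFile, ByRef $aList, $iChunk = 1000)
    Local $hFile = FileOpen($sFile, 2) ; 2 = write mode, erase previous contents
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $sBuf = ""
    For $i = 1 To $aList[0]
        $sBuf &= $aList[$i]
        If $i < $aList[0] Then $sBuf &= @CRLF ; no line break after the last item
        If Mod($i, $iChunk) = 0 Then ; buffer holds $iChunk items: flush it
            FileWrite($hFile, $sBuf)
            $sBuf = ""
        EndIf
    Next
    If $sBuf <> "" Then FileWrite($hFile, $sBuf) ; write the remainder
    FileClose($hFile)
    Return 1
EndFunc   ;==>_WriteArrayChunked
```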
The Kandie Man, posted February 1, 2008

Did you look at my methods? This one was the winner:

Func CleanupArray(ByRef $MyArray)
    Local $Clean, $sTemp = ""
    For $Clean = 1 To $MyArray[0] Step 1
        If CheckFile($MyArray[$Clean]) Then
            $sTemp &= $MyArray[$Clean] & "|"
        EndIf
    Next
    $MyArray = StringSplit(StringTrimRight($sTemp, 1), "|") ; StringTrimRight removes the last pipe delimiter
EndFunc   ;==>CleanupArray

It sorted through an array of 85,810 file path elements (an entire hard drive of mine) in 1-3 seconds. With arrays only several thousand elements in size, it sorted through them and removed elements in less than a second.

- The Kandie Man ;-)
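One edge case worth guarding in the string method: if no entry passes the test, StringSplit("", "|") does not return an empty list. Because no delimiter is found, it returns a one-element list ([0] = 1, [1] = "") and sets @error. A minimal sketch with a guard; CleanupArraySafe is a hypothetical name, and the test function is passed by name via Call().

```autoit
; Sketch: the string method plus a guard for the "nothing kept" case.
Func CleanupArraySafe(ByRef $MyArray, $sTestFunc)
    Local $sTemp = ""
    For $i = 1 To $MyArray[0]
        If Call($sTestFunc, $MyArray[$i]) Then $sTemp &= $MyArray[$i] & "|"
    Next
    If $sTemp = "" Then
        Local $aEmpty[1] = [0] ; nothing kept: count only, no bogus empty entry
        $MyArray = $aEmpty
    Else
        $MyArray = StringSplit(StringTrimRight($sTemp, 1), "|") ; drop the last "|"
    EndIf
EndFunc   ;==>CleanupArraySafe
```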
Uten, posted February 2, 2008

@rudi, it could be that I misunderstood the task. The reason the other approaches are faster is that each time you call _ArrayDelete, a new array is created that replaces the one you passed to _ArrayDelete (just take a look at the source in the include file). Creating arrays is expensive in every language I know of. The suggested string approach is probably fast enough if you have the resources (memory and CPU). I tend to use low-end computers where memory is an issue; therefore I suggested a linked-list-like approach.

Happy scripting
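Instead of reasoning about what each version of _ArrayDelete copies, the two approaches can be timed on your own machine and AutoIt version. A rough harness under stated assumptions: a synthetic list "file1".."fileN" and a hypothetical filter that keeps every third entry; each function prints the elapsed milliseconds from TimerInit/TimerDiff and returns the kept count.

```autoit
#include <Array.au3>

; Build a synthetic list with the count in element 0.
Func _MakeList($iN)
    Local $a[$iN + 1]
    $a[0] = $iN
    For $i = 1 To $iN
        $a[$i] = "file" & $i
    Next
    Return $a
EndFunc   ;==>_MakeList

Func _Keep($iIndex) ; hypothetical filter: keep every third entry
    Return Mod($iIndex, 3) = 0
EndFunc   ;==>_Keep

; Reverse loop with _ArrayDelete, as in the first post.
Func _TimeDelete($iN)
    Local $a = _MakeList($iN)
    Local $hTimer = TimerInit()
    For $i = $a[0] To 1 Step -1
        If Not _Keep($i) Then _ArrayDelete($a, $i) ; lower indices are unaffected
    Next
    $a[0] = UBound($a) - 1
    ConsoleWrite("_ArrayDelete loop: " & Round(TimerDiff($hTimer)) & " ms, kept " & $a[0] & @CRLF)
    Return $a[0]
EndFunc   ;==>_TimeDelete

; Single pass into a preallocated second array, trimmed once with ReDim.
Func _TimeCopy($iN)
    Local $a = _MakeList($iN)
    Local $hTimer = TimerInit()
    Local $b[$a[0] + 1], $t = 0
    For $i = 1 To $a[0]
        If _Keep($i) Then
            $t += 1
            $b[$t] = $a[$i]
        EndIf
    Next
    ReDim $b[$t + 1]
    $b[0] = $t
    ConsoleWrite("single-pass copy: " & Round(TimerDiff($hTimer)) & " ms, kept " & $b[0] & @CRLF)
    Return $b[0]
EndFunc   ;==>_TimeCopy
```

Calling `_TimeDelete(19000)` and `_TimeCopy(19000)` roughly mirrors rudi's list size.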
randallc, posted February 2, 2008

> Func CheckFile($FullPath)
>     Local $OK = True
>     If stringinstring($FullPath) Then $OK = False
>     ; ... several other tests done here.
>     Return $OK
> EndFunc   ;==>CheckFile
> Regards, Rudi.

Hi,
I agree with the above. Your "CheckFile" is confusing, though!
1. You seem to be eliminating all the array items, because they will all have "$FullPath" in them, won't they? And if so, you are deleting them?
2. You can use multiple filters in the filter parameter, although I should add the latest update to post #1 of the thread; I will do that soon.
3. Although it is not widely tested, the "Exclude" criterion should work well too, I think also with multiple filters; in any case, I would be interested in your feedback.

Best, Randall

ExcelCOM... AccessCom.. Word2... FileListToArrayNew...SearchMiner... Regexps...SQL...Explorer...Array2D.. _GUIListView...array problem...APITailRW
randallc, posted February 2, 2008

> @rudi, it could be that I misunderstood the task. The reason the other approaches are faster is that each time you call _ArrayDelete, a new array is created that replaces the one you passed to _ArrayDelete (just take a look at the source in the include file). Creating arrays is expensive in every language I know of. The suggested string approach is probably fast enough if you have the resources (memory and CPU). I tend to use low-end computers where memory is an issue; therefore I suggested a linked-list-like approach.
> Happy scripting

Hi,
You may be right about that, but _ArrayDelete is now fully ByRef in 3.2.11.0:

Func _ArrayDelete(ByRef $avArray, $iElement)
    If Not IsArray($avArray) Then Return SetError(1, 0, 0)
    Local $iUBound = UBound($avArray, 1) - 1
    If Not $iUBound Then
        $avArray = ""
        Return 0
    EndIf
    ; Bounds checking
    If $iElement < 0 Then $iElement = 0
    If $iElement > $iUBound Then $iElement = $iUBound
    ; Move items after $iElement up by 1
    Switch UBound($avArray, 0)
        Case 1
            For $i = $iElement To $iUBound - 1
                $avArray[$i] = $avArray[$i + 1]
            Next
            ReDim $avArray[$iUBound]
        Case 2
            Local $iSubMax = UBound($avArray, 2) - 1
            For $i = $iElement To $iUBound - 1
                For $j = 0 To $iSubMax
                    $avArray[$i][$j] = $avArray[$i + 1][$j]
                Next
            Next
            ReDim $avArray[$iUBound][$iSubMax + 1]
        Case Else
            Return SetError(3, 0, 0)
    EndSwitch
    Return $iUBound
EndFunc   ;==>_ArrayDelete

Best, Randall
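A rough count of the shifting this _ArrayDelete does, assuming the reverse loop from the first post: when the loop deletes original index i, every element still behind it is a kept one (the deleted ones further back are already gone), so the shift loop moves exactly those. With K the set of kept indices and D the set of deleted ones:

$$
M_{\text{delete}} = \sum_{i \in D} \bigl|\{\, j \in K : j > i \,\}\bigr| \;\le\; |D| \cdot |K|
$$

With the figures rudi reports later in the thread (about 7000 kept out of about 19000, so |D| ≈ 12000), the bound is 12000 × 7000 = 8.4 × 10^7 element moves, before counting the cost of the 12000 ReDim calls. A single pass that copies each kept value once needs far fewer writes:

$$
M_{\text{copy}} \le |K| \approx 7000
$$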
Uten, posted February 2, 2008 (edited)

Thanks for pointing that out, Randall. I think I checked against 3.2.9.1; it's hard to keep up with the progress. It still has to shuffle all elements beyond the removed element. Obviously, if you always remove from the end of the array (the last item), it will have no impact at all. In this case a linked list is probably faster, but probably harder to implement. So the question is how much time to spend on coding versus waiting for the job to finish (with the current implementation or one of the other suggestions). Linked-list samples are easy to locate in the examples forum (nuttster's associative array sample looks worth studying).
rudi, posted February 4, 2008

> I thought I made it pretty clear I was talking about _FileWriteFromArray. It calls FileWrite for each array element, which is bad programming no matter how you look at it, and it makes the function pretty damn slow for big (or even decently sized) arrays. That could easily be avoided by concatenating the string in memory and writing to the output file just once (or in big chunks, if it has to handle really huge arrays).

Ah. I (wrongly) expected _FileWriteFromArray to do that. You mean this function just loops over the value count, doing many, many FileWriteLine($Array[$i]) calls? <argh>

When concatenating the array's values in RAM, how do I avoid a "string length overflow"? (AutoIt 3 help for "String": "Maximum string length is 2147483647 characters (but keep in mind that no line in an AutoIt script can exceed 4095 characters.)")

Is _FileReadToArray OK?

What is the fastest way to trim the value count of an array? After doing some testing in this case, I copied ~7000 valid values out of ~19000 from ArrayA to ArrayB. As at the beginning I only know that ArrayB <= ArrayA, I have to "kick out" the slots beyond the last valid one in ArrayB. _ArrayDelete seems to be quite slow; that was the reason why I tried _FileWriteFromArray / _FileReadToArray...

Thanks, Rudi.
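On the string-length question, a rough check assuming paths stay within the classic Windows MAX_PATH limit of 260 characters: 19,000 × 261 characters (path plus one "|") ≈ 4.96 million characters, far below 2,147,483,647. The 4095-character limit in that help text applies to lines of script source, not to string values built at run time. For trimming, ReDim keeps the existing elements, so ArrayB can be cut down to its filled part with one call and no file round trip. A minimal sketch; _TrimToCount is a hypothetical helper name.

```autoit
; Sketch: $aList was sized generously and filled in slots 1..$iCount.
; ReDim keeps the existing elements, so one call drops the unused tail.
Func _TrimToCount(ByRef $aList, $iCount)
    ReDim $aList[$iCount + 1] ; slots 0..$iCount survive, the rest are discarded
    $aList[0] = $iCount ; store the real count in element 0
EndFunc   ;==>_TrimToCount
```

In rudi's earlier test script, `_TrimToCount($TempArray, $t - 1)` after the loop would leave exactly the valid values.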
rudi, posted February 4, 2008

> Did you look at my methods? This one was the winner:
> Func CleanupArray(ByRef $MyArray)
>     Local $Clean, $sTemp = ""
>     For $Clean = 1 To $MyArray[0] Step 1
>         If CheckFile($MyArray[$Clean]) Then
>             $sTemp &= $MyArray[$Clean] & "|"
>         EndIf
>     Next
>     $MyArray = StringSplit(StringTrimRight($sTemp, 1), "|") ; StringTrimRight removes the last pipe delimiter
> EndFunc   ;==>CleanupArray

Yes, that's really fast! I'll change my code. And special thanks for that comment on StringTrimRight!

Regards, Rudi.
rudi, posted February 4, 2008 (edited)

> Hi, I agree with above. Your "CheckFile" is confusing, though!

This CheckFile wasn't my issue, so I simplified the code a lot and introduced a mistake. It should look like this:

Func CheckFile($FullPath)
    Local $OK = True
    If StringInStr($FullPath, "criteria") Then $OK = False
    ; ... several other tests done here.
    Return $OK
EndFunc   ;==>CheckFile

> 2. You can be using multiple filters in the filter section, although I should be adding latest update to the thread post#1, so will do that soon.

I cannot follow this sentence at all, sorry...

> 3. Although it is not widely tested, the "Exclude" criterion should work well too, I think with multiple filters; in any case, I would be interested to hear.

Once more, I completely miss what you want to tell me... (my English...)

Thanks, Rudi.
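If the "several other tests" are mostly substring checks, the corrected CheckFile can loop over a list of exclusion patterns instead of repeating If lines. A minimal sketch; the patterns in $aSkip are hypothetical placeholders for the real criteria, and _CheckFileMulti is a hypothetical name. StringInStr matches case-insensitively by default.

```autoit
; Sketch: reject a path if it contains any of the listed substrings.
Func _CheckFileMulti($FullPath)
    Local $aSkip[3] = ["\Temp\", ".bak", ".tmp"] ; placeholder criteria
    For $i = 0 To UBound($aSkip) - 1
        If StringInStr($FullPath, $aSkip[$i]) Then Return False ; case-insensitive match
    Next
    Return True
EndFunc   ;==>_CheckFileMulti
```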