BlueScreen, posted January 17, 2006 (edited January 17, 2006)

Hi guys,

First, thanks for your help. I have written a function which receives two files as parameters and removes from the source file the data which exists in the data file. It all works fine, but very slowly (5 minutes). Is my algorithm not efficient enough?

Here is what I did inside the function:
1) Read the data file into an array using _FileReadToArray
2) Read the SRC file into another array using _FileReadToArray
3) Open a temp file
4) Run a loop (once per line in the SRC file) and check, for each SRC line, using StringInStr (in a While loop), whether the line contains a string from the data file
5) If all the lines in the data file have been checked and there is no match, write the line to the temp file
6) Keep going until there are no more lines in the SRC file
7) Close the temp file, delete the SRC file, move the temp file to SRC
8) At this point, no strings from the data file remain in the SRC file. Continue with the next data file

Now, my SRC file contains around 8000 lines. I also have 6 data files of 30 lines each. So going over all the lines (around 1440000 comparisons) takes about 5 minutes.
Is there a way to do it better?

Here is my code:

    For $w = 1 To $NumOfDataFiles
        RemoveDatafromSrc($SRCfile, $Data[$w-1][0])
    Next

    Dim $Temp[1]
    Dim $SrcValue
    Dim $DataValue
    Global $LineInData = 1
    Global $TempID = 1
    #include <file.au3>

    Func RemoveDatafromSrc($SrcFile, $DataFile)
        If Not _FileReadToArray($DataFile, $DataValue) Then Exit
        If Not _FileReadToArray($SrcFile, $SrcValue) Then Exit
        For $e = 1 To $DataValue[0]
            $DataValue[$e] = StringLeft($DataValue[$e], 4) ; I need only the 4 left chars
        Next
        $TmpFile = FileOpen("temp.tmp", 2)
        For $LineInSrc = 1 To $SrcValue[0] ; Lines in SRC
            $LineInData = 1 ; restart from the first DATA line (otherwise a match leaves the counter mid-list)
            While $LineInData <= $DataValue[0] ; for each SRC line, need to check all DATA lines
                If StringInStr($SrcValue[$LineInSrc], $DataValue[$LineInData] & ":") <> 0 Then ; Data line found
                    ExitLoop
                EndIf
                If $LineInData = $DataValue[0] Then
                    FileWriteLine($TmpFile, $SrcValue[$LineInSrc] & @LF)
                    $LineInData = 1
                    ExitLoop
                Else
                    $LineInData = $LineInData + 1
                EndIf
            WEnd
        Next
        FileClose($TmpFile)
        FileDelete($SrcFile)
        FileMove("temp.tmp", $SrcFile, 1)
    EndFunc

flyingboz, posted January 17, 2006 (edited January 17, 2006)

As I recall, the _Array* UDFs weren't the fastest things in the world back when I looked at them. Try to get the files into memory and operate on them there. I use (untested, no error checking, pseudo pseudo code!):

    $fh = FileOpen($fn, 0)
    $contents = FileRead($fh, FileGetSize($fn)) ; use handles instead of filenames
    FileClose($fh)
    ; Depending on filesize, StringSplit() or StringLeft() may be faster; i.e.
    ; test in your environment and let us know what you find.
    $line = StringSplit($contents, @LF)
    For $i = 1 To $line[0]
        ;_DoStuff($line[$i])
    Next

or:

    While StringLen($contents) > 0
        $pos = StringInStr($contents, @LF)
        If $pos = 0 Then $pos = StringLen($contents) ; last line has no trailing @LF
        $line = StringLeft($contents, $pos)
        $contents = StringTrimLeft($contents, $pos)
        _DoStuff($line)
    WEnd

Signature: Reading the help file before you post... Not only will it make you look smarter, it will make you smarter.

BlueScreen (author), posted January 17, 2006

> As I recall, the _Array* UDFs weren't the fastest things in the world back when I looked at them.

You mean "_FileReadToArray"? I don't get it... Where do I put the StringSplit?

flyingboz, posted January 18, 2006

> You mean the "_FileReadToArray"?

mebbe I should have said _*array*(*) ...clearer?

> I don't get it... Where to put the StringSplit?

In example 1, the StringSplit is there; the example shows using the built-in function to create an array from a variable. In example 2, StringInStr(), StringLeft() and StringTrimLeft() are used to get the data, so StringSplit is not required. It's another way of skinning the same cat; maybe it's faster, maybe it ain't.

If your data is fixed length (e.g. each line is 80 chars long), you could use something like this:

    $line_length = 80
    $file_pos = 1
    $file_size = StringLen($contents) ; $contents as read in the earlier example
    While $file_pos <= $file_size
        $line = StringMid($contents, $file_pos, $line_length)
        _DoStuff($line)
        $file_pos = $file_pos + $line_length + 1 ; +1 skips the terminator (assumes a single @LF per line)
    WEnd

seandisanti, posted January 18, 2006

> As I recall, the _Array* UDFs weren't the fastest things in the world back when I looked at them. [...]

Sorry, I've not had any negative experiences with the array UDFs myself, and my solution uses them too, but it should be considerably faster, and I'll explain why. First, here's my code to replace your function:

    Dim $SrcValue
    Dim $DataValue
    #include <file.au3>
    #include<array.au3>

    Func RemoveDatafromSrc ($SrcFile,$DataFile)
        If Not _FileReadToArray($DataFile,$DataValue) Then Exit
        If Not _FileReadToArray($SrcFile,$SrcValue) Then Exit
        $StringOfSrc = _ArrayToString($SrcValue,"$",1,$SrcValue[0])
        For $e=1 to $DataValue[0]
            if StringInStr($StringOfSrc,StringLeft($DataValue[$e],4)) Then
                For $f = 1 To $SrcFile[0]
                    If StringInStr($SrcValue[$f],StringLeft($DataValue[$e],4) & ":") Then _ArrayDelete($SrcValue,$f)
                Next
            EndIf
        Next
        FileDelete ($SrcFile)
        _FileWriteFromArray($SrcFile,$SrcValue,1,UBound($SrcValue))
    EndFunc

Now a few things that I did to speed it up.

First, I took out your For loop that was replacing the DataValue array elements with only the first 4 characters of each element. That makes a lot of unnecessary assignments, because we can search for just the substring that you want and ignore the rest of the line without making any assignments.

Then, you were searching each line of the data for each line of the source, which works out to Source(lines) x Data(lines) comparisons. To trim the fat on that one, I made a single string from all of the elements in the source array, and only did line-by-line comparisons if the value I'm searching for was already confirmed to be somewhere in the source by the StringInStr() check. So in the worst-case scenario, with 8000 lines of source and 6000 lines of data, if there are NO values that are in both, yours is doing 48,000,000 evaluations, where mine does 6000.

I also changed the way that the output is created, removing the need for a temp file. By deleting the lines that contain the data we don't want from the array, we end up with an array of good data, which is exactly what we want in the end file. So continuing the example above, if there were NO values present in both of the arrays with the sizes given, you'd be writing to the temp file 8000 times, then copying that file over the original source. The way that I've changed it, there is a single file write at the end, regardless of how many hits there were.

The changes I've made should be enough to see a good cut in execution time, but this is not the only way that you could achieve the same result.

herewasplato, posted January 18, 2006

@cameronsdad,
Take a look at my post in this related (duplicate?) thread:
http://www.autoitscript.com/forum/index.ph...ndpost&p=140602

The first post in that other thread mentions a rather simple task:

File in:
This is line number 1
This is line number 2
This is line number 3

File out:
This is line number 1
This is line number 3

Task = remove all lines with 'This is line number 2' and the CR, LF or CRLF

For that task, I posted this possible solution:
http://www.autoitscript.com/forum/index.ph...ndpost&p=140987

I see no reason/need to use arrays here; just loop through all of the StringReplace statements that you want and output the file once. Am I missing something? Is the file too big to put into one variable?

seandisanti, posted January 18, 2006

> I see no reason/need to use arrays here, just loop thru all of the StringReplace statements that you want and output the file once. Am I missing something? Is the file too big to put into one variable?

I actually thought of the same approach at first, but decided against it. The lines in the source will almost certainly vary in length (that is just an assumption on my part), and because he wants to remove the whole line, that could work out to more work. I also don't know what his data looks like, and I wanted to make sure that my solution worked without much follow-up. That's also why I made a point of explaining to him that the method suggested wasn't the only way to do it, but could give him ideas to better tune his script to his specific data.

herewasplato, posted January 19, 2006 (edited January 19, 2006)

> I actually thought of the same approach, but...

...you were brighter than that.

Apparently I paid too much attention to the input/output in the first post of that other thread, without noticing that the string being searched for could be (and probably is) a subset of one line in the input file and not the entire line itself... chased the wrong rabbit again!

randallc, posted January 19, 2006

Hi,
Once again, this is way, way quicker with DOS, particularly if the files are large.
You can get the second file's lines into a long string, or tell DOS the file with the exclusion lines.
See the "_DeleteFoundLineDOS" script in the link from my signature, either dOsComs or my bookmarks.
Best, Randall

Signature: ExcelCOM... AccessCom.. Word2... FileListToArrayNew...SearchMiner... Regexps...SQL...Explorer...Array2D.. _GUIListView...array problem...APITailRW

seandisanti, posted January 19, 2006

> Once again, this is way, way, quicker with DOS if the files are large in particular. [...] See "_DeleteFoundLineDOS" script in link from my signature, either dOsComs or my bookmarks.

Looking at your approach, I'm not sure how it would be faster to parse the 6000 4-character strings into exclude parameters and then check each line in the new file for each of those parameters. Could you write up an example using your UDFs to do as you're suggesting? I'm not disagreeing that your way may be faster; I just don't see a way to implement it for this situation that would be faster, and I would be interested to see it in action.
randallc Posted January 19, 2006 Share Posted January 19, 2006 (edited) hey, Did you try the Example I have linked to? Alternately, you would still need to nominate your main file and your delete file; try "_DOSDeleteFoundLineEx2.au3" Best, Randall ;_DOSDeleteFoundLineEx2.au3 ;to Delete all file lines containing any of Data1, Data2 ...etc [separated in the string by spaces; or.. ; Else look into "findstr" in Dos and retrieve the strings to avoid from a file instead (/G:[Filename]) ; 80Mb file in 40secs #include<DOSComs.au3> ;$s_FileOpen1-FileOpenDialog("Choose file",@ScriptDir,"Images (*.jpg;*.bmp)", 1 + 4 ) $s_DeletemarkerStringsFile=@ScriptDir&"\DeleteLines.txt" $s_DeleteFile=@ScriptDir&"\Table.txt" _DeleteFound($s_DeleteFile,$s_DeletemarkerStringsFile) func _DeleteFound($s_DeleteFile,$s_DeletemarkerStringsFile) $s_FileOpen=FileOpen($s_DeletemarkerStringsFile) While 1 $s_DelString &= FileReadLine($s_FileOpen)&" " If @error = -1 Then ExitLoop WEnd FileClose($s_FileOpen) ;$s_Exclude="Data1 Data2 Data3 Data4 Data5 Data6 Data7 Data8 Data9 Data10 Data11" _DeleteFoundLineDOS($s_DeleteFile,$s_DelString) if @error then MsgBox(0,"","Error, So FileName not Exists="&@error) Else RunWait("Notepad.exe " & @ScriptDir&"\Table1.txt",@ScriptDir,@SW_SHOW) EndIf EndFunc ;==>_DeleteFound ExitfileLineDelete2.au3 Edited January 19, 2006 by randallc ExcelCOM... AccessCom.. Word2... FileListToArrayNew...SearchMiner... Regexps...SQL...Explorer...Array2D.. _GUIListView...array problem...APITailRW Link to comment Share on other sites More sharing options...

BlueScreen (author), posted January 19, 2006

Hi Cameronsdad,

Thanks, thanks, thanks for your support. Regarding the code you posted: shouldn't this line be

    For $f = 1 To $SrcValue[0]

instead of

    For $f = 1 To $SrcFile[0]

?

Also, it doesn't seem to work, and I cannot see why. Attached are my code, my source file and my data file.

    C:\parser.au3 (20) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:
    If StringInStr($SrcValue[$f],StringLeft($DataValue[$e],4) & ":") Then _ArrayDelete($SrcValue,$f)
    If StringInStr(^ ERROR

Helllllppppppppp

Attachments: source.txt, data.txt

herewasplato, posted January 19, 2006

BlueScreen,

This is one line from your data file:
1111: 0000 0000 0000 0000 0000 0000 0000 0000

That one line is also in your source file:
1111: 0000 0000 0000 0000 0000 0000 0000 0000

For each complete line in the data file, you want to remove that entire line from the source file. Right?

Please let me know....

BlueScreen (author), posted January 19, 2006

Exactly. The catch is this: suppose <<1111: 0000 0000 0000 0000 0000 0000 0000 0000>> is a line in the data file and <<2222: 1111 0000 0000 0000 0000 0000 0000 0000>> is a line in the source file. I do NOT want to remove that line from the source file, since what interests me is the address (1111) and not all the stuff after it. This is why I also added the ":" to the StringInStr search.
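
The matching rule described in the post above (compare only the 4-character address in front of the ":") can be sketched as a single pass over the source file. This is only an illustration, not code from any poster: the function name _RemoveAddresses is made up, and it assumes every line begins with the address followed by ":", as in the sample lines quoted in the thread. It also assumes a recent AutoIt version where FileRead() accepts a filename without a count, and uses the Windows Scripting.Dictionary COM object for the lookup.

```autoit
; Sketch only. Assumes each line looks like "1111: 0000 0000 ...",
; i.e. a 4-character address, then ":", at the very start of the line.
; _RemoveAddresses is a hypothetical name, not part of any UDF.
Func _RemoveAddresses($sSrcFile, $sDataFile)
    ; Collect the addresses to drop in a dictionary so each lookup is cheap.
    Local $oDrop = ObjCreate("Scripting.Dictionary")
    $oDrop.CompareMode = 1 ; text compare, case-insensitive like StringInStr's default

    Local $aData = StringSplit(StringStripCR(FileRead($sDataFile)), @LF)
    Local $sKey
    For $i = 1 To $aData[0]
        $sKey = StringLeft($aData[$i], 4)
        If StringLen($sKey) = 4 And Not $oDrop.Exists($sKey) Then $oDrop.Add($sKey, 1)
    Next

    ; One pass over the source: keep a line unless its address is in the dictionary.
    Local $aSrc = StringSplit(StringStripCR(FileRead($sSrcFile)), @LF)
    Local $sOut = ""
    For $i = 1 To $aSrc[0]
        If $aSrc[$i] = "" Then ContinueLoop ; note: blank lines are dropped
        If StringMid($aSrc[$i], 5, 1) = ":" And $oDrop.Exists(StringLeft($aSrc[$i], 4)) Then ContinueLoop
        $sOut &= $aSrc[$i] & @CRLF
    Next

    ; Single write at the end; mode 2 erases the previous contents.
    Local $hOut = FileOpen($sSrcFile, 2)
    If $hOut = -1 Then Return SetError(1, 0, 0)
    FileWrite($hOut, $sOut)
    FileClose($hOut)
    Return 1
EndFunc
```

With 8000 source lines and 30 data lines this does roughly 8000 lookups per data file instead of 8000 x 30 substring searches. Whether that is actually faster on a given machine would need to be measured; the output also uses plain CRLF line endings, which may differ from the original file.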

herewasplato, posted January 19, 2006 (edited January 23, 2006)

Edit: The code below now works, but only for files that terminate each line of data with CRCRLF, like the source file from post 12 seems to do (as shown by SciTE with EOL turned on). [Thanks Valik for your help on this, it was driving me crazy. I know, short trip.]

    ;read the entire contents of the source file into the variable
    $SourceInfo = FileRead('c:\temp\source.txt', FileGetSize('c:\temp\source.txt'))

    ;open the data file
    $DataFile = FileOpen("c:\temp\data.txt", 0)

    ;read in lines of the data file until the EOF is reached
    While 1
        $Line = FileReadLine($DataFile)
        If @error = -1 Then ExitLoop
        $ReplaceIt = $Line & @CR & @CRLF
        ;MsgBox(0,"To be replaced",$ReplaceIt)
        $SourceInfo = StringReplace($SourceInfo, $ReplaceIt, "")
        MsgBox(0, "Lines replaced", @extended)
    WEnd
    FileClose($DataFile)

    ;write the result back over the source file
    $OutFile = FileOpen('c:\temp\source.txt', 2)
    FileWrite($OutFile, $SourceInfo)
    FileClose($OutFile)

It should be faster than using arrays. Not that I'm against using arrays, but the original post asked for faster code, and this method should be much faster. The code above is just to show the concept; you need to add more error checking and use file handles where I used full paths.

seandisanti, posted January 21, 2006

> Shouldn't this line be For $f = 1 To $SrcValue[0] instead of For $f = 1 To $SrcFile[0]? Also, it doesn't seem to work, and I cannot see why. [...]
> C:\parser.au3 (20) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.

You're right about $SrcValue instead of $SrcFile, sorry about that. What's going on is that as lines are removed, the UBound of the array changes, but the end value of the For loop doesn't. So once you remove even a single line, the last iteration of the For loop will fail. Poor practice on my side there. What we should do is change it to a While loop instead of a For, like so:

    Dim $SrcValue
    Dim $DataValue
    #include <file.au3>
    #include<array.au3>

    Func RemoveDatafromSrc ($SrcFile,$DataFile)
        If Not _FileReadToArray($DataFile,$DataValue) Then Exit
        If Not _FileReadToArray($SrcFile,$SrcValue) Then Exit
        $StringOfSrc = _ArrayToString($SrcValue,"$",1,$SrcValue[0])
        For $e=1 to $DataValue[0]
            if StringInStr($StringOfSrc,StringLeft($DataValue[$e],4)) Then
                $f = 1
                While $f < UBound($SrcValue) ; valid indices end at UBound - 1
                    If StringInStr($SrcValue[$f],StringLeft($DataValue[$e],4) & ":") Then
                        _ArrayDelete($SrcValue,$f) ; next element shifts into $f, so don't advance
                    Else
                        $f = $f + 1
                    EndIf
                WEnd
            EndIf
        Next
        FileDelete ($SrcFile)
        _FileWriteFromArray($SrcFile,$SrcValue,1,UBound($SrcValue))
    EndFunc
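
The problem described above (deleting from an array while a loop's end value is already fixed) can also be avoided by walking the array from the last element down to the first, since a deletion then only shifts elements that have already been checked. The following is a minimal sketch of that idea, not code from the thread; _DeleteMatchingLines is a hypothetical helper name, and the array is assumed to hold its count in element 0, the way _FileReadToArray returns it.

```autoit
#include <Array.au3>

; Hypothetical helper (not from the thread): delete every element of $aLines
; in positions 1..end that contains $sNeedle. Element 0 is assumed to hold the count.
Func _DeleteMatchingLines(ByRef $aLines, $sNeedle)
    ; Walk backwards: removing element $i only moves elements above $i,
    ; and those have already been visited.
    For $i = UBound($aLines) - 1 To 1 Step -1
        If StringInStr($aLines[$i], $sNeedle) Then _ArrayDelete($aLines, $i)
    Next
    ; Keep the count in element 0 in step with the new size.
    $aLines[0] = UBound($aLines) - 1
EndFunc
```

Inside RemoveDatafromSrc this could replace the inner While loop with a call such as _DeleteMatchingLines($SrcValue, StringLeft($DataValue[$e], 4) & ":"). Each _ArrayDelete call still resizes the array, so with many matches it may be quicker to build a new string of kept lines instead; that would need testing on the real data.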