JohnyX Posted March 26, 2019

Hello, I have a large file of ~2 million lines (max 20 characters per line) and I need to repeatedly loop through all the lines and do some calculations (each one's length, the total length and so on).

    ; Attempt 1: read each line by number with FileReadLine
    #include <Array.au3>

    For $i = 1 To 2000000 Step +1
        $readline1 = FileReadLine(@ScriptDir & "\data.csv", $i)
        For $z = 1 To 2000000 Step +1
            $readline2 = FileReadLine(@ScriptDir & "\data.csv", $z)
            ;ToolTip($readline1 & $z & "-" & $readline2)
            ;Calculate
        Next
    Next

    ; Attempt 2: read the whole file into an array first
    #include <Array.au3>

    $arr = FileReadToArray(@ScriptDir & "\data.csv")
    ; FileReadToArray returns a 0-based array, so index from 0
    For $b = 0 To UBound($arr) - 1
        For $v = 0 To UBound($arr) - 1
            ;ToolTip($arr[$b] & "-" & $arr[$v])
            ;Do stuff
        Next
    Next
    _ArrayDisplay($arr)

The problem is that it takes too long to loop through all those lines millions of times. Is there a better approach to this? Thank you.

Jos (Developers) Posted March 26, 2019

On 3/26/2019 at 6:59 PM, JohnyX said: "Is there a better approach to this?"

Before even trying to answer that: Why do you need to loop through the whole file for each record?

Jos

JohnyX (Author) Posted March 26, 2019 (edited)

Thanks for your answer. Like I said, I need to compare and perform calculations for each record. As a last resort I will import the CSV into Excel, but I will have to split the file because Excel can handle only a little over 1 million rows.

Edited March 26, 2019 by JohnyX

Nine Posted March 26, 2019

It seems to me that the nested loop is wrong. You do understand that you are looping 2,000,000 * 2,000,000 times? So you are handling 4,000,000,000,000 rows!

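To put that figure in perspective, a rough back-of-the-envelope estimate can be sketched as below. The rate of one million pair checks per second is purely an assumption for illustration, not a measurement, and all variable names are hypothetical.

```autoit
; Rough cost estimate for the nested loop (every figure here is an assumption).
Local $fLines = 2000000.0          ; approximate line count from the first post
Local $fPairs = $fLines * $fLines  ; every line paired with every line
Local $fRate = 1000000.0           ; ASSUMED pair checks per second, not measured
Local $fDays = $fPairs / $fRate / 86400 ; 86400 seconds per day

ConsoleWrite("Pairs to process: " & $fPairs & @CRLF)
ConsoleWrite("Estimated days at the assumed rate: " & Round($fDays, 1) & @CRLF)
```

At that assumed rate the full pass works out to roughly 46 days, and re-reading the file by line number on every iteration would likely be far slower still.
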
JohnyX (Author) Posted March 26, 2019

Just realized that :) I'll try to find a better approach. Thanks.

Jos (Developers) Posted March 26, 2019

Hence my question... and you are still utterly vague on what it is you need to do with the content of the record, so how can you expect to get any proper help?

Jos

BrewManNH Posted March 26, 2019

Going by just the script you posted, this is the fastest way to do it. You read through the whole file 2,000,000 times because of your 2 loops. Do you REALLY need to do that?

    ; Open the file once and keep the handle
    $hFile1 = FileOpen(@ScriptDir & "\data.csv")
    If $hFile1 = -1 Then Exit 1 ; the file could not be opened

    While 1
        ; No line number given: each call reads the next line
        $readline1 = FileReadLine($hFile1)
        If @error Then ExitLoop ; -1 = end of file, 1 = read error
        ;ToolTip($readline1)
        ;Calculate
    WEnd

    FileClose($hFile1)

If you use a file handle instead of the file name in FileReadLine, it's much faster, plus FileReadLine will automatically advance by one line every time you read one line.

SlackerAl Posted March 27, 2019

I'd also suggest you check out:

    FileReadToArray()
    ; and for delimited data
    _FileReadToArray()

In my experience with large files it is always faster to read the entire file into an array and then work with the array.

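As a minimal sketch of the read-into-an-array suggestion, assuming the same `data.csv` next to the script as in the first post (the variable names are hypothetical), a single pass that totals the line lengths could look like this:

```autoit
; Read the whole file into memory once, then work on the array.
Local $sFile = @ScriptDir & "\data.csv" ; path taken from the first post
Local $aLines = FileReadToArray($sFile)
If @error Then
    ConsoleWrite("Could not read " & $sFile & @CRLF)
    Exit 1
EndIf

; FileReadToArray returns a 0-based array with one element per line.
Local $iTotal = 0
For $i = 0 To UBound($aLines) - 1
    $iTotal += StringLen($aLines[$i])
Next

ConsoleWrite("Lines: " & UBound($aLines) & ", total length: " & $iTotal & @CRLF)
```

Note that `FileReadToArray()` starts at index 0, whereas `_FileReadToArray()` from `File.au3` by default stores the row count in element 0 and the data from index 1.
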
JohnyX (Author) Posted March 27, 2019

Hi, sorry for the late reply and for my vague previous answers, I was in a rush. This is what my script should do:

    #include <Array.au3>

    For $i = 1 To 2000000 Step +1
        $readline1 = FileReadLine(@ScriptDir & "\data.csv", $i)
        For $z = 1 To 2000000 Step +1
            $readline2 = FileReadLine(@ScriptDir & "\data.csv", $z)
            ToolTip($readline1 & $z & "-" & $readline2)
            ;Calculate
            If StringLen($readline1) <> StringLen($readline2) Then
                $len = StringLen($readline1 & $readline2)
                If $len >= 21 And StringLeft($readline1, 1) = "t" Then
                    FileWriteLine(@ScriptDir & "\output.txt", $readline1 & $readline2 & "-" & $len)
                ElseIf $len < 21 And StringRight($readline2, 1) = "v" Then
                    FileWriteLine(@ScriptDir & "\output.txt", $readline1 & $readline2 & "-" & $len)
                EndIf
            EndIf
            ;Done
        Next
    Next

I am looping so many times because I can't think of another way to get the total length and the first and last character of the two records.

Nine Posted March 27, 2019

It is very hard to understand the purpose of this script. Could you show us about 10 lines of data.csv and the result you want to achieve in output.txt?

BrewManNH Posted March 27, 2019

On 3/27/2019 at 3:57 PM, JohnyX said: "two records."

What are you doing in comparing the 2? From what I can see, you read the first line of the file, then compare it to the 2,000,000 other lines in the file. After that you read the second line of the file, and proceed to loop through 2,000,000 lines of the file again to compare that one. Then you write those 4 trillion lines of seemingly useless data to another file.

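For what it's worth, the per-pair cost of the script from the March 27 post can be reduced by reading the file once and precomputing each line's length and first/last-character checks. The sketch below is only one possible arrangement, not the original poster's method; it assumes the same `data.csv` and `output.txt` paths, all variable names are hypothetical, and it still visits every pair of lines, so the total iteration count does not change.

```autoit
#include <FileConstants.au3>

; Read the file once instead of calling FileReadLine inside the loops.
Local $aLines = FileReadToArray(@ScriptDir & "\data.csv")
If @error Then Exit 1

; Precompute per-line facts so the inner loop does no string work.
Local $iCount = UBound($aLines)
Local $aLen[$iCount], $aStartsT[$iCount], $aEndsV[$iCount]
For $i = 0 To $iCount - 1
    $aLen[$i] = StringLen($aLines[$i])
    $aStartsT[$i] = (StringLeft($aLines[$i], 1) = "t") ; "=" is case-insensitive, as in the original
    $aEndsV[$i] = (StringRight($aLines[$i], 1) = "v")
Next

; Open the output once in append mode rather than by name on every write.
Local $hOut = FileOpen(@ScriptDir & "\output.txt", $FO_APPEND)
If $hOut = -1 Then Exit 2

Local $iLen
For $i = 0 To $iCount - 1
    For $z = 0 To $iCount - 1
        If $aLen[$i] = $aLen[$z] Then ContinueLoop ; the original skips equal lengths
        $iLen = $aLen[$i] + $aLen[$z] ; same value as StringLen of the joined strings
        If ($iLen >= 21 And $aStartsT[$i]) Or ($iLen < 21 And $aEndsV[$z]) Then
            FileWriteLine($hOut, $aLines[$i] & $aLines[$z] & "-" & $iLen)
        EndIf
    Next
Next

FileClose($hOut)
```

Both branches of the original `If`/`ElseIf` write the same line, which is why they are merged into a single condition here. Whether the all-pairs comparison is needed at all is still the open question in this thread.
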