xenoranger Posted October 1, 2012 (edited)
So, I am still a total newb when it comes to programming; I mostly write scripts. I wrote the following script to filter the GPS data for a single unit and write each matching line to a new file. The problem is that it cycles through the file one line at a time, and when my files have over 1,000,000 lines, this can take days. Is there a faster way to pull out just the lines that match certain criteria and write those to the new file?

```autoit
$xnFile = InputBox("File???", "Which file do you want to review?","C:")
$xnUnit = Inputbox("Unit ID","What is the Unit ID you want?","Some Unit String")
$xnTargetFile = $xnFile & " - Unit - " & $xnUnit & " - Parsed.csv"
FileOpen($xnFile,0)
FileWrite($xnTargetFile,"")
FileOpen($xnTargetFile, 1)
$i = 1
ProgressOn("Creating CSV","Extracting matching data.","",0,0,16)
While $i > 0
    $xnCurrentLine = FileReadLine($xnFile,$i)
    ;MsgBox(1,"",$xnCurrentLine)
    if stringlen($xnCurrentLine) > 1 Then
        if StringInStr($xnCurrentLine,$xnUnit & " $GPRMC") > 40 Then
            $xnDate = stringmid($xnCurrentLine,stringinstr($xnCurrentLine,"Verbose: 0 : ") + 13,10)
            $xnTime = stringmid($xnCurrentLine,StringInStr($xnCurrentLine,"4G?`") - 12,8)
            $xnGeoData = stringmid($xnCurrentLine,StringInStr($xnCurrentLine,"$GPRMC"))
            $xnTargetLine = $xnDate & "," & $xnTime & "," & $xnGeoData & @CRLF
            FileWriteLine($xnTargetFile,$xnTargetLine)
        EndIf
        $i = $i + 1
        if round($i/1000000 * 100,1) < 90 Then
            ProgressSet(round($i/1000000 * 100,1),$i & " lines examined." & @cr & "Thank you for your patience.")
        Else
            ProgressSet(90,$i & " lines examined." & @cr & @cr & "Just a little longer... ..." & @CR & "Thank you for your patience.")
        EndIf
    Else
        $i = 0
    EndIf
WEnd
ProgressOff()
```

PS: The 1,000,000 in the ProgressSet lines is there because other people disliked not having a progress bar that updates, and I estimate at least 1,000,000 lines in each log file.
Edited October 1, 2012 by xenoranger
Melba23 (Moderator) Posted October 1, 2012
xenoranger,
Welcome to the AutoIt forum.
You might be able to read the file into an array and manipulate it; that is faster than reading each line. However, I have never tried that with a 1M-line file. Perhaps you could give it a try: look at _FileReadToArray in the Help file. Let us know if it throws an error or crashes, but if it succeeds please do NOT try to display the result with _ArrayDisplay, as that would take forever!
You might also be able to use a RegEx to extract the data directly, if the required data has a suitable pattern to search for. Could you post an example of a log file (preferably fewer than 1M lines) and indicate which lines you wish to extract? We can then see what might be possible.
M23
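A minimal sketch of the array approach, assuming the log format implied by the original script: the unit ID followed by " $GPRMC" somewhere past column 40, the date right after "Verbose: 0 : ", and the time 12 characters before "4G?`". `_FilterUnitArray` and `_ParseGpsLine` are hypothetical names; `_FileReadToArray` is the File.au3 function Melba23 mentions, and it puts the line count in element [0]. Matches are collected in one string and written with a single FileWrite.

```autoit
#include <File.au3>

; Hypothetical function: reads the whole log with _FileReadToArray, keeps the
; lines for one unit and writes them as CSV. Returns the number of matches,
; or -1 with @error set on failure.
Func _FilterUnitArray($sLogFile, $sUnit, $sTargetFile)
    Local $aLines
    _FileReadToArray($sLogFile, $aLines) ; $aLines[0] = number of lines
    If @error Then Return SetError(1, 0, -1)

    Local $sKey = $sUnit & " $GPRMC"
    Local $sOut = "", $iCount = 0
    For $i = 1 To $aLines[0]
        ; Same test as the original script: key found past column 40
        If StringInStr($aLines[$i], $sKey) > 40 Then
            $sOut &= _ParseGpsLine($aLines[$i]) & @CRLF
            $iCount += 1
        EndIf
    Next

    Local $hOut = FileOpen($sTargetFile, 2) ; 2 = create/overwrite
    If $hOut = -1 Then Return SetError(2, 0, -1)
    FileWrite($hOut, $sOut) ; one write for all matches
    FileClose($hOut)
    Return $iCount
EndFunc

; Hypothetical helper: offsets copied from the original script (assumed log format)
Func _ParseGpsLine($sLine)
    Local $sDate = StringMid($sLine, StringInStr($sLine, "Verbose: 0 : ") + 13, 10)
    Local $sTime = StringMid($sLine, StringInStr($sLine, "4G?`") - 12, 8)
    Local $sGeo = StringMid($sLine, StringInStr($sLine, "$GPRMC"))
    Return $sDate & "," & $sTime & "," & $sGeo
EndFunc
```

Usage would look like `_FilterUnitArray("C:\logs\gps.log", "UNIT01", "C:\logs\gps - UNIT01.csv")` (paths and unit ID are hypothetical). As Melba23 warns, avoid passing a 1M-element array to _ArrayDisplay.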
guinness Posted October 1, 2012
Also what is the filesize of that file?
kylomas Posted October 1, 2012
xenoranger,
You are reading the file line by line in a loop using the file name. Per the help file: "If a filename is given rather than a file handle - the file will be opened and closed during the function call - for parsing large text files this will be much slower than using filehandles."
You are opening the file for reading but not keeping the file handle. Change FileOpen($xnFile,0) to Local $hfl = FileOpen($xnFile,0), then use the handle for reading, like $xnCurrentLine = FileReadLine($hfl,$i). Incidentally, you do not need to specify the line number if you are reading the file sequentially (again, see the help file!).
This change will speed up your script. However, processing the file as an array or string variable will be much faster still. The entire file can be read into a variable (either an array or a string) with one instruction.
Good Luck,
kylomas
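A minimal sketch of that change, written as a hypothetical function `_FilterUnitByHandle` (with a hypothetical `_ParseGpsLine` helper whose offsets are copied from the question's script, assuming that log format). Both files are opened once, FileReadLine is called with the handle and no line number so each call returns the next line, and the loop ends when FileReadLine sets @error at end of file.

```autoit
; Hypothetical function: one pass over the log using file handles only.
; Returns the number of matches, or -1 with @error set on failure.
Func _FilterUnitByHandle($sLogFile, $sUnit, $sTargetFile)
    Local $hIn = FileOpen($sLogFile, 0) ; 0 = read
    If $hIn = -1 Then Return SetError(1, 0, -1)
    Local $hOut = FileOpen($sTargetFile, 2) ; 2 = create/overwrite
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, -1)
    EndIf

    Local $sKey = $sUnit & " $GPRMC"
    Local $sLine, $iCount = 0
    While 1
        $sLine = FileReadLine($hIn) ; no line number: returns the next line
        If @error Then ExitLoop     ; -1 = end of file, 1 = read error
        If StringInStr($sLine, $sKey) > 40 Then
            ; Writing through the handle avoids an open/close per line
            FileWriteLine($hOut, _ParseGpsLine($sLine))
            $iCount += 1
        EndIf
    WEnd

    FileClose($hIn)
    FileClose($hOut)
    Return $iCount
EndFunc

; Hypothetical helper: offsets copied from the original script (assumed log format)
Func _ParseGpsLine($sLine)
    Local $sDate = StringMid($sLine, StringInStr($sLine, "Verbose: 0 : ") + 13, 10)
    Local $sTime = StringMid($sLine, StringInStr($sLine, "4G?`") - 12, 8)
    Local $sGeo = StringMid($sLine, StringInStr($sLine, "$GPRMC"))
    Return $sDate & "," & $sTime & "," & $sGeo
EndFunc
```

Unlike the original loop, this version does not stop at the first blank line; it only stops at end of file or on a read error.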
kylomas Posted October 1, 2012
xenoranger,
One other thing I noticed: you should build an output string within your search loop and write the output file from that string when you exit the loop. FileWriteLine, when given a file name, also opens and closes the file for each write. This may or may not affect you, depending on the number of matches you find.
kylomas
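A sketch that combines the output-string idea with reading the whole file as a string and using a RegEx, as suggested earlier in the thread. FileRead loads the file in one call, StringRegExp with flag 3 returns every whole line containing the unit key, the CSV text is built in $sOut inside the loop, and the output file is written once afterwards. It assumes the file fits comfortably in memory and that the unit ID does not contain "\E" (the key is matched literally with \Q...\E). `_FilterUnitRegExp` and `_ParseGpsLine` are hypothetical names, and the field offsets are copied from the question's script.

```autoit
; Hypothetical function: whole file in one string, candidate lines pulled out
; by one StringRegExp call, CSV text built in memory, one write at the end.
; Returns the number of matches, or -1 with @error set on failure.
Func _FilterUnitRegExp($sLogFile, $sUnit, $sTargetFile)
    Local $sText = FileRead($sLogFile) ; entire file in one call
    If @error = 1 Then Return SetError(1, 0, -1)

    Local $sKey = $sUnit & " $GPRMC"
    ; Every whole line that contains the key; \Q...\E treats the key literally.
    ; Flag 3 = return an array of all matches.
    Local $aMatches = StringRegExp($sText, "(?m)^[^\r\n]*\Q" & $sKey & "\E[^\r\n]*", 3)
    Local $iErr = @error ; 1 = no matches, 2 = bad pattern
    If $iErr = 2 Then Return SetError(2, 0, -1)

    Local $sOut = "", $iCount = 0
    If $iErr = 0 Then
        For $i = 0 To UBound($aMatches) - 1
            ; Same position test as the original script
            If StringInStr($aMatches[$i], $sKey) > 40 Then
                $sOut &= _ParseGpsLine($aMatches[$i]) & @CRLF ; build output in memory
                $iCount += 1
            EndIf
        Next
    EndIf

    Local $hOut = FileOpen($sTargetFile, 2) ; 2 = create/overwrite
    If $hOut = -1 Then Return SetError(3, 0, -1)
    FileWrite($hOut, $sOut) ; single write after the loop
    FileClose($hOut)
    Return $iCount
EndFunc

; Hypothetical helper: offsets copied from the original script (assumed log format)
Func _ParseGpsLine($sLine)
    Local $sDate = StringMid($sLine, StringInStr($sLine, "Verbose: 0 : ") + 13, 10)
    Local $sTime = StringMid($sLine, StringInStr($sLine, "4G?`") - 12, 8)
    Local $sGeo = StringMid($sLine, StringInStr($sLine, "$GPRMC"))
    Return $sDate & "," & $sTime & "," & $sGeo
EndFunc
```

The RegEx does the bulk of the filtering inside the PCRE engine, so the AutoIt loop only visits candidate lines instead of every line in the log.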
xenoranger Posted October 2, 2012 (Author)
Quote: "Also what is the filesize of that file?"
The file is 100,000 MB in size. I've been moved to other projects, so I'll have to come back to this one. I appreciate your help!
jchd Posted October 2, 2012
Searching a 100 GB flat text file doesn't sound anything like reasonable to me, particularly using an interpreted language. Your GPS data almost certainly has some structure, which you should use to build a database. Adequate indexing can then bring back query results in seconds, not days. You can use SQLite for that.
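A rough sketch of the database idea using AutoIt's SQLite UDF (SQLite.au3), assuming sqlite3.dll is somewhere `_SQLite_Startup()` can find it (for example next to the script). The schema (a `gps` table holding the unit ID and the CSV text the original script produced), the index name, and the functions `_ImportGpsLog`, `_QueryUnit` and `_ParseGpsLine` are all hypothetical. It also assumes the unit ID is the word immediately before " $GPRMC". The log is scanned once inside a single transaction; after that, each per-unit query goes through the index instead of rescanning the text file.

```autoit
#include <SQLite.au3>

; Hypothetical import step: copies every $GPRMC line into an indexed table.
; Call _SQLite_Startup() first. Returns the DB handle, or 0 with @error set.
Func _ImportGpsLog($sLogFile, $sDbFile)
    Local $hIn = FileOpen($sLogFile, 0) ; 0 = read
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hDb = _SQLite_Open($sDbFile)
    If @error Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    ; Hypothetical schema: unit ID plus the CSV text the original script produced
    _SQLite_Exec($hDb, "CREATE TABLE IF NOT EXISTS gps (unit TEXT, csv TEXT);")
    _SQLite_Exec($hDb, "BEGIN;") ; one transaction for the whole import

    Local $sLine, $iPos, $sBefore, $sUnit
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop
        $iPos = StringInStr($sLine, " $GPRMC")
        If $iPos = 0 Then ContinueLoop
        ; Assumption: the unit ID is the word immediately before " $GPRMC"
        $sBefore = StringLeft($sLine, $iPos - 1)
        $sUnit = StringMid($sBefore, StringInStr($sBefore, " ", 0, -1) + 1)
        _SQLite_Exec($hDb, "INSERT INTO gps VALUES (" & _SQLite_FastEscape($sUnit) & "," & _
                _SQLite_FastEscape(_ParseGpsLine($sLine)) & ");")
    WEnd
    FileClose($hIn)

    _SQLite_Exec($hDb, "COMMIT;")
    _SQLite_Exec($hDb, "CREATE INDEX IF NOT EXISTS idx_gps_unit ON gps(unit);")
    Return $hDb
EndFunc

; Hypothetical query step: returns the CSV rows for one unit as a string.
Func _QueryUnit($hDb, $sUnit)
    Local $aRows, $iRows, $iCols
    _SQLite_GetTable2d($hDb, "SELECT csv FROM gps WHERE unit = " & _SQLite_FastEscape($sUnit) & ";", $aRows, $iRows, $iCols)
    If @error Then Return SetError(1, 0, "")
    Local $sOut = ""
    For $i = 1 To $iRows ; row 0 holds the column names
        $sOut &= $aRows[$i][0] & @CRLF
    Next
    Return $sOut
EndFunc

; Hypothetical helper: offsets copied from the original script (assumed log format)
Func _ParseGpsLine($sLine)
    Local $sDate = StringMid($sLine, StringInStr($sLine, "Verbose: 0 : ") + 13, 10)
    Local $sTime = StringMid($sLine, StringInStr($sLine, "4G?`") - 12, 8)
    Local $sGeo = StringMid($sLine, StringInStr($sLine, "$GPRMC"))
    Return $sDate & "," & $sTime & "," & $sGeo
EndFunc
```

A typical session would call `_SQLite_Startup()`, then `_ImportGpsLog` once per log, then `_QueryUnit` as often as needed, and finish with `_SQLite_Close($hDb)` and `_SQLite_Shutdown()`.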
xenoranger Posted October 29, 2012 (Author, edited)
Sorry, jchd, I meant 100,000 KB (100 MB). Still, I do need to sort through super huge files. I did find out that someone else wrote an application in C++ that outputs the lines matching given criteria, so I haven't advanced this project much further.
Thank you guys for your help!!
Edited October 29, 2012 by xenoranger