xenoranger

Faster way to pull lines matching string?


xenoranger

So, I am still a total newb when it comes to programming. I mostly write scripts.

I wrote the following script to filter the GPS data for a single unit and write each matching line to a new file. The problem is that it cycles through the file one line at a time, and when my files have over 1,000,000 lines this can take days. Is there a faster way to pull just the lines that match certain criteria and write those to the new file?

$xnFile = InputBox("File???", "Which file do you want to review?","C:")
$xnUnit = Inputbox("Unit ID","What is the Unit ID you want?","Some Unit String")

$xnTargetFile = $xnFile & " - Unit - " & $xnUnit & " - Parsed.csv"
FileOpen($xnFile,0)
FileWrite($xnTargetFile,"")
FileOpen($xnTargetFile, 1)

$i = 1
ProgressOn("Creating CSV","Extracting matching data.","",0,0,16)
While $i > 0
    $xnCurrentLine = FileReadLine($xnFile,$i)
    ;MsgBox(1,"",$xnCurrentLine)

    if stringlen($xnCurrentLine) > 1 Then
        if StringInStr($xnCurrentLine,$xnUnit & " $GPRMC") > 40 Then
            $xnDate = stringmid($xnCurrentLine,stringinstr($xnCurrentLine,"Verbose: 0 : ") + 13,10)
            $xnTime = stringmid($xnCurrentLine,StringInStr($xnCurrentLine,"4G?`") - 12,8)
            $xnGeoData = stringmid($xnCurrentLine,StringInStr($xnCurrentLine,"$GPRMC"))
            $xnTargetLine = $xnDate & "," & $xnTime & "," & $xnGeoData & @CRLF
            FileWriteLine($xnTargetFile,$xnTargetLine)
        EndIf

        $i = $i + 1

        if round($i/1000000 * 100,1) < 90 Then
            ProgressSet(round($i/1000000 * 100,1),$i & " lines examined." & @cr & "Thank you for your patience.")
        Else
            ProgressSet(90,$i & " lines examined." & @cr & @cr & "Just a little longer... ..." & @CR & "Thank you for your patience.")
        EndIf
    Else
        $i = 0
    EndIf
WEnd
ProgressOff()

PS: The 1,000,000 in the ProgressSet lines is an estimate of the minimum number of lines in each log file. I added the progress bar because people complained that the script gave no sign of progress.

Melba23

xenoranger,

Welcome to the AutoIt forum. :)

You might be able to read the file into an array and manipulate it - that is faster than reading each line. However, I have never tried to do that with a 1M line file. Perhaps you could give it a try - look at _FileReadToArray in the Help file. Let us know if it throws an error or crashes - but if it succeeds please do NOT try and display the result with _ArrayDisplay as that would take forever! :D
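A minimal sketch of the array approach, assuming the match test is just the question's $xnUnit & " $GPRMC" string (the "> 40" position check is left out) and that matching lines are written unchanged. _FilterUnitLines is a hypothetical name. _FileReadToArray comes from File.au3 and by default puts the line count in element [0]. This has not been tried on a 1M-line file, so memory use is untested.

```autoit
#include <File.au3>

; Hypothetical helper: read the whole log into an array in one call, keep
; lines containing "<unit> $GPRMC", then write all matches in a single write.
; Returns the number of matches, or 0 with @error set on failure.
Func _FilterUnitLines($sInFile, $sOutFile, $sUnit)
    Local $aLines
    If Not _FileReadToArray($sInFile, $aLines) Then Return SetError(1, 0, 0)

    Local $sKey = $sUnit & " $GPRMC", $sOut = "", $iFound = 0
    For $i = 1 To $aLines[0] ; element [0] holds the line count
        If StringInStr($aLines[$i], $sKey) Then
            $sOut &= $aLines[$i] & @CRLF
            $iFound += 1
        EndIf
    Next

    Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erase old contents
    If $hOut = -1 Then Return SetError(2, 0, 0)
    FileWrite($hOut, $sOut)
    FileClose($hOut)
    Return $iFound
EndFunc
```

In the question's script this would be called as _FilterUnitLines($xnFile, $xnTargetFile, $xnUnit); the date/time extraction can be applied to each match before it is appended to $sOut.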

You might also be able to use a RegEx to extract the data directly if the required data had a suitable pattern to search for. Could you let us have an example (preferably less than 1M lines ;)) of a log file and indicate which lines you wish to extract. We can then see what might be possible. :)
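A rough sketch of the RegEx idea, assuming the wanted lines are simply "any line containing the unit ID followed by " $GPRMC"". _ExtractUnitLines is a hypothetical helper, and the whole file is held in memory as one string. The \Q...\E quoting makes the unit ID match literally (unless the ID itself contains \E).

```autoit
; Hypothetical helper: read the whole log with one FileRead, then let a
; single StringRegExp call return every line containing "<unit> $GPRMC".
; Returns the matching lines joined with @CRLF, or "" with @error set.
Func _ExtractUnitLines($sInFile, $sUnit)
    Local $hIn = FileOpen($sInFile, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, "")
    Local $sAll = FileRead($hIn)
    FileClose($hIn)

    ; (?m): ^ matches at the start of every line
    ; \Q...\E: unit ID and "$GPRMC" are taken literally
    ; [^\r\n]*: never runs past the end of the current line
    Local $sPattern = "(?m)^[^\r\n]*\Q" & $sUnit & " $GPRMC\E[^\r\n]*"
    Local $aMatches = StringRegExp($sAll, $sPattern, 3) ; 3 = array of all matches
    If @error Then Return SetError(2, 0, "") ; no match (or bad pattern)

    Local $sOut = ""
    For $i = 0 To UBound($aMatches) - 1
        $sOut &= $aMatches[$i] & @CRLF
    Next
    Return $sOut
EndFunc
```

The result can then be written to the target file with one FileWrite.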

M23


guinness

Also what is the filesize of that file?


kylomas

xenoranger,

You are reading a file line by line in a loop using the file name. Per the Help file entry for FileReadLine:

If a filename is given rather than a file handle - the file will be opened and closed during the function call - for parsing large text files this will be much slower than using filehandles.

You are opening the file for read but not getting a file handle. Change
FileOpen($xnFile,0)
to
local $hfl = FileOpen($xnFile,0)
then use the handle for reading like
$xnCurrentLine = FileReadLine($hfl,$i)

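Incidentally, you do not need to specify the line number if you are reading the file sequentially (again, in the help file!). Passing an incrementing line number also forces AutoIt to re-read the file from the beginning up to the requested line on each call, so the cost grows with the square of the line count.

This change will speed up your script; however, processing the file as an array or string variable will be much faster still. The entire file can be read into a variable (either an array or a string) in one instruction.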
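Putting the handle and sequential-read suggestions together, the loop might look like this sketch. _FilterByHandle is a hypothetical name; it keeps the unit match but drops the CSV conversion for brevity. Unlike the original loop, it stops at end of file rather than at the first short line, so blank lines in the log do not end the scan early.

```autoit
; Hypothetical helper: one FileOpen per file, FileReadLine without a line
; number (each call returns the next line), matches written via the handle.
; Returns the number of matches, or 0 with @error set on failure.
Func _FilterByHandle($sInFile, $sOutFile, $sUnit)
    Local $hIn = FileOpen($sInFile, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erase old contents
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    Local $sKey = $sUnit & " $GPRMC", $sLine, $iFound = 0
    While 1
        $sLine = FileReadLine($hIn) ; no line number: read the next line
        If @error Then ExitLoop     ; -1 = end of file, 1 = read error
        If StringInStr($sLine, $sKey) Then
            FileWriteLine($hOut, $sLine) ; adds @CRLF to the line
            $iFound += 1
        EndIf
    WEnd

    FileClose($hIn)
    FileClose($hOut)
    Return $iFound
EndFunc
```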

Good Luck,

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill

kylomas

xenoranger,

One other thing that I noticed: you should build an output string within your search loop and write the output file from that string when you exit the loop. When given a filename rather than a handle, FileWriteLine also opens and closes the file on every write. This may or may not matter to you, depending on the volume of matches you find.
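A sketch of the "build a string, write once" idea that also keeps the question's CSV conversion. The "Verbose: 0 : " and "4G?`" markers, the offsets, and the "> 40" check are copied unchanged from the original script and are assumed to fit the real log format. _GpsLineToCsv and _WriteCsvOnce are hypothetical names.

```autoit
; Hypothetical helper: convert one log line to "date,time,$GPRMC..." using
; the question's markers and offsets, or return "" if it is not a $GPRMC
; line for $sUnit (same "position > 40" test as the original).
Func _GpsLineToCsv($sLine, $sUnit)
    If StringInStr($sLine, $sUnit & " $GPRMC") <= 40 Then Return ""
    Local $sDate = StringMid($sLine, StringInStr($sLine, "Verbose: 0 : ") + 13, 10)
    Local $sTime = StringMid($sLine, StringInStr($sLine, "4G?`") - 12, 8)
    Local $sGeo = StringMid($sLine, StringInStr($sLine, "$GPRMC"))
    Return $sDate & "," & $sTime & "," & $sGeo
EndFunc

; Hypothetical helper: collect all CSV lines in one string, write the file once.
Func _WriteCsvOnce($sInFile, $sOutFile, $sUnit)
    Local $hIn = FileOpen($sInFile, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $sOut = "", $sLine, $sCsv, $iFound = 0
    While 1
        $sLine = FileReadLine($hIn) ; next line; @error -1 at end of file
        If @error Then ExitLoop
        $sCsv = _GpsLineToCsv($sLine, $sUnit)
        If $sCsv <> "" Then
            $sOut &= $sCsv & @CRLF
            $iFound += 1
        EndIf
    WEnd
    FileClose($hIn)

    Local $hOut = FileOpen($sOutFile, 2) ; 2 = write mode, erase old contents
    If $hOut = -1 Then Return SetError(2, 0, 0)
    FileWrite($hOut, $sOut)
    FileClose($hOut)
    Return $iFound
EndFunc
```

If the number of matches is very large, the string can instead be flushed to the open output handle every few thousand lines to cap memory use.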

kylomas


xenoranger

guinness asked: "Also what is the filesize of that file?"

The file is 100,000 MB in size.

I've been moved to other projects, so I'll have to come back to this one. I appreciate your help!

jchd

Searching a 100GB flat text file doesn't sound remotely reasonable to me, particularly using an interpreted language.

Your GPS data almost certainly has some structure, which you should use to build a database. Adequate indexing can then return query results in seconds, not days.

You can use SQLite for that.
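A hypothetical sketch of the SQLite route using AutoIt's SQLite.au3 UDF: load the $GPRMC lines into a table once, index it by unit, and each unit then becomes a quick query instead of a full scan. It assumes sqlite3.dll can be found by _SQLite_Startup, and that the unit ID is the single non-blank word directly before " $GPRMC" (the real log layout is not shown in the thread). _LoadGpsLog, _LinesForUnit, and the table and index names are made up.

```autoit
#include <SQLite.au3>

; Hypothetical loader: store every $GPRMC line with its unit ID.
; ASSUMPTION: unit ID = the single non-blank word right before " $GPRMC".
; Returns the number of rows inserted, or 0 with @error set.
Func _LoadGpsLog($hDB, $sInFile)
    Local $hIn = FileOpen($sInFile, 0) ; 0 = read mode
    If $hIn = -1 Then Return SetError(1, 0, 0)
    _SQLite_Exec($hDB, "CREATE TABLE IF NOT EXISTS gps (unit TEXT, line TEXT);")
    _SQLite_Exec($hDB, "BEGIN;") ; one transaction for the whole bulk load
    Local $sLine, $aUnit, $iRows = 0
    While 1
        $sLine = FileReadLine($hIn)
        If @error Then ExitLoop
        $aUnit = StringRegExp($sLine, "(\S+) \$GPRMC", 1) ; 1 = groups of first match
        If @error Then ContinueLoop ; not a $GPRMC line
        ; _SQLite_Escape returns the value quoted and escaped for SQL
        _SQLite_Exec($hDB, "INSERT INTO gps VALUES (" & _SQLite_Escape($aUnit[0]) & "," & _SQLite_Escape($sLine) & ");")
        $iRows += 1
    WEnd
    FileClose($hIn)
    _SQLite_Exec($hDB, "COMMIT;")
    ; Index built after the load so inserts are not slowed by it
    _SQLite_Exec($hDB, "CREATE INDEX IF NOT EXISTS gps_unit ON gps (unit);")
    Return $iRows
EndFunc

; Hypothetical query: all stored lines for one unit, joined with @CRLF.
Func _LinesForUnit($hDB, $sUnit)
    Local $aResult, $iRows, $iCols, $sOut = ""
    _SQLite_GetTable2d($hDB, "SELECT line FROM gps WHERE unit = " & _SQLite_Escape($sUnit) & ";", $aResult, $iRows, $iCols)
    For $i = 1 To $iRows ; row 0 holds the column names
        $sOut &= $aResult[$i][0] & @CRLF
    Next
    Return $sOut
EndFunc
```

Typical use: call _SQLite_Startup(), open a database file with _SQLite_Open(@ScriptDir & "\gps.db"), load each log once, then call _LinesForUnit as often as needed before _SQLite_Close and _SQLite_Shutdown. The single BEGIN/COMMIT matters: without it SQLite commits every INSERT separately, which is very slow for millions of rows.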


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget to download your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

xenoranger

Sorry, jchd, I meant 100,000 KB (about 100 MB). Still, I do need to sort through super huge files.

I did find out that someone else had written a C++ application that outputs the lines matching the criteria. As a result, I haven't advanced this project much further.

Thank you guys for your help!!
