Modified _FileCountLines()


Spiff59

It doesn't matter what the default buffer is. I didn't look. The point is I've seen a lot of code, esp. in C/C++, that uses a 1 KB buffer on the stack, as in the char buf[1024] type of thing. My strategy would be: if the file size is trivial, suck the whole thing into memory. If not, then pick some buffer size. As for the "best" buffer size, I would extrapolate it from empirical evidence (iow, trial and error with a PhD).
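The "trial and error" approach to picking a buffer size could be sketched roughly as follows. This is an illustration in Python rather than AutoIt, and the names count_newlines, time_buffer_sizes and make_sample_file are hypothetical helpers invented for the example, not anything from the UDF under discussion. Timings will vary by machine, disk and OS cache, so no particular result is claimed.

```python
import os
import tempfile
import time


def count_newlines(path, buffer_size):
    """Count b"\n" bytes in a file, reading buffer_size bytes at a time."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(buffer_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def time_buffer_sizes(path, sizes):
    """Return {buffer_size: (newline_count, seconds)} for each candidate size."""
    results = {}
    for size in sizes:
        start = time.perf_counter()
        lines = count_newlines(path, size)
        results[size] = (lines, time.perf_counter() - start)
    return results


def make_sample_file(line_count):
    """Write a throwaway text file with line_count newline-terminated lines."""
    handle = tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False)
    with handle:
        for number in range(line_count):
            handle.write(b"line %d\n" % number)
    return handle.name


if __name__ == "__main__":
    sample = make_sample_file(200_000)
    try:
        # Candidate sizes are arbitrary: 1 KB, 64 KB and 8 MB.
        candidates = [1024, 64 * 1024, 8 * 1024 * 1024]
        for size, (lines, seconds) in time_buffer_sizes(sample, candidates).items():
            print(f"{size:>9} bytes: {lines} newlines in {seconds:.4f} s")
    finally:
        os.remove(sample)
```

Running it against files that resemble the real workload is the only way to tell whether the buffer size matters for a given job.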

If it turns out the buffer size never matters (which is difficult to imagine) then use a small buffer. But the optimization that's likely the most useful would be to ask why this person is counting the lines in a file. Are they doing it 200 times per second? Or once in the whole run of the app, to find out how many lines are in an ini file? I would have two versions: a general text-file line counter and an industrial-strength one. But just sucking the whole file up and letting it fail if the file is too big seems a bit sloppy. There should at least be some size check: if file size < heuristic then do it at once, else block read it.
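A minimal sketch of that size check, written in Python rather than AutoIt. The function name count_lines and both constants are assumptions made for illustration; the 16 MB cutoff in particular is an arbitrary placeholder for the "heuristic", not a measured value.

```python
import os

# Hypothetical cutoff: files smaller than this are read in one go.
WHOLE_FILE_LIMIT = 16 * 1024 * 1024
# Hypothetical block size for larger files.
BLOCK_SIZE = 8 * 1024 * 1024


def count_lines(path, whole_file_limit=WHOLE_FILE_LIMIT, block_size=BLOCK_SIZE):
    """Count lines; a final line without a trailing newline still counts."""
    size = os.path.getsize(path)
    if size == 0:
        return 0
    newlines = 0
    last_byte = b""
    with open(path, "rb") as handle:
        if size < whole_file_limit:
            # Small file: read everything at once.
            data = handle.read()
            newlines = data.count(b"\n")
            last_byte = data[-1:]
        else:
            # Large file: read fixed-size blocks so memory use stays bounded.
            while True:
                block = handle.read(block_size)
                if not block:
                    break
                newlines += block.count(b"\n")
                last_byte = block[-1:]
    # An unterminated last line is counted as a line of its own.
    return newlines if last_byte == b"\n" else newlines + 1
```

Both branches return the same count for the same file; only the memory footprint differs.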

Edited by MilesAhead

Of course StringInStr is way faster for "flat match" like this. I know better but half my fucking brain is crunching regexp by itself so that's what gets typed first :)

About buffer size, I didn't experiment at all. 8 MB looked decent. IMHO varying the buffer size won't gain much in the general, average case.
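The flat-match versus regexp comparison can be tried out with a small timing harness. This is a Python analogue, not AutoIt: str.count stands in for a plain substring count and a compiled re pattern for the regexp route. The function names are hypothetical, and the harness only reports timings; it does not assume which one wins on a given machine.

```python
import re
import timeit

NEWLINE = re.compile(r"\n")


def count_flat(text):
    """Count newlines with a plain substring count."""
    return text.count("\n")


def count_regex(text):
    """Count newlines by iterating over regular expression matches."""
    return sum(1 for _ in NEWLINE.finditer(text))


def compare(text, repeat=20):
    """Return (flat_seconds, regex_seconds) for `repeat` runs of each counter."""
    # Both counters must agree before their timings are worth comparing.
    assert count_flat(text) == count_regex(text)
    flat = timeit.timeit(lambda: count_flat(text), number=repeat)
    regex = timeit.timeit(lambda: count_regex(text), number=repeat)
    return flat, regex


if __name__ == "__main__":
    sample = "some text on a line\n" * 500_000
    flat_seconds, regex_seconds = compare(sample)
    print(f"flat: {flat_seconds:.4f} s, regex: {regex_seconds:.4f} s")
```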

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


To me, when you use the term "general case" it sounds more like the trivial case. In the truly trivial case I would read a line inside a while loop and increment a variable. If the last line is blank I would count it or not according to my preference. Otherwise, since we are talking about a library function and not something I whip together for a single script, it seems the function should be more robust and have some reasons for the things it does. One reason would be knowing the file size. I don't understand the aversion to taking advantage of the function that provides that information.
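The trivial read-a-line-and-increment loop, with the trailing blank line counted or not by preference, might look like this in Python (not AutoIt). The function name and the count_trailing_blank flag are made up for the example, and UTF-8 decoding with replacement of bad bytes is an assumption about the input.

```python
def count_lines_simple(path, count_trailing_blank=False):
    """Count lines one at a time.

    If the file ends with a line terminator, the empty "line" after it is
    counted only when count_trailing_blank is True.
    """
    count = 0
    ends_with_newline = False
    # newline="" keeps the original line endings so they can be inspected.
    with open(path, "r", encoding="utf-8", errors="replace", newline="") as handle:
        for line in handle:
            count += 1
            ends_with_newline = line.endswith(("\n", "\r"))
    if count_trailing_blank and ends_with_newline:
        count += 1
    return count
```

For a file containing "a", "b" and a final line terminator, this returns 2 by default and 3 with count_trailing_blank=True.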

I don't think there really is an average case here. If I need to know the number of lines in an ini file, my while loop is fine. If I need some optimization because for some strange reason I'm going to parse the Library of Congress data using scripting, then an optimized version should be called. I don't get the fascination with benchmarks for the trivial case. If it's a big job that needs optimizing, then get some facts about the job.


Of course StringInStr is way faster for "flat match" like this. I know better but half my fucking brain is crunching regexp by itself so that's what gets typed first :)

You meant StringReplace() ?

Edit: Never mind, I think you were referring to Malkey's post.

If it turns out the buffer size never matters (which is difficult to imagine) then use a small buffer.

No, I'm pretty sure it matters. I never verified it, but we've talked about it a couple of times throughout this thread. I think Tlem said it failed once he hit 170 MB.

Edited by Beege

@MilesAhead

AutoIt is not necessarily aimed at an elite, so general optimization is not within everyone's reach.

As stated:

the software was primarily intended to create automation scripts

Usage

AutoIt can be used to produce utility software for Microsoft Windows and automate common tasks, such as website monitoring, network monitoring, disk defragging and backup. It is also used to simulate application users, whereby an application is driven by an AutoIt script in place of manual application control during software testing. It is also commonly used for developing Computer game bots, for automating in-game tasks.

As we all know, automation is designed to do repetitive tasks. When a function in a script is not optimized, it burdens the whole program. Saving 2 seconds may mean nothing to you, but when the operation has to be repeated 5000 times per day, that is nearly three hours lost!

I think your computer has more interesting things to do than waste time on an unoptimized function.

In your daily work you certainly do not run into this kind of inconvenience, but you are not the only one using AutoIt, so do not think only of yourself.

Edited by Tlem

Best regards, Thierry


Yeah, StringReplace, not StringInStr. I was exhausted from driving too long.

When we talk algorithms, what we refer to as the "general case" is "all cases that _may_ occur" (regardless of frequency). The "practical case" refers to what you can expect on average in real-world use. That may depend heavily on the domain you're doing the job for.

Think of a string manipulation function that would crash or give erroneous result(s) when two x00 are found in a row in the subject string. If the definition of "string" you refer to allows the presence of x00, then it's definitely not usable in the general case. In practice, you may consider things differently.

Now, the "buffer size" question is useless nitpicking. Please ask yourself how frequently _you_ have found yourself compelled to use this function! Counting lines without reading them is very rarely meaningful, except when you need to get an idea of how large a project is, or situations like that. IMHO, in such contexts, the actual, precise number is essentially irrelevant (only the order of magnitude matters), and whether the function is highly optimized or not is also a non sequitur.

Given the rare need to use that function, it would be ridiculous to even consider adding code to "optimize" the buffer size for the size of the file to be processed in order to spare an insignificant fraction of time. Higher optimization would also require deciding whether the media is HD or SSD, which sector or block size the device uses, which cache size the device has, the probability of parts of the file being in the OS cache, the rotation speed of the HD (probably the most important factor here) ... (I'm joking!)



Please ask yourself how frequently _you_ have found yourself compelled to use this function!

Frequently for me. At least twice every ten years.

I don't think I ever have, because 10 times out of 10 I need to read the file anyway.



As for how often this function is used, it is all a matter of point of view.

Personally, if a job can be done faster, why not take advantage of it?

Having said that, with the new functions, the size of the file is not a problem any more. :)

Now, even if I rarely use this function, I have an alternative that is faster than mine for big files, so for me this thread has not been useless. ^^

Best regards, Thierry


  • 6 years later...

Good to know.

