AlmarM (posted October 9, 2009):

Hi,

I needed to count a certain word in a file (~7.4 MB) and I made a little script for it.

    $Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
    $word = 0
    $word_str = "test"

    $fo = FileOpen($fod, 0)
    $read = FileRead($fo)
    $spl = StringSplit($read, Chr(10))

    For $i = 1 To $spl[0]
        $Math = (100 / $spl[0]) * $i
        $readline = FileReadLine($fo, $i)
        ToolTip(Round($Math, 2) & "%", 0, 0)
        If StringInStr($readline, $word_str) Then
            $word += 1
        EndIf
    Next

    FileClose($fo)
    FileWrite(@DesktopDir & "\word_count.txt", $word)
    MsgBox(0, "", "'word' found: " & $word)

It's a bit slow; it takes 4+ hours to search all the lines. Can anyone tweak my script so it's a bit faster?

AlmarM

Signature:
Minesweeper: A minesweeper game created in AutoIt, source available.
_Mouse_UDF: A UDF for registering functions to mouse events, made in pure AutoIt.
2D Hitbox Editor: A 2D hitbox editor for quick creation of 2D sphere and rectangle hitboxes.

Valuater (posted October 9, 2009; edited October 9, 2009 by Valuater):

***NOT TESTED

    $Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
    $word_count = 0
    $word_str = "test"

    $fo = FileOpen($Fod, 0)
    $read = FileRead($fo)

    StringReplace($read, $word_str, "")
    $word_count = @extended

    MsgBox(0, "", "'word' found: " & $word_count)

8)

Melba23 (Moderator; posted October 9, 2009):

AlmarM,

I would do it this way:

    $Fod = FileOpenDialog("Open .txt", @DeskTopDir, "Text Files (*.txt)")
    $word_str = "test"

    $aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)

    If IsArray($aArray) Then
        MsgBox(0, "", $word_str & " found " & UBound($aArray) & " times")
    Else
        MsgBox(0, "", $word_str & " not found")
    EndIf

M23

Signature:
Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort: Sort arrays on multiple columns
ChooseFileFolder: Single and multiple selections from specified path treeview listing
Date_Time_Convert: Easily convert date/time formats, including the language used
ExtMsgBox: A highly customisable replacement for MsgBox
GUIExtender: Extend and retract multiple sections within a GUI
GUIFrame: Subdivide GUIs into many adjustable frames
GUIListViewEx: Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx: Check/clear parent and child checkboxes in a TreeView
Marquee: Scrolling tickertape GUIs
NoFocusLines: Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify: Small notifications on the edge of the display
Scrollbars: Automatically sized scrollbars with a single command
StringSize: Automatically size controls to fit text
Toast: Small GUIs which pop out of the notification area

UEZ (posted October 9, 2009):

What about this method:

    ;coded by UEZ
    #include <WinAPI.au3>

    Global $nBytes

    $Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
    $word = 0
    $word_str = "test"

    $size = FileGetSize($Fod)
    $tBuffer = DllStructCreate("byte[" & $size & "]")

    $hFile = _WinAPI_CreateFile($Fod, 2, 2)
    _WinAPI_SetFilePointer($hFile, 0)
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
    _WinAPI_CloseHandle($hFile)

    $sText = BinaryToString(DllStructGetData($tBuffer, 1))
    $count = StringReplace($sText, $word_str, $word_str)
    $numreplacements = @extended

    MsgBox(0, "", "'word' found: " & $numreplacements)

UEZ

Signature:
Please don't send me any personal message and ask for support! I will not reply!
Selection of finest graphical examples at Codepen.io

AlmarM (Author; posted October 9, 2009):

Tested all methods; they work fine! Thank you.

Only, with my scan, it finds a certain word '5111' times. With these scans, '183920' times. These scans are correct, right?

Valuater (posted October 9, 2009):

5k and 183k is a major difference.

If you used my original before the edit, it only replaced spaces... be sure to test the one that is there now.

Did all 3 example scripts find 183k plus?

8)

AlmarM (Author; posted October 9, 2009; edited October 9, 2009 by AlmarM):

> Valuater wrote:
> 5k and 183k is a major difference. If you used my original before the edit, it only replaced spaces... be sure to test the one that is there now. Did all 3 example scripts find 183k plus? 8)

All same results.

Results:
Mine: 5111
Valuater: 183920
Melba: 183920
UEZ: 182920

Valuater (posted October 9, 2009):

Well, if 3 different scripts from 3 capable people come up with the same number... it must be correct.

8)

Melba23 (Moderator; posted October 9, 2009):

Val,

Please be assured that I take "capable" as a compliment of the highest order!

M23

UEZ (posted October 9, 2009; edited October 9, 2009 by UEZ):

Another interesting aspect is the benchmark of these 3 codes:

    #include <Timers.au3>
    #include <WinAPI.au3>

    Global $nBytes, $Fod, $word_count
    Global $word_str = "test"

    $Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")

    ; Each timing variable matches the function it measures
    ; (as posted, Bench3's time was stored in $bench1 and Bench1's in $bench3).
    $bench = _Timer_Init()
    $c3 = Bench3()
    $bench3 = Round(_Timer_Diff($bench), 4)

    $bench = _Timer_Init()
    $c2 = Bench2()
    $bench2 = Round(_Timer_Diff($bench), 4)

    $bench = _Timer_Init()
    $c1 = Bench1()
    $bench1 = Round(_Timer_Diff($bench), 4)

    ; Output is in run order: Bench3 (WinAPI read), Bench2 (regex), Bench1 (StringReplace)
    ConsoleWrite($bench3 & " ms. Found: " & $c3 & @CRLF)
    ConsoleWrite($bench2 & " ms. Found: " & $c2 & @CRLF)
    ConsoleWrite($bench1 & " ms. Found: " & $c1 & @CRLF)

    ; Valuater's method: FileRead + StringReplace, count taken from @extended
    Func Bench1()
        $fo = FileOpen($Fod, 0)
        $read = FileRead($fo)
        FileClose($fo) ; close the handle (missing in the posted version)
        StringReplace($read, $word_str, "")
        $word_count = @extended
        Return $word_count
    EndFunc

    ; Melba23's method: case-insensitive regex, count = number of matches
    Func Bench2()
        Local $count
        $aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)
        If IsArray($aArray) Then
            $count = UBound($aArray)
        Else
            $count = 0
        EndIf
        Return $count
    EndFunc

    ; UEZ's method: read via WinAPI into a byte buffer, then StringReplace
    Func Bench3()
        Local $numreplacements
        Local $size = FileGetSize($Fod)
        Local $tBuffer = DllStructCreate("byte[" & $size & "]")
        Local $hFile = _WinAPI_CreateFile($Fod, 2, 2)
        _WinAPI_SetFilePointer($hFile, 0)
        _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
        _WinAPI_CloseHandle($hFile)
        $sText = BinaryToString(DllStructGetData($tBuffer, 1))
        $count = StringReplace($sText, $word_str, $word_str)
        $numreplacements = @extended
        Return $numreplacements
    EndFunc

Here is the result for a 2 MB text file:

    555.6136 ms. Found: 18
    69.0422 ms. Found: 18
    560.3577 ms. Found: 18

And the winner is... Melba23

UEZ

Melba23 (Moderator; posted October 9, 2009):

UEZ,

Fascinating results. I always knew/believed that the String functions were slow (relatively speaking), but that is an amazing difference.

Thank you very much for having taken the trouble to benchmark the 3 versions.

M23

Mat (posted October 9, 2009):

I can do it in one line, but this is also to demonstrate how slow the SetError function appears to be... It's the only difference, and it's a big difference.

    $hTimer = TimerInit()
    $test1 = _Test1(@ScriptFullPath, "test")
    $test1time = TimerDiff($hTimer)

    $hTimer = TimerInit()
    $test2 = _Test2(@ScriptFullPath, "test")
    $test2time = TimerDiff($hTimer)

    ; Both timings are shown in milliseconds. As originally posted, only the
    ; first one was divided by 1000, so the two values were in different units.
    MsgBox(0, "results", "1: " & $test1 & @TAB & $test1time & @CRLF & "2: " & $test2 & @TAB & $test2time)

    Func _Test1($sFile, $sString)
        Return UBound(StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3))
    EndFunc   ;==>_Test1

    Func _Test2($sFile, $sString)
        Return SetError(0, 0, UBound(StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3)))
    EndFunc   ;==>_Test2

Don't ask why it's relevant; I was checking whether returning a function call would also return the @error value. It doesn't, but the time difference is instantly noticeable.

Mat

Jos (Developer; posted October 9, 2009):

The difference could be because the OP's original script only counts the first occurrence on each line, never multiple occurrences.

Jos

Signature:
Live for the present, Dream of the future, Learn from the past.

Mat (posted October 9, 2009):

The StringRegExp method does not cope with special characters in the search word. There is a solution for this on the forums somewhere... but that could slow Melba's version down a bit (and mine).

Mat

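One possible way to handle the special-character problem Mat mentions is sketched below. It is not the forum solution he refers to; it simply wraps the search word in PCRE's `\Q...\E` verbatim markers so that characters such as `.` or `*` are matched literally. The helper name `_CountLiteral` is hypothetical, and the sketch assumes the word itself never contains the sequence `\E`.

```autoit
; Hypothetical helper (not from the thread): count literal, case-insensitive
; occurrences of $sWord in $sText using the regex approach.
Func _CountLiteral($sText, $sWord)
    ; \Q...\E tells PCRE to treat everything in between as plain text.
    ; Assumption: $sWord does not itself contain "\E".
    Local $aHits = StringRegExp($sText, "(?i)\Q" & $sWord & "\E", 3)
    If @error Then Return 0 ; no match (or an invalid pattern)
    Return UBound($aHits)
EndFunc   ;==>_CountLiteral

; Without quoting, the pattern "a.b" would also match "axb";
; with quoting only the two literal "a.b" should be counted.
ConsoleWrite(_CountLiteral("a.b a.b axb", "a.b") & @CRLF)
```

Whether the extra quoting costs measurable time on a large file was not benchmarked in this thread.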
AlmarM (Author; posted October 9, 2009):

Results (~7 MB file):

    1358.8447 ms. Found: 183920
    457.6034 ms. Found: 183920
    1396.8972 ms. Found: 183920

Still think it's weird...

Jos (Developer; posted October 9, 2009):

> AlmarM wrote:
> Results (~7 MB file)
> 1358.8447 ms. Found: 183920
> 457.6034 ms. Found: 183920
> 1396.8972 ms. Found: 183920
> Still think it's weird...

What is weird?

AlmarM (Author; posted October 9, 2009):

> Jos wrote:
> What is weird?

Well, the fact that my script counts 5111 and all these ones count '180000+'.

Guess it's just me.

Jos (Developer; posted October 9, 2009):

> AlmarM wrote:
> Well, the fact that my script counts 5111 and all these ones count '180000+'. Guess it's just me.

Did you read my comment about that?

UEZ (posted October 9, 2009; edited October 9, 2009 by UEZ):

> AlmarM wrote:
> Well, the fact that my script counts 5111 and all these ones count '180000+'. Guess it's just me.

As Jos mentioned, you used StringInStr(), which counts only one occurrence per line. That means the word appears more than once on some lines!

By the way, I want to add that Valuater's code and my code are very similar. We both used StringReplace() to count occurrences, and that is the reason why both benchmark scores are so close.

When I wrote mine, I hadn't noticed any replies yet. It was just a coincidence that we had a similar idea!

UEZ

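To make the difference between the two counts concrete, here is a small sketch on a made-up three-line string (the sample text and variable names are assumptions for illustration, not data from the thread). It counts lines that contain the word with StringInStr(), as the original script does, and total occurrences with StringReplace() and @extended, as Valuater's and UEZ's scripts do.

```autoit
; Made-up sample text: the word appears twice on line 1, never on line 2, once on line 3.
Local $sText = "test test" & @LF & "no match here" & @LF & "test"
Local $sWord = "test"

; Approach of the original script: at most one hit per line.
Local $aLines = StringSplit($sText, @LF)
Local $iLinesWithWord = 0
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], $sWord) Then $iLinesWithWord += 1
Next

; Approach of the faster scripts: every occurrence is counted.
StringReplace($sText, $sWord, $sWord)
Local $iTotal = @extended ; number of replacements performed

ConsoleWrite("Lines containing the word: " & $iLinesWithWord & @CRLF)
ConsoleWrite("Total occurrences:         " & $iTotal & @CRLF)
```

On this sample the first number should come out smaller than the second, which is the same effect that would produce a gap like 5111 versus 183920 on a file where the word repeats many times per line.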
AlmarM (Author; posted October 9, 2009):

> Jos wrote:
> Did you read my comment about that?

Sorry, missed that one!
