AlmarM

Tweak my script~

AlmarM

Hi,

I needed to count the occurrences of a certain word in a file (about 7.4 MB), so I wrote a little script for it.

$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
$word = 0
$word_str = "test"
$fo = FileOpen($fod, 0)
$read = FileRead($fo)
$spl = StringSplit($read, Chr(10))

For $i = 1 To $spl[0]
    $Math = (100 / $spl[0]) * $i
    $readline = FileReadLine($fo, $i)
    ToolTip(Round($Math, 2) & "%", 0, 0)
    If StringInStr($readline, $word_str) Then
        $word += 1
    EndIf
Next
FileClose($fo)

FileWrite(@DesktopDir & "\word_count.txt", $word)
MsgBox(0, "", "'word' found: " & $word)

It's a bit slow: it takes more than 4 hours to search all the lines.

Can anyone tweak my script so it's a bit faster?

AlmarM


Minesweeper

A minesweeper game created in autoit, source available.

_Mouse_UDF

An UDF for registering functions to mouse events, made in pure autoit.

2D Hitbox Editor

A 2D hitbox editor for quick creation of 2D sphere and rectangle hitboxes.

Melba23

AlmarM,

I would do it this way:

$Fod = FileOpenDialog("Open .txt", @DeskTopDir, "Text Files (*.txt)")
$word_str = "test"

$aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)

If IsArray($aArray) Then
    MsgBox(0, "", $word_str & " found " & UBound($aArray) & " times")
Else
    MsgBox(0, "", $word_str & " not found")
EndIf

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ---------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

UEZ

What about this method:

;coded by UEZ
#include <WinAPI.au3>
Global $nBytes
$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
$word = 0
$word_str = "test"
$size = FileGetSize($Fod)

$tBuffer = DllStructCreate("byte[" & $size & "]")
$hFile = _WinAPI_CreateFile($Fod, 2, 2)
_WinAPI_SetFilePointer($hFile, 0)
_WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
_WinAPI_CloseHandle($hFile)
$sText = BinaryToString(DllStructGetData($tBuffer, 1))
$count = StringReplace($sText, $word_str, $word_str)
$numreplacements = @extended
MsgBox(0, "", "'word' found: " & $numreplacements)

UEZ


Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

AlmarM

Tested all methods, and they work fine! Thank you :)

One thing, though: my scan finds the word 5111 times, while these scans find it 183920 times.

These scans are correct, right?


Valuater

5k and 183k is a major difference. If you used my original before the edit, it only replaced spaces... Be sure to test the one that is there now. Did all 3 example scripts find 183k plus?

8)


AlmarM

All same results.

Results:

Mine: 5111

Valuater: 183920

Melba: 183920

UEZ: 182920

Valuater

Well, if 3 different scripts from 3 capable people come up with the same number...

it must be correct.

8)


Melba23

Val,

Please be assured that I take "capable" as a compliment of the highest order! :)

M23


UEZ

Another interesting aspect is the benchmark of these three scripts:

#include <Timers.au3>
#include <WinAPI.au3>
Global $nBytes, $Fod, $word_count
Global $word_str = "test"
$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")

; Time each method; every timing is stored next to the count from the same function
$bench = _Timer_Init()
$c3 = Bench3()
$bench3 = Round(_Timer_Diff($bench), 4)

$bench = _Timer_Init()
$c2 = Bench2()
$bench2 = Round(_Timer_Diff($bench), 4)

$bench = _Timer_Init()
$c1 = Bench1()
$bench1 = Round(_Timer_Diff($bench), 4)

; Output order: Bench3 (WinAPI read + StringReplace), Bench2 (StringRegExp), Bench1 (FileRead + StringReplace)
ConsoleWrite($bench3 & " ms. Found: " & $c3 & @CRLF)
ConsoleWrite($bench2 & " ms. Found: " & $c2 & @CRLF)
ConsoleWrite($bench1 & " ms. Found: " & $c1 & @CRLF)


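Func Bench1()
    ; Read the whole file, then count via StringReplace (@extended holds the number of replacements)
    Local $fo = FileOpen($Fod, 0)
    Local $read = FileRead($fo)
    FileClose($fo) ; close before StringReplace so that @extended is not overwritten afterwards
    StringReplace($read, $word_str, "")
    $word_count = @extended
    Return $word_count
EndFunc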

Func Bench2()
    Local $count
    $aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)
    If IsArray($aArray) Then
        $count = UBound($aArray)
    Else
        $count = 0
    EndIf
    Return $count
EndFunc

Func Bench3()
    Local $numreplacements
    Local $size = FileGetSize($Fod)
    Local $tBuffer = DllStructCreate("byte[" & $size & "]")
    Local $hFile = _WinAPI_CreateFile($Fod, 2, 2)
    _WinAPI_SetFilePointer($hFile, 0)
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
    _WinAPI_CloseHandle($hFile)
    $sText = BinaryToString(DllStructGetData($tBuffer, 1))
    $count = StringReplace($sText, $word_str, $word_str)
    $numreplacements = @extended
    Return $numreplacements
EndFunc

Here is the result for a 2 MB text file:

555.6136 ms. Found: 18
69.0422 ms. Found: 18
560.3577 ms. Found: 18

And the winner is... Melba23 :)

UEZ

Melba23

UEZ,

Fascinating results. :) I always knew/believed that the String functions were slow (relatively speaking), but that is an amazing difference. Thank you very much for taking the trouble to benchmark the 3 versions.

M23


Mat

I can do it in one line :), but this is also to demonstrate how slow the SetError function appears to be... It's the only difference between the two functions, and it's a big difference.

$hTimer = TimerInit ()
$test1 = _Test1 (@Scriptfullpath, "test")
$test1time = TimerDiff ($hTimer)

$hTimer = TimerInit ()
$test2 = _Test2 (@Scriptfullpath, "test")
$test2time = TimerDiff ($hTimer)

; Report both timings in the same unit (TimerDiff returns milliseconds)
MsgBox (0, "results", "1: " & $test1 & @TAB & $test1time & " ms" & @CRLF & "2: " & $test2 & @TAB & $test2time & " ms")

Func _Test1 ($sFile, $sString)
   Return UBound (StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3))
EndFunc   ;==>_Test1

Func _Test2 ($sFile, $sString)
   Return SetError (0, 0, UBound (StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3)))
EndFunc   ;==>_Test2

Don't ask why it's relevant: I was checking whether returning a function call's result would also return its @error value. It doesn't, but the time difference is instantly noticeable.
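
A single timed run per function is sensitive to call order and to whether the file is already cached, so a comparison like the one above may be easier to trust when each function is timed several times and averaged. The following is only a sketch under that assumption; `_AverageMs`, `_CountPlain` and `_CountWithSetError` are hypothetical names introduced for illustration, and the run count of 10 is arbitrary.

```autoit
; Hypothetical harness, not from the thread: average several timed runs of a
; function that is passed in by name.
Func _AverageMs($sFuncName, $sFile, $sString, $iRuns = 10)
    Local $hTimer, $fTotal = 0
    ; Warm-up run so that the file is cached before anything is measured
    Call($sFuncName, $sFile, $sString)
    For $i = 1 To $iRuns
        $hTimer = TimerInit()
        Call($sFuncName, $sFile, $sString)
        $fTotal += TimerDiff($hTimer) ; TimerDiff returns milliseconds
    Next
    Return $fTotal / $iRuns
EndFunc   ;==>_AverageMs

; Same one-liner as in the post, returned directly
Func _CountPlain($sFile, $sString)
    Return UBound(StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3))
EndFunc   ;==>_CountPlain

; Same one-liner, but returned through SetError
Func _CountWithSetError($sFile, $sString)
    Return SetError(0, 0, UBound(StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3)))
EndFunc   ;==>_CountWithSetError

ConsoleWrite("Plain:    " & Round(_AverageMs("_CountPlain", @ScriptFullPath, "test"), 3) & " ms" & @CRLF)
ConsoleWrite("SetError: " & Round(_AverageMs("_CountWithSetError", @ScriptFullPath, "test"), 3) & " ms" & @CRLF)
```

Both averages are printed in milliseconds, so they can be compared directly.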

Mat

Mat

The StringRegExp method does not handle search strings that contain special (regex) characters. There is a solution for this on the forums somewhere... but it could slow Melba's version (and mine) down a bit.
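
One possible way around the special-character problem, offered as a sketch and not as the forum solution mentioned above, is to wrap the search string in the PCRE quoting markers `\Q` and `\E` so that the whole string is matched literally. The helper name `_CountLiteral` is hypothetical, and the sample text is made up for illustration.

```autoit
; Hypothetical helper, not from the thread: count case-insensitive
; occurrences of a literal string with StringRegExp.
Func _CountLiteral($sText, $sWord)
    ; \Q...\E makes the regex engine treat everything in between as plain text.
    ; A literal "\E" inside the word would end the quoting early, so it is split:
    ; close the quote, emit an escaped backslash followed by E, reopen the quote.
    Local $sQuoted = "\Q" & StringReplace($sWord, "\E", "\E\\E\Q", 0, 1) & "\E"
    Local $aHits = StringRegExp($sText, "(?i)" & $sQuoted, 3)
    If @error Then Return 0 ; no match (or a bad pattern): report zero
    Return UBound($aHits)
EndFunc   ;==>_CountLiteral

; The dot is matched literally here, so "axb" is not counted as a hit.
ConsoleWrite(_CountLiteral("a.b a.b axb", "a.b") & @CRLF)
```

The quoting step is a single extra StringReplace on the short search string, so any slowdown should come from the pattern itself and not from preparing it.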

Mat

AlmarM

Results (file of about 7 MB):

1358.8447 ms. Found: 183920
457.6034 ms. Found: 183920
1396.8972 ms. Found: 183920

I still think it's weird...


Jos

What is weird?

Live for the present,
Dream of the future,
Learn from the past.
  :)

AlmarM

Well, the fact that my script counts 5111 while all of these count 180000+.

Guess it's just me. :)


UEZ

As Jos mentioned, you used StringInStr(), which registers at most one occurrence per line. That means the word appears more than once on some lines!
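
The difference between the two ways of counting can be seen on a tiny made-up sample. This is only an illustrative sketch: the sample text is an assumption and is not taken from the file discussed in this thread.

```autoit
; Illustrative data, not from the thread: three lines, "test" appears 4 times in total.
Local $sText = "test test" & @LF & "no match here" & @LF & "a test, another test"
Local $aLines = StringSplit($sText, @LF)

; Per-line check, as in the original loop: a matching line adds only 1,
; however many times the word appears on it.
Local $iLines = 0
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], "test") Then $iLines += 1
Next

; Per-occurrence count: StringReplace reports the number of replacements in @extended.
StringReplace($sText, "test", "")
Local $iHits = @extended

ConsoleWrite("Lines containing the word: " & $iLines & @CRLF) ; 2
ConsoleWrite("Total occurrences: " & $iHits & @CRLF)          ; 4
```

The first number corresponds to what the line-by-line script counts, the second to what the StringReplace and StringRegExp versions count.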

Btw, I want to add that Valuater's code and my code are very similar! We both used StringReplace() to count occurrences, and that is the reason why both benchmark scores are so close.

When I wrote mine, nobody had replied yet (I didn't notice any reply in the meantime). It was just a coincidence that we had a similar idea!

UEZ

AlmarM

Did you read my comment about that?

Sorry, missed that one! :)
