
Tweak my script~


Hi,

I needed to count the occurrences of a certain word in a file (about 7.4 MB), so I wrote a little script for it.

$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
$word = 0
$word_str = "test"
$fo = FileOpen($fod, 0)
$read = FileRead($fo)
$spl = StringSplit($read, Chr(10))

For $i = 1 To $spl[0]
    $Math = (100 / $spl[0]) * $i
    $readline = FileReadLine($fo, $i)
    ToolTip(Round($Math, 2) & "%", 0, 0)
    If StringInStr($readline, $word_str) Then
        $word += 1
    EndIf
Next
FileClose($fo)

FileWrite(@DesktopDir & "\word_count.txt", $word)
MsgBox(0, "", "'word' found: " & $word)

It's a bit slow: it takes more than 4 hours to search all the lines.

Can anyone tweak my script so it's a bit faster?

AlmarM

Minesweeper

A minesweeper game created in autoit, source available.

_Mouse_UDF

An UDF for registering functions to mouse events, made in pure autoit.

2D Hitbox Editor

A 2D hitbox editor for quick creation of 2D sphere and rectangle hitboxes.


AlmarM,

I would do it this way:

$Fod = FileOpenDialog("Open .txt", @DeskTopDir, "Text Files (*.txt)")
$word_str = "test"

$aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)

If IsArray($aArray) Then
    MsgBox(0, "", $word_str & " found " & UBound($aArray) & " times")
Else
    MsgBox(0, "", $word_str & " not found")
EndIf
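
One caveat with building the pattern from $word_str: characters such as "." or "(" in the search word would be read as regex syntax. A minimal sketch of one way around that, assuming the word never contains the sequence "\E" (the helper name _CountWord is made up for this example and is not part of the script above):

```autoit
; Sketch only: _CountWord is a hypothetical helper.
Func _CountWord($sText, $sWord)
    ; \Q...\E tells PCRE to treat everything in between as literal text.
    ; Assumption: $sWord does not itself contain "\E".
    Local $aMatches = StringRegExp($sText, "(?i)\Q" & $sWord & "\E", 3)
    ; With no match, StringRegExp sets @error and does not return an array.
    If Not IsArray($aMatches) Then Return 0
    Return UBound($aMatches)
EndFunc   ;==>_CountWord

; "a.b" and "A.B" match literally; "axb" does not, so this prints 2.
ConsoleWrite(_CountWord("a.b axb A.B", "a.b") & @CRLF)
```

Without the \Q...\E wrapper, the "." would match any character and "axb" would be counted as well.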

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


What about this method:

;coded by UEZ
#include <WinAPI.au3>
Global $nBytes
$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")
$word = 0
$word_str = "test"
$size = FileGetSize($Fod)

$tBuffer = DllStructCreate("byte[" & $size & "]")
$hFile = _WinAPI_CreateFile($Fod, 2, 2) ; 2 = open existing file, 2 = read access
_WinAPI_SetFilePointer($hFile, 0)
_WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
_WinAPI_CloseHandle($hFile)
$sText = BinaryToString(DllStructGetData($tBuffer, 1))
$count = StringReplace($sText, $word_str, $word_str)
$numreplacements = @extended
MsgBox(0, "", "'word' found: " & $numreplacements)

UEZ

Please don't send me any personal message and ask for support! I will not reply!

Selection of finest graphical examples at Codepen.io

The own fart smells best!
Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!
¯\_(ツ)_/¯  ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ


Tested all methods, works fine! Thank you :)

However, my own scan finds the word 5111 times, while these scans find it 183920 times.

These scans are correct, right?

AlmarM

5k and 183k is a major difference. If you used my original version from before the edit, it only replaced spaces... be sure to test the one that is there now. Did all 3 example scripts find 183k plus?

8)

All same results.

Results:

Mine: 5111

Valuater: 183920

Melba: 183920

UEZ: 182920

AlmarM

Val,

Please be assured that I take "capable" as a compliment of the highest order! :)

M23


Another interesting aspect is a benchmark of these 3 scripts:

#include <Timers.au3>
#include <WinAPI.au3>
Global $nBytes, $Fod, $word_count
Global $word_str = "test"
$Fod = FileOpenDialog("Open .txt", @DesktopDir, "Text Files (*.txt)")

; Run order: Bench3, Bench2, Bench1. Each $benchN holds the time of BenchN().
$bench = _Timer_Init()
$c3 = Bench3()
$bench3 = Round(_Timer_Diff($bench), 4)

$bench = _Timer_Init()
$c2 = Bench2()
$bench2 = Round(_Timer_Diff($bench), 4)

$bench = _Timer_Init()
$c1 = Bench1()
$bench1 = Round(_Timer_Diff($bench), 4)

; Output in run order, with each time next to the count from the same function.
ConsoleWrite($bench3 & " ms. Found: " & $c3 & @CRLF)
ConsoleWrite($bench2 & " ms. Found: " & $c2 & @CRLF)
ConsoleWrite($bench1 & " ms. Found: " & $c1 & @CRLF)


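Func Bench1()
    Local $fo = FileOpen($Fod, 0) ; 0 = read mode
    Local $read = FileRead($fo)
    FileClose($fo) ; release the handle once the text is in memory
    StringReplace($read, $word_str, "") ; the replacement count ends up in @extended
    $word_count = @extended
    Return $word_count
EndFunc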

Func Bench2()
    Local $count
    $aArray = StringRegExp(FileRead($Fod), "(?i)(" & $word_str & ")", 3)
    If IsArray($aArray) Then
        $count = UBound($aArray)
    Else
        $count = 0
    EndIf
    Return $count
EndFunc

Func Bench3()
    Local $numreplacements
    Local $size = FileGetSize($Fod)
    Local $tBuffer = DllStructCreate("byte[" & $size & "]")
    Local $hFile = _WinAPI_CreateFile($Fod, 2, 2)
    _WinAPI_SetFilePointer($hFile, 0)
    _WinAPI_ReadFile($hFile, DllStructGetPtr($tBuffer), $size, $nBytes)
    _WinAPI_CloseHandle($hFile)
    $sText = BinaryToString(DllStructGetData($tBuffer, 1))
    $count = StringReplace($sText, $word_str, $word_str)
    $numreplacements = @extended
    Return $numreplacements
EndFunc

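Here is the result for a 2 MB text file. The timings appear in the order the functions ran: Bench3 (WinAPI read + StringReplace), Bench2 (StringRegExp), Bench1 (FileRead + StringReplace):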

555.6136 ms. Found: 18
69.0422 ms. Found: 18
560.3577 ms. Found: 18

And the winner is... Melba23 :)

UEZ


UEZ,

Fascinating results. :) I always knew/believed that the String functions were slow (relatively speaking), but that is an amazing difference. Thank you very much for taking the trouble to benchmark the 3 versions.

M23


I can do it in one line :), but this is also to demonstrate how slow the SetError function appears to be... It's the only difference between the two functions, and it's a big difference.

$hTimer = TimerInit ()
$test1 = _Test1 (@Scriptfullpath, "test")
$test1time = TimerDiff ($hTimer)

$hTimer = TimerInit ()
$test2 = _Test2 (@Scriptfullpath, "test")
$test2time = TimerDiff ($hTimer)

; Both timings are shown in milliseconds so that they can be compared directly.
MsgBox (0, "results", "1: " & $test1 & @TAB & $test1time & " ms" & @CRLF & "2: " & $test2 & @TAB & $test2time & " ms")

Func _Test1 ($sFile, $sString)
   Return UBound (StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3))
EndFunc   ;==>_Test1

Func _Test2 ($sFile, $sString)
   Return SetError (0, 0, UBound (StringRegExp(FileRead($sFile), "(?i)(" & $sString & ")", 3)))
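EndFunc   ;==>_Test2()

Don't ask why it's relevant: I was checking whether returning a function call would also return the @error value. It doesn't, but the time difference is instantly noticeable. Make sure both timings are compared in the same unit before reading too much into it.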

Mat


The difference could be because the OP's original script counts at most one occurrence per line, never multiple occurrences on the same line.
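
To make the difference concrete, here is a small sketch on a made-up three-line sample string (the text and variable names are only for illustration). It counts once per line with StringInStr(), as the original script does, and then counts every occurrence via the @extended value of StringReplace(), as the replies do:

```autoit
; Sketch only: sample text and variable names are made up for illustration.
Local $sText = "test test test" & @LF & "no match here" & @LF & "one test"
Local $sWord = "test"

; Original approach: a line counts once, however often the word appears in it.
Local $aLines = StringSplit($sText, @LF)
Local $iLineHits = 0
For $i = 1 To $aLines[0]
    If StringInStr($aLines[$i], $sWord) Then $iLineHits += 1
Next

; Approach from the replies: StringReplace reports every replacement in @extended.
StringReplace($sText, $sWord, $sWord)
Local $iTotalHits = @extended

ConsoleWrite("Lines containing the word: " & $iLineHits & @CRLF) ; 2
ConsoleWrite("Total occurrences: " & $iTotalHits & @CRLF)        ; 4
```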

Jos

SciTE4AutoIt3 Full installer Download page   - Beta files       Read before posting     How to post scriptsource   Forum etiquette  Forum Rules 
 
Live for the present,
Dream of the future,
Learn from the past.
  :)


Results (~7 MB file):

1358.8447 ms. Found: 183920
457.6034 ms. Found: 183920
1396.8972 ms. Found: 183920

I still think it's weird...

AlmarM

What is weird?

Jos

Well, the fact that my script counts 5111 and all of these count 180000+.

Guess it's just me. :)

AlmarM

Did you read my comment about that?

Jos

As Jos mentioned, you used StringInStr(), which registers at most one occurrence per line. That means the word appears more than once on many lines!

Btw, I want to add that Valuater's code and my code are very similar! We both used StringReplace() to count occurrences, and that is the reason why both benchmark scores are so close!

When I wrote mine, nobody had replied yet (I didn't notice any reply in the meantime). It was just a coincidence that we had a similar idea!
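
As for the hours of runtime: the FileReadLine() documentation warns that passing an incrementing line number forces AutoIt to re-read the file from the beginning on every call, which is the likely cause here. A minimal sketch that keeps the line-by-line structure but reads sequentially and counts every occurrence (the helper name _CountInFile is hypothetical, and the whole-file approaches above remain simpler):

```autoit
; Sketch only: _CountInFile is a hypothetical helper.
Func _CountInFile($sPath, $sWord)
    Local $hFile = FileOpen($sPath, 0) ; 0 = read mode
    If $hFile = -1 Then Return SetError(1, 0, 0)

    Local $iCount = 0, $sLine
    While 1
        ; No line number: each call continues where the previous one stopped.
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop ; @error is set at end of file
        ; Count every occurrence on the line, not just the first one.
        StringReplace($sLine, $sWord, $sWord)
        $iCount += @extended
    WEnd

    FileClose($hFile)
    Return $iCount
EndFunc   ;==>_CountInFile

; Example call: count "test" in the running script itself.
ConsoleWrite(_CountInFile(@ScriptFullPath, "test") & @CRLF)
```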

UEZ
