andygo

make script faster

36 posts in this topic

hello,

 

i have a script which can handle huge data.

i feed it with textfiles, it reads each line into array, then compare the lines, do some string operations. this all happens in a for.to.next loop.

 

the script only use 15% cpu of my 8core.amd

 

can i force the script to use more cpuload an therefore being faster?

 

would it make a speed difference to compile it as 64bit exe on 64bit systems?

 

thank you for commemts :)

Share this post


Link to post
Share on other sites



the script only use 15% cpu of my 8core.amd

 

1/8 = 12 % 50''

AutoIt is not multi process / multi thread

 


Signature beginning:   Wondering who uses AutoIT and what it can be used for ?
* GHAPI UDF - modest begining - comunication with GitHub REST API *
ADO.au3 UDF     POP3.au3 UDF     XML.au3 UDF    How to use IE.au3  UDF with  AutoIt v3.3.14.x  for other useful stuff click the following button

Spoiler

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind. 

My contribution (my own projects): * Debenu Quick PDF Library - UDF * Debenu PDF Viewer SDK - UDF * Acrobat Reader - ActiveX Viewer * UDF for PDFCreator v1.x.x * XZip - UDF * AppCompatFlags UDF * CrowdinAPI UDF * _WinMergeCompare2Files() * _JavaExceptionAdd() * _IsBeta() * Writing DPI Awareness App - workaround * _AutoIt_RequiredVersion() * Chilkatsoft.au3 UDF * TeamViewer.au3 UDF * JavaManagement UDF * VIES over SOAP * WinSCP UDF * GHAPI UDF - modest begining - comunication with GitHub REST API *

My contribution to others projects or UDF based on  others projects: * _sql.au3 UDF  * POP3.au3 UDF *  RTF Printer - UDF * XML.au3 - BETA * ADO.au3 UDF SMTP Mailer UDF *

Useful links: * Forum Rules * Forum etiquette *  Forum Information and FAQs * How to post code on the forum * AutoIt Online Documentation * AutoIt Online Beta Documentation * SciTE4AutoIt3 getting started * Convert text blocks to AutoIt code * Games made in Autoit * Programming related sites * Polish AutoIt Tutorial * DllCall Code Generator * 

Wiki: Expand your knowledge - AutoIt Wiki * Collection of User Defined Functions * How to use HelpFile * Best coding practices * 

IE Related:  * How to use IE.au3  UDF with  AutoIt v3.3.14.x * Why isn't Autoit able to click a Javascript Dialog? * Clicking javascript button with no ID * IE document >> save as MHT file * IETab Switcher (by LarsJ ) * HTML Entities * _IEquerySelectorAll() (by uncommon) * 

I encourage you to read: * Global Vars * Best Coding Practices * Please explain code used in Help file for several File functions * OOP-like approach in AutoIt * UDF-Spec Questions *  EXAMPLE: How To Catch ConsoleWrite() output to a file or to CMD *

"Homo sum; humani nil a me alienum puto" - Publius Terentius Afer
"Program are meant to be read by humans and only incidentally for computers and execute" - Donald Knuth, "The Art of Computer Programming"
:naughty:  :ranting:, be  :) and       \\//_.

Anticipating Errors :  "Any program that accepts data from a user must include code to validate that data before sending it to the data store. You cannot rely on the data store, ...., or even your programming language to notify you of problems. You must check every byte entered by your users, making sure that data is the correct type for its field and that required fields are not empty."

Signature last update: 2017-06-04

Share this post


Link to post
Share on other sites

Do not read the file line by line but either as a whole or by reading bigger chunks into an array and then process the array.

1 person likes this

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

If you can use CSV files, I have had really good efficiency with this script (built by czardas).

 

; #FUNCTION# ====================================================================================================================
; Name...........: _CSVSplit
; Description ...: Converts a string in CSV format to a two dimensional array (see comments)
; Syntax.........: CSVSplit ( $aArray [, $sDelim ] )
; Parameters ....: $aArray  - The array to convert
;                  $sDelim  - Optional - Delimiter set to comma by default (see 2nd comment)
; Return values .: Success  - Returns a two dimensional array or a one dimensional array (see 1st comment)
;                  Failure  - Sets @error to:
;                 |@error = 1 - First parameter is not a valid string
;                 |@error = 2 - Second parameter is not a valid string
;                 |@error = 3 - Could not find suitable delimiter replacements
; Author ........: czardas
; Comments ......; Returns a one dimensional array if the input string does not contain the delimiter string
;                ; Some CSV formats use semicolon as a delimiter instead of a comma
;                ; Set the second parameter to @TAB To convert to TSV
; ===============================================================================================================================

Func _CSVSplit($string, $sDelim = ",") ; Parses csv string input and returns a one or two dimensional array
    If Not IsString($string) Or $string = "" Then Return SetError(1, 0, 0) ; Invalid string
    If Not IsString($sDelim) Or $sDelim = "" Then Return SetError(2, 0, 0) ; Invalid string

    $string = StringRegExpReplace($string, "[\r\n]+\z", "") ; [Line Added] Remove training breaks
    Local $iOverride = 63743, $asDelim[3] ; $asDelim => replacements for comma, new line and double quote
    For $i = 0 To 2
        $asDelim[$i] = __GetSubstitute($string, $iOverride) ; Choose a suitable substitution character
        If @error Then Return SetError(3, 0, 0) ; String contains too many unsuitable characters
    Next
    $iOverride = 0

    Local $aArray = StringRegExp($string, '\A[^"]+|("+[^"]+)|"+\z', 3) ; Split string using double quotes delim - largest match
    $string = ""

    Local $iBound = UBound($aArray)
    For $i = 0 To $iBound -1
        $iOverride += StringInStr($aArray[$i], '"', 0, -1) ; Increment by the number of adjacent double quotes per element
        If Mod ($iOverride +2, 2) = 0 Then ; Acts as an on/off switch
            $aArray[$i] = StringReplace($aArray[$i], $sDelim, $asDelim[0]) ; Replace comma delimeters
            $aArray[$i] = StringRegExpReplace($aArray[$i], "(\r\n)|[\r\n]", $asDelim[1]) ; Replace new line delimeters
        EndIf
        $aArray[$i] = StringReplace($aArray[$i], '""', $asDelim[2]) ; Replace double quote pairs
        $aArray[$i] = StringReplace($aArray[$i], '"', '') ; Delete enclosing double quotes - not paired
        $aArray[$i] = StringReplace($aArray[$i], $asDelim[2], '"') ; Reintroduce double quote pairs as single characters
        $string &= $aArray[$i] ; Rebuild the string, which includes two different delimiters
    Next
    $iOverride = 0

    $aArray = StringSplit($string, $asDelim[1], 2) ; Split to get rows
    $iBound = UBound($aArray)
    Local $aCSV[$iBound][2], $aTemp
    For $i = 0 To $iBound -1
        $aTemp = StringSplit($aArray[$i], $asDelim[0]) ; Split to get row items
        If Not @error Then
            If $aTemp[0] > $iOverride Then
                $iOverride = $aTemp[0]
                ReDim $aCSV[$iBound][$iOverride] ; Add columns to accomodate more items
            EndIf
        EndIf
        For $j = 1 To $aTemp[0]
            If StringLen($aTemp[$j]) Then
                If Not StringRegExp($aTemp[$j], '[^"]') Then ; Field only contains double quotes
                    $aTemp[$j] = StringTrimLeft($aTemp[$j], 1) ; Delete enclosing double quote single char
                EndIf
                $aCSV[$i][$j -1] = $aTemp[$j] ; Populate each row
            EndIf
        Next
    Next

    If $iOverride > 1 Then
        Return $aCSV ; Multiple Columns
    Else
        For $i = 0 To $iBound -1
            If StringLen($aArray[$i]) And (Not StringRegExp($aArray[$i], '[^"]')) Then ; Only contains double quotes
                $aArray[$i] = StringTrimLeft($aArray[$i], 1) ; Delete enclosing double quote single char
            EndIf
        Next
        Return $aArray ; Single column
    EndIf

EndFunc ;==> _CSVSplit

; #INTERNAL_USE_ONLY# ===========================================================================================================
; Name...........: __GetSubstitute
; Description ...: Searches for a character to be used for substitution, ie one not contained within the input string
; Syntax.........: __GetSubstitute($string, ByRef $iCountdown)
; Parameters ....: $string   - The string of characters to avoid
;                  $iCountdown - The first code point to begin checking
; Return values .: Success   - Returns a suitable substitution character not found within the first parameter
;                  Failure   - Sets @error to 1 => No substitution character available
; Author ........: czardas
; Comments ......; This function is connected to the function _CSVSplit and was not intended for general use
;                  $iCountdown is returned ByRef to avoid selecting the same character on subsequent calls to this function
;                  Initially $iCountown should be passed with a value = 63743
; ===============================================================================================================================

Func __GetSubstitute($string, ByRef $iCountdown)
    If $iCountdown < 57344 Then Return SetError(1, 0, "") ; Out of options
    Local $sTestChar
    For $i = $iCountdown To 57344 Step -1
        $sTestChar = ChrW($i)
        $iCountdown -= 1
        If Not StringInStr($string, $sTestChar) Then
            Return $sTestChar
        EndIf
    Next
    Return SetError(1, 0, "") ; Out of options
EndFunc ;==> __GetSubstitute

You could make it multi-threaded though by compiling the script as an exe and then build a script that does a run command on the exe (I've got a script that pulls the top record from my database then marks it as being processed and does work on the record which I run 2-3 of them at a time using a run command)

Share this post


Link to post
Share on other sites

@Jewtus, please don't duplicate code around the forum. Instead be polite and post a link back to where you got the original code from.

Providing a backlink is better than copying, because no doubt @czardas will update that particular function in the near future. Therefore by linking, you guarantee an up to date function for those who come across your post. Unless you will be updating this post as well with every update they make?

1 person likes this

_AdapterConnections()_AlwaysRun()_AppMon()_AppMonEx()_BinaryBin()_CheckMsgBox()_CmdLineRaw()_ContextMenu()_ConvertLHWebColor()/_ConvertSHWebColor()_DesktopDimensions()_DisplayPassword()_DotNet_Load()/_DotNet_Unload()_Fibonacci()_FileCompare()_FileCompareContents()_FileNameByHandle()_FilePrefix/SRE()_FindInFile()_GetBackgroundColor()/_SetBackgroundColor()_GetConrolID()_GetCtrlClass()_GetDirectoryFormat()_GetDriveMediaType()_GetFilename()/_GetFilenameExt()_GetHardwareID()_GetIP()_GetIP_Country()_GetOSLanguage()_GetSavedSource()_GetStringSize()_GetSystemPaths()_GetURLImage()_GIFImage()_GoogleWeather()_GUICtrlCreateGroup()_GUICtrlListBox_CreateArray()_GUICtrlListView_CreateArray()_GUICtrlListView_SaveCSV()_GUICtrlListView_SaveHTML()_GUICtrlListView_SaveTxt()_GUICtrlListView_SaveXML()_GUICtrlMenu_Recent()_GUICtrlMenu_SetItemImage()_GUICtrlTreeView_CreateArray()_GUIDisable()_GUIImageList_SetIconFromHandle()_GUIRegisterMsg()_GUISetIcon()_Icon_Clear()/_Icon_Set()_IdleTime()_InetGet()_InetGetGUI()_InetGetProgress()_IPDetails()_IsFileOlder()_IsGUID()_IsHex()_IsPalindrome()_IsRegKey()_IsStringRegExp()_IsSystemDrive()_IsUPX()_IsValidType()_IsWebColor()_Language()_Log()_MicrosoftInternetConnectivity()_MSDNDataType()_PathFull/GetRelative/Split()_PathSplitEx()_PrintFromArray()_ProgressSetMarquee()_ReDim()_RockPaperScissors()/_RockPaperScissorsLizardSpock()_ScrollingCredits_SelfDelete()_SelfRename()_SelfUpdate()_SendTo()_ShellAll()_ShellFile()_ShellFolder()_SingletonHWID()_SingletonPID()_Startup()_StringCompact()_StringIsValid()_StringRegExpMetaCharacters()_StringReplaceWholeWord()_StringStripChars()_Temperature()_TrialPeriod()_UKToUSDate()/_USToUKDate()_WinAPI_Create_CTL_CODE()_WinAPI_CreateGUID()_WMIDateStringToDate()/_DateToWMIDateString()Au3 script parsingAutoIt SearchAutoIt3 PortableAutoIt3WrapperToPragmaAutoItWinGetTitle()/AutoItWinSetTitle()CodingDirToHTML5FileInstallrFileReadLastChars()GeoIP databaseGUI - Only Close ButtonGUI ExamplesGUICtrlDeleteImage()GUICtrlGetBkColor()GUICtrlGetStyle()GUIEventsGUIGetBkColor()Int_Parse() & Int_TryParse()IsISBN()LockFile()Mapping CtrlIDsOOP in AutoItParseHeadersToSciTE()PasswordValidPasteBinPosts Per DayPreExpandProtect GlobalsQueue()Resource UpdateResourcesExSciTE JumpSettings INISHELLHOOKShunting-YardSignature CreatorStack()Stopwatch()StringAddLF()/StringStripLF()StringEOLToCRLF()VSCROLLWM_COPYDATAMore Examples...

Updated: 04/09/2015

Share this post


Link to post
Share on other sites

I have tried to find the original link sometime back to no avail. I'm now in the habit of putting the URL in the header when I copy code for that very reason.

Share this post


Link to post
Share on other sites

#8 ·  Posted (edited)

This?

I found the link this way:

Edited by water

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 

Share this post


Link to post
Share on other sites

ok, i tried 64 compile and process priority, both didnt change the speed.

icleaned my loop to the needed times:

$totalloops = 0
for x = 1 to array[0] - 1
$totalloops += 1
next


for a 3.000 Items array i need (2.999 + 2.998 + 2.997.... + 1) loops and this takes now only 13 seconds.

i can deal with that :)

Share this post


Link to post
Share on other sites

Post your code: 13s for 3000 items seems pretty slow.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

#11 ·  Posted (edited)

i will post the code today afternoon, still at work now. 

Edited by andygo
written some wrong

Share this post


Link to post
Share on other sites
#include <File.au3>
#include <StaticConstants.au3>
#include <EditConstants.au3>
#include <ComboConstants.au3>
#include <GuiConstants.au3>
#include <WindowsConstants.au3>
#include <ButtonConstants.au3>
#include <IE.au3>
#include <Timers.au3>
#include <Process.au3>
#Include <Misc.au3>
#include <Constants.au3>
Break(0)
Opt("TrayAutoPause",0)
Opt("GuiOnEventMode", 1)
if msgbox(65, "Hinweis", "Das Tool wird ggf. von einigen Schutzprogrammen negativ erkannt."& @CRLF & @CRLF &"Ich distanziere mich hiermit ausdrücklich von Schadsoftware"& @CRLF &"in jeglicher Form!"& @CRLF & @CRLF &"©2015 andygo") = 2 then exit
global $loopstop = 0, $v = 0, $what, $writecount, $aRecords, $starttime

$pre = GuiCreate("EMM.Tool", 290, 150)
$laden = GUICtrlCreateButton ("Datei laden", 4, 4, 90, 20)
GUICtrlSetOnEvent($laden, "_verarbeiten")
$breakup = GUICtrlCreateCheckbox("STOP", 100, 4, 90, 20)
GUICtrlSetState($breakup, $GUI_DISABLE)
$save = GUICtrlCreateButton ("Datei speichern", 195, 4, 90, 20)
GUICtrlSetState($save, $GUI_DISABLE)
GUICtrlSetOnEvent($save, "_ausgabe")
$inf1 = GUICtrlCreateLabel("", 4, 55, 282, 20, $SS_CENTER)
$inf2 = GUICtrlCreateLabel("", 4, 75, 282, 20, $SS_CENTER)
$label = GUICtrlCreateLabel("Bereit...", 4, 110, 282, 40, $SS_CENTER)
GUICtrlSetFont($label, 16, 500)
GUISetOnEvent($GUI_EVENT_CLOSE, "_Quit2")
GUISetState(@SW_SHOW)

while 1
    Sleep(50)
wend
Func _Quit2()
    if msgbox(68,"EMM.Tool","Wirklich beenden?") = 6 then Exit
EndFunc

func addup($a, $b, $c, $d)
    GUICtrlSetData($label, $what & $v)
    GUICtrlSetData($inf2, "In: "&$aRecords[0] & " Datensätze ** " & "Out: " &$writecount & " Datensätze")
    if GUICtrlRead($breakup) = 1 then $loopstop = 1
endfunc
func addsave($a, $b, $c, $d)
    GUICtrlSetData($label, "Schreibe... " & $writecount)
endfunc

func _verarbeiten()
    Opt("GuiOnEventMode", 0)
    GUICtrlSetState($laden, $GUI_DISABLE)
    GUICtrlSetState($save, $GUI_DISABLE)
    GUICtrlSetState($breakup, $GUI_UNCHECKED)
    GUICtrlSetData($label, "Bereit...")
    GUICtrlSetData($inf1, "")
    GUICtrlSetData($inf2, "")
    $infile = FileOpenDialog("Datei", @ScriptDir & "\", "emm-log (*.*)", 1)
    If @error Then
        ;MsgBox(4096, "", "No File(s) chosen")
    Else
        If Not _FileReadToArray($infile, $aRecords) Then
            MsgBox(4096, "Fehler", "Fehlerhafte Datei")
        Else
            $writecount = 0
            $v = StringSplit($infile, "\")
            GUICtrlSetData($inf1, $infile)
            GUICtrlSetData($inf2, $aRecords[0] & " Datensätze gelesen")
            $v = $aRecords[0]
            $what = "Prüfe... "
            GUICtrlSetState($breakup, $GUI_enABLE)
            global $dRecords[$aRecords[0]+1], $eRecords[$aRecords[0]+1], $i, $x, $y, $date
            _Timer_SetTimer($pre,50,"addup")
            For $x = 1 To $aRecords[0]
                if $loopstop = 1 then exitloop
                $v -= 1
                if StringLen ($aRecords[$x]) > 0 Then
                    $bRecords = StringLeft($aRecords[$x], 59)
                    $bRecords = StringReplace($bRecords, StringMid ( $bRecords, 22, 20), " ")
                    $dRecords[$x] = StringReplace($bRecords, 33, "xxxxxx")
                    $bRecords = StringSplit($aRecords[$x], "   ", 1)
                    $eRecords[$x] = $bRecords[3]
                endif
            Next
            $v = 0
            for $x = 1 to ($aRecords[0] - 1)
                $v += $x
            next
            $what = "Vergleiche... "
            $i = 1
            For $y = 1 To $aRecords[0]-1
                if $loopstop = 1 then ExitLoop
                For $x = $y+1 To $aRecords[0]
                    if $loopstop = 1 then exitloop 2
                    $v -= 1
                    if $eRecords[$y] = $eRecords[$x] and StringLen ($eRecords[$y]) > 0 then
                        $i += 1
                        $eRecords[$x] = ""
                        $date = StringLeft($dRecords[$x], 19)
                    endif
                next
                if StringLen($eRecords[$y]) > 0 then
                    $writecount += 1
                    if $i = 1 then $dRecords[$y] = $dRecords[$y] &"   "&  $i
                    if $i > 1 then $dRecords[$y] = $dRecords[$y] &"   "&  $i &"   "& $date
                endif
                $i = 1
            next
            _Timer_KillAllTimers($pre)
            GUICtrlSetData($inf2, "In: "&$aRecords[0] & " Datensätze ** " & "Out: " &$writecount & " Datensätze")
            GUICtrlSetState($breakup, $GUI_DISABLE)
            if $loopstop = 1 Then
                GUICtrlSetData($label, "Abbruch durch Benutzer")
                GUICtrlSetState($breakup, $GUI_UNCHECKED)
                GUICtrlSetData($inf1, "")
                GUICtrlSetData($inf2, "")
                $loopstop = 0
            Else
                GUICtrlSetData($label, "Fertig :-)")
                GUICtrlSetState($save, $GUI_enABLE)
            endif
        endif
    endif
    GUICtrlSetState($laden, $GUI_enABLE)
    Opt("GuiOnEventMode", 1)
endfunc

func _ausgabe()
    GUICtrlSetState($save, $GUI_DISABLE)
    GUICtrlSetState($laden, $GUI_disABLE)
    Local $file = FileSaveDialog("Datei", @ScriptDir, "emm-log (*.txt)", 2+16)
    If @error Then
        GUICtrlSetData($label, "Fertig :-)")
    Else
        if StringInStr(StringRight($file, 4), ".") = 0 then $file = $file & ".txt"
        FileOpen($file, 2)
        If $file = -1 Then
            GUICtrlSetData($label, "Fertig :-)")
            FileClose($file)
            MsgBox(0, "Fehler", "Speichern nicht möglich.")
        else
            _Timer_SetTimer($pre,50,"addsave")
            GUICtrlSetState($breakup, $GUI_enABLE)
            FileWriteLine($file, "[code]")
            For $x = 1 To $aRecords[0]
                if StringLen($eRecords[$x]) > 0 then
                    FileWriteLine($file, $dRecords[$x])
                    $writecount -= 1
                endif
            next
            FileWriteLine($file, "[/code]")
            FileClose($file)
            _Timer_KillAllTimers($pre)
            GUICtrlSetState($breakup, $GUI_DISABLE)
            if $loopstop = 1 Then
                GUICtrlSetData($label, "Abbruch durch Benutzer")
                GUICtrlSetState($breakup, $GUI_UNCHECKED)
                $loopstop = 0
            Else
                GUICtrlSetData($label, "Datei gespeichert")
            endif
        Endif
    endif
    GUICtrlSetState($save, $GUI_enABLE)
    GUICtrlSetState($laden, $GUI_enABLE)
endfunc

The main loop-inloop is between line 91 and line 108. this is what takes time.

 

here is an example of 5 lines, how a file looks like to feed the script with. you can create a 3.000 lines file from it.

the lines always start with date-time-stamp in always same format. then 3 spaces, then a block of 16 chars, then again 3 spaces.

in each line this is always the same count of chars. now it follows a long long string, this could be different long in each line. it is followed always by 3 spaces and then the word 'written' or 'blocked'

 

so, now what my script does: it counts how many times the long long string exists in a file, so each line will be compared with each. it counts the same lines,

cut them of a little bit, writes the count behind the cutted string and behind the count, the date-time from the last same string.

 

the output file then looks like this:

2015/02/15 04:48:55   8270870102xxxxxx81   1
2015/02/15 04:48:56   8270300102xxxxxx0F   1
2015/02/15 04:49:03   8270870102xxxxxx16   1
2015/02/15 04:49:07   8270870102xxxxxx70   1
2015/02/15 04:49:07   8270200102xxxxxx0F   1
2015/02/15 04:39:57   8270870102xxxxxx81   142   2015/02/20 12:01:20
2015/02/15 04:39:58   8270870102xxxxxx81   144   2015/02/20 11:51:56
2015/02/15 04:39:58   8270310102xxxxxx0F   1
2015/02/15 04:40:05   8270870102xxxxxx81   142   2015/02/20 11:51:57
2015/02/15 04:40:09   8270970102xxxxxx91   77   2015/02/20 11:57:40

 

inexample.txt

Share this post


Link to post
Share on other sites

#13 ·  Posted (edited)

Sorting the data will make counting much faster. About 10 times faster.

#include <Array.au3>

Opt( "MustDeclareVars", 1 )

Example()


Func Example()
  ; 1. Read file
  Local $aArray1 = FileReadToArray( "inexample1.txt" )
  ;_ArrayDisplay( $aArray1 )

  ; 2. Create 2D-array
  Local $iRows = UBound( $aArray1 )
  Local $aArray2[$iRows][4], $aLine
  For $i = 0 To $iRows - 1
    $aLine = StringSplit ( $aArray1[$i], "   ", 3 ) ; 3 = $STR_ENTIRESPLIT + $STR_NOCOUNT
    $aArray2[$i][0] = $aLine[0] ; Col 0 : Time
    $aArray2[$i][1] = $aLine[1] ; Col 1 : 16 chars
    $aArray2[$i][2] = $aLine[2] ; Col 2 : Long string
    $aArray2[$i][3] = $aLine[3] ; Col 3 : blocked/written
  Next
  ;_ArrayDisplay( $aArray2 )

  ; 3. Index based sort of 2D-array, $aArray2 ($aArray2 is not changed)
  Local $tIndex = DllStructCreate( "uint[" & $iRows & "]" )
  Local $pIndex = DllStructGetPtr( $tIndex )
  ; Sort 2D-array by column 2 (long string)
  ; Sort duplicates by column 0 (time)
  Local $aCmps[2][3] = [ _
    [ 2, 1, +1 ], _ ; Col 2: Compared as strings, asc
    [ 0, 1, +1 ] ]  ; Col 0: Compared as strings, asc
  SortArray( $aArray2, $pIndex, $tIndex, $aCmps )
  ; http://www.autoitscript.com/forum/index.php?showtopic=173279
  ; See post 15

  ; 4. Extract rows in sorted order
  Local $aSorted[$iRows][5], $k
  For $i = 0 To $iRows - 1
    $k = DllStructGetData($tIndex,1,$i+1)
    $aSorted[$i][0] = $aArray2[$k][0] ; Col 0 : Time
    $aSorted[$i][1] = $aArray2[$k][1] ; Col 1 : 16 chars
    $aSorted[$i][2] = $aArray2[$k][2] ; Col 2 : Long string
    $aSorted[$i][3] = $aArray2[$k][3] ; Col 3 : blocked/written
    $aSorted[$i][4] = $k              ; Col 4 : Index in $aArray2 (to display results in file order instead of sort order)
  Next
  ;_ArrayDisplay( $aSorted )

  ; 5. Count equal strings in col 2
  Local $aResults[$iRows][5], $iRes = 0, $iCnt
  For $i = 0 To $iRows - 2
    $iCnt = 1
    $aResults[$iRes][0] = $aSorted[$i][0] ; First time
    $aResults[$iRes][1] = StringLeft( $aSorted[$i][2], 10 ) & "xxxxxx" & StringMid ( $aSorted[$i][2], 17, 2 )
    $aResults[$iRes][4] = $aSorted[$i][4] ; File index
    While $i < $iRows - 1 And $aSorted[$i][2] = $aSorted[$i+1][2]
      $i += 1
      $iCnt += 1
    WEnd
    $aResults[$iRes][2] = $iCnt ; Count
    $aResults[$iRes][3] = $aSorted[$i][0] ; Last time
    $iRes += 1
  Next
  ReDim $aResults[$iRes][5]
  ;_ArrayDisplay( $aResults )

  ; If order of results doesn't matter, you can skip step 6.
  ; and create $aResults2 from $aResults wihtout a sorting.

  ; 6. Sort results in file order instead of sort order
  Local $tIndex = DllStructCreate( "uint[" & $iRes & "]" )
  Local $pIndex = DllStructGetPtr( $tIndex )
  ; Sort 2D-array by column 4 (file index)
  Local $aCmps[1][3] = [ _
    [ 4, 0, +1 ] ]  ; Col 4: Compared as numbers, asc
  SortArray( $aResults, $pIndex, $tIndex, $aCmps )

  ; 7. Extract the rows in file order
  Local $aResults2[$iRes], $k
  For $i = 0 To $iRes - 1
    $k = DllStructGetData($tIndex,1,$i+1)
    $aResults2[$i] = $aResults[$k][0] & "   " & _ ; First time
                     $aResults[$k][1] & "   " & _ ; xxxxxx-string
                     StringFormat( "%3d", $aResults[$k][2] ) ; Count
    If $aResults[$k][2] > 1 Then $aResults2[$i] &= "   " & $aResults[$k][3] ; Last time
    ConsoleWrite( $aResults2[$i] & @CRLF )
  Next
  ;_ArrayDisplay( $aResults2 )

  ; 8. Store results
  Local $s = "[code]" & @CRLF
  For $i = 0 To $iRes - 1
    $s &= $aResults2[$i] & @CRLF
  Next
  $s &= "[/code]" & @CRLF
  FileWrite( "inexample2.txt", $s )
EndFunc


Func SortArray( ByRef $aItems, $pIndex, $tIndex, $aCmps )
  Local $iCmps = UBound( $aCmps ), $c, $r, $v[$iCmps]
  Local $lo, $hi, $mi
  For $i = 0 To UBound( $aItems ) - 1
    For $j = 0 To $iCmps - 1
      $v[$j] = $aItems[$i][$aCmps[$j][0]] ; Values
    Next
    $lo = 0
    $hi = $i - 1
    While $lo <= $hi ; Binary search
      $r = 0 ; Compare result (-1,0,1)
      $j = 0 ; Index in $aCmps array
      $mi = Int( ( $lo + $hi ) / 2 )
      While Not $r And $j < $iCmps                    ; This While-loop handles sorting by multiple
        $c = $aCmps[$j][0]   ; Column                 ; columns. Column values of the two rows are
        Switch $aCmps[$j][1] ; Number/string          ; compared until a difference is found.
          Case 0 ; Compare column values as numbers. The following line is an implementation of the spaceship or three-way comparison operator for numbers like StringCompare is for strings.
            $r = ( $v[$j] < $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ? -1 : $v[$j] = $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ? 0 : 1 ) * $aCmps[$j][2] ; * $iCmpAsc
          Case 1 ; Compare column values as strings. StringCompare is a spaceship or three-way comparison operator for strings.
            $r = StringCompare( $v[$j], $aItems[DllStructGetData($tIndex,1,$mi+1)][$c] ) * $aCmps[$j][2] ; * $iCmpAsc
        EndSwitch
        $j += 1
      WEnd
      Switch $r
        Case -1
          $hi = $mi - 1
        Case  1
          $lo = $mi + 1
        Case  0 ; Equal
          ExitLoop
      EndSwitch
    WEnd
    If $i > $mi Then _ ; Make space for the row number in index
      DllCall( "kernel32.dll", "none", "RtlMoveMemory", "struct*", $pIndex+($mi+1)*4, "struct*", $pIndex+$mi*4, "ulong_ptr", ($i-$mi)*4 )
    DllStructSetData( $tIndex, 1, $i, $mi+1+($lo=$mi+1) ) ; Insert row number $i at position $mi+1+($lo=$mi+1) in index
  Next
EndFunc

The zip contains inexample1.txt with 3000 lines and results in inexample2.txt and inexample2-a.txt. Code in tst00.au3 and tst00-a.au3. The a-versions are your versions.

Files.7z

Edited by LarsJ
1 person likes this

Share this post


Link to post
Share on other sites

#14 ·  Posted (edited)

LarsJ,

Beware you've a little glitch in grabbing the end of the hex part.

Here's a completely different approach, using SQLite:

#include <SQLite.au3>
;~ #include <SQLite.Dll.au3>    ; comment this after installing the SQLite3.dll library

_SQLite_Startup()
Local $hDB = _SQLite_Open()     ; memory DB

_SQLite_Exec($hDB, "begin")
_SQLite_Exec($hDB, _
"CREATE TABLE andygo (" & _
"  Date CHAR, " & _
"  Hex CHAR, " & _
"  Count INTEGER DEFAULT 1, " & _
"  LastDate CHAR DEFAULT '');" & _
"" & _
"CREATE INDEX ixHex ON andygo (Hex);" & _
"" & _
"CREATE TRIGGER trIns " & _
"BEFORE INSERT " & _
"ON andygo " & _
"WHEN exists (select 1 from andygo A where A.hex = new.hex) " & _
"BEGIN " & _
"     update andygo set count = count + 1, lastdate = new.date where hex = new.hex;" & _
"     select raise(ignore);" & _
"END;" _
)

Local $aIn = StringRegExp(FileRead("inexample1.txt"), "(?m)^(.{19})(?:.{22})([^ ]+)", 3)
Local $iRounds = 20 * Int(UBound($aIn) / 20)
Local $iRest = Mod(UBound($aIn), 20)
For $i = 0 To $iRest - 1 Step 2
    _SQLite_Exec($hDB, "insert into andygo (date, hex) values ('" & $aIn[$i] & "','" & $aIn[$i + 1] & "')")
Next
For $i = $iRest To UBound($aIn) - 1 Step 20
    _SQLite_Exec($hDB, "insert into andygo (date, hex) values " & _
        "('" & $aIn[$i     ] & "','" & $aIn[$i +  1] & "')," & _
        "('" & $aIn[$i +  2] & "','" & $aIn[$i +  3] & "')," & _
        "('" & $aIn[$i +  4] & "','" & $aIn[$i +  5] & "')," & _
        "('" & $aIn[$i +  6] & "','" & $aIn[$i +  7] & "')," & _
        "('" & $aIn[$i +  8] & "','" & $aIn[$i +  9] & "')," & _
        "('" & $aIn[$i + 10] & "','" & $aIn[$i + 11] & "')," & _
        "('" & $aIn[$i + 12] & "','" & $aIn[$i + 13] & "')," & _
        "('" & $aIn[$i + 14] & "','" & $aIn[$i + 15] & "')," & _
        "('" & $aIn[$i + 16] & "','" & $aIn[$i + 17] & "')," & _
        "('" & $aIn[$i + 18] & "','" & $aIn[$i + 19] & "')" _
    )
Next
_SQLite_Exec($hDB, "commit")

Local $aOut, $iRows, $iCols
; you may use the order by clause of your choice or remove the clause if you don't need any sort order
_SQLite_GetTable($hDB,  "select Date || '   ' || substr(hex, 1, 10) || 'xxxxxx' || substr(hex, -2, 2) || '   '" & _
                        " || count || '   ' || lastdate from andygo order by count desc, lastdate", $aOut, $iRows, $iCols)
_ArrayDelete($aOut, "0-1")
ConsoleWrite(_ArrayToString($aOut, @LF) & @LF)

_SQLite_Close($hDB)
_SQLite_Shutdown()

Here, we use a "before insert" trigger to check whether the hex part already exists. If it does, we simply increment the count in the existing row and drop the insertion, else we insert the new row with default count = 1. If the row already exists, we also sore the new date in the lastdate column of the existing row.

We also use multiple VALUES per insert statement (here, 20) to speed up insertion. Wrapping the whole insertion block in a transaction also speeds things up.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

No jchd, you have a little glitch. The two characters after "xxxxxx" are not the two last characters of the long string, but character no. 17 and 18.

No matter what, an SQL-solution is always interesting. And in this case pretty fast too. Too much overhead is often a problem with SQL and small amounts of data.

Share this post


Link to post
Share on other sites

Ha, sorry for misinterpreting. Then retrieval goes like this:

_SQLite_GetTable($hDB,  "select Date || '   ' || substr(hex, 1, 10) || 'xxxxxx' || substr(hex, 17, 2) || '   '" & _
                        " || count || '   ' || lastdate from andygo -- order by count desc, lastdate", $aOut, $iRows, $iCols)

SQLite triggers are a bit slow compared to some other engines (e.g. client/servers) and this shows here. It isn't named "lite" without reason. Yet the speed is still acceptable and I liked the idea of only inserting rows having unique hex strings. Another nice feature is that if the use case requires some odd formatting and/or sorting and/or selection within a given date range, then SQL is quite powerful and efficient.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Share this post


Link to post
Share on other sites

LarsJ, this is pretty cool :thumbsup: . tried it and works. also with a 25.000 lines file. now i will try to use it with GUI.

Feedback follows in 2 days latest.

Share this post


Link to post
Share on other sites

jchd, I like your idea too: Only inserting rows having unique hex strings.

andygo, If you need even more speed because of large files, there is probably room for more optimization by using jchd's idea. Especially if there is a large number of duplicates. The sorting code is just a standard binary sorting. All rows are sorted. Including duplicates.

Share this post


Link to post
Share on other sites

realisitc input size in most cases is ~ 3.000 to 5.000 lines. to test a 25.000 lines file was a pal's idea just to see the scripts result and time. but i like it and no matter if a user dont recognize timediffernce between yours and jchd sql solution, i will test it too :) feedback follows.

Share this post


Link to post
Share on other sites

#20 ·  Posted (edited)

Good evening, here are the results with a 25.000 lines input.

 

LarsJ code needs ~ 10 seconds

jchd sql code needs ~ 4,5 seconds.

:thumbsup:

 

special test with a 156.000 lines file (34mb input): solved in 21 seconds. crazy fast

Edited by andygo
additional information

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now

  • Similar Content

    • iXX
      By iXX
      Hi!
      Looking for working code to  get full path of process  - both 32 & 64 bit.
      I tryed this bellow, but it works only for 32-bit processes, even if compiled for x64...
      Thanx for suggestions!
       
      Func _ProcessGetPath($vProcess) ;get the program path done by MrCreatoR Local $iPID = ProcessExists($vProcess) If NOT $iPID Then Return SetError(1, 0, -1) Local $aProc = DllCall('kernel32.dll', 'hwnd', 'OpenProcess', 'int', BitOR(0x0400, 0x0010), 'int', 0, 'int', $iPID) If NOT IsArray($aProc) OR NOT $aProc[0] Then Return SetError(2, 0, -1) Local $vStruct = DllStructCreate('int[1024]') Local $hPsapi_Dll = DllOpen('Psapi.dll') If $hPsapi_Dll = -1 Then $hPsapi_Dll = DllOpen(@SystemDir & '\Psapi.dll') If $hPsapi_Dll = -1 Then $hPsapi_Dll = DllOpen(@WindowsDir & '\Psapi.dll') If $hPsapi_Dll = -1 Then Return SetError(3, 0, '') DllCall($hPsapi_Dll, 'int', 'EnumProcessModules', _ 'hwnd', $aProc[0], _ 'ptr', DllStructGetPtr($vStruct), _ 'int', DllStructGetSize($vStruct), _ 'int_ptr', 0) Local $aRet = DllCall($hPsapi_Dll, 'int', 'GetModuleFileNameEx', _ 'hwnd', $aProc[0], _ 'int', DllStructGetData($vStruct, 1), _ 'str', '', _ 'int', 2048) DllClose($hPsapi_Dll) If NOT IsArray($aRet) OR StringLen($aRet[3]) = 0 Then Return SetError(4, 0, '') Return $aRet[3] EndFunc  
    • j0kky
      By j0kky
      Hi guys,
      Bitwise operations in Autoit is possible only till 32 bit integers, but sometimes WinAPI requires to process 64 bit vectors... so?
      So you can use this little UDF to handle properly those integers!
      Func _BitAND64($iValue1, $iValue2) If Not ((VarGetType($iValue1) = "Int64") Or (VarGetType($iValue2) = "Int64")) Then Return BitAND($iValue1, $iValue2) $iValue1 = __DecToBin64($iValue1) $iValue2 = __DecToBin64($iValue2) Local $aValueANDed[64], $i For $i = 0 To 63 $aValueANDed[$i] = ($iValue1[$i] And $iValue2[$i]) ? 1 : 0 Next Return __BinToDec64($aValueANDed) EndFunc ;==>_BitAND64 Func _BitOR64($iValue1, $iValue2) If Not ((VarGetType($iValue1) = "Int64") Or (VarGetType($iValue2) = "Int64")) Then Return BitOR($iValue1, $iValue2) $iValue1 = __DecToBin64($iValue1) $iValue2 = __DecToBin64($iValue2) Local $aValueORed[64], $i For $i = 0 To 63 $aValueORed[$i] = ($iValue1[$i] Or $iValue2[$i]) ? 1 : 0 Next Return __BinToDec64($aValueORed) EndFunc ;==>_BitOR64 Func _BitXOR64($iValue1, $iValue2) If Not ((VarGetType($iValue1) = "Int64") Or (VarGetType($iValue2) = "Int64")) Then Return BitXOR($iValue1, $iValue2) $iValue1 = __DecToBin64($iValue1) $iValue2 = __DecToBin64($iValue2) Local $aValueXORed[64], $i For $i = 0 To 63 $aValueXORed[$i] = (($iValue1[$i] And (Not $iValue2[$i])) Or ((Not $iValue1[$i]) And $iValue2)) ? 1 : 0 Next Return __BinToDec64($aValueXORed) EndFunc ;==>_BitXOR64 Func _BitNOT64($iValue) If Not (VarGetType($iValue) = "Int64") Then Return BitNOT($iValue) $iValue = __DecToBin64($iValue) For $i = 0 To 63 $iValue[$i] = Not $iValue[$i] Next Return __BinToDec64($iValue) EndFunc ;==>_BitNOT64 Func __DecToBin64($iDec) Local $tDec = DllStructCreate("int64 num"), $aBin[64], $bBit, $i $tDec.num = $iDec For $i = 0 To 63 $bBit = (Mod($tDec.num, 2) ? 1 : 0) $aBin[63 - $i] = $bBit $tDec.num = Floor($tDec.num / 2) Next Return $aBin EndFunc ;==>__DecToBin64 Func __BinToDec64($aBin) Local $tDec = DllStructCreate("int64 num"), $i If $aBin[0] Then $tDec.num += 0x8000000000000000 ;2^63 = 9223372036854775808, but for Autoit the world ends at 9223372036854775807 (2^63 - 1) For $i = 1 To 63 If $aBin[$i] Then $tDec.num += 2 ^ (63 - $i) Next Return $tDec.num EndFunc ;==>__BinToDec64 If you are working with unsigned 64 bit integers and these functions return a negative value, don't worry, bitwise operations come out well, but Autoit manages all numbers as signed integers.
    • AutID
      By AutID
      Hello,

      I will put it as simple as possible
      Why this code runs perfectly on 32bit and it fails on 64bit?
      Local $hWND = WinGetProcess("[CLASS:LSS_app]") ConsoleWrite($hWND & @LF) Local $hModuleList = _WinAPI_EnumProcessModules($hWND) If @error Then ConsoleWrite("Error: " & @error & @LF) For $i = 0 To $hModuleList[0][0] - 1 ;~ If StringInStr($hModuleList[$i][1], "sysCap64.dll") Then ConsoleWrite($hModuleList[$i][0] & @LF) ;~ EndIf Next As the title says EnumProcessModules returns error 10 which I have no clue what it is.
      It must be something with autoit or my lack of coding because a similar code in C# will work like a charm on both x86 and x64
      Process[] Processes = Process.GetProcessesByName("winLSS64Cap"); Process nProcess = Processes[0]; Handle = OpenProcess(0x10, true, (uint)nProcess.Id); for(int i = 0; i < nProcess.Modules.Count; i++) { Console.WriteLine(nProcess.Modules[i].ModuleName); }  
    • Michiel
      By Michiel
      As documented here, it is possible to bypass registry redirection when running a 32bit application on a 64bit Windows installation, using HKLM64 or HKCR64. I quote:
       
       
      In this thread, >this feature's existence is denied.

      Also, if this feature exists and works, does it work on both production and beta? And can I also specificy HKEY_LOCAL_MACHINE64 and HKEY_CLASSES_ROOT64 instead of HKLM64 and HKCR64?
    • drego
      By drego
      During the AutoIt installer it gives the option to install 32bit or 64bit tools. Most systems these days are 64bit. Should I be creating autoit programs for both? Do autoit compiled applications really benefit from being 64 vs 32? If compiled for 64bit will there be a problem using the mass of UDFs available or does that not matter?
      Under what circumstances is compiling for 64bit going to make my program run better? Are there any downsides? Any differences in how I develop, use UDFs, DLLs, anything?
      Also, if I install 64bit Autoit tools can I still compile for 32bit and what about IE browser controls will they then be the 64bit IE?
      Thanks.