
StringInStr - can this script be faster? (text files included)



Hello,

As always, sorry for my bad English.

Here is the code I have:

#include <File.au3>
#include <String.au3>

$file1 = "d:\doppioniautoit\international.txt"
$hFile1 = FileOpen($file1, 0) ; keep the handle that FileOpen returns

$file2 = "d:\doppioniautoit\standard.txt"
$hFile2 = FileOpen($file2, 0)

For $i = 1 to _FileCountLines($file1)
   ; reading by line number makes AutoIt re-read the file up to that line on every call
   $line = FileReadLine($hFile1, $i)
   $aExtract = _StringBetween($line, "(", ")")
   If Not IsArray($aExtract) Then ContinueLoop ; skip lines without "(...)"

;MsgBox(0, $line, $aExtract[0])
$itime = TimerInit()
      For $x = 1 to _FileCountLines($file2)
         $line2 = FileReadLine($hFile2, $x)
         Local $iPosition = StringInStr($line2, $aExtract[0], 1)
         ;Local $iPosition = StringRegExp($line2,$aExtract[0], 0)
         if $iPosition <> 0 then
            ;MsgBox(0, "Found", $aExtract[0] & " " & $line2)
         endif
         ConsoleWrite($line2  & @CRLF)
      Next

  ConsoleWrite(@TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
  MsgBox(0, "TIME", @TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
Next
FileClose($hFile1)
FileClose($hFile2)

So, what do I want to do? I'll try to explain with my poor English :) Basically, I have 2 text files (see attachments below). They both contain movie titles with director and year in this form:

Movie Title (Director, Year)

"Standard.txt" contains mostly Italian titles. "International.txt", as you can imagine, contains the international ones. With the script I would like to search for the "Director, Year" of each line of "international.txt" in the "standard.txt" file.

For example, the first row of "international.txt" is "¡Atraco! (Cortés, 2012)". The script takes just "Cortés, 2012" and searches for it in the "standard.txt" file.
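A minimal sketch of that extraction step, assuming the "(Director, Year)" group is always the last pair of parentheses on the line. Matching the last pair (instead of the first one, which is what `$aExtract[0]` from _StringBetween gives) keeps titles that contain their own parentheses from confusing the search. GetKey is a hypothetical helper name:

```autoit
; GetKey is a hypothetical helper, not part of any standard UDF.
; Returns the text inside the LAST "(...)" on a line, trimmed; "" if none.
Func GetKey($sLine)
    ; The greedy ".*" runs to the end of the line and backtracks to the last "("
    Local $aMatch = StringRegExp($sLine, '.*\(([^()]*)\)', 1)
    If @error Then Return SetError(1, 0, "")
    Return StringStripWS($aMatch[0], 3) ; 3 = strip leading and trailing spaces
EndFunc

; Usage with the first row of international.txt
ConsoleWrite(GetKey("¡Atraco! (Cortés, 2012)") & @CRLF)
```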

The simple code I wrote works... I tried both StringInStr and StringRegExp; they each need about 2 minutes and 30 seconds (StringInStr is a little faster) to process one row.

I was wondering... is there any other method to make it faster using AutoIt? Any help would be much appreciated, thanks!

 

 

Attachments: standard.txt, international.txt

Edited by Italiano
typo

Hi, I would make some changes. For example, I wouldn't re-declare Local $iPosition inside the loop like that, and I'd move the line counts to the top, like so:

#include <File.au3>
#include <String.au3>
Local $iPosition

$file1 = "d:\doppioniautoit\international.txt"
$hFile1 = FileOpen($file1, 0) ; keep the handle that FileOpen returns
Local $LinesC1 = _FileCountLines($file1)

$file2 = "d:\doppioniautoit\standard.txt"
$hFile2 = FileOpen($file2, 0)
Local $LinesC2 = _FileCountLines($file2)

For $i = 1 to $LinesC1
   $line = FileReadLine($hFile1, $i)
   $aExtract = _StringBetween($line, "(", ")")
   If Not IsArray($aExtract) Then ContinueLoop ; skip lines without "(...)"
$itime = TimerInit()
      For $x = 1 to $LinesC2
         $line2 = FileReadLine($hFile2, $x)
         $iPosition = StringInStr($line2, $aExtract[0], 1)
         if $iPosition <> 0 then
            ; $aExtract is an array, so print its first element
            ConsoleWrite($aExtract[0] & " " & $line2 & @CRLF)
         endif
         ConsoleWrite($line2  & @CRLF)
      Next
  ConsoleWrite(@TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
  MsgBox(0, "TIME", @TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
Next
FileClose($hFile1)
FileClose($hFile2)

Oh, and I'd change the timers to @MIN&':'&@MSEC in the MsgBox and ConsoleWrite.
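Note that @MIN and @MSEC are fields of the current clock time, not an elapsed duration. If the goal is to show the elapsed time as minutes and seconds, one option is to format the TimerDiff() result itself. A sketch (FormatElapsed is a hypothetical helper name; the Sleep() call only stands in for the search loop):

```autoit
; FormatElapsed is a hypothetical helper name.
; Formats a TimerDiff() result (milliseconds) as m:ss.mmm
Func FormatElapsed($fMs)
    Local $iTotalSec = Int($fMs / 1000)
    Return StringFormat("%d:%02d.%03d", Int($iTotalSec / 60), Mod($iTotalSec, 60), Mod(Int($fMs), 1000))
EndFunc

Local $hTimer = TimerInit()
Sleep(250) ; stand-in for the search loop
ConsoleWrite("Elapsed: " & FormatElapsed(TimerDiff($hTimer)) & @CRLF)
```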

Edited by careca
Spoiler

Renamer - Rename files and folders, remove portions of text from the filename etc.

GPO Tool - Export/Import Group policy settings.

MirrorDir - Synchronize/Backup/Mirror Folders

BeatsPlayer - Music player.

Params Tool - Right click an exe to see its parameters or execute them.

String Trigger - Triggers pasting text or applications or internet links on specific strings.

Inconspicuous - Hide files in plain sight, not fully encrypted.

Regedit Control - Registry browsing history, quickly jump into any saved key.

Time4Shutdown - Write the time for shutdown in minutes.

Power Profiles Tool - Set a profile as active, delete, duplicate, export and import.

Finished Task Shutdown - Shuts down pc when specified window/Wndl/process closes.

NetworkSpeedShutdown - Shuts down pc if download speed goes under "X" Kb/s.

IUIAutomation - Topic with framework and examples

Au3Record.exe


Something along these lines.

#include <File.au3>
#include <String.au3>

Local $aFile1
Local $aFile2
; Read each file into memory once; element [0] holds the line count
_FileReadToArray(@ScriptDir & "\international.txt", $aFile1)
_FileReadToArray(@ScriptDir & "\standard.txt", $aFile2)

For $i = 1 to $aFile1[0]
   $aExtract = _StringBetween($aFile1[$i], "(", ")")
   If Not IsArray($aExtract) Then ContinueLoop ; skip lines without "(...)"

$itime = TimerInit()
      For $i2 = 1 to $aFile2[0]
         Local $iPosition = StringInStr($aFile2[$i2], $aExtract[0], 1)
         ;Local $iPosition = StringRegExp($aFile2[$i2], $aExtract[0], 0)
         if $iPosition <> 0 then
            ;MsgBox(0, "Found", $aExtract[0] & " " & $aFile2[$i2])
         endif
         ConsoleWrite($aFile2[$i2]  & @CRLF)
      Next

  ConsoleWrite(@TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
  MsgBox(0, "TIME", @TAB&'Str='&TimerDiff($itime)&' ms'&@lf)
Next

Not perfect, just the foundation. 


ViciousXUSMC was faster than me!

#include <File.au3>
#include <String.au3>
Opt("TrayAutoPause", 0)

Global $sFile1 = @ScriptDir & "\international.txt"
Global $sFile2 = @ScriptDir & "\standard.txt"

Global $aContentFile1 = FileReadToArray($sFile1)
Global $aContentFile2 = FileReadToArray($sFile2)
Local $iExtract, $pTime, $iPosition, $sOut, $iLine
For $i = 0 To UBound($aContentFile1) - 1
    $iExtract = _StringBetween($aContentFile1[$i], "(", ")")
    If Not IsArray($iExtract) Then ContinueLoop
    $pTime = TimerInit()
    $iLine = "-> " & "LineInFile1 [" & $i & "] : " & $aContentFile1[$i] & @CRLF & "-> TextFind: " & $iExtract[0] & @CRLF
    For $x = 0 To UBound($aContentFile2) - 1
        $iPosition = StringInStr($aContentFile2[$x], $iExtract[0], 1)
        If $iPosition <> 0 Then $sOut &= "+> LineInFile2: [" & $x & "] : " & $aContentFile2[$x] & @CRLF
        ;ConsoleWrite($aContentFile2[$x] & @CRLF)
    Next
    If $sOut <> "" Then
        ConsoleWrite('Search time: ' & TimerDiff($pTime) & ' ms' & @CRLF & "!" & $iLine & $sOut)
        MsgBox(0, "Search time: " & TimerDiff($pTime) & ' ms', $iLine & @CRLF & $sOut)
    Else
        ConsoleWrite('Search time: ' & TimerDiff($pTime) & ' ms' & @CRLF)
    EndIf
    $sOut = ""
Next

 

Edited by Trong

Regards,
 


Hi,

I'm not great at RegEx, but your problem cries out for one....

#include <array.au3>
#include <string.au3>
;
$international = FileRead("international.txt")                         ;files
$standard = FileRead("standard.txt")

$searcharray1 = StringRegExp($international, "(?m)^.*\((.*)\).*$", 3)  ;find all authors between ( and ) and write them into an array
;or
;$searcharray1 = _StringBetween($international, "(", ")")
_ArrayDisplay($searcharray1)

For $author In $searcharray1                                           ;every author in the array
    ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $author = ' & $author & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console

    ;\Q...\E makes the author text literal, so characters like | are not read as regex syntax
    $searcharray2 = StringRegExp($standard, "(?m)(^.*\Q" & $author & "\E.*$)", 3)
    If Not IsArray($searcharray2) Then ContinueLoop                     ;no line matched this author
    $searcharray2 = _ArrayUnique($searcharray2)                        ;_ArrayUnique returns a new array; doesn't work every time because of white space...something for you to do with the regex ;)
    _ArrayDisplay($searcharray2)
Next

This could also be done with StringInStr(), but that needs more code....

If you want an explanation of what the RegEx does, take a look at https://regex101.com

Edited by AndyG

I'm slow!

For me it was processing about 2x faster, scrolling so fast that it was skipping letters in the console output.

The problem here is that we need to find a better way to do what you want, not so much make it faster the way you're doing it.

Something like pivot-table logic.


Only a few seconds to find all authors in standard.txt:

#include <array.au3>
#include <string.au3>
;

$authors_in_standard = ""

$international = FileRead("international.txt")        ;files
$standard = FileRead("standard.txt")

$searcharray1 = StringRegExp($international, "(?m)^.*\((.*)\).*$", 3) ;find all authors between ( and ) and write them into an array
;or
;$searcharray1 = _StringBetween($international, "(", ")") ;takes double the time of the RegEx!
_ArrayDisplay($searcharray1)

$t = TimerInit()                                      ;timer start
For $author In $searcharray1                          ;every author in the array
    ; ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $author = ' & $author & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console

    If StringInStr($standard, $author, 1) Then        ;is faster than RegEx
        ;\Q...\E makes the author text literal, so a | is not read as regex OR
        $searcharray2 = StringRegExp($standard, "(?m)^(.*\Q" & $author & "\E.*)$", 3);finds line
        If IsArray($searcharray2) Then                ;only if matched
            $searcharray2 = _ArrayUnique($searcharray2);element [0] = number of items, duplicates removed
            For $i = 1 To UBound($searcharray2) - 1
                $authors_in_standard &= $searcharray2[$i] & @CRLF;collect all
            Next
            ; _ArrayDisplay($searcharray2)

        EndIf
    EndIf
Next

$time = TimerDiff($t)
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $time = ' & $time & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console

ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $authors_in_standard = ' & $authors_in_standard & @CRLF & '>Error code: ' & @error & @CRLF) ;### Debug Console

There are some "errors" in the files: sometimes the comma between the author and the year is missing, sometimes there is a pipe | (which means OR in a RegEx), and so on... so the results contain double/multiple matches.

The source files have to be cleaned up to get better results.
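One way to tolerate some of those inconsistencies without editing the files is to normalize each key before comparing. A minimal sketch, assuming a key always ends with a 4-digit year and that the only variations are extra spaces and a missing comma (NormalizeKey is a hypothetical helper name):

```autoit
; NormalizeKey is a hypothetical helper name.
; Assumes a key looks like "Director, Year" with a 4-digit year at the end.
Func NormalizeKey($sKey)
    ; 7 = 1 + 2 + 4: strip leading/trailing spaces and collapse inner runs of spaces
    $sKey = StringStripWS($sKey, 7)
    ; Force exactly ", " before the trailing year: "Cortés 2012" -> "Cortés, 2012"
    Return StringRegExpReplace($sKey, '\s*,?\s*(\d{4})$', ', $1')
EndFunc

ConsoleWrite(NormalizeKey("  Cortés   2012 ") & @CRLF)
```

Both files' keys would go through the same function before any comparison, so "Cortés 2012" and "Cortés, 2012" end up equal.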

 

I guess there is a much faster solution, maybe with a Scripting.Dictionary or something like that (maybe a database?!)
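A Scripting.Dictionary is a hash table, so the idea can be sketched like this: index standard.txt once by its "(Director, Year)" text, then do one lookup per line of international.txt instead of scanning all of standard.txt each time. This is a sketch assuming Windows (the object is created via COM), both files next to the script, and the key in the last pair of parentheses; KeyOf and BuildIndex are hypothetical helper names:

```autoit
; Hash-lookup sketch with the Windows "Scripting.Dictionary" COM object.
; KeyOf and BuildIndex are hypothetical helper names.

; Text inside the last "(...)" of a line, trimmed; "" if there is none
Func KeyOf($sLine)
    Local $aMatch = StringRegExp($sLine, '.*\(([^()]*)\)', 1)
    If @error Then Return ""
    Return StringStripWS($aMatch[0], 3)
EndFunc

; Map each key to every line sharing it (joined with @CRLF)
Func BuildIndex(ByRef $aLines)
    Local $oIndex = ObjCreate("Scripting.Dictionary")
    Local $sKey, $sPrev
    For $i = 0 To UBound($aLines) - 1
        $sKey = KeyOf($aLines[$i])
        If $sKey = "" Then ContinueLoop
        If $oIndex.Exists($sKey) Then
            ; append to the existing entry by removing and re-adding it
            $sPrev = $oIndex.Item($sKey)
            $oIndex.Remove($sKey)
            $oIndex.Add($sKey, $sPrev & @CRLF & $aLines[$i])
        Else
            $oIndex.Add($sKey, $aLines[$i])
        EndIf
    Next
    Return $oIndex
EndFunc

Local $aStandard = FileReadToArray(@ScriptDir & "\standard.txt")
Local $aIntl = FileReadToArray(@ScriptDir & "\international.txt")
If IsArray($aStandard) And IsArray($aIntl) Then
    Local $oStd = BuildIndex($aStandard)   ; one pass over standard.txt
    Local $sLookup
    For $i = 0 To UBound($aIntl) - 1         ; one lookup per international line
        $sLookup = KeyOf($aIntl[$i])
        If $sLookup <> "" And $oStd.Exists($sLookup) Then
            ConsoleWrite($aIntl[$i] & @CRLF & $oStd.Item($sLookup) & @CRLF & @CRLF)
        EndIf
    Next
EndIf
```

Building the index is a single pass over standard.txt and each lookup takes roughly constant time, so the total work grows with the sum of the two file sizes rather than with their product.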

Edited by AndyG

What I would do is take what I have above and add an extra step.

Once a match is found, remove that matching entry from the 2nd array, so each time a match is made the comparison set gets smaller and smaller, causing the loop to get faster and faster until it completes.

Just not sure of the fastest way to do this, as ReDim is slow for large arrays. I think _ArrayDelete() uses ReDim.
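One way to avoid ReDim entirely is to keep a count of "active" elements and, when an element matches, overwrite it with the last active element and decrement the count. A minimal sketch (SwapRemove is a hypothetical helper name and the sample array is made up); note that it does not preserve element order, and it only makes sense if each standard entry should be reported at most once:

```autoit
; SwapRemove is a hypothetical helper name.
; Removes element $iIndex from the active part $aData[0 .. $iCount-1] without
; ReDim: the last active element is moved into its slot. Order is not kept.
Func SwapRemove(ByRef $aData, ByRef $iCount, $iIndex)
    $iCount -= 1
    $aData[$iIndex] = $aData[$iCount]
EndFunc

; Hypothetical sample data standing in for the lines of standard.txt
Local $aPool[5] = ["a (X, 2001)", "b (Y, 2002)", "c (X, 2001)", "d (Z, 2003)", "e (Y, 2002)"]
Local $iActive = UBound($aPool)
Local $sKey = "X, 2001"
Local $i = 0
While $i < $iActive
    If StringInStr($aPool[$i], $sKey, 1) Then
        ConsoleWrite("match: " & $aPool[$i] & @CRLF)
        SwapRemove($aPool, $iActive, $i) ; slot $i now holds an unchecked element, so $i stays
    Else
        $i += 1
    EndIf
WEnd
ConsoleWrite("remaining: " & $iActive & @CRLF) ; prints "remaining: 3"
```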

Something to play with though, maybe tomorrow I will find some time. 

Edited by ViciousXUSMC

Or... that extra check increases the time instead of reducing it. lol

 


Italiano,

This uses SQLite to list every "(Director, Year)" entry from standard.txt that also appears in international.txt...

#include <array.au3>
#include <sqlite.au3>

Local $st = TimerInit(), $total = TimerInit()
Local $aStandard = StringRegExp(FileRead(@ScriptDir & '\standard.txt'), '(?:\((.*?)\))', 3)
Local $aInternational = StringRegExp(FileRead(@ScriptDir & '\international.txt'), '(?:\((.*?)\))', 3)
ConsoleWrite('Time to split files to arrays = ' & StringFormat('%2.4f seconds', TimerDiff($st) / 1000) & @CRLF)

$st = TimerInit()

_SQLite_Startup()
_SQLite_Open()
_SQLite_Exec(-1, 'create table t1 (c1); create table t2 (c1);')

Local $sql
For $i = 0 To UBound($aStandard) - 1
    $sql &= ( mod($i,500) = 0 ) ? ';insert into t1 values(' & _SQLite_FastEscape($aStandard[$i]) & ')' : ',(' & _SQLite_FastEscape($aStandard[$i]) & ')'
Next
_SQLite_Exec(-1, $sql)

$sql = ''
For $i = 0 To UBound($aInternational) - 1
    $sql &= ( mod($i,500) = 0 ) ? ';insert into t2 values(' & _SQLite_FastEscape($aInternational[$i]) & ')' : ',(' & _SQLite_FastEscape($aInternational[$i]) & ')'
Next
_SQLite_Exec(-1, $sql)

ConsoleWrite('Time to load SQLite = ' & StringFormat('%2.4f seconds', TimerDiff($st) / 1000) & @CRLF)

$st = timerinit()
Local $ret, $arows, $irow, $icol
_SQLite_GetTable2d(-1, 'select distinct t1.[c1] from t1 join t2 on t2.[c1] = t1.[c1] order by t1.[c1];', $arows, $irow, $icol)
ConsoleWrite('Time to get international entries that are also in standard = ' & StringFormat('%2.4f seconds', TimerDiff($st) / 1000) & @CRLF)

ConsoleWrite('Total time = ' & StringFormat('%2.4f seconds', TimerDiff($total) / 1000) & @CRLF)

_ArrayDisplay($arows)

; release the in-memory database and the SQLite library
_SQLite_Close()
_SQLite_Shutdown()

Warning, my SQL is marginal at best.

kylomas

Edited by kylomas
streamlined the code a bit

Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


Revised my original script.

This is my interpretation of what you required. It will output to the console all lines of standard.txt which have a matching "Director" and "Year" from each line of international.txt.

This one handles the differences in the lines. I believe it's 100% accurate.

; by Hydranix@gmx.com
;
; Assumes .txt files are not editable
;
; Using contents of .txt files as a basis,
;  it was determined that only the year part
;  of each line is constant enough to use to
;  programmatically extract the required data
;  100% of the time without affecting performance
;  too negatively.
#NoTrayIcon

$File1 = FileOpen("T:\international.txt")
$aSearchOrig = FileReadToArray("T:\standard.txt")
$Len = UBound($aSearchOrig)-1
$aSearch = $aSearchOrig
; Preprocess array
For $i = 0 To $Len
  $line = $aSearchOrig[$i]
  $line = StringReverse($line)
  $line = StringTrimLeft($line, StringInStr($line,")"))
  $line = StringReverse(StringTrimRight($line,StringLen($line) - StringInStr($line,"(")+1))
  $aSearch[$i] = $line
Next

While 1
  $line = ""
  ; Process line to ensure accurate results
  $line = FileReadLine($File1)
  if @error = -1 Then ExitLoop
  $line = StringReverse($line)
  $line = StringTrimLeft($line, StringInStr($line,")"))
  $line = StringTrimRight($line,StringLen($line) - StringInStr($line,"(")+1)
  ;year
  $y = StringStripWS(StringReverse(StringLeft($line,4)),3)
  $line = StringReverse($line)
  ;director
  If StringInStr($line, ", ") <> 0 Then
    $d = StringStripWS(StringTrimRight($line,6),3)
  Else
    $d = StringStripWS(StringTrimRight($line,5),3)
  EndIf

  For $i = 0 To $Len
    ; Only care if we find director
    If StringInStr($aSearch[$i],$d) <> 0 Then
      ; If director found, then ensure year is the same
      If StringInStr($aSearch[$i], $y) <> 0 Then
        ; Good match
        ConsoleWrite($aSearchOrig[$i]&@CRLF)
      EndIf
    EndIf
  Next
WEnd
FileClose($File1)

 

On my tablet this script took just over 2.5 minutes to finish.
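As an alternative to the reverse-and-trim steps, the director and year could be pulled out with a single regex. A sketch, assuming the last "(...)" on each line holds the director followed by a 4-digit year, with the comma optional because some lines omit it (SplitDirectorYear is a hypothetical helper name):

```autoit
; SplitDirectorYear is a hypothetical helper name.
; Returns a 2-element array [director, year] from the last "(...)" of a line.
Func SplitDirectorYear($sLine)
    ; ".*\(" backtracks to the last "(" that can complete a match; the lazy
    ; (.*?) stops at an optional comma/space before the 4-digit year and ")"
    Local $aMatch = StringRegExp($sLine, '.*\(\s*(.*?),?\s*(\d{4})\s*\)', 1)
    If @error Then Return SetError(1, 0, 0)
    Return $aMatch
EndFunc

Local $aDY = SplitDirectorYear("¡Atraco! (Cortés, 2012)")
If Not @error Then ConsoleWrite("director=" & $aDY[0] & "  year=" & $aDY[1] & @CRLF)
```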

Edited by hydranix

Quote (AndyG): "I guess there is a much faster solution maybe with a scripting.dictionary or something like this (maybe database?!)"

Quote (kylomas): "Warning, my SQL is marginal at best."

Output of kylomas' script here:

Time to get international entries that are also in standard = 0.2661 seconds
Total time = 1.4269 seconds (EDIT: laptop in sleep mode @ 800 MHz)

qed... :thumbsup:

Quote (hydranix): "Revised my original script. [...] On my tablet this script took just over 2.5 minutes to finish."

100 times slower... but it works too... so, who cares :thumbsup:


kylomas,
SQLite is a great idea, with the ability to get the line numbers if necessary :D

#include <SQLite.au3>
#include <Array.au3>

$t0 = TimerInit()
Local $aStandard = StringRegExp(FileRead(@ScriptDir & '\standard.txt'), "(?m)^(.*)\(([^\)]+).*$", 3)
;_ArrayDisplay($aStandard)
Local $aStandard2D[UBound($aStandard)/2][2]
For $i = 0 to UBound($aStandard)-1 step 2
   $aStandard2D[$i/2][0] = $aStandard[$i]
   $aStandard2D[$i/2][1] = $aStandard[$i+1]
Next
;_ArrayDisplay($aStandard2D)
Local $aInternational = StringRegExp(FileRead(@ScriptDir & '\international.txt'), '\(([^\)]+)', 3)
;_ArrayDisplay($aInternational)

ConsoleWrite('Building arrays = ' & StringFormat('%2.4f seconds', TimerDiff($t0) / 1000) & @CRLF)

$t1 = TimerInit()
  Local $array, $aTemp, $iRows, $iColumns
  _SQLite_Startup()
  _SQLite_Open()   ; ':memory:'
  _SQLite_Exec (-1, "CREATE TABLE table1 (id, names, authors); CREATE TABLE table2 (id, authors);") 
  _SQLite_Exec(-1, "Begin;")
  For $i = 0 to UBound($aStandard2D)-1
        _SQLite_Exec(-1, "INSERT INTO table1 VALUES (" & $i & ", " & _SQLite_FastEscape($aStandard2D[$i][0]) & ", " & _SQLite_FastEscape($aStandard2D[$i][1]) & ");")
  Next
  For $i = 0 to UBound($aInternational)-1
        _SQLite_Exec(-1, "INSERT INTO table2 VALUES (" & $i & ", " & _SQLite_FastEscape($aInternational[$i]) & ");")
  Next
  _SQLite_Exec(-1, "Commit;")
  _SQLite_GetTable2d(-1, "SELECT * FROM table1 WHERE authors IN (SELECT authors FROM table2) ;", $array, $iRows, $iColumns) 

ConsoleWrite('SQLite global work = ' & StringFormat('%2.4f seconds', TimerDiff($t1) / 1000) & @CRLF)
;_ArrayDisplay($array, "end")

$t2 = TimerInit()
Local $result[$iRows]  
For $i = 1 to $iRows
   $result[$i-1] = $array[$i][1] & "(" & $array[$i][2] & ")"
Next
;_ArrayDisplay($result, "end")
$result = _ArrayUnique($result)
ConsoleWrite('Formatting = ' & StringFormat('%2.4f seconds', TimerDiff($t2) / 1000) & @CRLF)

ConsoleWrite('Total time = ' & StringFormat('%2.4f seconds', TimerDiff($t0) / 1000) & @CRLF)
_ArrayDisplay($result, "end")
_SQLite_Close ()
_SQLite_Shutdown ()

 

Edited by mikell
typo