Faster String in file search?


DaLiMan
Hi,

Below is a piece of my script that searches for a product number from one file in another file.
When it is found, I need to evaluate whether something has changed and then write the changed products (and only those) to a file.

It works, but it is very, very slow (approx. 10 minutes for 2,000 lines).
Note that each file contains more than 330,000 lines.

 

Does anyone know a better / faster way to code this?

 

#include <File.au3>          ; _FileReadToArray
#include <FileConstants.au3> ; $FD_FILEMUSTEXIST, $FRTA_NOCOUNT

; Excerpt: $File_1_Array, $HGSwijzig_String and $Name_Versie_Info are assumed to be declared earlier in the full script.
$File_1 = FileOpenDialog("Select file 1.", "C:\", "Text (File*.txt)", $FD_FILEMUSTEXIST)
$File_1_hndl = FileOpen($File_1)

$File_2 = FileOpenDialog("Select file 2.", "C:\", "Text (File*.txt)", $FD_FILEMUSTEXIST)
$File_2_hndl = FileOpen($File_2)
$File_2_String = FileRead($File_2_hndl)

_FileReadToArray($File_1, $File_1_Array, $FRTA_NOCOUNT)


ToolTip(@CRLF & "Looping." & @CRLF & @CRLF & "", 1500, 100, $Name_Versie_Info, 1, 1)
For $i = 0 To UBound($File_1_Array) - 1
    ; Show progress every 1000 lines.
    If Mod($i, 1000) = 0 Then
        ToolTip(@CRLF & "Looping....> " & $i & @CRLF & @CRLF & "", 1500, 100, $Name_Versie_Info, 1, 1)
    EndIf
    ; Cut the fixed-width fields out of the current line of file 1.
    $File_1_lineread = $File_1_Array[$i]
    $File_1_ART = StringMid($File_1_lineread, 2, 7)
    $File_1_OMSCHRIJVING = StringMid($File_1_lineread, 268, 32)
    $File_1_HGS = StringMid($File_1_lineread, 525, 6)

    ; Regex search through the full contents of file 2 for this product number.
    $File_2_Found = StringRegExp($File_2_String, "(1" & $File_1_ART & "      .*)", 1)
    If @error Then ContinueLoop ; product not present in file 2
    $File_2_HGS = StringMid($File_2_Found[0], 525, 6)

    If $File_1_HGS <> $File_2_HGS Then
        $HGSwijzig_String = $HGSwijzig_String & $File_1_ART & ";" & $File_1_OMSCHRIJVING & ";" & $File_1_HGS & ";" & $File_2_HGS & @CRLF
    EndIf

Next

 


I'm still enjoying Compare-Object: reliable output, reasonable speed on large objects. For file contents it looks like this:

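#include <AutoItConstants.au3> ; $STDOUT_CHILD

; Create two small sample files that differ in one line.
FileDelete("A.txt")
FileDelete("B.txt")

FileWrite("A.txt", "test" & @CRLF & "Atestline1" & @CRLF & "line2")
FileWrite("B.txt", "test" & @CRLF & "Btestline1" & @CRLF & "line2")

; Let PowerShell diff the two files and capture its standard output.
$iPID = Run('powershell Compare-Object $(Get-Content A.txt) $(Get-Content B.txt)', "", @SW_HIDE, $STDOUT_CHILD)

$sOut = ""

; Keep reading until the stream is closed (StdoutRead then sets @error).
While 1
    $sOut &= StdoutRead($iPID)
    If @error Then ExitLoop
WEnd

MsgBox(0, '', $sOut)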

 

Edited by iamtheky



12 hours ago, iamtheky said:

I'm still enjoying Compare-Object: reliable output, reasonable speed on large objects. For file contents it looks like this:

 

Seems promising, but this would give me too much output, because the file also holds other product data that changes over time.
I only need to know whether one particular piece has changed. (There are more changes, but they are not important.)

Is there any way to check only that piece of the string for changes?
Or should I first write a new set of data files with only the 2 columns that matter?

 

Edited by DaLiMan
typo

Your file 1 seems to consist of fixed-length records.

This example shows an easier way to split such records into only the columns you need:

#include <Array.au3>           ; _ArrayDisplay
#include <StringConstants.au3> ; $STR_ENTIRESPLIT

;~ Create a dummy test string of fixed-length records (3 + 3 + 6 + 3 characters per line)
Local $DummyStr = "AAABBBCCCCCCDDD" & @CRLF

;~ Each pass doubles the string. Use "To 18" for a realistic volume,
;~ but do not forget to remove ConsoleWrite($newString) and _ArrayDisplay first
For $i = 1 To 2
    $DummyStr = $DummyStr & $DummyStr
Next

;~ Replacing @CRLF with itself changes nothing, but @extended then holds the number of lines
$T = StringReplace($DummyStr, @CRLF, @CRLF)
ConsoleWrite("Number of lines " & @extended & @CRLF)

Local $hTimer = TimerInit()
ConsoleWrite("Timer started " & TimerDiff($hTimer) & @CRLF)
$newString = StringRegExpReplace($DummyStr, "(.{3})(.{3})(.{6})(.{3})(\r\n)", "$4;$3;$2;$1;$5")
ConsoleWrite("Switched the columns and added a semicolon " & TimerDiff($hTimer) & @CRLF)
ConsoleWrite($newString)

Local $aRetArray
$aRetArray = StringSplit($newString, ";", $STR_ENTIRESPLIT)
ConsoleWrite("Split into a nice array " & TimerDiff($hTimer) & @CRLF)
_ArrayDisplay($aRetArray)

So you can do 

$File_1_String = FileRead($File_1_hndl)

and

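;~ Skip field 1 (length 1), capture field 2 (length 7), skip the unneeded filler, capture field 4, and so on.
;~ The widths follow the StringMid offsets in the first post (2, 268 and 525). The pattern as written expects
;~ each line to end right after the last field, so adjust the widths to the real record layout.
$newString = StringRegExpReplace($File_1_String, "(.{1})(.{7})(.{259})(.{32})(.{225})(.{6})(\r\n)", "$2;$4;$6;")

Now the string holds only the relevant fields of file 1. You can split it on the semicolon and iterate over the resulting array with Step 3.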
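
As a rough sketch of that last step, the reduced strings could be compared as below. This is not code from the thread: the sample data and all variable names are hypothetical, and the Scripting.Dictionary COM object (available on Windows) is a swapped-in technique for looking up file 2 by product number without rescanning it. It assumes both files were already reduced to "ART;OMSCHRIJVING;HGS;" triplets with the line breaks removed.

```autoit
#include <StringConstants.au3> ; $STR_NOCOUNT

; Hypothetical sample data standing in for the two reduced files.
Local $sReduced1 = "1234567;Widget;000010;7654321;Gadget;000020;"
Local $sReduced2 = "1234567;Widget;000015;7654321;Gadget;000020;"

; Index file 2 by product number: key = ART, value = HGS.
Local $oLookup = ObjCreate("Scripting.Dictionary")
If Not IsObj($oLookup) Then Exit 1

Local $aFile2 = StringSplit($sReduced2, ";", $STR_NOCOUNT)
For $i = 0 To UBound($aFile2) - 3 Step 3
    $oLookup.Item($aFile2[$i]) = $aFile2[$i + 2]
Next

; Walk file 1 in steps of 3 and collect the products whose HGS differs.
Local $sChanged = ""
Local $aFile1 = StringSplit($sReduced1, ";", $STR_NOCOUNT)
For $i = 0 To UBound($aFile1) - 3 Step 3
    Local $sArt = $aFile1[$i], $sHgs1 = $aFile1[$i + 2]
    If Not $oLookup.Exists($sArt) Then ContinueLoop ; product missing in file 2
    Local $sHgs2 = $oLookup.Item($sArt)
    If $sHgs1 <> $sHgs2 Then
        $sChanged &= $sArt & ";" & $aFile1[$i + 1] & ";" & $sHgs1 & ";" & $sHgs2 & @CRLF
    EndIf
Next

ConsoleWrite($sChanged)
```

With this sample data only product 1234567 is reported, because its HGS differs between the two strings. The point of the dictionary is that each lookup no longer depends on the size of file 2.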


Hi Junkew,

I was just testing your wonderful piece of code.
This is brilliant. I wish I understood RegEx better...
This is really fast!

I changed one small thing, because it was removing the @CRLF as well.
(I'm writing a new file from the output, also for use in other apps.)

$newString= StringRegExpReplace($File_1_String,"(.{1})(.{7})(.{259})(.{70})(.{187})(.{35})(.{64})","$2;$4;$6;")
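
To illustrate why leaving out the (\r\n) group keeps the line breaks, here is a minimal sketch with two hypothetical 6-character records. The field widths and variable names are made up for the demo. It relies on "." not matching @CR or @LF under AutoIt's default newline convention, so the match stops before the line break and the break is never replaced.

```autoit
; Two hypothetical records: 1 char, 2 chars (wanted), 1 char, 2 chars (wanted).
Local $sData = "1AAXBB" & @CRLF & "1CCXDD" & @CRLF

; The line break is captured but not written back, so all records end up on one line.
Local $sJoined = StringRegExpReplace($sData, "(.{1})(.{2})(.{1})(.{2})(\r\n)", "$2;$4;")

; No (\r\n) group: the line break is not part of the match and stays where it was.
Local $sLines = StringRegExpReplace($sData, "(.{1})(.{2})(.{1})(.{2})", "$2;$4;")

ConsoleWrite("With (\r\n) group:" & @CRLF & $sJoined & @CRLF)
ConsoleWrite("Without it:" & @CRLF & $sLines)
```

The same effect applies to anything else the pattern does not match: characters after the last captured width on a line are left in the output too.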

 

Also, maybe I can play around with SQLite now, like mikell suggested,
since I read that it needs CSV files for import.
 

@iamtheky Your PowerShell code works nicely on small files (tested).
However, in this case it has been running for ~30 minutes and is still not done.

 

Thanks for all the help! :lmao:
