blink314

StringInStr alternatives?

14 posts in this topic

I have to search through a text file. I have read the entire file into an array using _FileReadToArray and am now looping through the array (reading the file in only takes 1-2 seconds, so that is not a problem). I've been using StringInStr to check whether my search term is in the current array line, but this is proving to be very slow. Is there an alternative to StringInStr? Are there any DLL find functions I could use (I am an extreme novice with DLLs)? Or is there any way I could optimize the process? I have taken as much junk as possible out of my code, and the slowdown occurs with StringInStr. Would stripping whitespace from both ends make it faster? Any other ideas? The text files can be up to 400,000 lines, so any small speed increase would greatly help! Thanks,

Kevin




Why are you reading the file into an array?

If you don't really have a reason, what I would do is read the file line by line (see FileReadLine() in the helpfile). It uses a loop, and in that same loop you can search each line. If the search string is found, output that line to another file, call a function, or whatever...

JS


AutoIt Links

File-String Hash Plugin Updated! 04-02-2008 Plugins have been discontinued. I just found out.

ComputerGetInfo UDF's Updated! 11-23-2006

External Links

Vortex Revolutions Engineer / Inventor (Web, Desktop, and Mobile Applications, Hardware Gizmos, Consulting, and more)


Right... I was under the impression that reading from an array would be quicker than reading from a file. The time required to load the file into the array is not a problem; I'm just looking to make the loop through the array quicker. I'll try it and let you know. Thanks!

Kevin


Please do let me know. I was thinking it would save you a step. You could analyze each line as you go down...

JS



Well, I save the step of reading in the array, but each read of a line from the text file is slower. Like I said, I don't mind the time required to read in the array. I'm trying to cut down the time within the loop, which looks something like this:

For $linecount = 1 to $Filelength
    $Currentline = $logarray[$linecount]
    if StringInStr($currentline, $SearchString, 1,1) <> 0 then
        ...
        ...
        ...
    endif
next for

If I run the script without reading the file into an array and give it a search string that is NOT in the file (so none of the additional processing in the ... lines takes place), it takes about 24 seconds to search the first 100,000 lines. If I read the file into an array it takes about 25 seconds, including the time to read the file into the array. Is there any way to make the loop faster?

Kevin


That code looks good, but the next should not have for after it. Try reading the whole file into a variable if it is small enough to fit in one string and then run StringInStr on that. Do you need to know the line that the search string is in, or just whether it is there or not?
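For anyone who does need the line number, the whole-file idea can still work. The following is a minimal sketch, not code from this thread: it assumes a current AutoIt release (where FileRead accepts a file name), a file small enough to hold in one string, and placeholder values for the path and search term. It finds only the first match and recovers its line number by counting line feeds before the match position.

```autoit
; Sketch only: search the whole file in one StringInStr call, then work out
; the 1-based line number of the first match.
Local $sPath = "somefile.log" ; hypothetical path
Local $sTerm = "SomeText"     ; hypothetical search term

Local $sData = FileRead($sPath) ; whole file as one string
If @error Then
    MsgBox(0, "Error", "Unable to read " & $sPath & ".")
    Exit
EndIf

Local $iPos = StringInStr($sData, $sTerm, 1) ; 1 = case-sensitive
If $iPos = 0 Then
    MsgBox(0, "Search", "Not found.")
Else
    ; Count the line feeds in front of the match. StringReplace reports the
    ; number of replacements it made in @extended, so replacing @LF with
    ; itself is a cheap way to count them.
    StringReplace(StringLeft($sData, $iPos - 1), @LF, @LF)
    Local $iLine = @extended + 1
    MsgBox(0, "Search", "First match is on line " & $iLine & ".")
EndIf
```

To find later matches, the same StringInStr call could be repeated with its start parameter set just past the previous hit.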


David Nuttall
Nuttall Computer Consulting

An Aquarius born during the Age of Aquarius

AutoIt allows me to re-invent the wheel so much faster.

I'm off to write a wizard, a wonderful wizard of odd...


Sorry, that Next For was a typo. Yes, I do have to know what line it's on. I'm reading in an AutoCAD log file and I need to find other info around the search match. Thanks though,

Kevin


I would like to see your full script if possible. I am not sure where you are getting some of your variables, and that may be why it is going a bit slow. I have never really had speed troubles using the method I described. I have used it on a 3MB file and it finished in 5 seconds. I don't know how many lines there were, but I can certainly look it up. It is a very fast method so long as there aren't other things slowing it down.

Edit: The 3MB file was 146408 lines. I have since increased the file size to 7,466,758 with 20MB and tested the speed on it. It just finished and took 904.605 seconds, which is just over 15 minutes.

JS

Edited by JSThePatriot


I've been tailing a log file in a game, and what I found really sped up my script was reading the file about 8k at a time and using one StringInStr to see whether what I was looking for was anywhere in that chunk.

If it's not, throw out everything before the last carriage return and read another 8k.

If it is, process everything between the carriage return before the match and the one after it, and then throw out everything before the carriage return after the match.

If what you're looking for doesn't occur often in the log file, you should get a lot more speed going that way. I think I got on the order of 10x faster than checking each line individually.

If it sounds helpful, I'll try to make enough sense of my code to post a bit.
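The poster's own code is not shown in this thread, so the following is only a rough sketch of the chunked approach as described above. The path, search term, variable names, and the 8192-character chunk size are all assumptions. It cuts each block at the last line feed so that no line is ever split across two searches, and it only splits a block into lines when the term appears somewhere in it.

```autoit
; Sketch only: scan a log in chunks of about 8k and examine individual lines
; only in chunks that contain the search term.
Local $sPath = "somefile.log" ; hypothetical path
Local $sTerm = "SomeText"     ; hypothetical search term

Local $hFile = FileOpen($sPath, 0) ; 0 = read mode
If $hFile = -1 Then
    MsgBox(0, "Error", "Unable to open " & $sPath & ".")
    Exit
EndIf

Local $sCarry = ""   ; partial last line held over for the next chunk
Local $iLineBase = 0 ; number of complete lines already consumed
Local $sChunk, $sBlock, $iCut, $aLines, $bEOF

While 1
    $sChunk = FileRead($hFile, 8192)
    $bEOF = (@error <> 0) ; end of file or read failure
    $sBlock = $sCarry & $sChunk

    If $bEOF Then
        $iCut = StringLen($sBlock) ; last pass: take whatever is left
    Else
        $iCut = StringInStr($sBlock, @LF, 1, -1) ; position of the last line feed
    EndIf
    $sCarry = StringTrimLeft($sBlock, $iCut)
    $sBlock = StringLeft($sBlock, $iCut)

    If $sBlock <> "" Then
        ; One cheap test for the whole block before doing any per-line work
        If StringInStr($sBlock, $sTerm, 1) Then
            $aLines = StringSplit(StringStripCR($sBlock), @LF)
            For $i = 1 To $aLines[0]
                If StringInStr($aLines[$i], $sTerm, 1) Then
                    ConsoleWrite("Line " & ($iLineBase + $i) & ": " & $aLines[$i] & @CRLF)
                EndIf
            Next
        EndIf
        ; Advance the line counter by the number of line feeds in this block
        StringReplace($sBlock, @LF, @LF)
        $iLineBase += @extended
    EndIf

    If $bEOF Then ExitLoop
WEnd
FileClose($hFile)
```

The saving comes from skipping the split and the per-line loop for every chunk without a match, so the benefit shrinks as matches become more frequent.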


Ok, here it is. The For...Next loop is the primary concern here. Basically, every object in AutoCAD is listed in blocks of text, and each line is an attribute of that block. I am looking for text that matches my search term in the lines that would contain the text shown on the screen. Then I have to back up a few lines (no more than 14) to find the page and coordinate info.

Speed is key because I am making a find function. I don't want to have to sit for 5 minutes waiting for the find function to work. I realize it may take a minute or slightly more for large files, but Excel VBA can clean up the same text file in less than half a minute. Would there be any way to use VBA functions in AutoIt (I doubt it...)? Thanks

Kevin

Func SearchLogFile($SearchTerm, $SearchFile)
    Dim $logarray[1] ;The array that stores the logfile
    If GUICtrlRead($StatusLabel) = " No index yet..." Then ;Ensures that an index exists
        MsgBox(0, "User ERROR", "You must first make an index file for the selected drawing!")
    Else
        HotKeySet("{esc}", "StopSearch") ;Allow user to exit search
        $Stop = 0

        $SearchFile = $DumpDir & $SearchFile & "-Index.log"
        $FileLength = _FileCountLines($SearchFile)
        _FileReadToArray($SearchFile, $logarray) ;I am NOT concerned with the time taken here
        _GUICtrlListViewDeleteAllItems ($ResultsList) ;Deletes previous search results from listview
        $LineCount = 14

        $Track = 4999 ;Counter for screen update
        $LineCount = 1
        GUICtrlSetData($StatusLabel, " Line: 1 of " & $FileLength)
        For $LineCount = 1 To $FileLength
            If $LineCount > $Track Then ;Only increments display every 5000 lines
                GUICtrlSetData($StatusLabel, " Line: " & $LineCount & " of " & $FileLength)
                $Track = $Track + 5000
            EndIf
            If $Stop = 1 Then ;Catches hotkey (via another function)
                ExitLoop
            EndIf
            $fstring = 0 ;Flag for search result
            $CurrentLine = StringStripWS($logarray[$LineCount], 3)
            Select ;Only lines that might contain search results need to be looked at
                Case StringLeft($CurrentLine, 4) = "text"
                    $fstring = 1
                Case StringLeft($CurrentLine, 5) = "Conte"
                    $fstring = 1
                Case StringLeft($CurrentLine, 5) = "value"
                    $fstring = 1
                Case Else
                    $fstring = 0 ;No search result on this line!
            EndSelect

            If $fstring = 1 Then
                If StringInStr(StringStripWS($CurrentLine, 3), $SearchTerm, 1, 1) <> 0 Then
                    $Found = 0
                    $PreCount = 1
                    $Coord = "NA" ;Default the fields so weird entries are noticed
                    $Page = "NA"
                    $text = StringTrimLeft($CurrentLine, StringInStr($CurrentLine, " ", 0, 1))
                    $text = StringStripWS($text, 3)
                    Do ;If the search term is found, I need to find two pieces of info in surrounding lines
                        $NewCount = $LineCount - $PreCount
                        If StringInStr($logarray[$NewCount], "Y= ", 1, 1) Then ;Finds coordinates of object
                            $Coord = StringReplace($logarray[$NewCount], " ", "")
                            $CoordChar = StringInStr($Coord, "point,", 1, 1)
                            If $CoordChar <> 0 Then
                                $Coord = StringTrimLeft($Coord, $CoordChar + 5)
                                $Coord = StringReplace($Coord, "X=", "")
                                $Coord = StringReplace($Coord, "Y=", ",")
                                $Coord = StringReplace($Coord, "Z=", ",")
                                $Coord = StringStripWS($Coord, 2)
                            EndIf
                        ElseIf StringInStr($logarray[$NewCount], "layout:", 0, 1) <> 0 Then ;Finds Page of object in drawing
                            $Page = StringStripWS(StringReplace($logarray[$NewCount], "layout:", ""), 3)
                            $Found = 1
                        EndIf
                        If $PreCount = 14 Then ;Each text block is about 14 lines long... no need to search any further
                            $Found = 1
                        EndIf
                        $PreCount = $PreCount + 1
                    Until $Found = 1
                    $Temp = $Page & "|" & $text & "|" & $Coord
                    $ItemNum = GUICtrlCreateListViewItem($Temp, $ResultsList)
                EndIf
            EndIf
        Next
        HotKeySet("{Esc}") ;Release Hotkey so AutoCAD can use it
        Dim $logarray ;Clear Array
        GUICtrlSetData($StatusLabel, " Line: " & $LineCount & " of " & $FileLength)
    EndIf
EndFunc  ;==>SearchLogFile


With the new Obj/COM support in the AutoIt beta, you can control Excel, Access, Word, IE, etc. quickly and easily.

So if you can do it from Excel, you can do it in AutoIt.

It's pretty easy to set up as well.
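As a rough illustration of that approach, the sketch below starts Excel over COM, opens a workbook, and calls a VBA macro in it through Excel's Application.Run method. It assumes an AutoIt build with COM support and an installed copy of Excel. The workbook path, the macro name CleanLog, and the log path passed to it are all hypothetical placeholders.

```autoit
; Sketch only: call a VBA macro in an Excel workbook from AutoIt via COM.
Local $oExcel = ObjCreate("Excel.Application")
If Not IsObj($oExcel) Then
    MsgBox(0, "Error", "Could not start Excel.")
    Exit
EndIf

$oExcel.Visible = False ; keep the Excel window hidden

; Hypothetical workbook that contains the macro
Local $oBook = $oExcel.Workbooks.Open("C:\Temp\LogTools.xls")

; Application.Run calls a macro by name and passes the remaining arguments
; to it. "CleanLog" and the log path are placeholders.
$oExcel.Run("CleanLog", "C:\Temp\drawing-Index.log")

$oBook.Close(False) ; close the workbook without saving changes
$oExcel.Quit()
```

With this arrangement the parsing itself still runs inside Excel's VBA engine, and AutoIt only starts it and waits for it to finish.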


AutoIt3, the MACGYVER Pocket Knife for computers.


Right, but I thought the COM stuff just controls the objects. Can you actually use the functions as well?

Kevin


@blink314

It is as I thought, and as was explained in the other topic you were in: you don't need to get the number of lines in the file. That is taking up your time, and that is exactly why it is taking more time per line read. Below is some code that you should use instead of _FileCountLines().

Also, if anything, I bet removing spaces just takes up more time. Not sure if you are worried about that. I would bet this should take a maximum of 30 seconds for 400k lines. Possibly up to 1 minute, but I doubt it.

You are also setting the same variable multiple times: $LineCount = 14, then 1, then 1 again in the For...Next loop. Remove the first two. Setting its value in the For...Next loop is fine.

Dim $fileO = "somefile.log"
$file = FileOpen($fileO, 0)
If $file = -1 Then
    MsgBox(0, "Error", "Unable to open file " & $fileO & ".")
    Exit
EndIf
While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
    If StringInStr($line, "SomeText") Then
       ;Do your stuff here...
    EndIf
WEnd
FileClose($file)

That will read line by line until it reaches the end of the file. I don't believe there is any reason to read into an array or to know how many lines there are. You tell me. Now try that; it will be much faster.

JS



Ok, I tried reading from the file like you said. It still takes ~23 seconds to go through the first 80,000 lines. Incidentally, I put a timer on the _FileCountLines function and got ~80 ticks, so not a whole lot of time is being spent there. As I've said before, _FileCountLines and _FileReadToArray aren't the problem; it's in the looping.

The reason I used an array is that accessing memory is faster than accessing the disk. For some reason AutoIt doesn't seem to benefit much from it, but in VBA you can get a HUGE speed increase by reading things into an array and working on the array. Even if (in Excel) you read in a 26x40000 range, search through it, and write the results to a GUI after processing, it takes about 5 seconds using an array, and maybe 10 or more reading cells from the spreadsheet.

Here is the redone function. It's not really cleaned up, though I did change my logic slightly. I wrote a function that cleans the logfile, getting rid of most of the waste info. This gets the 400,000 line file down to ~80,000. It still takes 2 minutes to clean the log file, but I can accept that I suppose.

Since VBA can parse the logfile much quicker (a co-worker has an Excel script to do it that takes about 5 seconds on the 400,000 line logfile), I'm thinking about letting Excel do the cleaning. Anyone know of any good way to control Excel (NOT the objects but the scripting, mind) from AutoIt?

Here is the code from my newer search function:

Func SearchLogFile($SearchTerm, $SearchFile)

    Dim $logarray[1]

    If GUICtrlRead($StatusLabel) = " No index yet..." Then
        MsgBox(0, "User ERROR", "You must first make an index file for the selected drawing!")
    Else
        HotKeySet("{esc}", "StopSearch")
        $Stop = 0
        
        $CleanFile = $DumpDir & $SearchFile & "-Clean.log"
        
        $time1 = timerinit()
        
        $FileLength = _filecountlines($cleanfile)
        $Diff1 = timerdiff($time1)
        msgbox(0,"",$diff1)
        
        dim $currentline1
        dim $currentline2
        dim $currentline
        
        if fileexists($cleanfile) Then      
            MsgBox(0,"",$cleanfile)
            
            $cleanpath = FileOpen($cleanfile,0)
            
            _GUICtrlListViewDeleteAllItems ($ResultsList)
            
            $LineCount = 1
            $track = 4999
            GUICtrlSetData($StatusLabel, " Line: 1 of " & $FileLength)
            
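            while 1
                $currentline2 = $currentline1 
                $currentline1 = $currentline
                $currentline = filereadline($cleanpath)
                if @error = -1 then ExitLoop ;check @error right after filereadline, before another function call resets it
                $currentline = stringstripws($currentline,3)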
                
                If $LineCount > $Track Then
                    GUICtrlSetData($StatusLabel, " Line: " & $LineCount & " of " & $FileLength)
                    $Track = $Track + 5000
                EndIf
    
                If $Stop = 1 Then
                    ExitLoop
                EndIf
                

                
                
                if stringinstr($currentline,$searchterm,1,1) <> 0 then
                    $text = StringTrimLeft($currentline,stringinstr($currentline," ",1,1))
                    
                    $Page = stringstripws(StringTrimLeft($currentline2,stringinstr($currentline2,":",1,1)+1),3)
            
                    $Coord = StringTrimLeft($currentline1,stringinstr($currentline1,",",1,1)+1)
                    $Coord = StringReplace($coord, " ", "")

                    $Coord = StringReplace($Coord, "X=", "")
                    $Coord = StringReplace($Coord, "Y=", ",")
                    $Coord = StringReplace($Coord, "Z=", ",")
                    $Coord = StringStripWS($Coord, 2) 

                    $Temp = $Page & "|" & $text & "|" & $Coord
    
                    $ItemNum = GUICtrlCreateListViewItem($Temp, $ResultsList)
                EndIf
                
                $linecount = $linecount+1
            WEnd
            
            
            FileClose($cleanpath)
            
        EndIf   
            
        
        HotKeySet("{Esc}")

        Dim $logarray

        GUICtrlSetData($StatusLabel, " Line: " & $LineCount & " of " & $FileLength)
    EndIf
EndFunc
