Getting the last value between a start and end string in a big file


tom13

Hi there,

I've got a very big .txt file of more than 20MB of raw text, and I need to get the last value that sits between a start and an end string in it, e.g.:

#include <String.au3>

$handle = FileOpen($chatlog, 0)
$sString = FileRead($handle)
FileClose($handle)
$pos = _StringBetween($sString, "[zWB_POS]", "[/zWB_POS]")
$pos = $pos[(UBound($pos) - 1)]

$handle = FileOpen($chatlog, 0)
$sStringb = FileRead($handle)
FileClose($handle)
$i = 1
Do
    ; search a growing tail of the file contents
    $sString = StringRight($sStringb, 1000 * $i)
    $pos = _StringBetween($sString, "[zWB_POS]", "[/zWB_POS]")
    $i = $i + 1
Until IsArray($pos) ; _StringBetween returns an array only on success
$pos = $pos[(UBound($pos) - 1)]

This takes a bit less than a second, I think. But I still need it to go faster. Do you guys have any suggestions on how to do this faster? Note that I only need the last occurrence, so it should be possible to get it very fast.

May be able to find something in the forum like APIFileOpen, APIFileSetPos and APIFileRead... then you could try...

$a = FileGetSize

APIFileOpen

APIFileSetPos $a-100 or $a-1000 or something

APIFileRead 1000

APIFileClose

get the idea?
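The idea above can be sketched with built-in functions instead of the APIFile* UDFs. This is only a rough sketch: it assumes an AutoIt version that ships FileSetPos and FileConstants.au3, an ANSI log file, and that the tags sit within the last 4096 bytes. _ReadTail and the log path are made-up names for illustration.

```autoit
#include <FileConstants.au3>
#include <String.au3>

; Hypothetical helper: returns the last $iBytes bytes of $sPath as a string.
Func _ReadTail($sPath, $iBytes)
    Local $iSize = FileGetSize($sPath)
    If @error Then Return SetError(1, 0, "")
    If $iBytes > $iSize Then $iBytes = $iSize
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(2, 0, "")
    ; Seek to $iBytes before the end, then read only that chunk.
    FileSetPos($hFile, -$iBytes, $FILE_END)
    Local $bData = FileRead($hFile, $iBytes)
    FileClose($hFile)
    Return BinaryToString($bData) ; assumes ANSI text
EndFunc

Local $chatlog = @ScriptDir & "\chat.log" ; hypothetical path
Local $sTail = _ReadTail($chatlog, 4096)
Local $aPos = _StringBetween($sTail, "[zWB_POS]", "[/zWB_POS]")
If IsArray($aPos) Then ConsoleWrite($aPos[UBound($aPos) - 1] & @CRLF)
```

If nothing is found, the chunk size could be doubled and the read repeated, so only as much of the file as needed ever gets loaded.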

Lar.

@ Larry,

Thanks but are you sure using all those functions will make it faster?

Also, I still need a do loop since I can never know whether the APIFileSetPos should be 100 or 1000 right?

Can't I just read the file from right to left (instead of left to right) and stop reading once an occurrence has been found?

Your choice is to try to find a reasonable way to know how deep these "tags" will be and use APIFileSetPos and only load a portion of the file into memory. Or load it all into memory and find your tags... I can't think of an alternative.

Lar.

In one of my scripts I used a utility that searches text files from the end rather than from the start.

Maybe that would help you.

I also had large text files, 20-30 MB.

The data I needed was towards the end of the file.

Basically I would pull the last few lines into a string and parse them for what I needed, which was very quick compared to the 5 minutes it took to search from the beginning.

Found this on google, looks like the same one: http://www.winsite.com/bin/Info?500000031647

I can dig up some sample code if you need it.

-Kenny

Thanks, but when I try

$rc = _RunDos("tail 100 " & $chatlog & " > result.log")

It returns the following:

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý

Which definitely is not in the text file.

Any idea?

I am now trying this with a batch file, but the problem is that I cannot use spaces in the file paths. Does anyone know how I can do that?

@echo off
if {%1}=={} @echo FileName parameter required.&goto :EOF
if not exist %1 @echo %1 does NOT exist.&goto :EOF
setlocal
set file=%1
set /a number=10
if not {%2}=={} set /a number=%2
for /f %%i in ('find /v /c "" ^< %file%') do set /a lines=%%i
if %number% GEQ %lines% set /a start=0&goto console
set /a start=%lines% - %number%
:console
more /e +%start% %file%
endlocal

$rc = _RunDos(@ScriptDir & '\tail.bat ' & $chatlog & ' > ' & @ScriptDir & '\result.txt')
MsgBox(0, "", $rc)
exit

It returns the following:

timer1: 218

timer2: 185

timer3: 262

Timer3 should be max 20. Used a 13MB file here.

edit; I assume that the batch file actually does load the full file in memory

I found http://www.autoitscript.com/forum/index.ph...st&p=336438 but I have no idea how to use that one. It's with bytes instead of lines I think, confused..

I have searched and searched and searched and finally got to this example:

#RequireAdmin
#include "APITailRW.au3"
$log = "C:\WAR\logs\testchat.log"
$copyLog = "Testlog2.txt"
$var1 = FileGetSize($log) ; size at the last check
ConsoleWrite($var1 & @CRLF)
While 1
    $var2 = FileGetSize($log)
    If $var2 <> $var1 Then
        ConsoleWrite("Read in " & $var2 - $var1 & " bytes" & @CRLF)
        ConsoleWrite("New data" & @CRLF)
        $file = _FileOpenAPI($log)
        ; read only the bytes appended since the last check
        $read = _FileReadAPI($file, $var2 - $var1, $var1)
        _FileCloseAPI($file)
        ConsoleWrite($read & @CRLF)
        $var1 = $var2
    EndIf
    Sleep(100) ; avoid a busy loop between size checks
WEnd

While it works for ChrisL in http://www.autoitscript.com/forum/index.ph...937&hl=tail - it is giving me invalid returns, e.g. if I add "hello" to the bottom of the log file I get 3 very weird characters, one of which is not on my keyboard. The other 2 look like this: ÿ[. Any clues?
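For comparison, the same "read only what was appended" loop can be sketched with built-in functions, which takes the UDF out of the picture. This is a sketch under assumptions: an AutoIt version with FileSetPos and FileConstants.au3, and a hypothetical helper name _ReadAppended. File offsets are zero-based, so a file of $iPrev bytes occupies offsets 0 to $iPrev - 1 and the new data starts at offset $iPrev.

```autoit
#include <FileConstants.au3>

; Hypothetical helper: returns the text appended since $iPrev and updates $iPrev.
Func _ReadAppended($sPath, ByRef $iPrev)
    Local $iNow = FileGetSize($sPath)
    If $iNow < $iPrev Then $iPrev = 0 ; file shrank (truncated or rotated): reread from the start
    If $iNow = $iPrev Then Return ""
    Local $hFile = FileOpen($sPath, $FO_READ + $FO_BINARY)
    If $hFile = -1 Then Return SetError(1, 0, "")
    ; New data starts right after the old bytes, i.e. at offset $iPrev.
    FileSetPos($hFile, $iPrev, $FILE_BEGIN)
    Local $bNew = FileRead($hFile, $iNow - $iPrev)
    FileClose($hFile)
    $iPrev = $iNow
    Return BinaryToString($bNew) ; flag 1 (ANSI) by default
EndFunc

Local $sLog = "C:\WAR\logs\testchat.log" ; path from the post above
Local $iSeen = FileGetSize($sLog)
Local $sNew
While 1
    $sNew = _ReadAppended($sLog, $iSeen)
    If $sNew <> "" Then ConsoleWrite($sNew & @CRLF)
    Sleep(250)
WEnd
```

If the output still shows characters like ÿ, one thing that might be worth checking is the file's encoding: a log saved as UTF-16 would need BinaryToString($bNew, 2) rather than the ANSI default.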

I don't know but maybe you are forgetting that the first position is at 0 so you need

$read = _FileReadAPI($file, $var2 - $var1, $var1-1)

Thanks for the reply, but I am still getting the 3 weird characters every time instead of what has changed.

I have tried another UDF from http://www.autoitscript.com/forum/index.ph...mp;#entry353757

#RequireAdmin
#include "tailrw.au3"
$file = "C:\WAR\logs\testchat.log"
$prevsize = FileGetSize($file)
While 1
    $newsize = FileGetSize($file)
    if $newsize <> $prevsize then
        $handle = _APIFileOpen ($file)
        _APIFileSetPos($handle, $prevsize)
        $tmp_msg = _APIFileRead($handle, $newsize - $prevsize)
        If @error <> 0 Then
            _APIFileClose($handle)
            MsgBox(0, "", $tmp_msg)
            $prevsize = $newsize
        else
            MsgBox(0, "error return from _APIFileRead", _LastErr (), 1)
        endif
    endif
Wend

However this one is crashing once I edit the log file... the error value is 0 - any idea?

StringInStr() can read from right to left.

Couldn't you find the first occurrences of "[zWB_POS]" and "[/zWB_POS]" from the right and just use StringMid()?
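A minimal sketch of that suggestion, using only built-in string functions. _LastBetween is a made-up helper name, and the input string could be the whole file or just a chunk read from its end. A negative occurrence argument makes StringInStr search from the right.

```autoit
; Hypothetical helper: returns the text between the last $sOpen ... $sClose pair,
; or sets @error if no complete pair is found.
Func _LastBetween($sText, $sOpen, $sClose)
    ; Find the last closing tag (case-sensitive, searching from the right).
    Local $iClose = StringInStr($sText, $sClose, 1, -1)
    If $iClose = 0 Then Return SetError(1, 0, "")
    ; Find the last opening tag that starts before that closing tag.
    Local $iOpen = StringInStr(StringLeft($sText, $iClose - 1), $sOpen, 1, -1)
    If $iOpen = 0 Then Return SetError(2, 0, "")
    Local $iStart = $iOpen + StringLen($sOpen)
    Return StringMid($sText, $iStart, $iClose - $iStart)
EndFunc

Local $sSample = "x[zWB_POS]1[/zWB_POS]y[zWB_POS]22[/zWB_POS]z"
ConsoleWrite(_LastBetween($sSample, "[zWB_POS]", "[/zWB_POS]") & @CRLF)
```

Compared with _StringBetween, this stops at the last pair instead of collecting every match into an array.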

Of course but to do so you have to load the whole file into memory, so that makes no sense.

I have also tried http://www.autoitscript.com/forum/index.ph...464&hl=tail

And it crashes, again!

Does anyone understand why it keeps crashing here while it works for other people? I do have the latest stable and beta versions.

Edit: I'm on Vista Ultimate x64

No idea but haven't you still made the same mistake

_APIFileSetPos($handle, $prevsize);<----------should be $prevsize -1

Thanks, but it did not fix it. It still crashes.

I also tried the following which is crashing too:

#RequireAdmin
#include "APITailRW.au3"
$file = "C:\WAR\logs\testchat.log"
$s_Test = $file
$prevsize = FileGetSize($file)
While 1
    $newsize = FileGetSize($file)
    if $newsize <> $prevsize then
        $i_WriteLineNumTail = 10
        $s_ReadLine = _FileReadLineAPI($s_Test)
        MsgBox(0, "", $s_ReadLine)
        $prevsize = $newsize
    endif
Wend

For this I used the APITailRW.au3 UDF.

Outside of a poorly written UDF... make sure you check this line for errors...

$handle = _APIFileOpen ($file)

That $handle could be bad if it fails.

Lar.

Thanks for helping.

The following:

$handle = _APIFileOpen($file)
MsgBox(0, "", $handle)

returns this: 0x00000140

And then, it crashes once I use _APIFileRead.

Any ideas?

edit: do the example scripts work for you guys?
