tom13

getting last between start and end string of a big file

64 posts in this topic

Hi there,

I've got a very big .txt file with more than 20MB of raw text, and I need the last value that sits between a start and an end string, e.g.:

#include <String.au3> ; provides _StringBetween

; First attempt: search the whole file
$handle = FileOpen($chatlog, 0)
$sStringb = FileRead($handle)
FileClose($handle)
$pos = _StringBetween($sStringb, "[zWB_POS]", "[/zWB_POS]")
$pos = $pos[(UBound($pos) - 1)]

; Second attempt: search a growing tail of the string, 1000 characters at a time
$handle = FileOpen($chatlog, 0)
$sStringb = FileRead($handle)
FileClose($handle)
$i = 1
Do
    $sString = StringRight($sStringb, 1000 * $i)
    $pos = _StringBetween($sString, "[zWB_POS]", "[/zWB_POS]")
    $i = $i + 1
Until IsArray($pos) ; _StringBetween returns an array only when it finds a match
$pos = $pos[(UBound($pos) - 1)]

This takes a bit less than a second, I think, but I still need it to go faster. Do you have any suggestions on how to do this? Note that I only need the last occurrence, so it should be possible to get it very quickly.




You may be able to find something in the forum like APIFileOpen, APIFileSetPos and APIFileRead. Then you could try:

$a = FileGetSize
APIFileOpen
APIFileSetPos $a-100 or $a-1000 or something
APIFileRead 1000
APIFileClose

Get the idea?

Lar.
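
A minimal sketch of the seek-and-read idea above, assuming an AutoIt release that has the built-in FileSetPos (so no forum UDF is needed) and a single-byte (ANSI) log. `_ReadTail` is a hypothetical helper name, and the 1000-byte window is a guess rather than a measured value.

```autoit
#include <FileConstants.au3> ; $FO_READ, $FILE_END

Local $chatlog = "C:\WAR\logs\testchat.log" ; example path used elsewhere in this thread
ConsoleWrite(_ReadTail($chatlog, 1000) & @CRLF)

; Hypothetical helper (not from the thread): return the last $iBytes bytes of a file.
Func _ReadTail($sPath, $iBytes)
    Local $iSize = FileGetSize($sPath)
    If $iBytes > $iSize Then $iBytes = $iSize ; never seek before the start of the file
    Local $hFile = FileOpen($sPath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, "") ; FileOpen returns -1 on failure
    FileSetPos($hFile, -$iBytes, $FILE_END) ; negative offset, counted back from the end
    Local $sTail = FileRead($hFile) ; reads from the current position to the end
    FileClose($hFile)
    Return $sTail
EndFunc
```

Only the requested bytes are read, so the cost should depend on the window size rather than on the size of the log.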



@ Larry,

Thanks, but are you sure using all those functions will make it faster?

Also, I still need a Do loop, since I can never know whether the APIFileSetPos offset should be 100 or 1000, right?

Can't I just read the file from right to left (instead of left to right) and stop reading once an occurrence has been found?


Your choice is to try to find a reasonable way to know how deep these "tags" will be and use APIFileSetPos and only load a portion of the file into memory. Or load it all into memory and find your tags... I can't think of an alternative.

Lar.
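
One way to settle the "how deep are the tags" question without loading the whole file is to start with a small window at the end and double it until a complete pair turns up. The following is only a sketch: it assumes the built-in FileSetPos is available in the installed AutoIt version and that the log is single-byte (ANSI) text, and `_LastPosFromTail` is a made-up name.

```autoit
#include <FileConstants.au3> ; $FO_READ, $FILE_END
#include <String.au3>        ; _StringBetween

; Hypothetical helper (not from the thread): widen the tail window until a complete
; [zWB_POS]...[/zWB_POS] pair shows up, or the whole file has been searched.
Func _LastPosFromTail($sPath, $iWindow = 1000)
    Local $iSize = FileGetSize($sPath)
    Local $hFile = FileOpen($sPath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, "") ; could not open the file
    Local $aPos, $sTail
    While 1
        If $iWindow > $iSize Then $iWindow = $iSize
        FileSetPos($hFile, -$iWindow, $FILE_END) ; jump to $iWindow bytes before the end
        $sTail = FileRead($hFile)                ; read from there to the end
        $aPos = _StringBetween($sTail, "[zWB_POS]", "[/zWB_POS]")
        If IsArray($aPos) Then ExitLoop          ; at least one complete pair found
        If $iWindow >= $iSize Then ExitLoop      ; whole file searched, nothing found
        $iWindow *= 2
    WEnd
    FileClose($hFile)
    If Not IsArray($aPos) Then Return SetError(2, 0, "")
    Return $aPos[UBound($aPos) - 1] ; last match inside the window
EndFunc
```

If the tags normally sit close to the end, the first pass already succeeds; the doubling only matters for the rare case where the last pair lies further back.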



Okay, thanks Larry.

So I guess there is no way to read a file from right to left. :P



Change your OS language to Yiddish?


#7 ·  Posted (edited)

In one of my scripts I called a utility that searches text files from the end rather than from the start. Maybe that would help you.

I also had large text files, 20-30 MB, and the data I needed was towards the end of the file.

Basically I would pull the last few lines into a string and parse them for what I needed, which was very quick compared to the 5 minutes it took to search from the beginning.

Found this on google, looks like the same one: http://www.winsite.com/bin/Info?500000031647

I can dig up some sample code if you need it.

-Kenny

Edited by ken82m



Thanks but when I try

$rc = _RunDos("tail 100 " & $chatlog & " > result.log")

It returns the following:

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ

ÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍýýýý

Which definitely is not in the text file.

Any idea?


#9 ·  Posted (edited)

I am now trying this with a batch file, but the problem is that I cannot use spaces in the file paths. Does anyone know how I can do that?

@echo off
if {%1}=={} @echo FileName parameter required.&goto :EOF
if not exist %1 @echo %1 does NOT exist.&goto :EOF
setlocal
set file=%1
set /a number=10
if not {%2}=={} set /a number=%2
for /f %%i in ('find /v /c "" ^< %file%') do set /a lines=%%i
if %number% GEQ %lines% set /a start=0&goto console
set /a start=%lines% - %number%
:console
more /e +%start% %file%
endlocal

$rc = _RunDos(@ScriptDir & '\tail.bat ' & $chatlog & ' > ' & @ScriptDir & '\result.txt')
MsgBox(0, "", $rc)
exit

It returns the following:

timer1: 218

timer2: 185

timer3: 262

Timer3 should be max 20. Used a 13MB file here.

edit; I assume that the batch file actually does load the full file in memory

I found http://www.autoitscript.com/forum/index.ph...st&p=336438 but I have no idea how to use that one. It's with bytes instead of lines I think, confused..

Edited by tom13


#10 ·  Posted (edited)

I have searched and searched and searched and finally got to this example:

#RequireAdmin
#include "APITailRW.au3"
$log = "C:\WAR\logs\testchat.log"
$copyLog = "Testlog2.txt"
$var1 = fileGetSize($log)
ConsoleWrite($var1 & @crlf)
While 1
    $var2 = FileGetSize($log)
    If $var2 <> $Var1 then
        ConsoleWrite("Read in " & $var2 - $var1 & " bytes" &  @crlf)
        ConsoleWrite("New data" & @crlf)
        $file =_FileOpenAPI($log)
        $read = _FileReadAPI($file, $var2 - $var1, $var1)
        _FileCloseAPI($file)
        ConsoleWrite($read & @crlf)
        $var1 = $var2
    EndIf
WEnd

While it works for ChrisL in http://www.autoitscript.com/forum/index.ph...937&hl=tail - it is giving invalid returns to me, eg. if I add "hello" to the bottom of the log file I get 3 very weird characters one of which is not on my keyboard.. other 2 look like this: ÿ[. Any clues?

Edited by tom13



I don't know but maybe you are forgetting that the first position is at 0 so you need

$read = _FileReadAPI($file, $var2 - $var1, $var1-1)


Thanks for the reply but I am still getting the 3 weird characters every time, and not what has changed.

I have tried another UDF from http://www.autoitscript.com/forum/index.ph...mp;#entry353757

#RequireAdmin
#include "tailrw.au3"
$file = "C:\WAR\logs\testchat.log"
$prevsize = FileGetSize($file)
While 1
    $newsize = FileGetSize($file)
    if $newsize <> $prevsize then
        $handle = _APIFileOpen ($file)
        _APIFileSetPos($handle, $prevsize)
        $tmp_msg = _APIFileRead($handle, $newsize - $prevsize)
        If @error <> 0 Then
            _APIFileClose($handle)
            MsgBox(0, "", $tmp_msg)
            $prevsize = $newsize
        else
            MsgBox(0, "error return from _APIFileRead", _LastErr (), 1)
        endif
    endif
Wend

However this one is crashing once I edit the log file... the error value is 0 - any idea?
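
Two details in the loop above may be worth a second look. File offsets are zero-based, so when the old size is N the old bytes sit at offsets 0 to N-1 and the first appended byte is at offset N; seeking to N-1 would re-read the last old byte. Also, `If @error <> 0` sends a successful read (where @error is 0 by the usual AutoIt convention) into the "error return" branch, which would match the reported error value of 0. Below is a sketch of the same watcher using only built-in functions, assuming FileSetPos is available in the installed AutoIt version and the log is single-byte (ANSI) text; the variable names are illustrative, and it does not explain the crash inside the UDF.

```autoit
#include <FileConstants.au3> ; $FO_READ, $FILE_BEGIN

Local $sLog = "C:\WAR\logs\testchat.log" ; path taken from the post above
Local $iPrevSize = FileGetSize($sLog)
Local $iNewSize, $hFile, $sNew

While 1
    $iNewSize = FileGetSize($sLog)
    If $iNewSize > $iPrevSize Then
        $hFile = FileOpen($sLog, $FO_READ)
        If $hFile <> -1 Then ; FileOpen returns -1 on failure
            ; Bytes 0 .. $iPrevSize-1 are old, so the first new byte is at offset $iPrevSize.
            FileSetPos($hFile, $iPrevSize, $FILE_BEGIN)
            $sNew = FileRead($hFile, $iNewSize - $iPrevSize)
            If $sNew <> "" Then ConsoleWrite($sNew & @CRLF)
            FileClose($hFile) ; close the handle on every pass
        EndIf
    EndIf
    $iPrevSize = $iNewSize ; also resets the baseline if the log was truncated
    Sleep(250)             ; avoid a busy loop while polling
WEnd
```

Closing the handle and updating the stored size on every pass keeps the loop from reporting the same change repeatedly.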


#13 ·  Posted (edited)

StringInStr() can read from right to left.

Couldn't you find the first occurrences of "[zWB_POS]" and "[/zWB_POS]" from the right and just use StringMid()?
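
The right-to-left search could look roughly like this. StringInStr takes a negative occurrence to search from the right; the sample string and its values are made up for illustration.

```autoit
; $sText stands in for whatever part of the log is already in memory (made-up sample).
Local $sText = "a[zWB_POS]1,2[/zWB_POS]b[zWB_POS]3,4[/zWB_POS]c"
Local $sOpen = "[zWB_POS]", $sClose = "[/zWB_POS]"

; Occurrence -1 makes StringInStr search from the right.
Local $iClose = StringInStr($sText, $sClose, 0, -1)
If $iClose > 0 Then
    ; Look for the opening tag only in the text before that closing tag.
    Local $iOpen = StringInStr(StringLeft($sText, $iClose - 1), $sOpen, 0, -1)
    If $iOpen > 0 Then
        Local $iFrom = $iOpen + StringLen($sOpen) ; first character after the opening tag
        ConsoleWrite(StringMid($sText, $iFrom, $iClose - $iFrom) & @CRLF) ; prints 3,4
    EndIf
EndIf
```

Restricting the second search to the text left of the closing tag guarantees the opening tag found belongs to the same, last complete pair.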

Edited by trancexx


#14 ·  Posted (edited)


Of course but to do so you have to load the whole file into memory, so that makes no sense.

I have also tried http://www.autoitscript.com/forum/index.ph...464&hl=tail

And it crashes, again!

Does anyone understand why it keeps crashing here while it works for other people? I do have the latest stable and beta versions.

Edit: I'm on Vista Ultimate x64.

Edited by tom13


Outside of poorly written UDF... make sure you check this line for error...

$handle = _APIFileOpen ($file)

That $handle could be bad if it fails.

Lar.




No idea but haven't you still made the same mistake

_APIFileSetPos($handle, $prevsize);<----------should be $prevsize -1


Of course but to do so you have to load the whole file into memory, so that makes no sense.

Actually, that makes no sense.

AutoIt reads like, what, 30 - 40 MB/sec.

Try it.




Yeah. And the file is like 200 MB. Makes sense right?

Needs to go within a few milliseconds.


No idea but haven't you still made the same mistake

_APIFileSetPos($handle, $prevsize);<----------should be $prevsize -1
Thanks but it did not fix it. Still crashes.

I also tried the following which is crashing too:

#RequireAdmin
#include "APITailRW.au3"
$file = "C:\WAR\logs\testchat.log"
$s_Test = $file
$prevsize = FileGetSize($file)
While 1
    $newsize = FileGetSize($file)
    if $newsize <> $prevsize then
        $i_WriteLineNumTail = 10
        $s_ReadLine = _FileReadLineAPI($s_Test)
        MsgBox(0, "", $s_ReadLine)
        $prevsize = $newsize
    endif
Wend

For this I used the APITailRW.au3 UDF.


#20 ·  Posted (edited)

Outside of poorly written UDF... make sure you check this line for error...

$handle = _APIFileOpen ($file)

That $handle could be bad if it fails.

Lar.

Thanks for helping.

The following:

$handle = _APIFileOpen($file)
        MsgBox(0, "", $handle)

returns this: 0x00000140

And then, it crashes once I use _APIFileRead.

Any ideas?

edit: do the example scripts work for you guys?

Edited by tom13
