Marlo

Fastest way to parse data from a large file?

3 posts in this topic

#1 ·  Posted (edited)

So I have a ~6 MB file that is formatted like so:

{
"realm":{"name":"Someserver","slug":"someserver"},
"side1":{"data":[
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]},
"Side2":{"data":[
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]},

"Side3":{"data":[
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},
{"auc":9999999999,"item":01234,"owner":"SomeNáme","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]}
}

Bear in mind that the file can often contain 50k lines of this stuff.

I started by reading the file line by line and parsing each line with a simple RegExp that extracts the basic info and pushes it into an in-memory SQLite database, but even so it takes upwards of 30 seconds to process a whole file (at about 30-50% CPU usage).

Here is the RegExp I used:

^.*?{.?auc":(\d*).*?"item":(\d*),"owner":"([\w]+)","bid":(\d*),"buyout":(\d*),"quantity":(\d*),"timeLeft":"([a-zA-Z_]+)"}

I am new to RegExp so my method is probably very bad : /

So does anyone know a better way for me to be doing this? My way feels way too clunky.

Edited by Marlo


#2 ·  Posted (edited)

Reading the file line by line is the bottleneck. A simple fix is to use _FileReadToArray(), which should be able to handle 6 MB input files. Loop through the resulting array and apply your RegExp to each line. For larger files I would recommend writing your own parser that reads, e.g., 1 MB chunks: look for the last line break in the buffer (StringInStr() with an occurrence of -1), parse the data up to that point, transfer the rest to a new buffer, and read the next 1 MB chunk. Splitting the lines with a RegExp should already be quite fast.

Edited by KaFu
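The chunked approach described above could be sketched roughly as follows. This is only an illustration under a few assumptions: the file name "auctions.json" and the ProcessLines() helper are hypothetical placeholders, and the chunk size is counted the way FileRead() counts it for a file opened in text mode.

```autoit
; Sketch of chunked reading. "auctions.json" and ProcessLines() are hypothetical names.
Local Const $CHUNK_SIZE = 1024 * 1024 ; amount requested per FileRead() call

Local $hFile = FileOpen("auctions.json", 0) ; 0 = open for reading
If $hFile = -1 Then Exit 1

Local $sRest = "" ; incomplete trailing line carried over from the previous chunk
While True
    Local $sChunk = FileRead($hFile, $CHUNK_SIZE)
    Local $bDone = (@error <> 0) ; non-zero at end of file or on a read failure
    Local $sBuffer = $sRest & $sChunk

    ; Occurrence -1 makes StringInStr() search from the right: position of the last line feed.
    Local $iLast = StringInStr($sBuffer, @LF, 0, -1)
    If $iLast > 0 Then
        ProcessLines(StringLeft($sBuffer, $iLast)) ; only complete lines
        $sRest = StringTrimLeft($sBuffer, $iLast)  ; keep the unfinished tail
    Else
        $sRest = $sBuffer ; no complete line yet, keep accumulating
    EndIf

    If $bDone Or $sChunk = "" Then ExitLoop
WEnd
If $sRest <> "" Then ProcessLines($sRest) ; final line without a trailing line break
FileClose($hFile)

Func ProcessLines($sLines)
    ; Placeholder: apply the RegExp to $sLines here.
    ConsoleWrite(StringLen($sLines) & " characters ready to parse" & @CRLF)
EndFunc
```

Carrying the tail over in $sRest means a record is never split across two parse calls, no matter where a chunk boundary happens to fall.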


#3 ·  Posted (edited)

#include <Array.au3>

$sText = _
        '{' & @CRLF & _
        '"realm":{"name":"Someserver","slug":"someserver"},' & @CRLF & _
        '"side1":{"data":[' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]},' & @CRLF & _
        '"Side2":{"data":[' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]},' & @CRLF & _
        @CRLF & _
        '"Side3":{"data":[' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"},' & @CRLF & _
        '{"auc":9999999999,"item":01234,"owner":"SomeName","bid":999999,"buyout":999999,"quantity":999,"timeLeft":"VERY_LONG"}]}' & @CRLF & _
        '}'

; MsgBox(0, "Message", $sText)

; Flag 3 returns the captured groups of every match as one flat array (7 entries per record).
$aText = StringRegExp($sText, '(?m)^.*?{"auc":(\d*).*?"item":(\d*),"owner":"([\w]+)","bid":(\d*),"buyout":(\d*),"quantity":(\d*),"timeLeft":"(\w+)"}', 3)
If Not @error Then
    $n = UBound($aText)
    ; Element [0][0] holds the record count; each following row holds the 7 fields of one record.
    Local $aText2D[$n / 7 + 1][7] = [[$n / 7]]
    For $i = 0 To $n - 1 Step 7
        $d = $i / 7 + 1
        $aText2D[$d][0] = $aText[$i]
        $aText2D[$d][1] = $aText[$i + 1]
        $aText2D[$d][2] = $aText[$i + 2]
        $aText2D[$d][3] = $aText[$i + 3]
        $aText2D[$d][4] = $aText[$i + 4]
        $aText2D[$d][5] = $aText[$i + 5]
        $aText2D[$d][6] = $aText[$i + 6]
    Next
    ; Display only when there were matches, so $aText2D is guaranteed to exist.
    _ArrayDisplay($aText2D, 'Array')
EndIf

Edited by AZJIO
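To try the same single-pass idea on a real file rather than an inline string, something like the sketch below might serve as a starting point. The path "auctions.json" is a hypothetical placeholder, and the pattern is a simplified variant rather than the one posted above: it assumes the fields always appear in the order shown in the sample, and it uses [^"]+ for the owner because \w may not match accented letters such as the "á" in "SomeNáme" unless Unicode matching is enabled.

```autoit
; Sketch: one FileRead() plus one global StringRegExp(). "auctions.json" is a hypothetical path.
Local $sPath = "auctions.json"
If Not FileExists($sPath) Then Exit 1

Local $hTimer = TimerInit()
Local $sData = FileRead($sPath) ; whole file in a single call

; Each match starts at the next {"auc": in the text, so no line anchors are needed.
Local $sPattern = '\{"auc":(\d+),"item":(\d+),"owner":"([^"]+)","bid":(\d+),"buyout":(\d+),"quantity":(\d+),"timeLeft":"(\w+)"\}'
Local $aFlat = StringRegExp($sData, $sPattern, 3) ; flag 3 = flat array of all captured groups
If @error Then Exit 2 ; no records matched

ConsoleWrite(UBound($aFlat) / 7 & " records in " & Round(TimerDiff($hTimer)) & " ms" & @CRLF)
```

Timing the call this way gives a quick comparison against the line-by-line version on the same input.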
