Jump to content

Can this script run faster?


Recommended Posts

It's too slow man, it's too slow!

It works fine for 4-5 entries, but I am dealing here almost always with more then a 10 000 entries...

I do not have a clear idea of ​​what's going on, but in my experience, more or less

Local $i_log = 2 ; log.xlsx reading start ponit
Do ; loop
$IP = $aArray_log[$i_log][1] ; taking IP from log.xlsx
$temp = StringSplit($IP,'.') ; splits up a string into substrings
$decIP = ($temp[1]*256*256*256)+($temp[2]*256*256)+($temp[3]*256)+$temp[4] ; converting IP to decimal value, more info https://www.ultratools.com/tools/decimalCalc

For $i_geo = 2 To $Max_geo ; geo.xlsx reading start ponit, Loop
If $decIP >= $aArray_geo[$i_geo][1] And $decIP <= $aArray_geo[$i_geo][2] Then $dummy = 1 ; commparing decimal IP with values from geo.xlsx and define walue just to script go on
Next

$i_log = $i_log + 1
Until $i_log = $Max_log ; Do until read all rows in log.xlsx

this code should proceed, an average of 1000-10000 item per second, so the maximum will take 10 seconds, and not 150 seconds

apps-odrive.pngdrive_app_badge.png box-logo.png new_logo.png MEGA_Logo.png

Link to comment
Share on other sites

  • Replies 47
  • Created
  • Last Reply

Top Posters In This Topic

First, I work in an old pc x32, very slow (everything is fine, but I want to say is not a PC where you can do test ect ect), with Windows 8 & Office 2013 ect ect

Autoit + Excel object does not work well, here I can not open the geo.xlsx, waited 5 minutes and nothing hmmmm it seems that Excel works (CPU 80%) mahhhh nothing seems to want a very long time to open the geo.xlsx, instead _XLSXReadToArray() will serve more or less 30 seconds to open geo.xlsx

I left in geo.xlsx only 260 Rows, and Autoit + Excel object opened it in 22 seconds, instead _XLSXReadToArray() opened it in 1.22 seconds

the file geo.xlsx are present 180,000 rows and not 10,000 ehhh, so the loop that you run them and equal to 110.697.900, autoit seems to work almost 300 items per second (in my opinion AutoIt works Very Very fast, they're giving for each loop 1000 array to control ehhhhh), so serve 600 seconds to finish the job

I am not an expert, but for what they saw or generally autoit and very fast, the .NET is not faster in some loop about AutoIt, so I doubt that even C++++ will run this loop in 5 seconds hmmmmmmmmm

however, code more understandable should be so

#include <_XLSXReadToArray.au3>
#include <array.au3>

Local $inColum, $dIP, $t
$t = TimerInit()

$aGeo = _XLSXReadToArray(@ScriptDir & "\geo.xlsx", 0, 0, 0)
;~ If @Error Or @Extended < 3 Then
;~   ;ect ect ect
;~ EndIf
$aLog = _XLSXReadToArray(@ScriptDir & "\log.xlsx", 0, 0, 0)
;~ If @Error Then
;~   ;ect ect ect
;~ EndIf
$inColum = @Extended
ReDim $aLog[$aLog[0][0] + 1][$inColum + 1]
$t = TimerDiff($t)
MsgBox(0, "Time Difference", $t)

$t = TimerInit()

For $i = 2 To $aLog[0][0]
    $aIP = StringSplit($aLog[$i][0], ".")
    $dIP = ($aIP[1] * 256 * 256 * 256) + ($aIP[2] * 256 * 256) + ($aIP[3] * 256) + $aIP[4]
    For $y = 2 To $aGeo[0][0]
        If $dIP >= $aGeo[$y][0] And $dIP <= $aGeo[$y][1] Then $aLog[$i][$inColum] = $aGeo[$y][2]
    Next
Next

$t = TimerDiff($t)
MsgBox(0, "Time Difference", $t)
_ArrayDisplay($aLog)

Ciao.

Edited by DXRW4E

apps-odrive.pngdrive_app_badge.png box-logo.png new_logo.png MEGA_Logo.png

Link to comment
Share on other sites

I tested your script and I get 259.378 sec.

So you're telling me that my 142 seconds is the fastest solution?!?

I do not think the script in geoip.rar

#include <Excel.au3>

$Log= @ScriptDir & "\log.xlsx"
$oExcel_log= _ExcelBookOpen($Log, 0)

$aArray_log = _ExcelReadSheetToArray($oExcel_log, 1, 1, 0, 1)
$Max_log = $aArray_log[0][0]

$Geo= @ScriptDir & "\geo.xlsx"
$oExcel_geo = _ExcelBookOpen($Geo, 0)

$aArray_geo = _ExcelReadSheetToArray($oExcel_geo, 1, 1, 0, 3)
$Max_geo = $aArray_geo[0][0]

Local $begin = TimerInit()

Local $i_log = 2
Do
   $IP = $aArray_log[$i_log][1]
   $temp = StringSplit($IP,'.')
   $decIP = ($temp[1]*256*256*256)+($temp[2]*256*256)+($temp[3]*256)+$temp[4]

   For $i_geo = 2 To $Max_geo
      If $decIP >= $aArray_geo[$i_geo][1] And $decIP <= $aArray_geo[$i_geo][2] Then _ExcelWriteCell($oExcel_log, $aArray_geo[$i_geo][3], $i_log, 2) 
   Next

   $i_log = $i_log + 1
Until $i_log = $Max_log + 1

Local $dif = TimerDiff($begin)
MsgBox(0, "Time Difference", $dif) ; 142 sec <<<<<<< TO SLOW!!!

_ExcelBookSave($oExcel_Log)
_ExcelBookClose($oExcel_Log, 1, 0)
$oExcel_log.Application.Quit
_ExcelBookClose($oExcel_geo, 0, 0)
$oExcel_geo.Application.Quit

Exit

is more faster than

#include <_XLSXReadToArray.au3>
#include <array.au3>

Local $inColum, $dIP, $t
$t = TimerInit()

$aGeo = _XLSXReadToArray(@ScriptDir & "\geo.xlsx", 0, 0, 0)
;~ If @Error Or @Extended < 3 Then
;~   ;ect ect ect
;~ EndIf
$aLog = _XLSXReadToArray(@ScriptDir & "\log.xlsx", 0, 0, 0)
;~ If @Error Then
;~   ;ect ect ect
;~ EndIf
$inColum = @Extended
ReDim $aLog[$aLog[0][0] + 1][$inColum + 1]
$t = TimerDiff($t)
MsgBox(0, "Time Difference", $t)

$t = TimerInit()

For $i = 2 To $aLog[0][0]
    $aIP = StringSplit($aLog[$i][0], ".")
    $dIP = ($aIP[1] * 256 * 256 * 256) + ($aIP[2] * 256 * 256) + ($aIP[3] * 256) + $aIP[4]
    For $y = 2 To $aGeo[0][0]
        If $dIP >= $aGeo[$y][0] And $dIP <= $aGeo[$y][1] Then $aLog[$i][$inColum] = $aGeo[$y][2]
    Next
Next

$t = TimerDiff($t)
MsgBox(0, "Time Difference", $t)
_ArrayDisplay($aLog)

anyway is not autoit that works slow but the same code that performs more than 100 million of loop

sorry for my english

Ciao.

Edited by DXRW4E

apps-odrive.pngdrive_app_badge.png box-logo.png new_logo.png MEGA_Logo.png

Link to comment
Share on other sites

For the record, I downloaded the large version of the geoip free offer, which includes country, region, city, lattitude, longitude, postal code and area code. It contains 2,261,104 IPv4 ranges for 403,081 locations.

I loaded it in SQLite Expert as two tables, converted the full decimal IP values to classical xxx.yyy.zzz.ttt format using an update statement. The resulting DB with its coumpound index is 405 MiB.

When I do a simple search like

select IPfrom, IPto, Country, City from ipblocks B natural join iplocation L where '109.015.194.154' between ipfrom and ipto;

I get the expected result

IPfrom IPto country city

--------------- --------------- ------- -----------

109.015.194.000 109.015.194.255 FR La Couronne

in ~3.5 s

I bet that using the more simple {IP range, country} geoip file, the search will be way faster. It was easy to verify this obvious assumption.

Using the geo file you posted, I loaded it into a single table, still using Expert. Again I converted decimal IPs to readable string format (even if I know it will be slower). The resulting table of 178545 rows and its size is 18 Mib.

Unsurprisingly, the above query produces the same result in ~235 ms

Using only decimal IP values the time decreases to 225 ms. In my biaised view the small gain doesn't justify the burden and inconvenience of full decimal IP unreadable values.

EDIT: I don't believe that for routine use loading this small (mostly read-only) DB in :memory: would be much beneficial. It is small enough to be fully kept in both OS and SQLite caches after only few queries, so the only difference would be to avoid double cacheing.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

The guide line to my judgment what is fast and what is not was this .Net app Apache Log Viewer, which analyze Apache logs and it load my log file (access_log-2013-01-18.txt) with 620 IP's and all other information from it in matter of seconds in form like this:

Posted Image

It use GeoIP base in form of GeoIP.dat and GeoIPv6.dat files.

Since that app not completely satisfy my needs, I decide to adjust it and I succeed in all the cases except in GeoIp loading speed...

access_log-2013-01-18.txt

Link to comment
Share on other sites

Reading 180,000 rows from an Excel spreadsheet using this method is incredibly slow. I converted the spreadsheet into a CSV. Reading in the file and parsing it into an array using StringSplit and StringRegExp takes around 3 seconds.

Obviously the bulk of the time you're spending is in this enormous loop. There are tons of array search optimizations you can put in that would help you not have to search the whole array. For example:

First off, the array is already sorted for you. You should be using this to your advantage. One timesaver I came up with off the top of my head is an array index. I saved the first value (16777216) and every time the row's low value was greater than a multiple of that number, I saved that position in the Geo array. This means later when I'm searching for the value, I know a good place to start (arrayIndex[Floor(decIP / firstvalue)]) and I should only have to search a fraction of the total array. If you do the math: (total rows / (last low value / first low value)), which in this case is a maximum of about 800 rows per log entry.

Second, you need to look at the information you're searching for. There are a lot of duplicate IPs. Save what you've found and look through those first before looping through the big loop.

After optimizing your code with only the above improvements, I have cut the time down to this:

Geo File Read Time: 0.14s

Geo File Parse Time: 2.95s

Log Parse Time: 0.09s

Total Time: 3.18s

IPs Found: 620

Edited by sentry07
Link to comment
Share on other sites

As i told before in this topic, I am not a full time programmer, I use AutoIt occasionally to make my full time job easier.

So the script that I wrote was a peak of my AutoIt knowledge.

Thank you very much, this is more than I expected for! :bye:

Edited by tempman
Link to comment
Share on other sites

Here's my contribution, using the included SQLite DB (which contains data from the lite version of GeoIP, just IP range and country). Since you say you may have to process a large number of lines in your logs, I believe the approach used herein is beneficial: it extracts all IPs from the log file, removes duplicates and sorts them in order to use a binary search later. From the timing displayed, the lookup in the database is fairly fast.

What was the slow part of your initial code (the geo IP lookup) boils down to circa 2 ms for every distinct IP address in the log file. Overall processing time takes less than 2.5 s now for the same 621 entries log file.

Now beware that the IP geo data seems unreliable or at least questionable. For instance the lite version of geoip says 159.253.130.4 is in Croatia, the large version (with city and more details) says it belongs to "SoftLayer Dutch Holdings BV" and other sites like http://www.geoiptool.com/fr/?IP=159.253.130.4 pretend it's in Netherland.

Anyway, here's the self-contained ZIP. There is little to no error checking. The country name is simply appended at every line. You may want to reformat the log lines to suit your actual needs.

SQLiteGeoIP.zip

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

  • Moderators

sentry07 and jchd,

Service above and beyond the call of duty from both of you there - thank you very much for having made the effort! :thumbsup:

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

Now beware that the IP geo data seems unreliable or at least questionable. For instance the lite version of geoip says 159.253.130.4 is in Croatia, the large version (with city and more details) says it belongs to "SoftLayer Dutch Holdings BV" and other sites like http://www.geoiptool.com/fr/?IP=159.253.130.4 pretend it's in Netherland.

You have this misfortune to bump to one of the outdated GeoIP sites.

IP 159.253.130.4 is really in Croatia

https://apps.db.ripe.net/search/query.html?searchtext=159.253.130.4#resultsAnchor

http://whois.domaintools.com/159.253.130.4

http://www.infosniper.net/index.php?ip_address=159.253.130.4

http://myip.ms/info/whois/159.253.130.4

As your script concerning, i didn't have time to try it, I feedback you as soon as possible.

Thank you in advance for your contribution!

Link to comment
Share on other sites

That's bad luck: this site was the first Google.fr came up with and I wanted to double-check a few IPs using it. Their own IP is now in my host file so that I won't make the same blemish again. Sorry for FUD.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Never mind about that, your SQLiteGeoIP work like a charm!

Which app you used to convert csv to db3?

If it was SQLite Expert Professional, how did you manage to import csv base in form of:

IPfrom IPto Country

002.000.000.000 002.015.255.255 France

When I try to import I get this:

IPfrom IPto Country

2.0.0.0 2.15.255.255 France

because of that my Traced_access_log-2013-01-18.txt have less resolved IP's

Link to comment
Share on other sites

Yes, there's the little catch that common format of IPv4 doesn't carry leading zeroes, which are needed for correct collation. Without leading zeroes, collation is guaranteed to get wrong results, so don't even try.

I imported using Expert (my all-time favorite SQLite companion) then used a little extension of mine to reformat the IPs with leading zeroes.

Of course you can as well do the parsing/reformatting with AutoIt code.

Here's the statement I used, where ipmin and ipmax are the decimal values loaded from the CSV file and ipfrom, ipto are two columns with IPv4 as strings as you see them in my example DB:

update geo set ipfrom =printf('%03lli', cast(ipmin / (256 * 256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmin % (256 * 256 * 256)) / (256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmin % (256 * 256)) / 256 as integer)) || '.' ||printf('%03lli', ipmin % 256) ,ipto =printf('%03lli', cast(ipmax / (256 * 256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmax % (256 * 256 * 256)) / (256 * 256) as integer)) || '.' ||printf('%03lli', cast((ipmax % (256 * 256)) / 256 as integer)) || '.' ||printf('%03lli', ipmax % 256)

The printf() extension function comes from the SQLite extension discussed and its loading procedure for use from AutoIt Search unifuzz author jchd in the forum to find more related posts. If you use it be sure to read and understand the lengthy comments at the head of its C source.

Feel free to chime if you hit a wall or need any guidance regarding this.

EDIT: I now remember than Expert freeware doesn't include extension autoloading. You'll have to load it manually thru an SQL tab before use. Be aware that loading the extension overloads several internal functions and collations besides adding its own set. The overloading versions merely cover unaccentuation of a large subset of Unicode but should be 99.9% compatible with vanilla functions (e.g. german 'ß' and friends: 'ß' LIKE 'ss' is now true). I guess that those linguistic changes don't interfere with your use case.

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

There probably isn't a person who cares, but I took this as a challenge to see how fast I could get Excel to do this and it's still not as fast as jchd's solution, but I cut it down to under 5 seconds, anyway. water's is required:

#include <Excel ReWrite.au3>
#include <Array.au3>

$oExcel = _Excel_Open(False)
$Log = @ScriptDir & "\log.xlsx"
$oExcel_log = _Excel_BookOpen($oExcel, $Log)
$aArray_log = _Excel_RangeRead($oExcel, $oExcel_log)

$Geo = @ScriptDir & "\geo.xlsx"
$oExcel_geo = _Excel_BookOpen($oExcel, $Geo)
$aArray_geo = _Excel_RangeRead($oExcel, $oExcel_geo)
_Excel_BookClose($oExcel, $oExcel_geo, False)

Global $aiWritten
$begin = TimerInit()
For $i_log = 1 To UBound($aArray_log) - 1
    If Not IsArray($aiWritten) Or _ArraySearch($aiWritten, $i_log) = -1 Then
        $IP = $aArray_log[$i_log]
        $temp = StringSplit($IP, '.')
        $decIP = (Int($temp[1]) * 256 * 256 * 256) + (Int($temp[2]) * 256 * 256) + (Int($temp[3]) * 256) + Int($temp[4])
        $aiSearch = _ArrayFindAll($aArray_log, $IP)
        For $i_geo = 0 To UBound($aArray_geo) - 1
            If $decIP >= Int($aArray_geo[$i_geo][0]) And $decIP <= Int($aArray_geo[$i_geo][1]) Then
                For $iFound = 0 To UBound($aiSearch) - 1
                    _Excel_RangeWrite($oExcel, $oExcel_log, Default, $aArray_geo[$i_geo][2], $aiSearch[$iFound] + 1, 2)
                    If Not IsArray($aiWritten) Then
                        Global $aiWritten[1] = [$aiSearch[$iFound]]
                    Else
                        _ArrayAdd($aiWritten, $aiSearch[$iFound])
                    EndIf
                Next
                ExitLoop
            EndIf
        Next
    EndIf
Next
$dif = Round(TimerDiff($begin) / 1000, 2)
MsgBox(0, "Time Difference", $dif)

_Excel_BookSave($oExcel, $oExcel_log)
_Excel_BookClose($oExcel, $oExcel_log)
_Excel_Close($oExcel)
P.S. You'll need my edit of the page__st__120#entry1060576 function. Edited by GMK
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...