Jump to content

Sqlite question


Pain
 Share

Recommended Posts

I'm having some serious trouble with a sql question.

I know it's not really Autoit related but I'm using sqlite in autoit so...

I'll explain more. I'm logging some things for a game in a table that looks like this:

"CREATE TABLE IF NOT EXISTS `user` (`id` INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL, " & _
    "`account` TEXT NOT NULL UNIQUE," & _
    "`ip1` TEXT," & _
    "`ip2` TEXT," & _
    "`ip3` TEXT)")

I've removed some columns that's not important in this case.

An account can have 1 and up to 3 IP's related to that account.

The same IP can appear several time, for same or another account.

Let's say the table looks like this:

1   test1   127.0.0.1   127.0.0.2   127.0.0.1
2   test2   127.0.0.1   127.0.0.3
3   test3   127.0.0.3   127.0.0.4
4   test4   127.0.0.5
5   test5   127.0.0.6   127.0.0.7   127.0.0.4

Now I like to find all accounts that somehow are related to each other.

As you can see account test1 can be connected to test2 since both have the IP 127.0.0.1 in common. The IP 127.0.0.2 can not be connected to any more account, this far.

Indirect these two account (test1, test2) can be connected to test3 because of the ip 127.0.0.3 and so on.

In the end these accounts can be connected to each other in one way or another:

test1

test2

test3

test5

Account test4 can not be related to any other accounts, this far.

Atm I'm using something like this (pseudo code):

SELECT "ip1", "ip2", "ip3" FROM "main"."user" WHERE account = "test1";

To get all IP's to the first account.

127.0.0.1

127.0.0.2

SELECT "account" FROM "main"."user" WHERE ("ip1" = "127.0.0.1") OR ("ip2" = "127.0.0.1") OR ("ip3" = "127.0.0.1");

To get all accounts with the first IP.

test1

test2

SELECT "account" FROM "main"."user" WHERE ("ip1" = "127.0.0.2") OR ("ip2" = "127.0.0.2") OR ("ip3" = "127.0.0.2");

test1

If returned results are greater than 1 AND account NOT = "test1" Then

We found account "test2"

SELECT "ip1", "ip2", "ip3" FROM "main"."user" WHERE account = "test2";

........

EndIf

Log "127.0.0.2" in a text file for further use, and in case account "testX" would contain the IP.

That's pretty much how far I've come.

I don't know if this really in the best way to save the IP's, or if even sqlite (or any other sql, like mysql) is the best way to accomplish it.

I'm very new with this whole database thing. I've been thinking in ways of using some kind of hash table, tuple etc., but the problem is the multiply IP's.

Atm I'm using a normal text file where I write, or copy, the information to then going through it manually.

The thing is that the number of account, and IP's, increase exponentially for every "new" IP.

As you might understand this is very time consuming, especially since I have thousands of possible accounts to go through.

I would appreciate any kind of help that can cut down the time it would take to do this manually. Whether it's another way to implement it, or a totally new approach.

Edited by Pain
Link to comment
Share on other sites

You're right in that SQl is the right language, SQLite can do it (and much more btw) but ... this isn't the good way to solve this problem efficiently.

In the same way you wouldn't store order lines in the orders table, you wouldn't store all songs by a given artist as columns of an artist table, ...

Sounds like (but like only, I'm not sure) you want to enumerate cliques in a decorated graph.

Beware to your terminology: user, account and computer are three distinct beasts and you seem to mix them. If ever your project grows up, you'll get lost, or future projects will take longer than needed, if things are not clearer in your mind. Don't take that remark bad, I intend nothing insulting.

BTW you should explain: "The same IP can appear several time, for same or another account." <-- that's unclear

Another point: are you talking about IP or subnets (IP ranges)? That could change the design completely.

Store IPs in three digits (add leading zeroes if needed). Then you can sort on the column.

From what you said, you need:

-) a table of IPs with IPid (integer primary key), unique IP as text (create an index on it).

-) a user (or is it account, or computer) table with its UserId unique integer primary key (UIPK)

-) a table for User <--> IP relations, Id as UIPK, UserId as foreign key to User table, IPid as foreign key to IP table. There will be 1, 2 or 3 rows per account.

Now you can solve about all questions. Nonetheless be aware that finding extremum (maximum, or minimum) cliques in such graph can be expensive if the graph gets large.

But before rushing to code anything, think twice about your model and try to answer my questions above to clarify things. Then and before rushing coding anything in a programming language, download and get fluent with a good SQLite manager. I can only recommend once more the excellent SQLite Expert. If this is in enterprise context, go for the Pro version as it will save you days/weeks of work.

EDIT: corrected typo (decorated tree --> decorated graph) I'm talking to much about trees currently. Must be a Freudian slip: my garden needs real care!

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Thanks jchd, I believe that helped be quite a bit in the right direction. I'll look closer into it in the morning.

To answer some of your questions, I believe I only mentioned accounts, user is only the table name I chosed.

"The same IP can appear several time, for same or another account." <-- that's unclear

Sorry about that. I meant like in the example table where 127.0.0.1 can be seen twice for test1 (same), and once for test2 (another).

I'm only interested in single IP's, no ranges.

Your way to build up the tables seems indeed to be better than my way. However, I were actually thinking about to move the IP's into it's own table to avoid redundancy.

Atm I'm using SQLiteman, but I will for sure take a look at SQLite Expert too.

Link to comment
Share on other sites

But what's the purpose of having dupplicate IPs for the same user?

No range means _much_ simpler stuff!

Read again, I was talking about 3 tables: IP, user, relations (call it "links").

BTW, you don't have to use "main"."user" in SQLite. user will do. But warning that text litterals in SQL use single quotes 'I am a text' with doubling to escape 'I''m a text as well' while column, table or database names _may_ use double quotes or square brackets: [column1] or "column1".

Also, SQLite supports it happily but I warn against using Unicode (non-ASCII) characters in table or column names as that can lead to problems if ever you need to use an ODBC layer or something that doesn't cope with such set.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

But what's the purpose of having duplicate IPs for the same user?

The columns stands for Register IP, Last used IP (web) and Last used IP (game), hence if you register and log in (game or web) several of the columns will have same value. But if you log on at a later time the Last used IP will change, but Register will only be set once and will never change for the specific account.

Yea, I know you used 3 tables. I just told what I had in mind before. The thing was that I didn't know how to link them properly. Now, thanks to you I do.

BTW, you don't have to use "main"."user" in SQLite. user will do. But warning that text litterals in SQL use single quotes 'I am a text' with doubling to escape 'I''m a text as well' while column, table or database names _may_ use double quotes or square brackets: [column1] or "column1".

That was SQLiteman adding the database name :mellow:

I noticed the single quote when I used _SQLite_Escape, but it won't really be a problem since account names are restricted to a-z0-9.

Edited by Pain
Link to comment
Share on other sites

The columns stands for Register IP, Last used IP (web) and Last used IP (game), hence if you register and log in (game or web) several of the columns will have same value. But if you log on at a later time the Last used IP will change, but Register will only be set once and will never change for the specific account.

Ok, got it. In the example of your first post, you say that users are "connected" if they share anyone of the 1, 2 or 3 IPs they may have. So as I see it, there is little point in handling them separately. It suffice to enter them with redondancy if needed (creating another row in the link table if ever test3 obtains 127.0.0.3 as new "last game IP"). Since the registered IP won't change, it's entry will stay there once created. Maybe you have other requirements which dictate to handle with the 3 IP types differently.

OK, now, what problem will you have to solve, beside finding cliques (group of people connected)?

If you won't have other special needs, then I believe there is a much more efficient way to organize, query and view/retrieve data. I already have "almost fit" code for that ready, just need a bit of review and corner case debugging. It uses a tree implemented using integer intervals.

I noticed the single quote when I used _SQLite_Escape, but it won't really be a problem since account names are restricted to a-z0-9.

I meant your example uses "127.0.0.1", while it should have been '127.0.0.1'. SQLite correctly interprets this correctly (i.e. by correcting this mistake), but not all RDBMS are that smart!

BTW, instead of _SQLite_Escape, I use more efficient functions:

Func X($s)
    Return ("'" & StringReplace($s, "'", "''", 0, 1) & "'")
EndFunc ;==>X

Func XX($s)
    Return (",'" & StringReplace($s, "'", "''", 0, 1) & "'")
EndFunc ;==>XX

Func Y($s)
    Return ("X'" & Hex($s) & "'")
EndFunc ;==>Y

Func YY($s)
    Return (",X'" & Hex($s) & "'")
EndFunc ;==>YY

Use them this way:

_SQLite_Exec($hADB, "insert into colis (numcolis, datTimbreAdate, zip, pays, flgSuiviFreq, flgSAVemail) values(" & _
                                                X($ncolis) & _
                                                XX($dateExp) & _
                                                XX($cp) & _
                                                ",(select paysIso from pays where nom like "&X($pays&'%')&")" & _
                                                XX($aSuivre) & _
                                                XX($ReclameAuto) & _
                                                ");")

X($var) escapes $var

XX($var) ditto but prepends a comma (second and subsequent values)

Y($bin) escapes binary variable

yy($bin) ditto with a comma prepended

That saves typing and keeps SQL readable.

I'll be there tomorrow if you're interested by some of my code, but I don't know when exactly.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Only tested with values given will need some more added to test it further :mellow:

#include <sqlite.au3>
#include <sqlite.dll.au3>
#include <Array.au3>

Global $aIpList1[4] = ["test1", "127.0.0.1", "127.0.0.2", "127.0.0.1"]
Global $aIpList2[4] = ["test2", "127.0.0.3", "127.0.0.2", "127.0.0.1"]
Global $aIpList3[4] = ["test3", "127.0.0.3", "127.0.0.4", ""]
Global $aIpList4[4] = ["test4", "127.0.0.5", "", ""]
Global $aIpList5[4] = ["test5", "127.0.0.6", "127.0.0.7", "127.0.0.4"]

Local $hQuery, $aRow
_SQLite_Startup()
_SQLite_Open(@ScriptDir & "\log ip.db")
_SQLite_Exec(-1, "CREATE TABLE Accounts (Name TEXT , IpId INTEGER);")
_SQLite_Exec(-1, "CREATE TABLE IpNos (IpId INTEGER PRIMARY KEY, IpText TEXT);")

_InsertIp($aIpList1)
_InsertIp($aIpList2)
_InsertIp($aIpList3)
_InsertIp($aIpList4)
_InsertIp($aIpList5)

$sResult = _matches("test1")

MsgBox(0, "String matches", $sResult)

_SQLite_Close()
_SQLite_Shutdown()
Exit

Func _InsertIp($aIpList)
    Local $hQuery, $iCheckIp, $aIpUnique
    $aIpUnique = _ArrayUnique($aIpList)
    For $i = 2 To $aIpUnique[0] Step 1
        If $aIpUnique[$i] <> "" Then
            _SQLite_Query(-1, "SELECT IpId FROM IpNos WHERE IpText = '" & $aIpUnique[$i] & "';", $hQuery)
            _SQLite_FetchData($hQuery, $iCheckIp)
            _SQLite_QueryFinalize($hQuery)
            If $iCheckIp[0] = "" Then
                _SQLite_Exec(-1, "INSERT INTO IpNos ('IpText') VALUES ('" & $aIpUnique[$i] & "');")
                _SQLite_Query(-1, "SELECT IpId FROM IpNos WHERE IpText = '" & $aIpUnique[$i] & "';", $hQuery)
                _SQLite_FetchData($hQuery, $iCheckIp)
                _SQLite_QueryFinalize($hQuery)
            EndIf
            _SQLite_Exec(-1, "INSERT INTO Accounts ('Name','IpId') VALUES ('" & $aIpUnique[1] & "','" & $iCheckIp[0] & "');")
        EndIf
    Next

EndFunc   ;==>_InsertIp

Func _matches($sName)
    Local $iMatches = 1, $iNewMatches, $sNameMatches, $sIpMatches, $aResult, $iRows, $iColumns
    $sNameMatches = $sName
    Do
        $iMatches = $iNewMatches
        _SQLite_GetTable(-1, "SELECT DISTINCT IpText FROM Accounts,IpNos Where Name IN ('" & $sNameMatches & "') AND IpNos.IpId = Accounts.IpId ; ", $aResult, $iRows, $iColumns)
        _ArrayDelete($aResult, 0)
        _ArrayDelete($aResult, 0)
        $sIpMatches = _ArrayToString($aResult, "','")
        _SQLite_GetTable(-1, "SELECT DISTINCT Name FROM Accounts,IpNos Where IpText IN ('" & $sIpMatches & "') AND IpNos.IpId = Accounts.IpId ; ", $aResult, $iRows, $iColumns)
        _ArrayDelete($aResult, 0)
        _ArrayDelete($aResult, 0)
        $sNameMatches = _ArrayToString($aResult, "','")
        $iNewMatches = UBound($aResult)
    Until $iNewMatches = $iMatches
    Return StringReplace($sNameMatches, "','", @CR)
EndFunc   ;==>_matches
GDIPlusDispose - A modified version of GDIPlus that auto disposes of its own objects before shutdown of the Dll using the same function Syntax as the original.EzMySql UDF - Use MySql Databases with autoit with syntax similar to SQLite UDF.
Link to comment
Share on other sites

OK, now, what problem will you have to solve, beside finding cliques (group of people connected)?

If you won't have other special needs, then I believe there is a much more efficient way to organize, query and view/retrieve data. I already have "almost fit" code for that ready, just need a bit of review and corner case debugging. It uses a tree implemented using integer intervals.

That's the only thing I'm looking for.

All 3 IP's are treated the same, no matter if it's a IP1, 2 or 3.

I already have another script that takes regular dumps (once a week) of our main database to get the latest IP2 and 3 (Last used). All I have to do is to go through the dump and find "cliques" and list the accounts.

I'd love to see if you have a even better way solving it.

I meant your example uses "127.0.0.1", while it should have been '127.0.0.1'. SQLite correctly interprets this correctly (i.e. by correcting this mistake), but not all RDBMS are that smart!

I typed on free hand and most of the code were pseudo code. The real code looked similar to this.

"INSERT INTO `user` (`account`, `ip1`, `ip2`, `ip3`) " & _
        "VALUES (" & $account & "," & $ip1 & "," & $ip2 & "," & $ip3 & ")")

Where $account, $ip etc. are all cleaned up and then _SQLite_Escape'd.

$ip1 = "127.0.0.1"

$ip1 = _SQLite_Escape($ip1)

@Yoriz

Thanks a lot. Seems like a nice and clean start, even if it failed to find the connection to test5 (127.0.0.4).

Only returns test1, test2, test3.

I haven't debugged the code yet, I'll see what jchd had in his mind first, but else I'm sure I can make the final adjustments to make it work properly.

Link to comment
Share on other sites

Not for me. I'll see if an update of my Autoit will help. Haven't used Autoit for a while so I'm still at version 3.3.0.0 :mellow:

I'll get back to you whether it solved it or not.

Edit: Yea, that made it.

Edited by Pain
Link to comment
Share on other sites

Geez, 3.3.0.0 is centuries old in regard of the rate of releases/betas. There have been a full lot of changes (both fixes for more or less corner cases and huge improvements in speed for large tables) in the SQLite area in the meantime and I know what I'm talking about :mellow:. Also the joined SQLite.dll (v3.6.22) is way better than before. Latest version 3.6.23 released tonight works well also, but uprading is optional for most users.

There's no question that a simple design can be made to work (your first design will do as well), but the harder part is enumerating cliques and cliques contents. What I think about is certainly more complex at first, but has the advantage to group cliques naturally so that the hard work is already done on the fly by SQL while inserting user-ip rows. I'm kind of purist or rather haircutting pervert, you know! When I come up with an algorithm working for N=5, I tend to ask myself: "how will it behave with N=10^6?". That's how I discovered that old _SQLiteGetTable* functions had complexity in O(N^2), which made them unpractical even for medium tables. The new versions are in O(N), like they should be, with only a little more code involved.

To get back to you, there is a choice: either have simple design with simple insertion/deletion strategy but complex enumeration, or have a less obvious design, less obvious insertion/deletion code but straightforward enumeration. Since design and code writeup is a one time thing while enumeration is routine, I tend to favor the latter approach.

To make an analogy, think about the structure of a simple .INI file. Imagine the sections are cliques which build up while entering data and say the key=value correspond to the user=IP in your case. Then it's clear that listing the ini file gives you the answers you need. The catch is inserting user=IP pairs in the right clique, possibly merging them if needed. That's what I think about. From what you say, you rebuild the graph each time, so there is no deletion, just inserts and merges.

I've just finished a set of functions to handle INI-style functionality for very-large datasets. I'm going to clone this and change two or thre details/names so it fits your framework and post it.

Cheers,

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@Pain,

Sorry for the delay. I've had a huge arrival of goods and that ate most of my time/forces up to now.

Nevertheless, it forced me to think to something else and that proved healthy, as when I look at your post again, I discover my mistake. You are _not_ looking for cliques but for connected subgraphs. (A clique is a part of a graph where any two vertices are connected by an edge.)

That makes your problem eminently easier, by a large margin! My bad: I've been probably obsessed by solving hard questions lately having cliques as part of the solution.

I wrote a small bit of code to show how you can do what you want efficiently. Here's is how it works in plain language.

First start with an empty base, since if you populate it at once, _you_ will have to write code to perform the computation, instead of using the power of SQL to do it in your place! That may seem pedantic, but why write several loops in AutoIt (slow interpreted language) when a single SQL (SQLite is in fine-tuned C) statement can do the very same work?

Our table will hold:

account as text

IP as text

usergroup as integer

We insert every pair of {account, IP} data in a loop. We also insert a first "version" of usergroup. It's an integer (groupnumber) that we increment at each insert, which will finally determine which users are in which connected group. We choose to insert either [the groupnumber of a previous row using the same IP] or [the next available groupnumber] if there is no such row. From now on, we won't have to touch IPs anymore. This is taken care of with:

insert into groups (account, ip, Usergroup) values ('&X($testData[$i][0])&XX($testData[$i][1])&', coalesce((select min(Usergroup) from groups g where g.ip = '&X($testData[$i][1])&'), '&$GroupCount&'));

And we are done! No, I'm only half joking: there's is no more code than that! But the rest is achieved by the hard worker: we have an "after insert" trigger, which fires, obviously, after each insert above.

The triggers does the following, in plain english: if there exists a previous row having the same account and a different usergroup, then we update all rows holding that previous usergroup value to the one of the row we just inserted. End. Of course it sounds less "plain" once put in SQL, but it's easier than one think.

Rationale: when we have a new {account, IP} pair, we have 4 cases:

  • account is a new one and IP is a new one
  • account already exists and IP is a new one
  • account is a new one and IP already exists
  • account already exists and IP already exists
During the insert,

In cases 1. and 2. we simply store a new groupumber as usergroup

In cases 3. and 4. we store the usergroup of the previously recorded IP

Having done so, we have mapped IPs to IP-groups. We now need to map "groups of same users" to the IP-groups. This is done by the trigger, as explain above. If a user has more than one IP in two groups, we merge the groups by updating the previous IP-group value to the value of the new IP-group.

Of course these groupnumbers don't mean anything, they simply serve to differentiate groups.

I've written a quick bit of code to print a sensible output, or write it to a text file.

I hope this all makes sense. Forgive my temporary "obsession" for wrong reading of your question!

#include <SQLite.au3>

#include <SQLite.dll.au3> ; comment out to use you own sqlite3.dll

Local $testData[44][2] = [ _

['user01', '111.000.000.001'], _

['user02', '111.000.000.002'], _

['user03', '111.000.000.104'], _

['user04', '111.000.000.002'], _

['user05', '111.000.000.005'], _

['user06', '111.000.000.004'], _

['user07', '111.000.000.055'], _

['user08', '111.000.000.028'], _

['user09', '111.000.000.009'], _

['user10', '111.000.000.000'], _

['user11', '111.000.000.011'], _

['user12', '111.000.000.062'], _

['user13', '111.000.000.008'], _

['user14', '111.000.000.071'], _

['user15', '111.000.000.054'], _

['user16', '111.000.000.003'], _

['user17', '111.000.000.204'], _

['user18', '111.000.000.054'], _

['user19', '111.000.000.004'], _

['user20', '111.000.000.104'], _

['user01', '111.000.000.011'], _

['user02', '111.000.000.012'], _

['user03', '111.000.000.013'], _

['user04', '111.000.000.014'], _

['user05', '111.000.000.015'], _

['user06', '111.000.000.066'], _

['user07', '111.000.000.017'], _

['user08', '111.000.000.018'], _

['user09', '111.000.000.019'], _

['user10', '111.000.000.010'], _

['user11', '111.000.000.101'], _

['user12', '111.000.000.084'], _

['user13', '111.000.000.003'], _

['user14', '111.000.000.041'], _

['user01', '111.000.000.006'], _

['user02', '111.000.000.005'], _

['user03', '111.000.000.000'], _

['user04', '111.000.000.008'], _

['user05', '111.000.000.001'], _

['user06', '111.000.000.026'], _

['user07', '111.000.000.014'], _

['user08', '111.000.000.003'], _

['user09', '111.000.000.007'], _

['user10', '111.000.000.000'] _

]

; wake up SQLite to put it to work

_SQLite_Startup()

; create a memory database (or a _new_ disk database)

;~ Local $hdl = _groupDbCreate() ; <-- memory database

Local $hdl = _groupDbCreate("test.db3") ; <-- you can give a filename here, so that the database is disk-based and can be used later

; place everything in a transaction (will speed up things in case we have a disk-based base)

_SQLite_Exec($hdl, "begin immediate;")

; Used to give account-groups unique numbers; it isn't the final count of groups

Local $GroupCount = 1

; data entry loop, which incidentally also triggers account-groups computation transparently

For $i = 0 To UBound($testData) - 1

; avoid entering twice the same information

If _Sl3Answer($hdl, 'select exists(select * from groups where account='&X($testData[$i][0])&' and ip='&X($testData[$i][1])&');') = '1' Then ContinueLoop

; that creates a new row with {account, IP} data ... and fires the insert trigger which does all the dirty job for us

_SQLite_Exec($hdl, 'insert into groups (account, ip, Usergroup) values ('& _

X($testData[$i][0])& _

XX($testData[$i][1])& _

', coalesce((select min(Usergroup) from groups g where g.ip = '&X($testData[$i][1])&'), '&$GroupCount&'));')

; wow, that's the _only_ "computation" we have to do at the application level

$GroupCount += 1

Next

; commit all changes to the base

_SQLite_Exec($hdl, "commit;")

; a little writeup on console (or text file)

_Report($hdl)

;~ _Report($hdl, "test.txt")

; SQLite deserve a good rest now

_SQLite_Close($hdl)

; unload .dll (optional but cleaner)

_SQLite_Shutdown()

Func _groupDbCreate($sFilename = '')

If $sFilename <> '' Then FileDelete($sFilename)

Local $handle = _SQLite_Open($sFilename)

; creation statement. This is where the smart code resides, in the "after insert" trigger.

_SQLite_Exec($handle, 'CREATE TABLE IF NOT EXISTS Groups (' & _

"Account TEXT, " & _

"IP TEXT, " & _

"UserGroup INTEGER);" & _

'CREATE INDEX IF NOT EXISTS ixaccount ON groups (account);' & _

'CREATE INDEX IF NOT EXISTS ixIP ON groups (IP);' & _

'CREATE TRIGGER IF NOT EXISTS trGroupUpdate AFTER INSERT ON Groups FOR EACH ROW ' & _

'WHEN exists(select rowid from groups g where g.account = new.account and g.Usergroup <> new.Usergroup) ' & _

'BEGIN update groups set Usergroup = ' & _

'(select min(Usergroup) from groups gr where gr.account = new.account) ' & _

'where Usergroup in (select max(Usergroup) from groups gr where gr.account = new.account);' & _

'END;')

Return $handle

EndFunc

Func _Report($hDb, $sFilename = Default)

Local $hfile, $rows, $nrows, $ncols, $report, $group, $groupnb

; fetch data and build counters at the same time for a smarter result

_SQLite_GetTable2d($hDb, 'select distinct usergroup, account, (select count(distinct account) from groups gr where gr.usergroup = g.usergroup) as usercount, ' & _

'(select count(distinct IP) from groups grp where grp.usergroup = g.usergroup) as IPcount ' & _

'from Groups g order by usercount desc, IPcount desc;', $rows, $nrows, $ncols)

For $i = 1 To UBound($rows) - 1

If $group <> $rows[$i][0] Then

$group = $rows[$i][0]

$groupnb += 1

$report &= StringFormat("Account Group %03i (%i account%s using %i IP%s)\n", $groupnb, $rows[$i][2], _plural($rows[$i][2]), $rows[$i][3], _plural($rows[$i][3]))

EndIf

$report &= StringFormat(" %s\n", $rows[$i][1])

Next

If IsKeyword($sFilename) or $sFilename = '' Then

ConsoleWrite($report & @CRLF)

Else

FileWrite($sFilename, $report)

EndIf

EndFunc

Func _plural($n)

If $n = 1 Then Return ''

Return 's'

EndFunc

Func X($s)

Return ("'" & StringReplace($s, "'", "''", 0, 1) & "'")

EndFunc ;==>X

Func XX($s)

Return (",'" & StringReplace($s, "'", "''", 0, 1) & "'")

EndFunc ;==>XX

; returns the first element of a single-row result as a string (helps keeping higher applicative levels clear)

Func _Sl3Answer($hDB, $sql)

Local $row

Local $rtn = _SQLite_QuerySingleRow($hDB, $sql, $row)

If $rtn = $SQLITE_OK Then

Return ($row[0])

Else

Return (SetError(@error, $rtn, ''))

EndIf

EndFunc ;==>_Sl3Answer

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Thanks for your reply jchd. I'm not very familiar with cliques and connected subgraphs but with some helps from wiki I can tell you are right, connected subgraphs is what I'm looking for.

http://en.wikipedia.org/wiki/Connected_component_%28graph_theory%29

http://en.wikipedia.org/wiki/Clique_%28graph_theory%29

I never really thought about using triggers and native SQLite to improve the performance. It was a huge difference between your and Yoriz code even though your used more inputs.

However if both given examples would have been in native C, wouldn't Yoriz be faster, at least the insert since it doesn't update the values on every insert, but slower to get the the output data (search)?

I believe your use of indexes also improves your code some.

Since I have more than 10k accounts in total (if I filter them to only use active accounts I can get down to some hundreds I hope, maybe thousands) I believe there would be a huge performance hit so I'll try to do as much as I can in SQLite like you explained.

I only inserts the values once but might search through it several times, therefore, if my above theory is correct, your code would be to prefer.

I might make some chances to your code to better fit my needs but overall it looks great.

I can't thank you and Yoriz enough for your help.

Link to comment
Share on other sites

Well, I'm not engaging in a competition, but be aware that Yoriz' code will not look further than the connected graph where the first entry belong.

I mean, try his code with the following data:

Global $aIpList1[4] = ["test1", "127.0.0.8", "127.0.0.7", ""]
Global $aIpList2[4] = ["test2", "127.0.0.3", "127.0.0.2", "127.0.0.1"]
Global $aIpList3[4] = ["test3", "127.0.0.4", "127.0.0.2", ""]
Global $aIpList4[4] = ["test4", "127.0.0.5", "", ""]
Global $aIpList5[4] = ["test5", "127.0.0.6", "127.0.0.5", "127.0.0.4"]

and you can check that the ouput is:

test1

This is leaving out the group of 4 connected accounts test2 to test5. So speed may be a concern, but exhaustiveness should prime over it.

It isn't obvious if, once added the necessary code to achieve exhaustive exact results, it could be any faster, even in C.

Compare to using the equivalent input data:

Local $testData[11][2] = [ _
    ["test1", "127.0.0.8"], _
    ["test1", "127.0.0.7"], _
    ["test2", "127.0.0.3"], _
    ["test2", "127.0.0.2"], _
    ["test2", "127.0.0.1"], _
    ["test3", "127.0.0.4"], _
    ["test3", "127.0.0.2"], _
    ["test4", "127.0.0.5"], _
    ["test5", "127.0.0.6"], _
    ["test5", "127.0.0.5"], _
    ["test5", "127.0.0.4"] _
]
My output is:
Account Group 001 (4 accounts using 6 IPs)
 test2
 test3
 test4
 test5
Account Group 002 (1 account using 2 IPs)
 test1

(it's automatically sorted by decreasing group size in number of accounts, then on decreasing number of used IPs, but it's very easy to change this output format/order.)
Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Since it will work together with another search feature (I track down people violating the TOS) where I first find an account (e.g "test1") and then want to find related accounts to that specific account there is no need to go further finding all other account relations (who follows the TOS).

Since no other accounts are related to test1, with your values, it should only return test1.

Edited by Pain
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...