Jump to content

Recommended Posts

Posted (edited)

At the office lately, I seem to have become the 'reports' guy.  Any time some kind of report is needed, that compiles information from several disparate sources, I tend to be the one asked to do it.  The biggest problem I run into is that there is not a consistent name for each person accross all systems. (A problem with multinational families of companies, I guess.)  One source may use one logon account name, with one version of a real name.  Another system may use just the real name, but the legal version.  Yet another uses a different account name, and the person's nickname.

Anyway, manually matching all of these names up each and every time is getting to be a problem.  After the third report, taking an extra week just for this one step, I got tired of it, and wrote a Names Aggregator.

It is fully functional, but need a lot of polishing.

The names database is stored using SQLite.  I will eventually make a version with file locking so that the database can be shared on a network, but that is for another time.

------------

HOW IT WORKS

------------

You call an aggregated lookup, passing a delimited string of names that your current source has in it for the current person.  The aggregator will search the database for any matching names, then return the preferred name known for that person.  This then becomes the common name for correlating data.  Any names provided that are not currently known for that person are then added to the database to make finding that person easier in the future.  If the person is entirely unknown, a new person record is created, and populated with the names you provided.

It really is a niche tool, but can be a real time saver.  For one major annual set of reports I do, this has reduced a three week long project down to about six hours.

:ILA2:

v2.0 - Now able to do lookups without adding names to the database.  Added _NameAggregator_Realign() function for doing some database cleanup.  The new function supports using a callback function for customizing the database cleanup or creating your own utility for manually maintaining the names database.

  Reveal hidden contents

 

v1.3 - improvements thanks to jchd.

  Reveal hidden contents

v1.2 - adds case insensitive matching

  Reveal hidden contents

v1.1 - includes SQLite fix

  Reveal hidden contents

and here's a simple example script

#include "NameAggregator.au3"


$NAobj = _NameAggregator_Open()
MsgBox(0, "test", _NameAggregator_Lookup("!John Q Public|jpublic", $NAobj))
MsgBox(0, "test", _NameAggregator_Lookup("jpublic|Jonny Public", $NAobj))
MsgBox(0, "test", _NameAggregator_Lookup("jpublic|Jonnathan Public", $NAobj))
MsgBox(0, "test", _NameAggregator_Lookup("Jonny Public|jqpublic", $NAobj))
MsgBox(0, "test", _NameAggregator_Lookup("jqpublic", $NAobj))
_NameAggregator_Close($NAobj)


; If all goes well, you should get "Jonh Q Public" returned 5 times
Edited by willichan
  • 3 weeks later...
Posted

Nice. I like it that you're not using Global variables.

UDF List:

  Reveal hidden contents

Updated: 22/04/2018

  • 2 weeks later...
Posted

Don't escape column Id and person since they are integer literals (e.g. line 186). It might bite you some day.

You're lucky that SQLite is smart enough to convert back integers as strings to integers when the column has type integer but realize that those two conversions are useless.

Allow me to add a few remarks:

  Shouldn't both ID columns be declared autoincrement?

  person shouldn't it be a foreign key for persons.id ?

  after inserting a new row in names, simply use _SQLite_LastInsertRowid to retrieve the ID used.

All in all, why not use a single self-joined table (unless you have other developments in mind)

  Reveal hidden contents

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Posted
  On 7/12/2013 at 10:53 PM, jchd said:

Don't escape column Id and person since they are integer literals (e.g. line 186). It might bite you some day.

You're lucky that SQLite is smart enough to convert back integers as strings to integers when the column has type integer but realize that those two conversions are useless.

 

Thanks.  When I encountered a problem with an unescaped name, I guess I rushed implementing the fix, and started coloring outside the lines.  :doh:

  On 7/12/2013 at 10:53 PM, jchd said:

Shouldn't both ID columns be declared autoincrement?

 

I thought SQLite automatically increments if the primary key is an integer, and not provided in the INSERT statement.  Is it better to explicitly declare it?

  On 7/12/2013 at 10:53 PM, jchd said:

after inserting a new row in names, simply use _SQLite_LastInsertRowid to retrieve the ID used.

 

Thank you.  That is a lot simpler.

  

  On 7/12/2013 at 10:53 PM, jchd said:

  person shouldn't it be a foreign key for persons.id ?

  All in all, why not use a single self-joined table (unless you have other developments in mind)

 

This was a larger project for creating some security auditing reports that spanned multiple systems, in multiple locations, administered by groups with different naming standards, with different ... :mad2: ... you get the picture.

I just yanked the names portion out, since I could see some use for it elsewhere.  There are originally other tables with security access data that point back to the person record.  If it can be done in one table, I am in favor of it.

I will have to admit that, though I use databases in alot of solutions, I am not much of a database developer.

How would you go about making this into a self-joined table?  Do you mean pointing each name back to a record in the same table that is the preferred name?  If the preferred name were never to be updated, then I would definately see that as more efficient, but since the preferred name can be changed at any time, I think updating all of the related name records to point to the new name would loose that efficiency.

Thank you for the help.  I am updating the code in post#1 with the fixes you mentioned above.

Posted

Just for the record (I'm not there to drive your ideas into anything residing in my own head!), you can simplify the current schema into that:

CREATE TABLE [People] (
  [Id] INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT, 
  [Name] CHAR NOT NULL COLLATE NOCASE, 
  [PreferredId] INTEGER REFERENCES iD ON DELETE SET NULL ON UPDATE RESTRICT DEFERRABLE INITIALLY DEFERRED);

Have PreferredId = Id to denote the preferred ID of someone. That's pretty easy to manage, at least for a simple design as the one you posted. Now if the project evolves into a much more ambitious/complex development, all bets are off. But in such case it's fairly normal to have to reconsider the DB design, just like the need to evolve an application may imply deep changes in its OOP class hierarchy.

Insert a new person with PreferredId = Id. Insert a new alias with PreferredId = Null then (in the same transaction) obtain the Id of the preferred name and update the newly inserted row. Deleting a preferred name is similar, changing one as well. Experimentation will tell if a compound index helps in practice; unless pressed to do so, leave that alone.

Explicitely declaring a PK autoincrement makes it clear of your intention of considering this column as an internal-only, meaningless, identifier. It also eases porting to a possible future other engine, if ever you evolve this into a shareable, wider used audit/management tool.

  Reveal hidden contents

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

  • 11 months later...
Posted

Posted v2.0 in the first post above.

This version adds an optional parameter to _NameAggregator_Lookup() to allow you to do a name lookup without adding any new name entries into the names database.

The major improvement is the _NameAggregator_Realign() function.

Depending on the order in which names are submitted to the aggregator, is it possible for a name that is already associated with one person record to become associated with a second person record.  From that point forward, that name will return results for the person record with the lower ID.  This function has been added to facillitate repairing this problem.  This can be handled using the internal 'fully automatic' mode, or using your own callback function to implement your own rules for correcting the records, or for creating an interactive script for human intervention.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...