Jump to content

efficiently searching a large array


Dana
 Share

Recommended Posts

I'm working on a script that (among other things) takes a pair of x,y coordinates and then searches a large (20000 x 6) array for the nearest xy location (the other 4 columns in the array are irrelevant here). It's not quite a straight dist = sqrt((x1-x2)^2 + (y1-y2)^2) but Sinnott's formula for spherical geometry (the xy coords are actually angles, latitude and longitude:

Func _LLDist($lat1, $lon1, $lat2, $lon2)
    ;using Sinnott's formula for great circle distance between two points
    ;first convert degrees to radians for trig functions
    $lat1 = $lat1 * (ACos(-1) / 180)
    $lon1 = $lon1 * (ACos(-1) / 180)
    $lat2 = $lat2 * (ACos(-1) / 180)
    $lon2 = $lon2 * (ACos(-1) / 180)
    Local $R = 6371000 ; Earth radius in meters
    Local $dlon = $lon2 - $lon1
    Local $dlat = $lat2 - $lat1
    Local $a = (Sin($dlat / 2)) ^ 2 + Cos($lat1) * Cos($lat2) * (Sin($dlon / 2)) ^ 2
    Local $c = 2 * ASin(Sqrt($a))
    Local $d = $R * $c
    Return $d
EndFunc   ;==>_LLDist

If no point is closer than the minimum distance ($minsize) then it returns 0 (no match). The actual comparison I'm using now is a simple brute force comparison of each of 20000 lines in the array:

Global $apdb[20000][6]
; the actual $apdb array is read in from a text file
Global $minsize = 1609.344 ; 1 mile
Func _APnearest($alat, $along)
    ;ConsoleWrite("checking stop at " & $alat & "," & $along & ":" & @CRLF)
    Local $mindist = 20015087 ; Earth circumference
    For $a = 1 To $apnum
        Local $apdist = _LLDist($alat, $along, $apdb[$a][3], $apdb[$a][4])
        If $apdist < $mindist Then
            $mindist = $apdist
            $closest = $a
        EndIf
        ;ConsoleWrite($a & " " & $apdb[$a][1] & " " & $alat & "," & $along & "  " & $apdb[$a][3] & "," & $apdb[$a][4] & ", apdist:  " & $apdist & " mindist: " & $mindist & @CRLF)
    Next
    If $mindist < $minsize Then Return $closest ; else returns 0
EndFunc   ;==>_APnearest

The problem is that it takes quite some time to process 20000 lines, so I'm trying to figure out a way to make it faster. I know I could simply stop when it finds the first value less than $minsize, but that's not necessarily unique. I could also pre-check the latitude and longitude for approximate proximity (say, 2 minutes of arc which (for latitude at least) equals 2 nautical miles) and then run the _LLDist function only on the close pairs, but I'm unclear on how much time that function takes compared with stepping through the array.

I know there are ways with single variable and a sorted list to zero in on the target by jumping around without fully checking each value. I suppose I could do that, sorting the array first by latitude, zero in on the target latitude and then do the actual calculation only on close latitudes... so I guess one question is has anybody created or seen a UDF already that does this? Or any other suggestions?

Edited by Dana
Link to comment
Share on other sites

In my signature is a link to a modified array UDF. In that UDF is the _ArrayBinarySearch function modifed to work with 2D arrays.If you sort the array first, and search on the sorted "column" of the array you'll find that this search is much faster than normal searching in an array.

This UDF by the way should be compatible with the normal Array.au3 UDF file except I modified most of the 1D only functions to work with 2D arrays, plus there are some new functions added.

If I posted any code, assume that code was written using the latest release version unless stated otherwise. Also, if it doesn't work on XP I can't help with that because I don't have access to XP, and I'm not going to.
Give a programmer the correct code and he can do his work for a day. Teach a programmer to debug and he can do his work for a lifetime - by Chirag Gude
How to ask questions the smart way!

I hereby grant any person the right to use any code I post, that I am the original author of, on the autoitscript.com forums, unless I've specifically stated otherwise in the code or the thread post. If you do use my code all I ask, as a courtesy, is to make note of where you got it from.

Back up and restore Windows user files _Array.au3 - Modified array functions that include support for 2D arrays.  -  ColorChooser - An add-on for SciTE that pops up a color dialog so you can select and paste a color code into a script.  -  Customizable Splashscreen GUI w/Progress Bar - Create a custom "splash screen" GUI with a progress bar and custom label.  -  _FileGetProperty - Retrieve the properties of a file  -  SciTE Toolbar - A toolbar demo for use with the SciTE editor  -  GUIRegisterMsg demo - Demo script to show how to use the Windows messages to interact with controls and your GUI.  -   Latin Square password generator

Link to comment
Share on other sites

  • Moderators

Dana,

How about a simple flat geometry (root of sum of squared x/y differences) comprison on the first pass and then a full spherical geometry comparison with the X closest matches that you find? That way you get a fast run through the whole array and only do the really expensive calculations on the likely positions. Adjust X as needed depending on the spread of the points. :D

M23

Public_Domain.png.2d871819fcb9957cf44f4514551a2935.png Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 

Link to comment
Share on other sites

Interesting question. This depends entirely on how you have the coordinates organized in your array. You currently have two columns, and if they are layed out in order of coordinate magnitude, the job will be made easier using binary search methods as suggested by BrewManNH. If the array is disorganized, then it will be more difficut to improve on search speed.

Edit

The nearest value for X may not be one of the closest set of coordinates, since the y coordinate may be a long distance away. I think you need to search outwards from an artificial set of coordinates which include the closest value of x and the closest value of y. You will probably need to compare a number of candidate close matches using trig, also taking into consideration the fact that longditudinal lines get closer towards the poles.

Edited by czardas
Link to comment
Share on other sites

It's not going to make the difference between night and day, but I don't see why you'd want to execute this portion 80,000 times:

ACos(-1) / 180

when it could be defined once as a constant...

Edit: fixed whacked-out code tags

Edited by Spiff59
Link to comment
Share on other sites

If you have to perform such massive geodata search, I'd look into the Spatialite extension to SQLite.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Thanks all,

The data in the large array initially isn't sorted. I altered the code to first ReDim the array to the actual size to eliminate blank lines (I initially set it to 25,000 prior to reading the data in whereas the actual number at this moment is 19,782), then I use _ArraySort to sort it on latitude. I then modified my function to use the same kind of search as _ArrayBinarySearch:

Func _APnearest($alat, $along)
    Local $mindist = 20015087 ; Earth circumference

    Local $amid = $apnum / 2
    Local $astart = 1
    Local $aend = $apnum

    While 1

        If ($alat < ($apdb[$amid][3] - $minsize/96560.64)) Then ; meters to minutes of latitude
            $aend = $amid - 1
        Elseif ($alat &gt; ($apdb[$amid][3] + $minsize/96560.64)) then
            $astart = $amid + 1
        Else
            ExitLoop
        EndIf
        $amid = Int(($aend + $astart) / 2)
    WEnd


    For $a = $astart To $apnum ; was 1 To $apnum
        Local $apdist = _LLDist($alat, $along, $apdb[$a][3], $apdb[$a][4])
     If $apdist &lt; $mindist Then
            $mindist = $apdist
            $closest = $a
        EndIf
        If ($alat &lt; ($apdb[$a][3] - ($minsize/96560.64))) Then
     ExitLoop
        endif
$apdist &amp; " mindist: " &amp; $mindist &amp; @CRLF)
    Next
    If $mindist &lt; $minsize Then Return $closest ; else returns 0
EndFunc   ;==&gt;_APnearest

This starts the search at the first location within $minsize to the south, and stops the same distance to the north, which restricts the actual distance calculation to perhaps a couple of hundred locations. The search naturally goes much faster, but the time saved over (in this case) six locations is offset by the time it takes to do the _ArraySort initially. If I was doing a lot more locations sorting first would make a lot of sense; in this case it doesn't seem to make much difference.

Melba23, the time intensive part seems to be stepping through the array... I tried doing a simple latitude comparison first and then the spherical geometry only on the locations close in latitude, and also the root sum of squares as a first test, but it made little or no difference.

Spiff59, I tried your suggestion and made (ACos(-1) / 180) a constant. It made 0.005 seconds difference. I spelled it out each time to make it self documenting code.

At this point it takes under 10 seconds to run six locations. Slower than I'd like, but acceptable.

Edited by Dana
Link to comment
Share on other sites

Spiff59, I tried your suggestion and made (ACos(-1) / 180) a constant. It made 0.005 seconds difference. I spelled it out each time to make it self documenting code.

I was sceptical that 80,000 calls to ACos() and 80,000 division operations could be executed in 5 milliseconds (you have a Cray?), so I threw together a test. The interesting thing is; the 32-bit ACos() function doesn't work! I started a related thread, and just now opened a bug tracker on the issue. I assume you must be running a 64-bit OS, where ACos() does not exhibit the bug?

Self-documentation for that formula, as well as the repeatedly executed "$minsize/96560.64", could also be accomplished by choosing meaningful variable names.

Edited by Spiff59
Link to comment
Share on other sites

Interesting... yes, I'm running 64bit Win7. It's a fast engineering workstation, but no Cray... I'll try it on my 32bit XP laptop.

Of course the timing information I was simply getting from the SciTE console, which I'm sure is not accurate to .005 seconds... but the point is that the difference is not noticeable.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...