Jump to content

Euclidean Distance (Points)


seadoggie01
 Share

Recommended Posts

Recently I was reading this article about "K-nearest neighbors from scratch" (for school), and I was interested in implementing this in AutoIt... so I needed this piece before I started. It was a beast. Please let me know if you have any suggestions/improvements.

The method may seem a bit backwards, but I wanted to be able to use this for an array of any number of dimensions... which was an interesting challenge. It's based around using Execute to return the value of a particular point in an array.

; #FUNCTION# ====================================================================================================================
; Name ..........: _Point_EuclideanDistance
; Description ...: Gets the Euclidean distance between two points (represented as an array like [x, y, z, ..., n]).
; Syntax ........: _Point_EuclideanDistance($aPoint1, $aData2)
; Parameters ....: $aPoint1              - an array of 1 dimensional data.
;                  $aData2              - an array of 1 dimensional data.
; Return values .: Success - the distance between the two points
;                  Failure - False and sets @error:
;                  |1 - array is not 1-D. @extended is set to the argument number.
;                  |2 - the arrays are not the same size.
; Author ........: Seadoggie01
; Modified ......: 3/29/2020
; Remarks .......: The arrays MUST contain only numerical data. This is not verified in the function.
;                  Special thanks to jchd and RTFC for the mathematical explanations on dimensional measurements!
; Related .......:
; Link ..........:
; Example .......: No
; ===============================================================================================================================
Func _Point_EuclideanDistance($aPoint1, $aData2)

    If Not (UBound($aPoint1, 0) = 1) Then Return SetError(1, 1, False)
    If Not (UBound($aData2, 0) = 1) Then Return SetError(1, 2, False)

    If Not (UBound($aPoint1) = UBound($aData2)) Then Return SetError(2, 0, False)

    Local $iSum = 0

    For $i=0 To UBound($aPoint1) - 1
        $iSum += ($aPoint1[$i] - $aData2[$i])^2
    Next

    Return Sqrt($iSum)

EndFunc

 

Edited by seadoggie01
Updated code to use points ;)

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Spoiler

My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types

Link to comment
Share on other sites

Sorry to jump in your thread but there's something I don't understand.

The Euclidean distance (also called L² distance) is the measure of the distance between two points in Euclidean space of dimension n, that is from vectors in ℝⁿ.

You seem to want to extend this notion of distance to matrices (hence of dimension >= 2, that is between elements of ℝᵏ × ℝⁿ) and I'm unsure the result you get here has the same nice properties as the Euclidean distance in ℝⁿ.

See for instance https://math.stackexchange.com/questions/507742/distance-similarity-between-two-matrices

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

Actually, I'm not 100% sure myself. You might be right in that I've implemented this too far. I was honestly just following the code from the link I posted and thought going above to more dimensions would be cool. I didn't think about how matrices would work, I barely understand them as is :D

"Don't ask me, I just wrote the code 😆"

Edit: And please don't be sorry, I'm just happy someone is interested!

Edited by seadoggie01

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Spoiler

My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types

Link to comment
Share on other sites

A matrix is two-dimensional by definition; higher-dimensional = tensor. Tensor norms (e.g. the Frobenius norm) exist, but are defined differently (see for example here). For matrices, a faster implementation would be:

#include "Eigen4AutoIt.au3"

_Eigen_StartUp()

$rows=123
$cols=234
$matA = _Eigen_CreateMatrix_Random ( $rows, $cols )
$matB = _Eigen_CreateMatrix_Random ( $rows, $cols  )

$specs = ( _Eigen_MatrixSpecs ( _Eigen_CwiseBinaryOp ( $matA, "-", $matB )))

_ArrayDelete($specs,"0-5;10-32")
_ArrayDisplay($specs,"norms")

_Eigen_CleanUp()

The Frobenius norm is the first specs entry (after deleting the other specs).

Edited by RTFC
clarification
Link to comment
Share on other sites

7 hours ago, RTFC said:

<Math mumbo jumbo>

I bow to your superiority. But seriously though, I've only taken Calculus and don't understand that at all. I'm a visual person and can barely think about data in 3 dimensions.

After rethinking about this function, it seems to make sense that I should really only be accepting 2 1-D arrays that represent points in an unlimited dimensional space to get the distance between.

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Spoiler

My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types

Link to comment
Share on other sites

49 minutes ago, seadoggie01 said:

I bow to your superiority.

Please don't.;)

49 minutes ago, seadoggie01 said:

I've only taken Calculus and don't understand that at all.

No calculus is required, yippeee.:cheer:

You can think of "distance" as: 1. how far is point A from point B in some space with any number of dimensions, and 2. how dissimilar (distant from one another) are two mathematical entities, be it a scalar, a vector, a matrix, or a tensor. There are various so-called "norms" for expressing this, of which the Euclidean norm/distance is one. The problem with high-dimensional entities is that they can be close in some respects and distant/dissimilar in others, so it's not always easy to figure out which method to use for quantifying this (that is, which norm is most appropriate in that particular context). But I think it's great that you're exploring this topic.:thumbsup:

Edited by RTFC
Link to comment
Share on other sites

I second what @RTFC said.

Depending on the nature of the problem you have to solve, some norm may be completely misleading or unsignificant, while another (more or less unintuitive) will provide enlightning results.  It often boils down to which mathematical property your problem requires or implies, which some norm(s) satisfy while other(s) destroy.  Digging further into this for the general case needs non-trivial background as you can imagine.

Clearly, stepping into higher dimensions makes things less simple.  The distance between two points in 1D, 2D, 3D, ... nD is pretty intuitive. But even in as low as 2D, deciding what means the "distance" between two closed curves (say shadows projected on a paper of two different patatoes) is something open to very different interpretations.  Is the distance between the closest points of the two curves more meaningful for your problem than the distance of their center of gravity, or distance between the most distant points, or whatelse?  Just with this example you can see the answer can't be unique and fit every problem involving two closed curves in 2D plane.

Going into higher dimensions generally breaks down intuition because "usual" (everyday life) properties don't hold anymore. Even stepping from 2D to 3D can be tricky: for instance, rotations in 3D aren't commutative anymore (the result depends on the order of applying two or more of them), but they are in 2D.

Nonetheless you can very often rely on results published in the huge freely available academic litterature, as well as practical implementations easy to use provided by computational packages like the one @RTFC has wrapped into his useful UDF.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Link to comment
Share on other sites

@RTFC Wasn't actually bowing, just kidding ;) And I think I see what you are getting at... and how @jchd explained it kind of helps. I think what you're gently and kindly trying to say (and I appreciate that!) is that you can't exactly calculate the distance between two non-similar entities using this method, and that completely makes sense. (Potato example was incredibly helpful!)

Just for kicks and giggles I tested my function on two sets of points from two cubes of size 1 that were directly next to each other. It returned the square of 8, where I was expecting 1, which proves both of your points. Each of the points I passed were 1 unit away from their counterparts, so the total was 8, which it took the square of. Then I realized that the distance would be the distance between two the center points (assuming equal objects).

So I should probably rename this to something like point euclidean distance and reign the dimensions back down :)

Thank you for the explanations!

Edit: Also, jchd, your first post makes total sense now that I re-read it

Edited by seadoggie01

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.

Spoiler

My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types

Link to comment
Share on other sites

11 hours ago, seadoggie01 said:

just kidding

Me too.:D And apologies for the mumbo jumbo.

Have you ever read "Flatland" by E. Abbott? It's about a sphere that visits a 2D world, appearing and disappearing as it briefly intersects that plane, thereby disturbing the complacent existence of A. Square (who eventually gets locked up in an insane asylum). Higher-dimensional entities have features that remain either completely hidden or are only perceived in a partial, distorted fashion. That means the rules for an operation may be different depending on their dimensionality (for example, multiplying two matrices is not the same as multiplying each pair of corresponding cell values, as you would in the scalar case), or the operation itself may not even exist (for example, you cannot divide one matrix by another). Think of the difference between a 3D coffee pot and a 2D picture of a coffee pot. How much coffee can the 2D pot contain? Answer: it does not compute. The 3D complexity of "volume" is completely hidden in the 2D case. We could use the 2D area of the coffee pot in the picture instead, but we'd have to apply a different calculation for that, and you still can't put your coffee in there.:P

In your case, distance/dissimilarity can be expressed in many different ways, depending (for example) on how exceptional we consider outliers in a single dimension to be. The Euclidean version implicitly expects outliers to be fairly common, because you take the square of the difference per dimension (exponent = 2, also called the L2-norm). If, on the other hand, a big difference in even a single dimension is a big deal (even if the differences in a thousand other dimensions are tiny), then you would raise that exponent (Lmax norm), causing the effect of any single big dim-difference on the total sum to be magnified. The opposite effect is achieved when summing just the absolute differences (L1-norm) instead of their squared differences, meaning the effect of outliers will affect the overall "distance" even less than in the Euclidean case. So that power of two you're using in your algorithm is itself a (hopefully) conscious choice about the expected properties of what you're trying to quantify.:think:

Edited by RTFC
typos
Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...