AutoIt Function For Cosine Similarity (Vector Embeddings)?


Solved by RTFC


I can use the OpenAI API to get arrays containing vector embeddings for a word/phrase using this: https://platform.openai.com/docs/guides/embeddings

But what's the process of comparing the two vector arrays using something like this: https://en.wikipedia.org/wiki/Cosine_similarity

In python, there's a library for this: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html

Anything similar in AutoIt? Thanks!
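For reference, the scikit-learn function above just implements the formula from the Wikipedia page: dot(a, b) / (||a|| * ||b||). A minimal pure-Python sketch of that formula (example vectors are made up):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity per the Wikipedia definition:
    dot(a, b) / (||a|| * ||b||)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # ~0.9746
```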


Did I get this right? Just working off of the Wikipedia definition.

; Cosine similarity per the Wikipedia definition:
; dot(a, b) / (||a|| * ||b||)
; (Sqrt and UBound are built in, so no includes are needed.)

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

; Dot product of the two vectors
Local $dotProduct = 0.0
For $i = 0 To UBound($embedding1) - 1
    $dotProduct += $embedding1[$i] * $embedding2[$i]
Next

; Euclidean norm (magnitude) of each vector
Local $magnitude1 = 0.0
For $i = 0 To UBound($embedding1) - 1
    $magnitude1 += $embedding1[$i] ^ 2
Next
$magnitude1 = Sqrt($magnitude1)

Local $magnitude2 = 0.0
For $i = 0 To UBound($embedding2) - 1
    $magnitude2 += $embedding2[$i] ^ 2
Next
$magnitude2 = Sqrt($magnitude2)

Local $cosineSimilarity = $dotProduct / ($magnitude1 * $magnitude2)

MsgBox(0, "", "Cosine similarity: " & $cosineSimilarity)

 


  • Solution

Looks okay, but you should really look into E4A's DotProduct (section: Multiplication) and GetNorm (section: Reduction) functions.


23 minutes ago, RTFC said:

Looks okay, but you should really look into E4A's DotProduct (section: Multiplication) and GetNorm (section: Reduction) functions.

I remember you recommending this library some time back, and I downloaded it, but it looked so daunting (I don't have a CS background) that I backed off immediately :)
Okay, I'll give it another go :)


How is this daunting? :D

#include "C:\AutoIt\Eigen\Eigen4AutoIt.au3" ; NB adjust path to wherever you put it

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

_Eigen_StartUp()

$vec1 = _Eigen_CreateMatrix_FromArray($embedding1)
$vec2 = _Eigen_CreateMatrix_FromArray($embedding2)

MsgBox(0, "", "Cosine similarity: " & _
    _Eigen_DotProduct($vec1,$vec2) / (_Eigen_GetNorm($vec1) * _Eigen_GetNorm($vec2)))

_Eigen_CleanUp()

(I don't have a CS background either.)

Edited by RTFC
typo

  • 1 month later...

Update: as of version 5.4 (released: 29 May 2023), E4A supports direct retrieval of the angle between two vectors with function _Eigen_GetVectorAngle ( $vecA, $vecB, $returnRadians = False ). A zero-degree angle signifies parallel vectors (aligned and pointing in the exact same direction), a 90-degree angle perpendicular ones, and a 180-degree angle implies the vectors are anti-parallel (aligned, but pointing in opposite directions).

#include "C:\AutoIt\Eigen\Eigen4AutoIt.au3" ; NB adjust path to wherever you put it

Local $embedding1[3] = [1.0, 2.0, 3.0]
Local $embedding2[3] = [4.0, 5.0, 6.0]

_Eigen_StartUp()

$vec1 = _Eigen_CreateMatrix_FromArray($embedding1)
$vec2 = _Eigen_CreateMatrix_FromArray($embedding2)

MsgBox(0, "", "Angle between vectors (degrees): " & _Eigen_GetVectorAngle($vec1, $vec2))

_Eigen_CleanUp()
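Note that if you still want the similarity value itself rather than the angle, cosine similarity is just the cosine of this angle, so converting back is trivial. Illustrated here in Python terms (degrees assumed, matching the default $returnRadians = False; the example angle is the one the vectors [1,2,3] and [4,5,6] produce):

```python
import math

def angle_to_similarity(angle_degrees):
    """Cosine similarity is the cosine of the angle between the vectors,
    so an angle in degrees converts back like this."""
    return math.cos(math.radians(angle_degrees))

print(angle_to_similarity(12.933154))  # ~0.9746
```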

 


Never jumped on the Python bandwagon myself either. From what I've read in various Stack Overflow threads, you should be able to get significantly better performance by replacing NumPy with raw Eigen/C++, even without GPU/CUDA/MPI refactoring.

If you're serious about setting up ML in this way, I can probably help you. Many of Eigen's speed optimisations are obtained at compile time (e.g. lazy evaluation, smart loop unrolling, and matrix-operation-specific tricks), so if you were to present a snippet of E4A code (say, a UDF that applies a number of E4A functions to some input matrices), I could duplicate/optimise/rewrite that and present you with a single pre-compiled E4A DllCall. I first suggested this when I started the E4A thread many years ago, but so far nobody has taken me up on it. Up to you, of course. If you're worried about your intellectual property, you can PM me instead. In any case, hope it helps.


1 hour ago, RTFC said:

so far nobody has taken me up on this

Would love to :) but nothing in my workflow (so far) has warranted anything extremely complex. At most, I'm using SBERT embeddings plus a Milvus vector DB, doing some vector comparisons, corpus indexing, and some n-gram extractions with Yake.

