Jump to content

Which is the fastest method to compare two array


rootx
 Share

Recommended Posts

19 hours ago, AspirinJunkie said:

I'm not sure if this was adressed to me

I was not aiming my comment at you or anyone in particular. This is the approach I would take in particular cases (as mentioned). I thought my comments might be potentially valuable to some readers. In fact a combination of both approaches also seems doable. It really does depend on the array discrepancies (which we all seem to agree about). Today I'm busy. I'll try it some time during the week.

Edited by czardas
Link to comment
Share on other sites

As an experiment, I have reinvented the binary search algorithm. I'm sure this method has been tried before and it might turn out to be a fruitless exercise. However - given the idea that many duplicate file names appear in some backup folder along with various differences, using the standard binary search algorithm might not necessarily be the most optimal approach. This code has not been thoroughly tested.

CODE DELETED
See here: https://www.autoitscript.com/forum/topic/190340-parallel-exponential-search-algorithm/

This is a proof of concept and will need further adaptation to work with all unicode strings. Incremental of powers of two are used for the binary search (not division by two). Numbers are used here for testing the algorithm. It's also a parallel search. I might have overlooked something.

 

Edited by czardas
Link to comment
Share on other sites

I have just run a couple of precursory tests so far, using the following similar arrays:

#include <Array.au3>

Local $aOne[50001]
For $i = 0 To 50000
    $aOne[$i] = Hex($i, 8)
Next
_ArrayShuffle($aOne)
Local $aTwo = $aOne
_ArrayReverse($aTwo)
ReDim $aOne[35000]
ReDim $aTwo[35000]

_ArraySort($aOne)
_ArraySort($aTwo)

_ArrayDisplay($aOne)
_ArrayDisplay($aTwo)

It appears that binary search is less efficient than a straight forward linear search on these arrays. A parallel linear search might be best when there are more matches. Current code differences produce unfair tests, so I haven't got a straight forward answer yet. I can still imagine how these methods will compare given a variety of starting conditions. Inverted/exponential binary search needs testing separately against the standard binary search (outside of this context). Again varied starting conditions will play an important role.

Test results were as follows:

Parallel Binary Search: 603.656865104227
Average Binary search:  2276.60675885885
Average Linear scan:    160.069939565546

 

 

 

Edited by czardas
Link to comment
Share on other sites

  • 3 weeks later...

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

×
×
  • Create New...