
Directory Enquiries Challenge



20 hours ago, czardas said:

If you can ring the number and someone answers, you have a complete number - not a partial number. The numbers may have been gathered from various sources, manually written or otherwise. It is a real world scenario.

I'm not sure that's completely accurate :-) A number doesn't have to belong to someone to be a "real number". There is, however, a difference between an assigned number and an unassigned number. The challenge (at least from what I understood) wasn't specifically to find real/fake or assignable/unassignable numbers; it was to match numbers from the reference list as well as possible.


1 minute ago, stamandster said:

A number doesn't have to belong to someone to be a "real number". There is, however, a difference between an assigned number and an unassigned number.

You are absolutely right and I take back what I said. An unassigned number might be assigned at any point in the future. DUH! :idiot:


9 hours ago, czardas said:

One thing is for certain: there are some talented individuals around here and so far the discussion has been of value in many ways.

hear, hear!

 

i still think that the solution should be a general one, rather than one that must accommodate each and every country's domestic and international telephone number formats, past, present and future.

and if, per the problem description, typos are not to be considered, then the "string difference" methods (suggested by jchd in post #47 in SQLite, and presented in AutoIt by stamandster in post #62) are overkill, and quite heavy overkill at that. especially since - as i demonstrate below - typos can be accommodated (to some extent) by specifying a more lenient match score.

about the optional task presented in post #4:

+44208.....missing numbers [optional task]

i assume this was not thought thru, since if you search for that, you end up with a LOT of numbers... if this is indeed the intention, then i elaborate on my suggestion at post #36 to form this:

#include <Array.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '+44208', _
        '0015417543012']
Global $iScoreThreshold = 0.7
Global $bCheckBothSides = True

; declare a global var to temporarily store the match score for any specific match
Global $iScore

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        $iScore = _StringMatch($aMatch[$iPhone][0], $aMatch[0][$iQuery], $bCheckBothSides)
        If $iScore > $iScoreThreshold Then $aMatch[$iPhone][$iQuery] = String(Round($iScore * 100)) & '%'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric
Func _StringMatch($sString, $sSubstr, $bCheckBothSides)
    Local $iScoreMax = 0
    Local $iScoreNow = 0
    Local $iScorePerChar = 1 / StringLen($sSubstr)
    ; check end-first
    For $i = StringLen($sSubstr) To 1 Step -1
        $iScoreNow = $iScorePerChar * $i
        If StringInStr($sString, StringRight($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
    Next
    If $bCheckBothSides Then
        ; check start-first
        For $i = StringLen($sSubstr) To 1 Step -1
            $iScoreNow = $iScorePerChar * $i
            If StringInStr($sString, StringLeft($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
        Next
    EndIf
    ; done
    Return $iScoreMax
EndFunc   ;==>_StringMatch

note that checking the start of the string can be disabled by setting $bCheckBothSides = False, and the score threshold can also be changed. i've set it to 0.7, which i find adequate for the given data sets - other data sets will probably benefit from different threshold values, although i believe not too different.

note: this version shows the match score (in percentage) instead of just the "MATCH!" notice.

run it, let me know what you think.

EDIT: i leave it to you to run it, since the _ArrayDisplay GUI is too big to fit in a screenshot...:huh2:

 

 

Edited by orbs

Signature - my forum contributions:

Spoiler

UDF:

LFN - support for long file names (over 260 characters)

InputImpose - impose valid characters in an input control

TimeConvert - convert UTC to/from local time and/or reformat the string representation

AMF - accept multiple files from Windows Explorer context menu

DateDuration -  literal description of the difference between given dates

Apps:

Touch - set the "modified" timestamp of a file to current time

Show For Files - tray menu to show/hide files extensions, hidden & system files, and selection checkboxes

SPDiff - Single-Pane Text Diff

 


3 hours ago, orbs said:

i assume this was not thought thru,

You are right and that's why I said not to bother with it unless you want to. Any number starting with +44208 is in London. I would only expect numbers that match 0208 xxx xxxx or +44208... in the results. You should probably also allow 0011 44 208 xxx xxxx and similar. 
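
The prefix rule described above could be sketched roughly as follows. `_IsLondon208` is a hypothetical helper name that does not appear anywhere in this thread, and the sketch assumes that 00 and 0011 are the only international dial-out prefixes worth recognising; "and similar" prefixes would need to be added to the pattern.

```autoit
; Hypothetical helper (not from the thread): does the number start with the 0208 / +44208 prefix?
; Assumption: only the dial-out prefixes 00 and 0011 are recognised.
Func _IsLondon208($sNumber)
    ; keep digits only, so '+44 208' and '0208' become comparable
    Local $sDigits = StringRegExpReplace($sNumber, '[^0-9]', '')
    ; either (optional 00 or 0011) + country code 44, or the national trunk prefix 0, then 208
    Return StringRegExp($sDigits, '^(?:(?:0011|00)?44|0)208') = 1
EndFunc   ;==>_IsLondon208

; two numbers from the challenge list
ConsoleWrite('020 8568 6646    -> ' & _IsLondon208('020 8568 6646') & @CRLF)
ConsoleWrite('+44 207 882 8565 -> ' & _IsLondon208('+44 207 882 8565') & @CRLF)
```

Of the two numbers tested, only 020 8568 6646 should be reported as a match. A form such as +44 (0)208 would reduce to 440208 and slip through, so the pattern would need another alternative if that notation is expected.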

Edited by czardas

6 hours ago, orbs said:

hear, hear!

...

+44208.....missing numbers [optional task]

i assume this was not thought thru, since if you search for that, you end up with a LOT of numbers... if this is indeed the intention, then i elaborate on my suggestion at post #36 to form this:

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric

 

You can use StringRegExpReplace in place of your _StringStripNonNumeric :-)

Also, while I think what you're doing with the script is awesome :-) can we really assume that because someone has typed 882 8565 they also mean to add the (01) prefix? The match shows 100% of 882 8565 for both (01) 882 8565 and +44 207 882 8565, which is not completely accurate. What does the percentage indicate? That 100% of the challenge phone number is within the database number?

Let's look at 00441619346534, shows 79% similar to 00441619346434. But I would say that it's 92.86% similar, because only one digit is different. The math is 100-($nDiff*(100/stringlen($cNum))), where $nDiff is the number of digits that differ between the two and $cNum is the challenge number. It gets a little more complicated, though, if the lengths of the compared numbers differ.
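
As a rough illustration of that formula, here is a minimal sketch. `_DigitDiffScore` is a made-up name, and the sketch only covers the simple case where both numbers are already stripped to digits and have the same length; it is not the function used in the script below.

```autoit
; Hypothetical sketch of 100-($nDiff*(100/StringLen($cNum))).
; Assumption: both arguments contain digits only and are equally long.
Func _DigitDiffScore($sNum, $cNum)
    ; refuse empty input or inputs of different length
    If StringLen($cNum) = 0 Or StringLen($sNum) <> StringLen($cNum) Then Return SetError(1, 0, 0)
    Local $nDiff = 0
    For $i = 1 To StringLen($cNum)
        ; count the positions where the two digits differ
        If StringMid($sNum, $i, 1) <> StringMid($cNum, $i, 1) Then $nDiff += 1
    Next
    Return 100 - ($nDiff * (100 / StringLen($cNum)))
EndFunc   ;==>_DigitDiffScore

; one differing digit out of 14 gives 92.86 after rounding
ConsoleWrite(Round(_DigitDiffScore('00441619346434', '00441619346534'), 2) & @CRLF)
```

For numbers of different lengths the function sets @error instead of guessing, which is exactly the case where the formula needs more thought.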

My own function of checking for accuracy is based on a couple different elements, not the above example. Which is why against the same number I show 99% instead of 92%. +44208 shows a match to +442078828565 @ 80%, which in my mind would be a 66.67% match.

The more I play with the information the more I realize that one solution for finding a number does not work for all iterations of numbers and what data you expect to return. Latest code below... factors in partial matches for left to right and right to left of challenge number if under 7 digits long.

#include <Array.au3>
#include "typos.au3"

#cs looking for
882 8565
123 8762
7543010
07843 543287
00441619346534
+44208.....missing numbers [optional task]
44208
   0800275002 ; too short, japan local?
   08000225649 ; 11 chars
   08457128276 ; 11 chars
0015417543012
#ce

GLOBAL $refNumT
Local $aArray = _
    ['+262 692 12 03 00', '1800 251 996',    '+1 994 951 0197', _
    '091 535 98 91 61',   '2397865',         '08457 128276', _
    '348476300192',       '05842 361774',    '0-800-022-5649', _
    '15499514891',        '0096 363 0949',   '04813137349', _
    '06620 220168',       '07766 554433',    '047 845 44 22 94', _
    '0435 773 4859',      '(01) 882 8565',   '00441619346434', _
    '09314 367090',       '0 164 268 0887',  '0590995603', _
    '991',                '0267 746 3393',   '064157526153', _
    '0 719 829 7756',     '+1-541-754-3012', '+441347543010', _
    '03890 978398',       '(31) 10 7765420', '020 8568 6646', _
    '0161 934 6534',      '0 637 915 1283',  '+44 207 882 8565', _
    '0800 275002',        '0750 646 9746',   '982-714-3119', _
    '000 300 74 52 40',   '023077529227',    '1 758 441 0611', _
    '0183 233 0151',      '02047092863',     '+44 20 7946 0321', _
    '04935 410618',       '048 257 67 60 79']


Local $findnumb = _
    ['882 8565','123 8762','7543010','07843 543287','00441619346534','+44208','0015417543012']

Consolewrite('---------------------------------------------------------------------------------------------------------------------'& @CRLF _
           & '---------------------------------------------------------------------------------------------------------------------'& @CRLF)

For $i = 0 to Ubound($findnumb)-1 ; find these numbers!
    $reference = StringRegExpReplace($findnumb[$i],"[^0-9]","") ; Sanitize Numbers

    For $a = 0 to ubound($aArray)-1
        GLOBAL $m = 0
        $dbnumbers = StringRegExpReplace($aArray[$a],"[^0-9]","") ; Sanitize Numbers
        $refNumT1 = $reference
        $refNumT2 = $reference
        if $reference = $dbnumbers Then
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            Consolewrite('+> Exact Match to --] '& $aArray[$a] & ' [-- row '& $a & @CRLF)
        EndIf

        IF StringLen($reference) < 7 then ; Find partial match at beginning of database number, by cyclically deleting one character at a time from the left or right of the challenge number
            Do ; matches left to right of challenge number against db number
                IF StringLeft($dbnumbers,StringLen($refNumT1)) = $refNumT1 then
                    Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [-- using LAST '& StringLen($refNumT1) & ' digits'& @CRLF)
                    consolewrite('+> Partial Match, matching FIRST '& StringLen($refNumT1)&' numbers ('& $refNumT2 &') of --] ' & $aArray[$a] & ' [-- row '& $a & @CRLF)
                EndIf
                $refNumT1 = StringTrimLeft($refNumT1,1)
            Until StringLen($refNumT1) = 1 OR StringLen($dbnumbers) = 10

            Do ; matches right to left of challenge number against db number
                IF StringLeft($dbnumbers,StringLen($refNumT2)) = $refNumT2 then
                    Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [-- using FIRST '& StringLen($refNumT2) & ' digits'& @CRLF)
                    consolewrite('+> Partial Match, matching FIRST '& StringLen($refNumT2)&' numbers ('& $refNumT2 &') of --] ' & $aArray[$a] & ' [-- row '& $a & @CRLF)
                    exitloop
                EndIf
                $refNumT2 = StringTrimRight($refNumT2,1)
            Until StringLen($refNumT2) = 1 OR StringLen($dbnumbers) = 10
        endif

        if StringInStr($dbnumbers,$reference) then ; Find Partial Match within the numbers database
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            Consolewrite('+> Partial Match, within larger number --] '& $aArray[$a] & ' [-- row '& $a & @CRLF)
            ;ContinueLoop
        EndIf
        $typos = _Typos($dbnumbers, $reference) ; Find Similar numbers based on limits
        $stringlen = Stringlen($dbnumbers) / StringLen($reference)
        $similarity = Stringleft(100-($stringlen*$typos),6)
        IF $similarity > 97.5 then
            Consolewrite('> Phone Number --] '& $findnumb[$i] & ' [--'& @CRLF)
            consolewrite('+> Similarity Match, '& $similarity &'% similar to number --] '& $aArray[$a] &' [-- row '& $a & @CRLF)
        EndIf
    Next
Next

Consolewrite('---------------------------------------------------------------------------------------------------------------------'& @CRLF _
           & '---------------------------------------------------------------------------------------------------------------------'& @CRLF)

Output

---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------
> Phone Number --] 882 8565 [--
+> Partial Match, within larger number --] (01) 882 8565 [-- row 16
> Phone Number --] 882 8565 [--
+> Partial Match, within larger number --] +44 207 882 8565 [-- row 32
> Phone Number --] 7543010 [--
+> Partial Match, within larger number --] +441347543010 [-- row 26
> Phone Number --] 00441619346534 [--
+> Similarity Match, 99% similar to number --] 00441619346434 [-- row 17
> Phone Number --] 00441619346534 [--
+> Similarity Match, 97.642% similar to number --] 0161 934 6534 [-- row 30
> Phone Number --] +44208 [-- using LAST 2 digits
+> Partial Match, matching FIRST 2 numbers (44208) of --] 08457 128276 [-- row 5
> Phone Number --] +44208 [-- using LAST 2 digits
+> Partial Match, matching FIRST 2 numbers (44208) of --] 0-800-022-5649 [-- row 8
> Phone Number --] +44208 [-- using FIRST 2 digits
+> Partial Match, matching FIRST 2 numbers (44) of --] +441347543010 [-- row 26
> Phone Number --] +44208 [-- using FIRST 4 digits
+> Partial Match, matching FIRST 4 numbers (4420) of --] +44 207 882 8565 [-- row 32
> Phone Number --] +44208 [-- using FIRST 4 digits
+> Partial Match, matching FIRST 4 numbers (4420) of --] +44 20 7946 0321 [-- row 41
> Phone Number --] 0015417543012 [--
+> Similarity Match, 98.307% similar to number --] +1-541-754-3012 [-- row 25
---------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------

 

Edited by stamandster

1 hour ago, stamandster said:

You can use StringRegExpReplace in place of your _StringStripNonNumeric :-)

i never got the hang of RegExp... :unsure:

1 hour ago, stamandster said:

Let's look at 00441619346534, shows 79% similar to 00441619346434

the score is based on the longest successive substring, which in this case is the first 11 digits:

phone: 00441619346434

query: 00441619346534

(don't you just love monospace fonts? :P)

now, 11 characters out of 14 is (rounded to) 79%.

assuming what we're assuming about typos (i.e. they should be ignored), i feel that a match score of 79% is more sound than 93%. @czardas?

also what i failed to highlight, is that once the array contains match scores, it can be sorted - which makes it quite easy for the end user to distinguish between the results. one can easily overlook the dissimilarity demonstrated above, but when they see it's only 79%, they will (hopefully) double-check. especially when - which is more troubling - the match score for 0161 934 6534 is only 71%.
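
A minimal sketch of the sorting idea, under assumptions: the table below is illustrative and is not the $aMatch layout used in the script, and the scores are kept as plain numbers rather than strings with a '%' suffix so that they compare numerically.

```autoit
#include <Array.au3>

; Illustrative table, not the $aMatch layout of the script:
; column 0 = phone number, column 1 = score stored as a number (no '%' suffix).
; The two scores are the ones quoted in this post for the query 00441619346534.
Local $aResult[2][2] = [ _
        ['0161 934 6534', 71], _
        ['00441619346434', 79]]

; 1 = descending, 0 and 0 = whole array, 1 = sort on column 1
_ArraySort($aResult, 1, 0, 0, 1)

; list the rows, best score first
For $i = 0 To UBound($aResult) - 1
    ConsoleWrite($aResult[$i][1] & '%  ' & $aResult[$i][0] & @CRLF)
Next
```

To sort a table that has a header row, as $aMatch does, the third argument of _ArraySort ($iStart) would be 1 instead of 0. Scores stored as '79%' strings may not order numerically, so the '%' is better added only at display time.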

the 79% match score takes into account both sides of the query string, which as i mentioned, can (and should) be disabled, so the 71% match score becomes the only result, which happens to be the required result. it scores low because the query is so long. perhaps i should cross-check the shorter string against the longer one, whichever they may be.

...

ok, i introduced the cross-check. now the 71% match score reevaluates to 91%. yey!

also, another match score that was 85% now reevaluates to 100%, as it should - it matches the query 0015417543012 to the phone +1-541-754-3012. double yey!

this is the updated code:

#include <Array.au3>
#include <Math.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '+44208', _
        '0015417543012']
Global $iScoreThreshold = 0.7
Global $bCheckBothSides = True

; declare a global var to temporarily store the match score for any specific match
Global $iScore

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        $iScore = _Max(_StringMatch($aMatch[$iPhone][0], $aMatch[0][$iQuery], $bCheckBothSides), _StringMatch($aMatch[0][$iQuery], $aMatch[$iPhone][0], $bCheckBothSides))
        If $iScore > $iScoreThreshold Then $aMatch[$iPhone][$iQuery] = String(Round($iScore * 100)) & '%'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric
Func _StringMatch($sString, $sSubstr, $bCheckBothSides)
    Local $iScoreMax = 0
    Local $iScoreNow = 0
    Local $iScorePerChar = 1 / StringLen($sSubstr)
    ; check end-first
    For $i = StringLen($sSubstr) To 1 Step -1
        $iScoreNow = $iScorePerChar * $i
        If StringInStr($sString, StringRight($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
    Next
    If $bCheckBothSides Then
        ; check start-first
        For $i = StringLen($sSubstr) To 1 Step -1
            $iScoreNow = $iScorePerChar * $i
            If StringInStr($sString, StringLeft($sSubstr, $i)) And $iScoreNow > $iScoreMax Then $iScoreMax = $iScoreNow
        Next
    EndIf
    ; done
    Return $iScoreMax
EndFunc   ;==>_StringMatch

 

oh, and let's round the percentage. no-one cares for the 0.86% in the 92.86%. just put 93%, ok?

 

1 hour ago, stamandster said:

The match shows 100% of 882 8565 for both (01) 882 8565 and +44 207 882 8565, which is not completely accurate.

why not? the query number appears exactly in both phones.

 

Edited by orbs


 


@orbs I like your improvements to the script! ... Regarding the 100% match: the number as a whole doesn't match, but I understand what you mean. 12345 is not 100% the same as 345, even if 345 exists within 12345. See what I mean?

Below is a sample that sanitizes the input variable so that it contains only digits:

$new = StringRegExpReplace($value,"[^0-9]","")
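
Wrapped in a function with the same name and signature as the loop-based version posted earlier in the thread, that one-liner could serve as a drop-in replacement. This is only a sketch; treating the result as equivalent assumes that "digit" means the characters 0-9 in both versions.

```autoit
; Drop-in sketch: same name and signature as the loop-based function posted earlier.
; Assumption: only the characters 0-9 count as digits.
Func _StringStripNonNumeric($sString)
    ; remove every character that is not 0-9
    Return StringRegExpReplace($sString, '[^0-9]', '')
EndFunc   ;==>_StringStripNonNumeric

; a number from the challenge list; should leave only its twelve digits
ConsoleWrite(_StringStripNonNumeric('+44 20 7946 0321') & @CRLF)
```

The pattern [^0-9] matches any single character that is not a digit, and StringRegExpReplace replaces all matches by default, so no loop is needed.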

 

Edited by stamandster

5 hours ago, orbs said:

assuming what we're assuming about typos (i.e. they should be ignored), i feel that a match score of 79% is more sound than 93%. @czardas?

I very much like the way you are going with this. Finding typos was not the original intention; however, if they get caught in the net as a side effect of the method, I still consider it a valid approach. False positives are to be expected anyway.

The (01) prefix was made up, but could easily occur as an outward dialing code from within an internal company network. Perhaps I ought to have verified every number, but unassigned numbers should actually be included.

I hope that clears up any current doubts.


Ok, created a different approach to what I've posted earlier... does away with typos.au3 ;-)

#include <Array.au3>

GLOBAL $MatchPerc = 30 ; no less than % to match on
GLOBAL $MatchLen = 3 ; no smaller than X digits to match on

Local $adbPNum = _
    ['+262 692 12 03 00', '1800 251 996',    '+1 994 951 0197', _
    '091 535 98 91 61',   '2397865',         '08457 128276', _
    '348476300192',       '05842 361774',    '0-800-022-5649', _
    '15499514891',        '0096 363 0949',   '04813137349', _
    '06620 220168',       '07766 554433',    '047 845 44 22 94', _
    '0435 773 4859',      '(01) 882 8565',   '00441619346434', _
    '09314 367090',       '0 164 268 0887',  '0590995603', _
    '991',                '0267 746 3393',   '064157526153', _
    '0 719 829 7756',     '+1-541-754-3012', '+441347543010', _
    '03890 978398',       '(31) 10 7765420', '020 8568 6646', _
    '0161 934 6534',      '0 637 915 1283',  '+44 207 882 8565', _
    '0800 275002',        '0750 646 9746',   '982-714-3119', _
    '000 300 74 52 40',   '023077529227',    '1 758 441 0611', _
    '0183 233 0151',      '02047092863',     '+44 20 7946 0321', _
    '04935 410618',       '048 257 67 60 79']


Local $arPNum = _
    ['882 8565', _
    '123 8762', _
    '7543010', _
    '07843 543287', _
    '00441619346534', _
    '+44208', _
    '0015417543012']

Consolewrite('--> Matching Threshold '& $MatchPerc &'% of Reference Number | No less than '& $MatchLen & ' digits' & @CRLF)

For $i = 0 to Ubound($arPNum)-1 ; find these numbers!
    $rPNum = StringRegExpReplace($arPNum[$i],"[^0-9]","") ; Sanitize Numbers

    For $a = 0 to ubound($adbPNum)-1
        $dbPNum = StringRegExpReplace($adbPNum[$a],"[^0-9]","") ; Sanitize Numbers

        $pM = _PhoneMatch($rPNum,$dbPNum,$MatchPerc,$MatchLen)
        IF $pM <> 0 then
            $sPM = StringSplit($pM,'|')
            Consolewrite($sPM[1] &'% of Reference Number '& $arPNum[$i] &' matches '& $sPM[2] & '% of DB Phone Number '& $adbPNum[$a] & ' -- Accuracy of ' & $sPM[3] &'%'& @CRLF)
        EndIf
    Next
Next


FUNC _PhoneMatch($_refNum,$_dbNum, $_pMatch = 50, $_refNumLen = 3)

    LOCAL $_refNumC = $_refNum
    LOCAL $_dbNumC = $_dbNum
    LOCAL $C = 0
    LOCAL $swap = 0

    IF Stringlen($_refNum) > Stringlen($_dbNum) Then
        $swap = 1
        $_refNumC = $_dbNum
        $_dbNumC = $_refNum
    EndIf

    $_refNumR = $_refNumC ; cached Right
    $_refNumL = $_refNumC ; cached Left

    Do
        IF $c <> 0 then
            $_refNumR = StringTrimRight($_refNumR,1)
            $_refNumL = StringTrimLeft($_refNumL,1)
        endif

        $percDbNum = (StringLen($_refNumR)*(100/StringLen($_dbNumC)))
        $percRefNum = (StringLen($_refNumR)*(100/StringLen($_refNumC)))

        Select
            Case (StringInStr($_dbNumC,$_refNumL) OR StringInStr($_dbNumC,$_refNumR)) and $percDbNum >= $_pMatch
                if $swap = 1 then
                    $_PMAcc = (StringLeft($percDbNum,5) + StringLeft($percRefNum,5))/2
                    Return StringLeft($percDbNum,5) &'|'& StringLeft($percRefNum,5) &'|'& StringLeft($_PMAcc,5)
                else
                    $_PMAcc = (StringLeft($percDbNum,5) + StringLeft($percRefNum,5))/2
                    Return StringLeft($percRefNum,5) &'|'& StringLeft($percDbNum,5) &'|'& StringLeft($_PMAcc,5)
                endif
        EndSelect
        $c = $c + 1
    until StringLen($_refNumL) = $_refNumLen OR StringLen($_refNumR) = $_refNumLen

EndFunc

Output

--> Matching Threshold 30% of Reference Number | No less than 3 digits
100% of Reference Number 882 8565 matches 77.77% of DB Phone Number (01) 882 8565 -- Accuracy of 88.88%
100% of Reference Number 882 8565 matches 58.33% of DB Phone Number +44 207 882 8565 -- Accuracy of 79.16%
85.71% of Reference Number 7543010 matches 54.54% of DB Phone Number +1-541-754-3012 -- Accuracy of 70.12%
100% of Reference Number 7543010 matches 58.33% of DB Phone Number +441347543010 -- Accuracy of 79.16%
78.57% of Reference Number 00441619346534 matches 78.57% of DB Phone Number 00441619346434 -- Accuracy of 78.57%
71.42% of Reference Number 00441619346534 matches 90.90% of DB Phone Number 0161 934 6534 -- Accuracy of 81.16%
80% of Reference Number +44208 matches 33.33% of DB Phone Number +44 207 882 8565 -- Accuracy of 56.66%
80% of Reference Number +44208 matches 33.33% of DB Phone Number +44 20 7946 0321 -- Accuracy of 56.66%
84.61% of Reference Number 0015417543012 matches 100% of DB Phone Number +1-541-754-3012 -- Accuracy of 92.30%

 

Edited by stamandster
fixed percentages of referenced number when swapped, tried to add accuracy percentage

I've been working on this for days... it's still not quite right, but I FINALLY got it to do roughly what I wanted. It's a mess and damn near impossible to decipher, but I'm proud of it. I believe it finds all the matches. I didn't get into partial matches; I just checked that all the digits of the search criteria are present in the phone number, in the correct order. I'm sure I could clean it up and refine it further, but I've already spent too much time on this. I was not able to do this without using 2D arrays. I will say I'm now much better at using and understanding arrays. I tried to use a LOT of comments so that anyone who looks at it has some kind of guide.

#include <Array.au3>

Local $aArray = _
    ['+262 692 12 03 00', '1800 251 996',    '+1 994 951 0197', _
    '091 535 98 91 61',   '2397865',         '08457 128276', _
    '348476300192',       '05842 361774',    '0-800-022-5649', _
    '15499514891',        '0096 363 0949',   '04813137349', _
    '06620 220168',       '07766 554433',    '047 845 44 22 94', _
    '0435 773 4859',      '(01) 882 8565',   '00441619346434', _
    '09314 367090',       '0 164 268 0887',  '0590995603', _
    '991',                '0267 746 3393',   '064157526153', _
    '0 719 829 7756',     '+1-541-754-3012', '+441347543010', _
    '03890 978398',       '(31) 10 7765420', '020 8568 6646', _
    '0161 934 6534',      '0 637 915 1283',  '+44 207 882 8565', _
    '0800 275002',        '0750 646 9746',   '982-714-3119', _
    '000 300 74 52 40',   '023077529227',    '1 758 441 0611', _
    '0183 233 0151',      '02047092863',     '+44 20 7946 0321', _
    '04935 410618',       '048 257 67 60 79']
Global $data
   local $search=['882 8565', '123 8762', '7543010', '07843 543287', '00441619346534', '+44208', '0015417543012']

Local $compare='0123456789',$bool,$temp,$newtemp='',$bs,$result


for $x=0 to UBound($aArray) -1 ;shuffle through array
    $aArray[$x]=_Sort($aArray[$x])

    Next
for $x=0 to UBound($search) -1 ;shuffle through array
    $search[$x]=_Sort($search[$x])

    Next
;~ _ArrayDisplay($search)
;so so now everything is broken down lets compare

for $x=0 to UBound($aArray)-1
    for $y=0 to UBound($search)-1

        $stringinstr=StringInStr($aArray[$x],$search[$y])

        if $stringinstr<>0 Then   ;success
        MsgBox('','success','search criteria=' & $search[$y] & 'phone number result=' & $aArray[$x])
        EndIf
    Next
    Next


;search for missing numbers.....



    _RefinedSort(1)



Func _RefinedSort($data)
Local $newsearcharray[ubound($search)][1]                            ;$newsearcharray[$x][0]=number of digits in the search criteria
;~ _ArrayDisplay($search)

for $x=0 to UBound($search)-1
    for $y=0 to StringLen($search[$x])
        if stringlen($search[$x]) > UBound($newsearcharray, 2) Then
            ReDim $newsearcharray[UBound($search)][StringLen($search[$x])+1]         ;this section breaks down the search criteria into individual digits and also saves the length
        EndIf
;~      MsgBox('','',stringlen($search[$x]))
;~      msgbox('','',UBound($newsearcharray,2))
                    if $y=0 Then
                        $newsearcharray[$x][$y]=StringLen($search[$x])
                    EndIf

                    if $y>=1 Then
                        $newsearcharray[$x][$y]=StringMid($search[$x],$y,1)
                    EndIf
    Next
Next
;~ _ArrayDisplay($newsearcharray)           ;working good to here...

;~ _ArrayDisplay($aArray)
Local $temparray[UBound($aArray)]     ;this just saves the original array to a temp that can be manipulated
for $x=0 to UBound($aArray)-1

    $temparray[$x]=$aArray[$x]

    Next

;lets search the temparray to see if the search numbers actually exist in the aArray


for $w=0 to UBound($temparray)-1
    Local $count=0
for $x=0 to UBound($newsearcharray,1)
    for $y=1 to UBound($newsearcharray,2)

        $stringinstr=StringInStr($temparray[$w],$newsearcharray[$x][$y])
        if $stringinstr<>0 Then
            $count+=1
        Else
            $temparray[$w]=''
            ExitLoop 2
        EndIf

    if $count=$newsearcharray[$x][0] Then
;~      msgbox('','','partial match found.....')
        ExitLoop 2
        EndIf
Next
Next
Next

;the temp array should be at least trimmed down at this point.....
;~ _ArrayDisplay($temparray)   ;good good not sure how accurate the result is to this point but progress nonetheless
;***************************************************************************************************************************************



for $w=0 to UBound($temparray)-1          ;starts the loop to shuffle through $temparray(trimmed)

    for $x=0 to UBound($newsearcharray,1)-1                             ;starts the loop to shuffle through $newsearcharray[search criteria][digit],  $newsearcharray[$x][0]=number of digits in the search

;~  MsgBox('','',$newsearcharray[$x][0])          ;testing area
;~ _ArrayDisplay($newsearcharray)
;~  _ArrayDisplay($charpos)


        Local $occurrence=0, $count=0, $tempoccurrence=1, $temp=""
        ;row 0 saves the position of each search digit in the possible match, row 1 saves how many times the digit occurs there
        Local $charpos[2][$newsearcharray[$x][0]]
        for $y=1 to $newsearcharray[$x][0]      ; shuffles through the digits in the search criteria   $newsearcharray[$x][0]=number of digits in the search

            if $temparray[$w]='' then   ;skips any entry in temparray that is blank
                ExitLoop 2
            EndIf
;~ _ArrayDisplay($newsearcharray)
            for $k=1 to $newsearcharray[$x][0]   ;can probably make it so this is skipped if its already verified that the digits exist
            $stringinstr=StringInStr($temparray[$w],$newsearcharray[$x][$y],0,$k)   ;this should figure out if a particular digit occurs more than once
;~              MsgBox('','stringinstr, temparray,newsearcharray', $stringinstr & '  ' & $temparray[$w] & '  ' & $newsearcharray[$x][$y])
            if $stringinstr=0 Then
                ExitLoop
            EndIf

            if $stringinstr<>0 then
                $occurrence+=1
;~              MsgBox('','','occurrence +1')
            EndIf
        next
;~          _ArrayDisplay($charpos)
            if $occurrence=0 Then  ;this should make it so if the digit doesn't exist at this point then exit loop and move on to the next search term
            ExitLoop
            EndIf

        ;so at this point we should be sure that all digits in the search term exist
            $charpos[0][$y-1]=StringInStr($temparray[$w],$newsearcharray[$x][$y],0,$tempoccurrence) ;the digit exists and saves its position
;~          _ArrayDisplay($charpos)
            $charpos[1][$y-1]=$occurrence ;saves how many times the digit occurs
;~ _ArrayDisplay($charpos)
$occurrence=0 ;reset the counter before the next digit is checked
;*****************************************looks good to this point&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&
;if a position is listed more than once but the digit only occurs once, the search criteria repeats that digit more often than the possible match contains it, so it cannot be a match
            if $y=$newsearcharray[$x][0] Then
                Local $samepos
                for $a=0 to UBound($charpos,2)-1 ;first position
                    $samepos=1 ;how many search digits share the position saved for digit $a
                    for $b=$a+1 to UBound($charpos,2)-1
                        if $charpos[0][$a]=$charpos[0][$b] then
                            $samepos+=1
                            if $samepos>$charpos[1][$a] then ExitLoop 3 ;more repeats than the possible match holds, move on to the next search term
                        EndIf
                    Next
                Next
            EndIf
;**********************************************looks good to here...............jezzzzz=====================================================================

            if $y=$newsearcharray[$x][0] Then
                $pp=2
                for $a=0 to UBound($charpos,2)-1            ;this basically sorts through the positions and makes sure that each occurrence of a number has a unique position
                    for $b=$a+1 to UBound($charpos,2)-1
                        if $charpos[1][$a]=1 Then ExitLoop     ;exits the loop if the occurrence is one
                        if $charpos[0][$a]=$charpos[0][$b] Then
                            ;the same digit again: move this copy on to the digit's next occurrence in the possible match
                            $charpos[0][$b]=StringInStr($temparray[$w],$newsearcharray[$x][$a+1],0,$pp)
                        EndIf
                        if $charpos[0][$b]=0 then ExitLoop 3   ;no further occurrence left, move on to the next search term
                        if $charpos[1][$a]>$pp Then $pp+=1
                    Next
                    $pp=2
                Next
            EndIf
            ;so now that every digit is accounted for and has a unique position... we're getting really close
;~ MsgBox('','',$temparray[$w])

            if $y=$newsearcharray[$x][0] Then
                ;charpos corresponds with the digits in search in order, charpos[0][0] is the first digit in newsearcharray
                for $a=UBound($charpos,2)-1 to 0 step -1
                    for $b=$a-1 to 0 step -1
                        ;looks at the last digit first, so if any of the digits before it are positioned after it then they're out of order
                        if $charpos[0][$a]<$charpos[0][$b] Then ExitLoop 3
                    Next
                Next

                ;if it makes it this far then we should have a result
                for $g=1 to $newsearcharray[$x][0]    ;this puts the matching search criteria back together
                    $temp&=$newsearcharray[$x][$g]
                Next
                MsgBox(0,'Match','search criteria=' & $temp & ' match=' & $temparray[$w])
            EndIf

        Next      ;end of the digit shuffle loop
    Next
Next

EndFunc

;********************************************************************************************


Func _Sort($data)
    ;strips everything except the digits 0-9 from $data
    Local $length=StringLen($data)    ;finds the length of the string
    Local $newtemp=""                 ;collects the digits
    Local $char

    for $z=1 to $length
        $char=StringMid($data,$z,1)     ;takes the string one character at a time
        if StringInStr('0123456789',$char) <> 0 Then      ;saves the numbers to a new string with no spaces or random characters
            $newtemp &= $char
        EndIf
    Next
    Return $newtemp

EndFunc
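
For comparison, here is a minimal sketch of the two building blocks the script above works towards: stripping a number down to its digits, and testing whether the digits of a search term appear in a candidate in the same order. It is not the code posted above, only an assumption about the intended behaviour, and the function names _DigitsOnly and _DigitsInOrder are made up for the example. It relies on the optional start parameter of StringInStr to resume the scan just past the previously matched digit.

```autoit
; Hypothetical helper: keep only the characters 0-9 of a string.
Func _DigitsOnly($sText)
    Return StringRegExpReplace($sText, "[^0-9]", "")
EndFunc   ;==>_DigitsOnly

; Hypothetical helper: True when every digit of $sSearch occurs in $sNumber
; in the same order (other digits may sit in between).
Func _DigitsInOrder($sSearch, $sNumber)
    Local $iPos = 0
    For $i = 1 To StringLen($sSearch)
        ; look for the next search digit, starting just past the previous hit
        $iPos = StringInStr($sNumber, StringMid($sSearch, $i, 1), 0, 1, $iPos + 1)
        If $iPos = 0 Then Return False
    Next
    Return True
EndFunc   ;==>_DigitsInOrder

; Example calls with made-up numbers
ConsoleWrite(_DigitsOnly("+44 (208) 123-4567") & @CRLF)
ConsoleWrite(_DigitsInOrder("2084", _DigitsOnly("+44 (208) 123-4567")) & @CRLF)
```

Because each lookup starts after the previous hit, repeated digits need no separate bookkeeping of occurrences and positions. Whether that is lenient or strict enough for the challenge data is a separate question.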

 


Damn! I've been working all week on a solution as per a request from Somerset, but now he won't pay me, because I missed the deadline. :'(

Wait until I work out which of his phone numbers is the correct one, then I will give him a piece of my mind. :angry:

Spoiler

Just Kidding .... September Fool's Day! :P

 


2 hours ago, TheSaint said:

but now he won't pay me

y'all know I've won this due to my superior coding skills and unchallenged intelligence, but relax, I'll share the bounty ;) 
PS: I won't be able to do this when the prize is a coffee cup. :drinks: 


"Doesn't matter who wins cause they're all losers" - trolololol :)

 


After some deliberation, I have come to a decision to call this a draw between orbs and stamandster. Both your examples are better than my attempt, although tweaking some regular expressions would improve mine to a degree. This was a deceptively difficult challenge and I think your examples are as good as anything some MVPs could have created. Not to be put off by their lack of enthusiasm for this: the only person who volunteered to look at your examples was Jos. To be fair, some people I asked are not able to do so for one reason or another.

Despite some confusion over the first post, you have demonstrated, to me at least, that my description was clear enough. Perhaps I could have elaborated more. I wasn't quite sure how to approach this problem myself, and I am most impressed by the winners. I think orbs quickly got on the right track, and stamandster put in great effort to refine his code. Both examples passed further tests with flying colours, although I had to lower orbs' score threshold to get through to Argentina. :)

I declare you both champions of the unofficial August 2016 AutoIt code challenge. Many thanks to all who participated here.

@Somerset Better luck next time. :D
