czardas

Directory Enquiries Challenge


I'm just saying that, given how generic the criteria have to be to capture every nationality's flavor of phone number, it will be difficult to come up with many distinct failure criteria, so build those first and work from there.


,-. .--. ________ .-. .-. ,---. ,-. .-. .-. .-.
|(| / /\ \ |\ /| |__ __||| | | || .-' | |/ / \ \_/ )/
(_) / /__\ \ |(\ / | )| | | `-' | | `-. | | / __ \ (_)
| | | __ | (_)\/ | (_) | | .-. | | .-' | | \ |__| ) (
| | | | |)| | \ / | | | | | |)| | `--. | |) \ | |
`-' |_| (_) | |\/| | `-' /( (_)/( __.' |((_)-' /(_|
'-' '-' (__) (__) (_) (__)


Well, not to bash the challenger or anything, but if a customer came to me or my team with requirements this vague and was not willing to discuss better specifications, I wouldn't even accept the job :) There is too much risk of the work not satisfying the customer, and too much dependence on my crystal ball :sweating:


Roses are FF0000, violets are 0000FF... All my base are belong to you.


Let's assume the phone numbers exist. Is that any better? Longish ones will undoubtedly begin with zero.

Four international call prefixes exist: 00, 0011, 010, 011 [Edit: actually it's more complicated than that]. Stop complaining about specs: phone numbers can be found all over the internet. :frantics:

You are just not thinking out of the box.

Edited by czardas


ok, feel free to bash me in... how about this:

step 1: traverse the array and strip all non-numeric characters from each element.

step 2: strip all non-numeric characters from the input.

step 3: traverse the array and try to match the input against each element, starting from the right-hand side of the string (go from the more specific digit sequence to the more general one).

any good?
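The three steps above can be sketched roughly as follows. This is only a minimal illustration, assuming a regular expression is acceptable for the stripping; the helper names _DigitsOnly and _EndsWith are made up for this example and are not part of any UDF.

```autoit
; hypothetical helpers, for illustration only
Func _DigitsOnly($sText)
    ; steps 1 and 2: drop every character that is not a digit
    Return StringRegExpReplace($sText, '\D', '')
EndFunc   ;==>_DigitsOnly

Func _EndsWith($sEntry, $sQuery)
    ; step 3: compare from the right-hand side; an empty query never matches
    If StringLen($sQuery) = 0 Then Return False
    Return StringRight($sEntry, StringLen($sQuery)) == $sQuery
EndFunc   ;==>_EndsWith

; sample entries taken from the test data used in this thread
Local $aBook = ['(01) 882 8565', '+44 207 882 8565', '0161 934 6534']
Local $sQuery = _DigitsOnly('882-8565')

For $i = 0 To UBound($aBook) - 1
    If _EndsWith(_DigitsOnly($aBook[$i]), $sQuery) Then ConsoleWrite($aBook[$i] & @CRLF)
Next
```

With these three sample entries the loop should report the first two, since both end in 8828565 once the delimiters are gone. A plain suffix test like this knows nothing about leading zeros or international prefixes.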

 

On 8/24/2016 at 9:53 PM, orbs said:

any good?

Very good indeed. At the point where a digit does not match, there are only a few possibilities: either you ran out of digits in the shorter version, or you hit a zero that is the first character (internal dialing), or you found a 1 that does not coincide with a zero within the international call prefix; otherwise it's a mismatch. Country codes will always match. That's how far I got with it anyway. :)

Edit: This is not quite correct (amended later).

Edited by czardas


Ahh, i was thinking the input could be random and we were trying to discern if it was a phone number...

You are presuming every string is attempting to be a phone number, because those rules will still match random input like addresses or SSNs once you strip the non-numeric characters.


6 minutes ago, iamtheky said:

... those rules will still match random input ...

well, if the user is insane enough to submit the input "hi!, i'm no.6 and my age is 41, i was born in 1975" then why wouldn't they expect the match 641-1975 (if it exists in the array)?

EDIT: of course you can sanitize the input (to some extent), for example by disallowing letters, or by allowing only commonly used phone-number delimiters (space, brackets, hyphen...)
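A minimal sketch of that kind of sanitizing, assuming the accepted delimiters are space, plus, brackets, hyphen and dot. The function name _LooksLikePhoneInput and the exact delimiter set are assumptions made for this example.

```autoit
; hypothetical helper: accept only digits and a few common delimiters
Func _LooksLikePhoneInput($sInput)
    ; \A and \z anchor the whole string; the class lists digits, space, + ( ) - and .
    Return StringRegExp($sInput, '\A[\d +()\-.]+\z') = 1
EndFunc   ;==>_LooksLikePhoneInput

ConsoleWrite(_LooksLikePhoneInput('+1-541-754-3012') & @CRLF)
ConsoleWrite(_LooksLikePhoneInput("hi!, i'm no.6 and my age is 41") & @CRLF)
```

The first call should be accepted and the second rejected because of the letters. A check like this only filters on characters: input such as 123-45-6789 still passes, so it does not tell a phone number apart from other digit strings.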

Edited by orbs


what if they enter

123-45-6789 or

19078 XXX st. Apt. 1904-2

especially if the goal is "find it if you can", you may not want to strip all non-numeric characters without checking a few things first, is all.

 We have no example string from which to find a phone number, so just guessing still...

Edited by iamtheky

6 minutes ago, iamtheky said:

19078 XXX st. Apt. 1904-2

FAIL. Unless you want to cover random text inserts within phone numbers. XXX is probably a variable covering several numbers. You can look for "st. Apt." using StringInStr() instead.

Edited by czardas


well, if the array contains something like "+1 112 345 6789" then it will match the first input, and that is a good result (if i get the initial intention right...). the second input is by no means a phone number, and the user, who ought to know that he is looking for a phone number, should not expect any coherent result.


also, if you want to allow wildcards, let the user type in a well-known character (e.g. a question mark, or an X as in your example); then, when you strip non-numeric characters, leave that one in and use a regular expression to make the match.
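One way the wildcard idea could look, as a rough sketch only. It assumes '?' is the wildcard and stands for exactly one digit; the function name _WildcardMatchEnd is hypothetical.

```autoit
; hypothetical helper: '?' in the query stands for any single digit
Func _WildcardMatchEnd($sEntry, $sQuery)
    ; reduce the phonebook entry to its digits
    Local $sDigits = StringRegExpReplace($sEntry, '\D', '')
    ; keep digits and the wildcard character in the query, drop everything else
    Local $sPattern = StringRegExpReplace($sQuery, '[^\d?]', '')
    If StringLen($sPattern) = 0 Then Return False
    ; turn each '?' into \d and anchor the pattern at the end of the entry
    $sPattern = StringReplace($sPattern, '?', '\d') & '\z'
    Return StringRegExp($sDigits, $sPattern) = 1
EndFunc   ;==>_WildcardMatchEnd

ConsoleWrite(_WildcardMatchEnd('+441347543010', '754?010') & @CRLF)
ConsoleWrite(_WildcardMatchEnd('0161 934 6534', '754?010') & @CRLF)
```

The first call should match (the '?' lines up with the 3 in ...7543010) and the second should not. Using X as the wildcard would work the same way, provided X is kept when stripping and replaced by \d.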


i get the same results as JohnOne, and i'm not sure what you expect further. here's my code - excluding the wildcard sample:

#include <Array.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '0015417543012']

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        If _StringMatchEnd($aMatch[$iPhone][0], $aMatch[0][$iQuery]) Then $aMatch[$iPhone][$iQuery] = 'MATCH'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric
Func _StringMatchEnd($sString, $sSubstr)
    If StringRight($sString, StringLen($sSubstr)) = $sSubstr Then Return True
    Return False
EndFunc   ;==>_StringMatchEnd

 

Edited by orbs


Like @JLogan3o13, I originally thought this would be quite easy. It turns out to be less trivial than it sounds. Perhaps I didn't quite explain the task clearly enough.

If the phone numbers are real, the following code should find them (most of the time anyway - see next post :ermm:). If @jchd would be kind enough to squash it with a crazy regexp, I would love to see that. I prefer to have the process broken down into more understandable steps, at least to begin with. I haven't thoroughly tested the code yet (posted quickly so you will hopefully get the idea).

#include <Array.au3>

; MsgBox(0, "", TelCompare('010 44 161 882 8565', '0011 44 161 882 8565'))

Local $aArray = _
    ['+262 692 12 03 00', '1800 251 996',    '+1 994 951 0197', _
    '091 535 98 91 61',   '2397865',         '08457 128276', _
    '348476300192',       '05842 361774',    '0-800-022-5649', _
    '15499514891',        '0096 363 0949',   '04813137349', _
    '06620 220168',       '07766 554433',    '047 845 44 22 94', _
    '0435 773 4859',      '(01) 882 8565',   '00441619346434', _
    '09314 367090',       '0 164 268 0887',  '0590995603', _
    '991',                '0267 746 3393',   '064157526153', _
    '0 719 829 7756',     '+1-541-754-3012', '+441347543010', _
    '03890 978398',       '(31) 10 7765420', '020 8568 6646', _
    '0161 934 6534',      '0 637 915 1283',  '+44 207 882 8565', _
    '0800 275002',        '0750 646 9746',   '982-714-3119', _
    '000 300 74 52 40',   '023077529227',    '1 758 441 0611', _
    '0183 233 0151',      '02047092863',     '+44 20 7946 0321', _
    '04935 410618',       '048 257 67 60 79']

Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '0015417543012']

For $i = 0 To UBound($aQuery) -1
    For $j = 0 To UBound($aArray) -1
        If TelCompare($aQuery[$i], $aArray[$j]) Then
            ConsoleWrite($aQuery[$i] & " = " & $aArray[$j] & " found at index: " & $j & @LF)
        EndIf
    Next
Next

_ArrayDisplay($aArray)

Func TelCompare($sTelNum1, $sTelNum2, $iMinMatch = 3) ; , $iMaxLen = 25 probably
    ; get rid of typical delimiters
    $sTelNum1 = StringRegExpReplace($sTelNum1, '[ \+\(\)\-]', '')
    $sTelNum2 = StringRegExpReplace($sTelNum2, '[ \+\(\)\-]', '')
    If $sTelNum1 = $sTelNum2 Then Return True ; no need to go any further

    Local $iLen1 = StringLen($sTelNum1), $iLen2 = StringLen($sTelNum2), $vTemp

    If $iLen2 < $iLen1 Then ; make $sTelNum1 the shorter number
        $vTemp = $iLen1
        $iLen1 = $iLen2
        $iLen2 = $vTemp

        $vTemp = $sTelNum1
        $sTelNum1 = $sTelNum2
        $sTelNum2 = $vTemp
    EndIf

    If $iLen1 <= $iMinMatch Then Return False ; insufficient information

    If StringRight($sTelNum1, $iMinMatch) <> StringRight($sTelNum2, $iMinMatch) Then Return False ; minimum match failed

    $sTelNum1 = StringReverse($sTelNum1) ; to simplify parsing later
    $sTelNum2 = StringReverse($sTelNum2) ; ditto

    ; the algorithm [international dialing codes all begin with zero]
    Local $sDigit1, $sDigit2
    For $i = $iMinMatch +1 To $iLen1
        $sDigit1 = StringMid($sTelNum1, $i, 1)
        $sDigit2 = StringMid($sTelNum2, $i, 1)

        If $sDigit1 <> $sDigit2 Then ; let's find out why
            Local $iOffSet = $iLen2 - $iLen1
            If $i = $iLen1 Then ; we have reached the first digit
                If $sDigit1 = "0" Then ; maybe omitted in $sTelNum2 or different international dialing code
                    ; test the first zero omission theory with country codes (reversed)
                    If StringRegExp(StringRight($sTelNum2, $iOffSet +1), '(\d){1,3}(00|1100|010|110)?') Then Return True

                    ; next test international dialing codes (reversed)
                    Return ($iOffSet < 3 And StringRegExp($sTelNum2, '(1100|10|110)\z')) ? True : False
                EndIf

            Else ; test international dialing codes (reversed)
                If $sDigit1 = "1" Then ; must have failed to match a zero within the international code
                    ; since the shorter number contains 1 in the international dialing code:
                    Return ($iOffSet = 1 And StringRight($sTelNum2, 4) = '1100') ? True : False
                EndIf
            EndIf

        EndIf
    Next

    Return True
EndFunc ;==> TelCompare

Here are the results:

882 8565 = (01) 882 8565 found at index: 16
882 8565 = +44 207 882 8565 found at index: 32
7543010 = +441347543010 found at index: 26
00441619346534 = 0161 934 6534 found at index: 30
0015417543012 = +1-541-754-3012 found at index: 25

It contains a bug (logic flaw).

Edited by czardas
bugfix


ok, i took this approach:

following my previous code, i changed the match criteria from True/False to a matching score [0..1] - the more trailing digits of the query match the phonebook entry, the higher the score. i first assumed a score of >0.7 would count as a match, but after testing i found the workable threshold to be >0.3. try this:

#include <Array.au3>

; tester input
Global $aPhone = [ _
        '+262 692 12 03 00', '1800 251 996', '+1 994 951 0197', _
        '091 535 98 91 61', '2397865', '08457 128276', _
        '348476300192', '05842 361774', '0-800-022-5649', _
        '15499514891', '0096 363 0949', '04813137349', _
        '06620 220168', '07766 554433', '047 845 44 22 94', _
        '0435 773 4859', '(01) 882 8565', '00441619346434', _
        '09314 367090', '0 164 268 0887', '0590995603', _
        '991', '0267 746 3393', '064157526153', _
        '0 719 829 7756', '+1-541-754-3012', '+441347543010', _
        '03890 978398', '(31) 10 7765420', '020 8568 6646', _
        '0161 934 6534', '0 637 915 1283', '+44 207 882 8565', _
        '0800 275002', '0750 646 9746', '982-714-3119', _
        '000 300 74 52 40', '023077529227', '1 758 441 0611', _
        '0183 233 0151', '02047092863', '+44 20 7946 0321', _
        '04935 410618', '048 257 67 60 79']
Global $aQuery = [ _
        '882 8565', _
        '123 8762', _
        '7543010', _
        '07843 543287', _
        '00441619346534', _
        '0015417543012']
Global $iScoreThreshold = 0.3

; declare the match results array: rows = phone numbers, columns = queries
Global $aMatch[UBound($aPhone) + 1][UBound($aQuery) + 1]

; populate headers (rows and columns) and strip non-numeric characters
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = _StringStripNonNumeric($aPhone[$iPhone])
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = _StringStripNonNumeric($aQuery[$iQuery])
Next

; match
For $iPhone = 1 To UBound($aMatch) - 1
    For $iQuery = 1 To UBound($aMatch, 2) - 1
        If _StringMatchEnd($aMatch[$iPhone][0], $aMatch[0][$iQuery]) > $iScoreThreshold Then $aMatch[$iPhone][$iQuery] = 'MATCH'
    Next
Next

; re-populate headers with original values for display
For $iPhone = 0 To UBound($aPhone) - 1
    $aMatch[$iPhone + 1][0] = $aPhone[$iPhone]
Next
For $iQuery = 0 To UBound($aQuery) - 1
    $aMatch[0][$iQuery + 1] = $aQuery[$iQuery]
Next

; display match results
_ArrayDisplay($aMatch)

; functions
Func _StringStripNonNumeric($sString)
    Local $sResult = ''
    For $i = 1 To StringLen($sString)
        If StringIsDigit(StringMid($sString, $i, 1)) Then $sResult &= StringMid($sString, $i, 1)
    Next
    Return $sResult
EndFunc   ;==>_StringStripNonNumeric
Func _StringMatchEnd($sString, $sSubstr)
    Local $iScore = 0
    Local $iScorePerChar = 1 / StringLen($sSubstr)
    For $i = 1 To StringLen($sSubstr)
        If StringRight($sString, $i) = StringRight($sSubstr, $i) Then $iScore = $iScorePerChar * $i
    Next
    Return $iScore
EndFunc   ;==>_StringMatchEnd

 

Edited by orbs


I think if you want to do the job right you have to include every country's numbering convention in the source code: https://en.wikipedia.org/wiki/Category:Telephone_numbers_by_country And even then, how can you be sure it's a valid phone number? It seems to me there is no easy way, with service numbers and all (which are also valid numbers). I have looked (a little) into the wiki page to see if it's possible to gather the information about the number conventions automatically, but as far as I can see it's not that easy, as every page has a different layout. Some more info:

https://en.wikipedia.org/wiki/National_conventions_for_writing_telephone_numbers

Spoiler

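#include <StringConstants.au3> ; defines $STR_REGEXPARRAYGLOBALMATCH
$aStrPlainNumber = StringRegExp($strPhoneNumber, '\d+', $STR_REGEXPARRAYGLOBALMATCH) ; returns an array of the digit groups, i.e. everything except the non-number characters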
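With $STR_REGEXPARRAYGLOBALMATCH the call returns an array of digit groups rather than one stripped string, so the groups still have to be joined. A minimal sketch of that, with a sample number taken from the test data in this thread; the function name _JoinDigitGroups is made up for this example.

```autoit
#include <StringConstants.au3>

; hypothetical helper: collect the digit groups and glue them together
Func _JoinDigitGroups($strPhoneNumber)
    Local $aGroups = StringRegExp($strPhoneNumber, '\d+', $STR_REGEXPARRAYGLOBALMATCH)
    ; when nothing matches, no array comes back
    If Not IsArray($aGroups) Then Return ''
    Local $sPlain = ''
    For $i = 0 To UBound($aGroups) - 1
        $sPlain &= $aGroups[$i]
    Next
    Return $sPlain
EndFunc   ;==>_JoinDigitGroups

ConsoleWrite(_JoinDigitGroups('+44 20 7946 0321') & @CRLF)
```

For the sample number this should give 442079460321. Replacing every non-digit with StringRegExpReplace($strPhoneNumber, '\D', '') would reach the same string in one call.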

 

 

Edited by pluto41
