Parse US Postal State from a string. (Solved)

Crayfish · July 22, 2012

#include <Array.au3>; Just for _ArrayDisplay
$s_states = "(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
;~ $s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126"
;~ $s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

$ST = StringRegExp($s_test_str, $s_states, 3)

_ArrayDisplay($ST)

I just want to return the state without pulling random characters from the address.

What is wrong with my logic?

** Solved with regex word boundary **
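For readers wondering what went wrong: the pattern in the first post is not anchored to anything, so it is free to match two-letter pairs inside longer words, such as the RI in DRIVE or the LA in HEARTLAND, before it ever reaches OH. A minimal sketch to see this, reusing the original pattern and test string (the variable names and the ConsoleWrite loop are just for illustration):

```autoit
; Original, unanchored pattern from the first post.
Local $sStates = "(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])"
Local $sTest = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"

; Flag 3 returns every match in the string as an array.
Local $aHits = StringRegExp($sTest, $sStates, 3)
If Not @error Then
    ; Print each match on its own line to see which pairs were picked up.
    For $i = 0 To UBound($aHits) - 1
        ConsoleWrite($aHits[$i] & @CRLF)
    Next
EndIf
```

Every extra line printed before OH is a pair of letters taken from the middle of a word, which is what the word boundary fix prevents.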

Edited July 22, 2012 by Crayfish

Crayfish · July 22, 2012

$s_states = "\s(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\s"

I suppose I can require a space in front and a space at the back.

abberration · July 22, 2012

You could split it by the commas and only search for the 3rd element.

#include <Array.au3>; Just for _ArrayDisplay
$s_states = "(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
$s_split = StringSplit($s_test_str, ",", 2)
$ST = StringRegExp($s_split[2], $s_states, 3)

_ArrayDisplay($ST)

Melba23 · July 22, 2012

Crayfish,

This works for me:

#include <Array.au3>

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126" & @CRLF
$s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126" & @CRLF
$s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

$aState = StringRegExp($s_test_str, "(?i).*,\s(\w{2})\s.*", 3)

_ArrayDisplay($aState)

Any use?

M23

czardas · July 22, 2012

I have a working, but very long-winded, version, and it should be possible to shorten it. You were on the money when you suggested adding spaces, but I think it's better to use \b, which marks a word boundary. If you are always certain of the exact character padding around the letters then I would go with Melba's code. I'm not entirely happy with what I produced, although it seems to work fine. I tried a few other ways and kept getting duplicated results. :idiot:

#include <Array.au3>; Just for _ArrayDisplay
$s_states = "(?-i)(\bA[LKSZRAEP]\b|\bC[AOT]\b|\bD[EC]\b|\bF[LM]\b|\bG[AU]\b|\bHI\b|\bI[ADLN]\b|\bK[SY]\b|\bLA\b|\bM[ADEHINOPST]\b|\bN[CDEHJMVY]\b|\bO[HKR]\b|\bP[ARW]\b|\bRI\b|\bS[CD]\b|\bT[NX]\b|\bUT\b|\bV[AIT]\b|\bW[AIVY]\b)"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
;~ $s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126"
;~ $s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

$ST = StringRegExp($s_test_str, $s_states, 3)

_ArrayDisplay($ST)

Perhaps you can figure out a way to simplify it.

Edited July 22, 2012 by czardas

Crayfish · July 22, 2012

Melba:

I thought of that before, but then I lose the validation against the actual states. Your pattern pulls any 2 characters that sit between spaces after a comma.

abberration:

Your logic is great: parse into 3 parts and check only the last one, so random letters that happen to match a state are no longer grabbed. But it only works for the 3 perfect examples that I included. I can't depend on the comma too much because sometimes it isn't included.

555 HOUSTON DRIVE, HEARTLAND TX 75126 (This will pull LA and TX).

Thank you for your time. Any other approach?

abberration · July 22, 2012

Well, you could do it old school, assuming all zip codes are 5 characters.

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
$s_no_white_space = StringStripWS($s_test_str, 8)
$trim = stringright($s_no_white_space, 7)
$trim = StringTrimRight($trim, 5)
MsgBox(0, "", $trim)

Edited July 22, 2012 by abberration

iamtheky · July 22, 2012

or if you have the option to read it in as an array - or stringsplit it on the crlfs

local $s_test_str[3]

$s_test_str[0] = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
$s_test_str[1] = "555 HOUSTON DRIVE, HEARTLAND, TX 75126"
$s_test_str[2] = "789 WOODLAND TRL, LADY LAKE, MA 32159"

for $i = 0 to ubound ($s_test_str) - 1
    $ST = Stringleft(Stringright($s_test_str[$i] , 8) , 2)
    msgbox (0, '' , $ST)
next

Crayfish · July 22, 2012

Thanks everyone for your contribution.

I still want to keep the validation of the specific 50 states.

Trimming and space count defeat that purpose.

boththose | abberration:

That loses validation of the states. I am interested in pulling the state, but I still want it to pull only actual states. If someone mistypes the state, I don't want it pulled; I would rather it return @error.

czardas:

\b matches at a word boundary. I think this is what I need, but I don't think there is any shortcut or any way to slim this down.

(\bA[LKSZRAEP]\b|\bC[AOT]\b|\bD[EC]\b|\bF[LM]\b|\bG[AU]\b|\bHI\b|\bI[ADLN]\b|\bK[SY]\b|\bLA\b|\bM[ADEHINOPST]\b|\bN[CDEHJMVY]\b|\bO[HKR]\b|\bP[ARW]\b|\bRI\b|\bS[CD]\b|\bT[NX]\b|\bUT\b|\bV[AIT]\b|\bW[AIVY]\b)

Edited July 22, 2012 by Crayfish

czardas · July 22, 2012

I don't think there is any shortcut or any way to slim this down.

That would surprise me, but I failed to figure it out. I don't think it matters too much though. You save a few bytes, that's all. It should already be fast enough to not make any significant difference if you were to rewrite it. I would still like to know why I struggled with it though. It seems as if it should be easy.

Edited July 22, 2012 by czardas
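One possible way to slim it down, offered as a sketch rather than a definitive answer: every alternative begins and ends with \b, so the boundary can be written once on each side of a single group. The (?: ) form groups without capturing, so flag 3 still returns the matched codes themselves. The variable names and the test string with the missing comma are only for illustration.

```autoit
#include <Array.au3> ; only needed for _ArrayDisplay

; \b is factored out of the alternatives; (?: ) groups them without capturing.
Local $sStates = "\b(?:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\b"

; The address from the thread where the second comma is missing.
Local $sTest = "555 HOUSTON DRIVE, HEARTLAND TX 75126"

Local $aST = StringRegExp($sTest, $sStates, 3) ; 3 = return all matches
If @error Then
    MsgBox(0, "", "No state found")
Else
    _ArrayDisplay($aST)
EndIf
```

This should behave the same as the long version, because `\b(A|B)\b` and `\bA\b|\bB\b` accept the same text. One caveat applies to both: a two-letter word elsewhere in the address that happens to be a valid code (a street called "LA PALMA", say) is still a whole word and would still match.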

abberration · July 22, 2012

On a side note, since you want to check for accuracy of the state, check out this site:

http://www.carrierroutes.com/ZIPCodes.html

Check out the "three digit zip code" section. The first digit of the zip code could help determine if the state matches correctly. Might be something to think about.

Crayfish · July 22, 2012

abberration:

That is very interesting, but it seems to branch out too much after the 2nd digit. I tried to look at the ZIP map (they want to charge me to buy the map ^^;).

I think I'll just stick to the good old 50 states, because with 3-digit ZIP prefixes there are 1000 variants. http://en.wikipedia.org/wiki/List_of_ZIP_code_prefixes

GEOSoft · July 22, 2012

"^.*\s\b([ACDFGHIKLMNOPRSTUVW][ACDEHIJKLMNORSTUVYZZ])\b\s*.?\d+.*$"

I think I have them all in there except for places like Guam, Marshall Islands, etc.

You can remove the ^ and $ from the ends if there is more to the string than what you showed.

Edit:

I worked it based on this list

https://www.usps.com/send/official-abbreviations.htm

Another Edit:

I also tried it with good results using this shorter version

"^.*\s\b([ACDFGHIKLMNOPRSTUVW][A-Z])\b\s*.?\d+.*$"

Edited July 22, 2012 by GEOSoft

Crayfish · July 23, 2012

GEOSoft:

That is an exceptional expression for pulling states, but it assumes the state is correct. I suppose there is no win-all in this situation (i.e. it will grab AA, AC, AD, AF, etc., and these are not states).

I really like the shortened version, but I still think czardas' idea is the most promising because it does a simple 50-state guard without going to the extreme of creating a new validation function.
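The two ideas in this thread can also be combined: keep the explicit list of codes for validation, and borrow the positional hint that the state sits just before the ZIP. The sketch below assumes the ZIP is 5 digits with an optional +4 suffix and is separated from the state by whitespace. `_GetState` is a hypothetical helper name, not something from the thread or from AutoIt itself.

```autoit
; Hypothetical helper (not from the thread): returns the two-letter code that
; sits directly before the ZIP, or sets @error = 1 when no valid code is there.
Func _GetState($sAddress)
    ; Same list of codes used throughout the thread.
    Local Const $sStates = "A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]"
    ; Whole-word code, whitespace, then a 5-digit ZIP with optional -NNNN (assumed format).
    Local $sPattern = "\b(" & $sStates & ")\s+\d{5}(?:-\d{4})?\b"
    Local $aMatch = StringRegExp($sAddress, $sPattern, 1) ; 1 = captured groups of first match
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

ConsoleWrite(_GetState("555 HOUSTON DRIVE, HEARTLAND TX 75126") & @CRLF) ; expected: TX
ConsoleWrite(_GetState("789 WOODLAND TRL, LADY LAKE, MA 32159") & @CRLF) ; expected: MA

; A mistyped state should not be returned; the helper sets @error instead.
_GetState("123 HOUSTON DRIVE, HEARTLAND, XX 75126")
ConsoleWrite("@error = " & @error & @CRLF)
```

Tying the code to the ZIP means a two-letter word earlier in the address is ignored even when it is a valid code, and the comma before the state is no longer required. The cost is that an address without a ZIP returns @error.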
