Crayfish

Parse US Postal State from a string. (Solved)

14 posts in this topic

#1 ·  Posted (edited)

#include <Array.au3>; Just for _ArrayDisplay
$s_states = "(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126" & @CRLF
;~ $s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126" & @CRLF
;~ $s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

; Flag 3: return an array of all global matches
$ST = StringRegExp($s_test_str, $s_states, 3)

_ArrayDisplay($ST)

I just want to return the state without pulling random characters from the address.

What's wrong with my logic?
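For anyone else hitting this: the alternation has no anchors or boundaries, so the engine reports every two-letter run that spells an abbreviation, even inside words such as DRIVE and HEARTLAND. Below is a minimal sketch in Python's `re`. It is used here only because it runs outside AutoIt; for these plain-ASCII patterns it should behave like AutoIt's PCRE engine. The sketch shows the hits on the first test string:

```python
import re

# The thread's alternation of two-letter abbreviations, with no boundaries.
STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

address = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"

# findall scans left to right and reports every non-overlapping two-letter run
# that is in the list, including runs inside longer words.
hits = re.findall(STATES, address)
print(hits)  # ['RI', 'AR', 'LA', 'ND', 'OH']
```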

** Solved with regex word boundary **

Edited by Crayfish

$s_states = "\s(A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])\s"

I suppose I could require whitespace (\s) in front and behind.
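Here is a sketch of the difference between whitespace and word boundaries, again in Python's `re` purely for testing. `\s` consumes a real character, so a state at the very end of the string (no ZIP after it) is missed. `\b` is zero-width and also succeeds at the start or end of the string. The ZIP-less address is a made-up test case.

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

# Whitespace on both sides: the spaces are consumed as part of each match.
space_delimited = re.compile(r"\s(" + STATES + r")\s")
# \b is zero-width, so it also succeeds at the start or end of the string.
word_bounded = re.compile(r"\b(" + STATES + r")\b")

with_zip = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
no_zip = "123 HOUSTON DRIVE, HEARTLAND, OH"  # hypothetical: state at the very end

spaced_with_zip = space_delimited.findall(with_zip)  # ['OH']
spaced_no_zip = space_delimited.findall(no_zip)      # [] because nothing follows OH
bounded_no_zip = word_bounded.findall(no_zip)        # ['OH']
print(spaced_with_zip, spaced_no_zip, bounded_no_zip)
```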

You could split it by the commas and only search for the 3rd element.

#include <Array.au3>; Just for _ArrayDisplay
$s_states = "(?-i:A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY])"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
; Flag 2: zero-based array with no element count, so [2] is the third field
$s_split = StringSplit($s_test_str, ",", 2)
$ST = StringRegExp($s_split[2], $s_states, 3)

_ArrayDisplay($ST)
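One caveat applies when an address has only one comma: there is no third field, so it is worth checking the field count before indexing. This is sketched in Python below. The helper name `state_from_third_field` is made up for illustration.

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

def state_from_third_field(address):
    """Hypothetical helper: return the abbreviations found in the third
    comma-separated field, or None when there are fewer than three fields."""
    fields = address.split(",")
    if len(fields) < 3:
        # In the AutoIt version, $s_split[2] would be out of range here.
        return None
    return re.findall(STATES, fields[2])

two_commas = state_from_third_field("123 HOUSTON DRIVE, HEARTLAND, OH 75126")
one_comma = state_from_third_field("555 HOUSTON DRIVE, HEARTLAND TX 75126")
print(two_commas, one_comma)  # ['OH'] None
```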

RAID Calculator | Software Installer

The truth has been suppressed since the dawn of time.


Crayfish,

This works for me: :)

#include <Array.au3>

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126" & @CRLF
$s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126" & @CRLF
$s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

$aState = StringRegExp($s_test_str, "(?i).*,\s(\w{2})\s.*", 3)

_ArrayDisplay($aState)

Any use? :)

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars --------- Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area


#5 ·  Posted (edited)

I have a working but very long-winded version; it should be possible to shorten it. You were on the money when you suggested adding spaces, but I think it's better to use \b, which marks a word boundary. If you are always certain of the exact character padding around the letters, then I would go with Melba's code. I'm not entirely happy with what I produced, although it seems to work fine. I tried a few other ways and kept getting duplicated results. :idiot:

#include <Array.au3>; Just for _ArrayDisplay
; \b around each alternative: a state only matches as a standalone two-letter word
$s_states = "(?-i)(\bA[LKSZRAEP]\b|\bC[AOT]\b|\bD[EC]\b|\bF[LM]\b|\bG[AU]\b|\bHI\b|\bI[ADLN]\b|\bK[SY]\b|\bLA\b|\bM[ADEHINOPST]\b|\bN[CDEHJMVY]\b|\bO[HKR]\b|\bP[ARW]\b|\bRI\b|\bS[CD]\b|\bT[NX]\b|\bUT\b|\bV[AIT]\b|\bW[AIVY]\b)"

$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126" & @CRLF
;~ $s_test_str &= "555 HOUSTON DRIVE, HEARTLAND, TX 75126" & @CRLF
;~ $s_test_str &= "789 WOODLAND TRL, LADY LAKE, MA 32159"

$ST = StringRegExp($s_test_str, $s_states, 3)

_ArrayDisplay($ST)

Perhaps you can figure out a way to simplify it.
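Here is a quick check of this approach, sketched in Python's `re` rather than AutoIt. The pattern is rebuilt from the thread's alternation by wrapping each alternative in `\b`, which reproduces the long form above. The fourth line is Crayfish's comma-less example from later in the thread.

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

# Long form: every alternative wrapped in its own \b ... \b, inside one group.
per_alternative = "(" + "|".join(r"\b" + alt + r"\b" for alt in STATES.split("|")) + ")"

addresses = [
    "123 HOUSTON DRIVE, HEARTLAND, OH 75126",
    "555 HOUSTON DRIVE, HEARTLAND, TX 75126",
    "789 WOODLAND TRL, LADY LAKE, MA 32159",
    "555 HOUSTON DRIVE, HEARTLAND TX 75126",  # comma-less variant
]
# LADY and LAKE start with "LA", but no word boundary follows, so they are skipped.
results = [re.findall(per_alternative, line) for line in addresses]
print(results)  # [['OH'], ['TX'], ['MA'], ['TX']]
```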

Edited by czardas


Melba:

I thought of that before, but then I lose validation of the actual states. Your logic pulls any 2 characters between spaces after a comma.
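The following sketch illustrates the validation gap. It translates Melba's pattern to Python's `re`, assuming its `.`, `\s` and `\w` behave as in AutoIt's PCRE for plain ASCII. The second line uses "ZZ" as a made-up mistyped state, and it is returned just like a real one.

```python
import re

# Melba's pattern: two word characters between whitespace, after a comma.
# The greedy ".*" picks the last such run on a line, and "." does not cross
# line breaks, so each line yields one match.
any_two = re.compile(r"(?i).*,\s(\w{2})\s.*")

text = "\n".join([
    "123 HOUSTON DRIVE, HEARTLAND, OH 75126",
    "555 HOUSTON DRIVE, HEARTLAND, ZZ 75126",  # hypothetical typo: ZZ is not a state
])
found = any_two.findall(text)
print(found)  # ['OH', 'ZZ']
```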

abberration:

Your logic is great: split into 3 parts and check only the last, so no more random letters mis-grabbed as states. But it only works for the 3 perfect examples I included. I can't depend on the comma too much, because sometimes it isn't included.

555 HOUSTON DRIVE, HEARTLAND TX 75126 (This will pull LA and TX).

Thank you for your time. Any other approach?


#7 ·  Posted (edited)

Well, you could do it old school, assuming all zip codes are 5 characters.

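$s_test_str = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
; Flag 8: strip all whitespace, giving "...,OH75126"
$s_no_white_space = StringStripWS($s_test_str, 8)
; Keep the last 7 characters (state + 5-digit ZIP), then drop the ZIP
$trim = StringRight($s_no_white_space, 7)
$trim = StringTrimRight($trim, 5)
MsgBox(0, "", $trim)

Edited by abberration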


Or, if you have the option to read it in as an array (or StringSplit it on the CRLFs):

Local $s_test_str[3]

$s_test_str[0] = "123 HOUSTON DRIVE, HEARTLAND, OH 75126"
$s_test_str[1] = "555 HOUSTON DRIVE, HEARTLAND, TX 75126"
$s_test_str[2] = "789 WOODLAND TRL, LADY LAKE, MA 32159"

; Each line ends in "XX 12345", so the state starts 8 characters from the end
For $i = 0 To UBound($s_test_str) - 1
    $ST = StringLeft(StringRight($s_test_str[$i], 8), 2)
    MsgBox(0, '', $ST)
Next
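Fixed positions do not have to mean giving up validation. The minimal Python sketch below uses a hypothetical helper, `state_from_fixed_position`. It assumes every line ends in "XX 12345", as in the samples above. It slices out the two characters, checks them against the thread's list of abbreviations, and returns None for anything else.

```python
import re

# The thread's list, used here to validate a whole two-character candidate.
STATE_RE = re.compile(
    "A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
    "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

def state_from_fixed_position(line):
    """Hypothetical helper: assumes the line ends in 'XX 12345'. Returns the two
    characters before the ZIP if they are a listed abbreviation, else None."""
    candidate = line[-8:-6]  # same slice as StringLeft(StringRight($s, 8), 2)
    return candidate if STATE_RE.fullmatch(candidate) else None

ok = state_from_fixed_position("789 WOODLAND TRL, LADY LAKE, MA 32159")
typo = state_from_fixed_position("789 WOODLAND TRL, LADY LAKE, MZ 32159")
print(ok, typo)  # MA None
```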


#9 ·  Posted (edited)

Thanks everyone for your contribution.

I still want to keep the validation of the specific 50 states.

Trimming and space count defeat that purpose.

boththose | abberration:

These lose validation of the states. I'm interested in pulling the state, but I still want it to pull only actual states. If someone mistypes the state, I don't want it pulled; I'd rather it return @error.

czardas:

Matching at a word boundary. I think this is what I need, but I don't think there's any shortcut or way to slim this down.

(\bA[LKSZRAEP]\b|\bC[AOT]\b|\bD[EC]\b|\bF[LM]\b|\bG[AU]\b|\bHI\b|\bI[ADLN]\b|\bK[SY]\b|\bLA\b|\bM[ADEHINOPST]\b|\bN[CDEHJMVY]\b|\bO[HKR]\b|\bP[ARW]\b|\bRI\b|\bS[CD]\b|\bT[NX]\b|\bUT\b|\bV[AIT]\b|\bW[AIVY]\b)

Edited by Crayfish


#10 ·  Posted (edited)

Crayfish wrote: "I don't think there's any shortcut or way to slim this down."

That would surprise me, but I failed to figure it out. I don't think it matters too much, though: you save a few bytes, that's all. It should already be fast enough that a rewrite would not make any significant difference. I would still like to know why I struggled with it, though. It seems as if it should be easy.
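A possible shortcut is to move the `\b` outside the group, so the pattern becomes `\b(A[LKSZRAEP]|...|W[AIVY])\b`. Because every alternative is exactly two letters, this should match the same text as the long form. An extra capturing group is one plausible source of the duplicated results. This assumes StringRegExp with flag 3 returns the text of every capturing group, so nesting one group inside another would yield each state twice. The sketch below uses Python's `re`, where `findall` shows the same effect as tuples:

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

# Long form: \b on every alternative.
long_form = "(" + "|".join(r"\b" + alt + r"\b" for alt in STATES.split("|")) + ")"
# Shorter: boundaries pulled outside, a single capturing group.
short_form = r"\b(" + STATES + r")\b"
# Two capturing groups: each match is reported once per group.
nested_groups = r"(\b(" + STATES + r")\b)"

text = "123 HOUSTON DRIVE, HEARTLAND, OH 75126\n555 HOUSTON DRIVE, HEARTLAND TX 75126"
long_hits = re.findall(long_form, text)        # ['OH', 'TX']
short_hits = re.findall(short_form, text)      # ['OH', 'TX']
nested_hits = re.findall(nested_groups, text)  # [('OH', 'OH'), ('TX', 'TX')]
print(long_hits, short_hits, nested_hits)
```

Using a non-capturing group, `\b(?:...)\b`, would also avoid the duplicates if an outer group is needed for other reasons.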

Edited by czardas


On a side note, since you want to check for accuracy of the state, check out this site:

http://www.carrierroutes.com/ZIPCodes.html

Check out the "three digit zip code" section. The first digit of the zip code could help determine if the state matches correctly. Might be something to think about.


abberration:

That is very interesting, but it seems to branch out too much after the 2nd digit. I tried to look at the ZIP map (they want to charge me for the map ^^;).

I think I'll just stick to the good old 50 states, because with 3-digit ZIP prefixes there are 1,000 variants. http://en.wikipedia.org/wiki/List_of_ZIP_code_prefixes


#13 ·  Posted (edited)

"^.*\s\b([ACDFGHIKLMNOPRSTUVW][ACDEHIJKLMNORSTUVYZ])\b\s*.?\d+.*$"

I think I have them all in there, except for places like Guam, Marshall Islands, etc.

You can remove the ^ and $ from the ends if there is more to the string than what you showed.

Edit:

I based it on this list:

https://www.usps.com/send/official-abbreviations.htm

Another Edit:

I also tried it with good results using this shorter version:

"^.*\s\b([ACDFGHIKLMNOPRSTUVW][A-Z])\b\s*.?\d+.*$"
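The sketch below illustrates the trade-off Crayfish raises below. It runs both of GEOSoft's patterns in Python's `re`, with backslashes restored and the duplicate Z removed, next to the word-bounded list. "AC" is a made-up mistyped state. Both positional patterns return it, while the validated list returns nothing.

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

# GEOSoft's patterns: two letters as a word, followed by optional space and digits.
geo_long = r"^.*\s\b([ACDFGHIKLMNOPRSTUVW][ACDEHIJKLMNORSTUVYZ])\b\s*.?\d+.*$"
geo_short = r"^.*\s\b([ACDFGHIKLMNOPRSTUVW][A-Z])\b\s*.?\d+.*$"
# Word-bounded list of real abbreviations, for comparison.
bounded = r"\b(" + STATES + r")\b"

typo = "123 HOUSTON DRIVE, HEARTLAND, AC 75126"  # hypothetical: AC is not a USPS code
results = {name: re.findall(pattern, typo) for name, pattern in
           [("geo_long", geo_long), ("geo_short", geo_short), ("bounded", bounded)]}
print(results)  # {'geo_long': ['AC'], 'geo_short': ['AC'], 'bounded': []}
```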
Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules (AKA the AutoIt Reading and Comprehension Skills test).

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog... currently not active, but it will soon be resplendent with news and views. Also, please remove any links you may have to my website. It is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


GEOSoft:

That is an exceptional expression for pulling states, but it assumes the state is correct. I suppose there's no one-size-fits-all solution here (e.g. this will grab AA, AC, AD, AF, etc., which are not states).

I really like the shortened version, but I still think czardas' idea is the most promising, because it gives a simple 50-state guard without going to the extreme of writing a new validation function.
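For completeness, here is one way to get both properties at once, offered as a sketch rather than a tested solution. It keeps czardas's validated list and adds a lookahead that accepts a match only when whitespace and a 5-digit ZIP (optionally ZIP+4) follow. This is GEOSoft's "letters before digits" idea expressed differently. The sketch uses Python's `re`; the helper `find_state` and the sample addresses are hypothetical.

```python
import re

STATES = ("A[LKSZRAEP]|C[AOT]|D[EC]|F[LM]|G[AU]|HI|I[ADLN]|K[SY]|LA|"
          "M[ADEHINOPST]|N[CDEHJMVY]|O[HKR]|P[ARW]|RI|S[CD]|T[NX]|UT|V[AIT]|W[AIVY]")

# Validated abbreviation, accepted only if a ZIP (or ZIP+4) follows it.
STATE_BEFORE_ZIP = re.compile(r"\b(" + STATES + r")\b(?=\s+\d{5}(?:-\d{4})?\b)")

def find_state(line):
    """Hypothetical helper: first listed abbreviation that is followed by
    whitespace and a ZIP; None if there is none."""
    match = STATE_BEFORE_ZIP.search(line)
    return match.group(1) if match else None

street = "12 OAK CT, DOVER, DE 19901"  # CT is a street suffix here
plain = re.findall(r"\b(" + STATES + r")\b", street)  # ['CT', 'DE']
print(plain, find_state(street))  # ['CT', 'DE'] DE
print(find_state("123 HOUSTON DRIVE, HEARTLAND, OH 75126-1234"))  # OH
print(find_state("123 HOUSTON DRIVE, HEARTLAND, ZZ 75126"))       # None
```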
