Champak

Help searching and editing a string

8 posts in this topic

#1 ·  Posted (edited)

I have the following example address strings that I need to search through and edit:

482 Albany Shaker Rd Osborne Rd
Albany, NY 12211

875 New Scotland Ave opp Whitehall Rd
Albany, NY 12208

64 Colvin Ave Central Ave
Albany, NY 12206

62 Exchange St
Albany, NY 12205

477 Delaware Ave Near Whitehall Rd
Albany, NY 12209

351 Southern Blvd
Albany, NY 12209

477 Delaware Ave Whitehall Rd
Albany, NY 12209

553 Washington Ave Ontario St
Albany, NY 12206

591 Broadway (NY-32) opp Fed Ex Plaza, near Village One Apts
Albany, NY 12204

442 Madison Ave
Albany, NY 12208

484 Loudon Rd (US-9) near Turner Ln, E of Siena
Albany, NY 12211

821 New Scotland Ave near Crescent Dr
Albany, NY 12208

 

What I'm trying to do is remove everything on the first line that follows the FIRST road designation. The problem, as you can see, is that the first line, which contains the street address, sometimes also contains a cross street. I can't simply loop over an array of designations and stop at the first one found, because depending on the order in which I load the designations into the array, the cross street's designation might be triggered instead. Take the last address, for example: if "Dr" comes before "Ave" in the array I'm searching through, the loop finds "Crescent Dr" and doesn't fix my problem, but if "Ave" comes first, I can delete everything after "Ave". I also can't look for a fixed position such as the fourth word, because the street address can contain anywhere from 2 to 4 words. See my dilemma? Can I get some help with this? I'm stuck. Thanks.

I have no example, because I don't know where to begin.
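A non-regex way around the ordering problem described above, sketched in AutoIt under some assumptions: instead of looping over the list of designations, loop over the words of the address line from left to right and stop at the first word that is a designation. The function name `_TrimAfterFirstType` and the short designation list are hypothetical (extend the list as needed), and the sketch assumes words are separated by spaces and that the first word is the house number.

```autoit
; Usage (expected results in comments)
ConsoleWrite(_TrimAfterFirstType("821 New Scotland Ave near Crescent Dr") & @LF) ; 821 New Scotland Ave
ConsoleWrite(_TrimAfterFirstType("553 Terrance Ter, Ontario St") & @LF)          ; 553 Terrance Ter
ConsoleWrite(_TrimAfterFirstType("442 Madison Ave") & @LF)                       ; 442 Madison Ave

; Hypothetical helper: keep an address line up to and including its FIRST street
; designation. The loop runs over the words of the address, left to right, so the
; order of the designation list does not matter.
Func _TrimAfterFirstType($sLine)
    ; Assumed designation list, "|"-delimited so that only whole words match.
    Local Const $sTypes = "|Rd|St|Ave|Dr|Blvd|Ln|Ter|"
    Local $aWords = StringSplit($sLine, " ") ; $aWords[0] holds the word count
    Local $sOut = "", $sWord = ""
    For $i = 1 To $aWords[0]
        If $i > 1 Then $sOut &= " "
        $sOut &= $aWords[$i]
        ; Compare without a trailing comma/period, e.g. "Ter," -> "Ter".
        $sWord = StringRegExpReplace($aWords[$i], "[,.]+$", "")
        ; Word 1 is the house number; StringInStr is case-insensitive by default.
        If $i > 1 And StringInStr($sTypes, "|" & $sWord & "|") Then
            Return StringRegExpReplace($sOut, "[,.]+$", "")
        EndIf
    Next
    Return $sLine ; no designation found: leave the line unchanged
EndFunc
```

Because whole words are compared, a designation hidden inside a name ("21st", "Western") cannot trigger a false match. Route forms such as "(NY-32)" are not handled by this sketch.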

Edited by Champak




#3 ·  Posted (edited)

Try this.

Local $sTestString = StringRegExpReplace(FileRead(@ScriptFullPath), "(?is)^.+#cs\v*(.+)#ce.*$", "\1") ; Extract test data from this script.
;ConsoleWrite($sTestString & @LF)

ConsoleWrite(StringRegExpReplace($sTestString, "(?i)(\d+.+?\b(Rd|St|Ave|\(?[A-Z]+-\d+\)?|Dr|Blvd|Ln|Aly|Cres|Ct|Ter))[\s,.]\V*", "\1") & @LF)


#cs
482 Albany Shaker Rd Osborne Rd
Albany, NY 12211



875 New Scotland Ave opp Whitehall Rd
Albany, NY 12208



64 Colvin Ave Central Ave
Albany, NY 12206



62 Exchange St
Albany, NY 12205



477 Saly Aly Near Whitehall Rd
Albany, NY 12209



351 Southern Blvd
Albany, NY 12209



477 Delaware Ave Whitehall Rd
Albany, NY 12209



553 Terrance Ter, Ontario St
Albany, NY 12206



591 Broadway (NY-32) opp Fed Ex Plaza, near Village One Apts
Albany, NY 12204



442 Madison Ave
Albany, NY 12208



484 Loudon Rd (US-9) near Turner Ln, E of Siena
Albany, NY 12211



821 New Scotland Ave near Crescent Dr
Albany, NY 12208

#ce

Edit: Added "|Blvd"

Edit2: The RE pattern was "(?i)(\d+.+?(Rd|St|Ave|\(?[A-Z]+-\d+\)?|Dr|Blvd))\V*".
Just in case a street type exists as a sub-string within a street name, I added "\b" before the street type group and "[\s,.]" after it. Now, a street type that appears inside a street name will not be matched. However, if there were a terrace actually called Ter, "553 Ter Ter" would be reduced to "553 Ter" by mistake. "477 Saly Aly" and "553 Terrance Ter" are handled correctly.
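To see what the "\b" and "[\s,.]" additions do, the Edit2 pattern can be run on single lines, including one where "St" sits inside a name ("Western"). A minimal check; the results in the comments assume the pattern exactly as posted above.

```autoit
; The Edit2 pattern from the post above.
Local $sPattern = "(?i)(\d+.+?\b(Rd|St|Ave|\(?[A-Z]+-\d+\)?|Dr|Blvd|Ln|Aly|Cres|Ct|Ter))[\s,.]\V*"
Local $aLines[4] = ["477 Saly Aly Near Whitehall Rd", "553 Terrance Ter, Ontario St", "1221 Western Ave @Homestead St", "553 Ter Ter"]

For $i = 0 To UBound($aLines) - 1
    ConsoleWrite(StringRegExpReplace($aLines[$i], $sPattern, "\1") & @LF)
Next
; Output:
; 477 Saly Aly       ("aly" inside "Saly" has no word boundary before it)
; 553 Terrance Ter   ("Ter" at the start of "Terrance" is not followed by [\s,.])
; 1221 Western Ave   ("st" inside "Western" has no word boundary before it)
; 553 Ter            (the known caveat: a street literally named "Ter")
```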

Edited by Malkey


#4 ·  Posted (edited)

THANKS! Almost perfect. A few issues remain:

1221 Western Ave @Homestead St
Albany, NY 12203
 
produces:
 
1221 West
Albany, NY 12203
 
instead of:
 
1221 Western Ave
Albany, NY 12203

And

116 Broadway (RT-32) @Wards Ln
Albany, NY 12204
 
produces:
 
116 Broadway (RT-32)
Albany, NY 12204
 
instead of:
 
116 Broadway
Albany, NY 12204

And

195 21st Ave Madison St
Paterson, NJ 07501
 
produces:
 
195 21st
Paterson, NJ 07501
 
instead of:
 
195 21st Ave
Paterson, NJ 07501

Basically, it seems that if a street designation appears inside the street name, as in "1ST", "WeSTchester", "EdwaRD", "TAVErn" or "DRum", the function cuts off everything after that point instead of after the actual street designation.

As for the second example, could you show me separately how to remove "(RT-32)" or a variation of it? In places like NJ, "RT-??" is the actual street name of the address, so I'm not 100% sure I want to remove it yet. Thanks.

Edited by Champak


I fixed the first issue by adding a space before the designations like this:

StringRegExpReplace($UNIVERSALVAR2, "(?i)(\d+.+?( Rd| St| Ave|\(?[A-Z]+-\d+\)?| Dr| Blvd))\V*", "\1")

Let me know if that's not the best way to do this.

 

Also, I'm trying to retrieve a specific variable string from a large paragraph. The string is a number of 1 to 3 digits, and the only constant is that it always follows "Summary" @CRLF @CRLF "There are ". How can I retrieve this number? I have a feeling it will involve StringRegExp, but as long as I've been doing this, StringRegExp and DllCall are two things I just can't seem to get. Thanks.
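A minimal StringRegExp sketch for pulling out that number, assuming the text really is "Summary", two @CRLF, then "There are " followed by the digits. The sample paragraph is made up; in practice $sText would hold the real text. "\s+" is used so that any mix of line breaks and spaces between "Summary" and "There are" is accepted.

```autoit
; Hypothetical sample; in practice $sText would be the large paragraph.
Local $sText = "Report header" & @CRLF & "Summary" & @CRLF & @CRLF & "There are 42 matching records."

; Flag 1 returns an array holding the capture groups of the first match.
; (\d{1,3})\b captures 1 to 3 digits that are not followed by another digit.
Local $aMatch = StringRegExp($sText, "Summary\s+There are (\d{1,3})\b", 1)
If @error Then
    ConsoleWrite("Number not found" & @LF) ; @error = 1: no match
Else
    ConsoleWrite($aMatch[0] & @LF) ; 42
EndIf
```

The captured value is a string; wrap it in Number() if it is needed for arithmetic.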


When I run the Edit2 version of the example in my post #3, examples 1 and 3 from your post #4 already produce the required "instead of" results.

To match "(NY-32)", I have used "\(?[A-Z]+-\d+\)?".
In the Edit2 version of the example in my post #3, the matched "(NY-32)" is included in the output.
In the example in this post, the matched "(NY-32)" is not included in the output.

Local $sTestString = StringRegExpReplace(FileRead(@ScriptFullPath), "(?is)^.+#cs\v*(.+)#ce.*$", "\1") ; Extract test data from this script.
;ConsoleWrite($sTestString & @LF)

ConsoleWrite(StringRegExpReplace($sTestString, "(?i)(\d+.+?)(\b(Rd|St|Ave|Dr|Blvd|Ln|Aly|Cres|Ct|Ter)|(?:\(?[A-Z]+-\d+\)?))[\s,.]\V*", "\1\3") & @LF); |\(?[A-Z]+-\d+\)?


#cs
591 Broadway (NY-32) opp Fed Ex Plaza, near Village One Apts
Albany, NY 12204


195 21st Ave Madison St
Paterson, NJ 07501


1221 Western Ave @Homestead St
Albany, NY 12203


482 Albany Shaker Rd Osborne Rd
Albany, NY 12211


875 New Scotland Ave opp Whitehall Rd
Albany, NY 12208



64 Colvin Ave Central Ave
Albany, NY 12206



62 Exchange St
Albany, NY 12205



477 Saly Aly Near Whitehall Rd
Albany, NY 12209



351 Southern Blvd
Albany, NY 12209



477 Delaware Ave Whitehall Rd
Albany, NY 12209



553 Terrance Ter, Ontario St
Albany, NY 12206



442 Madison Ave
Albany, NY 12208



484 Loudon Rd (US-9) near Turner Ln, E of Siena
Albany, NY 12211



821 New Scotland Ave near Crescent Dr
Albany, NY 12208

#ce
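For the request in post #4 to remove the route as a separate step, a minimal sketch: strip a parenthesised route such as "(RT-32)" on its own, so the step can be skipped where "RT-??" is the actual street name (e.g. NJ addresses). The pattern assumes the route is always written in parentheses as uppercase letters, a hyphen, and digits.

```autoit
; Remove a parenthesised route such as "(RT-32)", "(NY-32)" or "(US-9)",
; together with the whitespace in front of it.
Local $sLine = "116 Broadway (RT-32) @Wards Ln"
Local $sNoRoute = StringRegExpReplace($sLine, "\s*\([A-Z]+-\d+\)", "")
ConsoleWrite($sNoRoute & @LF) ; 116 Broadway @Wards Ln
```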


Thanks


So an issue has popped up with this. If there is nothing to remove after the street designation, and the designation isn't followed by a period, the line feed isn't added. The line feed is only added when the period is there or when the string is actually edited. Does a separate StringRegExpReplace need to be added afterwards to take care of this, or can it be built into the existing one?

StringRegExpReplace($UNIVERSALVAR2, "(?i)(\d+.+?)(\b(Rd|St|Ave|Dr|Blvd|Boulevard|Ln|Lane|Pkwy|Way|Ally|Cres|Ct|Ter|Concourse|Hwy|Plaza)|(?:\(?[A-Z]+-\d+\)?))[\s,.]\V*", "\1\3" & @LF)
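One possible single-call fix, sketched under the assumption that $UNIVERSALVAR2 holds one address line at a time. "[\s,.]" after the designation requires one more character, so a line that ends right at "Blvd" never matches, and nothing (including the @LF in the replacement) is substituted. Replacing it with the lookahead "(?=[\s,.]|$)" also accepts the end of the string. A simpler alternative is to drop @LF from the replacement and append it after the call, so it is added whether or not the pattern matched.

```autoit
; Same pattern as above, but "[\s,.]" is replaced by the lookahead "(?=[\s,.]|$)",
; which also succeeds when the designation is the last thing in the string.
Local $sPattern = "(?i)(\d+.+?)(\b(Rd|St|Ave|Dr|Blvd|Boulevard|Ln|Lane|Pkwy|Way|Ally|Cres|Ct|Ter|Concourse|Hwy|Plaza)|(?:\(?[A-Z]+-\d+\)?))(?=[\s,.]|$)\V*"

Local $aLines[3] = ["351 Southern Blvd", "553 Terrance Ter, Ontario St", "477 Delaware Ave."]
For $i = 0 To UBound($aLines) - 1
    ; Every result now ends with @LF, whether or not anything was removed.
    ConsoleWrite(StringRegExpReplace($aLines[$i], $sPattern, "\1\3" & @LF))
Next
; Output:
; 351 Southern Blvd
; 553 Terrance Ter
; 477 Delaware Ave
```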

