tobject Posted July 28, 2010 (edited)

I have a bunch of text lines with addresses in them, and I want to parse the addresses out somehow. I was thinking of pushing the whole string to Google Maps, but it does not understand it! Yikes! Any other ideas?

held October 4, 2010 To the Shareholders of Universal Security Instruments, Inc.: The Annual Meeting of Shareholders of Universal Security Instruments, Inc., a Maryland corporation (the “Company”) will be held at the Hilton Pikesville, 1726 Reisterstown Road, Pikesville, Maryland, on Monday, October 4, 2010 at 8:30 a.m., local time, for the following purposes: 1.To elect two dir

held on Wednesday, September 8, 2010 at 2:30 p.m. at the Palais De Beaulieu, Rome Room, in Lausanne, Switzerland. Enclosed is the Invitation and Proxy Statement for the meeting, which includes an agenda and discussion of the items to be voted on at the meeting, information on how you can exercise your voting rights, information concerning Logitech’s compensation of its Board members and e

HELD ON MONDAY, SEPTEMBER 13, 2010 NOTICE IS HEREBY GIVEN that the Annual Meeting of Stockholders of OPNET Technologies, Inc. will be held at our principal executive offices, 7255 Woodmont Avenue, Bethesda, Maryland 20814, on Monday, September 13, 2010 at 10:00 a.m., local time (the “Annual Meeting”), for the purpose of considering and voting upon the following matters: 1.To elect one Clas

held on Monday, September 13, 2010 To the Shareholders of ePlus inc.: The Annual Meeting of Shareholders of ePlus inc., a Delaware corporation, will be held on September 13, 2010, at the Hyatt Regency, 1800 Presidents Street, Reston, Virginia, 20190 at 8:00 a.m. local time for the purposes stated below: 1.To elect directors named in the attached proxy statement, each to se

held at the offices of the Company, 470 East Paces Ferry Road, N.E., Atlanta, Georgia, on Monday, August 16, 2010 at 4:00 p.m. for the following purposes: 1.To elect seven directors of the Company, three of whom will be elected by the holders of Class A Common Shares and four of whom will be elected by the holders of Class B Common Shares. 2.To approve the adoption of the Company’s 2

Edited July 28, 2010 by tobject
czardas Posted July 29, 2010 (edited)

Gee, this is a toughie. The only thing I can think of is starting at the end and working your way back to the start. Armed with a database of countries, cities, towns and/or zip codes, try to match each word until you find what might be the last line of an address. Then try to identify the rest of the address, using the commas as markers for each line. I'm not sure exactly how you would do this, but there are some typical markers such as street (St), road (Rd), avenue (Ave), boulevard, place etc. I don't think this is at all easy, but you may be able to combine this approach with your original idea. Interesting project.

One more comment: look how many times the word 'at' appears in the examples you gave:

at the Hilton Pikesville, 1726 Reisterstown Road, Pikesville, Maryland, on Monday
at the Palais De Beaulieu, Rome Room, in Lausanne, Switzerland.
at our principal executive offices, 7255 Woodmont Avenue, Bethesda, Maryland 20814, on Monday, September 13, 2010 at 10:00 a.m., local time
at the Hyatt Regency, 1800 Presidents Street, Reston, Virginia, 20190 at 8:00 a.m.
at the offices of the Company, 470 East Paces Ferry Road, N.E., Atlanta, Georgia, on Monday, August 16, 2010 at 4:00 p.m.

Edited July 29, 2010 by czardas

operator64 ArrayWorkshop
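The "start at the end and work back" idea can be sketched roughly as follows. This is only an illustration under stated assumptions: `$STATES` is a deliberately tiny sample list standing in for the database of place names, `_FindAddress` is a hypothetical helper name, and the limit of four segments for the walk back is an arbitrary choice.

```autoit
; Sketch only: $STATES is a tiny sample list and _FindAddress is a hypothetical helper.
Global Const $STATES = "|maryland|virginia|georgia|"

Func _FindAddress($sText)
    Local $aParts = StringSplit($sText, ",") ; $aParts[0] is the segment count
    Local $sName, $sAddr, $iStop
    For $i = $aParts[0] To 1 Step -1
        ; Drop a ZIP code, if any, then see whether the segment is a known state.
        $sName = StringRegExpReplace($aParts[$i], "\d{5}(?:-\d{4})?", "")
        $sName = StringLower(StringStripWS($sName, 3))
        If $sName = "" Or Not StringInStr($STATES, "|" & $sName & "|") Then ContinueLoop
        ; Walk back at most four segments to one that starts with a house number.
        $iStop = $i - 4
        If $iStop < 1 Then $iStop = 1
        For $j = $i - 1 To $iStop Step -1
            If StringRegExp($aParts[$j], "^\s*\d+\s") Then
                $sAddr = StringStripWS($aParts[$j], 3)
                For $k = $j + 1 To $i
                    $sAddr &= ", " & StringStripWS($aParts[$k], 3)
                Next
                Return $sAddr
            EndIf
        Next
    Next
    Return SetError(1, 0, "") ; nothing that looks like an address
EndFunc

ConsoleWrite(_FindAddress("will be held at the Hilton Pikesville, 1726 Reisterstown Road, " & _
        "Pikesville, Maryland, on Monday, October 4, 2010") & @CRLF)
```

Splitting on commas keeps the logic short, but it also means an address written without commas, or one outside the sample list, is simply not found.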
Moderators SmOke_N Posted July 29, 2010 (edited)

I'm sure this would fail somewhere ... but you should get the gist of how to fix it if it does.

    #include <Array.au3> ; Just for _ArrayDisplay

    #region the important data
    ; Street part: words and spaces, optionally ending in a street type
    Global $s_sre_address = "[\w ]+\s*(?:Road|Drive|Avenue|Street)?"
    ; Optional compass direction after the street, e.g. ", N.E."
    Global $s_sre_direction = "(?:,\s*(?:N.|N.E.|N.W.|S.|S.E.|S.W.|W.|E.))?"
    ; City: letters and spaces after a comma
    Global $s_sre_city = ",\s*(?:[A-Z ]+)"
    ; State: one of the full state names after a comma
    Global $s_sre_states = ",\s*(?:"
    $s_sre_states &= "Alabama|Alaska|Arizona|Arkansas|California|"
    $s_sre_states &= "Colorado|Connecticut|Delaware|District of Columbia|Florida|"
    $s_sre_states &= "Georgia|Hawaii|Idaho|Illinois|Indiana|Iowa|Kansas|Kentucky|"
    $s_sre_states &= "Louisiana|Maine|Maryland|Massachusetts|Michigan|Minnesota|"
    $s_sre_states &= "Mississippi|Missouri|Montana|Nebraska|Nevada|New Hampshire|"
    $s_sre_states &= "New Jersey|New Mexico|New York|North Carolina|North Dakota|"
    $s_sre_states &= "Ohio|Oklahoma|Oregon|Pennsylvania|Rhode Island|South Carolina|"
    $s_sre_states &= "South Dakota|Tennessee|Texas|Utah|Vermont|Virginia|Washington|"
    $s_sre_states &= "West Virginia|Wisconsin|Wyoming)"
    ; Optional ZIP or ZIP+4 after a comma
    Global $s_sre_zip = "(?:,\s*\d{5}(?:\s*-\s*\d{4})?)?"
    ; Full pattern: case-insensitive, dot matches newlines, address captured in group 1
    Global $s_sre_pattern = "(?i)(?s),\s*(" & $s_sre_address & $s_sre_direction & $s_sre_city & $s_sre_states & $s_sre_zip & ")"
    #endregion the important data

    ; Test data: the sample text from the first post, joined into one string
    Global $s_test_str = ""
    $s_test_str &= "held October 4, 2010 To the Shareholders of Universal Security Instruments, Inc.: The Annual Meeting"
    $s_test_str &= " of Shareholders of Universal Security Instruments, Inc., a Maryland corporation (the “Company”) wil"
    $s_test_str &= "l be held at the Hilton Pikesville, 1726 Reisterstown Road, Pikesville, Maryland, on Monday, October"
    $s_test_str &= " 4, 2010 at 8:30 a.m., local time, for the following purposes: 1.To elect two dirheld on Wednesday, "
    $s_test_str &= "September 8, 2010 at 2:30 p.m. at the Palais De Beaulieu, Rome Room, in Lausanne, Switzerland. Enclo"
    $s_test_str &= "sed is the Invitation and Proxy Statement for the meeting, which includes an agenda and discussion o"
    $s_test_str &= "f the items to be voted on at the meeting, information on how you can exercise your voting rights, i"
    $s_test_str &= "nformation concerning Logitech’s compensation of its Board members and eHELD ON MONDAY, SEPTEMBER 13"
    $s_test_str &= ", 2010 NOTICE IS HEREBY GIVEN that the Annual Meeting of Stockholders of OPNET Technologies, Inc. wi"
    $s_test_str &= "ll be held at our principal executive offices, 7255 Woodmont Avenue, Bethesda, Maryland 20814, on Mo"
    $s_test_str &= "nday, September 13, 2010 at 10:00 a.m., local time (the “Annual Meeting”), for the purpose of consid"
    $s_test_str &= "ering and voting upon the following matters: 1.To elect one Clasheld on Monday, September 13, 2010 T"
    $s_test_str &= "o the Shareholders of ePlus inc.: The Annual Meeting of Shareholders of ePlus inc., a Delaware corpo"
    $s_test_str &= "ration, will be held on September 13, 2010, at the Hyatt Regency, 1800 Presidents Street, Reston, Vi"
    $s_test_str &= "rginia, 20190 at 8:00 a.m. local time for the purposes stated below: 1.To elect directors named in t"
    $s_test_str &= "he attached proxy statement, each to seheld at the offices of the Company, 470 East Paces Ferry Road"
    $s_test_str &= ", N.E., Atlanta, Georgia, on Monday, August 16, 2010 at 4:00 p.m. for the following purposes: 1.To e"
    $s_test_str &= "lect seven directors of the Company, three of whom will be elected by the holders of Class A Common "
    $s_test_str &= "Shares and four of whom will be elected by the holders of Class B Common Shares. 2.To approve the ad"
    $s_test_str &= "option of the Company’s 2 "

    ; Flag 3 returns an array of all global matches
    Global $a_sre = StringRegExp($s_test_str, $s_sre_pattern, 3)
    _ArrayDisplay($a_sre)

Edited July 29, 2010 by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
tobject Posted July 29, 2010 Author (edited)

Thanks, SmOke_N! Why does it miss some ZIP codes? Yikes, I wish I knew how to construct regular expressions. Is there a tool which does it for you?

Good thing I have the corporate address, so if the meeting is held there I can match strings from the address. But I also encounter problems: the address is "7400 49TH AVE NORTH, NEW HOPE MN 55428", and in the letter it is spelled "7400 49th Avenue North New Hope, Minnesota 55428".

2nd problem: not everything is in the USA, e.g. "Palais De Beaulieu, Rome Room, in Lausanne, Switzerland".

I'm looking for a quick solution. Maybe a web service: I pass a string and it gets me an address or, even better, a geolocation. Resume parsing, maybe? Is there a no-registration resume upload site which parses the address?

Also, if I can just find where the address starts and where it ends, then I can pass it to Google Maps to get the geolocation without parsing the address further.

Edited July 29, 2010 by tobject
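For the mismatch between "7400 49TH AVE NORTH, NEW HOPE MN 55428" and "7400 49th Avenue North New Hope, Minnesota 55428", one possible approach is to normalize both strings before comparing them. The sketch below assumes a hypothetical helper `_NormalizeAddress` and a replacement table that holds only a handful of sample entries; a real table would need every street type and state.

```autoit
; _NormalizeAddress is a hypothetical helper; the replacement table is a tiny sample.
Func _NormalizeAddress($sAddr)
    Local $aMap[4][2] = [["AVENUE", "AVE"], ["STREET", "ST"], ["ROAD", "RD"], ["MINNESOTA", "MN"]]
    Local $s = StringUpper($sAddr)
    $s = StringRegExpReplace($s, "[.,]", " ") ; treat punctuation as plain spacing
    For $i = 0 To UBound($aMap) - 1
        ; \b keeps the replacement to whole words only
        $s = StringRegExpReplace($s, "\b" & $aMap[$i][0] & "\b", $aMap[$i][1])
    Next
    Return StringStripWS($s, 7) ; trim both ends and collapse runs of spaces
EndFunc

Local $sBook = "7400 49TH AVE NORTH, NEW HOPE MN 55428"
Local $sLetter = "7400 49th Avenue North New Hope, Minnesota 55428"
; Compare the two spellings after normalization
ConsoleWrite((_NormalizeAddress($sBook) = _NormalizeAddress($sLetter)) & @CRLF)
```

Mapping long forms to short forms (rather than the reverse) avoids guessing whether "ST" means "Street" or "Saint".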
Moderators SmOke_N Posted July 29, 2010

String manipulation needs anchors: things that are constant, so that the extraction can be pulled off. More than likely you would have to build a very elaborate AI system to pull off what you're wanting, unless you had all the output address rules. It's obviously not as simple as an "I want, so give me" type of thing.

I gave you a base to work with; I'd suggest (if you know RegEx) working from that. If you don't know regex, then as far as websites that do this type of thing go, I'd imagine your time on "Google" would be just as efficient as mine.

The only other option, if you don't know all the rules, is to give someone a step-by-step account of how you get this data, give them access so they can pull the data out and examine it, and more than likely be willing to pay for the countless hours it would take them to work out all the string manipulation rules needed to accomplish what you want.
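As one example of working from that base: the posted `$s_sre_zip` fragment requires a comma before the five digits, so "Virginia, 20190" is picked up while "Maryland 20814" is not. A possible adjustment, shown here only as a sketch, is to make that comma optional. The variable names `$sZipOld` and `$sZipNew` and the shortened test pattern are assumptions for the demonstration, not part of the original script.

```autoit
; Assumption: the only change from the posted fragment is ",?" instead of "," before the digits.
Local $sZipOld = "(?:,\s*\d{5}(?:\s*-\s*\d{4})?)?"
Local $sZipNew = "(?:,?\s*\d{5}(?:\s*-\s*\d{4})?)?"
Local $sSample = "7255 Woodmont Avenue, Bethesda, Maryland 20814, on Monday"

; Use a cut-down pattern (state name plus ZIP fragment) to isolate the difference.
Local $aOld = StringRegExp($sSample, "(Maryland" & $sZipOld & ")", 3)
Local $aNew = StringRegExp($sSample, "(Maryland" & $sZipNew & ")", 3)
ConsoleWrite($aOld[0] & @CRLF) ; state only, the ZIP is not captured
ConsoleWrite($aNew[0] & @CRLF) ; state followed by the ZIP
```

Because the whole ZIP group stays optional, addresses with no ZIP at all still match as before.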
czardas Posted July 29, 2010

SmOke_N - that's a nice example of SRE to learn from. Indeed, this is something of an AI-type project. Each country will have different postal code formats. Some houses in England have names instead of numbers. In this case, it might be easier to concentrate on the language surrounding the address, such as:

The meeting will be held at ...
Meet at ...
The address is ...
The address is as follows ...
the following address ...
Write to ...
Reply to ...
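A rough sketch of anchoring on the surrounding language: capture whatever follows "held at" or "meet at" up to the point where the date or time begins. The pattern below is an assumption made for illustration; it covers only two of the lead-in phrases listed above and only two ways a sentence can continue (", on <weekday>" or " at <digit>").

```autoit
; Sketch: lead-in phrase, lazy capture, then a lookalike for the start of the date or time.
Local $sPattern = "(?i)\b(?:held|meet) at\s+(.*?)(?:,?\s+on\s+\w+day\b|\s+at\s+\d)"
Local $aSamples[2] = [ _
        "will be held at our principal executive offices, 7255 Woodmont Avenue, Bethesda, Maryland 20814, on Monday, September 13, 2010", _
        "held at the offices of the Company, 470 East Paces Ferry Road, N.E., Atlanta, Georgia, on Monday, August 16, 2010 at 4:00 p.m."]

For $i = 0 To UBound($aSamples) - 1
    ; Flag 1 returns the captured groups of the first match and sets @error when nothing matches
    Local $aHit = StringRegExp($aSamples[$i], $sPattern, 1)
    If Not @error Then ConsoleWrite($aHit[0] & @CRLF)
Next
```

The captured text still includes the venue description ("our principal executive offices"), and wording such as "held on September 13, 2010, at the Hyatt Regency" is not matched, so this narrows the search rather than solving it.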
tobject Posted July 29, 2010 Author (edited)

"You would have to build a very elaborate AI system to pull off what you're wanting more than likely unless you had all the output address rules."

We're like hackers. Someone somewhere has already done this. No need to re-invent the wheel! We just need to find where it is done and use it. Some web service, or maybe I'm just wasting my time and there's a site like YourNextShareholderMeetingDotCom with all the addresses.

Here's what I've got so far, with SmOke_N's help and my probability of the meetup being in the company's office:

all raw data is in Line1, Line2, Line3
parsed address is in MeetAddr1, MeetAddr2, MeetAddr3, using SmOke_N's example
date and time are almost perfect

Edited August 22, 2010 by tobject
tobject Posted July 29, 2010 Author (edited)

Checking resume parsers. Looks like this guy does a somewhat good job. Requires country selection.

Edited August 22, 2010 by tobject
tobject Posted July 29, 2010 Author (edited)

I see some occasional e-mails there. What's the RegExp to get an e-mail address?

Edited August 22, 2010 by tobject
Moderators SmOke_N Posted July 29, 2010

http://www.regular-expressions.info/tutorial.html

Again, Google is your friend. To help, the RegEx engine we use is PCRE.
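For the e-mail question, a commonly used simplified pattern is sketched below. It is not a full validator, it will accept some strings that are not deliverable addresses, and the sample text and addresses are made up for the demonstration.

```autoit
#include <Array.au3> ; only for _ArrayDisplay

; Made-up sample text; the addresses use reserved example domains.
Local $sText = "Questions may be sent to investor.relations@example.com or proxy@example.org."
; local part, "@", domain labels, then a dot and a top-level domain of two or more letters
Local $aMails = StringRegExp($sText, "(?i)[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}", 3)
If Not @error Then _ArrayDisplay($aMails)
```

Flag 3 returns every match in the string, so the same call works on a whole document at once.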
tobject Posted July 29, 2010 Author

Thanks!