littleclown, posted March 9, 2010

Hello. We have an internal website that we use for communication. It holds a lot of posts, some of them containing e-mail addresses, but in no consistent format. I need to pull just the e-mail addresses out of this site so I can send everybody an invitation to register on the new user system. All I need is a script that extracts e-mail addresses only: something that finds strings matching the e-mail pattern XXXX@XXXX.XXX. This is really a standard e-mail spider, but I don't want to run some strange spam-oriented shareware on the office local network. I am not sure how to do this. Thank you in advance.
littleclown (Author), posted March 9, 2010 (edited)

I found this:

    #Include <String.au3>
    $Text = FileRead("email.txt")
    $EmailFound = StringRegExp($Text, "([A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})", 3)
    if @extended = 1 Then
        for $i = 0 to UBound($EmailFound) - 1
            MsgBox(0, "E-Mail", $EmailFound[$i])
        Next
    Else
        MsgBox(0, "E-Mail", "No E-Mail addresses found in the supplied text")
    EndIf

But this doesn't work. I am missing something, but I can't see where my mistake is.
Melba23 (Moderators), posted March 9, 2010

littleclown,

I do not know where that script came from, but the If test after the StringRegExp is wrong. If the SRE is successful, @extended = 0, so the test fails and you get the "Not found" message. You need to check @error instead. Change the If structure to read:

    If @error = 1 Then
        MsgBox(0, "E-Mail", "No E-Mail addresses found in the supplied text")
    Else
        For $i = 0 To UBound($EmailFound) - 1
            MsgBox(0, "E-Mail", $EmailFound[$i])
        Next
    EndIf

I can extract e-mail addresses with no problems using that.

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
    ArrayMultiColSort: Sort arrays on multiple columns
    ChooseFileFolder: Single and multiple selections from specified path treeview listing
    Date_Time_Convert: Easily convert date/time formats, including the language used
    ExtMsgBox: A highly customisable replacement for MsgBox
    GUIExtender: Extend and retract multiple sections within a GUI
    GUIFrame: Subdivide GUIs into many adjustable frames
    GUIListViewEx: Insert, delete, move, drag, sort, edit and colour ListView items
    GUITreeViewEx: Check/clear parent and child checkboxes in a TreeView
    Marquee: Scrolling tickertape GUIs
    NoFocusLines: Remove the dotted focus lines from buttons, sliders, radios and checkboxes
    Notify: Small notifications on the edge of the display
    Scrollbars: Automatically sized scrollbars with a single command
    StringSize: Automatically size controls to fit text
    Toast: Small GUIs which pop out of the notification area
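Putting the corrected test together with the read-and-match lines from the earlier post, a complete script might look like the sketch below. It assumes the posts have been saved to email.txt, as in the original script. The output file name found_emails.txt is hypothetical, and writing the results to a file instead of showing one message box per address is a swapped-in choice for long lists, not something from the posts above. The sketch also tests @error for any non-zero value rather than exactly 1, so an invalid pattern is reported the same way as "nothing found".

```autoit
; Read the whole input file into one string.
; "email.txt" is the file name used in the original script.
Local $sText = FileRead("email.txt")
If @error Then
    MsgBox(0, "E-Mail", "Could not read any text from email.txt")
    Exit
EndIf

; Flag 3 asks StringRegExp for an array holding every match in the text.
Local $aEmailFound = StringRegExp($sText, "([A-Za-z0-9._%-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4})", 3)

; @error must be tested straight after the call, before any other function resets it.
If @error Then
    MsgBox(0, "E-Mail", "No E-Mail addresses found in the supplied text")
Else
    For $i = 0 To UBound($aEmailFound) - 1
        ; "found_emails.txt" is a hypothetical output name; each address is appended on its own line.
        FileWriteLine("found_emails.txt", $aEmailFound[$i])
    Next
    MsgBox(0, "E-Mail", UBound($aEmailFound) & " address(es) written to found_emails.txt")
EndIf
```

The same address will appear once for every time it occurs in the text, so the output may need de-duplicating before the invitations go out.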
littleclown (Author), posted March 9, 2010

Thank you very much! It works now.
littleclown (Author), posted March 9, 2010

Could somebody modify this to do the same thing for URLs?
Melba23 (Moderators), posted March 9, 2010

littleclown,

Please post some examples of your data, including the URLs you want to extract together with their surrounding characters, so we can try to develop an SRE pattern for you. URLs can vary in format quite a bit, so getting a sense of how they sit in the data is vital.

M23
littleclown (Author), posted March 9, 2010

Yes, I know. The simplest example I can give you is <a href="http://someaddress/somepage.html">, but this pattern is not absolute; sometimes the URLs look different. Let me specify what I need if we forget that this is a URL and treat it as just a string. I need to extract every string that begins with "http://", or that is preceded by href=, href=" or href=', and that ends just before a space, a ", a ', a > or some other symbol (I can add more afterwards; really any symbol that cannot appear in a valid URL). I think this is what I need. Thanks for your help!
Melba23 (Moderators), posted March 9, 2010 (edited)

littleclown,

Try this:

    $sText = 'rubbish_text<a href="http://someaddress/somepage.html">rubbish_text'
    $sURL = StringRegExpReplace($sText, '(?i).*http:(.+)">.*', 'http:$1')
    MsgBox(0, "", $sURL)

Explanation

Pattern:
    (?i)  = case insensitive
    .*    = any number of characters
    http: = literal text
    (.+)  = capturing group of at least one character (capturing group means we can use it later, as you will see)
    ">    = literal string
    .*    = any number of characters

Replacement:
    http: = literal string
    $1    = first capturing group (what we had in brackets in the pattern)

Now, as long as the URL starts with "http:" and its closing quote is followed directly by the ">" that ends the tag, which I hope is all the time, you should be fine!

M23

Edit: Added the explanation.
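One case where the `">` literal can trip up is a tag that carries another attribute after the href: the greedy (.+) then runs on to the last `">` in the string and drags the extra attribute along. The sketch below is one possible variation, not the author's pattern. It swaps (.+)"> for a negated character class that stops at the first quote, ">" or whitespace. Only the first sample string comes from the post above; the other two are made up for illustration.

```autoit
; Only the first sample comes from the post above; the other two are invented test cases.
Local $aSamples[3] = [ _
        'rubbish_text<a href="http://someaddress/somepage.html">rubbish_text', _
        "rubbish_text<a href='http://someaddress/somepage.html'>rubbish_text", _
        'rubbish_text<a href="http://someaddress/somepage.html" target="_blank">rubbish_text']

; \x22 is a double quote and \x27 a single quote, written as hex escapes
; so the pattern needs no doubled quotes inside the AutoIt string.
; [^\x22\x27>\s]+ captures up to the first quote, ">" or whitespace character.
Local $sPattern = '(?i).*http:([^\x22\x27>\s]+).*'

For $i = 0 To UBound($aSamples) - 1
    ; The whole string is replaced by "http:" plus the captured group.
    ConsoleWrite(StringRegExpReplace($aSamples[$i], $sPattern, 'http:$1') & @CRLF)
Next
```

Each of the three samples should come out as http://someaddress/somepage.html. Like the original, this returns a single URL per string, so it suits data with one link per chunk of text.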
Fulano, posted March 9, 2010

It looks like you've already described most of the regular expression:

    (?i)(?:href=["']|http://)([^'>"]+)

Breaking it down:
    (?i)                  = makes it case insensitive
    (?:href=["']|http://) = looks for either href= followed by a ' or ", or http:// (a non-capturing group, so nothing in it is stored)
    ([^'>"]+)             = store as many characters as you can that are not: ' > "

So: <a href="http://someaddress/somepage.html">
Becomes: http://someaddress/somepage.html

Where an http:// turns up without an href= in front of it, the stored part begins after the http://, so you'll want to prefix those with http://, but that is fairly trivial.

Hope this helps
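A sketch of how a pattern along these lines might be run over text containing several links, using StringRegExp with flag 3 as in the e-mail script. The alternatives are wrapped in (?:...) because square brackets define a character class in a regular expression rather than a group. The sample text and host names are made up, and adding \s to the excluded characters is an assumption that goes beyond the post, so that a bare URL stops at the first whitespace.

```autoit
; Made-up sample text with three links in different styles.
Local $sText = '<a href="http://someaddress/somepage.html">one</a> ' & _
        "<a href='http://otheraddress/page2.html'>two</a> " & _
        'plain link http://thirdaddress/page3.html in the text'

; (?:...) groups the two alternatives without capturing them.
; \x22 and \x27 are the double and single quote as hex escapes.
; \s is an added assumption so that a bare URL ends at whitespace.
Local $sPattern = '(?i)(?:href=[\x22\x27]|http://)([^\x22\x27>\s]+)'

; Flag 3 returns the captured group of every match as an array.
Local $aURLs = StringRegExp($sText, $sPattern, 3)
If @error Then
    ConsoleWrite("No URLs found" & @CRLF)
Else
    For $i = 0 To UBound($aURLs) - 1
        ; Captures that started at a bare http:// lack the prefix, so put it back.
        If StringLeft($aURLs[$i], 7) <> "http://" Then $aURLs[$i] = "http://" & $aURLs[$i]
        ConsoleWrite($aURLs[$i] & @CRLF)
    Next
EndIf
```

One limitation of the prefixing step: a relative link such as href="page.html" would also be given an http:// prefix, so data with relative links would need that case handled separately.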
littleclown (Author), posted March 10, 2010

Thank you all for your replies and for the explanations about regular expressions. I am new to this, and it would be great if next time I could write my own regular expression without having to ask you for it.