MikeFez, posted March 8, 2016 (edited March 8, 2016 for clarification)

Hello,

I have a script that copies the text of a web page into a variable named "$pagetext" and then runs a RegEx search for an email address in a specific spot. Originally, that area of the page was formatted like this:

Phone: (555) 555-5555
Email: example@example.com
Example Company

For that layout, the following RegEx worked:

$extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email:\s*(.+)(?:\v|$)", 1)

Unfortunately, the format recently changed and now looks like this:

Phone
(555) 555-5555
Email
example@example.com
Company
Example Company

Using a regex testing tool, I determined that "Email" is followed by a [Tab] and then an [End of Line (LF)]. So I tweaked my RegEx to this:

$extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email\t\n*(.+)(?:\v|$)", 1)

The tool shows that it should work, but it doesn't seem to work within AutoIt. Does anyone have suggestions on how to resolve this, or on what I'm doing wrong?

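One plausible culprit, offered as a guess rather than a diagnosis: if the text in $pagetext uses Windows-style CRLF line endings, the tab is followed by "\r" rather than "\n", and "\t\n*" cannot step over it. The Python sketch below (Python's re module is close to, but not identical to, the PCRE engine behind AutoIt's StringRegExp) runs the tweaked pattern and a whitespace-tolerant variant against hypothetical LF and CRLF versions of the new layout.

```python
import re

# Hypothetical sample text modeled on the new page layout (an assumption,
# not the real page), once with LF and once with CRLF line endings.
LF_TEXT = "Phone\n(555) 555-5555\nEmail\t\nexample@example.com\nCompany\nExample Company"
CRLF_TEXT = LF_TEXT.replace("\n", "\r\n")

# The tweaked pattern from the question, minus the trailing (?:\v|$):
# in Python, \v means only the vertical-tab character, unlike PCRE's \v class.
STRICT = re.compile(r"(?im)^\s*email\t\n*(.+)")

# A more tolerant variant: \s* swallows the tab, CR and LF in any order.
TOLERANT = re.compile(r"(?im)^\s*email\s*(\S+)")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matched."""
    m = pattern.search(text)
    return m.group(1) if m else None


print(repr(first_group(STRICT, LF_TEXT)))      # 'example@example.com'
print(repr(first_group(STRICT, CRLF_TEXT)))    # '\r' (the CR is captured, not the address)
print(repr(first_group(TOLERANT, LF_TEXT)))    # 'example@example.com'
print(repr(first_group(TOLERANT, CRLF_TEXT)))  # 'example@example.com'
```

In Python the CRLF case still "matches" because "." accepts a lone "\r", so it returns a carriage return instead of the address; depending on the newline setting PCRE is using, AutoIt may return no match at all. Either way, matching the separator with \s* (or PCRE's \R) rather than a literal \n sidesteps the problem.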
iamtheky, posted March 8, 2016

$pagetext = "Phone" & @LF & _
        "(555) 555-5555" & @LF & _
        "Email" & @TAB & @LF & _
        "example@example.com" & @LF & _
        "Company" & @LF & _
        "Example Company"

MsgBox(0, "", StringRegExp(StringStripWS($pagetext, 8), "Email(.*?)Company", 3)[0])

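The idea behind this reply is to remove every whitespace character first, so the tab and line break between "Email" and the address no longer matter, and then take the shortest run of text between the "Email" and "Company" labels. A rough Python equivalent, assuming the same sample text (Python's \s is not an exact match for the set of characters StringStripWS flag 8 removes):

```python
import re

# Sample text shaped like the new layout (an assumption, not the real page).
page_text = "Phone\n(555) 555-5555\nEmail\t\nexample@example.com\nCompany\nExample Company"

# Rough analogue of StringStripWS($pagetext, 8): drop every whitespace character.
stripped = re.sub(r"\s+", "", page_text)
# stripped == "Phone(555)555-5555Emailexample@example.comCompanyExampleCompany"

# The lazy (.*?) stops at the FIRST "Company" that follows "Email".
match = re.search(r"Email(.*?)Company", stripped)
email = match.group(1) if match else None
print(email)  # example@example.com
```

The trade-off is that the capture depends on the literal "Company" label following the address; if that label changes again, the pattern stops matching.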
jguinch, posted March 8, 2016

Another one:

$pagetext = "Phone" & @CRLF & _
        "(555) 555-5555" & @CRLF & _
        "Email" & @TAB & @CRLF & _
        "example@example.com" & @CRLF & _
        "Company" & @CRLF & _
        "Example Company"

$extractedEmail = StringRegExp($pagetext, "(?is)\s*email\s*([^@]+@\N+)", 1)
MsgBox(0, "", $extractedEmail[0])

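This pattern anchors on the label, lets \s* absorb whatever mix of tab, CR and LF follows it, and then requires an "@" inside the capture. Python's re has no bare \N escape (PCRE's "any character except a newline"), so this sketch substitutes [^\r\n] for it; the CRLF sample text is an assumption.

```python
import re

# Sample text with CRLF line endings, as in the reply above (assumed content).
page_text = "\r\n".join([
    "Phone", "(555) 555-5555", "Email\t", "example@example.com",
    "Company", "Example Company",
])

# jguinch's pattern, adapted: PCRE's \N is not available in Python's re,
# so [^\r\n] stands in for "any character except a line break".
pattern = re.compile(r"(?is)\s*email\s*([^@]+@[^\r\n]+)")

m = pattern.search(page_text)
result = m.group(1) if m else None
print(result)  # example@example.com
```

Because [^@]+ can also cross line breaks, the pattern trusts that the first "@" after the label belongs to the address.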
mikell, posted March 8, 2016

Email addresses on web pages have usually already been validated, so this simple pattern should be enough:

$extractedEmail = StringRegExp($pagetext, ".+@.+", 1)

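This pattern ignores the labels entirely and simply takes the line that contains an "@". A Python sketch of the same idea on assumed sample text; the strip() call is an addition of this sketch, guarding against a trailing "\r" when the text uses CRLF line endings, since Python's "." matches a lone "\r".

```python
import re

# Sample text (assumed layout); only one line contains an "@".
page_text = "Phone\n(555) 555-5555\nEmail\t\nexample@example.com\nCompany\nExample Company"

# ".+@.+": because "." does not cross "\n" here, the match is the whole
# line that contains the "@" (with at least one character on each side).
m = re.search(r".+@.+", page_text)
line = m.group(0) if m else None

# Defensive cleanup for CRLF text, where the match would end in "\r".
email = line.strip() if line else None
print(email)  # example@example.com
```

The trade-off is robustness to layout changes versus precision: if any other line on the page contains an "@", this returns whichever one comes first.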
MikeFez (author), posted March 10, 2016

Thanks for the replies, everyone. I tried each of those variations, but unfortunately none of them seem to work on my end. I've attached a copy of the form I'm trying to pull this from; it seems to be the same as the one I quoted in my original post. Does anyone know why this wouldn't work?

Attachment: Example.txt

mikell, posted March 10, 2016

Hmm, this works for me:

$txt = FileRead("Example.txt")
$extractedEmail = StringRegExp($txt, ".+@.+", 1)[0]
MsgBox(0, "", $extractedEmail)

MikeFez (author), posted March 10, 2016

Hi mikell,

You're completely right. The issue was in another part of my code. Thank you very much for taking the time to help me out.