MikeFez

Help with RegEx


Hello,

I have a script that copies the text of a webpage into a variable named "$pagetext" and then runs a RegEx search for an email address in a specific spot. Originally, the relevant part of the page was formatted like this:

Quote

Phone: (555) 555-5555
Email: example@example.com
Example Company

For that format, the following RegEx worked:

$extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email:\s*(.+)(?:\v|$)", 1)

Unfortunately, the format recently changed and now looks like this:

Quote

Phone    
(555) 555-5555
Email    
example@example.com
Company
Example Company

Using this tool, I found that "Email" is followed by a [Tab] and then an [End of Line (LF)], so I tweaked my RegEx to this:

$extractedEmail = StringRegExp($pagetext, "(?i)(?m:^)\s*email\t\n*(.+)(?:\v|$)", 1)

Although the tool shows a match, the pattern doesn't seem to work within AutoIt. Does anyone have suggestions on how to fix this, or see what I'm doing wrong?
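For comparison, here is a minimal AutoIt sketch of a single pattern intended to accept both layouts quoted above. The sample strings are assumptions that imitate the two layouts (using CRLF line breaks), and the pattern `(?i)email:?\s*(\S+@\S+)` is an illustration, not something proposed in the thread: the colon is optional, and `\s*` absorbs whatever spaces, Tabs, CRs or LFs follow the label.

```autoit
; Sketch only: both sample strings are assumptions that imitate the
; old layout ("Email: address") and the new one ("Email", Tab, line break, address).
Local $sOld = "Phone: (555) 555-5555" & @CRLF & _
        "Email: example@example.com" & @CRLF & _
        "Example Company"
Local $sNew = "Phone" & @TAB & @CRLF & "(555) 555-5555" & @CRLF & _
        "Email" & @TAB & @CRLF & "example@example.com" & @CRLF & _
        "Company" & @CRLF & "Example Company"

; (?i)       case-insensitive label
; email:?    the label, with or without a colon
; \s*        any run of spaces, Tabs, CRs and LFs after the label
; (\S+@\S+)  the address: non-whitespace characters around an "@"
Local Const $sPattern = "(?i)email:?\s*(\S+@\S+)"

Local $aSamples[2] = [$sOld, $sNew]
Local $aMatch
For $sText In $aSamples
    ; Flag 1 returns an array holding the capture groups of the first match
    $aMatch = StringRegExp($sText, $sPattern, 1)
    If @error Then
        ConsoleWrite("No match (@error = " & @error & ")" & @CRLF)
    Else
        ConsoleWrite("Found: " & $aMatch[0] & @CRLF)
    EndIf
Next
```

Making `\s*` responsible for the line break means the pattern does not care whether the page text uses LF or CRLF.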

$pagetext = "Phone" & @LF & _
"(555) 555-5555" & @LF & _
"Email" & @TAB & @LF & _
"example@example.com" & @LF & _
"Company" & @LF & _
"Example Company"

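; Flag 8 strips ALL whitespace (Tab and LF included), so the address sits right after "Email";
; the lazy (.*?) then captures up to the first "Company", and flag 3 returns all matches as an array
MsgBox(0, '', StringRegExp(StringStripWS($pagetext, 8), "Email(.*?)Company", 3)[0])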

Another one:

$pagetext = "Phone" & @CRLF & _
"(555) 555-5555" & @CRLF & _
"Email" & @TAB & @CRLF & _
"example@example.com" & @CRLF & _
"Company" & @CRLF & _
"Example Company"

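; (?i) makes "email" case-insensitive; \s* skips the Tab and CRLF after the label;
; [^@]+ takes everything up to the "@", and \N+ takes the rest of that line (\N = any character except a newline)
$extractedEmail = StringRegExp($pagetext, "(?is)\s*email\s*([^@]+@\N+)", 1)

MsgBox(0, "", $extractedEmail[0])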

Email addresses on web pages have usually already been validated, so this simpler pattern should work:

$extractedEmail = StringRegExp($pagetext, ".+@.+", 1)

:)

Thanks for the replies, everyone. I tried each of those variations, but unfortunately none of them seem to work on my end. I've attached a copy of the form I'm pulling this from; it appears to match the one I quoted in the original post. Does anyone know why this wouldn't work?

Example.txt

Hmm, this works for me:

$txt = FileRead("Example.txt")
$extractedEmail = StringRegExp($txt, ".+@.+", 1)[0]
Msgbox(0,"", $extractedEmail)

Hi mikell,

You're completely right. The issue was elsewhere in my code. Thank you very much for taking the time to help me out.
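Since the real problem turned out to be elsewhere in the script, one check that can help narrow such issues down is testing `@error` right after `StringRegExp` instead of indexing `[0]` straight away. With flag 1, `@error` is 1 when nothing matched and 2 when the pattern is invalid, and subscripting the non-array result then stops the script with a runtime error. The helper `_ExtractEmail` below is hypothetical and reuses an illustrative pattern; it is a sketch, not the fix the poster applied.

```autoit
; Hypothetical helper (not from the thread): returns the first captured
; address, or "" with @error set when StringRegExp finds nothing.
Func _ExtractEmail($sText)
    ; Flag 1 = array of the first match's capture groups
    Local $aMatch = StringRegExp($sText, "(?i)email:?\s*(\S+@\S+)", 1)
    ; @error = 1: no match; @error = 2: bad pattern (@extended = offset in pattern)
    If @error Then Return SetError(@error, @extended, "")
    Return $aMatch[0]
EndFunc

Local $sEmail = _ExtractEmail("Email" & @TAB & @CRLF & "example@example.com")
If @error Then
    ConsoleWrite("No email found, @error = " & @error & @CRLF)
Else
    ConsoleWrite("Found: " & $sEmail & @CRLF)
EndIf
```

Passing `@error` and `@extended` through `SetError` lets the caller tell "no match" apart from "bad pattern" without touching the array.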
