TheRauchster101

Pulling information from E-Mail

8 posts in this topic

Hello all,

Part of my job is finding people who download files illegally, and I get anywhere from a few hundred to a few thousand emails a day about this. I've been trying to build an automated script for a while, but I keep running into problems grabbing the IP address and timestamp because the emails arrive in a variety of formats. Any suggestions on how to grab information from an email whose layout changes depending on who sent it?
 

Sometimes the information looks like:

Timestamp: 2015-03-18 21:50:13 North American Eastern Time

Unauthorized IP Address: 184.177.x.x 

Other times the information might be like: (I need to grab the first IP, but not the second)

2015-03-17 19:54:16.589158 IP (tos 0x0, ttl 241, id 40294, offset 0, flags [none], proto UDP (17), length 1427) 66.210.x.x.161 > 31.186.x.x.3389: UDP, length 1399

And other times, it might be:

>                             <TimeStamp>2015-03-28T19:30:11.23Z</TimeStamp>
>                             <IP_Address>67.202.x.x</IP_Address>

I have written code that can grab the IP from a specific format, but I'd like to write a universal routine that finds the information no matter what surrounds it, rather than adding new code each time I get a new format.


Since an IPv4 address always has the format nnn.nnn.nnn.nnn, a regular expression should do what you want. I'm sure you will find example code on the forum.
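For anyone who wants a starting point, here is a minimal sketch of that regular-expression idea. The helper name _ExtractIPv4 is made up for illustration, and the sample address is the zero-filled placeholder used elsewhere in this thread. It assumes each octet should be in the range 0-255 and that a trailing ".port" suffix should not be treated as part of the address.

```autoit
#include <Array.au3>

; Hypothetical helper (not part of any UDF): returns an array of the IPv4
; addresses found in $sText, or sets @error to 1 when none are found.
Func _ExtractIPv4($sText)
    ; One octet: 250-255, 200-249, 100-199, or 0-99
    Local Const $sOctet = "(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
    ; Four octets, not preceded by a digit or dot and not followed by a digit
    Local Const $sPattern = "(?<![\d.])(" & $sOctet & "(?:\." & $sOctet & "){3})(?!\d)"
    ; Flag 3 returns every match of the capturing group as a flat array
    Local $aHits = StringRegExp($sText, $sPattern, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aHits
EndFunc   ;==>_ExtractIPv4

Local $aIPs = _ExtractIPv4("Unauthorized IP Address: 184.177.0.0")
If Not @error Then ConsoleWrite(_ArrayToString($aIPs, ", ") & @CRLF)
```

Because the octet range is checked inside the pattern, something like 999.1.1.1 is rejected without a second validation pass. Whether that strictness is wanted depends on how trustworthy the senders are.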

Do you need to scan for IPv6 addresses as well?


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


_IsValidIP on the page mentioned by ViciousXUSMC does exactly what you are looking for.

It scans a string for IP-addresses and returns them (if any) in an array.



Thank you for the prompt responses. 
I haven't had any requests involving IPv6 addresses so far, as most end users do not have an IPv6 address yet, but that may come in the future. I'll worry about that then.

_IsValidIP looks like it would work for some of what I need. I can probably figure out something to base it on, as far as what counts as a valid IP for the businesses I work with.

Any suggestions on how to pull the timestamp as well? That's one of the harder ones, because the format changes so much. And if I'm off by even a minute, I could get the wrong person.


It looks to me like your timestamps all contain the basic hh:mm:ss pattern, so a regular expression should capture that too.
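As a rough sketch of that, the three sample timestamps in this thread all start with a YYYY-MM-DD date, then either a space or a "T", then hh:mm:ss. The helper name _ExtractTimestamp is invented for this example. It drops fractional seconds and does nothing about time zones, so "Eastern Time" and the trailing "Z" (UTC) would still have to be reconciled separately before matching against logs.

```autoit
; Hypothetical helper: returns "YYYY-MM-DD hh:mm:ss" for the first timestamp
; found in $sText, or sets @error to 1 when none is found.
Func _ExtractTimestamp($sText)
    ; Flag 1 returns the capture groups of the first match: [0] = date, [1] = time
    Local $aParts = StringRegExp($sText, "(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aParts[0] & " " & $aParts[1]
EndFunc   ;==>_ExtractTimestamp

ConsoleWrite(_ExtractTimestamp("Timestamp: 2015-03-18 21:50:13 North American Eastern Time") & @CRLF) ; 2015-03-18 21:50:13
ConsoleWrite(_ExtractTimestamp("2015-03-17 19:54:16.589158 IP (tos 0x0, ttl 241)") & @CRLF) ; 2015-03-17 19:54:16
ConsoleWrite(_ExtractTimestamp("<TimeStamp>2015-03-28T19:30:11.23Z</TimeStamp>") & @CRLF) ; 2015-03-28 19:30:11
```

A sender that uses a different date order (for example MM/DD/YYYY) would not match this pattern, so new formats may still need their own pattern.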


Lofting the cyberwinds on teknoleather wings, I am...The Blue Drache


Just to show how you would extract the IP-addresses from the example lines you provided:

#include <Array.au3>

Local Const $sString = _
        "Timestamp: 2015-03-18 21:50:13 North American Eastern Time Unauthorized IP Address: 184.177.0.0" & @CRLF & _
        "2015-03-17 19:54:16.589158 IP (tos 0x0, ttl 241, id 40294, offset 0, flags [none], proto UDP (17), length 1427) 66.210.0.0.161 > 31.186.0.0.3389: UDP, length 1399" & @CRLF & _
        ">                             <TimeStamp>2015-03-28T19:30:11.23Z</TimeStamp> >                             <IP_Address>67.202.0.0</IP_Address>"

Local $IPsArray = _StringToIPArray($sString)
_ArrayDisplay($IPsArray)

Func _IsValidIP($sString, Const $sDelim = "")
    If Not StringInStr($sString, ".") Then Return 0
    If $sDelim <> "" Then $sString = StringLeft($sString, StringInStr($sString, $sDelim) - 1)
    If StringLen($sString) > 15 Then Return 0
    Local $Dot_Split = StringSplit($sString, ".")
    Local $iUbound = UBound($Dot_Split) - 1
    If $iUbound <> 4 Then Return 0
    For $i = 1 To $iUbound
        If $Dot_Split[$i] = "" Then Return 0
        If StringRegExp($Dot_Split[$i], '[^0-9]') Or Number($Dot_Split[$i]) > 255 Then Return 0
    Next
    If $sDelim <> "" Then Return $sString
    Return 1
EndFunc   ;==>_IsValidIP

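Func _StringToIPArray($sString)
    ; Flag 3 returns every match of the dotted-quad pattern as a flat array
    Local $avArray = StringRegExp($sString, '([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)', 3)
    Local $avRetArr[1], $iUbound = 0
    For $i = 0 To UBound($avArray) - 1
        ; Keep only candidates whose four octets are each in the range 0-255
        If _IsValidIP($avArray[$i]) Then
            $iUbound = UBound($avRetArr)
            ReDim $avRetArr[$iUbound + 1]
            $avRetArr[$iUbound] = $avArray[$i]
        EndIf
    Next
    ; No valid address found: signal failure through @error
    If $iUbound = 0 Then Return SetError(1, 0, 0)
    $avRetArr[0] = $iUbound ; element 0 holds the count
    Return $avRetArr
EndFunc   ;==>_StringToIPArray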

returns:

Row|Col 0
[0]|4
[1]|184.177.0.0
[2]|66.210.0.0
[3]|31.186.0.0
[4]|67.202.0.0
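The output above includes both addresses from the tcpdump-style line, while the original question asked for only the first one. One possible way to handle that, sketched here with a made-up helper name (_SourceIP), is to work one line at a time: look for the "src.port > dst.port" shape first, and otherwise fall back to the first dotted-quad on the line. This sketch does not range-check the octets.

```autoit
; Hypothetical helper: returns the source address of one line,
; or "" with @error set to 1 when the line contains no dotted-quad.
Func _SourceIP($sLine)
    ; tcpdump-style "src.port > dst.port": capture the four octets before the port and the ">"
    Local $aHit = StringRegExp($sLine, "(\d{1,3}(?:\.\d{1,3}){3})\.\d+ > ", 1)
    If Not @error Then Return $aHit[0]
    ; Otherwise fall back to the first dotted-quad on the line
    $aHit = StringRegExp($sLine, "(\d{1,3}(?:\.\d{1,3}){3})", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aHit[0]
EndFunc   ;==>_SourceIP

ConsoleWrite(_SourceIP("proto UDP (17), length 1427) 66.210.0.0.161 > 31.186.0.0.3389: UDP, length 1399") & @CRLF) ; 66.210.0.0
ConsoleWrite(_SourceIP("Unauthorized IP Address: 184.177.0.0") & @CRLF) ; 184.177.0.0
```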
 



#8 ·  Posted (edited)

Can you assume they are valid because the source is reliable? As long as nothing else in the logs matches the pattern, a simple parse that handles both x's and 0's should do (I went ahead and allowed x in the first octet as well, even though that is quite unlikely to be wildcarded):

#include <Array.au3>

Local Const $sString = _
        "Timestamp: 2015-03-18 21:50:13 North American Eastern Time Unauthorized IP Address: 184.177.x.x" & @CRLF & _
        "2015-03-17 19:54:16.589158 IP (tos 0x0, ttl 241, id 40294, offset 0, flags [none], proto UDP (17), length 1427) 66.210.0.0.161 > 31.x.x.x.3389: UDP, length 1399" & @CRLF & _
        ">                             <TimeStamp>2015-03-28T19:30:11.23Z</TimeStamp> >                             <IP_Address>x.202.0.x</IP_Address>"



Local $aMatch = StringRegExp($sString, "((?:\d+|x)\.(?:\d+|x)\.(?:\d+|x)\.(?:\d+|x))", 3)

_ArrayDisplay($aMatch)
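Since the goal in this thread is to end up with a timestamp and an address for each notice, the same masked-octet pattern could be combined with a timestamp pattern, one line at a time. The following is only a sketch: _ParseNotice is a hypothetical name, the timestamp pattern assumes the YYYY-MM-DD hh:mm:ss shape seen in the samples, and only the first address on each line is kept.

```autoit
; Hypothetical helper: returns a 2-element array [timestamp, ip] for one
; line, or sets @error to 1 if either piece is missing.
Func _ParseNotice($sLine)
    Local $aTime = StringRegExp($sLine, "(\d{4}-\d{2}-\d{2})[ T](\d{2}:\d{2}:\d{2})", 1)
    If @error Then Return SetError(1, 0, 0)
    ; Octets may be digits or the literal mask character "x"
    Local $aIP = StringRegExp($sLine, "((?:\d{1,3}|x)(?:\.(?:\d{1,3}|x)){3})", 1)
    If @error Then Return SetError(1, 0, 0)
    Local $aResult[2] = [$aTime[0] & " " & $aTime[1], $aIP[0]]
    Return $aResult
EndFunc   ;==>_ParseNotice

Local Const $sSample = _
        "Timestamp: 2015-03-18 21:50:13 North American Eastern Time Unauthorized IP Address: 184.177.x.x" & @CRLF & _
        "<TimeStamp>2015-03-28T19:30:11.23Z</TimeStamp> <IP_Address>x.202.0.x</IP_Address>"

Local $aLines = StringSplit($sSample, @CRLF, 1) ; flag 1: treat @CRLF as one delimiter
For $i = 1 To $aLines[0]
    Local $aRec = _ParseNotice($aLines[$i])
    If @error Then ContinueLoop
    ConsoleWrite($aRec[0] & " -> " & $aRec[1] & @CRLF)
Next
```

Lines where either piece is missing are skipped rather than guessed at, which seems safer given that a wrong match could point at the wrong person.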

Out of curiosity what kind of monitoring solution are you using?  Waiting for SMTP traffic from whatever app is parsing your logs, and then having to parse those messages even further seems like there are inefficiencies upstream that could be removed.

Edited by boththose

