TerenceAgius

Decoding raw email messages with attachments

13 posts in this topic

Is there anybody who has done this or knows how it can be done?

I have many thousands of emails in raw text format, and I want to decode them, remove the attachments, etc.

Any ideas?


#2 ·  Posted (edited)

Hi,

There are many String* functions to process texts.

Take a look at them in the help file.

Br, FireFox.

Edited by FireFox

 

OS : Win XP SP2 (32 bits) / Win 7 SP1 (64 bits) / Win 8 (64 bits) | Autoit version: latest stable / beta.
Hardware : Intel(R) Core(TM) i5-2400 CPU @ 3.10Ghz / 8 GiB RAM DDR3.

My UDFs : Skype UDF | TrayIconEx UDF | GUI Panel UDF | Excel XML UDF | Is_Pressed_UDF

My Projects : YouTube Multi-downloader | FTP Easy-UP | Lock'n | WinKill | AVICapture | Skype TM | Tap Maker | ShellNew | Scriptner | Const Replacer | FT_Pocket | Chrome theme maker

My Examples : Capture toolIP Camera | Crosshair | Draw Captured Region | Picture Screensaver | Jscreenfix | Drivetemp | Picture viewer

My Snippets : Basic TCP | Systray_GetIconIndex | Intercept End task | Winpcap various | Advanced HotKeySet | Transparent Edit control

 


What do you mean by "raw text format"? A file stored by Outlook as type MSG?


My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2017-04-18 - Version 1.4.8.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX (NEW 2017-02-27 - Version 1.3.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2015-04-01 - Version 0.4.0.0) - Download - General Help & Support - Example Scripts
Excel - Example Scripts - Wiki
Word - Wiki
PowerPoint (2015-06-06 - Version 0.0.5.0) - Download - General Help & Support

Tutorials:
ADO - Wiki

 


@water

No, something like this:

Return-path: <hrg.destinationmanager@solresor.se>
Received: from mail.solresor.se (mail.solresor.se [212.247.99.130])
    by mail.meetingpointint.com (mail.meetingpointint.com)
    (MDaemon PRO v9.6.2)
    with ESMTP id 27-md50000002347.msg
    for <reservation@meetingpointegypt.com>; Fri, 17 Oct 2008 15:04:30 +0200
Authentication-Results: mail.meetingpointint.com
    smtp.mail=hrg.destinationmanager@solresor.se; spf=neutral
Authentication-Results: mail.meetingpointint.com
    header.from=hrg.destinationmanager@solresor.se; domainkeys=neutral (not signed); dkim=neutral (not signed)
X-MDSPF-Result: none (mail.meetingpointint.com)
Received-SPF: none (mail.meetingpointint.com: hrg.destinationmanager@solresor.se does not
    designate permitted sender hosts)


How to process such text files depends on what you need to extract.

Either use the String* functions, as FireFox suggested, or, if you are familiar with them, regular expressions.



I tried, but there are too many combinations. I found a .NET library, LumiSoft, which can do this; now I am trying to figure out how to use this .NET library with AutoIt.


#7 ·  Posted (edited)

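Here you go (input file: s.txt):

#include <File.au3>
#include <Array.au3>

; Parse the header of a raw message into [n][0] = field name, [n][1] = value.
Local $aContent = 0, $iParam = 0, $iPos = 0

; Read the file: one line per element, $aContent[0] holds the line count.
If Not _FileReadToArray("s.txt", $aContent) Then Exit 1

Local $aOutPut[$aContent[0]][2]

For $iLine = 1 To $aContent[0]
    ; An empty line ends the header; the body follows and is not parsed here.
    If $aContent[$iLine] = "" Then ExitLoop

    If StringRegExp($aContent[$iLine], "^[ \t]") Then
        ; Continuation (folded) line: append it to the value of the previous field.
        If $iParam > 0 Then $aOutPut[$iParam - 1][1] &= " " & StringStripWS($aContent[$iLine], 1)
    Else
        ; New field: split at the first colon only, since values may contain ": " too.
        ; (Names like "X-MDSPF-Result" contain "-", so StringIsAlpha cannot detect them.)
        $iPos = StringInStr($aContent[$iLine], ":")
        If $iPos = 0 Then ContinueLoop
        $aOutPut[$iParam][0] = StringLeft($aContent[$iLine], $iPos - 1)
        $aOutPut[$iParam][1] = StringStripWS(StringTrimLeft($aContent[$iLine], $iPos), 1)
        $iParam += 1
    EndIf
Next

ReDim $aOutPut[$iParam][2]

_ArrayDisplay($aOutPut)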

I have also tried a regexp; if someone wants to fix mine:

;how to do: match until new line with alpha (include new lines beginning with non alpha e.g: space).
StringRegExp($s, "(?s)([\w-]+):((*.?)\r\n[:alpha:|\s{4}])", 3)
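For the regular expression approach, here is one possible working pattern, wrapped in a small AutoIt function. It is a sketch, assuming the header ends at the first empty line and that folded lines start with a space or tab (as RFC 5322 specifies); the name _ParseHeaderRegExp is made up for this example. StringRegExp with flag 3 returns every captured group of every match in one flat array, so names and values alternate.

```autoit
; Hypothetical helper (not from the thread): parse the header block of a raw
; message into a 2D array [n][0] = field name, [n][1] = unfolded value.
Func _ParseHeaderRegExp($sRaw)
    ; The header ends at the first empty line; drop the body so that
    ; "Word: text" lines inside it are not mistaken for header fields.
    Local $sHeader = StringRegExpReplace($sRaw, "(?s)\r?\n\r?\n.*", "")
    ; A field starts at the beginning of a line with "Name:"; its value runs on
    ; over any following lines that begin with a space or tab (folded lines).
    Local $aMatch = StringRegExp($sHeader, "(?m)^([\w-]+):[ \t]*([^\r\n]*(?:\r?\n[ \t][^\r\n]*)*)", 3)
    If @error Then Return SetError(1, 0, 0)
    Local $iFields = Int(UBound($aMatch) / 2)
    Local $aOut[$iFields][2]
    For $i = 0 To $iFields - 1
        $aOut[$i][0] = $aMatch[2 * $i]
        ; Unfold: each line break plus leading whitespace becomes a single space.
        $aOut[$i][1] = StringRegExpReplace($aMatch[2 * $i + 1], "\r?\n[ \t]+", " ")
    Next
    Return $aOut
EndFunc
```

With Array.au3 included, `_ArrayDisplay(_ParseHeaderRegExp(FileRead("s.txt")))` should show the same kind of table as the loop above.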

Br, FireFox.

Edited by FireFox

 


@FireFox: you handled the easy part. If you look at the thread title, I mention attachments, which can be multipart, MIME, uuencoded or HTML.

That is the part I am trying to get working.
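For the multipart case, the usual first step is to read the boundary parameter from the Content-Type header and cut the body at each line that starts with "--" followed by that boundary. Below is a rough sketch in AutoIt; _SplitMimeParts is a hypothetical name and it assumes a well-formed message. It does not handle uuencoded attachments, which are not delimited by MIME boundaries.

```autoit
; Hypothetical helper (not from the thread): split a multipart MIME message into
; its parts. Each returned element is one part: its own headers, an empty line, its body.
Func _SplitMimeParts($sRaw)
    ; Message header = text before the first empty line; body = everything after it.
    Local $aHB = StringRegExp($sRaw, "(?s)^(.*?)\r?\n\r?\n(.*)$", 1)
    If @error Then Return SetError(1, 0, 0)
    ; The delimiter comes from the boundary parameter of Content-Type (quoted or not).
    Local $aB = StringRegExp($aHB[0], '(?i)boundary\s*=\s*"?([^";\r\n]+)"?', 1)
    If @error Then Return SetError(2, 0, 0) ; no boundary: not a multipart message
    ; Delimiter lines are "--" & boundary at the start of a line. The leading @LF lets
    ; a body that starts directly with a delimiter be split the same way.
    Local $aSeg = StringSplit(@LF & $aHB[1], @LF & "--" & $aB[0], 1)
    Local $aParts[$aSeg[0]], $iCount = 0
    For $i = 2 To $aSeg[0] ; $aSeg[1] is the preamble, which is ignored
        If StringLeft($aSeg[$i], 2) = "--" Then ExitLoop ; closing delimiter "--boundary--"
        ; Remove the rest of the delimiter line, and the CR left before the next delimiter.
        $aParts[$iCount] = StringRegExpReplace(StringRegExpReplace($aSeg[$i], "^[^\n]*\n", ""), "\r$", "")
        $iCount += 1
    Next
    If $iCount = 0 Then Return SetError(3, 0, 0)
    ReDim $aParts[$iCount]
    Return $aParts
EndFunc
```

Because every returned part keeps its own headers, a part whose Content-Type is again multipart (for example multipart/alternative holding the text and HTML versions) can be passed to the same function again.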


Quote: "I have many thousands of emails in raw text format, and I want to decode them, remove the attachments, etc."

I think you need to define what you mean by decode.

Extract Sender, Date, Subject, Body ... and ignore all the rest?

If you know what you need to extract, it is easy to drop the rest.
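Pulling out a few known fields and the body can be done with two small helpers like these. They are a sketch: the names _GetHeaderField and _GetBody are hypothetical, the field lookup is case-insensitive and returns only the first occurrence, and encoded words in the value (e.g. =?UTF-8?B?...?=) are left undecoded.

```autoit
; Hypothetical helper: return the unfolded value of the first header field named $sName.
Func _GetHeaderField($sRaw, $sName)
    ; Only search the header, i.e. the text before the first empty line.
    Local $sHeader = StringRegExpReplace($sRaw, "(?s)\r?\n\r?\n.*", "")
    ; \Q...\E quotes the field name; the value may continue on folded lines.
    Local $aM = StringRegExp($sHeader, "(?mi)^\Q" & $sName & "\E:[ \t]*([^\r\n]*(?:\r?\n[ \t][^\r\n]*)*)", 1)
    If @error Then Return SetError(1, 0, "")
    Return StringRegExpReplace($aM[0], "\r?\n[ \t]+", " ")
EndFunc

; Hypothetical helper: return everything after the first empty line (the raw body).
Func _GetBody($sRaw)
    Local $sBody = StringRegExpReplace($sRaw, "(?s)^.*?\r?\n\r?\n", "", 1)
    If @extended = 0 Then Return SetError(1, 0, "") ; no empty line: no body
    Return $sBody
EndFunc
```

For example, `_GetHeaderField(FileRead("s.txt"), "Subject")` returns the subject line, and `_GetBody` gives the part that still has to be split and decoded when attachments are present.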



Can you provide an example of what you want to extract, and what for?

Br, FireFox.


 


I am trying to write a tool to detect spam using statistics at mail server level or through some pop/imap interface.

Now, after "downloading" an email, I have a text file, and so far all is good. There are two cases. In the first, there are no attachments and the email is plain text, which is very easy to split. When there are attachments, though, this becomes more complicated. What do I need to do? Several things. Reading and decoding the header is easy and done; the second part is splitting the various attachments into files, just like an email client does. I will then create MD5 hashes of these attachments so that in future I can do post-processing and look for the same "attachments", even with different names.
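Saving one attachment and hashing it could look roughly like this. Assumptions: the part text (its headers, an empty line, its body) comes from splitting the message at its MIME boundary; only base64 is decoded, other encodings are written out unchanged; base64 decoding goes through the Windows API function CryptStringToBinaryW in crypt32.dll via DllCall; and the MD5 comes from _Crypt_HashData in the standard Crypt.au3 UDF. The helper names _Base64ToBinary and _SavePartWithMd5 are made up for this example.

```autoit
#include <Crypt.au3>

; Hypothetical helper: decode base64 text with the Windows API CryptStringToBinaryW.
Func _Base64ToBinary($sB64)
    ; First call with a NULL output buffer only asks for the decoded size (parameter 5).
    ; Flag 0x1 = CRYPT_STRING_BASE64 (plain base64, no "-----BEGIN" header).
    Local $aSize = DllCall("crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", 0x1, "ptr", 0, "dword*", 0, "ptr", 0, "ptr", 0)
    If @error Or Not $aSize[0] Or $aSize[5] = 0 Then Return SetError(1, 0, Binary(""))
    ; Second call decodes into a buffer of exactly that size.
    Local $tBuf = DllStructCreate("byte[" & $aSize[5] & "]")
    Local $aRet = DllCall("crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", 0x1, "struct*", $tBuf, "dword*", $aSize[5], "ptr", 0, "ptr", 0)
    If @error Or Not $aRet[0] Then Return SetError(2, 0, Binary(""))
    Return DllStructGetData($tBuf, 1)
EndFunc

; Hypothetical helper: decode one MIME part, write the decoded bytes to $sOutFile
; and return their MD5 as lowercase hex.
Func _SavePartWithMd5($sPart, $sOutFile)
    Local $sHead = StringRegExpReplace($sPart, "(?s)\r?\n\r?\n.*", "")
    Local $sBody = StringRegExpReplace($sPart, "(?s)^.*?\r?\n\r?\n", "", 1)
    Local $bData
    If StringRegExp($sHead, "(?im)^Content-Transfer-Encoding:[ \t]*base64") Then
        ; Base64 bodies are wrapped over many lines; drop all whitespace first.
        $bData = _Base64ToBinary(StringRegExpReplace($sBody, "\s", ""))
        If @error Then Return SetError(1, 0, "")
    Else
        $bData = StringToBinary($sBody) ; 7bit/8bit/other encodings: saved as-is
    EndIf
    Local $hFile = FileOpen($sOutFile, 2 + 16) ; 2 = overwrite, 16 = binary mode
    If $hFile = -1 Then Return SetError(2, 0, "")
    FileWrite($hFile, $bData)
    FileClose($hFile)
    Return StringLower(Hex(_Crypt_HashData($bData, $CALG_MD5)))
EndFunc
```

Hashing the decoded bytes rather than the base64 text should make the hash independent of the line length each sending client uses, and the file name lives in the part headers, so the same attachment under a different name should give the same MD5.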


The "problem" is that you have flat text files. If you had e.g. Outlook you could access the mail items and easily extract the needed information.

Can you access those mails using a mail client?



I thought about it, and you, @water, have great UDFs (I owe you some beers, btw). But first of all, a client would be very slow; Outlook is a super hog, and downloading emails would eventually fill up the PST file. Even if each processed mail were deleted, the PST file would still grow and need to be recompacted. It's a messy solution. I have already made a simple IMAP client, which I tested against three mail servers, but now I need the all-important parse-and-decode part.
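Besides base64, the decode part has to handle quoted-printable, which is common for text and HTML bodies (Content-Transfer-Encoding: quoted-printable). A minimal, unoptimized sketch in AutoIt, with the hypothetical name _QPDecode: it removes soft line breaks ("=" at the end of a line), turns each =XX into the byte XX, and converts the bytes with BinaryToString, assuming UTF-8 unless another flag is passed. In real mail the charset should be read from the part's Content-Type header.

```autoit
; Hypothetical helper: decode quoted-printable text.
; $iFlag is passed to BinaryToString: 4 = UTF-8 (default here), 1 = ANSI.
Func _QPDecode($sQP, $iFlag = 4)
    ; Soft line breaks ("=" at the end of a line) only join lines; remove them first.
    $sQP = StringRegExpReplace($sQP, "=\r?\n", "")
    Local $sHex = "", $sChar
    Local $i = 1, $iLen = StringLen($sQP)
    While $i <= $iLen
        $sChar = StringMid($sQP, $i, 1)
        If $sChar = "=" And StringRegExp(StringMid($sQP, $i + 1, 2), "^[0-9A-Fa-f]{2}$") Then
            ; "=XX" encodes one byte as two hex digits.
            $sHex &= StringMid($sQP, $i + 1, 2)
            $i += 3
        Else
            ; Any other character stands for its own byte (QP text is 7-bit ASCII).
            $sHex &= Hex(Asc($sChar), 2)
            $i += 1
        EndIf
    WEnd
    Return BinaryToString(Binary("0x" & $sHex), $iFlag)
EndFunc
```

The bytes are collected first and converted once at the end because a single character in UTF-8 can span several =XX sequences (é is =C3=A9).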
