
Decoding raw email messages with attachments



What do you mean by "raw text format"? A file stored by Outlook as type MSG?

My UDFs and Tutorials:

Spoiler

UDFs:
Active Directory (NEW 2022-02-19 - Version 1.6.1.0) - Download - General Help & Support - Example Scripts - Wiki
ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts
OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki
OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download
Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki
PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki
Task Scheduler (NEW 2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki

Standard UDFs:
Excel - Example Scripts - Wiki
Word - Wiki

Tutorials:
ADO - Wiki
WebDriver - Wiki

 


@water

No, more like this:

Return-path: <hrg.destinationmanager@solresor.se>
Received: from mail.solresor.se (mail.solresor.se [212.247.99.130])
    by mail.meetingpointint.com (mail.meetingpointint.com)
    (MDaemon PRO v9.6.2)
    with ESMTP id 27-md50000002347.msg
    for <reservation@meetingpointegypt.com>; Fri, 17 Oct 2008 15:04:30 +0200
Authentication-Results: mail.meetingpointint.com
    smtp.mail=hrg.destinationmanager@solresor.se; spf=neutral
Authentication-Results: mail.meetingpointint.com
    header.from=hrg.destinationmanager@solresor.se; domainkeys=neutral (not signed); dkim=neutral (not signed)
X-MDSPF-Result: none (mail.meetingpointint.com)
Received-SPF: none (mail.meetingpointint.com: hrg.destinationmanager@solresor.se does not
    designate permitted sender hosts)


How to process such text files depends on what you need to extract.

Either use the String* functions, as FireFox suggested, or, if you are familiar with them, regular expressions.
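As a rough illustration of the String* route, here is a minimal sketch that splits a raw message into its header block and its body at the first empty line. The sample text and the variable names ($sRaw, $sText, $sHeader, $sBody) are made up for this example and are not taken from the files in question.

```autoit
; Invented sample message: two header lines, an empty line, then the body.
Local $sRaw = "From: alice@example.com" & @CRLF & _
        "Subject: Hello" & @CRLF & _
        @CRLF & _
        "This is the body."

; Normalize all line endings to @LF so one search covers CRLF, CR and LF files.
Local $sText = StringReplace(StringReplace($sRaw, @CRLF, @LF), @CR, @LF)

; The first empty line (two consecutive line feeds) ends the header block.
Local $iSplit = StringInStr($sText, @LF & @LF)
Local $sHeader = $sText, $sBody = ""
If $iSplit > 0 Then
    $sHeader = StringLeft($sText, $iSplit - 1)
    $sBody = StringMid($sText, $iSplit + 2)
EndIf

ConsoleWrite("Header:" & @LF & $sHeader & @LF)
ConsoleWrite("Body:" & @LF & $sBody & @LF)
```

If no empty line is found, the sketch treats the whole text as header and leaves the body empty, which matches a file that contains only headers.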


Here you go (input file: s.txt):

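#include <File.au3>
#include <Array.au3>

Local $aContent = 0, $aMatch = 0, $iParam = 0

; Read s.txt into a 1-based array; $aContent[0] holds the line count.
If Not _FileReadToArray("s.txt", $aContent) Then Exit MsgBox(16, "Error", "Cannot read s.txt")

Local $aOutPut[$aContent[0]][2]

For $iLine = 1 To $aContent[0]
    ; A header line starts with a field name (no whitespace, no colon) followed by a colon.
    $aMatch = StringRegExp($aContent[$iLine], "^([^\s:]+):\s*(.*)$", 1)

    If IsArray($aMatch) Then
        $aOutPut[$iParam][0] = $aMatch[0]
        $aOutPut[$iParam][1] = $aMatch[1]

        $iParam += 1
    ElseIf $iParam > 0 And StringStripWS($aContent[$iLine], 8) <> "" Then
        ; Folded continuation line: append it to the previous header's value.
        $aOutPut[$iParam - 1][1] &= " " & StringStripWS($aContent[$iLine], 3)
    EndIf
Next

If $iParam = 0 Then Exit MsgBox(16, "Error", "No headers found")
ReDim $aOutPut[$iParam][2]

_ArrayDisplay($aOutPut)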

I have also tried a regexp. It does not work yet, if someone wants to fix it:

; Goal: match each value up to the next line that starts with a letter (including folded lines that start with a non-letter, e.g. a space).
StringRegExp($s, "(?s)([\w-]+):((*.?)\r\n[:alpha:|\s{4}])", 3)
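
One possible repair of the pattern above, offered as a sketch rather than the definitive answer: anchor each field name at the start of a line and let the value run over every following line that begins with a space or tab. The sample string is invented and $aFields is just an illustrative name.

```autoit
; Invented sample with one folded header (the continuation line starts with spaces).
Local $s = "Return-path: <a@example.com>" & @CRLF & _
        "Received: from mail.example.com" & @CRLF & _
        "    by mx.example.org; Fri, 17 Oct 2008 15:04:30 +0200" & @CRLF & _
        "Subject: Test"

; Group 1: field name. Group 2: value, plus any following lines that start with space or tab.
Local $aFields = StringRegExp($s, "(?m)^([\w-]+):[ \t]*([^\r\n]*(?:\r?\n[ \t]+[^\r\n]*)*)", 3)

If Not @error Then
    ; Flag 3 returns a flat array: name, value, name, value, ...
    For $i = 0 To UBound($aFields) - 2 Step 2
        ; Unfold: collapse each line break plus its leading whitespace into one space.
        $aFields[$i + 1] = StringRegExpReplace($aFields[$i + 1], "\r?\n[ \t]+", " ")
        ConsoleWrite($aFields[$i] & " => " & $aFields[$i + 1] & @CRLF)
    Next
EndIf
```

Because the field name must sit at the very start of a line, a colon inside a folded line (such as the time in the Received header) is not mistaken for a new field.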

Br, FireFox.


"I have many thousands of emails in raw text format, and I want to decode them, remove the attachments etc."

I think you need to define what you mean by decode.

Extract Sender, Date, Subject, Body ... and ignore all the rest?

If you know what you need to extract it is easy to drop the rest.
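
If only a few fields are needed, a small lookup function may be enough. The following is a sketch, not a tested mail parser: _GetHeaderField is a hypothetical helper name and the sample header is invented.

```autoit
; Invented sample header; the Subject field is folded onto a second line.
Local $sHeader = "From: Alice <alice@example.com>" & @CRLF & _
        "Date: Fri, 17 Oct 2008 15:04:30 +0200" & @CRLF & _
        "Subject: Booking" & @CRLF & _
        " confirmation" & @CRLF

ConsoleWrite("From:    " & _GetHeaderField($sHeader, "From") & @CRLF)
ConsoleWrite("Date:    " & _GetHeaderField($sHeader, "Date") & @CRLF)
ConsoleWrite("Subject: " & _GetHeaderField($sHeader, "Subject") & @CRLF)

; Hypothetical helper: returns the unfolded value of one header field,
; or "" with @error = 1 if the field is not present.
Func _GetHeaderField($sHeader, $sName)
    ; \Q...\E makes the field name literal; (?mi) = multi-line, case-insensitive.
    Local $sPattern = "(?mi)^\Q" & $sName & "\E:[ \t]*([^\r\n]*(?:\r?\n[ \t]+[^\r\n]*)*)"
    Local $aMatch = StringRegExp($sHeader, $sPattern, 1)
    If @error Then Return SetError(1, 0, "")
    ; Unfold: replace each line break plus leading whitespace with one space.
    Return StringRegExpReplace($aMatch[0], "\r?\n[ \t]+", " ")
EndFunc   ;==>_GetHeaderField
```

Only the first occurrence of a field is returned, so repeated fields such as Received would need a loop or the global match flag instead.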


I am trying to write a tool that detects spam using statistics, either at mail server level or through some POP/IMAP interface.

After downloading an email I have a text file, and so far all is good. There are two cases. When there are no attachments and the email is plain text, it is very easy to split. When there are attachments, though, it becomes more complicated. What do I need to do? Several things: reading and decoding the header is easy and already done; the second part is splitting the various attachments into files, just like an email client does. I will then create MD5 hashes of these attachments, so that in future post-processing I can look for the same attachments even under different names.
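
The attachment step described above could be sketched roughly as follows: find the boundary, split the message into MIME parts, decode the base64 parts and hash the decoded bytes. Everything here is an assumption for illustration: the sample message, the helper names _SplitMimeParts and _Base64Decode, and the choice to decode through the Windows API function CryptStringToBinaryW. The sketch ignores nested multiparts, quoted-printable parts and boundaries that are a prefix of one another.

```autoit
#include <Crypt.au3>

; Invented two-part message; "b1" is an arbitrary boundary string.
Local $sRaw = 'Content-Type: multipart/mixed; boundary="b1"' & @CRLF & @CRLF & _
        "--b1" & @CRLF & _
        "Content-Type: text/plain" & @CRLF & @CRLF & _
        "Hello" & @CRLF & _
        "--b1" & @CRLF & _
        'Content-Type: application/octet-stream; name="a.bin"' & @CRLF & _
        "Content-Transfer-Encoding: base64" & @CRLF & @CRLF & _
        "SGVsbG8gd29ybGQ=" & @CRLF & _
        "--b1--" & @CRLF

Local $aParts = _SplitMimeParts($sRaw)
If @error Then
    ConsoleWrite("No MIME parts found" & @CRLF)
    Exit 1
EndIf

Local $iSplit, $sHead, $sBody, $dData
For $i = 0 To UBound($aParts) - 1
    ; Part headers and part body are separated by the first empty line.
    $iSplit = StringInStr($aParts[$i], @CRLF & @CRLF)
    If $iSplit = 0 Then ContinueLoop
    $sHead = StringLeft($aParts[$i], $iSplit - 1)
    $sBody = StringMid($aParts[$i], $iSplit + 4)

    If StringRegExp($sHead, "(?mi)^Content-Transfer-Encoding:\s*base64") Then
        $dData = _Base64Decode($sBody)
        If @error Then ContinueLoop
        ; The hash depends only on the decoded bytes, not on the attachment name.
        ConsoleWrite("Part " & $i & " MD5: " & _Crypt_HashData($dData, $CALG_MD5) & @CRLF)
    EndIf
Next

; Hypothetical helper: returns an array with the raw text of each MIME part.
Func _SplitMimeParts($sRaw)
    ; Normalize line endings to CRLF so the boundary search is uniform.
    $sRaw = StringRegExpReplace($sRaw, "\r\n|\r|\n", @CRLF)
    Local $aB = StringRegExp($sRaw, '(?i)boundary="?([^";\r\n]+)', 1)
    If @error Then Return SetError(1, 0, 0)
    ; Flag 3 = treat the delimiter as one string and return no count element.
    ; Pieces: [0] = top-level headers, [1..n-2] = parts, [n-1] = closing "--".
    Local $aPieces = StringSplit($sRaw, @CRLF & "--" & $aB[0], 3)
    Local $iCount = UBound($aPieces) - 2
    If $iCount < 1 Then Return SetError(2, 0, 0)
    Local $aParts[$iCount]
    For $i = 1 To $iCount
        ; Drop the line break that follows the boundary line.
        $aParts[$i - 1] = StringRegExpReplace($aPieces[$i], "^\r\n", "")
    Next
    Return $aParts
EndFunc   ;==>_SplitMimeParts

; Hypothetical helper: decodes base64 text to binary via crypt32.dll.
Func _Base64Decode($sB64)
    Local Const $CRYPT_STRING_BASE64 = 1
    ; First call with a null buffer only asks for the required size.
    Local $aRet = DllCall("crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", $CRYPT_STRING_BASE64, "ptr", 0, "dword*", 0, "ptr", 0, "ptr", 0)
    If @error Or Not $aRet[0] Or $aRet[5] = 0 Then Return SetError(1, 0, Binary(""))
    Local $tBuf = DllStructCreate("byte[" & $aRet[5] & "]")
    ; Second call fills the buffer with the decoded bytes.
    $aRet = DllCall("crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", $CRYPT_STRING_BASE64, "struct*", $tBuf, "dword*", $aRet[5], "ptr", 0, "ptr", 0)
    If @error Or Not $aRet[0] Then Return SetError(2, 0, Binary(""))
    Return BinaryMid(DllStructGetData($tBuf, 1), 1, $aRet[5])
EndFunc   ;==>_Base64Decode
```

To save a part instead of only hashing it, the decoded binary could be written with FileOpen in binary overwrite mode (flags 16 + 2) followed by FileWrite.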


The "problem" is that you have flat text files. If you had e.g. Outlook you could access the mail items and easily extract the needed information.

Can you access those mails using a mail client?


I thought about it, and you, @water, have great UDFs (I owe you some beers, btw). But first of all, a client would be very slow. Outlook is a super hog, and the downloaded emails would eventually fill up the PST file. Even if each processed mail were deleted, the PST file would still grow and need to be compacted. It's a messy solution. I have already made a simple IMAP client, which I tested against 3 mail servers, but now I need the all-important parse & decode part.
