TerenceAgius Posted April 2, 2013 Is there anybody who has done this or knows how it can be done? I have many thousands of emails in raw text format, and I want to decode them, remove the attachments, etc. Any ideas?
FireFox Posted April 2, 2013 (edited) Hi, There are many String* functions for processing text. Take a look at them in the help file. Br, FireFox. Edited April 2, 2013 by FireFox
water Posted April 2, 2013 What do you mean by "raw text format"? A file stored by Outlook as type MSG? My UDFs and Tutorials: Spoiler UDFs: Active Directory (NEW 2024-07-28 - Version 1.6.3.0) - Download - General Help & Support - Example Scripts - Wiki ExcelChart (2017-07-21 - Version 0.4.0.1) - Download - General Help & Support - Example Scripts OutlookEX (2021-11-16 - Version 1.7.0.0) - Download - General Help & Support - Example Scripts - Wiki OutlookEX_GUI (2021-04-13 - Version 1.4.0.0) - Download Outlook Tools (2019-07-22 - Version 0.6.0.0) - Download - General Help & Support - Wiki PowerPoint (2021-08-31 - Version 1.5.0.0) - Download - General Help & Support - Example Scripts - Wiki Task Scheduler (2022-07-28 - Version 1.6.0.1) - Download - General Help & Support - Wiki Standard UDFs: Excel - Example Scripts - Wiki Word - Wiki Tutorials: ADO - Wiki WebDriver - Wiki
TerenceAgius (Author) Posted April 2, 2013 @water No, something like this:

Return-path: <hrg.destinationmanager@solresor.se>
Received: from mail.solresor.se (mail.solresor.se [212.247.99.130]) by mail.meetingpointint.com (mail.meetingpointint.com) (MDaemon PRO v9.6.2) with ESMTP id 27-md50000002347.msg for <reservation@meetingpointegypt.com>; Fri, 17 Oct 2008 15:04:30 +0200
Authentication-Results: mail.meetingpointint.com smtp.mail=hrg.destinationmanager@solresor.se; spf=neutral
Authentication-Results: mail.meetingpointint.com header.from=hrg.destinationmanager@solresor.se; domainkeys=neutral (not signed); dkim=neutral (not signed)
X-MDSPF-Result: none (mail.meetingpointint.com)
Received-SPF: none (mail.meetingpointint.com: hrg.destinationmanager@solresor.se does not designate permitted sender hosts)
water Posted April 2, 2013 How to process such text files depends on what you need to extract. Either use the String* functions as FireFox suggested or, if you are familiar with them, regular expressions.
TerenceAgius (Author) Posted April 2, 2013 I tried, but there are too many combinations. I found a .NET library, Lumisoft, which can do this. Now I am trying to figure out how to use this .NET library with AutoIt.
FireFox Posted April 2, 2013 (edited) Here you go (input file: s.txt):

    #include <File.au3>
    #include <Array.au3>

    Local $aContent = 0, $iParam = 0, $iPos = 0
    _FileReadToArray("s.txt", $aContent)
    Local $aOutPut[$aContent[0]][2]

    For $iLine = 1 To $aContent[0]
        ; Split at the first ": " only, so values that contain ": " stay whole
        $iPos = StringInStr($aContent[$iLine], ": ")
        If $iPos > 1 And StringIsAlpha(StringLeft($aContent[$iLine], 1)) Then
            ; New header line: store the field name and its value
            $aOutPut[$iParam][0] = StringLeft($aContent[$iLine], $iPos - 1)
            $aOutPut[$iParam][1] = StringMid($aContent[$iLine], $iPos + 2)
            $iParam += 1
        ElseIf $iParam > 0 Then
            ; Continuation line (starts with whitespace): append to the previous value
            $aOutPut[$iParam - 1][1] &= " " & StringStripWS($aContent[$iLine], 3)
        EndIf
    Next

    ReDim $aOutPut[$iParam][2]
    _ArrayDisplay($aOutPut)

I have also tried with a regexp, if someone wants to fix mine:

    ; how to do: match until a new line that begins with alpha (include new lines beginning with non-alpha, e.g. space).
    StringRegExp($s, "(?s)([\w-]+):((*.?)\r\n[:alpha:|\s{4}])", 3)

Br, FireFox. Edited April 2, 2013 by FireFox
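One possible way to repair the regexp idea above is sketched here. It is not FireFox's code: the function name `_ParseHeaders` is hypothetical, and the sketch assumes that header continuation lines always begin with a space or tab and that the header block ends at the first empty line.

```autoit
#include <Array.au3>

; Hypothetical helper (not from the thread): returns a 2D array of [name, value] pairs.
Func _ParseHeaders($sMail)
    ; Normalise line endings so the pattern only has to deal with @LF.
    $sMail = StringReplace($sMail, @CRLF, @LF)
    ; Assumption: the header block ends at the first empty line.
    Local $iEnd = StringInStr($sMail, @LF & @LF)
    If $iEnd Then $sMail = StringLeft($sMail, $iEnd - 1)
    ; Group 1: field name. Group 2: value plus any lines that start with space or tab.
    Local $aMatch = StringRegExp($sMail, "(?m)^([\w-]+):[ \t]*(.*(?:\n[ \t]+.*)*)", 3)
    If @error Then Return SetError(1, 0, 0)
    Local $aOut[Int(UBound($aMatch) / 2)][2]
    Local $iRow = 0
    For $i = 0 To UBound($aMatch) - 2 Step 2
        $aOut[$iRow][0] = $aMatch[$i]
        ; Unfold continuation lines into a single line.
        $aOut[$iRow][1] = StringRegExpReplace($aMatch[$i + 1], "\n[ \t]+", " ")
        $iRow += 1
    Next
    Return $aOut
EndFunc   ;==>_ParseHeaders

; Usage with the same input file as above.
Local $aHeaders = _ParseHeaders(FileRead("s.txt"))
If IsArray($aHeaders) Then _ArrayDisplay($aHeaders)
```

Flag 3 makes StringRegExp return all matches as one flat array, so names and values alternate, which is why the loop steps by 2.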
TerenceAgius (Author) Posted April 2, 2013 @FireFox: you handled the easy part. As the heading says, I am asking about attachments, which can be multipart MIME, uuencoded or HTML. That is the part I am trying to get working.
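For the multipart case, a rough sketch of how a message might be cut into its parts follows. The function name `_SplitMimeParts` is hypothetical, and the sketch assumes a single, non-nested multipart message whose boundary is declared as `boundary=...` in the Content-Type header. It does not handle uuencoded attachments or nested multipart bodies.

```autoit
; Hypothetical helper: returns an array of raw MIME parts, with the count in [0].
; Sets @error if no boundary is declared or no parts are found.
Func _SplitMimeParts($sMail)
    ; The boundary may be written with or without quotes.
    Local $aB = StringRegExp($sMail, '(?i)boundary\s*=\s*"?([^";\r\n]+)', 1)
    If @error Then Return SetError(1, 0, 0)
    Local $sDelim = "--" & $aB[0]
    ; Everything from the closing delimiter ("--boundary--") onwards is dropped.
    Local $iClose = StringInStr($sMail, $sDelim & "--", 1)
    If $iClose Then $sMail = StringLeft($sMail, $iClose - 1)
    ; Flag 1: treat the whole delimiter string as one separator, count in [0].
    Local $aParts = StringSplit($sMail, $sDelim, 1)
    If $aParts[0] < 2 Then Return SetError(2, 0, 0)
    ; Element [1] holds the top-level headers and preamble, so it is skipped.
    Local $aOut[$aParts[0]]
    $aOut[0] = $aParts[0] - 1
    For $i = 2 To $aParts[0]
        ; Remove the line break after the delimiter and the one before the next delimiter.
        $aOut[$i - 1] = StringRegExpReplace($aParts[$i], "\A[ \t]*\r?\n|\r?\n\z", "")
    Next
    Return $aOut
EndFunc   ;==>_SplitMimeParts

; Usage: list the first header line of every part in s.txt.
Local $aMimeParts = _SplitMimeParts(FileRead("s.txt"))
If IsArray($aMimeParts) Then
    For $i = 1 To $aMimeParts[0]
        ConsoleWrite("Part " & $i & ": " & StringRegExpReplace($aMimeParts[$i], "(?s)\R.*", "") & @CRLF)
    Next
EndIf
```

Each returned part still carries its own headers (Content-Type, Content-Transfer-Encoding and so on), followed by an empty line and the encoded body.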
water Posted April 2, 2013 "I have many thousands of emails in raw text format, and I want to decode them, remove the attachments etc" I think you need to define what you mean by decode. Extract Sender, Date, Subject, Body ... and ignore all the rest? If you know what you need to extract, it is easy to drop the rest.
FireFox Posted April 2, 2013 Can you provide an example? What do you want to extract, and what for? Br, FireFox.
TerenceAgius (Author) Posted April 2, 2013 I am trying to write a tool to detect spam using statistics, either at mail server level or through some POP/IMAP interface. After downloading an email I have a text file, and so far all is good. There are two cases. When there are no attachments and the email is plain text, it is very easy to split. When there are attachments, it becomes more complicated. What do I need to do? Several things. First, read and decode the header; that is easy and done. Second, split the various attachments into files, just like an email client does. I will then create MD5 hashes of these attachments, so that in future I can do post-processing and look for the same attachments even under different names.
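The decode-and-hash step described above could look roughly like the sketch below. The names `_Base64Decode` and `_HashPart` are hypothetical. The sketch assumes its input is one MIME part (part headers, an empty line, then the body), decodes only Base64 bodies, and hashes anything else as raw text. It has not been tested on very large attachments.

```autoit
#include <Crypt.au3>

; Hypothetical helper: decodes a Base64 string to binary via the Windows Crypt32 API.
Func _Base64Decode($sB64)
    Local Const $CRYPT_STRING_BASE64 = 1
    ; The first call passes no buffer and only reports the required size in element [5].
    Local $aRes = DllCall("Crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", $CRYPT_STRING_BASE64, "ptr", 0, "dword*", 0, "ptr", 0, "ptr", 0)
    If @error Or Not $aRes[0] Or $aRes[5] = 0 Then Return SetError(1, 0, Binary(""))
    Local $tBuf = DllStructCreate("byte[" & $aRes[5] & "]")
    ; The second call fills the buffer and reports the actual decoded length.
    $aRes = DllCall("Crypt32.dll", "bool", "CryptStringToBinaryW", "wstr", $sB64, "dword", 0, _
            "dword", $CRYPT_STRING_BASE64, "struct*", $tBuf, "dword*", $aRes[5], "ptr", 0, "ptr", 0)
    If @error Or Not $aRes[0] Then Return SetError(2, 0, Binary(""))
    Return BinaryMid(DllStructGetData($tBuf, 1), 1, $aRes[5])
EndFunc   ;==>_Base64Decode

; Hypothetical helper: returns the MD5 of one MIME part's decoded body as a "0x..." string.
Func _HashPart($sPart)
    ; Part headers and body are separated by the first empty line.
    Local $aSplit = StringRegExp($sPart, "(?s)\A(.*?)\r?\n\r?\n(.*)\z", 1)
    If @error Then Return SetError(1, 0, "")
    Local $bData
    If StringRegExp($aSplit[0], "(?im)^Content-Transfer-Encoding:\s*base64") Then
        $bData = _Base64Decode($aSplit[1])
        If @error Then Return SetError(2, 0, "")
    Else
        ; Assumption: other encodings (7bit, quoted-printable, ...) are hashed undecoded.
        $bData = StringToBinary($aSplit[1])
    EndIf
    Return String(_Crypt_HashData($bData, $CALG_MD5))
EndFunc   ;==>_HashPart

; Usage with a tiny inline part instead of a real attachment.
Local $sDemoPart = "Content-Transfer-Encoding: base64" & @CRLF & @CRLF & "aGVsbG8="
ConsoleWrite(_HashPart($sDemoPart) & @CRLF)
```

Hashing the decoded bytes rather than the encoded text means two copies of the same file should produce the same hash even when the sending mail programs wrap the Base64 lines differently or use different file names.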
water Posted April 2, 2013 The "problem" is that you have flat text files. If you had e.g. Outlook you could access the mail items and easily extract the needed information. Can you access those mails using a mail client?
TerenceAgius (Author) Posted April 2, 2013 I thought about it, and you, @water, have great UDFs (I owe you some beers, btw). But first of all, a client would be very slow. Outlook is a super hog, and downloading emails would eventually fill up the PST file. Even if each processed mail were deleted, the PST file would still grow and need to be compacted. It's a messy solution. I have already made a simple IMAP client, which I tested against 3 mail servers, but now I need the all-important parse and decode part.