
Grabbing e-mail attachments from text files [solved]



Hello.

I need help from someone experienced with regular expressions.

I have the following sample files:

z1_full.txt

Received: from smtp11.yandex.ru (smtp11.yandex.ru [213.180.223.93])
    by mxback21.yandex.ru (Postfix) with ESMTP id C8ADB1D800D
    for <delta2-greit[well-known-symbol]yandex.ru>; Sun, 26 Oct 2008 15:26:50 +0300 (MSK)
Received: from 58-079.dialup.primorye.ru ([81.2.58.79]:51209 "EHLO
        a62fc6637610461" smtp-auth: "delta-greit" TLS-CIPHER: <none>
        TLS-PEER-CN1: <none>) by mail.yandex.ru with ESMTP id S5095596AbYJZM0s
        (ORCPT <rfc822;delta2-greit[well-known-symbol]yandex.ru>);
        Sun, 26 Oct 2008 15:26:48 +0300
X-Yandex-Spam: 1 
X-Yandex-Front: smtp11
X-Yandex-TimeMark: 1225024008
X-BornDate: 1154552400
X-Yandex-Karma: 0
X-Yandex-KarmaStatus: 0
X-MsgDayCount: 8
X-Comment: RFC 2476 MSA function at smtp11.yandex.ru logged sender identity as: delta-greit
Date:   Sun, 26 Oct 2008 22:26:42 +1000
From:   delta-greit[well-known-symbol]yandex.ru
To:  delta2-greit[well-known-symbol]yandex.ru
X-Mailer: Blat v2.6.2 w/GSS encryption, a Win32 SMTP/NNTP mailer http://www.blat.net
Message-ID: <01c93766$Blat.v2.6.2$1aa2e126$f28e16946f4[well-known-symbol]yandex.ru>
Subject: =?Windows-1251?B?8e/g8ejh7iDn4CDv7uzu+fw=?=
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="=_BlatBoundary-9eALrKNKBRQkeS69PRwPa"

This is a multi-part message in MIME format.

--=_BlatBoundary-9eALrKNKBRQkeS69PRwPa
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset=Windows-1251

=CF=F0=E8=E2=E5=F2
=EA=E0=EA =E4=E5=EB=E0=3F2

--=_BlatBoundary-9eALrKNKBRQkeS69PRwPa
Content-Type: application/octet-stream;
 name="2_Slavinka4_per_7.zip"
Content-Disposition: ATTACHMENT;
 filename=2_Slavinka4_per_7.zip
Content-Transfer-Encoding: BASE64

UEsDBBQAAAAIABE8fDeSyHbWZw0AAKIdAAAJAH0ARGVsdGEuZGFsU0RoAKwAAAAACAAIFZzb
Y2RgaRFhYGAwYIAAHyBmZAUzWUWBxD4WO6MvAbtK2J9aa39hxC3HyMTAwMSQwMACkhWQYPjP
KM8AEgOpVQASCiC2gAhEnBEiLgRVu5JBCEWtIlTtfkZhuFpuIAEAVVQNAAfxjExHfwZNR0mj
jgbQ/27iSma3IPxbs+les4Lwb8Hk5GQE4f8KjZ8P0P+KWC4eQP9ISklLicmISgqJCHHOTDd3
Z48vuLPUYG0wyQdliO4WMF6CJvoy9QGktp+J1u49aEajVFnoU2tE7tMZPtwDmsh9fvUrE7To
z/n0aJZfGXFDYGvDjzMTyT8KEthrwaXJzmhHr1eHxhF+0xdvJb5w3X+Q6z5Lx75w3Reu+//I
fy/X/QtQSwECFwsUAAAACAARPHw3ksh21mcNAACiHQAACQARAAAAAAAAACAAtoEAAAAARGVs
dGEuZGFsU0QEAKwAAABVVAUAB/GMTEdQSwECFwsUAAAACAARPHw3UBg3pOkPAABgJwAADAAR
AAAAAAAAACAAtoELDgAAQWN0aW9uMTIuZGFsU0QEAKwAAABVVAUAB/GMTEdQSwUGAAAAAAIA
AgCTAAAAmx4AAAAA

--=_BlatBoundary-9eALrKNKBRQkeS69PRwPa--
.

z2_full.txt

Received: from smtp11.yandex.ru (smtp11.yandex.ru [213.180.223.93])
    by mxback24.yandex.ru (Postfix) with ESMTP id 5F0D82D6944
    for <delta2-greit[well-known-symbol]yandex.ru>; Sat, 25 Oct 2008 19:45:32 +0400 (MSD)
Received: from 59-096.dialup.primorye.ru ([81.2.59.96]:64009 "EHLO 81.2.59.96"
        smtp-auth: "delta-greit" TLS-CIPHER: <none> TLS-PEER-CN1: <none>)
        by mail.yandex.ru with ESMTP id S5095600AbYJYPpb (ORCPT
        <rfc822;delta2-greit[well-known-symbol]yandex.ru>); Sat, 25 Oct 2008 19:45:31 +0400
X-Yandex-Spam: 1 
X-Yandex-Front: smtp11
X-Yandex-TimeMark: 1224949531
X-BornDate: 1154552400
X-Yandex-Karma: 0
X-Yandex-KarmaStatus: 0
X-MsgDayCount: 3
X-Comment: RFC 2476 MSA function at smtp11.yandex.ru logged sender identity as: delta-greit
Date:   Sun, 26 Oct 2008 02:45:25 +1100
From:   =?windows-1251?B?zODq8ejsINHi6PDo5O7i?= <delta-greit[well-known-symbol]yandex.ru>
X-Mailer: The Bat! (v3.99.29) Professional
X-Priority: 3 (Normal)
Message-ID: <204833072.20081026024525[well-known-symbol]yandex.ru>
To:  delta2-greit[well-known-symbol]yandex.ru
Subject: efef
Resent-from: =?windows-1251?Q?=CC=E0=EA=F1=E8=EC_=D1=E2=E8=F0=E8=E4=EE=E2?= 
             <delta-greit[well-known-symbol]yandex.ru>
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----------1239160213ED6B0"
Resent-Date: Sat, 25 Oct 2008 19:45:31 +0400
Resent-Message-Id: <20081025154532.5F0D82D6944[well-known-symbol]mxback24.yandex.ru>

------------1239160213ED6B0
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable

=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5, delta2-greit.



--=20
=D1 =F3=E2=E0=E6=E5=ED=E8=E5=EC,
 =CC=E0=EA=F1=E8=EC                       mailto:delta-greit[well-known-symbol]yandex.ru
------------1239160213ED6B0
Content-Type: APPLICATION/OCTET-STREAM;
 name="1.rar"
Content-transfer-encoding: base64
Content-Disposition: attachment;
 filename="1.rar"

UmFyIRoHAM+QcwAADQAAAAAAAAATMHQgkCoApwQAAE0OAAACRkptkht4WTkdMwUAIAAAADEu
aHRtALBIhyYNgVEMzM/NQVvZ0C/h9pfhE3IkUnK6BU0XA4mna7ZG4KmL4sQUajupEtlEkI6u
5xNHZq1Rc0xRydBV6SV2joMzKYunN8D/7WIGl7JHm7ouAOPsENJgA0gX56UE6WP/UTExy7AT
nWIoB/PNmgSMGcVxQnpMXRuIggf8Uwbk/WmrODhTqn+H7/cSdtO0Gee+n/TiXdI1q4v6UC7k
xHX4B2DVmBQP2XBv3g9NNvqMQSNVKi72RKvC0YDhUXvBZ5gPDAlUZaY1nczv+ne/OrV4BT++
NyZAvAhqnTt2vZas9AuLmFz+7o5efmONhlbM77f+D1wQXq9iPswVnqiAsGdQL/g4BmofbL9o
IJm1ifHz+3ygXJ5PyZBVGG/EuJLKtnk7e8WDRRE+bEWcBbti40+3ilomV8d91vSXuNez3wNl
ZDQ1CsDE4m+8xtqLE9I7r2JXuAUiDYcD67PfobeHLdIyj57kX/pAhkah4Dpq/5zx1rFzBw//
78+Xb/UnSVm7/xDEPXsAQAcA
------------1239160213ED6B0--

.

z3_full.txt

Received: from smtp11.yandex.ru (smtp11.yandex.ru [213.180.223.93])
    by mxback1.yandex.ru (Postfix) with ESMTP id D3C4063EB5
    for <delta2-greit[well-known-symbol]yandex.ru>; Mon, 27 Oct 2008 01:37:59 +0300 (MSK)
Received: from 63-221.dialup.primorye.ru ([81.2.63.221]:54801 "EHLO
        81.2.63.221" smtp-auth: "delta-greit" TLS-CIPHER: <none> TLS-PEER-CN1:
        <none>) by mail.yandex.ru with ESMTP id S5095603AbYJZWhs (ORCPT
        <rfc822;delta2-greit[well-known-symbol]yandex.ru>); Mon, 27 Oct 2008 01:37:48 +0300
X-Yandex-Spam: 1 
X-Yandex-Front: smtp11
X-Yandex-TimeMark: 1225060668
X-BornDate: 1154552400
X-Yandex-Karma: 0
X-Yandex-KarmaStatus: 0
X-MsgDayCount: 1
X-Comment: RFC 2476 MSA function at smtp11.yandex.ru logged sender identity as: delta-greit
Date:   Mon, 27 Oct 2008 08:37:41 +1000
From:   =?windows-1251?B?zODq8ejsINHi6PDo5O7i?= <delta-greit[well-known-symbol]yandex.ru>
X-Mailer: The Bat! (v3.99.29) Professional
Reply-To: =?windows-1251?B?zODq8ejsINHi6PDo5O7i?= <delta-greit[well-known-symbol]yandex.ru>
X-Priority: 3 (Normal)
Message-ID: <239755940.20081027083741[well-known-symbol]yandex.ru>
To:  delta2-greit[well-known-symbol]yandex.ru
Subject: Test
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="----------CB510B17E2B4D9"

------------CB510B17E2B4D9
Content-Type: text/plain; charset=windows-1251
Content-Transfer-Encoding: quoted-printable

=C7=E4=F0=E0=E2=F1=F2=E2=F3=E9=F2=E5, delta2-greit.



--=20
=D1 =F3=E2=E0=E6=E5=ED=E8=E5=EC,
 =CC=E0=EA=F1=E8=EC                       mailto:delta-greit[well-known-symbol]yandex.ru
------------CB510B17E2B4D9
Content-Type: image/x-icon;
 name="exit64.ico"
Content-transfer-encoding: base64
Content-Disposition: attachment;
 filename="exit64.ico"

AAABAAEAQEAAAAAAAAAoFgAAFgAAACgAAABAAAAAgAAAAAEACAAAAAAAABIAAAAAAAAAAAAA
LCzQAH9/6AA3N80AZma1AE1NzQDExPIABAaxAB8r2wAzO+wAEBCRAJiY4ACRkcwAKSm+AE9P
4wCfn+8AERGwACEh1AAjL+QAe3vMAFVVqwAAAL0AoKDdAHJy6wBzc9oAR0e4ADFD6AAKC/QA
jZT1ADlD7QAREfkAOTnjAHBwvQBOTr4AurrZAAAAzgBCQ+UAaWm/AKio2QAeHtgAkJDFADQ0
lwAnJ+IAenq9ADIz1AA7O9sASkvoAGdn0gBkZMcAR0fOAAsP3ABuc/EAHx+6AC8/7gDJyeYA
AA/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA/gAAAA
AAAAD+AAAAAAAAAP4AAAAAAAAA/gAAAAAAAAD+AAAAAAAAAP4AAAAAAAAA/gAAAAAAAAD+AA
AAAAAAAP4AAAAAAAAA/wAAAAAAAAH/AAAAAAAAAf+AAAAAAAAD/8AAAAAAAAf/4AAAAAAAD/
/8AAAAAAB///////////////////////
------------CB510B17E2B4D9
Content-Type: image/x-icon;
 name="help_index64.ico"
Content-transfer-encoding: base64
Content-Disposition: attachment;
 filename="help_index64.ico"

AAABAAEAQEAAAAAAAAAoFgAAFgAAACgAAABAAAAAgAAAAAEACAAAAAAAABIAAAAAAAAAAAAA
AAAAAAAAAAAAAAAA////ADw8/gCpkoYActr/AAAAkQBRQkIAJGqhAPnXnQCAgOAAAADtABaY
5wDDw/4AhGVlACYmuADKvLwAW1GZAAAAUgBVq9gAKB40AGqQrwAzKIEA/vnGAFZWzwCZuNIA
v/X/AJ6e/wBiYv4AMYrCAOHa2gAcHP8A77V/AAAAvQCmpK8AoNzzAMyqkgA7O9oA4eH+AHRm
AAB//8AAAAAAAP//4AAAAAAA///gAAAAAAH///AAAAAAA///+AAAAAAD///4AAAAAAf///wA
AAAAD////gAAAAAf////AAAAAD////+AAAAAf////+AAAAD/////8AAAA//////8AAAP////
//+AAH////////AD////////////////
------------CB510B17E2B4D9--

.

The task is to build an array that stores the following:

1. In element 0 of the array: the sender's e-mail address.

Search case-insensitively for From: at the start of a line, followed by whitespace after the colon.

Note that the line can look like this

From:   =?windows-1251?B?zODq8ejsINHi6PDo5O7i?= <delta-greit[well-known-symbol]yandex.ru>

and this

From:   delta-greit[well-known-symbol]yandex.ru

2. In element 1 of the array: the attachment's name.

Search case-insensitively for filename= preceded by a space.

Note that the attachment's name may be enclosed in double quotes

filename="1.rar"

or may not be

filename=2_Slavinka4_per_7.zip

3. In element 2 of the array: the attachment's body.

The body always starts after the first empty line that follows the filename= match.

The body ends at the next line that starts with at least two hyphens (--):

--=_BlatBoundary-9eALrKNKBRQkeS69PRwPa--

or

------------1239160213ED6B0--

4. Search for the next filename= substring. If one is found, store the new attachment's name and body in the next two elements of the array. Repeat until the end of the file.
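The four steps above can be sketched as a small line-by-line scanner. This is only an illustration in Python, not the language of the thread, and the names `extract_parts`, `FROM_RE` and `FILENAME_RE` are made up for the example. It assumes the body is kept as the raw base64 text with line breaks removed, and that any line starting with `--` ends a body, as described in step 3.

```python
import re

# Step 1: "From:" at the start of a line; capture only the address,
# whether or not it is wrapped in <...> after an encoded display name.
FROM_RE = re.compile(r'^From:\s+(?:.*<)?([^<>\s]+)', re.IGNORECASE)
# Step 2: "filename=" preceded by whitespace; surrounding quotes are optional.
FILENAME_RE = re.compile(r'\sfilename="?([^";\r\n]+)', re.IGNORECASE)


def extract_parts(text):
    """Return [sender, name1, body1, name2, body2, ...] for one message."""
    result = [""]        # element 0 is reserved for the sender
    state = "scan"       # scan -> headers -> body -> scan
    name, body = None, []
    for line in text.splitlines():
        if state == "scan":
            if not result[0]:
                m = FROM_RE.match(line)
                if m:
                    result[0] = m.group(1)
                    continue
            m = FILENAME_RE.search(line)
            if m:
                name = m.group(1).strip()
                state = "headers"
        elif state == "headers":
            # Step 3: the body starts after the first empty line.
            if line.strip() == "":
                state = "body"
        elif state == "body":
            # Step 3: the body ends at a line starting with "--".
            if line.startswith("--"):
                result += [name, "".join(body)]
                name, body, state = None, [], "scan"   # step 4: keep looking
            else:
                body.append(line.strip())
    if state == "body":  # file ended without a closing boundary
        result += [name, "".join(body)]
    return result
```

Anchoring the sender pattern with `match` means a header such as `Resent-from:` is not mistaken for `From:`. The base64 alphabet contains no hyphen, so a body line should never be confused with a boundary line.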

English is not my native language, so if anything is unclear, please ask me to rephrase it.

Thanks in advance

Sincerely yours, Tipulatoid


Get my SRE tester from my sig...

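From (anchored to the start of a line so that Resent-from: is not matched; captures only the address):

(?im)^From:\s+(?:.*<)?([^<>\s]+)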

Filename (the surrounding double quotes are optional and not captured):

(?i)\sfilename="?([^"\r\n]+)

I'm not sure the rest of the text can be grabbed with a single SRE. You may have to find another way to actually grab the attachment body =)
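One such "other way" is to hand the file to a MIME parser instead of matching the body with a pattern. The sketch below uses Python's standard `email` package purely as an illustration; it is not what anyone in the thread used, and the function name `read_attachments` is hypothetical. It assumes the file holds one complete message and that the real `@` is present in the addresses (the samples above show a placeholder instead).

```python
from email import message_from_string
from email.utils import parseaddr


def read_attachments(raw):
    """Return (sender_address, [(filename, decoded_bytes), ...])."""
    msg = message_from_string(raw)
    # parseaddr splits "Display Name <addr>" and returns (name, addr).
    sender = parseaddr(msg.get("From", ""))[1]
    found = []
    for part in msg.walk():
        # get_filename() reads the filename parameter of Content-Disposition,
        # quoted or not, and falls back to the name parameter of Content-Type.
        name = part.get_filename()
        if name:
            # decode=True undoes the base64 / quoted-printable transfer encoding.
            found.append((name, part.get_payload(decode=True)))
    return sender, found
```

Unlike the array of raw text described in the question, this returns the attachment as decoded bytes, ready to be written to disk with the captured filename.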

Hope this helps ya,

Szhlopp
