leuce

MUAcount, for counting MUAs in an MBOX file



G'day everyone

I wanted to know which mail programs are most popular among the people who send me mail, particularly those on high-traffic mailing lists such as Yahoogroups or Googlegroups. I use Thunderbird, which stores mail in MBOX, a plaintext format, and most mail clients identify themselves in the mail headers. Unfortunately they do so in a variety of ways: some use "X-Mailer", others use "User-Agent", and still others use other headers.

So, this script compares a list of mail clients against every line in an MBOX file that contains a colon. If you can think of better, easier ways of counting mail clients, let me know.

Issues:

* I have 2 GB RAM on my computer, and the script refuses to open an MBOX file of 200 000 KB (saying it's too big).

* If there is a colon in the body of a mail, and the name of a mail client, the script will count it.

* Matching is by substring, so the results for Elm include the results for Elmo.

* I have no idea how Global works (hence all them Globals).

* You need a list of mail clients (see attached list (feel free to improve it)).

#Include <Timers.au3>

#cs
MUAcount, by Samuel Murray - A small program that counts the number of occurrences of mail clients' names in an MBOX file in all lines that contain a colon.

Tweaks:  First check to see if mail client name occurs in the entire MBOX file before checking the MBOX file line by line;  TrayTip or ToolTip to tell the user about the progress.
#ce

Global $i
Global $j
Global $k
Global $l
Global $mboxfile
Global $mboxfileopen
Global $mboxfileread
Global $mboxfilesplit
Global $clientlistfile
Global $clientlistfileopen
Global $clientlistfileread
Global $clientlistfilesplit
Global $writefileopen
Global $writefilewrite
Global $starttime

$k = 0
$l = 0

; First, open the MBOX file and segment it by line

$mboxfile = FileOpenDialog ("Select the MBOX file", "", "All (*.*)")
; to specify more file types, do this: "All (*.*)|Text files (*.txt)"

$mboxfileopen = FileOpen ($mboxfile, 0)
; use 0 for ANSI, 32 for UTF16LE and 128 for UTF8

$mboxfileread = FileRead ($mboxfileopen)
$mboxfilesplit = StringSplit ($mboxfileread, @CRLF, 1)
; you can also use @CR and @LF for other line endings

MsgBox (0, "Number of lines in MBOX file", $mboxfilesplit[0], 0)
; TrayTip ("Number of lines in MBOX file", $mboxfilesplit[0] & " lines will be checked.", 20)
; I think the initial TrayTip may cause the script to malfunction if the user takes too long (not sure)

; Next, read the mail client list and segment it by line

$clientlistfile = FileOpenDialog ("Select the list of mail clients", "", "All (*.*)")
$clientlistfileopen = FileOpen ($clientlistfile, 0)
$clientlistfileread = FileRead ($clientlistfileopen)
$clientlistfilesplit = StringSplit ($clientlistfileread, @CRLF, 1)

; MsgBox (0, "Number of mail clients to check for", $clientlistfilesplit[0], 0)

$writefileopen = FileOpen ("countfile.txt", 1)
FileWrite ($writefileopen, @CRLF & "==" & @CRLF & @CRLF & "These are the number of times that a mail client's name occurs in a line in the MBOX file that also contains a colon.  Most lines with colons that also contain a mail client's name are header lines, and probably the ones that identify the mail client of the sender.  The count is therefore only approximate, but generally close to the truth.  Another problem is that matching is by substring, so the count for 'Elm' will include the count for 'Elmo', unfortunately.  The line-by-line search is case-sensitive." & @CRLF & @CRLF)
FileClose ($writefileopen)

$starttime = _Timer_Init()

; Next, check the two arrays against each other, and if a match, write it to a file.

For $j = 1 to $clientlistfilesplit[0]
    ; Quick pre-check (not case-sensitive): skip the line-by-line scan if the name is nowhere in the file
    If StringInStr ($mboxfileread, $clientlistfilesplit[$j]) Then
        For $i = 1 to $mboxfilesplit[0]
            If StringInStr ($mboxfilesplit[$i], ":", 0) Then
                ; $l counts every line check, so it grows with each client that is scanned
                $l = $l + 1
                If StringInStr ($mboxfilesplit[$i], $clientlistfilesplit[$j], 1) Then
                    ; 0 not case-sensitive (user's locale, default), 1 case-sensitive, 2 not case-sensitive (basic comparison)
                    $k = $k + 1
                EndIf
            EndIf
        Next
        $writefileopen = FileOpen ("countfile.txt", 1)
        $writefilewrite = FileWrite ($writefileopen, $clientlistfilesplit[$j] & @TAB & $k & @CRLF)
        Sleep (100)
        FileClose ($writefileopen)
        ; TrayTip ("Mail client found", $clientlistfilesplit[$j] & " found " & $k & " times", 1)
        ToolTip ($clientlistfilesplit[$j] & " found " & $k & " times", 0, 0)
        $k = 0
    Else
        ; TrayTip ("Mail client not found", $clientlistfilesplit[$j] & " not found", 1)
        ToolTip ($clientlistfilesplit[$j] & " not found", 0, 0)
        $writefileopen = FileOpen ("countfile.txt", 1)
        $writefilewrite = FileWrite ($writefileopen, $clientlistfilesplit[$j] & @TAB & "0" & @CRLF)
        Sleep (100)
        FileClose ($writefileopen)
    EndIf
    ; If you want the ToolTip to go away after 1 second, do this:
    ; Sleep (1000)
    ; ToolTip ("")
Next

; MsgBox (0, "Report on MBOX file", $l & " lines checked, in " & _Timer_Diff($starttime) / 1000 & " seconds.", 0)
TrayTip ("Done!", $l & " lines checked, in " & _Timer_Diff($starttime) & " milliseconds.", 1)
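
Two of the issues listed above, the file-size limit and the overlap between Elm and Elmo, might be avoided by reading the file one line at a time and matching client names as whole words. The following is only a rough sketch of that idea, written in Python rather than AutoIt. The function name count_clients and the choice of latin-1 decoding are my own assumptions, and it keeps the same "line contains a colon" heuristic as the script, so body lines with a colon are still counted.

```python
import re
from collections import Counter


def count_clients(mbox_path, client_names):
    """Count colon-containing lines that mention each client as a whole word.

    count_clients is a hypothetical helper for illustration, not part of MUAcount.
    """
    # The lookarounds require a non-word character (or the line edge) on both
    # sides of the name, so 'Elm' no longer matches inside 'Elmo'.
    patterns = {
        name: re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)")
        for name in client_names
    }
    counts = Counter({name: 0 for name in client_names})

    # Iterating over the file object yields one line at a time, so the whole
    # MBOX file never has to fit in memory. latin-1 is an assumption: it can
    # decode any byte, which avoids errors on mixed-encoding mail.
    with open(mbox_path, "r", encoding="latin-1") as mbox:
        for line in mbox:
            if ":" not in line:
                continue  # same heuristic as the script: only lines with a colon
            for name, pattern in patterns.items():
                if pattern.search(line):  # case-sensitive, like the script
                    counts[name] += 1
    return counts
```

Calling count_clients("Inbox", ["Elm", "Elmo", "Thunderbird"]) would return a Counter keyed by client name, which could then be written out tab-separated in the same way as countfile.txt.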

If anyone knows of a freeware program that does this (or more), please let me know. :-)

Samuel

mailprogs.zip

Edited by leuce




Someone from another mailing list said that this also works on some computers:

grep X-Mailer <mailboxfile(s)> | sort | uniq -c | sort -nr

One must just remember to repeat the action for User-Agent and for X-MimeOLE. And this line does not compare the MBOX files to a list of mail programs -- it merely gives a list of them (and every version of a program is counted as a separate client).
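
For what it's worth, the three passes could probably be folded into one. Here is a rough sketch in Python (not AutoIt, and not the grep line itself) that uses the standard mailbox module to read each message and tally whichever of the three headers it finds first. The names count_by_header and strip_version are made up for this example, and the version stripping is a crude guess that simply cuts the value at the first number, so it will mangle client names that contain digits.

```python
import mailbox
import re
from collections import Counter

# Headers mentioned in this thread; a message is counted under the first one present.
CLIENT_HEADERS = ("X-Mailer", "User-Agent", "X-MimeOLE")


def strip_version(value):
    """Crudely drop everything from the first version-like number onwards."""
    stripped = re.sub(r"[\s/]*\b[vV]?\d.*$", "", value).strip()
    return stripped or value  # fall back to the full value if nothing is left


def count_by_header(mbox_path, strip_versions=False):
    """Tally mail clients per message instead of per matching line."""
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for message in box:
            for header in CLIENT_HEADERS:
                value = message.get(header)
                if value is None:
                    continue
                value = str(value).strip()
                if strip_versions:
                    value = strip_version(value)
                counts[value] += 1
                break  # count each message once, even if it has several of the headers
    finally:
        box.close()
    return counts
```

Something like count_by_header("Inbox", strip_versions=True).most_common() would give a list sorted by frequency, similar to what sort | uniq -c | sort -nr produces, but only headers are examined, so a client name in the body of a mail is not counted.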
