Omatsei

Difficulty in Theory

15 posts in this topic

I've posted a few other topics about this, and gotten some help about specific issues I was having, but I'm having some trouble in the "theory" area.

Basically, I have a text file that I have to parse, but the format of the text file changes from computer to computer. There are 2-3 main fields.

The first field is the Name. It is in double quotes if it is more than one word long or contains spaces or commas; a single word has no quotes around it. The second field is the Nickname. It follows the same quoting rules as the Name field, but it's optional. If it exists, I'd like to use it; if not, I'll just use the Name field in its place.

Here's where it gets complicated. The Name and Nickname fields exist for all contacts. The third field, the e-mail address, also exists on all contacts, but there can be more than one address, and when there is, sometimes each address has a name associated with it. Each of those names is either in quotes or in parentheses, or the e-mail address itself is enclosed in less-than / greater-than signs (which, from this point on, I'll refer to as "brackets" for simplicity). Here's an example of the file I'm trying to parse:

alias Mary mary@company.com

alias "John Smith" John john@company.com

alias "Mary's Family and Friends" "Smith, John" <john@company.com>,"Mary Smith" <mary@company.com>,"'bill@elsewhere.net'" <bill@elsewhere.net>, Steve <steve@company.com>

The difficulty I'm having is really in visualizing what I'm trying to do. At the moment, I'm thinking that if I could step through each line one character at a time, I could extract each element and do whatever I want with it. However, I can't seem to figure out the best (or even a possible) way to do it. It should be as simple as taking each character and, if it's a quote, taking whatever is between those quotes, assigning it to the Name / Nickname variable, then moving on... but getting that into somewhat readable code is apparently rather difficult. It's made worse by the fact that no character is seemingly "safe" to use as a field separator, which led me to think that stepping through each line is the best way to approach it.
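The character-stepping idea described above can be sketched roughly as follows. This is only a minimal illustration, not code from the thread: the function names `_SplitAliasLine` and `_AddToken` are hypothetical, and the sketch assumes that only double quotes group words and that unquoted spaces and commas separate tokens.

```autoit
; Hypothetical helper, not from the thread: split one address-book line into
; tokens by stepping through it one character at a time.
Func _SplitAliasLine($sLine)
    Local $aTokens[1]               ; element 0 holds the token count
    $aTokens[0] = 0
    Local $sToken = "", $bInQuotes = False, $sChar
    For $i = 1 To StringLen($sLine)
        $sChar = StringMid($sLine, $i, 1)
        If $sChar == '"' Then
            $bInQuotes = Not $bInQuotes   ; a double quote opens or closes a field
        ElseIf Not $bInQuotes And ($sChar == " " Or $sChar == ",") Then
            ; an unquoted space or comma ends the current token
            If $sToken <> "" Then _AddToken($aTokens, $sToken)
            $sToken = ""
        Else
            $sToken &= $sChar
        EndIf
    Next
    If $sToken <> "" Then _AddToken($aTokens, $sToken)
    Return $aTokens
EndFunc

; Hypothetical helper: grow the array by one and store the new token.
Func _AddToken(ByRef $aTokens, $sToken)
    $aTokens[0] += 1
    ReDim $aTokens[$aTokens[0] + 1]
    $aTokens[$aTokens[0]] = $sToken
EndFunc

; Usage: print every token of one sample line.
Local $aParts = _SplitAliasLine('alias "John Smith" John john@company.com')
For $i = 1 To $aParts[0]
    ConsoleWrite($aParts[$i] & @CRLF)
Next
```

The quotes themselves are dropped, so a quoted name such as "Smith, John" arrives as a single token despite its comma. Deciding which token is a Name, a Nickname, or an address is left to a later pass over the array.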

Any suggestions would be very helpful, as I've been thinking about it all weekend and haven't come up with anything really solid.


I would make sure you have a complete list of the possible cases. Take the line with "Mary's Family and Friends" "Smith, John" <john@company.com>: is there an example where the nickname is only one word? Would it then have quotes or not? It seems it would behave just like the Name field (otherwise, how would the data be separated?), but you should check to be sure.

What program creates the initial text file and what do you need to do with the output of the conversion?


Be open minded but not gullible. A hammer sees everything as a nail ... so don't be A tool ... be many tools.


The original file is the Address Book from Eudora. I'm trying to parse that file so I can move the contacts into Outlook. I did accomplish it once, with a reasonable success rate, but it became a mess of code well over 500 lines long, that just consisted of IF...ELSE...ENDIF statements all over the place, embedded within each other, etc. Then, I found one circumstance that wouldn't work (the 3rd line above), and re-writing the existing code proved to be far more complicated than I thought it should be.

Quotes are only used if the field has more than 1 word. I have a huge test file with 400+ entries that (I believe) contains all the possible circumstances.


It may be that regular expressions are the way to go. Search for 'StringRegExp' in the AutoIt help. There are a lot of examples of regular expressions on the web.

The following is an example reg exp for extracting email addresses:

[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]

This will roughly extract the first name without the email address (with or without quotes):

alias ([a-zA-Z0-9]*?\s|\"[a-zA-Z0-9,\@\.'\s]*?\"\s)

A couple of questions:

1. Is the middle John in the following an error?

alias "John Smith" John john@company.com

2. Are the email addresses in the following always surrounded by '<>'?

alias "Mary's Family and Friends" "Smith, John" <john@company.com>,"Mary Smith" <mary@company.com>,"'bill@elsewhere.net'" <bill@elsewhere.net>, Steve <steve@company.com>

Should be enough to get you going.


“Give a man a script; you have helped him for today. Teach a man to script; and you will not have to hear him whine for help.”
AutoIt4UE - Custom AutoIt toolbar and wordfile for UltraEdit/UEStudio users.
AutoIt Graphical Debugger - A graphical debugger for AutoIt.
SimMetrics COM Wrapper - Calculate string similarity.


Stumpii, I tried the StringRegExp last week, and only got frustrated. A couple other helpful folks on here gave me some pointers on which ones to use and how to use them (I have cursory knowledge of regular expressions, but not enough for this kind of thing), but I couldn't get it to give me the right info.

For your questions: no, the middle John in that line was intentional, to show how some fields need the quotes and others don't. In that case, John was the nickname, and "John Smith" was the name. To answer your second question: no, the e-mails aren't always in brackets. I think it's possible (although I haven't got a clue how to do it yet) to search through each line, grab each occurrence of the @ symbol, take the characters preceding it (stopping at a space, comma, bracket, quote, or double-quote), then take all the characters after it, and assign the result to a variable... then repeat the process for each e-mail address in the line.
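The "find each @ and take the characters on either side" idea can be expressed with a negated character class. This is a rough sketch rather than a validated address matcher: the pattern is an assumption of mine, it treats spaces, commas, brackets, parentheses, and both kinds of quote as the boundaries of an address, and it assumes a StringRegExp that supports flag 3 (global matches).

```autoit
#include <Array.au3>

; Sample line modeled on the thread's third example.
Local $sLine = 'alias "Mary''s Family and Friends" "Smith, John" <john@company.com>,"Mary Smith" <mary@company.com>, Steve <steve@company.com>'

; One or more "non-boundary" characters, an @, then more non-boundary characters.
; The doubled '' is how a single quote is written inside a single-quoted AutoIt string.
Local $sPattern = '[^\s,<>()''"]+@[^\s,<>()''"]+'

Local $aEmails = StringRegExp($sLine, $sPattern, 3)
If @error Then
    ConsoleWrite("No addresses found" & @CRLF)
Else
    _ArrayDisplay($aEmails, "Emails")
EndIf
```

One caveat: when a display name is itself an address, as in "'bill@elsewhere.net'" <bill@elsewhere.net>, the same address is returned twice, so the result may need de-duplicating.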

I'll try to play around with the lines you suggested to see if I can get anywhere with them. Thanks!


I tried to search through the line for all the e-mail addresses, but keep getting nonsense back...

$test = StringRegExp( $nndbase[$i], "[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]", 1)
msgbox(0, "", $test)

$nndbase[$i] is the line of text that it's currently on. It keeps giving me "11", then nothing after that. Also, I can't set the Offset to anything... if I set it, it tells me that I have an incorrect number of parameters that I'm passing to the function.


Try:

#include <Array.au3>
$Test = 'alias "Marys Family and Friends" "Smith, John" <john@company.com>,"Mary Smith" <mary@company.com>,"bill@elsewhere.net" <bill@elsewhere.net>, Steve <steve@company.com>'
$Pattern = '[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]'
$avArray = StringRegExp($Test, $Pattern, 3)
_ArrayDisplay ( $avArray, "Emails")

This will give all the e-mail addresses. Note that I removed the single quotes so that I could test with a string rather than file input. I believe StringRegExp is returning an array where the first index is the array length. Your first try may have been an array returning 11 results.
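Because the return value may or may not be an array, it helps to check it before passing it to MsgBox or indexing it. The following is a minimal sketch under the assumption of a current AutoIt release, where flag 3 yields a zero-based array of matches and sets @error when nothing matches; the sample string and the simplified pattern are mine, not the thread's.

```autoit
Local $aResult = StringRegExp('Steve <steve@company.com>, mary@company.com', '[\w.-]+@[\w.-]+', 3)
Local $iError = @error   ; capture immediately; later function calls reset @error

If $iError Or Not IsArray($aResult) Then
    ; Nothing to index: show the error code instead of the (non-array) result.
    MsgBox(0, "Result", "No match or bad pattern (@error = " & $iError & ")")
Else
    ; Show each match in turn; MsgBox cannot display an array directly.
    For $i = 0 To UBound($aResult) - 1
        MsgBox(0, "Match " & ($i + 1) & " of " & UBound($aResult), $aResult[$i])
    Next
EndIf
```

Looping with UBound avoids relying on any particular element holding the count.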

What character is used in place of <> in the group email alias? Can you give more examples covering all the different variants?

I think your best bet is to use RegExp to return all the group e-mail segments separated by spaces (but not those in quotes; refer to one of my first examples for returning names) and then run through the array determining which is an e-mail and which is a name.
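The split-then-classify approach suggested above might look something like this. It is a sketch only: both patterns are my own assumptions (not the ones posted earlier), the first keeps a quoted segment together as one item, and the second simply asks whether an item looks like an optionally bracketed address.

```autoit
Local $sLine = 'alias "Mary Smith" Mary "Smith, John" <john@company.com>, Steve <steve@company.com>'

; Either a whole quoted segment, or a run of characters up to a space, comma, or quote.
Local $aParts = StringRegExp($sLine, '"[^"]*"|[^\s,"]+', 3)

If Not @error Then
    For $i = 1 To UBound($aParts) - 1        ; element 0 is the word "alias"
        ; Flag 0 (the default) returns 1 when the item matches, 0 otherwise.
        If StringRegExp($aParts[$i], '^<?[^\s"<>]+@[^\s"<>]+>?$') Then
            ConsoleWrite("email: " & $aParts[$i] & @CRLF)
        Else
            ConsoleWrite("name:  " & $aParts[$i] & @CRLF)
        EndIf
    Next
EndIf
```

Quoted items keep their quotes here, which is what lets a quoted display name containing an @ still be classified as a name.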



I can't get any useful information at all out of any RegExp. Using the code you have there, with the newest beta of AutoIt, nothing ever pops up. If I replace the _ArrayDisplay with a "msgbox(0, '', $avArray)", I get "11" again, but $avArray isn't an array... if I try "msgbox(0, '', $avArray[0])", I get an error saying that it's not an array.

The problem is that if I had all the different variants, I could make a bunch of if...else...endif's that could correctly parse them all. It'd be really messy, but I could do it without having to worry too much about the regular expressions. However, every time I've tried listing all the possible permutations, inevitably, I find one or two that I missed and I have to revise my list. The longest list I ever got to was around 20-25 variants.


There is no reason it should work for me and not for you. Place the following in the script. It should come back with 3.2.1.14 (the latest beta) and then a list of 5 e-mails. Even if the latest beta is installed, you can still run programs using the release version.

MsgBox(0, "AutoIt Version", @AutoItVersion)


Oh, that was it. I had the beta version downloaded, but when I double-clicked something to edit it, it only opened the release version. Damn, I should have thought of that. Sorry about that...

In that case, presumably, it should be a matter of figuring out the right regular expressions to use to select each field. The regexp you gave me worked perfectly for the e-mail addresses, now I just have to find one to work for the Name and Nickname fields.


My first post had a pattern for name and nickname. It should be possible to combine with the email address one using '|'.

Here is the RegEx test program that I use:

CODE
#Include <GUIListView.au3>

#include <GUIConstants.au3>

#Region ### START Koda GUI section ### Form=F:\Programming\AutoIt Scripts\RegEx Tester\RegExp_Tester.kxf

$MainForm = GUICreate("RegEx", 856, 631, 197, 121)

$MainForm_String = GUICtrlCreateEdit("", 8, 8, 385, 393)

GUICtrlSetData(-1, StringFormat("sfasf<asd>asdf\r\nsdf<saddffff>dfdf"))

$Label1 = GUICtrlCreateLabel("Regular &Expression:", 8, 576, 104, 17)

$MainForm_RegExp = GUICtrlCreateInput("<(.*)>", 8, 600, 681, 21)

$MainForm_Check = GUICtrlCreateButton("Check", 696, 600, 73, 25, $BS_DEFPUSHBUTTON)

$MainForm_Reset = GUICtrlCreateButton("Reset", 776, 600, 73, 25, 0)

$lblFlagSetting0Result = GUICtrlCreateLabel("lblFlagSetting0Result", 512, 32, 103, 17)

$lblFlagSetting0 = GUICtrlCreateLabel("0", 424, 32, 10, 17)

$lblFlagSetting2 = GUICtrlCreateLabel("2", 424, 200, 10, 17)

$lblFlagSetting3 = GUICtrlCreateLabel("3", 424, 328, 10, 17)

$lblFlagSetting0Error = GUICtrlCreateLabel("lblFlagSetting0Error", 680, 32, 168, 17)

$lblFlagSetting2Error = GUICtrlCreateLabel("lblFlagSetting2Error", 680, 200, 168, 17)

$lblFlagSetting3Error = GUICtrlCreateLabel("lblFlagSetting3Error", 680, 328, 170, 17)

$lblFlagSetting0Extended = GUICtrlCreateLabel("lblFlagSetting0Extended", 680, 48, 167, 17)

$lblFlagSetting2Extended = GUICtrlCreateLabel("lblFlagSetting2Extended", 680, 224, 167, 17)

$lblFlagSetting3Extended = GUICtrlCreateLabel("lblFlagSetting3Extended", 680, 352, 169, 17)

$lvwFlagSetting2 = GUICtrlCreateListView("No.|Value", 464, 200, 209, 121)

$lvwFlagSetting3 = GUICtrlCreateListView("No.|Value", 464, 328, 209, 121)

$lblFlagSetting1Extended = GUICtrlCreateLabel("lblFlagSetting1Extended", 680, 96, 167, 17)

$lblFlagSetting1Error = GUICtrlCreateLabel("lblFlagSetting1Error", 680, 72, 168, 17)

$lvwFlagSetting1 = GUICtrlCreateListView("No.|Value", 464, 72, 209, 121)

$Label4 = GUICtrlCreateLabel("1", 424, 72, 10, 17)

$Label5 = GUICtrlCreateLabel("Result", 544, 8, 34, 17)

$Label6 = GUICtrlCreateLabel("RegEx Flag", 400, 8, 59, 17)

$Label7 = GUICtrlCreateLabel("4", 424, 464, 10, 17)

GUICtrlSetState(-1, $GUI_DISABLE)

$lvwFlagSetting4 = GUICtrlCreateListView("No.|Value", 464, 464, 209, 121)

GUICtrlSetState(-1, $GUI_DISABLE)

$lblFlagSetting4Error = GUICtrlCreateLabel("lblFlagSetting4Error", 680, 464, 168, 17)

GUICtrlSetState(-1, $GUI_DISABLE)

$lblFlagSetting4Extended = GUICtrlCreateLabel("lblFlagSetting4Extended", 680, 488, 167, 17)

GUICtrlSetState(-1, $GUI_DISABLE)

$edStore = GUICtrlCreateEdit("", 8, 408, 385, 161)

GUICtrlSetData(-1, "Temporary Storage Area:")

GUISetState(@SW_SHOW)

#EndRegion ### END Koda GUI section ###

;~ _GUICtrlListViewJustifyColumn(-1, 0, 2)

_GUICtrlListViewSetColumnWidth (-1, 0, 33)

_GUICtrlListViewSetColumnWidth (-1, 1, 215)

GUISetState()

Dim $MatchedResult, $MatchedAmount, $String, $RegExp, $CheckType, $AboutDialog = 0, $About_ButtonOK

Dim $StatusDefault = GUICtrlRead($lblFlagSetting0Result)

While 1

$msg = GUIGetMsg()

Select

Case $msg = $GUI_EVENT_CLOSE

ExitLoop

Case $msg = $MainForm_Reset

GUICtrlSetData($MainForm_String, "")

GUICtrlSetData($MainForm_RegExp, "")

GUICtrlSetState($MainForm_String, $GUI_FOCUS)

Case $msg = $MainForm_Check

_GUICtrlListViewDeleteAllItems ($lvwFlagSetting1)

_GUICtrlListViewDeleteAllItems ($lvwFlagSetting2)

_GUICtrlListViewDeleteAllItems ($lvwFlagSetting3)

_GUICtrlListViewDeleteAllItems ($lvwFlagSetting4)

$String = GUICtrlRead($MainForm_String)

$RegExp = GUICtrlRead($MainForm_RegExp)

Select

Case $String = ""

GUICtrlSetState($MainForm_String, $GUI_FOCUS)

Case $RegExp = ""

GUICtrlSetState($MainForm_RegExp, $GUI_FOCUS)

Case Else

; Test with flag 0 - Returns 1 (matched) or 0 (no match)

$MatchedResult = StringRegExp($String, $RegExp, 0)

$Error = @error

$Extended = @extended

Select

Case $MatchedResult = 1

GUICtrlSetData($lblFlagSetting0Result, "Result: Match found")

Case Else

GUICtrlSetData($lblFlagSetting0Result, "Result: Match not found")

EndSelect

Select

Case $Error = 2

GUICtrlSetData($lblFlagSetting0Error, "@Error: Bad Pattern.")

GUICtrlSetData($lblFlagSetting0Extended, "@Extended: Error offset: " & $Extended)

Case Else

GUICtrlSetData($lblFlagSetting0Error, "@Error: Executed properly.")

GUICtrlSetData($lblFlagSetting0Extended, "@Extended: N/A.")

EndSelect

; Test with flag 1 - Return array of matches.

$MatchedResult = StringRegExp($String, $RegExp, 1)

$Error = @error

$Extended = @extended

$MatchedAmount = UBound($MatchedResult)

For $i = 0 To $MatchedAmount - 1

GUICtrlCreateListViewItem(($i + 1) & "|" & $MatchedResult[$i], $lvwFlagSetting1)

Next

Select

Case $Error = 0

GUICtrlSetData($lblFlagSetting1Error, "@Error: Executed properly.")

GUICtrlSetData($lblFlagSetting1Extended, "@Extended: Next offset: " & $Extended)

Case $Error = 1

GUICtrlSetData($lblFlagSetting1Error, "@Error: Array invalid.")

GUICtrlSetData($lblFlagSetting1Extended, "@Extended: N/A.")

Case $Error = 2

GUICtrlSetData($lblFlagSetting1Error, "@Error: Pattern invalid.")

GUICtrlSetData($lblFlagSetting1Extended, "@Extended: Error offset: " & $Extended)

EndSelect

; Test with flag 2 - Return array of matches including the full match (Perl / PHP style).

$MatchedResult = StringRegExp($String, $RegExp, 2)

$Error = @error

$Extended = @extended

$MatchedAmount = UBound($MatchedResult)

For $i = 0 To $MatchedAmount - 1

GUICtrlCreateListViewItem(($i + 1) & "|" & $MatchedResult[$i], $lvwFlagSetting2)

Next

Select

Case $Error = 0

GUICtrlSetData($lblFlagSetting2Error, "@Error: Executed properly.")

GUICtrlSetData($lblFlagSetting2Extended, "@Extended: Next offset: " & $Extended)

Case $Error = 1

GUICtrlSetData($lblFlagSetting2Error, "@Error: Array invalid.")

GUICtrlSetData($lblFlagSetting2Extended, "@Extended: N/A.")

Case $Error = 2

GUICtrlSetData($lblFlagSetting2Error, "@Error: Pattern invalid.")

GUICtrlSetData($lblFlagSetting2Extended, "@Extended: Error offset: " & $Extended)

EndSelect

; Test with flag 3 - Return array of global matches.

$MatchedResult = StringRegExp($String, $RegExp, 3)

$Error = @error

$Extended = @extended

$MatchedAmount = UBound($MatchedResult)

For $i = 0 To $MatchedAmount - 1

GUICtrlCreateListViewItem(($i + 1) & "|" & $MatchedResult[$i], $lvwFlagSetting3)

Next

Select

Case $Error = 0

GUICtrlSetData($lblFlagSetting3Error, "@Error: Executed properly.")

GUICtrlSetData($lblFlagSetting3Extended, "@Extended: N/A")

Case $Error = 1

GUICtrlSetData($lblFlagSetting3Error, "@Error: Array invalid.")

GUICtrlSetData($lblFlagSetting3Extended, "@Extended: N/A.")

Case $Error = 2

GUICtrlSetData($lblFlagSetting3Error, "@Error: Pattern invalid.")

GUICtrlSetData($lblFlagSetting3Extended, "@Extended: Error offset: " & $Extended)

EndSelect

; Test with flag 4 - Return an array of arrays containing global matches including the full match (Perl / PHP style).

$MatchedResult = StringRegExp($String, $RegExp, 4)

$Error = @error

$Extended = @extended

$MatchedAmount = UBound($MatchedResult)

For $i = 0 To $MatchedAmount - 1

GUICtrlCreateListViewItem(($i + 1) & "|" & $MatchedResult[$i], $lvwFlagSetting4)

Next

Select

Case $Error = 0

GUICtrlSetData($lblFlagSetting4Error, "@Error: Executed properly.")

GUICtrlSetData($lblFlagSetting4Extended, "@Extended: N/A")

Case $Error = 1

GUICtrlSetData($lblFlagSetting4Error, "@Error: Array invalid.")

GUICtrlSetData($lblFlagSetting4Extended, "@Extended: N/A.")

Case $Error = 2

GUICtrlSetData($lblFlagSetting4Error, "@Error: Pattern invalid.")

GUICtrlSetData($lblFlagSetting4Extended, "@Extended: Error offset: " & $Extended)

EndSelect

EndSelect

EndSelect

WEnd

Edited by Stumpii


I tried the pattern from your post, but had some problems with it. I figured it was time for me to learn a bit more about regular expressions, so I gave it a go and worked out one that does what I need. For anyone else who needs it, here's what I ended up with: "alias (""[^""\r\n]*""|.*?\s)" It matches any string within quotes or, if no quote follows the word "alias", takes the next word.
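As a quick illustration of that pattern in use, here is a small sketch that runs it over three sample lines and tidies the result. The clean-up step is an addition of mine: the unquoted branch (.*?\s) captures the trailing whitespace along with the word, and the quoted branch keeps the quotes, so the sketch trims both.

```autoit
Local $aLines[3]
$aLines[0] = 'alias Mary mary@company.com'
$aLines[1] = 'alias "John Smith" John john@company.com'
$aLines[2] = 'alias "Mary''s Family and Friends" "Smith, John" <john@company.com>'

For $i = 0 To UBound($aLines) - 1
    ; Flag 1 returns the captured group(s) of the first match as an array.
    Local $aName = StringRegExp($aLines[$i], "alias (""[^""\r\n]*""|.*?\s)", 1)
    If @error Then ContinueLoop

    ; Strip leading/trailing whitespace (flag 3), then remove the double quotes.
    Local $sName = StringReplace(StringStripWS($aName[0], 3), '"', '')
    ConsoleWrite($sName & @CRLF)
Next
```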

The next problem I found, which I'm not even sure is worth trying to fix, revolves around finding the second field, Nickname. I have a line in my code that finds that field correctly, but it fails if the Name field has any parentheses in it. I know it fails because StringRegExp starts the search with the first parenthesis it finds, which is why I'm not sure it's worth fixing. Unless anyone knows of a really simple (read: 5 lines or less) way of fixing it, I'll just let it go.

The line is:

$nicknamearray = StringRegExp( $nndbase[$i], "alias " & $namearray[0] & " (""[^""\r\n]*""|.*\s)", 3)
Edited by Omatsei
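One possible cause of the parenthesis failure is that the captured name is pasted into the next pattern, where ( and ) act as grouping metacharacters rather than literal text. A short sketch of one way around that, assuming the PCRE-based StringRegExp: wrap the inserted name in \Q...\E so it is matched verbatim. The sample line is made up, and \S+ replaces the original .*?\s so the capture carries no trailing space; both are my own variations, not the code from the post.

```autoit
Local $sLine = 'alias "Smith (work)" Smitty smith@company.com'

; First field: a quoted string, or a single run of non-space characters.
Local $aName = StringRegExp($sLine, "alias (""[^""\r\n]*""|\S+)", 1)
If Not @error Then
    ; \Q...\E makes the regex engine treat the inserted name literally,
    ; parentheses included.
    Local $sPattern = "alias \Q" & $aName[0] & "\E\s+(""[^""\r\n]*""|\S+)"
    Local $aNick = StringRegExp($sLine, $sPattern, 1)
    If Not @error Then ConsoleWrite("Nickname: " & $aNick[0] & @CRLF)
EndIf
```

When a line has no nickname, the second capture will pick up the first address instead, so the result still needs a check (for example, for an @) before it is used as a nickname.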


I think I can get something workable but I have to go for today.

Also, I found a link on the Eudora mail file format. The format may be old and I don't know what version of Eudora is even out. http://alf.uib.no/eudora/kens/nickname.html

It shows the format as this: alias Nick_Name Real Name <email@address> and mentions that the Real Name is optional.

Unless this is another version or I missed a correction, let us know if this is what you are dealing with. The link also mentions the format for multiple recipients or mailing lists.

I tried messing with String Expressions, but it is a little confusing... your program helps immensely with tinkering with it to understand it, though. It seems like it may be difficult to use one single string expression to pull all the data out of one line, but it may be very useful for pulling out data once you know what is there... I don't know... like I said, it is still confusing.

Fun to work on though.

