Overlord

splitting, but not on the space...

20 posts in this topic

hi,

I get a list like this:

Andrew PenningtonBill PersonsChristine Anderson PalczewskiChristopher BrassDavid PasasoukDiane SchererEd Van Den Berg

and now I want to split it into this:

Andrew Pennington

Bill Persons

Christine Anderson Palczewski

Christopher Brass

David Pasasouk

Diane Scherer

Ed Van Den Berg

HOW?




#2 ·  Posted (edited)

$list = "Andrew PenningtonBill PersonsChristine Anderson PalczewskiChristopher BrassDavid PasasoukDiane SchererEd Van Den Berg"

$arrayOfNames = StringRegExp($list,"([A-Z][a-z]+ [A-Z][a-z]+)",3)

For $name In $arrayOfNames 
    MsgBox(0,"",$name)
Next

This seems to work for me.

:P

*Edit* I see now that I got several of these wrong.

Edited by Skizmata

AutoIt changed my life.


Have a look at StringSplit. There is a good example in the help file.


Post your code because code says more then your words can. SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y. Use Opt("MustDeclareVars", 1)[topic="84960"]Brett F's Learning To Script with AutoIt V3[/topic][topic="21048"]Valuater's AutoIt 1-2-3, Class... is now in Session[/topic]Contribution: [topic="87994"]Get SVN Rev Number[/topic], [topic="93527"]Control Handle under mouse[/topic], [topic="91966"]A Presentation using AutoIt[/topic], [topic="112756"]Log ConsoleWrite output in Scite[/topic]


What would the delimiter be for a StringSplit in his example?



@Skizmata,

Thanks for your reply, this almost got it working completely.

However, there seems to be a problem with people with longer names like Ed Van Den Berg or Christine Anderson Palczewski.

They show up, respectively, as:

Ed Van

Den Berg

Christine Anderson

Any idea how to solve this?

Thanks!


#6 ·  Posted (edited)

What would the delimiter be for a StringSplit in his example?

Yes, I may have misread the first post and jumped into it too quickly. Sorry, I shall rethink this :P

Edited by bo8ster


@bo8ster,

that's not going to work...

I need First name and Surname.

not First

name

surname


#8 ·  Posted (edited)

$list = "Andrew PenningtonBill PersonsChristine Anderson PalczewskiChristopher BrassDavid PasasoukDiane SchererEd Van Den Berg"

$arrayOfNames = StringRegExp($list,"([A-Z][a-z]+ [A-Z][a-z]+(?: [A-Z][a-z]+)?(?: [A-Z][a-z]+)?)",3)

For $name In $arrayOfNames 
    MsgBox(0,"",$name)
Next

I'm sure there is a prettier way to do this, but the better, exclusive lookahead method I would use in Perl doesn't seem to be available in AutoIt at this time. This example works for me with all the example names. You could extend it to accommodate longer names by adding more instances of "(?: [A-Z][a-z]+)?" to the end of the expression.
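
For what it's worth, one way to avoid repeating "(?: [A-Z][a-z]+)?" by hand might be to put a "+" quantifier on that group, so that a name is one capitalized word followed by one or more further capitalized words. This is only a sketch under the same assumption as the expression above (every word in a name is a capital letter followed by lowercase letters); the variable names are made up for the example.

```autoit
; Sketch only: assumes every word of a name is one capital letter followed
; by lowercase letters, and that the next name starts right after a lowercase letter.
Local $sList = "Andrew PenningtonBill PersonsChristine Anderson PalczewskiChristopher BrassDavid PasasoukDiane SchererEd Van Den Berg"

; One capitalized word, then one or more " Word" repetitions.
Local $aNames = StringRegExp($sList, "([A-Z][a-z]+(?: [A-Z][a-z]+)+)", 3)

If @error Then
    ; With flag 3, @error is set when nothing matched.
    ConsoleWrite("No names matched." & @CRLF)
Else
    For $sName In $aNames
        ConsoleWrite($sName & @CRLF)
    Next
EndIf
```

The quantifier only removes the need to guess how many words a name has. Names that contain a capital in the middle of a word, a lowercase word, or a single-letter initial would still not fit this pattern.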

*PS* This was a very fun puzzle for me heh.

Edited by Skizmata


@bo8ster,

that's not going to work...

I need First name and Surname.

not First

name

surname

You're right, I see no quick, easy answer. I'll step out of this; it looks like Skizmata is on to something. :P


@Skizmata,

I could kiss you!!! If you were a girl I would try!!! thank you so much!!!

I'm glad you had fun puzzling... my hair went grey from it :-(


You're right, I see no quick, easy answer. I'll step out of this; it looks like Skizmata is on to something. :P

Thanks anyway for trying to help... like I said in my previous post, my hair is starting to go grey from this LOL :unsure:

no kiss for you tho :D


Does anyone have a better, permanently working solution? I'm getting quite a few errors here :-(


Could we see the errors?



#14 ·  Posted (edited)

C:\Documents and Settings\Evil_elf\Desktop\New Folder\auto-inviter II.au3 (112) : ==> Array variable has incorrect number of subscripts or subscript dimension range exceeded.:

$member = StringLeft($membersplit[2], StringInStr($membersplit[2], ',') - 1)

$member = StringLeft(^ ERROR

Sometimes it still doesn't pick up the full name :-(

Sandi Mc should be Sandi McCaslin

Mi Trieu should be MiMi Trieu

Bla Kočar isn't recognized

that's the input: Enrico de CapoaGodwin LueGustavo BrandaoMiMi TrieuMichelle MeltzerPatrick ClouserSandi McCaslinScot A JohnsonScott FurlongScott HomanScott SnyderSean ItalianeShannon PanoraSharon M LoweShaun HarrisShlomo HasonShoebox JohnsonSkeet FormationStephanie ThorpeStephen QuattropaniThomas BauerTony JiangTorsten AslaksenVinnie PayneWil Chan

Edited by Overlord


#15 ·  Posted (edited)

That doesn't appear to be part of the code I helped you with, and without seeing more of the code it's hard to say, but you are trying to access the third element of an array that doesn't have that many elements. Try something like this (without seeing more code it's hard to say whether this is the right approach, but based on what you are showing...):

If UBound($membersplit) > 2 Then $member = StringLeft($membersplit[2], StringInStr($membersplit[2], ',') - 1)

I could likely give you more help if I had more code to work with.
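
To illustrate the check above, here is a minimal sketch. The page texts and the "one friend named " delimiter are made up for the example, since the real code around $membersplit wasn't posted. It only shows how a loop could skip an entry when StringSplit doesn't return a second piece.

```autoit
; Hypothetical page texts; the real script would read these from the browser.
Local $aPages[2] = ["You have one friend named Sandi McCaslin, and more text", _
        "You have no friends named Patrick Clouser"]

For $sPage In $aPages
    ; With flag 1 the whole string "one friend named " is the delimiter.
    ; Element [0] holds the number of pieces returned.
    Local $membersplit = StringSplit($sPage, "one friend named ", 1)

    ; Fewer than 2 pieces means there is no $membersplit[2], so skip this page.
    If $membersplit[0] < 2 Then
        ConsoleWrite("Skipped: " & $sPage & @CRLF)
        ContinueLoop
    EndIf

    ; Keep everything before the first comma, as in the line that raised the error.
    Local $member = StringLeft($membersplit[2], StringInStr($membersplit[2], ",") - 1)
    ConsoleWrite("Found: " & $member & @CRLF)
Next
```

Checking the count in element [0] does the same job as the UBound test, and ContinueLoop lets the script move on to the next entry instead of stopping with the subscript error.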

*EDIT* You added a bit more detail after I responded. The characters in the name Blaž Kočar aren't in our regular expression, so it's no surprise it won't work. A name with a capital in the middle of it will entirely break the logic I used to build the regex I gave you. If we can't assume that a lowercase letter to the left of an uppercase letter is a name boundary, I'm not sure what logic we can use to generate the separation.

Edited by Skizmata


And sometimes it even skips them, lol:

Input: Benjamin MostlyBla KocarEnrico de CapoaGodwin LueGustavo BrandaoMiMi TrieuMichelle MeltzerPatrick ClouserSandi McCaslinScot A JohnsonScott FurlongScott HomanScott SnyderSean ItalianeShannon PanoraSharon M LoweShaun HarrisShlomo HasonShoebox JohnsonSkeet FormationStephanie ThorpeStephen QuattropaniThomas BauerTony JiangTorsten AslaksenVinnie PayneWil ChanBenjamin MostlyBla KocarEnrico de CapoaGodwin LueGustavo BrandaoMiMi TrieuMichelle MeltzerPatrick ClouserSandi McCaslinScot A JohnsonScott FurlongScott HomanScott SnyderSean ItalianeShannon PanoraSharon M LoweShaun HarrisShlomo HasonShoebox JohnsonSkeet FormationStephanie ThorpeStephen QuattropaniThomas BauerTony JiangTorsten AslaksenVinnie PayneWil Chan

This is the order in which it processed them:

Benjamin Mostly

Godwin Lue

Gustavo Brandao

Where are Bla Kocar and Enrico de Capoa?


next trial and error:

input: Bla KocarEnrico de CapoaGodwin LueScot A JohnsonScott FurlongScott HomanScott SnyderSean ItalianeShannon PanoraSharon M LoweShaun HarrisShlomo HasonShoebox JohnsonSkeet FormationStephanie ThorpeStephen QuattropaniThomas BauerTony JiangTorsten AslaksenVinnie PayneWil ChanBla KocarEnrico de CapoaGodwin LueScot A JohnsonScott FurlongScott HomanScott SnyderSean ItalianeShannon PanoraSharon M LoweShaun HarrisShlomo HasonShoebox JohnsonSkeet FormationStephanie ThorpeStephen QuattropaniThomas BauerTony JiangTorsten AslaksenVinnie PayneWil Chan

Godwin Lue

Scott Furlong

Scott Homan

Scott Snyder

Sean Italiane

Shannon Panora

Shaun Harris

Shlomo Hason

Shoebox Johnson

Skeet Formation

missing: ScotT A Johnson and Sharon M Lowe

Here's the entire relevant code as it is now:

#include <IE.au3>

; $sURL1 must be defined elsewhere in the script (not shown in this post).
$oIE = _IECreate($sURL1)
$sHTML = _IEBodyReadText($oIE)
; Keep the text between "join. " and "You haven't selected".
$output = StringSplit($sHTML, "join. ", 1)
$uninvitedlist = StringLeft($output[2], StringInStr($output[2], "You haven't selected") - 1)
ConsoleWrite($uninvitedlist & @LF)
$arrayOfNames = StringRegExp($uninvitedlist, "([A-Z][a-z]+ [A-Z][a-z]+(?: [A-Z][a-z]+)?(?: [A-Z][a-z]+)?)", 3)
For $name In $arrayOfNames
    ConsoleWrite($name & @LF)
Next


@ skizmata:

Bla Kočar : name found in friendslist, not processed

Deepika Manikchand : name NOT found in friendslist. here we get the error

Enrico de Capoa : name found in friendslist, not processed

Patrick Clouser : name NOT found in friendslist. here we get the error

Sandi McCaslin : name found in friendslist, not processed

Scott A Johnson : name NOT found in friendslist. here we get the error

Sharon M Lowe : name found in friendslist, not processed

I need to fix it so that when we get the error, the script continues to the next step. I should be able to solve that, since those pages will say "You have no friends named" instead of "You have one friend named".

So it seems logical that I'm getting the error there, since the name can't be found...

I just need an update for names like Sharon M Lowe, Sandi McCaslin and Enrico de Capoa, and maybe, just maybe, if you can manage it, for these Bla Kočar type of characters too...

Thx in advance. I'm going to bed now, I'm knackered. I'll see tomorrow what you came up with :-)


#19 ·  Posted (edited)

Wouldn't it be better to use StringRegExp or StringRegExpReplace to split the string wherever an uppercase character directly follows a lowercase character? I don't know SRE, but "([a-z][A-Z])" shows all the spots where you'd want to stick in a "|" to make your string valid for StringSplit. I'd think that would be your best crappy solution; it would work with most name formats. Surnames like McDonald or MacDonald are still a problem (as they are in all attempts at cracking your problem so far).
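
A rough sketch of that idea, for what it's worth. It assumes "|" never appears in the names, and it uses two capture groups instead of one so that the letters on either side of the boundary can be put back around the delimiter. The sample list and variable names are only for illustration.

```autoit
; Sketch only: treats every lowercase-then-uppercase pair as a name boundary.
Local $sList = "Enrico de CapoaGodwin LueScot A JohnsonSharon M LoweWil Chan"

; Put a "|" between a lowercase letter and the uppercase letter that follows it.
Local $sDelimited = StringRegExpReplace($sList, "([a-z])([A-Z])", "$1|$2")

; StringSplit returns the number of pieces in element [0].
Local $aNames = StringSplit($sDelimited, "|")
For $i = 1 To $aNames[0]
    ConsoleWrite($aNames[$i] & @CRLF)
Next
```

With this sample, names such as "Enrico de Capoa" and "Scot A Johnson" stay in one piece because the boundary test never looks at the words themselves. A name like "MiMi Trieu" or "Sandi McCaslin" would still be cut in two at the inner capital.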

You've been dealt a rotten deck of cards. I don't believe your data, as you present it, can be processed with 100% accuracy.

My big question would be: Obviously these names were separated at some point in the past... have you no way of getting the data at a point where delimiters could be inserted properly?

Edit: Reading one of Skizmata's last posts, I guess he is using the lower-upper "xX" sequence for his split, although his SRE string confuses me (most of them do). I just don't see why there needed to be iterations in the string for each word in a name when it's so easy to identify the split points. Yes, there are lots of other "Mc" and "Mac" names than McDonald and MacDonald. To restate my prior opinion, you're screwed.

Edit2: Where are you pulling this FriendsList from? Is there no possibility of pulling the names one at a time, rather than lumped into one string with no delimiters?

Edited by Spiff59


You've been dealt a rotten deck of cards. I don't believe your data, as you present it, can be processed with 100% accuracy.

Edit: Reading one of Skizmata's last posts, I guess he is using the lower-upper "xX" sequence for his split, although his SRE string confuses me (most of them do). I just don't see why there needed to be iterations in the string for each word in a name when it's so easy to identify the split points. Yes, there are lots of other "Mc" and "Mac" names than McDonald and MacDonald. To restate my prior opinion, you're screwed.

Edit2: Where are you pulling this FriendsList from? Is there no possibility of pulling the names one at a time, rather than lumped into one string with no delimiters?

All good points and questions for Overlord to address. My answer was based on the logic that a name split happens at the "xX" boundary. If Overlord can explain the split logic needed, I can write a regex for it, but as Spiff pointed out, your data set seems too erratic for good parsing. All your examples of names that don't work clearly don't fit the regex and shouldn't work.

In my opinion you will need to do one of the following...

1. Reexamine the source of this data. Whatever is giving you a concatenated pile of names knew what the names were before it gave them to you; perhaps you could get the data before the concatenation. If we had any delimiter at all, this would be gravy.

2. Come up with logic sufficient to handle the complexity of the names.

The first seems a lot easier than the second to me.

