MvGulik

[Solved] StringRegExp, weird case.

66 posts in this topic

#1 ·  Posted (edited)

whatever Edited by MvGulik

"Straight_and_Crooked_Thinking" : A "classic guide to ferreting out untruths, half-truths, and other distortions of facts in political and social discussions."
"The Secrets of Quantum Physics" : New and excellent 2 part documentary on Quantum Physics by Jim Al-Khalili. (Dec 2014)

"Believing what you know ain't so" ...

Knock Knock ...
 


#2 ·  Posted (edited)

whatever Edited by MvGulik

I'm afraid I'm having trouble following what's going on here. What exactly is '[0:\0][1:\1][2:\2][3:\3][4:\4][5:\5][6:\6][7:\7][8:\8][9:\9]'?
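For reference, a sketch of what such a template does: in a StringRegExpReplace() replacement string, \0 is the whole match and \1 to \9 are the texts captured by groups 1 to 9 ($0 to $9 work as well). The input and pattern below are made up for illustration only.

```autoit
; Hypothetical input: two words separated by a comma.
; \0 = whole match, \1 and \2 = text captured by groups 1 and 2.
Local $sResult = StringRegExpReplace('word1,word2', '(\w+),(\w+)', '[0:\0][1:\1][2:\2]')
ConsoleWrite($sResult & @CRLF) ; [0:word1,word2][1:word1][2:word2]
```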


#4 ·  Posted (edited)

whatever Edited by MvGulik

#5 ·  Posted (edited)

This produces the example output you gave. However, I don't really understand what you are trying to achieve in general, so it may not work with more generic examples or where you have more than 9 groups on a line. More examples of input and expected output would be useful.

Func MAIN()
    ; Up to 9 comma-separated fields; the commas after field 2 are optional.
    Local $sPattern = '([^,]*),([^,]*),?([^,]*),?([^,]*),?([^,]*),?([^,]*),?([^,]*),?([^,]*),?([^,]*)'
    Local $sSource = 'word1,word2,word3'
    ConsoleWrite('1: = ' & RE_Debug($sSource, $sPattern) & @CRLF)
    ;; output: "[x:...]" = part captured by group x.
    ;; 1: = [1:word1][0:],[2:word2][0:],[3:word3][0:]
EndFunc

Func RE_Debug($sSource, $sPattern)
    ; Label the capture of each of the 9 groups as "[n:...]".
    $sSource = StringRegExpReplace($sSource, $sPattern, '[1:\1][0:],[2:\2][0:],[3:\3][0:],[4:\4][0:],[5:\5][0:],[6:\6][0:],[7:\7][0:],[8:\8][0:],[9:\9][0:]')
    If @error Then Return SetError(@error, @extended, '')
    ; Remove the ",[n:][0:]" entries of groups 2-9 that captured nothing.
    $sSource = StringRegExpReplace($sSource, ',\[[1-9]:\]\[[0]:\]', '')
    If @error Then Return SetError(@error, @extended, '')
    Return $sSource
EndFunc

MAIN()
Edited by Bowmore

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook


#6 ·  Posted (edited)

whatever Edited by MvGulik

Does changing your match pattern to '[^,]+' help?
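A quick sketch of what that change does, on the sample string used in this thread: '[^,]+' needs at least one character, so it can never produce a zero-width match. The trade-off is that empty fields are silently dropped, which matters if field positions must be preserved.

```autoit
#include <Array.au3>

; '[^,]+' requires at least one non-comma character per match,
; so the empty field between the two commas is skipped.
Local $aWords = StringRegExp('word1,word2,,word3', '[^,]+', 3)
ConsoleWrite(_ArrayToString($aWords, '|') & @CRLF) ; word1|word2|word3
```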



#8 ·  Posted (edited)

whatever Edited by MvGulik

#9 ·  Posted (edited)

What's this line for? It seems to be replacing the text with the text found, basically doing nothing.

$sSource = StringRegExpReplace($sSource, '\[[1-9]:\]', '\1')

As for why you aren't getting a zero-width match, it's probably because AutoIt is removing it. A zero-width match is very rarely useful.

Edited by Richard Robertson


#10 ·  Posted (edited)

I fully agree with MvGulik that the [^,]* pattern should match all three word* instances, and it does with a stand-alone PCRE build.

For instance:

C:\Documents and Settings\Jean-Christophe>pcretest-7.8
PCRE version 7.8 2008-09-05

 re> /[^,]*/g
data> word1,word2,word3
 0: word1
 0:
 0: word2
 0:
 0: word3
 0:
data>
 re>

Given how simple the pattern is, could this be due to the combination of PCRE build options used for the module in the AutoIt build?

Still thinking, it escapes me!

Edited by jchd

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


#11 ·  Posted (edited)

whatever Edited by MvGulik

#12 ·  Posted (edited)

The closest I can get to what you want (at least as I understand it!) is this way:

#include <Array.au3>

; Capture a field, then consume an optional comma.
Local $sPattern = '([^,]*)(?:,?)'
Local $sSource = 'word1,word2,,word3,,'
_ArrayDisplay(StringRegExp($sSource, $sPattern, 3))
ConsoleWrite(StringRegExpReplace($sSource, $sPattern, '[0:$1]') & @CRLF)

Does that fit your bill?

Edit: sorry, no it doesn't really fit, as it will create a dummy empty capture at the end if the source doesn't have a trailing comma.

You can still easily circumvent this by appending a comma to the source anyway, and remove or ignore that last empty capture.

This is always a problem with intermediate separators, e.g. CSV: whether you create it or parse it, you always have to treat the first (resp. last) field differently, because there is no leading (resp. trailing) separator. Regexps don't naturally bend to this N+1 vs. N pattern.
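A minimal sketch of that workaround, on the sample string from this thread and treating the input as N fields separated by N-1 commas: once a comma is appended, the pattern '([^,]*),' makes every match consume a separator, so no zero-width match is involved and no dummy capture appears at the end.

```autoit
#include <Array.au3>

Local $sSource = 'word1,word2,,word3'
; Append the missing trailing separator so every field ends with a comma.
; Each match consumes at least that comma, so there is never a zero-width match.
Local $aFields = StringRegExp($sSource & ',', '([^,]*),', 3)
ConsoleWrite(_ArrayToString($aFields, '|') & @CRLF) ; word1|word2||word3
```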

Edited by jchd


....

I have this string: "word1,word2,,word3"

and would like to get the following output: "[0:word1],[0:word2],[0:],[0:word3]"

....

This is getting off topic, but if we are playing magic parlor tricks, then this works.

Local $sSource = 'word1,word2,,word3'
Local $sPattern = '(?:[^,]+)|(?:,{2})'

ConsoleWrite(StringReplace(StringRegExpReplace($sSource, $sPattern, '[0:$0]'), "[0:,,]", ",[0:],") & @CRLF)


#14 ·  Posted (edited)

whatever Edited by MvGulik

Just to be clear, you're trying to match any text between commas, yes? And then display each one using a "[0:match],[0:next match]..." style?

Why not get each match separately and then concatenate the results? Wouldn't that be a much cleaner and easier to understand solution?
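A minimal sketch of that suggestion, on the sample string from earlier in the thread: StringSplit() with flag 2 (so element 0 is not a count) returns every field, empty ones included, and a loop concatenates them into the "[0:...]" layout. Since no regex groups are involved, there is no 9-group limit.

```autoit
; Split on every comma; flag 2 = do not return the count in element 0.
Local $aFields = StringSplit('word1,word2,,word3', ',', 2)
Local $sOut = ''
For $i = 0 To UBound($aFields) - 1
    If $i > 0 Then $sOut &= ','          ; separator between fields only
    $sOut &= '[0:' & $aFields[$i] & ']'  ; wrap each field, even an empty one
Next
ConsoleWrite($sOut & @CRLF) ; [0:word1],[0:word2],[0:],[0:word3]
```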


#16 ·  Posted (edited)

whatever Edited by MvGulik

#17 ·  Posted (edited)

whatever Edited by MvGulik

And then display each one using a "[0:match],[0:next match]..." style?

Hmm, no. That's just the RE_Debug() support part.

Why not get each match separately and then concatenate the results?

I don't think I'm following you here. But it sounds slower than what I already have (#16).

I didn't realize that you were using it to preview data. I thought that you were trying to format it that way from the beginning. My bad.


#19 ·  Posted (edited)

whatever Edited by MvGulik

If you can get along with ignoring empty fields altogether, then this is simpler of course.

In most of my uses of this kind of thing I can't do that, since it would shift the data into the wrong database columns.

For example, if the incoming data is:

123,"text field ""one""", ,"",2009/12/17,,"another text"

then I need to get exactly (I prefer to deal with all in text format):

'123'
'text field "one"'
'<null>'
''
'2009/12/17'
'<null>'
'another text'

I've yet to find a reasonable way to do it in one shot (and it would break anyway if some dummy is creative enough to insert comments wherever he sees fit :D).

123,;you can see a nice comment here;"text field ""one""", ,"",2009/12/17,### yet another "comment" <style> I have encountered!! # ,"another text"

If you look around, you'll find so many variants of these "creative" formats that no definitive solution exists.

If only people would stick to RFCs and published reasonable specs!

For comma separated values:

http://en.wikipedia.org/wiki/Comma-separated_values

http://tools.ietf.org/html/rfc4180

For tab separated values: http://www.fileformat.info/info/mimetype/text/tab-separated-values/index.htm

http://www.cs.tut.fi/~jkorpela/TSV.html
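For the well-behaved case only, a two-step approach seems workable: one regex splits the record into raw fields, then each field is post-processed. This is a sketch under stated assumptions (one record per line, RFC 4180-style quoting with "" for a literal quote, no comments, no line breaks inside fields), not a general CSV parser; the helper name _CsvFields is hypothetical, and the '<null>' marker for blank unquoted fields is taken from the example above.

```autoit
; Hypothetical helper. Assumes well-formed RFC 4180-style quoting, no comments,
; no line breaks inside fields. Unquoted empty/blank fields -> '<null>';
; quoted fields are returned as text, even when empty.
Func _CsvFields($sLine)
    ; One capturing group per field: a quoted field or a run of non-commas.
    ; The appended comma makes every field end with a separator, so each
    ; match consumes at least one character.
    Local $aTok = StringRegExp($sLine & ',', '("(?:[^"]|"")*"|[^,]*),', 3)
    If @error Then Return SetError(1, 0, 0)
    For $i = 0 To UBound($aTok) - 1
        If StringLeft($aTok[$i], 1) = '"' Then
            ; Strip the enclosing quotes, then turn "" back into ".
            $aTok[$i] = StringReplace(StringTrimLeft(StringTrimRight($aTok[$i], 1), 1), '""', '"')
        ElseIf StringStripWS($aTok[$i], 3) = '' Then
            ; Unquoted and empty (or blank): treat as a missing value.
            $aTok[$i] = '<null>'
        EndIf
    Next
    Return $aTok
EndFunc

Local $aRow = _CsvFields('123,"text field ""one""", ,"",2009/12/17,,"another text"')
For $i = 0 To UBound($aRow) - 1
    ConsoleWrite("'" & $aRow[$i] & "'" & @CRLF)
Next
```

Comments such as the ones shown above would still break it, as would unbalanced quotes.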

Cheers,

