[Solved] StringRegExp, weird case.


MvGulik

If you need to support quoted commas use this:

Code

$sRow = 'data1,"data2,a,b",,data3,,,data4'

$result = _StringSplitRegExp($sRow, ',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')
;$result = _StringSplitRegExp($sRow, ',')
For $X = 1 to $result[0]
    ConsoleWrite('[' & $X & ']: ' & $result[$X] & @CRLF)
Next

#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.2.10.0
 Author: WeaponX

 Script Function:
    Split string on regular expression

 Parameters:
    String = String to be split
    Pattern = Pattern to split on
    IncludeMatch = True / False - Indicates whether or not to include the match in the return (back-reference)
    Count = Number of splits to perform

#ce ----------------------------------------------------------------------------
Func _StringSplitRegExp($sString, $sPattern, $sIncludeMatch = false, $iCount = 0)

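    ;All matches will be replaced with this reserved separator.
    ;It must not occur anywhere in the data, otherwise the final StringSplit
    ;will also split on it. "#" is used here; an earlier Chr(0) assignment was
    ;immediately overwritten, so it is dropped.
    Local $sReservedPattern = "#"
    Local $sReplacePattern = $sReservedPattern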

    ;Modify the reserve pattern to include back-reference
    If $sIncludeMatch Then $sReplacePattern = "$0" & $sReplacePattern

    ;Replace all occurrences of the search pattern with the replace string
    Local $sTemp = StringRegExpReplace($sString, $sPattern, $sReplacePattern, $iCount)

    ;Consolewrite($sTemp & @CRLF)

    ;Strip trailing character if it matches the reserved pattern
    If StringRight($sTemp, 1) = $sReservedPattern Then $sTemp = StringTrimRight($sTemp, 1)

    ;Split string using entire reserved string
    Local $aResult = StringSplit($sTemp, $sReservedPattern, 1)

    Return $aResult
EndFunc

Output

[1]: data1
[2]: "data2,a,b"
[3]: 
[4]: data3
[5]: 
[6]: 
[7]: data4
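
The same lookahead idea can be tried outside AutoIt. Below is a minimal sketch in Python, used here only as a neutral illustration: Python's `re` module is not PCRE, so agreement with AutoIt on other inputs is an assumption, and `split_row` is a hypothetical helper name. The pattern is the one from the post, with the backslashes before the quotes dropped because they are not needed.

```python
import re

# A comma is a separator only if the quotes that follow it are balanced,
# i.e. the comma itself is not inside a quoted field.
UNQUOTED_COMMA = re.compile(r',(?=(?:[^"]*"[^"]*")*(?![^"]*"))')


def split_row(row):
    """Split one CSV-like row on commas that are outside double quotes."""
    return UNQUOTED_COMMA.split(row)


fields = split_row('data1,"data2,a,b",,data3,,,data4')
for index, field in enumerate(fields, start=1):
    print(f"[{index}]: {field}")
```

Unlike `_StringSplitRegExp` above, this sketch does not strip a trailing empty field when the row ends with a separator, and it does not handle doubled quotes (`""`) inside quoted fields.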

If you want to read up more on PCRE and see where the actual Perl version and the AutoIt version differ, try this page:

http://www.pcre.org/pcre.txt

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

whatever Edited by MvGulik

"Straight_and_Crooked_Thinking" : A "classic guide to ferreting out untruths, half-truths, and other distortions of facts in political and social discussions."
"The Secrets of Quantum Physics" : New and excellent 2 part documentary on Quantum Physics by Jim Al-Khalili. (Dec 2014)

"Believing what you know ain't so" ...

Knock Knock ...
 

the REGex TESTER website

Nice site.

You may also want to download and try the RegExp Coach. It sticks to PCRE only but offers a number of goodies, including a step-by-step visual "debugger" which greatly helps in understanding what really happens under the hood.

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

Yes, it's likely a distinct set of build options. Look at PCRE sources and .configure and you'll see that there are various options that condition the flavor or mix of flavors that will be built.

The issue discussed here may not qualify as a bug, but rather as a side effect of the build options that the AutoIt team decided to use. These options are likely a bit specific to AutoIt.

I use 6 different implementations of regular expression engines at work, some based on the PCRE DLL, and get slightly different results on each of them. It's a right pain in the ass trying to figure out why an expression that I've been using on one system doesn't work on another, and having to rewrite it. I love the power of regular expressions but hate the fact that there is no agreed standard for the syntax used or precisely what it should do. I've also used RegexBuddy, which allows you to test an expression against 20+ different engines and implementations; that is great for developing expressions that will give the same result in the different environments I plan to use them in.

I agree with you that this is probably a deliberate choice by the AutoIt developers to stop the innocent from getting into trouble with runaway expressions, such as replacing nothing with something.

Depending on the engine used, it can be very easy to write an innocent-looking regexp that will backtrack as hell and grab 100% of one CPU for years when trying to match on some innocent string.

Been there, done that.
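
To make the backtracking remark concrete, here is a small sketch using Python's `re` module rather than PCRE, so the timings say nothing about AutoIt itself. The pattern `^(a+)+$` is a textbook example chosen for illustration, not one taken from this thread, and `time_failed_match` is a hypothetical helper.

```python
import re
import time

# Nested quantifiers: the inner and outer "+" can divide a run of "a"s in
# very many ways, and a backtracking engine tries them all before giving up.
RISKY = re.compile(r'^(a+)+$')
# Matches the same strings without nesting, so a failure is found quickly.
SAFE = re.compile(r'^a+$')


def time_failed_match(pattern, length):
    """Return the seconds spent failing to match 'aaa...ab'."""
    subject = 'a' * length + 'b'  # the trailing "b" forces the match to fail
    start = time.perf_counter()
    assert pattern.match(subject) is None
    return time.perf_counter() - start


for length in (12, 14, 16, 18):
    risky = time_failed_match(RISKY, length)
    safe = time_failed_match(SAFE, length)
    print(f"{length} chars: nested {risky:.4f}s, flat {safe:.6f}s")
```

With a backtracking engine the nested version is expected to slow down sharply with every extra character, which is why lengths are kept small here. Rewriting the pattern so that only one quantifier can consume a given character is usually the simplest cure.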

"Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the universe trying to build bigger and better idiots. So far, the universe is winning."- Rick Cook

  • Administrators

How do I find out how the AutoIt implementation (settings) differs from that? I mean, what do I compare it to?

We use the default settings as given in config.h.generic. The exception is that we also add:

#define SUPPORT_UTF8

The test application that pcre spits out as part of the build process is here: http://www.autoitscript.com/autoit3/files/beta/autoit/pcretest.exe - but I don't claim to know how to use it :D

Thanks for the clarification.

What we found a bit surprising is that pcretest (I admit I only tried the 7.8 version from the AutoIt download area) gives a different result from AutoIt. But, as we know and as others have illustrated above, regexp semantics are a kind of fuzzy area.

BTW, and only if/when you have time to do it, could you compile a pcretest with the latest PCRE source when a new version is incorporated in AutoIt? If it's not too much to ask, it would be cool to have them in sync and coming from the same compilation environment.

  • Administrators

The one I linked is the one in the current beta. (I think it's pcre 8.00)

If there are glaring differences then we might be calling pcre wrong (it's complicated, especially replacements) so if you are sure then log a bug. "Talk" around here won't usually get acted on.

@jchd: Ahh, so that's the pcretest you referred to in message #10. And I thought it was unrelated to AutoIt. :huggles:

I do my best to keep things centered on AutoIt, even if I sometimes digress a bit :D

... looking into another aspect of this pattern ...

Yup, but as I wrote some time ago here, CSV turns out to be a moving target in the real world.

Besides numerous variants in the conventions used, there are also really dumb errors.

I'm right now writing a repair function to process broken .csv files from Paypal (they send them broken, they weren't damaged during or after download):

1) The header line uses ', ' as separator (comma & space): this extra space isn't "illegal" but useless and doesn't fit direct use (e.g. my SQLite table column names).

2) Every line (including header) has an extra separator at the end (that creates an erroneous extra column with no name and no data)

3) The file is UTF-8 encoded, but text fields sometimes, seemingly at random, go through a crazy "double conversion" which destroys chars > 0x7F, e.g.:

"2010-01-04","Dupont Valérie","123456","Tapis de selle western, 3 couleurs diffÃ©rentes", ...

Both accented chars should be 'é'. The second occurrence is now two chars, 'Ã©', which are the ANSI representation of the two code units that make up 'é' in UTF-8.

So it must be that this particular field was initially stored correctly in UTF-8, was then read back as if it were ANSI, stored back this way, and sent to me in this otherwise UTF-8 file.

4) They format dates in user locale format instead of ISO (only ISO format allows direct processing, sorting, ...)

5) They format amounts in user locale format: "1 024,78" instead of a flat "1024.78" with decimal point.

User locale should only be used as a display format, not as a storage / processing format. CSV is bound to database or spreadsheet processing!
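
The repairs for points 1, 2, 3 and 5 can be sketched as follows. This is not the repair function mentioned above (the thread does not show it), only a minimal Python illustration under stated assumptions: "ANSI" is taken to mean Windows-1252, amounts are assumed to use a French-style layout, and the column names, the sample data and the helper names `fix_double_encoding`, `normalize_amount` and `repair_rows` are all made up. Dates (point 4) are left alone because the locale format is not given.

```python
import csv
import io


def fix_double_encoding(text):
    """Undo UTF-8 bytes that were misread as Windows-1252 and re-encoded."""
    try:
        return text.encode('cp1252').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not double-converted (or not repairable): keep as is


def normalize_amount(text):
    """Turn a French-style amount such as '1 024,78' into '1024.78'."""
    for separator in (' ', '\u00a0', '\u202f'):  # space, NBSP, narrow NBSP
        text = text.replace(separator, '')
    return text.replace(',', '.')


def repair_rows(raw, amount_columns=()):
    """Parse the broken export and return one dict per data row."""
    # skipinitialspace copes with the ', ' separator of the header line.
    reader = csv.reader(io.StringIO(raw), skipinitialspace=True)
    # Every line ends with an extra separator: drop the resulting empty field.
    rows = [row[:-1] if row and row[-1] == '' else row for row in reader]
    header, records = rows[0], rows[1:]
    repaired = []
    for record in records:
        fields = {}
        for name, value in zip(header, record):
            value = fix_double_encoding(value)
            if name in amount_columns:
                value = normalize_amount(value)
            fields[name] = value
        repaired.append(fields)
    return repaired


# Hypothetical two-line export modelled on the example line above.
sample = (
    '"Date", "Name", "Item", "Gross",\n'
    '"2010-01-04","Dupont Valérie",'
    '"Tapis de selle western, 3 couleurs diffÃ©rentes","1 024,78",\n'
)
repaired = repair_rows(sample, amount_columns={'Gross'})
print(repaired[0])
```

The encoding repair is a heuristic: a field that really is meant to contain a sequence like 'Ã©' would be altered too, so it is only reasonable when the damage is known to be present.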

Given that once a reasonable .CSV formatting convention is chosen, it only takes _one_ stock blonde to write a compliant output routine in whatever language you choose, I wonder where / how the great eBay-Paypal group get their programs written and tested!

And shit, they manage much money, among which is mine :D

Edited by jchd

We use the default settings as given in config.h.generic. The exception is that we also add:

#define SUPPORT_UTF8

It took me some time to realize:

Does this mean that native AutoIt UTF-16LE target and pattern strings are converted to UTF-8 before processing by PCRE and then results and replacement(s) back to UTF-16?

  • Administrators

It took me some time to realize:

Does this mean that native AutoIt UTF-16LE target and pattern strings are converted to UTF-8 before processing by PCRE and then results and replacement(s) back to UTF-16?

Yes, all AutoIt strings are UTF16-LE internally and we have to convert back and forth to UTF8 for the pcre engine. It gets real interesting when trying to deal with character positions. By interesting I mean ball breaking.
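
A small sketch of why positions are awkward: the 8-bit PCRE library reports match offsets as byte offsets into the UTF-8 subject, while a UTF-16 string is indexed in 16-bit code units, and the two diverge as soon as a non-ASCII character precedes the match. The Python below only illustrates that mapping; it is not AutoIt's actual conversion code, which the thread does not show, and `utf8_offset_to_utf16` is a hypothetical helper.

```python
def utf8_offset_to_utf16(text, byte_offset):
    """Map a byte offset in the UTF-8 form of text to a UTF-16 code-unit offset.

    Raises UnicodeDecodeError if byte_offset falls inside a multi-byte character.
    """
    prefix = text.encode('utf-8')[:byte_offset].decode('utf-8')
    return len(prefix.encode('utf-16-le')) // 2


# 'é' takes 2 bytes in UTF-8 but 1 UTF-16 unit; U+1F600 takes 4 bytes but 2 units.
subject = '\u00e9,\U0001F600,x'
byte_offset_of_x = subject.encode('utf-8').index(b'x')
print(byte_offset_of_x, utf8_offset_to_utf16(subject, byte_offset_of_x))
```

For this subject the byte offset of `x` is 8, while its UTF-16 offset is 5, so every offset coming back from the engine has to be translated before it can be used on the original string.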