Jump to content
Sign in to follow this  
timevex

string split escape/ignore quoted delimiters

Recommended Posts

timevex

I have a large set of tab separated and comma separated files that I'd like import into an sqlite data base using sqlite.au3 however I'm having trouble splitting the fields correctly due to separators within double quoted text.

example;

Field1,"Field 2", Field3

12334,"Quoted Text","city,state"

Does anyone know a clean method to do string split that ignores double quoted delimiters?

Would replacing quoted delimiters with a temporary string via a regular expressions replace work?

I'm horrible at regular expressions, would anyone be able to provide the regular expression syntax to replace delimiters inside double quotes?

Example;

String: (1234,"quoted text string","quoted string containing a delimiter, example")

SearchString: any double quoted commas

ReplaceString: "!+!"

Output: (1234,"quoted text string","quoted string containing a delimiter!+! example")

Thoughts?

Thanks in advance!

Tim

Share this post


Link to post
Share on other sites
weaponx

$aRow = _StringSplitRegExp($sRow, ',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')

#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.3.0.0
 Author: WeaponX

 Script Function:
 Split string on regular expression
 
 Parameters:
 String = String to be split
 Pattern = Pattern to split on
 IncludeMatch = True / False - Indicates whether or not to include the match in the return (back-reference)
 Count = Number of splits to perform

#ce ----------------------------------------------------------------------------
Func _StringSplitRegExp($sString, $sPattern, $sIncludeMatch = false, $iCount = 0)
 
 ;All matches will be replaced with this string
 Local $sReservedPattern = Chr(0)
 ;Local $sReservedPattern = "#"
 Local $sReplacePattern = $sReservedPattern
 
 ;Modify the reserve pattern to include back-reference
 If $sIncludeMatch Then $sReplacePattern = "$0" & $sReplacePattern
 
 ;Replace all occurences of the search pattern with a replace string
 $sTemp = StringRegExpReplace($sString, $sPattern, $sReplacePattern, $iCount)
 
 ;Consolewrite($sTemp & @CRLF)
 
 ;Strip trailing character if it matches the reserved pattern
 If StringRight($sTemp, 1) = $sReservedPattern Then $sTemp = StringTrimRight($sTemp, 1)
 
 ;Split string using entire reserved string
 $aResult = StringSplit($sTemp, $sReservedPattern, 1)

 Return $aResult
EndFunc

Edited by weaponx

Share this post


Link to post
Share on other sites
timevex

WeaponX,

That is absolutely beautiful!

I need to work with CSV's and tab separated files, Could you provide a regular expression string that will split on TAB's but not on double quoted TABs?

Thank you!

Tim

Share this post


Link to post
Share on other sites
Authenticity

#include <Array.au3>

Local $avStrings[3][3] = [['Field1,"Field 2", Field3,"try," "to,", ",beat this, ", one', ',', '"'], ["12334 'Quoted Text'   'city   state'", @TAB, "'"], ["@Lorem#ispm@#dolor#@sit amet@", '#', '@']]
Local $avMatches

For $i = 0 To UBound($avStrings)-1
    $avMatches = _StringSplitSV($avStrings[$i][0], $avStrings[$i][1], $avStrings[$i][2])
    If Not @error Then _ArrayDisplay($avMatches)
Next

Func _StringSplitSV($sString, $sSepChr = ",", $sSpecialChr = '"')
    Return StringRegExp($sString, "\G(?:\Q" & $sSepChr & "\E|^)((?>[^\Q" & $sSepChr & $sSpecialChr & "\E]*(?:" & $sSpecialChr & "[^\Q" & $sSpecialChr & "\E]*" & $sSpecialChr & ")?)+)", 3)
EndFunc

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
Sign in to follow this  

×