Jump to content

string split escape/ignore quoted delimiters


timevex
 Share

Recommended Posts

I have a large set of tab separated and comma separated files that I'd like import into an sqlite data base using sqlite.au3 however I'm having trouble splitting the fields correctly due to separators within double quoted text.

example;

Field1,"Field 2", Field3

12334,"Quoted Text","city,state"

Does anyone know a clean method to do string split that ignores double quoted delimiters?

Would replacing quoted delimiters with a temporary string via a regular expressions replace work?

I'm horrible at regular expressions, would anyone be able to provide the regular expression syntax to replace delimiters inside double quotes?

Example;

String: (1234,"quoted text string","quoted string containing a delimiter, example")

SearchString: any double quoted commas

ReplaceString: "!+!"

Output: (1234,"quoted text string","quoted string containing a delimiter!+! example")

Thoughts?

Thanks in advance!

Tim

Link to comment
Share on other sites

$aRow = _StringSplitRegExp($sRow, ',(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))')

#cs ----------------------------------------------------------------------------

 AutoIt Version: 3.3.0.0
 Author: WeaponX

 Script Function:
 Split string on regular expression
 
 Parameters:
 String = String to be split
 Pattern = Pattern to split on
 IncludeMatch = True / False - Indicates whether or not to include the match in the return (back-reference)
 Count = Number of splits to perform

#ce ----------------------------------------------------------------------------
Func _StringSplitRegExp($sString, $sPattern, $sIncludeMatch = false, $iCount = 0)
 
 ;All matches will be replaced with this string
 Local $sReservedPattern = Chr(0)
 ;Local $sReservedPattern = "#"
 Local $sReplacePattern = $sReservedPattern
 
 ;Modify the reserve pattern to include back-reference
 If $sIncludeMatch Then $sReplacePattern = "$0" & $sReplacePattern
 
 ;Replace all occurences of the search pattern with a replace string
 $sTemp = StringRegExpReplace($sString, $sPattern, $sReplacePattern, $iCount)
 
 ;Consolewrite($sTemp & @CRLF)
 
 ;Strip trailing character if it matches the reserved pattern
 If StringRight($sTemp, 1) = $sReservedPattern Then $sTemp = StringTrimRight($sTemp, 1)
 
 ;Split string using entire reserved string
 $aResult = StringSplit($sTemp, $sReservedPattern, 1)

 Return $aResult
EndFunc

Edited by weaponx
Link to comment
Share on other sites

#include <Array.au3>

Local $avStrings[3][3] = [['Field1,"Field 2", Field3,"try," "to,", ",beat this, ", one', ',', '"'], ["12334 'Quoted Text'   'city   state'", @TAB, "'"], ["@Lorem#ispm@#dolor#@sit amet@", '#', '@']]
Local $avMatches

For $i = 0 To UBound($avStrings)-1
    $avMatches = _StringSplitSV($avStrings[$i][0], $avStrings[$i][1], $avStrings[$i][2])
    If Not @error Then _ArrayDisplay($avMatches)
Next

Func _StringSplitSV($sString, $sSepChr = ",", $sSpecialChr = '"')
    Return StringRegExp($sString, "\G(?:\Q" & $sSepChr & "\E|^)((?>[^\Q" & $sSepChr & $sSpecialChr & "\E]*(?:" & $sSpecialChr & "[^\Q" & $sSpecialChr & "\E]*" & $sSpecialChr & ")?)+)", 3)
EndFunc

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
 Share

  • Recently Browsing   0 members

    • No registered users viewing this page.
×
×
  • Create New...