StringRegExp hell


Hi guys,

I have a problem: I have some XML and I want to extract part of it.

<?xml version="1.0" encoding="UTF-8"?>
<GetSuggestedCategoriesResponse xmlns="urn:ebay:apis:eBLBaseComponents"><Timestamp>2016-08-24T23:21:20.616Z</Timestamp><Ack>Success</Ack><Version>979</Version><Build>E979_UNI_API5_18046952_R1</Build><SuggestedCategoryArray><SuggestedCategory><Category><CategoryID>9355</CategoryID><CategoryName>Cellulari e smartphone</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName></Category><PercentItemFound>96</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>20349</CategoryID><CategoryName>Cover e custodie</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName><CategoryParentName>Accessori cellulari e palmari</CategoryParentName></Category><PercentItemFound>3</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>43304</CategoryID><CategoryName>Cellulari: componenti</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName></Category><PercentItemFound>1</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>123422</CategoryID><CategoryName>Cavi e adattatori</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName><CategoryParentName>Accessori cellulari e palmari</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>1383</CategoryID><CategoryName>Gadget</CategoryName><CategoryParentID>1</CategoryParentID><CategoryParentID>8894</CategoryParentID><CategoryParentName>Collezionismo</CategoryParentName><CategoryParentName>Sorpresine e 
gadget</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>3516</CategoryID><CategoryName>Manuali e risorse</CategoryName><CategoryParentID>58058</CategoryParentID><CategoryParentName>Informatica</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>42425</CategoryID><CategoryName>Altri accessori per cellulari</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName><CategoryParentName>Accessori cellulari e palmari</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>112190</CategoryID><CategoryName>Varese</CategoryName><CategoryParentID>1</CategoryParentID><CategoryParentID>914</CategoryParentID><CategoryParentID>20245</CategoryParentID><CategoryParentID>112173</CategoryParentID><CategoryParentName>Collezionismo</CategoryParentName><CategoryParentName>Cartoline</CategoryParentName><CategoryParentName>Paesaggistiche italiane</CategoryParentName><CategoryParentName>Lombardia</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>58540</CategoryID><CategoryName>Proteggi schermo</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName><CategoryParentName>Accessori cellulari e palmari</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory><SuggestedCategory><Category><CategoryID>123417</CategoryID><CategoryName>Caricabatterie e dock</CategoryName><CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName><CategoryParentName>Accessori 
cellulari e palmari</CategoryParentName></Category><PercentItemFound>0</PercentItemFound></SuggestedCategory></SuggestedCategoryArray><CategoryCount>10</CategoryCount></GetSuggestedCategoriesResponse>

I want to extract only the CategoryID, CategoryName, CategoryParentID and CategoryParentName values.

I used this code:

Local $aArray_link_altrePAG1 = StringRegExp($Reslut_getitemstatus, '([^>]*)', 3)

but it does not work well, because:

I do not want </PercentItemFound or the number associated with it,

and I cannot get rid of the tags

</Category

</CategoryParentName

<CategoryParentID

<CategoryID

etc.

Can someone help me? Thanks to all.

 

 

 

 


You could use something like this, but as the XML is a little inconsistent the results will be difficult to work with :(
A dedicated XML UDF would probably be of more help.

#Include <Array.au3>

$txt = FileRead("1.xml")
$tmp = StringRegExp($txt, '(<CategoryID>|<CategoryName>|<CategoryParentID>|<CategoryParentName>)([^<]*)', 3)
; _ArrayDisplay($tmp)

$res = _ArrayDiv($tmp, 2)
_ArrayDisplay($res)

Func _ArrayDiv($array, $div, $cols = $div)
   If Mod(UBound($array), $div) <> 0 Then Return SetError(1, 0, 0)
   If $cols < $div Then Return SetError(2, 0, 0)
   Local $tmp[UBound($array)/$div][$cols]
   For $i = 0 to UBound($array)-1 step $div
       For $j = 0 to $div-1
          $tmp[$i/$div][$j] = $array[$i+$j]
       Next
   Next
   Return $tmp
EndFunc
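
As an illustration of the "dedicated XML" route mentioned above, here is a minimal sketch that reads the same fields through the MSXML DOM rather than a regular expression. It assumes MSXML 6.0 is available through COM on the machine, and `$sXml` is a cut-down sample written for this example, not the full response. The `local-name()` test is used only to sidestep the default namespace declared on the root element.

```autoit
; Sketch only: assumes the "Msxml2.DOMDocument.6.0" COM class is registered.
; $sXml is a shortened sample of the eBay response, made up for this example.
Local $sXml = '<GetSuggestedCategoriesResponse xmlns="urn:ebay:apis:eBLBaseComponents">' & _
        '<SuggestedCategoryArray><SuggestedCategory><Category>' & _
        '<CategoryID>20349</CategoryID><CategoryName>Cover e custodie</CategoryName>' & _
        '<CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID>' & _
        '<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>' & _
        '<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>' & _
        '</Category><PercentItemFound>3</PercentItemFound></SuggestedCategory>' & _
        '</SuggestedCategoryArray></GetSuggestedCategoriesResponse>'

Local $oXml = ObjCreate("Msxml2.DOMDocument.6.0")
If Not IsObj($oXml) Then
    ConsoleWrite("MSXML 6.0 is not available" & @CRLF)
    Exit 1
EndIf

$oXml.async = False ; parse synchronously so the document is ready after loadXML()
If Not $oXml.loadXML($sXml) Then
    ConsoleWrite("Parse error: " & $oXml.parseError.reason & @CRLF)
    Exit 2
EndIf

; Select every <Category> element regardless of its namespace.
Local $oCategories = $oXml.selectNodes("//*[local-name()='Category']")
For $oCategory In $oCategories
    ; Each child is one of CategoryID, CategoryName, CategoryParentID, CategoryParentName.
    For $oField In $oCategory.childNodes
        ConsoleWrite($oField.nodeName & " : " & $oField.text & @CRLF)
    Next
    ConsoleWrite(@CRLF)
Next
```

Because only the children of `<Category>` are visited, `<PercentItemFound>` never shows up in the output.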

 

Edited by mikell

Another way :

#Include <Array.au3>

$txt = FileRead("file.xml")
$tmp = StringRegExp($txt, '(?i)(?:<(Category)>|\G)(?|<(Category(?:Parent)?(?:ID|Name))>([^<]+))<\/(?-1)', 3)
_ArrayDisplay($tmp)
$n = 0
For $i = 0 To UBound($tmp) - 1 Step 3
    If $tmp[$i] = "Category" Then
        $n += 1
        ConsoleWrite(@CRLF & "Item #" & $n & @CRLF)
    EndIf
    ConsoleWrite(" " & $tmp[$i + 1] & " : " & $tmp[$i + 2] & @CRLF)
Next

Or, easiest:

#Include <Array.au3>

$txt = FileRead("file.xml")
$tmp = StringRegExp($txt, '(?is)<Category>(.+?)</Category>', 3)

For $i = 0 To UBound($tmp) - 1
    $aAttributes = _ArrayDiv( StringRegExp($tmp[$i], "<(Category(?:Parent)?(?:ID|Name))>([^<]+)", 3), 2)
    _ArrayDisplay($aAttributes)
Next



Func _ArrayDiv($array, $div, $cols = $div)
   If Mod(UBound($array), $div) <> 0 Then Return SetError(1, 0, 0)
   If $cols < $div Then Return SetError(2, 0, 0)
   Local $tmp[UBound($array)/$div][$cols]
   For $i = 0 to UBound($array)-1 step $div
       For $j = 0 to $div-1
          $tmp[$i/$div][$j] = $array[$i+$j]
       Next
   Next
   Return $tmp
EndFunc

 

Edited by jguinch

Hi @jguinch, I tested your solution. It is a nice solution, but it solves only part of my problem, because I need to produce output like this:

15032_Telefonia fissa e mobile ___> 9355_Cellulari e smartphone (the first category, from the first block)

15032_Telefonia fissa e mobile ___> 9394_Accessori cellulari e palmari ___> 20349_Cover e custodie (the second category, from the second block)

The logic of the structure is shown in this figure:

http://it.tinypic.com/r/14xf21w/9

Thanks again for the help and support.
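
The parent chain described here could be sketched roughly as follows. `_CategoryPath()` and its `$sSep` parameter are hypothetical names made up for this example, and the sketch assumes the n-th CategoryParentID belongs to the n-th CategoryParentName, which the XML itself does not guarantee. The sample string is a shortened copy of the first two blocks.

```autoit
; Shortened sample holding the first two <Category> blocks of the response.
Local $sXml = '<Category><CategoryID>9355</CategoryID><CategoryName>Cellulari e smartphone</CategoryName>' & _
        '<CategoryParentID>15032</CategoryParentID><CategoryParentName>Telefonia fissa e mobile</CategoryParentName></Category>' & _
        '<Category><CategoryID>20349</CategoryID><CategoryName>Cover e custodie</CategoryName>' & _
        '<CategoryParentID>15032</CategoryParentID><CategoryParentID>9394</CategoryParentID>' & _
        '<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>' & _
        '<CategoryParentName>Accessori cellulari e palmari</CategoryParentName></Category>'

; Split into one string per <Category> block, then print one path per block.
Local $aBlocks = StringRegExp($sXml, "(?s)<Category>(.+?)</Category>", 3)
If Not @error Then
    For $i = 0 To UBound($aBlocks) - 1
        ConsoleWrite(_CategoryPath($aBlocks[$i]) & @CRLF)
    Next
EndIf

; Hypothetical helper: returns "parentID_parentName ___> ... ___> ID_name" for one block.
Func _CategoryPath($sCategory, $sSep = " ___> ")
    Local $aID = StringRegExp($sCategory, "<CategoryID>([^<]+)", 1)
    If @error Then Return SetError(1, 0, "")
    Local $aName = StringRegExp($sCategory, "<CategoryName>([^<]+)", 1)
    If @error Then Return SetError(2, 0, "")

    Local $sPath = ""
    Local $aParentID = StringRegExp($sCategory, "<CategoryParentID>([^<]+)", 3)
    If Not @error Then
        Local $aParentName = StringRegExp($sCategory, "<CategoryParentName>([^<]+)", 3)
        ; The pairing is purely positional, so give up if the counts differ.
        If @error Or UBound($aParentID) <> UBound($aParentName) Then Return SetError(3, 0, "")
        For $n = 0 To UBound($aParentID) - 1
            ; "\s+" collapses line breaks that a paste may have left inside a name.
            $sPath &= $aParentID[$n] & "_" & StringRegExpReplace($aParentName[$n], "\s+", " ") & $sSep
        Next
    EndIf
    Return $sPath & $aID[0] & "_" & StringRegExpReplace($aName[0], "\s+", " ")
EndFunc   ;==>_CategoryPath
```

The function takes one block at a time so that parents from different categories can never be mixed together.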

 


So, another one :

#Include <Array.au3>

$txt = FileRead("file.xml")
$tmp = StringRegExp($txt, '(?is)<Category>(.+?)</Category>', 3)

For $i = 0 To UBound($tmp) - 1
    $aCategory = StringRegExp($tmp[$i], "(?s)(?=.*<CategoryID>([^<]+))(?=.*<CategoryName>([^<]+))", 1) ; (?s) lets .* cross line breaks in pretty-printed XML
    $aParentsID = StringRegExp($tmp[$i], "<CategoryParentID>([^<]+)", 3)
    $aParentsName = StringRegExp($tmp[$i], "<CategoryParentName>([^<]+)", 3)
    Local $aResult[ 2 + UBound($aParentsID) * 2][2] = [[ "CategoryID", $aCategory[0] ],[ "CategoryName", $aCategory[1] ] ]

    $k = 2
    For $n = 0 To UBound($aParentsID) - 1
        $aResult[$k][0] = "CategoryParentID"
        $aResult[$k][1] = $aParentsID[$n]
        $aResult[$k + 1][0] = "CategoryParentName"
        $aResult[$k + 1][1] = $aParentsName[$n]
        $k += 2
    Next

    _ArrayDisplay($aResult)
Next

 


On 8/25/2016 at 6:04 AM, faustf said:

 

Hi guys,

I have a problem: I have some XML and I want to extract part of it.

 

I'd tidy the XML to clarify the view
 

<?xml version="1.0" encoding="UTF-8"?>
<GetSuggestedCategoriesResponse xmlns="urn:ebay:apis:eBLBaseComponents">
	<Timestamp>2016-08-24T23:21:20.616Z</Timestamp>
	<Ack>Success</Ack>
	<Version>979</Version>
	<Build>E979_UNI_API5_18046952_R1</Build>
	<SuggestedCategoryArray>
		<SuggestedCategory>
			<Category>
				<CategoryID>9355</CategoryID>
				<CategoryName>Cellulari e smartphone</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
			</Category>
			<PercentItemFound>96</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>20349</CategoryID>
				<CategoryName>Cover e custodie</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentID>9394</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
				<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
			</Category>
			<PercentItemFound>3</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>43304</CategoryID>
				<CategoryName>Cellulari: componenti</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
			</Category>
			<PercentItemFound>1</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>123422</CategoryID>
				<CategoryName>Cavi e adattatori</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentID>9394</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
				<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>1383</CategoryID>
				<CategoryName>Gadget</CategoryName>
				<CategoryParentID>1</CategoryParentID>
				<CategoryParentID>8894</CategoryParentID>
				<CategoryParentName>Collezionismo</CategoryParentName>
				<CategoryParentName>Sorpresine e gadget</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>3516</CategoryID>
				<CategoryName>Manuali e risorse</CategoryName>
				<CategoryParentID>58058</CategoryParentID>
				<CategoryParentName>Informatica</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>42425</CategoryID>
				<CategoryName>Altri accessori per cellulari</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentID>9394</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
				<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>112190</CategoryID>
				<CategoryName>Varese</CategoryName>
				<CategoryParentID>1</CategoryParentID>
				<CategoryParentID>914</CategoryParentID>
				<CategoryParentID>20245</CategoryParentID>
				<CategoryParentID>112173</CategoryParentID>
				<CategoryParentName>Collezionismo</CategoryParentName>
				<CategoryParentName>Cartoline</CategoryParentName>
				<CategoryParentName>Paesaggistiche italiane</CategoryParentName>
				<CategoryParentName>Lombardia</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>58540</CategoryID>
				<CategoryName>Proteggi schermo</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentID>9394</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
				<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
		<SuggestedCategory>
			<Category>
				<CategoryID>123417</CategoryID>
				<CategoryName>Caricabatterie e dock</CategoryName>
				<CategoryParentID>15032</CategoryParentID>
				<CategoryParentID>9394</CategoryParentID>
				<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
				<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>
	</SuggestedCategoryArray>
	<CategoryCount>10</CategoryCount>
</GetSuggestedCategoriesResponse>

It's easier to reason about when it is organized.
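
For anyone who wants a quick approximation of that tidying inside a script, a rough sketch is to break the text wherever one tag directly follows another. It does not indent anything, and the one-line `$sXml` sample is made up for the example.

```autoit
; Sketch: put every element on its own line (no indentation is added).
Local $sXml = '<Category><CategoryID>9355</CategoryID><CategoryName>Cellulari e smartphone</CategoryName></Category>'

; ">\s*<" only matches between adjacent tags, so <CategoryID>9355</CategoryID> stays on one line.
Local $sTidy = StringRegExpReplace($sXml, ">\s*<", ">" & @CRLF & "<")
ConsoleWrite($sTidy & @CRLF)
```

Text content that already contains a line break, as some of the pasted names do, is left untouched by this.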

Edit1: added some code. Not what you asked for, but it may help.

#include <Array.au3>
Global $aXML = xmlThingToArray(FileRead("184274-stringregexp-hell.xml")) ; or the filename you have in the disk
_ArrayDisplay($aXML, "$aXML")
Func xmlThingToArray($sXML)
    $sXML = StringReplace($sXML, @CR, "")
    $sXML = StringReplace($sXML, @LF, "")
    Local $aTmp = StringSplit($sXML, "<>", 0)
    If UBound($aTmp) < 4 Then Return SetError(1, 0, "you do this, I'm just showing an example")
    Local $i, $n, $m, $aXML[UBound($aTmp) + 1][4]
    $aXML[0][0] = "CategoryID"
    $aXML[0][1] = "CategoryName"
    $aXML[0][2] = "CategoryParentID"
    $aXML[0][3] = "CategoryParentName"
    $i = 0 ; counter, nothing yet
    For $n = 5 To $aTmp[0]
        If $aTmp[$n - 2] = $aXML[0][0] And $aTmp[$n] = "/" & $aXML[0][0] Then ; found a new CategoryID
            $i += 1
            $aXML[$i][0] = $aTmp[$n - 1]
        EndIf
        For $m = 1 To 3
            If $aTmp[$n - 2] = $aXML[0][$m] And $aTmp[$n] = "/" & $aXML[0][$m] Then
                $aXML[$i][$m] = $aXML[$i][$m] & $aTmp[$n - 1] & ";"
            EndIf
        Next
    Next
    ReDim $aXML[$i + 1][4]
    Return $aXML
EndFunc   ;==>xmlThingToArray

 

Edited by argumentum
added some code :)



The function _StringBetween() is very useful for this type of task. Although it uses regular expressions in the background, these are hidden from the novice regular expression user. For situations where you need to extract some information from XML data it is ideal, even for badly structured XML like your sample.

 

#include <string.au3>
#include <array.au3>


Global $sXmlData = FileRead("Faustf.xml")
Global $aResults = ParseGetSuggestedCategoriesResponse($sXmlData)
_ArrayDisplay($aResults, "GetSuggestedCategoriesResponse", "", 0, Default, "CategoryID|CategoryName|CategoryParentID 1|CategoryParentName 1|CategoryParentID 2|CategoryParentName 2")

Func ParseGetSuggestedCategoriesResponse($sXmlData)
    Local $aData = _StringBetween($sXmlData, "<Category>", "</Category>", $STR_ENDNOTSTART, True)
    Local $aResults[UBound($aData)][6] ; ID, name, and at most two parent ID/name pairs (any further parents are ignored)
    Local $aTmp

    For $i = 0 To UBound($aData) - 1
        $aTmp = _StringBetween($aData[$i], "<CategoryID>", "</CategoryID>", $STR_ENDNOTSTART, True)
        If Not @error Then $aResults[$i][0] = $aTmp[0]
        $aTmp = _StringBetween($aData[$i], "<CategoryName>", "</CategoryName>", $STR_ENDNOTSTART, True)
        If Not @error Then $aResults[$i][1] = $aTmp[0]
        $aTmp = _StringBetween($aData[$i], "<CategoryParentID>", "</CategoryParentID>", $STR_ENDNOTSTART, True)
        If Not @error Then
            $aResults[$i][2] = $aTmp[0]
            If UBound($aTmp) > 1 Then $aResults[$i][4] = $aTmp[1]
        EndIf
        $aTmp = _StringBetween($aData[$i], "<CategoryParentName>", "</CategoryParentName>", $STR_ENDNOTSTART, True)
        If Not @error Then
            $aResults[$i][3] = $aTmp[0]
            If UBound($aTmp) > 1 Then $aResults[$i][5] = $aTmp[1]
        EndIf
    Next
    Return $aResults
EndFunc   ;==>ParseGetSuggestedCategoriesResponse

 

As an aside, the structure of the SuggestedCategory element should be something like this, so that each CategoryParentID can be positively associated with the correct CategoryParentName:

 

		<SuggestedCategory>
			<Category>
				<CategoryID>123417</CategoryID>
				<CategoryName>Caricabatterie e dock</CategoryName>
				<CategoryParents>
					<CategoryParent>
						<CategoryParentID>15032</CategoryParentID>
						<CategoryParentName>Telefonia fissa e mobile</CategoryParentName>
					</CategoryParent>
					<CategoryParent>
						<CategoryParentID>9394</CategoryParentID>
						<CategoryParentName>Accessori cellulari e palmari</CategoryParentName>
					</CategoryParent>
				</CategoryParents>
			</Category>
			<PercentItemFound>0</PercentItemFound>
		</SuggestedCategory>

 

 

 

 

Edited by Bowmore

