
Posted

I need to parse a string with an unknown number of repeated items into another string using StringRegExpReplace()

from : <div class='tags'><h3>Tags</h3><ul><li>aaa</li><li>bbb</li><li>ccc</li><li>ddd</li><li>eee</li></ul></div>

to : aaa, bbb, ccc, ddd, eee

But I still can't grasp repeating patterns in StringRegExpReplace()... with an array it's simple, but with a string...

$sTags = StringRegExpReplace($aEntry[$i],"(?si).+?<h3>Tags</h3><ul>((<li>\w++</li>)\K).++","$2")

Posted (edited)

This mostly satisfies your request, but leaves a trailing ", ":

$sTags = StringRegExpReplace($aEntry[$i], "(?:.*?)<li>(.*?)<\/li>(?:(?!<\/?li>).)*", "$1, ")

Explanation: (?:.*?)<li>(.*?)<\/li>(?:(?!<\/?li>).)*

(?:.*?) - Match (without capturing) everything before the next <li> tag, expanding only as needed (lazy)

<li>(.*?)<\/li> - Capture everything between <li> and </li>, expanding only as needed (lazy); this is $1 in the replacement

(?:(?!<\/?li>).)* - The beast... uses a negative lookahead to consume characters one at a time, stopping before the next <li> or </li> tag so the next match can start there; after the final </li> it consumes everything that is left (here </ul></div>)
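If you want the exact target string without trimming afterwards, one option (a sketch, not the pattern explained above) is to split the job into two StringRegExpReplace() calls: the first keeps only the text between the first <li> and the last </li>, the second turns each </li><li> boundary into ", ". It assumes plain lowercase <li> tags without attributes, as in the sample data.

```autoit
; Sample input from the question
Local $sHtml = "<div class='tags'><h3>Tags</h3><ul><li>aaa</li><li>bbb</li><li>ccc</li><li>ddd</li><li>eee</li></ul></div>"

; Pass 1: keep only the text between the first <li> and the last </li>
; (?s) lets . match line breaks in case the real HTML spans several lines
Local $sTags = StringRegExpReplace($sHtml, "(?s)^.*?<li>(.*)</li>.*$", "$1")
; $sTags is now "aaa</li><li>bbb</li><li>ccc</li><li>ddd</li><li>eee"

; Pass 2: turn every "</li><li>" separator (whitespace allowed between them) into ", "
$sTags = StringRegExpReplace($sTags, "</li>\s*<li>", ", ")

ConsoleWrite($sTags & @CRLF) ; aaa, bbb, ccc, ddd, eee
```

Because the greedy (.*) in pass 1 backtracks only as far as the last </li>, no separator is ever written after the final item. If the input has no <li> at all, pass 1 does not match and the string is returned unchanged.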

Personally though, I would match everything between <li> and </li> tags with StringRegExp option 3 and concatenate the results back together...

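Func ConcatRegExp($sText)

    ; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH): return an array of every captured <li> item
    Local $aResults = StringRegExp($sText, "<li>(.*?)<\/li>", 3)
    ; @error is set when nothing matched
    If @error Then Return SetError(1, 0, False)

    ; Join the items, each followed by ", "
    Local $sReturn = ""
    For $i = 0 To UBound($aResults) - 1
        $sReturn &= $aResults[$i] & ", "
    Next

    ; Drop the trailing ", " (2 characters)
    Return StringTrimRight($sReturn, 2)

EndFunc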
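The loop in ConcatRegExp() can also be replaced by _ArrayToString() from the standard Array.au3 UDF, which joins the elements of a 1D array with a delimiter, so there is no trailing ", " to trim. A minimal sketch using the sample data from the question (the variable names are just for illustration):

```autoit
#include <Array.au3>

; Sample input from the question
Local $sHtml = "<div class='tags'><h3>Tags</h3><ul><li>aaa</li><li>bbb</li><li>ccc</li><li>ddd</li><li>eee</li></ul></div>"

; Flag 3 returns a 1D array with every captured item: aaa, bbb, ccc, ddd, eee
Local $aTags = StringRegExp($sHtml, "<li>(.*?)</li>", 3)
If @error Then
    ConsoleWrite("No <li> items found" & @CRLF)
Else
    ; For a 1D array the second parameter is the delimiter placed between elements
    ConsoleWrite(_ArrayToString($aTags, ", ") & @CRLF) ; aaa, bbb, ccc, ddd, eee
EndIf
```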
Edited by seadoggie01
Added explanation

All my code provided is Public Domain... but it may not work. ;) Use it, change it, break it, whatever you want.


My Humble Contributions:
Personal Function Documentation - A personal HelpFile for your functions
Acro.au3 UDF - Automating Acrobat Pro
ToDo Finder - Find #ToDo: lines in your scripts
UI-SimpleWrappers UDF - Use UI Automation more Simply-er
KeePass UDF - Automate KeePass, a password manager
InputBoxes - Simple Input boxes for various variable types

Posted

Thanks! This leaves a trailing "</li>" in the resulting string, but it's still a good result, since it's faster than having to deal with an array after StringRegExp()

Posted

Do you have different data? When I try it, I don't get a trailing tag:

ConsoleWrite(StringTrimRight(StringRegExpReplace("<div class='tags'><h3>Tags</h3><ul><li>aaa</li><li>bbb</li><li>ccc</li><li>ddd</li><li>eee</li></ul></div>", "(?:.*?)<li>(.*?)<\/li>(?:(?!<\/?li>).)*", "$1, "), 2) & @CRLF)
