Complex Regex text extraction

I'm trying to write a JSON to XML converter, and need some regex help.

First, assume the input is

{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Schedule":[{"ChanID":28460783,"Affiliate":"MNT","CallLetters":"WUAB"}],"Actors":"Yourdaddy"}]}

Note that there is a subnode, Schedule. I want to extract the shows node and leave the Schedule node text out, giving:

{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Actors":"Yourdaddy"}]}

So far I have tried many things, such as

StringRegExp($data, '("shows":\[.*?)(?:"[\w]+":\[.*?\])?(.*\])',3)

trying to use a non-capturing group, (?:...), to leave the inner node out. But nothing is working. Any help?

Or maybe there's a better way to do this without regex?


drlava wrote on Jul 14 2007, 09:00 AM:

I'm trying to write a JSON to XML converter, and need some regex help.

First, assume the input is

{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Schedule":[{"ChanID":28460783,"Affiliate":"MNT","CallLetters":"WUAB"}],"Actors":"Yourdaddy"}]}

note that there is a subnode Schedule. I want to extract the shows node and leave the Schedule node text out.

{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Actors":"Yourdaddy"}]}
Edited by MisterBates

$sIn = '{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Schedule":[{"ChanID":28460783,"Affiliate":"MNT","CallLetters":"WUAB"}],"Actors":"Yourdaddy"}]}'
$Out = StringRegExpReplace($sIn, '"Schedule"\:\[.*?\],','')
ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : $Out = ' & $Out & @crlf & '>Error code: ' & @error & @crlf) ;### Debug Console

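This produces the following value for $Out:

{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Actors":"Yourdaddy"}]}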

Note that this pattern does not account for arrays nested inside the Schedule node: the lazy .*? stops at the first "]," it encounters, so a nested array would end the match early.

EDIT: This pattern handles nodes nested inside the schedule node (matched pairs of []):

$Out = StringRegExpReplace($sIn, '"Schedule"\:\[.*?(\[.*\].*?)*\],','')
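
Because AutoIt's regex engine is PCRE, matched pairs of [] can also be expressed with a recursive subpattern instead of nested wildcards. The following is only a sketch of that idea, not the pattern used above: the sample input with its extra "Alt" array is made up for illustration, and it assumes no [ or ] characters appear inside the JSON string values.

```autoit
; Hypothetical input: the Schedule node contains a nested array ("Alt").
Local $sIn = '{"shows":[{"Title":"Inhabited","Schedule":[{"ChanID":28460783,"Alt":[1,2]}],"Actors":"Yourdaddy"}]}'

; Group 1 matches one balanced [...] block:
;   [^\[\]]++  consumes a run of non-bracket characters
;   (?1)       recurses into group 1 whenever a nested [ is met
; The optional trailing comma is removed together with the node.
Local $sPattern = '"Schedule":(\[(?:[^\[\]]++|(?1))*\]),?'

Local $sOut = StringRegExpReplace($sIn, $sPattern, '')
ConsoleWrite($sOut & @CRLF)
; Prints: {"shows":[{"Title":"Inhabited","Actors":"Yourdaddy"}]}
```

If Schedule were the last member of its object, the comma before it would be left behind, so that case would need separate handling.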
Thank you! I was going to try using a recursive function to find nodes inside Schedule, so your second pattern is very helpful. Also, the inner node may or may not exist, may not be called Schedule, and may be terminated by any of [,\]}], so

$Out = StringRegExpReplace($sIn,'"\w+?"\:\[.*?(\[.*\].*?)*\][,\]}]','$lengthofshows')

would probably work. I'll try using the same expression with StringRegExp to extract the inner node for further processing.
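
On the earlier question of doing this without regex: one option is to count bracket depth by hand. The sketch below is a rough illustration, not a tested converter component. _FindClosingBracket is a made-up helper name, and the code assumes the key is immediately followed by [ and that the node is not the last member of its object.

```autoit
Local $sIn = '{"shows":[{"Title":"Inhabited","ProgramID":29109227,"Schedule":[{"ChanID":28460783,"Affiliate":"MNT","CallLetters":"WUAB"}],"Actors":"Yourdaddy"}]}'
Local $sKey = '"Schedule":'
Local $iKey = StringInStr($sIn, $sKey) ; 1-based position, 0 if not found

If $iKey Then
    Local $iOpen = $iKey + StringLen($sKey) ; assumed to be the position of '['
    Local $iClose = _FindClosingBracket($sIn, $iOpen)
    If $iClose Then
        ; The inner node text, kept for further processing.
        Local $sNode = StringMid($sIn, $iOpen, $iClose - $iOpen + 1)
        ; Also drop the comma that follows the node, if there is one.
        If StringMid($sIn, $iClose + 1, 1) = ',' Then $iClose += 1
        Local $sOut = StringLeft($sIn, $iKey - 1) & StringMid($sIn, $iClose + 1)
        ConsoleWrite('Node: ' & $sNode & @CRLF)
        ConsoleWrite('Out:  ' & $sOut & @CRLF)
    EndIf
EndIf

; Hypothetical helper: returns the position of the ']' that closes the '['
; at $iOpen, or 0 if the brackets never balance. Brackets inside quoted
; strings are ignored.
Func _FindClosingBracket($sText, $iOpen)
    Local $iDepth = 0, $bInString = False, $sChar
    Local $i = $iOpen
    While $i <= StringLen($sText)
        $sChar = StringMid($sText, $i, 1)
        If $bInString Then
            If $sChar = '\' Then
                $i += 1 ; skip the escaped character
            ElseIf $sChar = '"' Then
                $bInString = False
            EndIf
        ElseIf $sChar = '"' Then
            $bInString = True
        ElseIf $sChar = '[' Then
            $iDepth += 1
        ElseIf $sChar = ']' Then
            $iDepth -= 1
            If $iDepth = 0 Then Return $i
        EndIf
        $i += 1
    WEnd
    Return 0
EndFunc
```

The same scan could be run for any key name, which avoids the problem that a generic "\w+":\[ prefix also matches the outer "shows" array, since that array starts earlier in the string.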
