
StringRegExpReplace: multiple delimiters occurrences



Hello,
I'm working on a command-line tool that manipulates text to extract only the information I need.
I need this tool for HTML scraping.

I've hit a problem I don't know how to solve. I've reduced a BIG text to strings like this:

[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]

I need to remove all the text enclosed in < > symbols
and obtain this:

[Lunedì 04/03][16:50][19:15][21:20]

But using StringRegExpReplace I obtain this:

[Lunedì 04/03][21:20]

Can you suggest some code, or syntax, to solve this issue?

Thank you for your time,
m.


@myspacee

Global $strString = '[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]'

ConsoleWrite("Before: " & $strString & @CRLF & _
             "After : " & StringRegExpReplace($strString, '(<[^>]+>)', '') & @CRLF)

:)
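To see why the first attempt removed too much, here is a small sketch comparing three patterns on the string from the question. The greedy pattern `<.*>` is an assumption: the original post does not show the pattern that was used, but its result matches what a greedy match would produce. The variable name `$sInput` is only illustrative.

```autoit
; Sample string taken from the question.
Global $sInput = '[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]'

; Greedy (assumed original attempt): .* runs from the first "<" to the LAST ">",
; so everything between them is removed in a single match.
ConsoleWrite("Greedy : " & StringRegExpReplace($sInput, '<.*>', '') & @CRLF)

; Lazy: .*? stops at the first ">" that follows each "<".
ConsoleWrite("Lazy   : " & StringRegExpReplace($sInput, '<.*?>', '') & @CRLF)

; Negated class: [^>]+ can never cross a ">", so each <...> is matched separately.
ConsoleWrite("Negated: " & StringRegExpReplace($sInput, '<[^>]+>', '') & @CRLF)
```

The greedy line should print the unwanted `[Lunedì 04/03][21:20]`, while the other two should print `[Lunedì 04/03][16:50][19:15][21:20]`. Note that `[^>]+` requires at least one character between the brackets, so an empty `<>` would be left in place; `[^>]*` would remove it as well.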


25 minutes ago, mikell said:

Try this :  "(<.*?>)"

This works. Thank you!


Another question: is it possible to use StringRegExpReplace with words instead of a single symbol?

e.g.: StringRegExpReplace($file_input_line, "(div.*?/div)", "")

Thank you,
m.


@myspacee
You mean something like this? :)

#include <StringConstants.au3>

Global $strString = '<a href = "someurl">Someurl</a>' & @CRLF & _
                    '<div name = "somediv">Div Content </div>'

ConsoleWrite("Before: " & $strString & @CRLF & _
             "After : " & StringRegExpReplace($strString, '<div[^>]*>[^<]*</div[^>]*>', '') & @CRLF)

 


13 minutes ago, myspacee said:

Another question: is it possible to use StringRegExpReplace with words instead of a single symbol?

e.g.: StringRegExpReplace($file_input_line, "(div.*?/div)", "")

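Yes, it is possible. However, the regexp you have there will turn "<body><div><p>Some stuff</p></div></body>" into "<body><></body>", because it removes only the text from "div" through "/div" (that is, div><p>Some stuff</p></div) and leaves the surrounding angle brackets behind.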

You can read the StringRegExp help file and the Regular Expression Tutorial for (a lot) more in depth info!
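A minimal sketch of the difference, using the sample string from the reply above. The second pattern is only one possible way to include the angle brackets in the match, not the method recommended in the thread, and `$sHtml` is an illustrative name. It assumes the `<div>` elements are not nested; a regular expression like this one stops at the first `</div>` it finds.

```autoit
; Sample string taken from the reply above.
Global $sHtml = '<body><div><p>Some stuff</p></div></body>'

; Pattern from the question: matches from "div" to the first "/div",
; so the "<" before it and the ">" after it are left behind.
ConsoleWrite("Words only   : " & StringRegExpReplace($sHtml, '(div.*?/div)', '') & @CRLF)

; Including the brackets removes the whole element.
; (?s) lets "." match line breaks too; \b keeps "<div" from matching e.g. "<divider".
ConsoleWrite("With brackets: " & StringRegExpReplace($sHtml, '(?s)<div\b[^>]*>.*?</div>', '') & @CRLF)
```

The first line should print `<body><></body>` and the second `<body></body>`. The earlier pattern `<div[^>]*>[^<]*</div[^>]*>` would leave this particular string unchanged, since `[^<]*` cannot cross the nested `<p>` tag.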
