myspacee Posted March 5, 2019
Hello,
I'm working on a command-line tool that manipulates text to extract only the info I need. I need this tool for HTML scraping, and I don't know how to solve one problem.
I've reduced a BIG text to strings like this:
[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]
I need to remove all text enclosed in the <> symbols and obtain this:
[Lunedì 04/03][16:50][19:15][21:20]
But using StringRegExpReplace I obtain this:
[Lunedì 04/03][21:20]
Can you suggest some code, or syntax, to solve this issue?
Thank you for your time,
m.
myspacee (Author) Posted March 5, 2019
This is roughly what I'm using:
StringRegExpReplace($file_input_line, "\<(.*)\>", "")
where < and > can be replaced with whatever delimiter symbols I need.
m.
mikell Posted March 5, 2019
Try this: "(<.*?>)"
Dionysis Posted March 5, 2019
Your expression is greedy; this should work:
StringRegExpReplace($file_input_line, "\<(.*?)\>", "")
Also, I don't think you need the backslashes, as "<" and ">" aren't regexp metacharacters.
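To see the difference between the original greedy pattern and the fixes suggested in this thread, here is a minimal AutoIt sketch run on the sample line from the first post (the variable names are illustrative only). It drops the capture parentheses and the backslashes from the original pattern: the replacement string never uses the group, and in AutoIt's PCRE engine "\<" simply matches a literal "<", so neither changes what is matched.

```autoit
; Compare greedy, lazy and negated-class matching on the sample line from this thread.
Local $sLine = '[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]'

; Greedy: .* runs to the LAST ">" on the line, so everything from the first "<"
; to the last ">" is removed in a single match.
Local $sGreedy = StringRegExpReplace($sLine, '<.*>', '')

; Lazy: .*? stops at the FIRST ">" after each "<", so each <...> block is removed on its own.
Local $sLazy = StringRegExpReplace($sLine, '<.*?>', '')

; Negated class: [^>]+ can never run past a ">", so each match ends at the nearest ">"
; without needing a lazy quantifier.
Local $sNegated = StringRegExpReplace($sLine, '<[^>]+>', '')

ConsoleWrite('Greedy : ' & $sGreedy & @CRLF)   ; [Lunedì 04/03][21:20]
ConsoleWrite('Lazy   : ' & $sLazy & @CRLF)     ; [Lunedì 04/03][16:50][19:15][21:20]
ConsoleWrite('Negated: ' & $sNegated & @CRLF)  ; [Lunedì 04/03][16:50][19:15][21:20]
```

The greedy result is exactly the wrong output from the first post, which is a quick way to confirm that greediness is the cause.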
FrancescoDiMuro Posted March 5, 2019
@myspacee
Global $strString = '[Lunedì 04/03]<30466>[16:50]<30467>[19:15]<R4nd0m>[21:20]'

ConsoleWrite("Before: " & $strString & @CRLF & _
             "After : " & StringRegExpReplace($strString, '(<[^>]+>)', '') & @CRLF)
myspacee (Author) Posted March 5, 2019
Quoting mikell: Try this: "(<.*?>)"
This works. Thank you!
Another question: is it possible to use StringRegExpReplace with words instead of single symbols? For example:
StringRegExpReplace($file_input_line, "(div.*?/div)", "")
Thank you,
m.
FrancescoDiMuro Posted March 5, 2019
@myspacee Post a sample string.
myspacee (Author) Posted March 5, 2019
Quoting FrancescoDiMuro: @myspacee Post a sample string.
StringRegExpReplace($file_input_line, "(div.*?/div)", "")
where div and /div are common HTML tags.
FrancescoDiMuro Posted March 5, 2019
@myspacee You mean something like this?
#include <StringConstants.au3>

Global $strString = '<a href = "someurl">Someurl</a>' & @CRLF & _
                    '<div name = "somediv">Div Content </div>'

ConsoleWrite("Before: " & $strString & @CRLF & _
             "After : " & StringRegExpReplace($strString, '<div[^>]*>[^<]*</div[^>]*>', '') & @CRLF)
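One thing worth checking before relying on the pattern above: its middle part is [^<]*, so it only matches a div whose content is plain text with no "<" in it. A minimal sketch of that behaviour (the two sample strings are made up for illustration):

```autoit
; The div-removal pattern from the post above.
Local $sPattern = '<div[^>]*>[^<]*</div[^>]*>'

Local $sPlain  = '<div name = "somediv">Div Content </div>'
Local $sNested = '<div><p>Nested</p></div>'

; Plain-text content: [^<]* consumes "Div Content " and the closing tag matches.
Local $sPlainOut = StringRegExpReplace($sPlain, $sPattern, '')

; Nested content: [^<]* stops at "<p>", "</div" cannot match there,
; so nothing is removed and the string comes back unchanged.
Local $sNestedOut = StringRegExpReplace($sNested, $sPattern, '')

ConsoleWrite('[' & $sPlainOut & ']' & @CRLF)   ; []
ConsoleWrite('[' & $sNestedOut & ']' & @CRLF)  ; [<div><p>Nested</p></div>]
```

That is fine for flat scraped fragments like the ones in this thread; divs containing other tags need a different content pattern.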
Dionysis Posted March 5, 2019
Quoting myspacee: Is it possible to use StringRegExpReplace with words instead of single symbols? e.g. StringRegExpReplace($file_input_line, "(div.*?/div)", "")
Yes, it is possible. The regexp you have here will turn "<body><div><p>Some stuff</p></div></body>" into "<body><></body>", because it removes everything from the first "div" through the first "/div", i.e. "div><p>Some stuff</p></div", leaving the surrounding "<" and ">" behind.
You can read the StringRegExp help file and the Regular Expression Tutorial for (a lot) more in-depth info!
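A minimal sketch of the point above, using the same HTML fragment. The second pattern, which anchors on the complete opening and closing tags, is only an assumed way to remove the whole element and is not from the thread. Because ".*?" stops at the first "</div>", nested divs would still leave a stray closing tag behind, so this is no substitute for a real HTML parser.

```autoit
; Show what the word-based lazy pattern does to a small HTML fragment.
Local $sHtml = '<body><div><p>Some stuff</p></div></body>'

; "div.*?/div" matches from the first "div" (inside "<div>") to the first "/div"
; (inside "</div>"), but not the "<" and ">" around them, so "<>" is left behind.
Local $sWordOnly = StringRegExpReplace($sHtml, '(div.*?/div)', '')

; Hypothetical alternative: anchor on the full tags so the whole element goes.
; (?s) lets "." match line breaks too; \b stops "<div" from matching tags like "<divider>".
Local $sWholeTag = StringRegExpReplace($sHtml, '(?s)<div\b[^>]*>.*?</div>', '')

ConsoleWrite($sWordOnly & @CRLF)  ; <body><></body>
ConsoleWrite($sWholeTag & @CRLF)  ; <body></body>
```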
myspacee (Author) Posted March 5, 2019
Thank you. In fact, correcting the syntax as suggested also solves my word-delimiter "problem".
This AutoIt command-line tool is the core of my PHP site.
Thank you all for your support,
m.