Kevitto (posted March 21, 2014)

Good afternoon,

I was wondering if there is a simple way of removing everything contained between the '<' and '>' characters in a string. I'm using AutoIt to pull information from HTML files and I need any tags removed.

Example:

    <br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span>

I have to strip out everything contained in <> tags, but the StringReplace function can't help me because the tags differ depending on the content. I run about 4,000 files through the script.

Any help is appreciated!

PS: I'm not including my full code because there are far too many functions in there that don't relate to this. I just need a way to strip all tags, and everything inside the angle brackets, from the strings.

Melba23 (Moderator, accepted solution; posted March 21, 2014)

Kevitto,

Just what RegExes are designed for:

    $sString = '<br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span> '
    $sStripped = StringRegExpReplace($sString, "(?U)(<.*>)", "")
    ConsoleWrite($sStripped & @CRLF)

Decode:

    (?U)    - Not greedy: look for the smallest match
    (<.*>)  - Look for anything between < and >, brackets included
    ""      - Replace any found strings with an empty string

All clear?

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
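
A note on the pattern above: as far as I know, `.` does not match line breaks by default in AutoIt's regex engine, so `(?U)(<.*>)` would likely leave a tag that wraps across two lines untouched. A minimal sketch of a common alternative, using a negated character class, follows. It is not from Melba23's post, and the sample string and variable names are purely illustrative.

```autoit
; Sketch only: the sample string and variable names are illustrative.
; [^>]* matches any run of characters other than '>', line breaks included,
; so no (?U) switch is needed and a tag split across lines is still removed.
Local $sHtml = '<span' & @CRLF & ' style="font-style: italic;">manual download</span>'
Local $sText = StringRegExpReplace($sHtml, "<[^>]*>", "")
ConsoleWrite($sText & @CRLF)
```

With this sample, only the text between the tags should remain. Neither pattern is a real HTML parser: a '>' inside an attribute value, a comment, or a script block can still confuse them, which is usually acceptable for simple, predictable markup like the example in the question.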

Kevitto (author; posted March 21, 2014)

Sir, you are a gentleman and a scholar. I've spent a lot of time trying to understand RegExp properly and it always eludes me.

Thank you so much! Marking as Solved.

MaxG (posted March 21, 2014)

Kevitto,

I found regular expressions rather cryptic, and this site helped me understand them better than any other: http://regexone.com/

It is interactive, progresses smoothly from basic to complicated, and made all the difference to my understanding.

Kevitto (author; posted March 21, 2014)

Thank you, MaxG! Will definitely check it out.

Jury (posted March 23, 2014)

As someone else once pointed out, why not use this from the help file?

    ; Open a browser with the basic example, read the body text
    ; (the content with all HTML tags removed) and write it to the console
    #include <IE.au3>

    Local $oIE = _IECreate("http://www.pri.org/about-pri")
    Local $sText = _IEBodyReadText($oIE)
    ConsoleWrite($sText & @CRLF)
    _IEQuit($oIE)

Or am I missing something?

Kevitto (author; posted April 4, 2014), in reply to Jury's _IEBodyReadText suggestion:

In case you were wondering, I was looking for specific parts of the file. I was using FileReadLine to read it line by line and StringInStr to search for specific tags. I just wanted to strip all tags from the matching line so I could convert the result to a _Date() format and check it against the current date. So getting the whole body wouldn't have helped.
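
The workflow Kevitto describes (read line by line, find lines containing a given tag, strip the tags from those lines) could look roughly like the sketch below. Kevitto's actual script was never posted, so the file name and the marker string are hypothetical placeholders, and the date conversion step is left out.

```autoit
#include <FileConstants.au3>

; Sketch only: $sFile and $sMarker are hypothetical placeholders.
Local $sFile = "page.html"
Local $sMarker = "<span"
Local $sLine, $sText

Local $hFile = FileOpen($sFile, $FO_READ)
If $hFile = -1 Then
    ConsoleWrite("Could not open " & $sFile & @CRLF)
    Exit 1
EndIf

While True
    $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; non-zero @error: end of file or read error

    ; Only process lines that contain the marker being searched for
    If StringInStr($sLine, $sMarker) Then
        ; Strip the tags with the pattern from the accepted answer,
        ; then trim leading and trailing whitespace (flag 3 = both ends)
        $sText = StringStripWS(StringRegExpReplace($sLine, "(?U)(<.*>)", ""), 3)
        ConsoleWrite($sText & @CRLF)
    EndIf
WEnd

FileClose($hFile)
```

Because each line is handled on its own, this approach assumes that no tag of interest is split across two lines of the file.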