Kevitto Posted March 21, 2014

Good afternoon,

I was wondering if there is a simple way of removing everything contained between the '<' and '>' characters in a string. I'm using AutoIt to pull information from HTML files and I need all tags removed.

Example:

    <br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span>

I have to strip out everything contained in <> tags, but the StringReplace function can't help me because the tags differ depending on the content. I run about 4,000 files through the script.

Any help is appreciated!

PS: I'm not including my full code because there are far too many functions in there that don't relate to this. I just need a way to strip the strings of all tags and their content.

Melba23 (Moderators, Solution) Posted March 21, 2014

Kevitto,

Just what RegExes are designed for:

    $sString = '<br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span> '
    ; Replace every <...> run with an empty string
    $sStripped = StringRegExpReplace($sString, "(?U)(<.*>)", "")
    ConsoleWrite($sStripped & @CRLF)

Decode:

(?U) - Not greedy - look for smallest match
(<.*>) - Look for anything between <>
"" - Replace any found strings with an empty string

All clear?

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

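For anyone reusing the pattern above, here is a minimal sketch of the same idea wrapped in a function. The name `_StripTags` is hypothetical (it is not a built-in or a standard UDF), and the pattern is a common variant rather than the one posted: `<[^>]*>` uses a negated character class, so it needs no `(?U)` flag and it also removes a tag that happens to be split across lines, which `.` does not match by default.

```autoit
; Sketch only: _StripTags is a hypothetical helper name, not part of AutoIt.
Func _StripTags($sHtml)
    ; "<", then any run of characters other than ">", then ">".
    ; The negated class stops at the first ">", so no (?U) flag is required.
    Return StringRegExpReplace($sHtml, "<[^>]*>", "")
EndFunc   ;==>_StripTags

Local $sSample = '<br /><span style="font-size: 14pt;">(téléchargement manuel ou guide de référence)</span>'
; Writes the text in parentheses with both tags removed
ConsoleWrite(_StripTags($sSample) & @CRLF)
```

Neither pattern is an HTML parser: an attribute value containing a literal ">" cuts the match short, and the contents of script or style elements are left in place. For single lines like the example in this thread that is usually acceptable.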
Kevitto (Author) Posted March 21, 2014

Sir, you are a gentleman and a scholar. I've spent a lot of time trying to understand RegExp properly and it always eludes me. Thank you so much! Marking as Solved.

MaxG Posted March 21, 2014

Kevitto,

I found regular expressions rather cryptic, and this site helped me understand them better than any other: http://regexone.com/

It is interactive, progresses smoothly from basic to complicated, and made all the difference to my understanding.

Kevitto (Author) Posted March 21, 2014

Thank you, MaxG! Will definitely check it out.

Jury Posted March 23, 2014

As someone else once pointed out, why not use this from the helpfile?

    ; Open a browser with the basic example, read the body text
    ; (the content with all HTML tags removed) and write it to the console
    #include <IE.au3>

    Local $oIE = _IECreate("http://www.pri.org/about-pri")
    Local $sText = _IEBodyReadText($oIE)
    ConsoleWrite($sText & @CRLF)
    _IEQuit($oIE)

Or am I missing something?

Kevitto (Author) Posted April 4, 2014

In reply to Jury: in case you were wondering, I was looking for specific parts of the file. I was using FileReadLine to read it line by line and StringInStr to search for specific tags. I just wanted to strip all tags from the matching lines so I could convert the result to a _Date() format and check the data against the current date. So getting the whole body wouldn't have helped.
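The line-by-line approach described in this reply (FileReadLine, a StringInStr check, then tag stripping) could be sketched roughly as follows. Kevitto's actual code was never posted, so the function name `_FirstMatchStripped`, its parameters, the file name, and the "<span" marker are all assumptions made for illustration.

```autoit
#include <FileConstants.au3>

; Sketch only: returns the tag-free text of the first line containing $sMarker.
; _FirstMatchStripped and its parameters are hypothetical names.
Func _FirstMatchStripped($sPath, $sMarker)
    Local $hFile = FileOpen($sPath, $FO_READ)
    If $hFile = -1 Then Return SetError(1, 0, "") ; file could not be opened

    Local $sLine, $sResult = "", $bFound = False
    While 1
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop ; end of file or read error
        If StringInStr($sLine, $sMarker) Then
            ; Remove the tags, then trim leading (1) and trailing (2) whitespace
            $sResult = StringStripWS(StringRegExpReplace($sLine, "<[^>]*>", ""), 3)
            $bFound = True
            ExitLoop
        EndIf
    WEnd
    FileClose($hFile)

    If Not $bFound Then Return SetError(2, 0, "") ; marker not present in the file
    Return $sResult
EndFunc   ;==>_FirstMatchStripped

; Hypothetical usage: the path and marker are placeholders
Local $sText = _FirstMatchStripped(@ScriptDir & "\page.html", "<span")
If @error Then
    ConsoleWrite("No matching line found" & @CRLF)
Else
    ConsoleWrite($sText & @CRLF)
EndIf
```

The stripped string would still need converting to whatever date format the rest of the script expects. That step is left out because the thread does not show how the dates appear in the source files.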