Removing Characters Contained in HTML Tags


Solved by Melba23

Good afternoon,

I was wondering if there was a simple way of removing everything that is contained between the '<' and '>' characters in a string.

I'm using AutoIt to pull information from HTML files and I need any tags removed.

Example:

<br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span> 

I have to strip out everything contained in <> tags.

The StringReplace function can't help me, because the tags differ depending on the content. I run about 4,000 files through the script.

Any help is appreciated!

PS:  I'm not including my full code because there are waaaaaaay too many functions in there not relating to this.  I just need to find a way to strip the strings of all tags and their content.


  • Moderators
  • Solution

Kevitto,

Just what RegExes are designed for: ;)

$sString = '<br /><span style="font-size: 14pt; font-weight: normal; font-style: italic;">(téléchargement manuel ou guide de référence)</span> '

$sStripped = StringRegExpReplace($sString, "(?U)(<.*>)", "")

ConsoleWrite($sStripped & @CRLF)

Decode:

(?U)     - Not greedy - look for smallest match
(<.*>)   - Look for anything between <>
""       - Replace any found strings with an empty string

All clear? :)

M23
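
One caveat about the pattern above: in PCRE, which AutoIt's regex functions use, `.` does not match a newline by default, so a tag broken across two lines would be left in place. A minimal sketch of a common alternative uses a negated character class instead of (?U). It assumes no '>' appears inside an attribute value, and the sample string is shortened from the one in the question:

```autoit
; Sketch only: strip tags with a negated character class.
; [^>]* matches any character except '>', including line breaks,
; so the (?U) ungreedy flag is not needed.
Local $sString = '<br /><span style="font-size: 14pt;">(téléchargement manuel ou guide de référence)</span> '
Local $sStripped = StringRegExpReplace($sString, "<[^>]*>", "")
ConsoleWrite($sStripped & @CRLF)
```

Either pattern only removes the tags themselves. The text of elements such as script or style blocks would remain and would need separate handling.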

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind


As someone else once pointed out, why not use this example from the help file?

; Open a browser with the basic example, read the body text
; (the content with all HTML tags removed) and write it to the console

#include <IE.au3>

Local $oIE = _IECreate("http://www.pri.org/about-pri")
Local $sText = _IEBodyReadText($oIE)
ConsoleWrite($sText & @CRLF)
_IEQuit($oIE)

Or am I missing something?

 


  • 2 weeks later...


In case you were wondering, I was reading the file line by line with FileReadLine and using StringInStr to find the lines containing specific tags. I just wanted to strip the tags from those lines so I could convert the result to a _Date() format and check it against the current date.

So getting the whole body wouldn't have helped :P.
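
The line-by-line approach described here could look roughly like the sketch below. The file path and the marker text "Last updated" are hypothetical placeholders, not taken from the original script. Only the tag-stripping call comes from this thread.

```autoit
#include <FileConstants.au3>
#include <StringConstants.au3>

; Hypothetical input file and marker text, for illustration only
Local $sFile = @ScriptDir & "\example.html"
Local $sMarker = "Last updated"

Local $hFile = FileOpen($sFile, $FO_READ)
If $hFile = -1 Then
    ConsoleWrite("Unable to open " & $sFile & @CRLF)
    Exit 1
EndIf

While 1
    Local $sLine = FileReadLine($hFile)
    If @error Then ExitLoop ; end of file or read error

    ; Only process lines that contain the marker text
    If StringInStr($sLine, $sMarker) Then
        ; Remove every <...> tag, then trim leading and trailing whitespace
        Local $sText = StringRegExpReplace($sLine, "(?U)(<.*>)", "")
        $sText = StringStripWS($sText, $STR_STRIPLEADING + $STR_STRIPTRAILING)
        ConsoleWrite($sText & @CRLF)
    EndIf
WEnd

FileClose($hFile)
```

The stripped text in $sText would then be the input for whatever date conversion the script performs. That step is left out because the thread does not show it.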
