Removing HTML-style tags from text using a script


leuce

G'day everyone

I have pieces of text with HTML-style tags in them, like this:

<a0>This is a <a0/><b1>house<b1/>.

and I'd like to use a script to remove the tags, so that I end up with this:

This is a house.

Of course, I could copy the text, paste it into a text editor, use a regex find/replace to remove the tags, and then copy the text again, but that is the long way round and it depends on the user's computer having the right text editor installed. I was hoping there is some way to do this in AutoIt itself.

Thanks

Samuel


  • Moderators

Use FileRead() or _INetGetSource() and try this:

$sString = "<a0>This is a <a0/><b1>house<b1/>."
$sString = StringRegExpReplace($sString, '(?s)(?i)\<[^\>]*\>', '')
MsgBox(0,'', $sString)
In practice, the initial value of $sString would come from FileRead() or _INetGetSource() instead of a literal string.
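For the copy-and-paste workflow described in the question, the same replacement could be applied to the clipboard directly. This is only a minimal sketch, assuming the text is already on the clipboard; _StripTags is a hypothetical helper name used for illustration, not a built-in function.

```autoit
; _StripTags is a hypothetical helper, not part of AutoIt's standard UDFs.
Func _StripTags($sText)
    ; Remove every run that starts with "<" and ends at the next ">".
    Return StringRegExpReplace($sText, '<[^>]*>', '')
EndFunc

Local $sClip = ClipGet()            ; read the current clipboard text
If Not @error Then
    ClipPut(_StripTags($sClip))     ; put the cleaned text back on the clipboard
EndIf
```

Note that the pattern removes anything between a "<" and the next ">", so ordinary text such as "a < b and c > d" would lose its middle part as well.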

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


#include <File.au3>

Dim $lines, $file = "test.txt"

; Read the file into an array; $lines[0] holds the number of lines.
If Not _FileReadToArray($file, $lines) Then
    MsgBox(16, "Error 1", "File could not be read to array.")
    Exit
Else
    If IsArray($lines) And $lines[0] > 0 Then
        ; Strip the tags from each line in turn.
        For $i = 1 To $lines[0]
            $lines[$i] = StringRegExpReplace($lines[$i], '(?s)(?i)\<[^\>]*\>', '')
        Next
        ; Replace the file, writing from index 1 to skip the count element.
        FileDelete($file)
        _FileWriteFromArray($file, $lines, 1)
    EndIf
EndIf

Something like that.


  • Moderators
$sStringOutPut = StringRegExpReplace(FileRead('test.txt'), '(?s)(?i)\<[^\>]*\>', '')
All you need is 1 line really... and a FileWrite if you are going to write it to a file.
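The read, replace and write steps could be put together roughly as follows. This is a sketch under assumptions, not a definitive version: the file name "test.txt" is taken from the earlier post, the variable names are illustrative, and the file is overwritten in place.

```autoit
; File name taken from the earlier example; adjust to your own path.
Local $sFile = "test.txt"

If Not FileExists($sFile) Then
    MsgBox(16, "Error", "File not found: " & $sFile)
    Exit
EndIf

; Read the whole file into one string, line breaks included.
Local $sText = FileRead($sFile)

; Strip everything between "<" and ">", including the brackets.
$sText = StringRegExpReplace($sText, '<[^>]*>', '')

; Open in overwrite mode (2) so the old content is replaced.
Local $hFile = FileOpen($sFile, 2)
If $hFile = -1 Then
    MsgBox(16, "Error", "Could not open " & $sFile & " for writing.")
    Exit
EndIf
FileWrite($hFile, $sText)
FileClose($hFile)
```

Because [^>] also matches line breaks, a tag that spans two lines is removed here as well, which the line-by-line version would miss.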


  • Moderators

I guess you're right. Could just read the entire file into a variable regardless of line breaks. Not really sure what I was thinking.

If it's any consolation, when I was writing it I was trying to do it the hard way myself with just StringRegExp(), and it took me 20 minutes to realize that I was just making it hard on myself.


  • Moderators

Thanks, everyone. All your solutions give me the answer "1", but at least now I know what I should tinker with.

You need at least Beta 3.2.1.8 for it to work, and then the answer will be right.
