Convert HTML in text


Cyber
Maybe this script can help you:

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML files (*.html)", 1+2)
if $FilesString = "" then exit

$Files = StringSplit($FilesString, "|")
$filename = $Files[$Files[0]]
ConvertAndWrite($filename)
exit

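func ConvertAndWrite($FileName)
local $OldFile, $NewFile, $Content
  $OldFile = FileOpen ($FileName, 0) ; 0 = read mode
  $NewFile = FileOpen ($FileName & ".txt", 1) ; 1 = append mode
  $Content = FileRead($OldFile)
  ; Check @error right after FileRead, before another function call resets it
  If not @error Then
   $Content = StringStripCR($Content)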
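  ; Strip head, scripts, then all remaining tags
  ; (?i) = case-insensitive, (?s) = dot also matches newlines; \b allows tag attributes
   $Content = StringRegExpReplace($Content, '(?is)<head\b.*?</head>','')
   $Content = StringRegExpReplace($Content, '(?is)<script\b.*?</script>','')
   $Content = StringRegExpReplace($Content, '(?s)<.+?>','')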

     ; Replace HTML abbrev.
   $Content = StringReplace($Content, '&lt;','<')
   $Content = StringReplace($Content, '&gt;','>')
   $Content = StringReplace($Content, '&nbsp;',' ')
   $Content = StringReplace($Content, '&copy;','©')

   ; Replace tabs with spaces (AutoIt has no backslash escapes, so use @TAB)
   $Content = StringReplace($Content, @TAB,' ')
  ; Strip double spaces
   while StringInStr($Content,'  ')
     $Content = StringReplace($Content, '  ',' ')
   wend

   ; Remove the trailing space before each @LF
   $Content = StringReplace($Content, ' ' & @Lf,@Lf)

  ; Strip empty lines
   while StringInStr($Content,@Lf & @Lf)
     $Content = StringReplace($Content, @Lf & @Lf, @Lf)
   wend


  ; Now you can write text 
   FileWrite($NewFile, $Content)
  endif
  FileClose($OldFile)
  FileClose($NewFile)
endfunc

The point of world view


Some fixes:

Replace

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML or XML files (*.html)", 1+2)

with

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML files (*.html;*.htm)", 1+2)

And add the line

$Content = StringReplace($Content, '&quot;','"')

after line

$Content = StringReplace($Content, '&copy;','©')
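
As the list of entities grows, the individual StringReplace calls can be folded into one loop. Below is a minimal sketch of that idea, not a tested replacement for the script above. DecodeEntities is a hypothetical helper name, and the &amp; entry is an addition that neither post handles. It is decoded last so that a literal "&amp;lt;" ends up as "&lt;" rather than "<".

```autoit
; Minimal sketch: decode a few named HTML entities in one loop.
; DecodeEntities is a hypothetical helper, not part of the original script.
Func DecodeEntities($sText)
    ; Each row is [entity, replacement]; &amp; must stay last
    Local $aMap[6][2] = [['&lt;', '<'], ['&gt;', '>'], ['&nbsp;', ' '], _
            ['&copy;', '©'], ['&quot;', '"'], ['&amp;', '&']]
    For $i = 0 To UBound($aMap) - 1
        $sText = StringReplace($sText, $aMap[$i][0], $aMap[$i][1])
    Next
    Return $sText
EndFunc

; Usage: decode a short test string and print it to the console
ConsoleWrite(DecodeEntities('&quot;a &lt; b&quot; &amp; c') & @CRLF)
```

Inside ConvertAndWrite, the four (or five) StringReplace lines could then be replaced by a single call such as $Content = DecodeEntities($Content).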

The point of world view

Link to comment
Share on other sites

  • Moderators

Here's a working example.

#include <IE.au3>

$sHTML = ""
$sHTML &= "<HTML>" & @CR
$sHTML &= "<HEAD>" & @CR
$sHTML &= "<TITLE>HTML Test Page</TITLE>" & @CR
$sHTML &= "</HEAD>" & @CR
$sHTML &= "<BODY>" & @CR
$sHTML &= "<h1>Here is some text within HTML tags.</h1>" & @CR
$sHTML &= "Some more text." & @CR
$sHTML &= "<p>  " & @CR
$sHTML &= "I think we have accomplished our goal!" & @CR
$sHTML &= "</BODY>" & @CR
$sHTML &= "</HTML>"

$oIE = _IECreate()
_IEDocWriteHTML($oIE, $sHTML)
ConsoleWrite(_IEBodyReadText($oIE) & @CR)
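
To apply the same approach to a file on disk instead of a hard-coded string, something along these lines should work. It is only a sketch: the path page.html is a placeholder, and it assumes that the third parameter of _IECreate controls window visibility (0 = hidden) and that _IEQuit closes the browser afterwards.

```autoit
#include <IE.au3>

; Placeholder input path (an assumption); point this at a real HTML file
Local $sFile = @ScriptDir & "\page.html"
Local $sHTML = FileRead($sFile)
If @error Then Exit MsgBox(16, "Error", "Could not read " & $sFile & " (or the file is empty)")

; Create a hidden browser window, load the HTML, read back the rendered text
Local $oIE = _IECreate("about:blank", 0, 0)
_IEDocWriteHTML($oIE, $sHTML)
Local $sText = _IEBodyReadText($oIE)
_IEQuit($oIE) ; close the hidden window so no IE process is left behind

ConsoleWrite($sText & @CRLF)
```

Because the browser does the parsing, entities and nested tags are handled without any regular expressions.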