Cyber

Convert HTML to text

5 posts in this topic

Hi!

Can you help me, please?

I need a function that converts HTML to plain text.

Does one exist?

Thanks :)


Console Browse: Navigate on the WEB in a textual console
MultiPing!: Show computer on the lan and/or show the local/remote task, ALL animated!
KillaWin: Event executing
CryptPage: Crypt your webpage and show only with key

Maybe this script can help you:

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML files (*.html)", 1+2)
if $FilesString = "" then exit

$Files = StringSplit($FilesString, "|")
$filename = $Files[$Files[0]]
ConvertAndWrite($filename)
exit

func ConvertAndWrite($FileName)
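  Local $OldFile, $NewFile, $Content
  $OldFile = FileOpen($FileName, 0) ; 0 = read mode
  If $OldFile = -1 Then Return ; source file could not be opened
  $NewFile = FileOpen($FileName & ".txt", 2) ; 2 = overwrite, so re-running does not append duplicates
  $Content = FileRead($OldFile)
  ; Check @error right after FileRead; any later function call resets it
  If Not @error Then
   $Content = StringStripCR($Content)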
  ; Strip the head and script blocks (any case, with or without attributes), then all remaining tags
   $Content = StringRegExpReplace($Content, '(?is)<head\b.*?</head>','')
   $Content = StringRegExpReplace($Content, '(?is)<script\b.*?</script>','')
   $Content = StringRegExpReplace($Content, '<[^>]+>','')

     ; Replace HTML entities
   $Content = StringReplace($Content, '&lt;','<')
   $Content = StringReplace($Content, '&gt;','>')
   $Content = StringReplace($Content, '&nbsp;',' ')
   $Content = StringReplace($Content, '&copy;','©')

   ; Replace tabs with spaces ('\r' is a literal backslash-r in AutoIt, not an escape)
   $Content = StringReplace($Content, @TAB,' ')
  ; Strip double spaces
   while StringInStr($Content,'  ')
     $Content = StringReplace($Content, '  ',' ')
   wend

   ; Replace space + @Lf lines
   $Content = StringReplace($Content, ' ' & @Lf,@Lf)

  ; Strip empty lines
   while StringInStr($Content,@Lf & @Lf)
     $Content = StringReplace($Content, @Lf & @Lf, @Lf)
   wend


  ; Now you can write text 
   FileWrite($NewFile, $Content)
  endif
  FileClose($OldFile)
  FileClose($NewFile)
endfunc
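
If you want a plain function that takes an HTML string and returns text, instead of reading and writing files, the same steps can be wrapped roughly like this. This is only a sketch: the name _HtmlToText is made up for illustration, it does not decode entities, and it assumes reasonably well-formed HTML.

```autoit
; Hypothetical string-in, string-out variant of ConvertAndWrite.
; The name _HtmlToText is an assumption, not an existing UDF.
Func _HtmlToText($sHtml)
    Local $sText = StringStripCR($sHtml)
    ; Drop <head> and <script> blocks, ignoring case and attributes
    $sText = StringRegExpReplace($sText, '(?is)<(head|script)\b.*?</\1>', '')
    ; Drop every remaining tag
    $sText = StringRegExpReplace($sText, '<[^>]+>', '')
    ; Collapse runs of spaces and tabs into one space
    $sText = StringRegExpReplace($sText, '[ \t]+', ' ')
    ; Collapse blank lines and trim spaces around line feeds
    $sText = StringRegExpReplace($sText, ' ?\n[ \n]*', @LF)
    Return StringStripWS($sText, 3) ; 3 = strip leading and trailing whitespace
EndFunc

ConsoleWrite(_HtmlToText('<html><head><title>T</title></head><body><h1>Hello</h1>' & @LF & '<p>world</p></body></html>') & @CRLF)
```

The regular expressions replace the two While loops of the file version; the result should be the same for typical pages.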

The point of world view

Some fixes:

Replace

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML files (*.html)", 1+2)

with

$FilesString = FileOpenDialog( "Choose a folder", @ScriptDir, "HTML files (*.html;*.htm)", 1+2)

And add the line

$Content = StringReplace($Content, '&quot;','"')

after the line

$Content = StringReplace($Content, '&copy;','©')
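
The entity replacements could also be kept in one place. Below is a minimal sketch, assuming only the entities mentioned in this thread plus &amp; matter; the helper name _DecodeEntities is hypothetical. &amp; is decoded last so that text like &amp;lt; ends up as the literal &lt; rather than being decoded twice.

```autoit
; Hypothetical helper, not part of the original script.
Func _DecodeEntities($sText)
    ; Entity/replacement pairs; '&amp;' must stay last (see note above)
    Local $aPairs[6][2] = [ _
        ['&lt;', '<'], _
        ['&gt;', '>'], _
        ['&nbsp;', ' '], _
        ['&copy;', '©'], _
        ['&quot;', '"'], _
        ['&amp;', '&']]
    For $i = 0 To UBound($aPairs) - 1
        $sText = StringReplace($sText, $aPairs[$i][0], $aPairs[$i][1])
    Next
    Return $sText
EndFunc

ConsoleWrite(_DecodeEntities('1 &lt; 2 &amp;&amp; &quot;a&quot; &gt; &quot;b&quot;') & @CRLF)
```

Numeric entities such as &#169; are not handled here; they would need a separate pass.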

Here's a working example.

#include <IE.au3>

$sHTML = ""
$sHTML &= "<HTML>" & @CR
$sHTML &= "<HEAD>" & @CR
$sHTML &= "<TITLE>HTML Test Page</TITLE>" & @CR
$sHTML &= "</HEAD>" & @CR
$sHTML &= "<BODY>" & @CR
$sHTML &= "<h1>Here is some text within HTML tags.</h1>" & @CR
$sHTML &= "Some more text." & @CR
$sHTML &= "<p>  " & @CR
$sHTML &= "I think we have accomplished our goal!" & @CR
$sHTML &= "</BODY>" & @CR
$sHTML &= "</HTML>"

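$oIE = _IECreate() ; opens an IE window on about:blank
_IEDocWriteHTML($oIE, $sHTML) ; let IE parse the HTML
ConsoleWrite(_IEBodyReadText($oIE) & @CR) ; read back the rendered body text
_IEQuit($oIE) ; close the IE window when done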

fantastic!

Thanks!!
