Hi everybody,

I'm looking for a clean way to convert HTML to text. I found a few examples here and tried both scripts, but:

1. The first script uses the StringRegExpReplace function, which gives me a fatal error when I use it on big web sites.
2. The second script uses the _IECreate function, which works too slowly, and I don't want to create any new IE processes.

Here is my script that sometimes gives me the fatal error:

    #include <INet.au3>
    #include <Constants.au3>
    #include <String.au3>
    #include <Array.au3>
    #include <Misc.au3>
    #include <File.au3>
    #include <IE.au3>

    $DATA = _INetGetSource("any web site")
    checkcode()

    Func checkcode()
        Local $x, $y, $lnx, $Content
        ;if StringLen($DATA)<90000 Then
        $Content = $DATA
        ;MsgBox(0,"XXX",$LINE&" "&StringLen($DATA))
        $Content = StringStripCR($Content)

        ; Strip the head, scripts, comments, remaining tags and bare URLs
        $Content = StringRegExpReplace($Content, '<head>(.|\n)+?</head>', '')
        $Content = StringRegExpReplace($Content, '<script(.|\n)+?/script>', '')
        $Content = StringRegExpReplace($Content, '<!--(.|\n)+?-->', '')
        $Content = StringRegExpReplace($Content, '<(.|\n)+?>', '')
        $Content = StringRegExpReplace($Content, 'http://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'ftp://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'https://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'www.(.|\n)+? ', '')
        $Content = StringReplace($Content, '<', '')
        $Content = StringReplace($Content, '>', '')

        ; Decode HTML entities
        $Content = StringReplace($Content, '&lt;', '<')
        $Content = StringReplace($Content, '&gt;', '>')
        $Content = StringReplace($Content, '&nbsp;', ' ')
        $Content = StringReplace($Content, '&copy;', '©')
        $Content = StringReplace($Content, '&ldquo;', '"')
        $Content = StringReplace($Content, '&raquo;', '»')
        $Content = StringReplace($Content, '&laquo;', '«')
        $Content = StringReplace($Content, '&rdquo;', '"')
        $Content = StringReplace($Content, '&quot;', '"')
        $Content = StringReplace($Content, '&amp;', '&')
        $Content = StringReplace($Content, '&bull;', '•')
        $Content = StringReplace($Content, '&#8226;', '•')
        $Content = StringReplace($Content, '&lsaquo;', '')
        $Content = StringReplace($Content, '&rsaquo;', '')
        $Content = StringReplace($Content, "&rsquo;", "'")
        $Content = StringReplace($Content, "&#39;", "'")

        ; Tidy spacing around brackets and punctuation
        $Content = StringReplace($Content, '^[', ' [')
        $Content = StringReplace($Content, ']^', ' ]')
        $Content = StringReplace($Content, ' , ', ', ')
        $Content = StringReplace($Content, ' : ', ': ')
        $Content = StringReplace($Content, ' . ', '. ')
        $Content = StringReplace($Content, ' ? ', '? ')
        $Content = StringReplace($Content, ' ! ', '! ')
        $Content = StringReplace($Content, ' ; ', '; ')
        $Content = StringStripWS($Content, 4)

        FileWriteLine("DUMP.txt", $Content)
    EndFunc

Any ideas how to do the HTML to text conversion?
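
One direction that might be worth trying, offered only as a sketch: a lazy group that repeats once per character can make the regex engine recurse very deeply on long pages, which is a plausible (not confirmed) cause of the fatal error. The version below uses the (?s) flag with a plain lazy dot instead, and a negated character class for ordinary tags. The function name _HtmlToText and the sample input are hypothetical, and only a handful of entities are decoded.

```autoit
; Sketch only: _HtmlToText is a hypothetical helper name, not a built-in UDF.
Func _HtmlToText($sHtml)
    Local $sText = StringStripCR($sHtml)

    ; (?i) = case-insensitive, (?s) = dot also matches newlines.
    ; A lazy dot avoids repeating a capturing group once per character.
    $sText = StringRegExpReplace($sText, '(?is)<head\b.*?</head>', '')
    $sText = StringRegExpReplace($sText, '(?is)<script\b.*?</script>', '')
    $sText = StringRegExpReplace($sText, '(?is)<style\b.*?</style>', '')
    $sText = StringRegExpReplace($sText, '(?s)<!--.*?-->', '')

    ; Replace every remaining tag with a space so adjacent words stay separated.
    $sText = StringRegExpReplace($sText, '<[^>]*>', ' ')

    ; Decode a few common entities; &amp; goes last so that text such as
    ; "&amp;lt;" is not decoded twice.
    $sText = StringReplace($sText, '&nbsp;', ' ')
    $sText = StringReplace($sText, '&lt;', '<')
    $sText = StringReplace($sText, '&gt;', '>')
    $sText = StringReplace($sText, '&quot;', '"')
    $sText = StringReplace($sText, '&amp;', '&')

    ; 7 = strip leading (1) + trailing (2) whitespace and collapse repeated spaces (4).
    Return StringStripWS($sText, 7)
EndFunc

; Usage with a made-up page fragment
Local $sSample = '<html><head><title>t</title></head><body><p>Hello &amp; <b>bye</b></p></body></html>'
ConsoleWrite(_HtmlToText($sSample) & @CRLF)
```

This stays with string functions only, so no IE process is created. It is still a regex-based approximation and will not handle every malformed page.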