Enforcer Posted April 4, 2012 (edited)

Hi everybody,

I'm looking for a way to cleanly convert HTML to text. I found a few examples here () and tried both scripts, but:

Script 1 uses the StringRegExpReplace function, which gives me a fatal error when I use it on big web sites.
Script 2 uses the _IECreate function, which works too slowly, and I don't want to create any new IE processes.

Here is my script, which sometimes gives me a fatal error:

    #include <INet.au3>
    #include <Constants.au3>
    #include <String.au3>
    #include <Array.au3>
    #include <Misc.au3>
    #include <File.au3>
    #include <IE.au3>

    $DATA = _INetGetSource("any web site")
    checkcode()

    Func checkcode()
        Local $x, $y, $lnx, $Content
        ;if StringLen($DATA)<90000 Then
        $Content = $DATA
        ;MsgBox(0,"XXX",$LINE&" "&StringLen($DATA))
        $Content = StringStripCR($Content)

        ; Strip the head, scripts, comments, remaining tags and bare URLs
        $Content = StringRegExpReplace($Content, '<head>(.|\n)+?</head>', '')
        $Content = StringRegExpReplace($Content, '<script(.|\n)+?/script>', '')
        $Content = StringRegExpReplace($Content, '<!--(.|\n)+?-->', '')
        $Content = StringRegExpReplace($Content, '<(.|\n)+?>', '')
        $Content = StringRegExpReplace($Content, 'http://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'ftp://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'https://(.|\n)+? ', '')
        $Content = StringRegExpReplace($Content, 'www.(.|\n)+? ', '')
        $Content = StringReplace($Content, '<', '')
        $Content = StringReplace($Content, '>', '')

        ; Decode HTML entities (entity names restored here on the assumption
        ; that the forum page had rendered them as plain characters)
        $Content = StringReplace($Content, '&lt;', '<')
        $Content = StringReplace($Content, '&gt;', '>')
        $Content = StringReplace($Content, '&nbsp;', ' ')
        $Content = StringReplace($Content, '&copy;', '©')
        $Content = StringReplace($Content, '&ldquo;', '"')
        $Content = StringReplace($Content, '&raquo;', '»')
        $Content = StringReplace($Content, '&laquo;', '«')
        $Content = StringReplace($Content, '&rdquo;', '"')
        $Content = StringReplace($Content, '&quot;', '"')
        $Content = StringReplace($Content, '&amp;', '&')
        $Content = StringReplace($Content, '&bull;', '•')
        $Content = StringReplace($Content, '&lsaquo;', '')
        $Content = StringReplace($Content, '&rsaquo;', '')
        $Content = StringReplace($Content, "&rsquo;", "'")
        $Content = StringReplace($Content, "&#39;", "'")

        ; Tidy brackets and the spacing around punctuation
        $Content = StringReplace($Content, '^[', ' [')
        $Content = StringReplace($Content, ']^', ' ]')
        $Content = StringReplace($Content, ' , ', ', ')
        $Content = StringReplace($Content, ' : ', ': ')
        $Content = StringReplace($Content, ' . ', '. ')
        $Content = StringReplace($Content, ' ? ', '? ')
        $Content = StringReplace($Content, ' ! ', '! ')
        $Content = StringReplace($Content, ' ; ', '; ')
        $Content = StringStripWS($Content, 4)
        FileWriteLine("DUMP.txt", $Content)
    EndFunc

Any ideas how to do this HTML-to-text conversion?

Edited April 4, 2012 by Enforcer
guinness Posted April 4, 2012

What about this >>
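One possible cause of the fatal error in the script above is the `(.|\n)+?` construct: the regex engine may process each repetition of a group recursively, so a very long match on a large page can exhaust the stack. The sketch below is only an illustration of an alternative, not a tested or complete converter. It uses the `(?s)` flag so that `.` matches newlines and a plain `.*?` can replace the repeated group. The function name `_HtmlToText` and the sample input are hypothetical, and the entity list is deliberately short.

```autoit
; Sketch only: _HtmlToText is a hypothetical helper name, not a standard UDF.
Func _HtmlToText($sHtml)
    Local $sText = StringStripCR($sHtml)

    ; (?s) makes "." match newlines and (?i) ignores case, so ".*?" can
    ; stand in for the repeated group "(.|\n)+?".
    $sText = StringRegExpReplace($sText, '(?is)<head\b.*?</head>', '')
    $sText = StringRegExpReplace($sText, '(?is)<script\b.*?</script>', '')
    $sText = StringRegExpReplace($sText, '(?is)<style\b.*?</style>', '')
    $sText = StringRegExpReplace($sText, '(?s)<!--.*?-->', '')

    ; A negated character class matches across newlines without any group.
    ; Tags become a space so that adjacent words do not run together.
    $sText = StringRegExpReplace($sText, '<[^>]*>', ' ')

    ; Decode a few common entities. "&amp;" is handled last so that
    ; text such as "&amp;lt;" is not decoded twice.
    $sText = StringReplace($sText, '&nbsp;', ' ')
    $sText = StringReplace($sText, '&lt;', '<')
    $sText = StringReplace($sText, '&gt;', '>')
    $sText = StringReplace($sText, '&quot;', '"')
    $sText = StringReplace($sText, '&amp;', '&')

    ; Flags: 1 = strip leading, 2 = strip trailing, 4 = collapse repeated spaces.
    Return StringStripWS($sText, 1 + 2 + 4)
EndFunc

; Usage with a small, made-up input string
ConsoleWrite(_HtmlToText('<html><head><title>x</title></head><body><p>Hello &amp; <b>bye</b></p></body></html>') & @CRLF)
```

This keeps everything inside StringRegExpReplace and StringReplace, so no IE process is created. Regular expressions are still a rough tool for HTML, so malformed markup (for example a `>` inside an attribute value) can slip through.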