Flawblure Posted February 12, 2006

Yes, this is similar to the other topic about "Stripping HTML from text", but rather than just getting a single string such as "dfg", I want to get rid of all the HTML and JavaScript and keep only the text. I figured the easiest way to do it would be to delete everything between the "<" and ">" characters, and I got that working. But I have no idea how to get rid of the JavaScript parts. Using Gene's Strip HTML script I got it down to:

flawblure function mpd(x,y) { document.df.dx.value=x; document.df.dy.value=y; document.df.submit();}function delmes(m) { document.message.action='/'; document.message.DelMessage.value=m; document.message.submit();}function replym(m) { document.message.action='message.php'; document.message.ReplyMes.value=m; document.message.submit();}function newmsg(p) { document.message.action='message.php'; document.message.WriteTo.value=p; document.message.submit();}inventing (x8) (54 minutes) function move(dir) { if(!dir) return; document.form1.Act2.value = dir; if(dir!='center') document.form1.Action.value = "move"; document.form1.submit(); } Display:TerrainHeightRoadsBriefreloadbuildwarknowledgemagicmessagespossessionsmapforumsskillsoptionsoverviewcontract(19)(4)(3)(40)Chimpy(4)(3)Adrian(3)(110)(120)(120)SpankyRedhillPagezeuschumbawumbaflawblure 80(120)(120)(10)lordgarionfrom jwrtolkien, 1h32m ago: Snork? reply delete3h19m ago: You drift into fanciful reverie... delete5h50m ago: You had a glimmer of insight, but couldn't shape it into an invention... deleteFeb 11th, 6:58: How unfortunate, their head is already empty. delete(not shown: 815 messages)19:25

(This is an online game, btw.) But I have no idea how to get it further than that. I want to be able to get rid of the JavaScript so that the end result looks something like:

flawblureinventing (x8) (54 minutes) ChimpyAdrianSpankyRedhillPagezeuschumbawumbaflawblure 80lordgarionfrom jwrtolkien, 1h32m ago: Snork? delete3h19m ago: You drift into fanciful reverie... delete5h50m ago: You had a glimmer of insight, but couldn't shape it into an invention... deleteFeb 11th, 6:58: How unfortunate, their head is already empty delete(not shown: 815 messages)19:25

The other problem is that just removing that particular text won't work: since it's a game, what needs to be removed and what needs to be kept varies with the situation. Is there a way to do this?
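Deleting everything between "<" and ">" (what the Strip HTML script does one character at a time) can be sketched with a single regular expression. This is a minimal illustration, assuming a current AutoIt 3 release where StringRegExpReplace is available; the HTML fragment is a made-up sample modelled on the game page. It shows why the JavaScript survives: the script code sits between the tags, not inside them.

```autoit
; Minimal sketch: delete every "<...>" run with one regular expression.
; Assumes a current AutoIt 3 release (StringRegExpReplace, PCRE syntax).
; The sample HTML below is hypothetical.
Local $sHtml = '<p>Hello</p><script language="javascript">function mpd(x,y) { df.submit(); }</script><b>world</b>'

; "<[^>]*>" matches a "<", any run of characters other than ">", then ">".
Local $sText = StringRegExpReplace($sHtml, "<[^>]*>", "")

ConsoleWrite($sText & @CRLF)
; The tags are gone, but the JavaScript between <script> and </script> remains:
; Hellofunction mpd(x,y) { df.submit(); }world
```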
cdkid Posted February 12, 2006

Well, to help with the JavaScript, have it delete everything between <script language="javascript"> (or whatever the script tag is) and </script>.

AutoIt Console written in C#. Write au3 code right at the console :D
_FileWriteToLine: Write to a specific line in a file.
My UDF Libraries: MySQL UDF Library version 1.6, MySQL Database UDF's for AutoIt
I have stopped updating the MySQL thread above; all future updates will be on my SVN. The svn location is: kan2.sytes.net/publicsvn/mysql
Note: This will still be available, but due to my new job and school hours, I am no longer developing this UDF.
My business: www.hirethebrain.com Hire The Brain HireTheBrain.com Computer Consulting, Design, Assembly and Repair
Oh no! I've committed Scriptocide!
Flawblure Posted February 12, 2006

How would I do that? The HTML strip script already removes the <script language="javascript"> and </script> tags themselves, but not what's in between them.
cdkid Posted February 12, 2006

Well, I haven't seen the HTML strip script, so I don't know if I'm being useless or not, but you could edit it to skip <script> tags while it loops through all the tags, and then afterwards have it delete everything between the <script> tags. Sorry, I know I have horrible grammar.

--hope this helps
~cdkid
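The order suggested above (deal with the script blocks first, then strip the ordinary tags) can be sketched as follows. This is a hypothetical helper, not part of Gene's script, and it assumes a current AutoIt 3 release with PCRE-based StringRegExpReplace. One known limitation: a literal "</script>" inside a JavaScript string would end the match early.

```autoit
; Sketch: drop whole <script>...</script> blocks first, then strip remaining tags.
; Assumes a current AutoIt 3 release (StringRegExpReplace, PCRE syntax).
; _StripScriptsAndTags is a hypothetical helper name.

Func _StripScriptsAndTags($sHtml)
    ; (?i) = case-insensitive, so <SCRIPT> and </Script> match too.
    ; (?s) = "." also matches CR/LF, so multi-line scripts are removed.
    ; ".*?" is lazy: each match stops at the FIRST following </script>,
    ; so ordinary text between two separate script blocks is kept.
    Local $sNoScripts = StringRegExpReplace($sHtml, "(?is)<script\b.*?</script\s*>", "")
    ; Only now remove the ordinary tags.
    Return StringRegExpReplace($sNoScripts, "<[^>]*>", "")
EndFunc

; Hypothetical sample with an upper-case, multi-line script block.
Local $sHtml = '<b>flawblure</b>' & @CRLF & _
        '<SCRIPT language="javascript">' & @CRLF & 'function move(dir) { if(!dir) return; }' & @CRLF & '</SCRIPT>' & @CRLF & _
        '<i>inventing (x8)</i>'

ConsoleWrite(_StripScriptsAndTags($sHtml) & @CRLF)
```

Because the script pattern matches on the tag names rather than on the script's contents, it keeps working when the JavaScript itself changes from one page load to the next.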
Flawblure Posted February 12, 2006

Bah, grammar doesn't matter. I'd edit the HTML strip script, but I didn't write it, so I don't really know how it works, lol. This is it, though:

Global $sWorkVar, $sWorkVar2, $iCodeFlag, $var, $i, $char
$char = 0
$sFilePath = FileOpenDialog("Select HTML file to strip.", "My Computer", "HTML (*.html)|HTM (*.htm)", 1)
$begin = TimerInit()
$sFileContent = FileRead($sFilePath)
$sWorkVar = $sFileContent

While 1
    If StringLeft($sWorkVar, 1) = "<" Then
        $iCodeFlag = 1
    EndIf
    If StringLeft($sWorkVar, 1) = ">" Then
        $iCodeFlag = 0
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    EndIf
    While $iCodeFlag = 1
        If StringLeft($sWorkVar, 1) = ">" Then ExitLoop
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    WEnd
    While $iCodeFlag = 0
        If StringLeft($sWorkVar, 1) = "<" Then ExitLoop
        If Not StringInStr($sWorkVar, ">") Then ExitLoop
        $sWorkVar2 = $sWorkVar2 & StringLeft($sWorkVar, 1)
        $sWorkVar = StringTrimLeft($sWorkVar, 1)
    WEnd
    If Not StringInStr($sWorkVar, ">") Then
        $sWorkVar2 = $sWorkVar2 & $sWorkVar
        ExitLoop
    EndIf
WEnd

$var = IniReadSection(@ScriptDir & "\Strip HTML.ini", "BoilerPlate")
If @error Then
    MsgBox(4096, "", "Error occurred, probably no INI file.")
Else
    For $i = 1 To $var[0][0]
        $sWorkVar2 = StringReplace($sWorkVar2, $var[$i][1], "") ;StringReplace ( "string", "searchstring", "replacestring")
    Next
    While StringInStr($sWorkVar2, @CRLF) Or StringInStr($sWorkVar2, @CR) Or StringInStr($sWorkVar2, @LF)
        $sWorkVar2 = StringReplace($sWorkVar2, @CRLF, "")
        $sWorkVar2 = StringReplace($sWorkVar2, @CR, "")
        $sWorkVar2 = StringReplace($sWorkVar2, @LF, "")
    WEnd
EndIf

FileWrite(@ScriptDir & "\Stripped.TXT", @YEAR & "/" & @MON & "/" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & $sWorkVar2 & @CRLF)
$dif = Round((TimerDiff($begin) / 1000), 4)
MsgBox(0, "Time To Process The File", $dif & " seconds...", 5)
MsgBox(0, "Result", "The stripped data is " & $sWorkVar2, 5)
Exit
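The script above also removes whatever text is listed in the [BoilerPlate] section of "Strip HTML.ini", read from @ScriptDir. Below is a hypothetical example of that file; the two values are made up from the sample output earlier in the thread. Since the For loop passes $var[$i][1] to StringReplace, only the text after "=" matters and the key names are arbitrary.

```ini
; Hypothetical "Strip HTML.ini": each VALUE is deleted from the stripped text.
[BoilerPlate]
1=reply delete
2=Display:
```

Because these are literal strings, a list like this cannot keep up with content that changes between page loads.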
cdkid Posted February 12, 2006

Alright, I'll see what I can do with this... gimme a few minutes to look it over.
Flawblure Posted February 12, 2006

Awesome, thanks!
cdkid Posted February 12, 2006 (edited)

Erg... having a major brainfart, this could take a bit longer. Could you just add a $sWorkVar2 = StringReplace($sWorkVar2, "put all the javascript code here", "") to your script?

Edited February 12, 2006 by cdkid
Flawblure Posted February 12, 2006

The problem with that is that the actual JavaScript code changes from time to time :/
cdkid Posted February 12, 2006

Hmm... I've got an idea:

$file = FileOpenDialog('HTML FILE', @DESKTOPDIR, 'HTML files (*.html)|htm files (*.htm)')
$file = FileRead($file)
$jsstart = StringInStr($file, "<script")
$jsstop = StringInStr($file, "</script")
StringReplace($file, $jsstart, '', $jsstop - $jsstart)

Put this before the HTML stripper and I think it should work... I haven't tested it.

--hope this helps
~cdkid
Flawblure Posted February 12, 2006

I ran it and got the dialog box twice, but nothing changed... Is that because it's not saving the result when it gets rid of the JavaScript?
cdkid Posted February 12, 2006

Yes, that's why. This is just something you can build off of.
Flawblure Posted February 12, 2006

OK, I got it running using this:

$file = _IEBodyReadHTML($oGK)
$jsstart = StringInStr($file, "<script")
$jsstop = StringInStr($file, "</script")
$JSstripped = StringReplace($file, $jsstart, '', $jsstop - $jsstart)
FileWrite(@ScriptDir & "\JSStripped.TXT", @YEAR & "/" & @MON & "/" & @MDAY & " " & @HOUR & ":" & @MIN & ":" & @SEC & $JSstripped & @CRLF)

The problem is that what it writes to JSStripped.txt is no different from the actual source of the page; there are still a lot of <script> and </SCRIPT> tags. Did I do anything to the code that would make it not work?
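One likely reason the output is unchanged: StringReplace's fourth parameter is an occurrence count, not a length, so StringReplace($file, $jsstart, '', $jsstop - $jsstart) does not cut out the range between the two positions. It also only ever locates the first script block. A sketch that builds on the same StringInStr idea, without regular expressions, is below. It cuts with StringLeft/StringMid, assigns the result back (AutoIt string functions return a new string rather than changing their argument), and repeats until no script block is left. It assumes a current AutoIt 3 release where StringInStr accepts a start position; _RemoveScriptBlocks is a hypothetical name.

```autoit
; Sketch: cut every <script>...</script> block with StringInStr/StringLeft/StringMid.
; Assumes a current AutoIt 3 release (StringInStr with a start-position parameter).
; _RemoveScriptBlocks is a hypothetical helper name.

Func _RemoveScriptBlocks($sHtml)
    Local Const $sEndTag = "</script>"
    Local $iStart, $iStop
    While 1
        ; StringInStr is case-insensitive by default, so <SCRIPT> is found as well.
        $iStart = StringInStr($sHtml, "<script")
        If $iStart = 0 Then ExitLoop
        ; Look for the closing tag only from the opening tag onwards
        ; (parameters: string, substring, casesense, occurrence, start).
        $iStop = StringInStr($sHtml, $sEndTag, 0, 1, $iStart)
        If $iStop = 0 Then ExitLoop ; unterminated block: leave the rest untouched
        ; Keep what comes before "<script" and what comes after "</script>",
        ; and assign the result back to the variable.
        $sHtml = StringLeft($sHtml, $iStart - 1) & StringMid($sHtml, $iStop + StringLen($sEndTag))
    WEnd
    Return $sHtml
EndFunc

; Hypothetical sample; in the script above the input would be _IEBodyReadHTML($oGK).
Local $sSample = 'a<script>x()</script>b<SCRIPT language="javascript">y()</SCRIPT>c'
ConsoleWrite(_RemoveScriptBlocks($sSample) & @CRLF) ; abc
```

Each pass removes at least one complete block, so the loop always terminates; the remaining tags can then be stripped as before.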
MHz Posted February 12, 2006

Add this just before your While loop.

Do
    $start_js = StringInStr($sWorkVar, '<script language=')
    $end_js = StringInStr($sWorkVar, '</script>')
    If $start_js And $end_js Then
        ; + 9 = StringLen('</script>'), so the closing tag is removed as well
        $sScriptline = StringMid($sWorkVar, $start_js, ($end_js - $start_js) + 9)
        If $sScriptline Then $sWorkVar = StringReplace($sWorkVar, $sScriptline, '')
    EndIf
Until Not $start_js Or Not $end_js
Flawblure Posted February 12, 2006 (edited)

I don't have a While loop... Erm... that would completely screw it all up, wouldn't it :/

EDIT: Ohh... you're talking about the HTML strip one, nvm.
EDIT2: It works!! Thanks!

Edited February 12, 2006 by Flawblure
DaleHohm Posted February 12, 2006 (edited)

You could consider letting IE and the DOM do the heavy lifting for you... something like:

#include <IE.au3>
$oIE = _IECreate()
_IENavigate($oIE, "c:\yourfile")
$myText = $oIE.document.body.innerText

Dale

edit: typo

Edited February 12, 2006 by DaleHohm

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl
MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model
Automate input type=file (Related)
Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded Better Better?
IE.au3 issues with Vista - Workarounds
SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y
"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?
Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble
neogia Posted February 12, 2006

Could you post the raw HTML file before, and what you would want it to look like after? Maybe a couple of examples, and then I could work something up. Regular expressions reign supreme in this area.

My UDFs: Coroutine Multithreading UDF Library, StringRegExp Guide, Random Encryptor, ArrayToDisplayString
"The Brain, expecting disaster, fails to find the obvious solution." -- neogia
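Along the lines of the regular-expression approach suggested above, here is a sketch of a fuller cleanup. It removes blocks that are never displayed (scripts, styles, comments), replaces the remaining tags with spaces so words from adjacent elements do not run together, decodes a few common entities, and collapses whitespace. It assumes a current AutoIt 3 release with PCRE-based StringRegExpReplace; _HtmlToText and the sample page are hypothetical, and only four entities are handled.

```autoit
; Sketch of a regular-expression HTML-to-text cleanup.
; Assumes a current AutoIt 3 release (StringRegExpReplace, PCRE syntax).
; _HtmlToText is a hypothetical helper; only a few common entities are decoded.

Func _HtmlToText($sHtml)
    Local $s = $sHtml
    ; 1. Remove blocks whose contents are never shown: scripts, styles, comments.
    $s = StringRegExpReplace($s, "(?is)<script\b.*?</script\s*>", " ")
    $s = StringRegExpReplace($s, "(?is)<style\b.*?</style\s*>", " ")
    $s = StringRegExpReplace($s, "(?s)<!--.*?-->", " ")
    ; 2. Replace every remaining tag with a space, so "Chimpy</a><a>Adrian"
    ;    becomes "Chimpy Adrian" instead of "ChimpyAdrian".
    $s = StringRegExpReplace($s, "<[^>]*>", " ")
    ; 3. Decode a few entities; &amp; goes last so "&amp;lt;" ends up as "&lt;", not "<".
    $s = StringReplace($s, "&nbsp;", " ")
    $s = StringReplace($s, "&lt;", "<")
    $s = StringReplace($s, "&gt;", ">")
    $s = StringReplace($s, "&amp;", "&")
    ; 4. Collapse runs of whitespace (spaces, tabs, CR/LF), then trim both ends (flag 3 = 1 + 2).
    $s = StringRegExpReplace($s, "\s+", " ")
    Return StringStripWS($s, 3)
EndFunc

; Hypothetical sample page.
Local $sPage = '<html><head><style>b { color: red }</style></head><body><!-- menu -->' & _
        '<a>Chimpy</a><a>Adrian</a><script>newmsg(1);</script><p>Snork?&nbsp;&amp; more</p></body></html>'
ConsoleWrite(_HtmlToText($sPage) & @CRLF) ; Chimpy Adrian Snork? & more
```

Inserting a space for each tag is a design choice: it avoids fused words like "ChimpyAdrian" at the cost of occasionally splitting a word that was wrapped in inline markup.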