Remove text between tags


I have .htm files that contain JavaScript.

<script type="text/javascript">
moment().format('MMMM Do YYYY, h:mm:ss a');
moment().format('dddd');
moment().format("MMM Do YY");
moment().format('YYYY [escaped] YYYY');
moment().format();
//--></script>

How can I delete all the JavaScript code from my .htm files? And the footer as well!

#include <File.au3>

Local $aFileList = _FileListToArray(@ScriptDir, '*.htm', 1)
If Not @error Then
    For $i = 1 To $aFileList[0]
        _found(@ScriptDir & '\' & $aFileList[$i])
    Next
EndIf

Func _found($sFilePath)
    Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    $hFileOpen = FileOpen($sFilePath, $FO_OVERWRITE)
   FileWrite($hFileOpen, StringRegExpReplace($sFileRead, '<footer>[^<>]+</footer>' ; ERROR
    Return FileClose($hFileOpen)
 EndFunc

Something like this

$vString = "<script type=""text/javascript"">moment().format('MMMM Do YYYY, h:mm:ss a');moment().format('dddd');moment().format(""MMM Do YY"");moment().format('YYYY [escaped] YYYY');moment().format();//--></script>"

$vStringReg = StringRegExpReplace($vString, "(<script[^>]+>).*?(</script>)", "\1\2")

MsgBox(1, "StringRegExpReplace", $vStringReg)

If you replace "\1\2" with "" (i.e. drop the back-references), the script tags are removed along with their contents.
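As a minimal sketch of the two variants, the snippet below runs both replacements on a made-up multi-line string. The sample input and the variable names are assumptions for illustration only, and the (?s) flag is an addition so that "." also matches line breaks, which real .htm files will contain.

```autoit
; Hypothetical sample input with line breaks inside the script block.
Local $sHtml = "<p>Before</p><script type=""text/javascript"">" & @CRLF & _
        "moment().format('dddd');" & @CRLF & _
        "</script><p>After</p>"

; Keep the tags, empty the body: the two captured groups are written back via \1 and \2.
Local $sKeepTags = StringRegExpReplace($sHtml, "(?s)(<script[^>]*>).*?(</script>)", "\1\2")

; Remove tags and body: the whole match is replaced with an empty string.
Local $sRemoveAll = StringRegExpReplace($sHtml, "(?s)<script[^>]*>.*?</script>", "")

ConsoleWrite($sKeepTags & @CRLF)
ConsoleWrite($sRemoveAll & @CRLF)
```

Without (?s) the pattern would only match script blocks that sit on a single line, which is why the flag is included here.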

Edited by mrflibblehat


....

How can I delete all the JavaScript code from my .htm files? And the footer as well!

....

Try this.

Local $sFileContents = "Start of HTML <script type=""text/javascript"">moment().format('MMMM Do YYYY, h:mm:ss a');" & @LF & _
        "moment().format('dddd');moment().format(""MMM Do YY"");moment().format('YYYY [escaped] YYYY');" & @LF & _
        "moment().format();//--></script> Near end of Html<footer> A Foot </footer> End of Html"
;ConsoleWrite($sFileContents & @LF)

Local $sModifiedFileContents = StringRegExpReplace($sFileContents, "(?is)(<script[^>]+javascript.*?/script>|<footer.+?/footer>)", "")

MsgBox(1, "StringRegExpReplace", $sModifiedFileContents)
Edited by Malkey

I need to find the JavaScript tags and delete them together with everything inside:

<script type="text/javascript">

code

</script>

And the footer:

<footer>

code

</footer>

I can't get this line to work in my script:

= StringRegExpReplace($sFileContents, "(?is)(<script[^>]+javascript.*?/script>|<footer.+?/footer>)", "")

Edited by Read

....

How can I delete all the JavaScript code from my .htm files? And the footer as well!

 

 

#include <File.au3>

Local $aFileList = _FileListToArray(@ScriptDir, '*.htm', 1)
If Not @error Then
    For $i = 1 To $aFileList[0]
        _found(@ScriptDir & '\' & $aFileList[$i])
    Next
EndIf

Func _found($sFilePath)
    Local $hFileOpen = FileOpen($sFilePath, $FO_READ)
    Local $sFileRead = FileRead($hFileOpen)
    FileClose($hFileOpen)

    $hFileOpen = FileOpen($sFilePath, $FO_OVERWRITE)
   FileWrite($hFileOpen, StringRegExpReplace($sFileRead, '<footer>[^<>]+</footer>' ; ERROR
    Return FileClose($hFileOpen)
 EndFunc

 

If the following line doesn't work in your example,

FileWrite($hFileOpen, StringRegExpReplace($sFileRead, "(?is)(<script[^>]+javascript.*?/script>|<footer.+?/footer>)", ""))

we'll fix your example.

 

Your example, slightly changed:

#include <File.au3>

Local $aFileList = _FileListToArray(@ScriptDir, '*.htm', 1)
If Not @error Then
    For $i = 1 To $aFileList[0]
        _found(@ScriptDir & '\' & $aFileList[$i])
    Next
EndIf

Func _found($sFilePath)
    Local $sFileRead = FileRead($sFilePath) ; FileRead by itself auto-opens & closes
    Local $hFileOpen = FileOpen($sFilePath, $FO_OVERWRITE)
    FileWrite($hFileOpen, StringRegExpReplace($sFileRead, '(<footer>[^<>]+</footer>)', "")) ; Replaces what the reg. exp. pattern matches with "", nothing.
    Return FileClose($hFileOpen)
EndFunc   ;==>_found

This should work now.

Edit: Added an extra bracket to my Reg.Exp.

Edited by Malkey

The regular expression pattern in your StringRegExpReplace() call contains "[^<>]", which matches any single character that is not "<" or ">".
The pattern therefore cannot match a footer that contains tags such as '<div class="container">' and '<nav id="medic">', so nothing is deleted (replaced with nothing): the four "<" and ">" characters in those tags are exactly the characters "[^<>]" excludes.

See [^ ... ] in the "Matching Characters" table under the StringRegExp function in the AutoIt help file for more of an explanation.
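The effect of "[^<>]" can be checked with a short sketch like the one below. The footer string is made up for illustration, built from the two tags mentioned above, and the lazy pattern is only one possible alternative, not necessarily the one you should end up with.

```autoit
; Hypothetical footer containing nested tags.
Local $sHtml = "<body>Text<footer><div class=""container""><nav id=""medic"">Links</nav></div></footer></body>"

; [^<>]+ cannot get past the first "<" inside the footer, so no match is found
; and the string comes back unchanged.
Local $sStrict = StringRegExpReplace($sHtml, "<footer>[^<>]+</footer>", "")
Local $iStrict = @extended ; number of replacements performed

; (?is): case-insensitive, "." also matches line breaks.
; .*? is lazy, so the match ends at the first </footer>.
Local $sLazy = StringRegExpReplace($sHtml, "(?is)<footer.*?</footer>", "")
Local $iLazy = @extended

ConsoleWrite("[^<>]+ replacements: " & $iStrict & @CRLF)
ConsoleWrite(".*?    replacements: " & $iLazy & @CRLF)
ConsoleWrite($sLazy & @CRLF)
```

Reading @extended straight after each call is a quick way to see whether a pattern matched at all before any file is overwritten.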

For a Regular Expression pattern you can use, dissect the Reg.Exp. pattern in post #3.  Guess, if you can't work it out.  Try something.  Change the "matching characters" in the RE pattern and notice any changes in the result.  That's what I'd do - test, debug, use the help file.

Good luck.
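For reference, the suggestions in this thread could be combined roughly as follows. This is a sketch under assumptions, not a tested solution for your files: _StripTags and _CleanFile are hypothetical names, the pattern is a lazy variant of the one in post #3, and the script overwrites every .htm file in the script directory, so try it on copies first.

```autoit
#include <File.au3>
#include <FileConstants.au3>

; Hypothetical helper: removes every <script>...</script> and <footer>...</footer> block.
; (?is) = case-insensitive, "." matches line breaks; .*? stops at the first closing tag.
Func _StripTags($sHtml)
    Return StringRegExpReplace($sHtml, "(?is)<script\b.*?</script>|<footer\b.*?</footer>", "")
EndFunc   ;==>_StripTags

; Hypothetical helper: reads a file, strips the blocks, and overwrites the file.
Func _CleanFile($sFilePath)
    Local $sFileRead = FileRead($sFilePath)
    If @error Then Return False ; file could not be read, leave it untouched
    Local $hFileOpen = FileOpen($sFilePath, $FO_OVERWRITE)
    If $hFileOpen = -1 Then Return False ; file could not be opened for writing
    FileWrite($hFileOpen, _StripTags($sFileRead))
    Return FileClose($hFileOpen)
EndFunc   ;==>_CleanFile

Local $aFileList = _FileListToArray(@ScriptDir, "*.htm", 1) ; 1 = files only
If Not @error Then
    For $i = 1 To $aFileList[0]
        _CleanFile(@ScriptDir & "\" & $aFileList[$i])
    Next
EndIf
```

Keeping the regular expression in its own function means it can be tried on a test string with MsgBox or ConsoleWrite before it is applied to any file.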
