Posted

Hello, I am looking for a way to save a web page as an .mht file without using a browser application.

I need something that takes a URL, a destination folder, and a file name (name.mht) as input and saves the page.

I found an old executable from 2007, "SavePage", but it only half works: a lot of content is not written.

I get a similar result with the examples found in forum posts such as "convert to mht".

The code I found is:

; Create the CDO message object that downloads the page and packs it as MHTML
Global $iMsg = ObjCreate("CDO.Message")
If Not IsObj($iMsg) Then Exit MsgBox(16, "Error", "Could not create CDO.Message")

; 0 = cdoSuppressNone: embed every resource the page references.
; Optional 3rd and 4th arguments are a user name and password, if needed.
$iMsg.CreateMHTMLBody("http://mangazukinew.online/tensei-ouji-wa-gakuen-demo-daraketai-side-story-chapter-1", 0)

Global $Stm = ObjCreate("ADODB.Stream")
$Stm.Type = 2 ; adTypeText (1 would be adTypeBinary)
$Stm.Charset = "US-ASCII"
$Stm.Open()

Global $Filepath = "G:\TEST\testAu3.mht" ; Full path and file name of the output
Global $iDsrc = $iMsg.DataSource
$iDsrc.SaveToObject($Stm, "_Stream") ; Serialize the message into the stream
; 2 = adSaveCreateOverWrite. The value 1 (adSaveCreateNotExist) fails when the
; file already exists, which is why overwriting seemed not to work.
$Stm.SaveToFile($Filepath, 2)
$Stm.Close()

The result (file.mht) is more or less the same.

I also can't use the function from the same topic, because it reports an error about a missing declaration on the CDO.Message line:

_INetGetMHT("http://mangazukinew.online/tensei-ouji-wa-gakuen-demo-daraketai-side-story-chapter-1", "G:\test\testAU3.mht")

; Save the page at $url as an MHT archive in $file. Returns True on success.
Func _INetGetMHT($url, $file)
    ; The line as originally posted ended with a stray comma, which caused the error
    Local $msg = ObjCreate("CDO.Message")
    If @error Then Return False
    Local $ado = ObjCreate("ADODB.Stream")
    If @error Then Return False

    With $ado
        .Type = 2 ; adTypeText
        .Charset = "US-ASCII"
        .Open()
    EndWith
    $msg.CreateMHTMLBody($url, 0) ; 0 = cdoSuppressNone
    $msg.DataSource.SaveToObject($ado, "_Stream")
    $ado.SaveToFile($file, 2) ; 2 = adSaveCreateOverWrite
    $ado.Close()
    $msg = 0
    $ado = 0
    Return True
EndFunc

 

So, how can the code be updated to capture more data in the resulting .mht file?

Or can a better result be obtained with a completely different approach? I need a hand, please.

I need a result similar to what browser add-ons such as "Save as MHT" for Chrome produce.

 

 

 

 

 

Posted (edited)

Thanks for the reply.

I noticed the trailing comma after "CDO.Message" only after posting. What does not satisfy me is the final result, that is, the .mht file obtained.

The idea was to get a result very similar to what you get with an extension like "Save As MHT" in Chrome [ h##ps://github.com/vsDizzy/SaveAsMHT   or   h##ps://chrome.google.com/webstore/detail/save-as-mht/hfmodljjaibbdndlikgagimhhodmobkc/related?hl=it ], or with Chrome's "Inspect" command followed by a page reload.

To simplify: I do not need to save everything in a single file (images, scripts, CSS, ...), since the file size would skyrocket. What I would like to obtain is only the HTML of the page after it has completely loaded, that is, once JavaScript and the like have modified it by inserting references to image links or text that are not present in the initial HTML.

So I was wondering whether the script could be improved by integrating, or taking inspiration from, the extension's code, or whether similar functions have already been written over the years.

 

Here is an example: the result obtained with the script [1] (492 KB) and the result obtained with the Chrome extension [2] (13 MB).

 

 

 

Edited by BLNJ000
insert example
Posted

I would also like to get this part of the HTML, which at the moment is not included in the .mht file created by the script.

For example, see part 1 and part 2:

Part 1: from the file AAA.mht obtained with the Chrome extension (13 MB).

Part 2: obtained by opening the link, pressing Ctrl+Shift+I, and reloading the page.

 

 

I do not need to store images, videos and the like in the final .mht. I ONLY need the HTML of the page AFTER it has completely loaded, i.e. once the scripts inside it have generated and modified the various tags, URLs and links that are not in the "plain" HTML page.

I need to grab the HTML as it stands at the final moment, after all the modifications made by the various .js files, CSS, APIs, etc.
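As far as I know, CDO.Message only downloads the page and the resources it references; it does not execute scripts, which would explain the missing parts. A rough sketch of one alternative follows, assuming the Internet Explorer COM object is still available on the machine and the site renders in IE's engine. The helper name _SaveRenderedHtml and its parameters are hypothetical, not taken from the thread. Note that it writes plain HTML rather than a real MHT archive.

```autoit
#include <IE.au3>
#include <FileConstants.au3>

; Hypothetical helper: load $sUrl in a hidden Internet Explorer instance and
; write the document, as it stands after scripts have run, to $sFile.
Func _SaveRenderedHtml($sUrl, $sFile, $iExtraWaitMs = 3000)
    ; Arguments: URL, do not attach to an existing window, invisible, wait for load
    Local $oIE = _IECreate($sUrl, 0, 0, 1)
    If @error Then Return SetError(1, 0, False)

    ; Assumption: a fixed pause is long enough for asynchronous scripts to finish
    Sleep($iExtraWaitMs)

    ; Read the current document HTML, including nodes inserted by scripts
    Local $sHtml = _IEDocReadHTML($oIE)
    Local $iErr = @error
    _IEQuit($oIE)
    If $iErr Then Return SetError(2, 0, False)

    ; Overwrite the target file, creating missing folders, and write it as UTF-8
    Local $hFile = FileOpen($sFile, $FO_OVERWRITE + $FO_CREATEPATH + $FO_UTF8)
    If $hFile = -1 Then Return SetError(3, 0, False)
    FileWrite($hFile, $sHtml)
    FileClose($hFile)
    Return True
EndFunc

; Example call with the URL from the first post; the output path is only an illustration
_SaveRenderedHtml("http://mangazukinew.online/tensei-ouji-wa-gakuen-demo-daraketai-side-story-chapter-1", "G:\TEST\rendered.html")
```

This still drives a browser engine in the background, so it may not count as "without a browser", but no window is shown.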

 

The code works, but it captures only part of the page (perhaps it misses strings written by asynchronous calls, or by .js files?).

The part in the second spoiler should have been added to the HTML by a .js file loaded with the web page.

 

 

 

 

 

 

 

 

 

 

Posted (edited)

If a wait of some seconds were added between opening the URL and saving the HTML, would that allow capturing the changes introduced by the asynchronous scripts?

If so, how can it be done?
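A wait can only help if the page is loaded in something that actually runs its scripts; as far as I know, CDO.Message does not, so a Sleep before SaveToFile would change nothing there. Below is a minimal sketch of a wait that polls instead of sleeping a fixed time, assuming the page is opened through IE.au3. The helper _WaitForStableDom and its default values are hypothetical, and "HTML length unchanged for a few polls" is only a heuristic for "scripts are done".

```autoit
#include <IE.au3>

; Hypothetical helper: poll the document until its HTML length has stayed the
; same for $iStableNeeded consecutive polls, or until $iTimeoutMs has elapsed.
Func _WaitForStableDom($oIE, $iTimeoutMs = 15000, $iPollMs = 500, $iStableNeeded = 3)
    Local $hTimer = TimerInit()
    Local $iLastLen = -1, $iStable = 0, $iLen
    While TimerDiff($hTimer) < $iTimeoutMs
        Sleep($iPollMs)
        $iLen = StringLen(_IEDocReadHTML($oIE))
        If $iLen = $iLastLen Then
            $iStable += 1
            If $iStable >= $iStableNeeded Then Return True
        Else
            ; The document changed: remember the new length and start counting again
            $iStable = 0
            $iLastLen = $iLen
        EndIf
    WEnd
    Return False ; Timed out; the document may still be changing
EndFunc

; Usage: open the page hidden, wait for it to settle, then read the HTML
Global $oIE = _IECreate("http://mangazukinew.online/tensei-ouji-wa-gakuen-demo-daraketai-side-story-chapter-1", 0, 0, 1)
If Not @error Then
    _WaitForStableDom($oIE)
    Global $sHtml = _IEDocReadHTML($oIE)
    ConsoleWrite(StringLen($sHtml) & " characters captured" & @CRLF)
    _IEQuit($oIE)
EndIf
```

A page that keeps updating itself (ads, timers) may never look stable, which is why the timeout is there.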

 

 


 

Edited by BLNJ000
