nitro322

MHTML file extractor/decoder

17 posts in this topic

I could use some assistance with this program. Before I get into the details of my problem, let me give some details on what the program does.

I wrote a utility for decoding MHTML files and extracting the individual files from them. MHTML files are essentially single-file web page archives, which can be created with Internet Explorer and various other Microsoft applications. MHT files are plain ASCII text files, with each nested file broken up into its own "part". Binary parts are encoded with base64, and text parts are encoded with quoted-printable.

So, in order to split this file back up into its individual components, I read through the file line by line searching for the boundary between parts. I copy all of the encoded data for one particular part into memory, run the decode, write it out to a new file, then proceed with the next part.
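For comparison, here is a minimal sketch of doing the same split on the whole file in memory instead of line by line. MHT_SplitParts is a hypothetical name, and the sketch assumes the boundary is declared in the top-level header as boundary="..." and that every part is introduced by "--" plus the boundary, as in MIME multipart messages. It only splits; each returned part still carries its own headers and encoded body.

```autoit
; Hypothetical helper: split an MHT document (already read into memory, e.g. with FileRead)
; into its MIME parts. Assumes a boundary=... parameter in the top-level header.
Func MHT_SplitParts($sMht)
    ; Capture the boundary value, with or without surrounding quotes (case-insensitive)
    Local $aMatch = StringRegExp($sMht, '(?i)boundary="?([^";\r\n]+)', 1)
    If @error Then Return SetError(1, 0, 0)
    ; Each part starts with "--" & boundary; flag 1 = split on the whole delimiter string
    Local $aRaw = StringSplit($sMht, "--" & $aMatch[0], 1)
    ; $aRaw[0] is the count, $aRaw[1] is the top-level header, and the last element is
    ; the "--" epilogue after the closing delimiter, so real parts are elements 2 .. count-1
    Local $iCount = $aRaw[0] - 2
    If $iCount < 1 Then Return SetError(2, 0, 0)
    Local $aParts[$iCount]
    For $i = 0 To $iCount - 1
        $aParts[$i] = $aRaw[$i + 2]
    Next
    Return $aParts
EndFunc   ;==>MHT_SplitParts
```

Usage would be something like `$aParts = MHT_SplitParts(FileRead($mht))`. Note that a file missing its closing "--boundary--" line would lose its last part with this sketch.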

Functionally it works fine, but as the subject says, it runs very slowly and uses all available CPU in the process. I can't figure out why, though my best guess is that the file reading and/or decoding algorithms are simply very inefficient.

I wrote the QPDec function myself, and I used the B64Dec function that jjohn posted here.

I know this is a pretty general and vague question, but does anyone have any suggestions for improvement here? I'm sure the QPDec function can be optimized (running a stringreplace on the entire data stream for each and every detected code has to be very expensive), and I just don't understand the B64Dec function well enough to try to tune it any.
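One way to avoid running StringReplace over the whole data stream once per code is to decode in a single left-to-right pass. The sketch below is not the thread's QPDec; QPDec_SinglePass is a hypothetical name, and it assumes well-formed quoted-printable input in which every bare "=" starts either a soft line break or an =XX escape (RFC 2045). Because each decoded character goes straight to the output and is never rescanned, "=3D41" correctly comes out as "=41".

```autoit
; Hypothetical QPDec_SinglePass(): decode quoted-printable text in one pass
Func QPDec_SinglePass($sText)
    ; Remove soft line breaks: "=" right before a line ending (RFC 2045 rule 5)
    $sText = StringRegExpReplace($sText, "=\r?\n", "")
    ; Every remaining "=" should begin an escape, so split on it
    Local $aPieces = StringSplit($sText, "=")
    Local $sOut = $aPieces[1]
    Local $sHex
    For $i = 2 To $aPieces[0]
        $sHex = StringLeft($aPieces[$i], 2)
        If StringLen($sHex) = 2 And StringIsXDigit($sHex) Then
            ; "=XX": emit the decoded byte, then the rest of the piece unchanged
            $sOut &= Chr(Dec($sHex)) & StringTrimLeft($aPieces[$i], 2)
        Else
            ; Not a valid escape: keep the "=" literally
            $sOut &= "=" & $aPieces[$i]
        EndIf
    Next
    Return $sOut
EndFunc   ;==>QPDec_SinglePass
```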

Like I said, it's a pretty vague question, but I know there are some hardcore AutoIt users here that could code circles around me, so I'd really appreciate any assistance you could provide.

The attached RAR file contains both the source code (requires AutoIt v3.2) and an example MHT file. Thanks a bunch!

extractMHT.rar




there is NO way that could have worked at ALL

i tried to clean it up... not sure if i got it all or not

************ not tested

; ----------------------------------------------------------------------------
;
; extractMHT v1.0
; Author:    Jared Breland <jbreland@legroom.net>
; Homepage:    http://www.legroom.net/mysoft
; Language:    AutoIt v3.2.0.1
; License:    GNU General Public License (http://www.gnu.org/copyleft/gpl.html)
;
; Script Function:
;    Extract files from MHT archives
;
; ----------------------------------------------------------------------------
; Setup environment
#include <GUIConstants.au3>
#include <file.au3>
Opt("ExpandVarStrings", 1)
Opt("GUIOnEventMode", 1)
Global $name = "extractMHT"
Global $version = "1.0"
Global $title = $name & " " & $version
Global $mht, $outdir, $filedir, $filename, $boundary
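Global $part, $parts, $newpart, $type, $encoding, $location, $content
Global $prompt = 0 ; set to 1 below when the GUI should be shown; must exist even when both arguments are passed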
; Check parameters
If $cmdline[0] = 0 Then
    $prompt = 1
Else
    If $cmdline[1] == "/help" Or $cmdline[1] == "/h" Or $cmdline[1] == "/?" _
            Or $cmdline[1] == "-h" Or $cmdline[1] == "-?" Then
        terminate("syntax")
    Else
        If FileExists($cmdline[1]) Then
            $mht = $cmdline[1]
        Else
            terminate("syntax")
        EndIf
        If $cmdline[0] > 1 Then
            $outdir = $cmdline[2]
        Else
            $prompt = 1
        EndIf
    EndIf
EndIf
; If no file passed, display GUI to select file and set options
If $prompt Then
    ; Create GUI
    GUICreate($title, 300, 115, -1, -1, -1, $WS_EX_ACCEPTFILES)
    $dropzone = GUICtrlCreateLabel("", 0, 0, 300, 115)
    GUICtrlCreateLabel("MHT archive to extract:", 5, 5, -1, 15)
    $filecont = GUICtrlCreateInput("", 5, 20, 260, 20)
    $filebut = GUICtrlCreateButton("...", 270, 20, 25, 20)
    GUICtrlCreateLabel("Target directory:", 5, 45, -1, 15)
    $dircont = GUICtrlCreateInput("", 5, 60, 260, 20)
    $dirbut = GUICtrlCreateButton("...", 270, 60, 25, 20)
    $ok = GUICtrlCreateButton("&OK", 55, 90, 80, 20)
    $cancel = GUICtrlCreateButton("&Cancel", 165, 90, 80, 20)
    ; Set properties
    GUICtrlSetBkColor($dropzone, $GUI_BKCOLOR_TRANSPARENT)
    GUICtrlSetState($dropzone, $GUI_DISABLE)
    GUICtrlSetState($dropzone, $GUI_DROPACCEPTED)
    GUICtrlSetState($filecont, $GUI_FOCUS)
    GUICtrlSetState($ok, $GUI_DEFBUTTON)
    If $mht <> "" Then
        GUICtrlSetData($filecont, $mht)
        $filedir = StringLeft($mht, StringInStr($mht, '\', 0, -1) - 1)
        $filename = StringTrimRight(StringTrimLeft($mht, StringLen($filedir) + 1), 4)
        GUICtrlSetData($dircont, $filedir & "\" & $filename)
        GUICtrlSetState($dircont, $GUI_FOCUS)
    EndIf
    ; Set events
    GUISetOnEvent($GUI_EVENT_DROPPED, "GUI_Drop")
    GUICtrlSetOnEvent($filebut, "GUI_File")
    GUICtrlSetOnEvent($dirbut, "GUI_Directory")
    GUICtrlSetOnEvent($ok, "GUI_Ok")
    GUICtrlSetOnEvent($cancel, "GUI_Exit")
    GUISetOnEvent($GUI_EVENT_CLOSE, "GUI_Exit")
    ; Display GUI and wait for action
    GUISetState(@SW_SHOW)
    $finishgui = 0
    While Not $finishgui
        Sleep(100) ; wait for GUI events without spinning the CPU at 100%
    WEnd
EndIf
; Set full output directory
$filedir = StringLeft($mht, StringInStr($mht, '\', 0, -1) - 1)
$filename = StringTrimRight(StringTrimLeft($mht, StringLen($filedir) + 1), 4)
If $outdir = '/sub' Then
    $outdir = $filedir & "\" & $filename
ElseIf StringMid($outdir, 2, 1) <> ":" Then
    If StringLeft($outdir, 1) == '\' Then
        $outdir = StringLeft($filedir, 2) & $outdir
    Else
        $outdir = _PathFull($filedir & '\' & $outdir)
    EndIf
EndIf
; Determine boundary
$infile = FileOpen($mht, 0)
$line = FileReadLine($infile)
Do
    If StringInStr($line, "boundary=", 0) Then
        $temp = StringTrimLeft($line, StringInStr($line, "boundary=") + 9)
        $boundary = StringLeft($temp, StringInStr($temp, '"') - 1)
        ; Continue processing to count number of parts
    ElseIf StringInStr($line, $boundary) Then
        $parts += 1
    EndIf
    $line = FileReadLine($infile)
Until @error
FileClose($infile)
$parts -= 1
; Verify boundary exists
If $boundary == '' Then
    MsgBox(48, $title, "Error: This does not appear to be a valid MHT file.  " & @CRLF & "No boundary could be detected.")
    Exit
EndIf
; Begin processing MHT file
; Initialize file and parse until first part
ProgressOn($title, "Extracting " & $filename & ".mht", "", -1, -1, 16)
$part = 0
$infile = FileOpen($mht, 0)
While 1
    $line = FileReadLine($infile)
    If @error Then ExitLoop ; end of file reached without a part boundary
    If StringInStr($line, $boundary) And Not StringInStr($line, '"') Then
        $newpart = True
        ExitLoop
    EndIf
WEnd
; Process individual parts
While 1
    $line = FileReadLine($infile)
    If @error Then ExitLoop
    ; Initialize variables
    If $newpart Then
        $type = ""
        $encoding = ""
        $location = ""
        $content = ""
        $newpart = False
        $part += 1
    EndIf
    ; Determine filetype
    If StringInStr($line, "Content-Type:", 0) Then
        $temp = StringRegExp($line, ":\s*([A-Za-z0-9/-]+)", 1)
        $type = $temp[0]
        ; Determine encoding method
    ElseIf StringInStr($line, "Content-Transfer-Encoding:", 0) Then
        $temp = StringRegExp($line, ":\s*([A-Za-z0-9-]+)", 1)
        $encoding = $temp[0]
        ; Determine filename
    ElseIf StringInStr($line, "Content-Location:", 0) Then
        $temp = StringTrimLeft($line, StringInStr($line, "Content-Location:") + 17)
        $location = getFName($temp, $type)
        ProgressSet(Round($part / $parts, 2) * 100, "Processing file " & $part & " of " & $parts & @CRLF & $location & "  ")
        ; Decode and write out new file when new boundary reached
    ElseIf StringInStr($line, $boundary) Then
        writeFile($encoding, $location, $content)
        $newpart = True
        ; Read encoded file content into memory until new boundary reached
    ElseIf $type <> "" And $encoding <> "" And $location <> "" Then
        If $encoding = "base64" And $line <> "" Then
            $content = $content & $line
        ElseIf $encoding <> "base64" Then
            $content = $content & $line & @CRLF
        EndIf
    EndIf
WEnd
FileClose($infile)
ProgressOff()
Exit
; -------------------------- Begin Custom Functions ---------------------------
Func terminate($status)
    ; Display error message if file could not be extracted
    Select
        ; Display usage information and exit
        Case $status == "syntax"
            $syntax = "Extract files from MHT web archives."
            $syntax = $syntax & @CRLF & "Usage:  " & @ScriptName & " [/help] [filename [destination]]"
            $syntax = $syntax & @CRLF & @CRLF & "Supported Arguments:"
            $syntax = $syntax & @CRLF & "     /help" & @TAB & @TAB & "Display this help information"
            $syntax = $syntax & @CRLF & "     filename" & @TAB & "Name of file to extract"
            $syntax = $syntax & @CRLF & "     destination" & @TAB & "Directory to which to extract"
            $syntax = $syntax & @CRLF & @CRLF & "Passing /sub instead of a destination directory name instructs" & @CRLF & $title & " to extract to subdirectory named after the archive."
            $syntax = $syntax & @CRLF & @CRLF & "Example:"
            $syntax = $syntax & @CRLF & "     " & @ScriptName & " c:\1\example.mht c:\test"
            $syntax = $syntax & @CRLF & @CRLF & "Running " & $title & " without any arguments will" & @CRLF & "prompt the user for the filename and destination directory."
            MsgBox(48, $title, $syntax)
    EndSelect
    Exit
EndFunc   ;==>terminate
; Return the filename from the passed URL
Func getFName($url, $type)
    Local $ext
    ; Determine file extension
    If StringInStr($type, "jpeg") Then
        $ext = "jpg"
    Else
        $ext = StringTrimLeft($type, StringInStr($type, '/'))
    EndIf
    ; If no filename specified, generate based on content-type
    If StringRight($url, 1) == "/" Then
        Return unique("index", $ext)
        ; Otherwise take directlry from URL
    Else
        $temp = StringTrimLeft($url, StringInStr($url, '/', 0, -1))
        $temp = StringRegExp($temp, "(.*?\.[A-Za-z0-9]*)", 1) ; base name plus extension (explicit class: in PCRE, \a is the BEL character)
        If Not @error And @extended Then
            $fname = StringLeft($temp[0], StringInStr($temp[0], '.', 0, -1) - 1)
            $fext = StringTrimLeft($temp[0], StringInStr($temp[0], '.', 0, -1))
            Return unique($fname, $fext)
        Else
            Return unique("unknown", $ext)
        EndIf
    EndIf
EndFunc   ;==>getFName
; Ensure a unique filename is returned
Func unique($fname, $ext)
    If FileExists($outdir & "\" & $fname & "." & $ext) Then
        $i = 1
        While FileExists($outdir & "\" & $fname & $i & "." & $ext)
            $i += 1
        WEnd
        Return $fname & $i & "." & $ext
    Else
        Return $fname & "." & $ext
    EndIf
EndFunc   ;==>unique
; Write contents to file
Func writeFile($encoding, $location, $content)
    If Not FileExists($outdir) Then DirCreate($outdir)
    $outfile = FileOpen($outdir & "\" & $location, 2)
    ; Decode file according to encoding type
    If $encoding = "base64" Then
        $content = B64Dec($content)
    ElseIf $encoding = "quoted-printable" Then
        $content = QPDec($content)
    EndIf
    ; Write decoded file
    FileWrite($outfile, $content) ; FileWriteLine would append a CRLF and corrupt binary parts such as images
    FileClose($outfile)
EndFunc   ;==>writeFile
; Decode quoted-printable data
Func QPDec($text)
    ; Remove soft line breaks: "=" at the end of an encoded line (RFC 2045 rule 5)
    $text = StringRegExpReplace($text, "=\r?\n", "")
    ; Strip malformed content from HTML pages (debugging)
    $text = StringRegExpReplace($text, "=EF=BB=BF", "")
    ; Find all remaining =XX hex codes in text (explicit hex-digit class; in PCRE, \x{2} means the character U+0002)
    $codes = StringRegExp($text, "=([0-9A-Fa-f]{2})", 3)
    ; Convert each hex code to ASCII character and replace in text (RFC rule 1)
    For $i = 0 To UBound($codes) - 1
        $text = StringReplace($text, '=' & $codes[$i], Chr(Dec($codes[$i])))
    Next
    Return $text
EndFunc   ;==>QPDec
; Decode base64 data
; code by jjohn (http://www.autoitscript.com/forum/index.php?showtopic=28734)
Func B64Dec($t)
    Const $B64Rch[123] =[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, _
            0, 0, 62, 0, 0, 0, 63, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 0, 0, 0, 0, _
            0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, _
            19, 20, 21, 22, 23, 24, 25, 0, 0, 0, 0, 0, 0, 26, 27, 28, 29, 30, 31, 32, _
            33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]
    $res = ''
    $arrt = StringSplit(StringReplace($t, '=', ''), '')
    For $a1 = 1 To $arrt[0]
        If Mod($a1, 4) = 1 Then ContinueLoop
        Select
            Case Mod($a1, 4) = 2
                $at = BitShift($B64Rch[Asc($arrt[$a1 - 1])], -2) + BitShift($B64Rch[Asc($arrt[$a1])], 4)
                $aa = BitShift($B64Rch[Asc($arrt[$a1])], -4)
                If $a1 = $arrt[0] Then
                    $res &= Chr($at)
                    Return $res
                EndIf
            Case Mod($a1, 4) = 3
                $at = $aa + BitShift($B64Rch[Asc($arrt[$a1])], 2)
                $aa = BitShift($B64Rch[Asc($arrt[$a1])], -6)
                If $a1 = $arrt[0] Then
                    $res &= Chr($at)
                    Return $res
                EndIf
            Case Mod($a1, 4) = 0
                $at = $aa + $B64Rch[Asc($arrt[$a1])]
                If $a1 = $arrt[0] Then
                    $res &= Chr($at)
                    Return $res
                EndIf
        EndSelect
        $res &= Chr($at)
    Next
    Return $res
EndFunc   ;==>B64Dec
; ------------------------ Begin GUI Control Functions ------------------------
; Prompt user for file
Func GUI_File()
    $mht = FileOpenDialog("Open file", "", "Select file (*.mht)", 1)
    If Not @error Then
        GUICtrlSetData($filecont, $mht)
        If GUICtrlRead($dircont) = "" Then
            $filedir = StringLeft($mht, StringInStr($mht, '\', 0, -1) - 1)
            $filename = StringTrimRight(StringTrimLeft($mht, StringLen($filedir) + 1), 4)
            GUICtrlSetData($dircont, $filedir & "\" & $filename)
        EndIf
        GUICtrlSetState($ok, $GUI_FOCUS)
    EndIf
EndFunc   ;==>GUI_File
; Prompt user for directory
Func GUI_Directory()
    If FileExists(GUICtrlRead($dircont)) Then
        $defdir = GUICtrlRead($dircont)
    ElseIf FileExists(GUICtrlRead($filecont)) Then
        $defdir = StringLeft(GUICtrlRead($filecont), StringInStr(GUICtrlRead($filecont), '\', 0, -1) - 1)
    Else
        $defdir = ''
    EndIf
    $outdir = FileSelectFolder("Extract to", "", 3, $defdir)
    If Not @error Then
        GUICtrlSetData($dircont, $outdir)
    EndIf
EndFunc   ;==>GUI_Directory
; Set file to extract and target directory, then exit
Func GUI_Ok()
    $mht = GUICtrlRead($filecont)
    If FileExists($mht) Then
        If GUICtrlRead($dircont) == "" Then
            $outdir = '/sub'
        Else
            $outdir = GUICtrlRead($dircont)
        EndIf
        GUIDelete()
        $finishgui = True
    Else
        If $mht == '' Then
            $mht = ''
        Else
            $mht = $mht & " does not exist." & @CRLF
        EndIf
        MsgBox(48, $title, $mht & "Please select valid file.")
    EndIf
EndFunc   ;==>GUI_Ok
; Process dropped files outside of file input box
Func GUI_Drop()
    If FileExists(@GUI_DragFile) Then
        $mht = @GUI_DragFile
        GUICtrlSetData($filecont, $mht)
        If GUICtrlRead($dircont) = "" Then
            $filedir = StringLeft($mht, StringInStr($mht, '\', 0, -1) - 1)
            $filename = StringTrimRight(StringTrimLeft($mht, StringLen($filedir) + 1), 4)
            GUICtrlSetData($dircont, $filedir & "\" & $filename)
        EndIf
    EndIf
EndFunc   ;==>GUI_Drop
; Exit if Cancel clicked or window closed
Func GUI_Exit()
    Exit
EndFunc   ;==>GUI_Exit

8)



there is NO way that could have worked at ALL

i tried to clean it up... not sure if i got it all or not

Valuater, thanks for taking the time to reply. I don't quite understand your response, though. You say there's "no way that could have worked at all," but I assure you it did, aside from the performance issues mentioned in my first post. I compared your revised copy with my original using WinMerge, and it looks like the only types of changes you made were
  • Pulling all variables and macros out of strings and referencing them independently, and
  • code formatting (capitalization, spacing, etc.).
The second issue is simply a matter of preference. I usually prefer to type in all lowercase when coding because it's faster for me. Regardless of coding style, though, the code should function the same.

The first issue is a bigger deal, but if you check the beginning of the file you'll see that I have Opt("ExpandVarStrings", 1) set. This lets me embed variables and macros inside strings and still have them expanded during execution. It may make the code more difficult to read without proper syntax highlighting for these variables, but again, this is a matter of personal preference. I can code faster (and find the code neater and shorter) without breaking every single variable and macro out of the string, and with the ExpandVarStrings option set, it functions the same.
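To make the equivalence concrete, here is a minimal standalone check (the variable values are made up for illustration). With ExpandVarStrings on, $var$ and @macro@ inside a string literal are expanded, and a literal $ or @ has to be doubled ($$ or @@). Run the same lines without the Opt() call and the $...$ text is shown literally.

```autoit
; Illustrative values only
Global $part = 3, $parts = 12, $location = "index.html"

Opt("ExpandVarStrings", 1) ; $var$ and @macro@ inside string literals are now expanded
Global $sExpanded = "Processing file $part$ of $parts$:@CRLF@$location$"

Opt("ExpandVarStrings", 0) ; back to the default: string literals are taken as-is
Global $sConcat = "Processing file " & $part & " of " & $parts & ":" & @CRLF & $location

; == is AutoIt's case-sensitive string comparison
ConsoleWrite("Strings match: " & ($sExpanded == $sConcat) & @CRLF)
```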

If you're curious, you can find a copy of the binary, compiled using AutoIt 3.2.0.1, here:

www.legroom.net/~jbreland/transfer/extractMHT.rar

Like I said in my first post, the code functionally works fine, I'm just having severe performance issues. I'm still very interested in any suggestions you or other forum members may have.

Thanks again.


A couple of thoughts... Perhaps use _FileReadToArray, paying the read I/O price up front, and see if working in memory improves overall performance. Second, I'd suggest adding some timers in key functions to see if there is a particular function using a disproportionate amount of time.
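A minimal sketch of the timer idea: keep one running total per decoder so that at the end of a run you can see which one dominates. TimedCall, _Demo and the accumulator variable are hypothetical names; in the real script you would wrap the decode calls in writeFile(), e.g. `$content = TimedCall("B64Dec", $content, $g_fB64Ms)`.

```autoit
; Hypothetical helper: call the user-defined function named $sFunc with one argument
; and add the elapsed time (TimerDiff returns milliseconds) to the ByRef accumulator
Func TimedCall($sFunc, $vArg, ByRef $fTotalMs)
    Local $hTimer = TimerInit()
    Local $vResult = Call($sFunc, $vArg)
    $fTotalMs += TimerDiff($hTimer)
    Return $vResult
EndFunc   ;==>TimedCall

; Stand-in for a real decoder such as B64Dec or QPDec
Func _Demo($s)
    Return StringUpper($s)
EndFunc   ;==>_Demo

Global $fDemoMs = 0
Global $sOut = TimedCall("_Demo", "abc", $fDemoMs)
ConsoleWrite("_Demo: " & Round($fDemoMs / 1000, 2) & " s total" & @CRLF)
```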

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble



if fileexists("$outdir$\$fname$.$ext$") then

while fileexists("$outdir$\$fname$$i$.$ext$")

progressset(round($part/$parts, 2)*100, "Processing file $part$ of $parts$:@CRLF@$location$") ; will show up but ???

$outdir = "$filedir$\$filename$"

GUICtrlSetData($dircont, "$filedir$\$filename$") ; will work but ???

no #include <file.au3>... there is more.... but if this thing can work... i have a lot to learn

8)

Edited by Valuater



A couple of thoughts... Perhaps use _FileReadToArray, paying the read I/O price up front, and see if working in memory improves overall performance. Second, I'd suggest adding some timers in key functions to see if there is a particular function using a disproportionate amount of time.

Thanks for the suggestions, Dale. I added a timer and tested how long it took for normal extraction vs. disabling base64 decoding and quoted-printable decoding. Disabling decoding makes a huge difference in extraction time (as I expected). I also tried switching over to using _FileReadToArray(), but unfortunately speed gains were not significant.

You can see my full results below. Time is in seconds, and I took the average of two runs for each type.

microsoft.mht:

Default - 16.80 (FileReadLine) / 16.48 (_FileReadToArray)

No base64 - 5.30

No QP - 13.50

THE GUN THAT NEVER WAS Heckler & Koch G11.mht:

Default - 53.58 (FileReadLine) / 53.09 (_FileReadToArray)

No base64 - 17.65

No QP - 39.88

progressset(round($part/$parts, 2)*100, "Processing file $part$ of $parts$:@CRLF@$location$") ; will show up but ???

GUICtrlSetData($dircont, "$filedir$\$filename$") ; will work but ???
Sets directory field to:

Y:\software\extractMHT\support\Test\microsoft

no #include <file.au3>... there is more.... but if this thing can work... i have a lot to learn

Ok, I did miss the file.au3. :nuke: I forgot that I was using _PathFull(). I looked through my code again and didn't see anything else that was missing, though.

In both of these cases, I do it simply because I find the code neater and more compact. E.g., I consider this:

"Processing file $part$ of $parts$:@CRLF@$location$"

much easier to type and deal with than this:

"Processing file " & $part & " of " & $parts & ":" & @CRLF & $location

because any time I make changes to it I usually end up forgetting to add an & or a " and then I have to debug the syntax, which just gets really annoying.

Like I said above, I do appreciate that you took the time to review this and post some suggestions, and given the quality of your posts on this board I'm quite confident you know this stuff better than I. I guess in this case you're just not used to the ExpandVarStrings option. It can happen. :P

Edited by nitro322


Well, for anyone interested, I was able to make some very significant improvements. The greatest speed-up resulted from simply switching to another base64 implementation, which I found here:

http://www.autoitscript.com/forum/index.ph...st&p=148460

This is significantly faster than the algorithm I was using before, and makes a huge difference in total time.
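The implementation behind that link isn't reproduced here. A different option on Windows is to hand the work to the CryptoAPI function CryptStringToBinaryW in crypt32.dll via DllCall; the sketch below does that. _B64Decode_Crypt is a hypothetical name, and this is a swapped-in technique, not the one linked above. It returns Binary data, so the output file should be opened in binary mode (e.g. FileOpen($path, 2 + 16)) and written with FileWrite.

```autoit
; Hypothetical helper: decode base64 text with the Windows CryptoAPI (crypt32.dll).
; Windows-only; returns a Binary value (empty Binary and @error set on failure).
Func _B64Decode_Crypt($sB64)
    Local Const $CRYPT_STRING_BASE64 = 0x1
    ; Pass 1: a NULL output buffer makes the API report the decoded size in pcbBinary ($aRet[5])
    Local $aRet = DllCall("crypt32.dll", "int", "CryptStringToBinaryW", "wstr", $sB64, _
            "dword", 0, "dword", $CRYPT_STRING_BASE64, "ptr", 0, "dword*", 0, "ptr", 0, "ptr", 0)
    If @error Or Not $aRet[0] Then Return SetError(1, 0, Binary(""))
    Local $iSize = $aRet[5]
    If $iSize = 0 Then Return Binary("")
    ; Pass 2: decode into a byte buffer of exactly that size
    Local $tBuf = DllStructCreate("byte[" & $iSize & "]")
    $aRet = DllCall("crypt32.dll", "int", "CryptStringToBinaryW", "wstr", $sB64, _
            "dword", 0, "dword", $CRYPT_STRING_BASE64, "ptr", DllStructGetPtr($tBuf), _
            "dword*", $iSize, "ptr", 0, "ptr", 0)
    If @error Or Not $aRet[0] Then Return SetError(2, 0, Binary(""))
    Return DllStructGetData($tBuf, 1)
EndFunc   ;==>_B64Decode_Crypt
```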

The second improvement was tweaking my quoted-printable decoding function. This is where most of the slowdown occurred:

$codes = stringregexp($text, "=([0-9A-Fa-f]{2})", 3)
for $i = 0 to ubound($codes) - 1 ; flag 3 returns a 0-based array of matches, with no count in [0]
    $text = stringreplace($text, '=' & $codes[$i], chr(dec($codes[$i])))
next

In doing some testing, I noticed that in one particular file $codes contained over 9000 elements. I then wrote them to a text file and noticed that they were mostly duplicates. In fact, there were only 7 unique elements. So, I'm doing a stringreplace() >9000 times on a fairly large string (about 3000 lines) when I only need to replace 7 elements. Can you say, "inefficient?" :P

So, I wrote a function to very quickly strip out duplicate elements before running through the for loop. It's not a huge gain, but it is noticeable on multiple files.
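The attached function isn't reproduced in this thread, so here is a minimal sketch of the same "replace each distinct code only once" idea. QPDec_Unique is a hypothetical name; instead of a separate dedupe step it marks codes in a 256-slot array indexed by byte value. It also handles one ordering subtlety: "=3D" decodes to "=", which could combine with the characters after it into a new escape (=3D41 would become =41 and then A), so it is replaced last.

```autoit
; Hypothetical QPDec_Unique(): one StringReplace per distinct =XX code
Func QPDec_Unique($sText)
    ; Remove soft line breaks ("=" at the end of an encoded line)
    $sText = StringRegExpReplace($sText, "=\r?\n", "")
    ; Collect every =XX escape (flag 3 = 0-based array of all matches)
    Local $aCodes = StringRegExp($sText, "=([0-9A-Fa-f]{2})", 3)
    If @error Then Return $sText
    ; Mark which of the 256 byte values actually occur
    Local $abSeen[256]
    For $i = 0 To UBound($aCodes) - 1
        $abSeen[Dec($aCodes[$i])] = True
    Next
    ; Replace each distinct code once (StringReplace is case-insensitive by default),
    ; skipping 0x3D ("=") for now
    For $n = 0 To 255
        If $abSeen[$n] And $n <> 0x3D Then
            $sText = StringReplace($sText, "=" & Hex($n, 2), Chr($n))
        EndIf
    Next
    ; Decode "=3D" last so the "=" it produces is never read as the start of a new escape
    If $abSeen[0x3D] Then $sText = StringReplace($sText, "=3D", "=")
    Return $sText
EndFunc   ;==>QPDec_Unique
```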

Anyway, thanks again for the responses. I've attached a final copy for anyone interested.

extractMHT.au3


Another, possibly quicker, method could be to use COM. I have googled some CDO examples that convert HTML to MHTML. I found nothing for the inverse, but the code could perhaps be reversed.

Some links of interest

http://www.codeproject.com/aspnet/aspnetht...867#xx1559867xx

http://www.dbforums.com/archive/index.php/t-783832.html

http://msdn.microsoft.com/library/default....b36a9d50e03.asp

Perhaps worth checking out...
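The links above are truncated, so their exact contents can't be checked here. For reference, the commonly cited CDO approach for the HTML-to-MHT direction looks roughly like this in AutoIt. It assumes CDO for Windows (cdosys.dll) and ADO are installed and registered, and SaveAsMHT is a hypothetical name.

```autoit
; Hypothetical helper: save a page (http:// or file:// URL) as a single .mht file via CDO.
; Assumes the CDO.Message COM class (cdosys.dll) and ADO streams are available.
Func SaveAsMHT($sUrl, $sMhtPath)
    ; Install a COM error handler so a failed call sets @error instead of stopping the script
    Local $oErrHandler = ObjEvent("AutoIt.Error", "_SaveAsMHT_ComError")
    Local $oMsg = ObjCreate("CDO.Message")
    If Not IsObj($oMsg) Then Return SetError(1, 0, False)
    ; 0 = cdoSuppressNone: also embed the images, stylesheets, etc. the page references
    $oMsg.CreateMHTMLBody($sUrl, 0)
    If @error Then Return SetError(2, 0, False)
    ; GetStream() returns an ADO Stream holding the whole MIME message
    Local $oStream = $oMsg.GetStream()
    If @error Then Return SetError(3, 0, False)
    ; 2 = adSaveCreateOverWrite
    $oStream.SaveToFile($sMhtPath, 2)
    If @error Then Return SetError(4, 0, False)
    Return True
EndFunc   ;==>SaveAsMHT

; Empty COM error handler; its job is only to keep COM failures from aborting the script
Func _SaveAsMHT_ComError($oError)
EndFunc   ;==>_SaveAsMHT_ComError
```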


Thanks for the suggestion, MHz, but I'm pretty happy with the performance at this point, and am pretty much ready to just be done with it. :) If I get an itch to work on it again in the future I'll probably look into this, but for now I think it works pretty well as is.

For anyone interested, the final copy can be obtained from here:

http://www.legroom.net/modules.php?op=modl...;app=extractmht

Thanks to everyone for their suggestions.


Hi nitro322

Any chance of you knocking up a reversal of your wonderful program?

i.e. convert an .html file plus the accompanying _files folder content into a single .mht file.

I'm presuming here that all we would need to do is reverse what you've done?

With a little bit more knowledge, I could maybe do it myself ... but I'm not sure I'm that clever?

I posted here in regard to a related erifash request ... so you can see I'm not entirely lazy ... just a basic backyard scripter!

;):lmao:


TheSaints' Robust Chat

Make sure brain is in gear before opening mouth!
Remember, what is not said, can be just as important as what is said.

Spoiler

If I put effort into communication, I expect you to read properly & fully, or just not comment.
Ignoring those who try to divert conversation with irrelevancies.
If I'm intent on insulting you or being rude, I will be obvious, not ambiguous about it.
I'm only big and bad, to those who have an over-active imagination.

I may have the Artistic Liesense ;) to disagree with you. TheSaint's Toolbox


Hi. Unfortunately, simply reversing the process is a bit more difficult than it may seem at first glance. I'm sure it's doable, but I'm not sure it'd be worth the time and effort involved.

I saw that MHz posted a possible solution to this problem in the thread you linked to. Have you tried that? It looks like it utilizes the built-in Windows/IE capabilities for this, which would be far more efficient. Let me know if it works (I can't test myself, as I have the required component ripped out of my Windows install).


Sets directory field to:

Y:\software\extractMHT\support\Test\microsoft

No, it won't.

# MY LOVE FOR YOU... IS LIKE A TRUCK- #


Sets directory field to:

Y:\software\extractMHT\support\Test\microsoft

No, it won't.

Oh ;) , try this part ripped from the code. Notice that $mht = $CMDLINE[1] in Nitro322's code, so I hardcoded it with the path mentioned above.

$mht = 'Y:\software\extractMHT\support\Test\microsoft.mht'

; Nitro322's code
Opt("ExpandVarStrings", 1)

if $mht <> "" then
;~  GUICtrlSetData($filecont, $mht)
    $filedir = stringleft($mht, stringinstr($mht, '\', 0, -1)-1)
    $filename = stringtrimright(stringtrimleft($mht, stringlen($filedir)+1), 4)
;~  GUICtrlSetData($dircont, "$filedir$\$filename$")
;~  GUICtrlSetState($dircont, $GUI_FOCUS)
endif

; show result
MsgBox(0, '', "$filedir$\$filename$")


Mkay, well that just scares me.


Sets directory field to:

Y:\software\extractMHT\support\Test\microsoft

No, it won't.

AzKay, can you please explain your comment here? Are you saying it doesn't work in certain situations, or that it just flat out does not work? In my testing I can verify that it does indeed work, provided that the ExpandVarStrings option is set to 1 as I explained in the third post. Perhaps I'm misunderstanding you.


The only reason I said it didn't work was that I went by the information given in the quote:

GUICtrlSetData($dircont, "$filedir$\$filename$")

I tested by doing:

$filedir = "beef"

$filename = "pie"

msgbox(0, "", "$filedir$\$filename")



Hi nitro322

Sorry to take so long to get back to you, but my second Antec power supply went belly up. This one (500W) lasted only about 3 months; the previous one (450W) lasted about 6 months.

I'm sure it's doable, but I'm not sure it'd be worth the time and effort involved.

It would be for me, as I've saved a lot of pages over the years and ran into problems with really long filenames when backing up to CD (I'm not sure if DVD has the same issue). I made a program to rename the .html file and matching folder, as well as basic elements of the HTML code, but ran into issues with pages within the _files folder having really long and unpleasant names. I eventually opted for a workaround where I could elect to zip the original file and folder, but this necessitated (at the time) using a third-party zip program with command-line ability. Recently I've been thinking that .mht would be better.

I saw that MHz posted a possible solution to this problem in the thread you linked to. Have you tried that?

Yes I saw that after I posted the original, just before I logged off, but haven't had a chance to check it out yet!

Let me know if it works.

Will do when I can, however I've since found out some more info and another program, which I will give more detail of at the other link shortly.

;)

