nobbe

my HTML file splitter

5 posts in this topic

#1 ·  Posted (edited)

I needed an HTML file splitter to cut large HTML files into smaller parts, since my mobile phone only reads HTML files of up to 500 KB.

Regular file splitters are "dumb": they just take "xyz" bytes at a time and cut the file into pieces of exactly the same size.

My splitter keeps all formatting (the HEADER with stylesheet etc.) intact in every output file.

You will have to look into your HTML to find the string (HTML tag) where you would like to split.

This can be something like "<H1>" for headings or, in my case, "<span class="c4">", the style information that marks paragraph changes.
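To illustrate the idea of splitting on a whole tag rather than on a byte count, here is a minimal sketch. The sample strings in `$sBody` and `$sTag` and the variable names are made up for the example and are not part of the script; only `StringSplit` with flag 1 is taken from it.

```autoit
; Sketch: split a body string on a whole tag, not on its single characters.
; $sBody and $sTag are hypothetical sample values.
Local $sBody = 'intro<H1>One</H1>text<H1>Two</H1>more'
Local $sTag = '<H1>'
Local $sChunk

; Flag 1 treats the entire $sTag as one delimiter; element [0] holds the count.
Local $aParts = StringSplit($sBody, $sTag, 1)

For $i = 1 To $aParts[0]
    $sChunk = $aParts[$i]
    ; Every part except the first originally started with the tag, so put it back.
    If $i > 1 Then $sChunk = $sTag & $sChunk
    ConsoleWrite($i & ": " & $sChunk & @CRLF)
Next
```

Each part would then be written to its own file, wrapped in the original header and footer.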

Have fun, nobbe

CODE
;

; split HTML files into smaller HTML files, leaving header information with formatting intact

;

; in the edit field enter the string which is used to separate the HTML

; eg "<H1>"

;

; by nobbe (2008)

;

; - change : different method of getting "<body " Tag : should keep all body style intact

; - added : navigation << -- >> for easier navigation in files

;

#include <GUIConstants.au3>

#include <String.au3>

#include <Array.au3>

#include <File.au3>

#Region ### START Koda GUI section ### Form=

$GUI = GUICreate("HTML File Splitter", 600, 150)

$edit_input = GUICtrlCreateInput('<span class="c4">', 16, 48, 449, 21)

$lbl = GUICtrlCreateLabel("Enter string which is used to split HTML eg '<H1>'", 16, 75, 442, 17)

$btn_doit = GUICtrlCreateButton("open file and split", 480, 48, 100, 25, 0)

GUISetState(@SW_SHOW)

#EndRegion ### START Koda GUI section ### Form=

;; main loop

While 1

$nMsg = GUIGetMsg()

Switch $nMsg

Case $btn_doit

; find file

$file = FileOpenDialog("Please select an HTML file", @ScriptDir, "htm (*.htm)|html (*.html)", 1 + 2)

If Not @error Then ; only continue when a file was actually chosen

$split_text = GUICtrlRead($edit_input)

_parse_html($file, $split_text)

EndIf

Case $GUI_EVENT_CLOSE

Exit

EndSwitch

WEnd

;

; do all the work ..

;

Func _parse_html($html_in, $split)

;; read file in var

Local $myhtml;

Local $all_header;

Local $all_footer;

Local $body_start_tag

Local $bodytext;

Local $tmp

Local $to_split_where

Local $iCC

Local $szDrive

Local $szDir

Local $szFName

Local $szExt;

Local $outfilename = $html_in

; ----

$outfilename = StringReplace($html_in, ".html", "");

$outfilename = StringReplace($outfilename, ".htm", "");

;; we have it in a single variable now

$myhtml = _LoadFiletoVar($html_in)

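;; try to find the body start tag; (?i) ignores case, so <BODY BgColor="Green"> is found too

$array = StringRegExp($myhtml, '(?i)<body\b([^>]*)>', 1)

If @error Then

MsgBox(0, "BODY", "no body Tag Found in HTML?");

Return

Else

$body_start_tag = "<body" & $array[0] & ">" ; $array[0] holds the attributes with their leading space, or nothing

; MsgBox(0, "BODY", $body_start_tag);

EndIf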

;; find full body tag now

$location_body_start = StringInStr($myhtml, '<body');

$all_header = StringLeft($myhtml, $location_body_start - 1);

; MsgBox(0, "Header", $all_header);

;; footer

$location_body_end = StringInStr($myhtml, '</body', 0, -1); from right

$all_footer = StringRight($myhtml, StringLen($myhtml) - $location_body_end + 1);

;MsgBox(0, "footer", $all_footer)

$tmp = _StringBetween($myhtml, "<body", "</body>")

If @error = 1 Then

MsgBox(0, "BODY", "no body found between tags in Source HTML?");

Return

EndIf

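$bodytext = $tmp[0]; the complete "old" body, still starting with the rest of the "<body ...>" tag

$bodytext = StringTrimLeft($bodytext, StringInStr($bodytext, ">")) ; drop that tag remainder up to and including ">"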

$to_split_where = $split ;

$tmp = StringSplit($bodytext, $to_split_where, 1) ; need all characters

For $iCC = 1 To UBound($tmp, 1) - 1

$new_body = $tmp[$iCC];

; MsgBox(0, "new body ", $new_body)

;; split now in several file

If $iCC < 10 Then

$inbr = "0" & $iCC

Else

$inbr = $iCC

EndIf

; we need only filename

$tmp_1 = _PathSplit($outfilename, $szDrive, $szDir, $szFName, $szExt)

$fout = $outfilename & "_" & $inbr & ".htm";

$fh_out = FileOpen($fout, 2) ; write

; write header for all

FileWrite($fh_out, $all_header & @CRLF);

FileWrite($fh_out, $body_start_tag & @CRLF); we need a new body tag

;

; add navigation on TOP

;

;; add navigation on top (<< back | forward >>)

If $iCC > 1 And $iCC < UBound($tmp, 1) Then ; back

$nbr = $iCC - 1

If $nbr < 10 Then

$nbr = "0" & $nbr

Else

$nbr = $nbr

EndIf

$l = '<A HREF="' & $szFName & "_" & $nbr & ".htm" & '">&lt;&lt;</A>' ;

FileWrite($fh_out, $l & @CRLF);

EndIf

If $iCC < UBound($tmp, 1) - 1 Then ; forward

$nbr = $iCC + 1

If $nbr < 10 Then

$nbr = "0" & $nbr

Else

$nbr = $nbr

EndIf

$l = '<A HREF="' & $szFName & "_" & $nbr & ".htm" & '"> | &gt;&gt;</A><BR>' ;

FileWrite($fh_out, $l & @CRLF);

EndIf

If $iCC > 1 Then FileWrite($fh_out, $to_split_where & @CRLF) ; re-add the split characters; part 1 is the text before their first occurrence

FileWrite($fh_out, $new_body & @CRLF); ; the new body

;

; add navigation on bottom

;

;; add navigation on bottom (<< back | forward >>)

If $iCC > 1 And $iCC < UBound($tmp, 1) Then ; back

$nbr = $iCC - 1

If $nbr < 10 Then

$nbr = "0" & $nbr

Else

$nbr = $nbr

EndIf

$l = '<A HREF="' & $szFName & "_" & $nbr & ".htm" & '">&lt;&lt;</A>' ;

FileWrite($fh_out, $l & @CRLF);

EndIf

If $iCC < UBound($tmp, 1) - 1 Then ; forward

$nbr = $iCC + 1

If $nbr < 10 Then

$nbr = "0" & $nbr

Else

$nbr = $nbr

EndIf

$l = '<A HREF="' & $szFName & "_" & $nbr & ".htm" & '"> | &gt;&gt;</A><BR>' ;

FileWrite($fh_out, $l & @CRLF);

EndIf

;; nav ---

FileWrite($fh_out, $all_footer & @CRLF); ; the footer

FileClose($fh_out);

Next

MsgBox(0, "DONE", "Split into " & $tmp[0] & " files") ; $tmp[0] is the part count from StringSplit; $iCC is already one past it here

EndFunc ;==>_parse_html

; ==============================================

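; Description ..: reads a complete text file into one string

; Parameters ...: $which_file - path of the file to read

; Return values : the file content as a string, or -1 if the file cannot be opened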

; Author .......: nobbe

; Notes ........:

; ==============================================

Func _LoadFiletoVar($which_file)

Local $ret = "";

Local $line;

Local $file = FileOpen($which_file, 0) ; open for reading

If $file = -1 Then

MsgBox(0, "Error", "Unable to open file " & $which_file)

Return -1;

EndIf

While 1

; read one line per pass and append it to the result

$line = FileReadLine($file)

If @error = -1 Then ExitLoop

$ret &= $line & @CRLF;

WEnd

FileClose($file)

;return var

Return $ret;

EndFunc ;==>_LoadFiletoVar

;

; prints debug text in the console view of SciTE

;

Func _DebugPrint($s_text)

$s_text = StringReplace($s_text, @LF, @LF & "-->")

ConsoleWrite($s_text & @LF);

EndFunc ;==>_DebugPrint

Edited by nobbe




#2 ·  Posted (edited)

I get the following error:

CODE
C:\Documents and Settings\Ludo\Bureaublad\htmlSplit.au3 (89) : ==> Subscript used with non-Array variable.:

$bodytext = $tmp[0]

$bodytext = $tmp^ ERROR

->11:22:50 AutoIT3.exe ended.rc:1

>Exit code: 1 Time: 60.344

My .htm file looks like this:

CODE
<HTML>

<BODY BgColor="Green">

heloooo!<BR>

Goodbye<P>

</BODY>

</HTML>

Edited by ludocus


#3 ·  Posted (edited)

Well, for now it needs EXACTLY "<BODY>" and "</BODY>".

"<BODY BgColor="Green">" won't be found ..

.. I updated the script ..
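One way to accept a body tag in any case and with attributes is a case-insensitive regular expression. The following is only a sketch under that assumption: the sample string is built from the test file in post #2, and the variable names are made up for the example.

```autoit
; Sketch: find the opening body tag regardless of case or attributes.
; $sHtml is a hypothetical sample document.
Local $sHtml = '<HTML><BODY BgColor="Green">heloooo!<BR>Goodbye<P></BODY></HTML>'

; (?i) makes the pattern case-insensitive; [^>]* captures any attributes.
Local $aMatch = StringRegExp($sHtml, '(?i)<body\b([^>]*)>', 1)

If @error Then
    ConsoleWrite("no body tag found" & @CRLF)
Else
    ; $aMatch[0] is the attribute text including its leading space, or empty.
    ConsoleWrite("<body" & $aMatch[0] & ">" & @CRLF)
EndIf
```

The rebuilt tag can then be written at the top of every output file so the body style is kept.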

Edited by nobbe


#4 ·  Posted (edited)

So is it able to handle <BODY BgColor="Green"> or not? Thanks.

-edit:

OK, so I tried it like this: <BODY BgColor="Green">

It doesn't work: there is no error, but a MsgBox says that no body tag was found in the HTML.

Then I tried it like this: <BODY>

Same result: no error, but a MsgBox says that no body tag was found in the HTML.

Edited by ludocus


Hm ...

It might also be that you haven't entered any "splitter" value.

You might want to try a larger HTML file?

It works fine on my files.
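A quick way to rule out the "splitter" value as the cause is to count how often it occurs in the body before splitting. This is a minimal sketch, not part of the script; the sample values and variable names are assumptions made for illustration.

```autoit
; Sketch: count how often the split string occurs before splitting.
; $sBody and $sSplit are hypothetical sample values.
Local $sBody = 'heloooo!<BR>Goodbye<P>'
Local $sSplit = '<span class="c4">'
Local $iHits = 0

If $sSplit <> "" Then
    ; StringReplace reports the number of replacements in @extended.
    StringReplace($sBody, $sSplit, $sSplit)
    $iHits = @extended
EndIf

If $iHits = 0 Then
    ConsoleWrite("split string not found, the file would not be split" & @CRLF)
Else
    ConsoleWrite("split string found " & $iHits & " time(s)" & @CRLF)
EndIf
```

With the small test file from post #2 and the default split string, no occurrence is found, so the whole body ends up in a single part.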

