nobbe, posted March 26, 2008 (edited)

I needed an HTML file splitter to cut larger HTML files into smaller parts, since my mobile phone only reads HTML files of up to 500 KB.

Regular file splitters are "dumb": they just take "xyz" bytes and cut the file into pieces of exactly the same size. My splitter keeps all formatting (the HEADER with the stylesheet etc.) intact in every output file.

You will have to look into your HTML to find the string (HTML tag) where you would like to split. This can be something like "<H1>" for a heading or, in my case, "<span class="c4">", the style information for paragraph changes.

Have fun,
nobbe

```autoit
;
; Split HTML files into smaller HTML files, leaving the header information with formatting intact
;
; In the edit field, enter the string that is used to separate the HTML,
; e.g. "<H1>"
;
; by nobbe (2008)
;
; - changed : different method of getting the "<body " tag: should keep all body style intact
; - added   : << -- >> navigation for easier navigation between the files
;
#include <GUIConstantsEx.au3> ; $GUI_EVENT_CLOSE
#include <File.au3>           ; _PathSplit

#Region ### START Koda GUI section ### Form=
$GUI = GUICreate("HTML File Splitter", 600, 150)
$edit_input = GUICtrlCreateInput('<span class="c4">', 16, 48, 449, 21)
$lbl = GUICtrlCreateLabel("Enter the string used to split the HTML, e.g. '<H1>'", 16, 75, 442, 17)
$btn_doit = GUICtrlCreateButton("open file and split", 480, 48, 100, 25, 0)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###

;; main loop
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $btn_doit
            ; find file (1 = file must exist, 2 = path must exist)
            $file = FileOpenDialog("Please select a HTML file", @ScriptDir, "HTML files (*.htm;*.html)", 1 + 2)
            If Not @error Then
                $split_text = GUICtrlRead($edit_input)
                If $split_text = "" Then
                    MsgBox(0, "Split", "Please enter a split string first.")
                Else
                    _parse_html($file, $split_text)
                EndIf
            EndIf
        Case $GUI_EVENT_CLOSE
            Exit
    EndSwitch
WEnd

;
; do all the work ..
;
Func _parse_html($html_in, $split)
    Local $szDrive, $szDir, $szFName, $szExt
    Local $myhtml, $all_header, $body_start_tag, $bodytext, $all_footer
    Local $iBodyStart, $iBodyTagEnd, $iBodyEnd
    Local $tmp, $iCC, $fout, $fh_out

    ; output files go next to the input file: <name>_01.htm, <name>_02.htm, ...
    _PathSplit($html_in, $szDrive, $szDir, $szFName, $szExt)

    ;; we have it in a single variable now
    $myhtml = _LoadFiletoVar($html_in)
    If @error Then Return

    ;; StringInStr is case-insensitive by default, so <body>, <BODY> and
    ;; <BODY BgColor="Green"> are all found
    $iBodyStart = StringInStr($myhtml, "<body")
    $iBodyTagEnd = StringInStr($myhtml, ">", 0, 1, $iBodyStart + 1) ; end of the opening body tag
    $iBodyEnd = StringInStr($myhtml, "</body", 0, -1)              ; closing tag, searched from the right
    If $iBodyStart = 0 Or $iBodyTagEnd = 0 Or $iBodyEnd < $iBodyTagEnd Then
        MsgBox(0, "BODY", "No <body> ... </body> tags found in the HTML.")
        Return
    EndIf

    $all_header = StringLeft($myhtml, $iBodyStart - 1)                                   ; everything before <body
    $body_start_tag = StringMid($myhtml, $iBodyStart, $iBodyTagEnd - $iBodyStart + 1)    ; keeps all body attributes
    $bodytext = StringMid($myhtml, $iBodyTagEnd + 1, $iBodyEnd - $iBodyTagEnd - 1)       ; the complete "old" body
    $all_footer = StringMid($myhtml, $iBodyEnd)                                          ; </body> ... </html>

    ;; split the body (flag 1 = the whole split string is the delimiter)
    $tmp = StringSplit($bodytext, $split, 1)

    For $iCC = 1 To $tmp[0]
        $fout = $szDrive & $szDir & $szFName & "_" & StringFormat("%02d", $iCC) & ".htm"
        $fh_out = FileOpen($fout, 2) ; 2 = write, erase previous contents
        If $fh_out = -1 Then
            MsgBox(0, "Error", "Unable to write file " & $fout)
            Return
        EndIf

        ; header and a new body tag for every file
        FileWrite($fh_out, $all_header & @CRLF)
        FileWrite($fh_out, $body_start_tag & @CRLF)

        _WriteNav($fh_out, $szFName, $iCC, $tmp[0]) ; navigation on top

        ; StringSplit removed the split string, so put it back (the first part never had one)
        If $iCC > 1 Then FileWrite($fh_out, $split & @CRLF)
        FileWrite($fh_out, $tmp[$iCC] & @CRLF) ; the new body

        _WriteNav($fh_out, $szFName, $iCC, $tmp[0]) ; navigation on bottom

        FileWrite($fh_out, $all_footer & @CRLF) ; the footer
        FileClose($fh_out)
    Next

    MsgBox(0, "DONE", "Split into " & $tmp[0] & " files")
EndFunc   ;==>_parse_html

;
; add navigation (<< back | forward >>) to the previous / next part, if there is one
;
Func _WriteNav($fh_out, $szFName, $iCC, $iTotal)
    If $iCC > 1 Then
        FileWrite($fh_out, '<A HREF="' & $szFName & "_" & StringFormat("%02d", $iCC - 1) & '.htm">&lt;&lt;</A>' & @CRLF)
    EndIf
    If $iCC < $iTotal Then
        FileWrite($fh_out, '<A HREF="' & $szFName & "_" & StringFormat("%02d", $iCC + 1) & '.htm"> | &gt;&gt;</A>' & @CRLF)
    EndIf
    FileWrite($fh_out, "<BR>" & @CRLF)
EndFunc   ;==>_WriteNav

; ==============================================
; Description ..: Reads a whole file into a string
; Parameters ...: $which_file - path of the file to read
; Return values : the file contents; on failure "" and @error = 1
; Author .......: nobbe
; Notes ........:
; ==============================================
Func _LoadFiletoVar($which_file)
    Local $file = FileOpen($which_file, 0) ; open for reading
    If $file = -1 Then
        MsgBox(0, "Error", "Unable to open file " & $which_file)
        Return SetError(1, 0, "")
    EndIf
    Local $ret = FileRead($file) ; read the complete file at once
    FileClose($file)
    Return $ret
EndFunc   ;==>_LoadFiletoVar

;
; print a debug text in the CONSOLE view of SciTE ..
;
Func _DebugPrint($s_text)
    $s_text = StringReplace($s_text, @LF, @LF & "-->")
    ConsoleWrite($s_text & @LF)
EndFunc   ;==>_DebugPrint
```

Edited March 27, 2008 by nobbe
ludocus, posted March 26, 2008 (edited)

I get the following error:

```
C:\Documents and Settings\Ludo\Bureaublad\htmlSplit.au3 (89) : ==> Subscript used with non-Array variable.:
$bodytext = $tmp[0]
$bodytext = $tmp^ ERROR
->11:22:50 AutoIT3.exe ended.rc:1
>Exit code: 1    Time: 60.344
```

My .htm file looks like this:

```
<HTML>
<BODY BgColor="Green">
heloooo!<BR>
Goodbye<P>
</BODY>
</HTML>
```

Edited March 26, 2008 by ludocus
nobbe (author), posted March 26, 2008 (edited)

Well, for now it needs EXACTLY "<BODY>" and "</BODY>"; "<BODY BgColor="Green">" won't be found. I have updated the script.

Edited March 26, 2008 by nobbe
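A pattern such as `'<body(.*?)>'` only matches a lowercase `<body`, because AutoIt's `StringRegExp` is case-sensitive unless the pattern starts with `(?i)`. Below is a minimal sketch, not taken from nobbe's script, that splits a page into header, opening body tag (with its attributes), body content and footer without caring about case. `_SplitBody` is a hypothetical helper name, and the sketch assumes the page contains a single `<body>` ... `</body>` pair.

```autoit
; Hypothetical helper (not part of nobbe's script).
; (?i) = case-insensitive, (?s) = "." also matches line breaks.
; Group 1: header, group 2: opening body tag, group 3: body content,
; group 4: "</body>" up to the end of the page.
Func _SplitBody($sHtml, ByRef $sHeader, ByRef $sBodyTag, ByRef $sContent, ByRef $sFooter)
    Local $aMatch = StringRegExp($sHtml, '(?is)^(.*?)(<body\b[^>]*>)(.*)(</body\s*>.*)$', 1)
    If @error Then Return SetError(1, 0, False) ; no body tags found
    $sHeader = $aMatch[0]
    $sBodyTag = $aMatch[1]
    $sContent = $aMatch[2]
    $sFooter = $aMatch[3]
    Return True
EndFunc   ;==>_SplitBody

; Usage with the test file from the post above
Local $sHeader, $sBodyTag, $sContent, $sFooter
Local $sHtml = '<HTML>' & @CRLF & '<BODY BgColor="Green">' & @CRLF & 'heloooo!<BR>' & @CRLF & _
        'Goodbye<P>' & @CRLF & '</BODY>' & @CRLF & '</HTML>'
If _SplitBody($sHtml, $sHeader, $sBodyTag, $sContent, $sFooter) Then
    ConsoleWrite($sBodyTag & @CRLF) ; prints <BODY BgColor="Green">
EndIf
```

Because the opening tag is captured whole, attributes such as `BgColor="Green"` can be written back unchanged into every part.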
ludocus, posted March 26, 2008 (edited)

So can it handle <BODY BgColor="Green"> or not? Thanks.

Edit: OK, so I tried it like this:

<BODY BgColor="Green">: it doesn't work. It doesn't give an error, but a message box says it is unable to find the body tag in the HTML.

Then I tried it like this:

<BODY>: it doesn't work either. Again no error, just a message box saying it is unable to find the body tag in the HTML.

Edited March 26, 2008 by ludocus
nobbe (author), posted March 26, 2008

Hm... it might also be that you haven't entered any "splitter" value. You might want to try it on a larger HTML file; it works fine on my files.
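Both cases mentioned here (no split string entered, or a split string that never appears in the file, which leaves the body in one piece) can be caught before any output is written. A minimal sketch, using a hypothetical helper `_CountMarker` that is not part of the original script: `StringReplace` sets `@extended` to the number of replacements it made, so replacing the split string with an empty string counts its occurrences.

```autoit
; Hypothetical helper (not part of nobbe's script): returns how many times
; $sMarker occurs in $sText, or returns 0 with @error = 1 if $sMarker is empty.
Func _CountMarker($sText, $sMarker)
    If $sMarker == "" Then Return SetError(1, 0, 0)
    ; casesense = 1: a search for "<H1>" does not count "<h1>"
    StringReplace($sText, $sMarker, "", 0, 1)
    Return @extended ; StringReplace stores the number of replacements here
EndFunc   ;==>_CountMarker

; Usage: warn before splitting instead of silently writing a single file
Local $sBody = "<H1>Part one<H1>Part two"
Local $iFound = _CountMarker($sBody, "<H1>")
If $iFound = 0 Then
    MsgBox(0, "Split", "The split string was not found; nothing to split.")
Else
    ConsoleWrite("Found " & $iFound & " split points" & @CRLF) ; prints Found 2 split points
EndIf
```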