Mat

HTML tidier - "Possessing beauty"

11 posts in this topic

#1 ·  Posted (edited)

I was wondering if someone has code for beautifying plain HTML and is willing to share it.

I'm not interested in additional software, tools, whatever, just plain AutoIt code.

Indentation is what this is about.

Would be much obliged.

So I made one. It deals only with indentation, and presumes you are inputting a valid HTML file. And that means VALID in the XHTML sense: no "<br>"s, they will stuff it up. "<br />"s only. The same applies to any other tag that does not have a separate closing tag. HTML is generally a messy language, and this is very much "garbage in, garbage out" software. Not literally, you'll just end up with ridiculous tabs in there.
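For anyone reading along, here is a minimal sketch of the indentation idea described above. It is written in Python rather than AutoIt and is not the code from the download; the names `indent_html`, `TAG` and `emit` are made up for illustration. It assumes every tag without a closing partner is written in the "<br />" style, and that no comment or attribute value contains a ">".

```python
import re

# Matches one tag, comment or declaration; everything between matches is text.
TAG = re.compile(r"<[^>]+>")

def indent_html(source, tab="\t"):
    """Re-indent well-formed, XHTML-style markup, one token per line."""
    lines, depth, pos = [], 0, 0

    def emit(text):
        # Append non-empty text at the current indentation depth.
        text = text.strip()
        if text:
            lines.append(tab * depth + text)

    for match in TAG.finditer(source):
        emit(source[pos:match.start()])      # text before this tag
        tag = match.group()
        if tag.startswith("</"):             # closing tag: step out first
            depth = max(depth - 1, 0)
            emit(tag)
        elif tag.startswith("<!") or tag.endswith("/>"):
            emit(tag)                        # declaration, comment or self-closed
        else:                                # opening tag: step in afterwards
            emit(tag)
            depth += 1
        pos = match.end()
    emit(source[pos:])                       # trailing text, if any
    return "\n".join(lines)
```

Feeding this sketch a bare "<br>" shows where the "ridiculous tabs" come from: the tag is treated as an opener, so everything after it sits one level too deep.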

Command lines:

-i, installs it into the context menu for ".html" and ".htm" files. Use install.bat if you're lazy.

-u, uninstalls the above. (or click uninstall.bat)

Filename [tab], the input file and then, optionally, the tab width in spaces (default = -1, which means use @TAB)

Haven't tried it with a huge HTML file yet... Neither has it been tested on anything other than WinXP SP3, although I see no reason why it would behave differently elsewhere.

source + exe: http://code.google.com/p/m-a-t/downloads/detail?name=HTMLTidy.zip

Mat

Updated: Fixed a problem with comments not following the <!-- convention. Tested on a large Wikipedia article, worked perfectly.

Update2: Significant performance gain. Parsed the same wiki article and it completed VERY fast. Should also give a better effect?

Update3: Anchor tags stay on the same line. Much easier to read blocks of text now. Will deal with scripts next.

Update4: Another big update. Can now recursively tidy entire directories of HTML files (very useful :) ). I have also finally managed to sort out problems with which tags are indented.

Edited by Mat

Hey, works great and very fast. Good work!




#3 ·  Posted (edited)

Thanks... I was about to test it on a big HTML file to see how fast it actually was, but realised that I don't have a big web page to test with.

I used: http://en.wikipedia.org/wiki/Maths

They hadn't closed their opening DOCTYPE tag with a "/>" so it messed up the layout. It also showed up an error in my code. I always thought that <!-- was necessary for a comment, although it appears this is more a convention, and only <! is required. I have updated now! The <! solves the doctype too, and that 912 line Wikipedia page tidied perfectly (scrolling down to the bottom showed a nice ending anyhow).

It was a lot faster than I expected though! Still slow on reading the file.

Mat

Edited by Mat


#4 ·  Posted (edited)

Updated again, with no loop for reading the file and a bit more StringRegExp. Significantly faster.

I have also noticed another fault in that the anchor tag is treated as any other tag would be... Leading to unreadable text in some cases. I will look for a solution.

Mat

Edit: Also completely messes up any embedded code, such as scripts or CSS. Use external files or tidy it yourself.

Edited by Mat


Big Update.

Can now recursively tidy entire directories of HTML files using the "-d" command line switch, or simply by passing a directory as the first argument ($CmdLine[1]; $CmdLine[0] only holds the argument count).
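As a rough illustration of the recursive mode, here is a sketch in Python (not the AutoIt source, which may work differently). `find_html_files` and `tidy_tree` are hypothetical names, and `tidy` stands for whatever string-to-string tidying function you plug in. UTF-8 input is an assumption.

```python
from pathlib import Path

def find_html_files(root):
    """Return every .htm/.html file below root, sorted, as Path objects."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.suffix.lower() in {".htm", ".html"}
    )

def tidy_tree(root, tidy):
    """Apply tidy (a str -> str function) to each HTML file in place."""
    changed = []
    for path in find_html_files(root):
        original = path.read_text(encoding="utf-8")
        result = tidy(original)
        if result != original:          # only rewrite files that changed
            path.write_text(result, encoding="utf-8")
            changed.append(path)
    return changed
```

Returning the list of changed files makes it easy to report what was touched, and skipping unchanged files leaves their timestamps alone.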

I managed to sort out the problem Trancexx brought up with indenting after tags such as "input" or "base", and I also added in tag correction.
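One way to handle tags like "input" or "base" is a lookup of elements that never take a closing tag. The sketch below is Python, not the script's AutoIt, and `correct_tag` is only a guess at what tag correction could look like (rewriting a bare void tag in the "/>" style). `VOID_TAGS`, `opens_block` and `correct_tag` are illustrative names.

```python
import re

# Elements that never take a closing tag (the HTML "void" elements).
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "param", "source", "track", "wbr",
}

# Captures the element name of an opening tag; does not match </x> or <!...>.
TAG_NAME = re.compile(r"<\s*([A-Za-z][A-Za-z0-9]*)")

def opens_block(tag):
    """True if tag is an opening tag that should increase the indent."""
    match = TAG_NAME.match(tag)
    if match is None:                 # closing tag, comment or declaration
        return False
    if tag.rstrip().endswith("/>"):   # explicitly self-closed
        return False
    return match.group(1).lower() not in VOID_TAGS

def correct_tag(tag):
    """Rewrite a bare void tag such as <br> as <br />; leave others alone."""
    match = TAG_NAME.match(tag)
    stripped = tag.rstrip()
    if match and match.group(1).lower() in VOID_TAGS and not stripped.endswith("/>"):
        return stripped[:-1].rstrip() + " />"
    return tag
```

With a check like `opens_block`, a bare "<input>" no longer pushes the following lines one level deeper.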

Mat


Mat,

Old topic, but since it's an interesting piece of code, it deserves to be re-opened :blink:

You know how your code only works with XHTML source, because <br /> is required? Why not use a regular expression to allow either <br> or <br />?
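Something along these lines, shown in Python as a sketch of the regular-expression idea rather than as AutoIt (the pattern itself should carry over to StringRegExpReplace with little change, though that is untested). `BR` and `normalise_br` are made-up names.

```python
import re

# Matches <br>, <br/>, <br />, <BR> and so on, with optional attributes.
# The \b stops it from matching longer names that merely start with "br".
BR = re.compile(r"<br\b([^>]*?)\s*/?>", re.IGNORECASE)

def normalise_br(html):
    """Rewrite every form of the br tag as XHTML-style <br />."""
    return BR.sub(lambda m: "<br" + m.group(1) + " />", html)
```

Running the input through this before indenting means the indenter itself only ever has to deal with the "<br />" form.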

James


Mat,

Old topic, but since it's an interesting piece of code, it deserves to be re-opened ;)

You know how your code only works with XHTML source, because <br /> is required? Why not use a regular expression to allow either <br> or <br />?

James

It does at the moment (I think). There is supposed to be tag correction in there as well, but I think it's broken (I got a PM about something :blink: ).

I have updated the script, and tag correction is now working.

Mat


Mat,

I'm going to have a play with this some time :blink: I think you could make it into a neat little app if you added enough features.

James

I know, it's been a while, but I would like to make a full check and tidy tool, building in different levels like XHTML 1.0 Strict, Frameset and Transitional, XHTML 1.1, HTML 4 and 5, and also general XML.

The best way to go would be a big database with tag, description, supported versions, supported browsers, self closing... Do you know if there are any out there already?


They hadn't closed their opening DOCTYPE tag with a "/>" so it messed up the layout. It also showed up an error in my code. I always thought that <!-- was necessary for a comment, although it appears this is more a convention, and only <! is required.

You totally got it wrong. Semantically and structurally the '<!' has nothing in common with '<!--'. The '<!DOCTYPE ... >' is the whole tag and is the HTML analog of a (pre)processing instruction in programming.

As for comments, the opening is '<!--' and the closing is '-->', with no '-'s allowed in between.

P.S.

Sorry for grumbling about this one-year-old post of yours.




You totally got it wrong. Semantically and structurally the '<!' has nothing in common with '<!--'. The '<!DOCTYPE ... >' is the whole tag and is the HTML analog of a (pre)processing instruction in programming.

As for comments, the opening is '<!--' and the closing is '-->', with no '-'s allowed in between.

P.S.

Sorry for grumbling about this one-year-old post of yours.

I'll defend the me of a year ago by saying that it was the lack of closing tag on the doctype not the lack of hyphens that was the first error I fixed. As for comments:

<! test --  123 -- 123 123>

does not show in the browser; it is simply taken to be a preprocessor, as you said. This is a simple tidier, and as far as it's concerned it's a comment node, so it does not start a block. That's all I care about.
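To make that concrete, here is a small Python sketch (not the script's actual AutoIt) of how a tidier might tell the two apart while still treating both as non-block tokens. `classify` and `indent_change` are hypothetical names.

```python
def classify(token):
    """Label one markup token by how an indenter should treat it."""
    if token.startswith("<!--"):
        return "comment"
    if token.startswith("<!"):
        return "declaration"      # e.g. <!DOCTYPE html>
    if token.startswith("</"):
        return "close"
    if token.endswith("/>"):
        return "self-closing"
    if token.startswith("<"):
        return "open"
    return "text"

def indent_change(token):
    """+1 for an opening tag, -1 for a closing tag, 0 for everything else."""
    return {"open": 1, "close": -1}.get(classify(token), 0)
```

For indentation purposes the distinction makes no difference: both a real comment and anything else starting with "<!" leave the depth unchanged.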

But yes, you are right that I would never write my comments like that, and more complex programs would throw an error.
