natedog102

HTML Pretty Print UDF

natedog102

Hi everyone. I want to format the output of _INetGetSource to look nice and pretty. 

Example google.com source output: 

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:'DJtTWvCOI6WGjwSE9JrICg',kEXPI:'18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,3700304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,4132956,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,19000423,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,4131
7155',authuser:0,kscs:'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',u:'c9c918f0',kGL:'US'};google.kHL='en';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.wl=function(a,b){try{google.ml(Error(a),!1,b)}catch(d){}};google.time=function(){return(new Date).getTime()};google.log=function(a,b,d,c,g){if(a=google.logUrl(a,b,d,c,g)){b=new Image;var e=google.lc,f=google.li;e[f]=b;b.onerror=b.onload=b.onabort=function(){delete e[f]};google.vel&&google.vel.lu&&google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,d,c,g){var e="",f=google.ls||"";d||-1!=b.search("&ei=")||(e="&ei="+google.getEI(c),-1==b.search("&lei=")&&(c=google.getLEI(c))&&(e+="&lei="+c));c="";!d&&google.cshid&&-1==b.search("&cshid=")&&(c="&cshid="+google.cshid);a=d||"/"+(g||"gen_204")+"?atyp=i&ct="+a+"&cad="+b+e+f+"&zx="+google.time()+c;/^http:/i.test(a)&&google.https()&&(google.ml(Error("a"),!1,{src:a,glmm:1}),a="");return a};}).call(this);(function(){google.y={};google.x=function(a,b){if(a)var c=a.id;else{do c=Math.random();while(google.y[c])}google.y[c]=[a,b];return!1};google.lm=[];google.plm=function(a){google.lm.push.apply(google.lm,a)};google.lq=[];google.load=function(a,b,c){google.lq.push([[a],b,c])};google.loadAll=function(a,b){google.lq.push([a,b])};}).call(this);google.f={};var a=window.location,b=a.href.indexOf("#");if(0<=b){var c=a.href.substring(b+1);/(^|&)q=/.test(c)&&-1==c.indexOf("#")&&a.replace("/search?"+c.replace(/(^|&)fp=[^&]*/g,"")+"&cad=h")};</script><style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid 
#c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important}

But I want the output to look like this:

<!doctype html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="en">

<head>
    <meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description">
    <meta content="noodp" name="robots">
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
    <meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">
    <title>Google</title>
    <script>
        (function() {
            window.google = {
                kEI: 'DJtsdfgWGjwSE9JrICg',
                kEXPI: '18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,37sdfg0304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,413sdfg56,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,190sdfg23,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,41317155',
                authuser: 0,
                kscs: 'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',
                u: 'c9c918f0',
                kGL: 'US'
            };
            google.kHL = 'en';
        })();
        
.......

I checked the forums and did not see any UDFs that allow for this. I see the Chilkat UDF but that only supports JSON. Any help would be greatly appreciated.

genius257
Posted (edited)

Hi @natedog102.

So I took a stab at it for 30 minutes and got it working with Google's HTML. (I was doing something related anyway, and I got to address a problem in my HTMLParser.au3 lib; I can implement the fix once I find a way to make it less messy.)

The file you need to run is prettyhtml.au3. It has to be in the same folder as the two other files.

The HTML you want to parse currently needs to be in a file named prettyhtml.txt.

The output will be written to the same folder, in a file named prettyhtml_output.txt.
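For anyone starting from _INetGetSource as in the first post, here is a minimal sketch of getting a page into the expected input file. It is not part of the attached scripts. The URL is only an example, and it assumes prettyhtml.au3 looks for prettyhtml.txt in its own folder, which is why the file is written to @ScriptDir.

```autoit
#include <Inet.au3>
#include <FileConstants.au3>
#include <MsgBoxConstants.au3>

; Assumption: example URL only; use whatever page you want to format.
Local $sUrl = "https://www.google.com"

; _INetGetSource returns the page source as a string and sets @error on failure.
Local $sHtml = _INetGetSource($sUrl)
If @error Then
    MsgBox($MB_ICONERROR, "prettyhtml", "Could not download " & $sUrl)
    Exit 1
EndIf

; Write the source next to the scripts, replacing any previous contents.
; Assumption: prettyhtml.au3 reads prettyhtml.txt from its own folder.
Local $hFile = FileOpen(@ScriptDir & "\prettyhtml.txt", $FO_OVERWRITE)
If $hFile = -1 Then Exit 2
FileWrite($hFile, $sHtml)
FileClose($hFile)
```

Save this next to the three scripts, run it once, then run prettyhtml.au3.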

Hope you can use it.

Btw, there might still be some strange cases that give you trouble. If you find any, be sure to let me know, I'd appreciate it.

prettyhtml.au3

HTMLParser.au3

TokenList.au3

Edit: credit to @Zedna for the StringRepeat Function

Edited by genius257
natedog102

Thanks for the quick response! If the HTML is already partially formatted, it doubles the whitespace and line breaks. If the HTML contains JavaScript, the script sometimes doesn't appear in the formatted text file. Same thing with CSS.

Hope that helps. Let me know if you want me to post any examples.

genius257
2 minutes ago, natedog102 said:

If the HTML is already partially formatted, it doubles the whitespace and line breaks.

Hmm, I imagine that might be an easy fix with StringStripWS(..., 1+2)
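A rough sketch of what that fix could look like, assuming the doubled whitespace comes from indentation and blank lines already present in the input. `_TrimLines` is a hypothetical helper, not a function from prettyhtml.au3; flags 1 and 2 are `$STR_STRIPLEADING` and `$STR_STRIPTRAILING`.

```autoit
#include <StringConstants.au3>

; Hypothetical helper (not part of prettyhtml.au3): strips leading and trailing
; whitespace from every line and drops lines that end up empty, so the
; formatter starts from input without existing indentation.
Func _TrimLines($sHtml)
    ; Normalise line endings to @LF first so one split handles both styles.
    Local $sNormalised = StringReplace($sHtml, @CRLF, @LF)
    Local $aLines = StringSplit($sNormalised, @LF, $STR_NOCOUNT)
    Local $sOut = ""
    For $i = 0 To UBound($aLines) - 1
        ; Same as StringStripWS(..., 1+2): leading (1) plus trailing (2).
        Local $sLine = StringStripWS($aLines[$i], $STR_STRIPLEADING + $STR_STRIPTRAILING)
        If StringLen($sLine) > 0 Then $sOut &= $sLine & @CRLF
    Next
    Return $sOut
EndFunc   ;==>_TrimLines
```

Note that this would also trim whitespace inside `<pre>` blocks and multi-line script strings, so it is only a starting point.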

3 minutes ago, natedog102 said:

If the HTML contains JavaScript, the script sometimes doesn't appear in the formatted text file. Same thing with CSS.

Hmm, I suspect those are the cases I have a tough time testing for myself ^^ Examples would be greatly appreciated :)

4 minutes ago, natedog102 said:

Hope that helps.

Oh yeah, it helps :) The more bugs I know of, the more I can try to improve it ^^

genius257

Hey @natedog102.

So here's the most I'll do on the script for now: prettyhtml.au3

What's still missing that I know of, without your special-case examples, is handling for start tags without end tags. There are just too many cases for me to cover without having some use for the end product myself ^^ See https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission

The "An ... element's ... tag may be omitted if ..." rules are numerous, and each one is very specific :)
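As a partial illustration, the simplest group of start tags without end tags is the void elements (such as `<meta>` and `<br>`), which the HTML spec says never have an end tag. The sketch below covers only those and none of the conditional "may be omitted if" rules. `_IsVoidElement` is a hypothetical helper, not something in prettyhtml.au3 or HTMLParser.au3.

```autoit
; Hypothetical helper (not part of the attached scripts): returns True when
; $sTagName is an HTML void element, i.e. one that never has an end tag.
; A formatter could use this to avoid increasing the indent level after it.
Func _IsVoidElement($sTagName)
    ; Void elements as listed in the HTML Living Standard, wrapped in "|"
    ; so that only whole names match (e.g. "b" must not match "br").
    Local Const $sVoid = "|area|base|br|col|embed|hr|img|input|link|meta|source|track|wbr|"
    Return StringInStr($sVoid, "|" & StringLower($sTagName) & "|") > 0
EndFunc   ;==>_IsVoidElement

; Usage: "meta" is void, "title" is not.
ConsoleWrite(_IsVoidElement("META") & @CRLF)
ConsoleWrite(_IsVoidElement("title") & @CRLF)
```

The optional end tags (for example `</p>` or `</li>`) depend on what follows the element, so they would need per-element logic on top of this.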

Anyway, I hope the updated script helps a little.

 

natedog102

Hi @genius257 

I'll use it and update this thread with any more examples I find. Thanks so much for working on this.
