Jump to content
natedog102

HTML Pretty Print UDF

Recommended Posts

natedog102

Hi everyone. I want to format the output of _INetGetSource to look nice and pretty. 

Example google.com source output: 

<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:'DJtTWvCOI6WGjwSE9JrICg',kEXPI:'18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,3700304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,4132956,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,19000423,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,41317155',authuser:0,kscs:'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',u:'c9c918f0',kGL:'US'};google.kHL='en';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.wl=function(a,b){try{google.ml(Error(a),!1,b)}catch(d){}};google.time=function(){return(new Date).getTime()};google.log=function(a,b,d,c,g){if(a=google.logUrl(a,b,d,c,g)){b=new Image;var e=google.lc,f=google.li;e[f]=b;b.onerror=b.onload=b.onabort=function(){delete e[f]};google.vel&&google.vel.lu&&google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,d,c,g){var e="",f=google.ls||"";d||-1!=b.search("&ei=")||(e="&ei="+google.getEI(c),-1==b.search("&lei=")&&(c=google.getLEI(c))&&(e+="&lei="+c));c="";!d&&google.cshid&&-1==b.search("&cshid=")&&(c="&cshid="+google.cshid);a=d||"/"+(g||"gen_204")+"?atyp=i&ct="+a+"&cad="+b+e+f+"&zx="+google.time()+c;/^http:/i.test(a)&&google.https()&&(google.ml(Error("a"),!1,{src:a,glmm:1}),a="");return a};}).call(this);(function(){google.y={};google.x=function(a,b){if(a)var c=a.id;else{do c=Math.random();while(google.y[c])}google.y[c]=[a,b];return!1};google.lm=[];google.plm=function(a){google.lm.push.apply(google.lm,a)};google.lq=[];google.load=function(a,b,c){google.lq.push([[a],b,c])};google.loadAll=function(a,b){google.lq.push([a,b])};}).call(this);google.f={};var a=window.location,b=a.href.indexOf("#");if(0<=b){var c=a.href.substring(b+1);/(^|&)q=/.test(c)&&-1==c.indexOf("#")&&a.replace("/search?"+c.replace(/(^|&)fp=[^&]*/g,"")+"&cad=h")};</script><style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important}

But I want it outputted like this:

<!doctype html>
<html itemscope="" itemtype="http://schema.org/WebPage" lang="en">

<head>
    <meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description">
    <meta content="noodp" name="robots">
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
    <meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image">
    <title>Google</title>
    <script>
        (function() {
            window.google = {
                kEI: 'DJtsdfgWGjwSE9JrICg',
                kEXPI: '18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,37sdfg0304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,413sdfg56,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,190sdfg23,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,41317155',
                authuser: 0,
                kscs: 'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',
                u: 'c9c918f0',
                kGL: 'US'
            };
            google.kHL = 'en';
        })();
        
.......

I checked the forums and did not see any UDFs that allow for this. I see the Chilkat UDF but that only supports JSON. Any help would be greatly appreciated.

Share this post


Link to post
Share on other sites
genius257

Hi @natedog102.

So i took a stab at it for 30mins, and got it to work with google html. (I was doing something related anyway, and i got to address a problem in my hTMLParser.au3 lib i can implement when i find a way to make it less messy)

the file you need to run in the same folder as the two other files is prettyhtml.au3

the html you need to parse, currently need to be in a file named: prettyhtml.txt

the output will be in the same folder and be named: prettyhtml_output.txt

Hope you can use it.

Btw. there might be some strange that can give you trouble still, and if you find them, be sure to let me know, i will appreciate it.

prettyhtml.au3

HTMLParser.au3

TokenList.au3

Edit: credit to @Zedna for the StringRepeat Function

Edited by genius257
  • Like 1

Share this post


Link to post
Share on other sites
natedog102

Thanks for the quick response! If the HTML is already partially formatted, it doubles the whitespaces and returns. If the HTML contains javascript, it sometimes doesn't appear in the formatted text file. Same thing with CSS.

Hope that helps. Let me know if you want me to post any examples.

Share this post


Link to post
Share on other sites
genius257
2 minutes ago, natedog102 said:

If the HTML is already partially formatted, it doubles the whitespaces and returns.

hmmm I imagine it might be an easy fix with StringStripWS(..., 1+2)

3 minutes ago, natedog102 said:

If the HTML contains javascript, it sometimes doesn't appear in the formatted text file. Same thing with CSS.

Hmmm i suspect it might be the cases i have a tough time testing for myself ^^ examples would be greatly appreciated :)

4 minutes ago, natedog102 said:

Hope that helps.

Oh yeah, it helps :) The more bugs i know of, the more i can try to improve it ^^

  • Like 1

Share this post


Link to post
Share on other sites
genius257

Hey @natedog102.

So here's the most i'll do on the script for now: prettyhtml.au3

What's missing that i know of without your special case examples, would be start tags without end tags. There's just too many for me to do without some kind of usage of the end product for me ^^, see https://html.spec.whatwg.org/multipage/syntax.html#syntax-tag-omission

The "An ... element's ... tag may be omitted if ..." cases are many and very specific for each case :)

Anyway i hope the updated script may help a little.

 

  • Like 1

Share this post


Link to post
Share on other sites
natedog102

Hi @genius257 

I'll use it and update this thread with any more examples. Thanks so much for working on this

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • SkysLastChance
      By SkysLastChance
       
      WinActivate("MEDITECH - Internet Explorer") Sleep (500) $oIE = _IEAttach("MEDITECH") $oDiv1 = _IEGetObjById($oIE, "sysmenu-searchbarbutton") _IEAction($oDiv1, "click") I am just trying to click the little magnifying glass, next to the gear button with no luck. I was hoping someone might have an idea why this is not working?
       

    • Burgs
      By Burgs
      Hello,
        I have a website with a Google Map I setup using the Google Map API.  It works and displays just fine.  However to make it useful to me I need to be able to dynamically change the map to display different areas by sending new Latitude and Longitude coordinates.  I am having difficulty making this happen.  Here is my code thus far:
      #include <IE.au3> $oIE3 = _IECreate("http://my_sample_website.html") ;just an example, not an actual site... _IELoadWait($oIE3) $s_word = "lat:" $oInputs = _IETagNameAllGetCollection($oIE3) if @error <> 0 Then MsgBox($MB_SYSTEMMODAL, "ERROR", "Error is: " & @error) EndIf ;@error For $oInput In $oInputs if Number($iPos) == -1 Then $iPos = StringInStr($oInput.innerHTML, String($s_word)) if (Number($iPos) > 0) AND (@error == 0) Then ConsoleWrite("I FOUND IT...! " & String($s_word) & @CRLF) $sHTML = _IEBodyReadHTML($oIE3) $_lat_look = 0 $_lng_look = 0 $_end_look = 0 ;default $_lat_look = StringInStr(String($sHTML), "lat:") if Number($_lat_look) <> 0 Then $_lng_look = StringInStr(String($sHTML), "lng:") if Number($_lng_look) <> 0 Then $_end_look = StringInStr(String($sHTML), "}") if Number($_end_look) <> 0 Then ConsoleWrite("HTML BODY: " & $sHTML & @CRLF) $_old_lat = String(StringMid(String($sHTML), $_lat_look, ($_lng_look - $_lat_look))) $_old_lng = String(StringMid(String($sHTML), $_lng_look, ($_end_look - $_lng_look))) ConsoleWrite("$_old_lat: " & $_old_lat & @CRLF) ConsoleWrite("$_old_lng: " & $_old_lng & @CRLF) $_new_lat = "lat: " & String("-34.397") & ", " $_new_lng = "lng: " & String("150.644") & "}; " ConsoleWrite("...new lat is: " & String($_new_lat) & " new lng is: " & String($_new_lng) & @CRLF) $_LOOK = StringReplace($_old_lat, 1, String($_new_lat)) $_LOOK2 = StringReplace($_old_lng, 1, String($_new_lng)) ConsoleWrite("$_LOOK: " & $_LOOK & "$_LOOK2: " & $_LOOK2 & @CRLF) EndIf ;'$_end_look' NOT "0"... $iPos = -1 EndIf ;'String($s_word)' was found in the collection '$oInputs' EndIf ;'$iPos' is "-1" Next  
        I am having trouble trying to replace the line in the HTML ($sHTML variable in my example) that contains the "lat:" and "lng:" information.  I figure if I can replace that line everything else remains the same, and in theory, the map should cycle to display a map with the new latitude and longitude coordinates...I hope. 
        I have attempted to write the $sHTML to a text document and then use '_IEBodyWriteHTML' to read it back into the webpage HTML however that is not working.  There must be an easier method to accomplish this...what am I missing here...?  Any thoughts greatly appreciated.  Regards.       
    • iamtech
      By iamtech
      When I click on print button (Javascript Button) in webpage using _IEAction($btnprint,"click") then display print dialog, like this one:
      i want to Automate this print dialog box and print file using default printer. (not using controlclick).
      its possible to print without showing print dialog box?

    • rkr
      By rkr
      I have a text file which has over 1000 lines, and I wish to replace one particular - I was able to do the replacement - but I have a issue - I want the numbers to be of specific format 
      Eg; my command was as follows, 
      my command >>>>> _filewritetoline($inp_file,$inp_replacement,"WAVE1.00STOK"&$hdet&"      "&$tasso&"        "&"270.0      D          5.0  72MS     1",true)
      my output >>>>>>       WAVE1.00STOK10.06        9.800         270.0      D          5.0  72MS     1
      how do I make sure that the $hdet=10.06 is printed as 10.060(3 digits after decimel) and same with $tasso and so on.. also, how to maintain the required gap between the variables - is it by manually putting spaces ?
       
      thanks guys
    • argumentum
      By argumentum
      by  eltorro Posted April 26, 2010
      Most print preview solutions use the MFC Doc/View architecture which limits it usefulness outside of c++. I found an article where a Delphi programmer used Enhanced Meta Files wrapped in a custom header and packaged together to create a print preview control. After a little more searching, It looks like using EMFs is a solution that would work.
      Some people suggest to create the document in Word or HTML and use Word or a browser to view it. Indeed, I have rendered documents to HTML and used the IE UDF to display the contents and/or print them. Not quite as ideal as one would like.
      Using GRS's printwin.au3 as a start, I came up with a print preview solution which others may be able to expand upon
×