HTML Parser UDF

Decipher

Hi,

I'm inviting all AutoIt forum members to contribute to an HTML parser UDF. I'm going to attempt to replicate a Python module called BeautifulSoup. It would be greatly appreciated if some senior AutoIt programmers took an interest in this topic. There is no template other than the module written in Python located here and the documentation here.

I can't wait to see what this develops into. :robot:


jdelaney

I don't see the need: there is the _IE UDF, and with my link below you can focus on any node(s) through an XPath.


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.

Decipher

I see the need for simplicity. Would you care to give an example of how to use the functions mentioned above for data extraction?
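For anyone following along, here is a minimal sketch of what data extraction with the stock IE.au3 UDF might look like. It only uses functions that ship with AutoIt (`_IECreate`, `_IETagNameGetCollection`, `_IEQuit`); the XPath helper from the signature above is not used, since its API isn't shown in this thread. It assumes Internet Explorer is installed and the site is reachable.

```autoit
#include <IE.au3>

; Open a hidden IE instance (third parameter 0 = invisible) and wait for the page to load.
Local $oIE = _IECreate("http://www.autoitscript.com", 0, 0)
If @error Then Exit MsgBox(16, "IE example", "_IECreate failed, @error = " & @error)

; Collect every <a> element and print its text and target.
Local $oLinks = _IETagNameGetCollection($oIE, "a")
For $oLink In $oLinks
    ConsoleWrite($oLink.innerText & " -> " & $oLink.href & @CRLF)
Next

_IEQuit($oIE)
```

The same pattern should work for other tags: change the tag name and read whichever DOM property you need from each element.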


KaFu
Decipher

I stand corrected. There are templates for parsing HTML... and in UDF format. Thank you, @jdelaney and KaFu.

If anyone else has UDFs or examples they'd like to share for anyone following this path, that would be great. :)


eimhym

Finally found a way to "mute" IE, that is, to make it unable to load external resources except the ones already cached, which makes pages load faster. Other IE instances won't be affected, only the one used as the HTML parser.

These simple wrapper functions:

_HtmlParser_Startup

_HtmlParser_GetDocument ; the IHtmlDocument ref

_HtmlParser_LoadHtml ; load plain text, optionally remove all script tags

_HtmlParser_LoadUrl

_HtmlParser_LoadScript ; to use jquery, xpath, etc.

_HtmlParser_ClearScript

_HtmlParser_Exec ; execute js and get return value, see sample

Save as "HtmlParser.au3" and run; the sample code is included. Clear the IE cache first for the best result. Hope it helps :)

#include-once
#include <WinAPI.au3>
#include <WindowsConstants.au3>
#include <GuiConstantsEx.au3>
#include <IE.au3>

; Opt("MustDeclareVars", 1)

Global $_HtmlParser_Debug = False
Global $_HtmlParser_Script = ""

Global Const $_HtmlParser_ScriptName = "HtmlParser.au3"
Global Const $tagINTERNET_PROXY_INFO = "dword dwAccessType; ptr lpszProxy; ptr lpszProxyBypass";
Global Const $tagCOPYDATA = _
  "ULONG_PTR;" & _  ; dwData, The data to be passed to the receiving application
  "DWORD;" & _    ; cbData, The size, in bytes, of the data pointed to by the lpData member
  "PTR"          ; lpData, The data to be passed to the receiving application. This member can be NULL.

Func _HtmlParser_Startup ($iPort=843)
    If IsDeclared("_HtmlParser_IE") AND IsObj($_HtmlParser_IE) Then Return 1

    Global $_HtmlParser_Port = $iPort
    __HtmlParser_ParseCmdLine()
    
    Global $_HtmlParser_HWND = WinWait($_HtmlParser_GUID, "", 5)
    If NOT $_HtmlParser_HWND Then Return SetError(1, 0, 0) ; Daemon failed to start
    WinSetTitle($_HtmlParser_HWND, "", "")
    
    ; Attach IE
    Global $_HtmlParser_IE = _IEAttach($_HtmlParser_HWND, "embedded") ; reuse the handle from WinWait: the GUID title was just cleared above
    If NOT IsObj(_HtmlParser_GetDocument()) Then Return SetError(2, 0, 0)
    
    OnAutoItExitRegister("__HtmlParser_Shutdown")
    Return 1
EndFunc

Func _HtmlParser_GetDocument ()
    If IsObj($_HtmlParser_IE) Then Return _IEDocGetObj($_HtmlParser_IE)
    Return SetError(1, 0, 0)
EndFunc

Func _HtmlParser_LoadHtml ( $sHtml, $fRemoveScriptTags=0 )
    _IENavigate($_HtmlParser_IE, "about:blank")
    Local $doc = _HtmlParser_GetDocument()
    If $fRemoveScriptTags Then $sHtml = StringRegExpReplace($sHtml, "<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>", "")
    $doc.Write($sHtml & @CRLF & '<script language="javascript">' & @CRLF & $_HtmlParser_Script & @CRLF & 'Array.prototype.set=function(i,v){this[i]=v};Array.prototype.get=function(i){return this[i]};document.scripts[document.scripts.length-1].removeNode(false)</script>')
    Return $doc
EndFunc

Func _HtmlParser_LoadUrl ( $sUrl, $fRemoveScriptTags=0 )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Return _HtmlParser_LoadHtml($http.Responsetext, $fRemoveScriptTags)
EndFunc

Func _HtmlParser_LoadScript ( $sFilename )
    Local $script = ""
    If FileExists ($sFilename) Then
        $script = FileRead($sFilename)
    Else
        If StringInStr($sFilename, "http://") OR StringInStr($sFilename, "https://") OR StringInStr($sFilename, "ftp://") Then
            Local $http = ObjCreate("winhttp.winhttprequest.5.1")
            $http.Open("GET", $sFilename)
            $http.Send()
            $script = $http.Responsetext
        EndIf
    EndIf
    If $script Then
        $_HtmlParser_Script &= @CRLF & @CRLF & $script
        Return 1
    EndIf
    Return 0
EndFunc

Func _HtmlParser_ClearScript ()
    $_HtmlParser_Script = ""
EndFunc

Func _HtmlParser_Exec ( $sScript )
    Local $doc = _HtmlParser_GetDocument()
    If IsObj($doc) AND IsObj($doc.parentWindow) Then
        Local $window = $doc.parentWindow
        $window.execScript('window._HtmlParser_Result=(function(){' & $sScript & '})();', 'Javascript')
        If $window._HtmlParser_Result OR IsObj($window._HtmlParser_Result) Then
            Return $window._HtmlParser_Result
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return SetError(1, 0, 0)
EndFunc

#region >> Internals

Func __HtmlParser_Shutdown ()
    Global $_HtmlParser_IE = 0
    If IsDeclared("_HtmlParser_HWND") Then WinClose($_HtmlParser_HWND)
    OnAutoItExitUnregister("__HtmlParser_Shutdown")
EndFunc

Func __HtmlParser_ParseCmdLine ()
    If @compiled Then
      Global Const $_HtmlParser_Exec = '"' & @ScriptFullPath & '"'
    Else
      Global Const $_HtmlParser_Exec = '"' & @AutoItExe & '" "' & @ScriptFullPath & '"'
    EndIf

    Local $cmd = StringInStr($CmdLineRaw, "/_HtmlParser_", 1), $val
    If $cmd > 0 Then
        #NoTrayIcon
        $cmd = StringRegExp(StringMid($CmdLineRaw, $cmd), '/(_HtmlParser_[a-zA-Z]+)\:"([^"]*)"', 3)
        For $i = 0 To UBound($cmd)-1 Step 2
            $val = $cmd[$i+1]
            If $val = ("" & Number($val)) Then $val = Number($val)
            Assign($cmd[$i], $val, 2)
        Next
        __HtmlParser_DaemonRun()
        Exit
    Else
        Global $_HtmlParser_GUID = __WinAPI_CreateGUID()
        __HtmlParser_Daemonstart()
    EndIf
EndFunc

Func __HtmlParser_Daemonstart ()
    Run ( $_HtmlParser_Exec & _
        ' /_HtmlParser_GUID:"' & $_HtmlParser_GUID & '"' & _
        ' /_HtmlParser_Debug:"' & (0 + $_HtmlParser_Debug) & '"' & _
        ' /_HtmlParser_Port:"' & $_HtmlParser_Port & '"' )
EndFunc

Func __HtmlParser_DaemonRun ()
    ; Initialize GUI
    Local $hGUI = GUICreate("", 500, 400, 10, 10, $WS_SIZEBOX, $WS_EX_TOOLWINDOW)
    Global $_HtmlParser_IE = _IECreateEmbedded()
    Local $hIE = GUICtrlCreateObj($_HtmlParser_IE, 0, 0, _WinAPI_GetClientWidth($hGUI), _WinAPI_GetClientHeight($hGUI))
    GUICtrlSetResizing($hIE, $GUI_DOCKAUTO)
    GUIRegisterMsg($WM_SYSCOMMAND, "__HtmlParser_SysCommand")
    If $_HtmlParser_Debug Then GUISetState()
    
    ; Initialize IE
    _IE_SetSessionProxy("127.0.0.1:" & $_HtmlParser_Port)
    _IENavigate($_HtmlParser_IE, "about:blank")
    _IEPropertySet($_HtmlParser_IE, "silent", True)
    WinSetTitle($hGUI, "", $_HtmlParser_GUID)
    
    Local $timeout = TimerInit()
    Do
        If TimerDiff($timeout) > 5000 Then Exit
    Until WinGetTitle($hGUI) <> $_HtmlParser_GUID
    If $_HtmlParser_Debug Then WinSetTitle($hGUI, "", "HtmlParser Debug Window")
    
    While 1
        Sleep(100)
    WEnd
EndFunc

Func __HtmlParser_SysCommand ($hWnd, $Msg, $wParam, $lParam)
    #forceref $Msg, $wParam, $lParam
    If BitAND($wParam, 0xFFF0) = 0xF060 Then Exit
    Return $GUI_RUNDEFMSG
EndFunc

#endregion << Internals

#region >> WinAPIEx

Func __WinAPI_CreateGUID()

    Local $tGUID, $Ret

    $tGUID = DllStructCreate($tagGUID)
    $Ret = DllCall('ole32.dll', 'uint', 'CoCreateGuid', 'ptr', DllStructGetPtr($tGUID))
    If @error Then
        Return SetError(1, 0, '')
    Else
        If $Ret[0] Then
            Return SetError(1, $Ret[0], 0)
        EndIf
    EndIf
    $Ret = DllCall('ole32.dll', 'int', 'StringFromGUID2', 'ptr', DllStructGetPtr($tGUID), 'wstr', '', 'int', 39)
    If (@error) Or (Not $Ret[0]) Then
        Return SetError(1, 0, '')
    EndIf
    Return $Ret[2]
EndFunc   ;==>__WinAPI_CreateGUID

Func _IE_SetSessionProxy ($sProxyAddress, $sBypassList="")
    Local $tProxyAddress = DllStructCreate("char[" & StringLen($sProxyAddress) + 1 & "]"), _
          $tBypassList = DllStructCreate("char[" & StringLen($sBypassList) + 1 & "]"), _
          $tINTERNET_PROXY_INFO = DllStructCreate($tagINTERNET_PROXY_INFO)
          
    DllStructSetData($tINTERNET_PROXY_INFO, "dwAccessType", 0x3)
    DllStructSetData($tProxyAddress, 1, $sProxyAddress)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxy", DllStructGetPtr($tProxyAddress))
    DllStructSetData($tBypassList, 1, $sBypassList)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxyBypass", DllStructGetPtr($tBypassList))
    
    Local $aRet = DllCall("urlmon.dll", "INT", "UrlMkSetSessionOption", _
                            "uint", 0x26, _
                            "ptr", DllStructGetPtr($tINTERNET_PROXY_INFO), _
                            "int", DllStructGetSize($tINTERNET_PROXY_INFO), _
                            "int", 0 )
    If @error OR $aRet[0] Then Return SetError(1, @error, 0)
    
    Return 1
EndFunc

#endregion << WinAPIEx

#region >> Debugging

Func _Alert ($msg, $fDialog=1)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

; ==============================================================================

Func _HtmlParser_Test ()
    $_HtmlParser_Debug = True
    
    _Critical( _HtmlParser_Startup() )
    
    ; warming up
    Local $doc = _HtmlParser_GetDocument()
    $doc.write("Hello AutoIt World")
    _Alert("now for real")
    
    _HtmlParser_LoadScript("https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
    ; Or:
    ; _HtmlParser_LoadScript(@ScriptDir & "\jquery.min.js")
    Local $doc = _HtmlParser_LoadUrl("http://www.autoitscript.com", True)
    Local $pages = _Critical( _HtmlParser_Exec('var p=[];$("p").each(function(index) { p.push("[" + (index+1) + "] " + $(this).text());});return p') )
    Local $s = ""
    For $i = 0 To $pages.length-1
        $s &= $pages.get($i) & @CRLF
    Next
    _Alert($s)
    
    Local $saved = $doc.body.parentElement.outerHTML
    Local $divs = _HtmlParser_Exec('return $(".featitem.clearfix")')
    Local $div, $features = ""
    For $i = 0 To $divs.length-1
        $div = $divs.get($i)
        $features &= $div.outerHTML
    Next
    $doc.body.innerHTML = $features
    _Alert("A lot faster parsing in javascript though")
    
    _HtmlParser_LoadHtml($saved, True)
    Local $features = _HtmlParser_Exec('var s=""; $(".featitem.clearfix").each(function(){ s += this.outerHTML }); document.body.innerHTML=s; return s')  
    _Alert($features)
    
EndFunc
If @ScriptName = $_HtmlParser_ScriptName Then _HtmlParser_Test()

Edit: added $fRemoveScriptTags option for _HtmlParser_LoadUrl and _HtmlParser_LoadHtml
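A minimal sketch of calling the wrapper from a separate script, for readers who want something shorter than the bundled test. It assumes the listing above was saved as "HtmlParser.au3" next to this script; the HTML string and the message box titles are made up for illustration.

```autoit
#include "HtmlParser.au3" ; assumes the listing above was saved under this name

; Note: the UDF relaunches this same script as a helper process, so anything
; placed before _HtmlParser_Startup() runs in both processes.
If Not _HtmlParser_Startup() Then Exit MsgBox(16, "HtmlParser", "Startup failed, @error = " & @error)

_HtmlParser_LoadHtml('<ul><li>one</li><li>two</li><li>three</li></ul>')

; _HtmlParser_Exec wraps the snippet in a function, so "return" hands the value back.
; A string is returned here because the UDF reports falsy results (0, "") as an error.
Local $sResult = _HtmlParser_Exec('return "items: " + document.getElementsByTagName("li").length')
If @error Then
    MsgBox(16, "HtmlParser", "Script failed, @error = " & @error)
Else
    MsgBox(64, "HtmlParser", $sResult)
EndIf
```

Because the helper process is started from the same script file, it is probably safest to call `_HtmlParser_Startup()` as early as possible.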

Edited by eimhym

eimhym

Here's another, IE-less example. It needs libtidy (attached) to clean pages and pass the well-formed HTML into the MSXML ActiveX control (see MSDN documentation here).

Pros: 1) Lightweight and lightning fast. 2) XPath is available by default. 3) Script tags won't be executed, and CSS or images won't be loaded. 4) No problem deleting SCRIPT tags.

Cons: 1) It fails on some pages, so always check @error after calling _HXmlParser_LoadUrl or _HXmlParser_LoadHtml. 2) libtidy crashes on HTML5 pages, after which you have to reload the DLL. 3) It doesn't handle HTML tags within a textarea correctly; suggestions for a workaround are welcome. 4) You can't use JS frameworks.

The sample code does the same as HtmlParser, for comparison.

#include-once
#include "libtidy.au3"

; Opt("MustDeclareVars", 1)

Global Const $_HXmlParser_ScriptName = "HXmlParser.au3"
Global $_HXmlParser_DOM = 0

Func _HXmlParser_Startup ( $sConfFilename="tidy-xml-settings.cfg" )
    _LibTidy_Startup()
    If @error Then Return SetError(@error, @extended, 0)
    _LibTidy_LoadConfig($sConfFilename)
    If @error Then
        _LibTidy_Shutdown()
        Return SetError(@error, @extended, 0)
    EndIf
    $_HXmlParser_DOM = ObjCreate("MSXML2.DOMDocument")
    OnAutoItExitRegister("__HXmlParser_Shutdown")
    $_HXmlParser_DOM.validateOnParse = False;
    $_HXmlParser_DOM.resolveExternals = False;
    Return 1
EndFunc

Func __HXmlParser_Shutdown ()
    $_HXmlParser_DOM = 0
    OnAutoItExitUnRegister("__HXmlParser_Shutdown")
EndFunc

Func _HXmlParser_GetErrorString ()
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return "Error loading page " & _
                " (" & Hex($_HXmlParser_DOM.parseError.errorCode) & _
                ") at line: " & $_HXmlParser_DOM.parseError.line & _
                ", position: " & $_HXmlParser_DOM.parseError.linepos & _
                ", reason: " & $_HXmlParser_DOM.parseError.reason
    EndIf
    Return 0
EndFunc

Func _HXmlParser_LoadHtml ( $sHtml )
    If NOT $sHtml Then Return SetError(4, 0, 0)
    _LibTidy_LoadString( $sHtml )
    If @error Then Return SetError(@error, @extended, 0)
    $sHtml = _LibTidy_CleanAndRepair()
    If @error OR NOT $sHtml Then Return SetError(5, @error, 0)
    
    $sHtml = StringRegExp(StringMid($sHtml, StringInStr($sHtml, "<html")), "(?s)^<html[^>]*>[^<]*(<.*)", 1)
    If @error Then Return SetError(6, @error, 0)

    $sHtml = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [' _
        & '<!ENTITY nbsp " "><!ENTITY iexcl "¡"><!ENTITY cent "¢"><!ENTITY pound "£"><!ENTITY curren "¤"><!ENTITY yen "¥"><!ENTITY brvbar "¦"><!ENTITY sect "§"><!ENTITY uml "¨"><!ENTITY copy "©"><!ENTITY ordf "ª"><!ENTITY laquo "«"><!ENTITY not "¬"><!ENTITY shy "­"><!ENTITY reg "®"><!ENTITY macr "¯"><!ENTITY deg "°"><!ENTITY plusmn "±"><!ENTITY sup2 "²"><!ENTITY sup3 "³"><!ENTITY acute "´"><!ENTITY micro "µ"><!ENTITY para "¶"><!ENTITY middot "·"><!ENTITY cedil "¸"><!ENTITY sup1 "¹"><!ENTITY ordm "º"><!ENTITY raquo "»"><!ENTITY frac14 "¼"><!ENTITY frac12 "½"><!ENTITY frac34 "¾"><!ENTITY iquest "¿"><!ENTITY times "×"><!ENTITY divide "÷">' _
        & '<!ENTITY Agrave "À"><!ENTITY Aacute "Á"><!ENTITY Acirc "Â"><!ENTITY Atilde "Ã"><!ENTITY Auml "Ä"><!ENTITY Aring "Å"><!ENTITY AElig "Æ"><!ENTITY Ccedil "Ç"><!ENTITY Egrave "È"><!ENTITY Eacute "É"><!ENTITY Ecirc "Ê"><!ENTITY Euml "Ë"><!ENTITY Igrave "Ì"><!ENTITY Iacute "Í"><!ENTITY Icirc "Î"><!ENTITY Iuml "Ï"><!ENTITY ETH "Ð"><!ENTITY Ntilde "Ñ"><!ENTITY Ograve "Ò"><!ENTITY Oacute "Ó"><!ENTITY Ocirc "Ô"><!ENTITY Otilde "Õ"><!ENTITY Ouml "Ö"><!ENTITY Oslash "Ø"><!ENTITY Ugrave "Ù"><!ENTITY Uacute "Ú"><!ENTITY Ucirc "Û"><!ENTITY Uuml "Ü"><!ENTITY Yacute "Ý"><!ENTITY THORN "Þ"><!ENTITY szlig "ß"><!ENTITY agrave "à"><!ENTITY aacute "á"><!ENTITY acirc "â"><!ENTITY atilde "ã"><!ENTITY auml "ä"><!ENTITY aring "å"><!ENTITY aelig "æ"><!ENTITY ccedil "ç"><!ENTITY egrave "è"><!ENTITY eacute "é"><!ENTITY ecirc "ê"><!ENTITY euml "ë"><!ENTITY igrave "ì"><!ENTITY iacute "í"><!ENTITY icirc "î"><!ENTITY iuml "ï"><!ENTITY eth "ð"><!ENTITY ntilde "ñ"><!ENTITY ograve "ò"><!ENTITY oacute "ó"><!ENTITY ocirc "ô"><!ENTITY otilde "õ"><!ENTITY ouml "ö"><!ENTITY oslash "ø"><!ENTITY ugrave "ù"><!ENTITY uacute "ú"><!ENTITY ucirc "û"><!ENTITY uuml "ü"><!ENTITY yacute "ý"><!ENTITY thorn "þ"><!ENTITY yuml "ÿ">' _
        & ']>' & @CRLF _
        & '<html>' & @CRLF _
        & $sHtml[0]

    $_HXmlParser_DOM.loadXML($sHtml)
    $_HXmlParser_DOM.setProperty("SelectionLanguage", "XPath")

    ; Hand parse errors back to the caller; a bare SetError() may be cleared by later calls
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return SetError(7, $_HXmlParser_DOM.parseError.errorCode, $_HXmlParser_DOM)
    EndIf

    Return $_HXmlParser_DOM
EndFunc

Func _HXmlParser_LoadUrl ( $sUrl )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Local $ret = _HXmlParser_LoadHtml($http.Responsetext)
    If @error Then Return SetError(@error, @extended, $ret)
    Return $ret
EndFunc


#region >> Debugging

Func _Alert ($msg, $fDialog=1, $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
  If $err Then Return SetError($err, $ext, $ln)
  Return 0
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    $msg &= @CRLF & @CRLF & _HXmlParser_GetErrorString()
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

Func _HXmlParser_Test ()
    
    _Critical( _HXmlParser_Startup() )
    Local $dom = _Critical( _HXmlParser_LoadUrl("http://www.AutoItScript.com") )
   
    _Alert("Removing all SCRIPT tags")
    Local $begin = TimerInit()
    Local $nodes = $dom.selectNodes("//script")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $node.parentNode.removeChild($node)
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    
    _Alert("Collecting all P tags")
    Local $count = 1, $s = ""
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//p")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= "[" & $count & "] " & $node.text & @CRLF
            $count += 1
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)

    $s = ""
    _Alert("Collecting all feature DIVs")
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//div[contains(@class, 'featitem')]")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= $node.xml & @CRLF
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)
    
    ClipPut($dom.xml)
    _Alert("HTML content is in clipboard")
    
EndFunc
If @ScriptName = $_HXmlParser_ScriptName Then _HXmlParser_Test()

libtidy.7z
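To try the XPath part without libtidy, here is a small standalone sketch that feeds already well-formed markup straight into MSXML. The markup is invented for this example and stands in for libtidy's cleaned-up output; the DOM object returned by _HXmlParser_LoadHtml should be queryable the same way.

```autoit
; Well-formed markup standing in for libtidy's output (made up for this sketch).
Local $sXml = '<html><body>' & _
        '<div class="featitem clearfix"><p>Easy to learn</p></div>' & _
        '<div class="sidebar"><p>Not a feature</p></div>' & _
        '<div class="featitem clearfix"><p>Small and standalone</p></div>' & _
        '</body></html>'

Local $oDom = ObjCreate("MSXML2.DOMDocument")
If Not IsObj($oDom) Then Exit MsgBox(16, "MSXML", "MSXML is not available")
$oDom.validateOnParse = False
$oDom.resolveExternals = False

; loadXML returns False when the input is not well-formed XML.
If Not $oDom.loadXML($sXml) Then Exit MsgBox(16, "MSXML", $oDom.parseError.reason)
$oDom.setProperty("SelectionLanguage", "XPath")

; Same kind of query as the feature-DIV step in the test above, narrowed to the inner <p>.
Local $oNodes = $oDom.selectNodes("//div[contains(@class, 'featitem')]/p")
For $oNode In $oNodes
    ConsoleWrite($oNode.text & @CRLF)
Next
```

XPath matching is case-sensitive, so the tag names in the query have to match whatever case the cleaned-up document uses.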

Edited by eimhym

level20peon

I get a "libtidy.au3(106,68) : ERROR: $tidyLoadConfig: undeclared global variable."

What am I doing wrong here?

Edited by level20peon
