HTML Parser UDF

Decipher

Hi,

I'm inviting all AutoIt forum members to contribute to an HTML parser UDF. I'm going to attempt to replicate a Python module called BeautifulSoup. It would be greatly appreciated if some senior AutoIt programmers took an interest in this topic. There is no template other than the module written in Python located here and the documentation here.

I can't wait to see what this develops into. :robot:


jdelaney

I don't see the need; there is the _IE UDF, and with my link below you can focus on any node(s) through an XPath.


IEbyXPATH - Grab IE DOM objects by XPATH
IEscriptRecord - Makings of an IE script recorder
ExcelFromXML - Create Excel docs without Excel installed
GetAllWindowControls - Output all control data on a given window

Decipher

I see the need for simplicity. Would you care to give an example of how to use the functions mentioned above for data extraction?
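
For anyone following along, here is a minimal sketch of the kind of extraction being asked about. It uses only stock IE.au3 functions, not jdelaney's IEbyXPATH script, whose interface is not shown in this thread. The URL and the choice of `<a>` tags are placeholders picked for illustration.

```autoit
#include <IE.au3>

; Open the page in a hidden IE window and wait for it to finish loading.
; (URL is a placeholder; parameters are: url, try-attach, visible.)
Local $oIE = _IECreate("http://www.autoitscript.com", 0, 0)
If @error Then Exit MsgBox(16, "IE example", "Could not start Internet Explorer")

; Collect every <a> element and record its text and target.
Local $oLinks = _IETagNameGetCollection($oIE, "a")
Local $sOut = ""
For $oLink In $oLinks
    $sOut &= $oLink.innerText & " -> " & $oLink.href & @CRLF
Next

_IEQuit($oIE)
MsgBox(0, "Links found", $sOut)
```

The same pattern should work for other tags by changing the tag name passed to _IETagNameGetCollection and reading whichever DOM property is needed.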


KaFu
Decipher

I stand corrected. There are templates for parsing HTML... and in UDF format. Thank You @ Jdelaney & KaFu.

If anyone else has UDFs or examples they'd like to share for anyone following this path, that would be great. :)


eimhym

Finally found a way to "mute" IE, making it unable to load external resources except those already cached, which makes pages load faster. Other IE instances aren't affected, only the one used as the HTML parser.

These simple wrapper functions are provided:

_HtmlParser_Startup

_HtmlParser_GetDocument ; the IHtmlDocument ref

_HtmlParser_LoadHtml ; load plain text, optionally remove all script tags

_HtmlParser_LoadUrl

_HtmlParser_LoadScript ; to use jquery, xpath, etc.

_HtmlParser_ClearScript

_HtmlParser_Exec ; execute js and get return value, see sample

Save as "HtmlParser.au3" and run, the sample code is included. Clear IE cache first for the best result. Hope it helps :)

#include-once
#include <WinAPI.au3>
#include <WindowsConstants.au3>
#include <GuiConstantsEx.au3>
#include <IE.au3>

; Opt("MustDeclareVars", 1)

Global $_HtmlParser_Debug = False
Global $_HtmlParser_Script = ""

Global Const $_HtmlParser_ScriptName = "HtmlParser.au3"
Global Const $tagINTERNET_PROXY_INFO = "dword dwAccessType; ptr lpszProxy; ptr lpszProxyBypass";
Global Const $tagCOPYDATA = _
  "ULONG_PTR;" & _  ; dwData, The data to be passed to the receiving application
  "DWORD;" & _    ; cbData, The size, in bytes, of the data pointed to by the lpData member
  "PTR"          ; lpData, The data to be passed to the receiving application. This member can be NULL.

Func _HtmlParser_Startup ($iPort=843)
    If IsDeclared("_HtmlParser_IE") AND IsObj($_HtmlParser_IE) Then Return 1

    Global $_HtmlParser_Port = $iPort
    __HtmlParser_ParseCmdLine()
    
    Global $_HtmlParser_HWND = WinWait($_HtmlParser_GUID, "", 5)
    If NOT $_HtmlParser_HWND Then Return SetError(1, 0, 0) ; Daemon failed to start
    WinSetTitle($_HtmlParser_HWND, "", "")
    
    ; Attach IE
    Global $_HtmlParser_IE = _IEAttach(WinGetHandle($_HtmlParser_GUID), "embedded")
    If NOT IsObj(_HtmlParser_GetDocument()) Then Return SetError(2, 0, 0)
    
    OnAutoItExitRegister("__HtmlParser_Shutdown")
    Return 1
EndFunc

Func _HtmlParser_GetDocument ()
    If IsObj($_HtmlParser_IE) Then Return _IEDocGetObj($_HtmlParser_IE)
    Return SetError(1, 0, 0)
EndFunc

Func _HtmlParser_LoadHtml ( $sHtml, $fRemoveScriptTags=0 )
    _IENavigate($_HtmlParser_IE, "about:blank")
    Local $doc = _HtmlParser_GetDocument()
    If $fRemoveScriptTags Then $sHtml = StringRegExpReplace($sHtml, "<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>", "")
    $doc.Write($sHtml & @CRLF & '<script language="javascript">' & @CRLF & $_HtmlParser_Script & @CRLF & 'Array.prototype.set=function(i,v){this[i]=v};Array.prototype.get=function(i){return this[i]};document.scripts[document.scripts.length-1].removeNode(false)</script>')
    Return $doc
EndFunc

Func _HtmlParser_LoadUrl ( $sUrl, $fRemoveScriptTags=0 )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Return _HtmlParser_LoadHtml($http.Responsetext, $fRemoveScriptTags)
EndFunc

Func _HtmlParser_LoadScript ( $sFilename )
    Local $script = ""
    If FileExists ($sFilename) Then
        $script = FileRead($sFilename)
    Else
        If StringInStr($sFilename, "http://") OR StringInStr($sFilename, "https://") OR StringInStr($sFilename, "ftp://") Then
            Local $http = ObjCreate("winhttp.winhttprequest.5.1")
            $http.Open("GET", $sFilename)
            $http.Send()
            $script = $http.Responsetext
        EndIf
    EndIf
    If $script Then
        $_HtmlParser_Script &= @CRLF & @CRLF & $script
        Return 1
    EndIf
    Return 0
EndFunc

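; The parameter is unused; it is optional so the function can be called without arguments.
Func _HtmlParser_ClearScript ( $sFilename = "" )
    #forceref $sFilename
    $_HtmlParser_Script = ""
EndFunc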

Func _HtmlParser_Exec ( $sScript )
    Local $doc = _HtmlParser_GetDocument()
    If IsObj($doc) AND IsObj($doc.parentWindow) Then
        Local $window = $doc.parentWindow
        $window.execScript('window._HtmlParser_Result=(function(){' & $sScript & '})();', 'Javascript')
        If $window._HtmlParser_Result OR IsObj($window._HtmlParser_Result) Then
            Return $window._HtmlParser_Result
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return SetError(1, 0, 0)
EndFunc

#region >> Internals

Func __HtmlParser_Shutdown ()
    Global $_HtmlParser_IE = 0
    If IsDeclared("_HtmlParser_HWND") Then WinClose($_HtmlParser_HWND)
    OnAutoItExitUnregister("__HtmlParser_Shutdown")
EndFunc

Func __HtmlParser_ParseCmdLine ()
    If @compiled Then
      Global Const $_HtmlParser_Exec = '"' & @ScriptFullPath & '"'
    Else
      Global Const $_HtmlParser_Exec = '"' & @AutoItExe & '" "' & @ScriptFullPath & '"'
    EndIf

    Local $cmd = StringInStr($CmdLineRaw, "/_HtmlParser_", 1), $val
    If $cmd > 0 Then
        #NoTrayIcon
        $cmd = StringRegExp(StringMid($CmdLineRaw, $cmd), '/(_HtmlParser_[a-zA-Z]+)\:"([^"]*)"', 3)
        For $i = 0 To UBound($cmd)-1 Step 2
            $val = $cmd[$i+1]
            If $val = ("" & Number($val)) Then $val = Number($val)
            Assign($cmd[$i], $val, 2)
        Next
        __HtmlParser_DaemonRun()
        Exit
    Else
        Global $_HtmlParser_GUID = __WinAPI_CreateGUID()
        __HtmlParser_Daemonstart()
    EndIf
EndFunc

Func __HtmlParser_Daemonstart ()
    Run ( $_HtmlParser_Exec & _
        ' /_HtmlParser_GUID:"' & $_HtmlParser_GUID & '"' & _
        ' /_HtmlParser_Debug:"' & (0 + $_HtmlParser_Debug) & '"' & _
        ' /_HtmlParser_Port:"' & $_HtmlParser_Port & '"' )
EndFunc

Func __HtmlParser_DaemonRun ()
    ; Initialize GUI
    Local $hGUI = GUICreate("", 500, 400, 10, 10, $WS_SIZEBOX, $WS_EX_TOOLWINDOW)
    Global $_HtmlParser_IE = _IECreateEmbedded()
    Local $hIE = GUICtrlCreateObj($_HtmlParser_IE, 0, 0, _WinAPI_GetClientWidth($hGUI), _WinAPI_GetClientHeight($hGUI))
    GUICtrlSetResizing($hIE, $GUI_DOCKAUTO)
    GUIRegisterMsg($WM_SYSCOMMAND, "__HtmlParser_SysCommand")
    If $_HtmlParser_Debug Then GUISetState()
    
    ; Initialize IE
    _IE_SetSessionProxy("127.0.0.1:" & $_HtmlParser_Port)
    _IENavigate($_HtmlParser_IE, "about:blank")
    _IEPropertySet($_HtmlParser_IE, "silent", True)
    WinSetTitle($hGUI, "", $_HtmlParser_GUID)
    
    Local $timeout = TimerInit()
    Do
        If TimerDiff($timeout) > 5000 Then Exit
    Until WinGetTitle($hGUI) <> $_HtmlParser_GUID
    If $_HtmlParser_Debug Then WinSetTitle($hGUI, "", "HtmlParser Debug Window")
    
    While 1
        Sleep(100)
    WEnd
EndFunc

Func __HtmlParser_SysCommand ($hWnd, $Msg, $wParam, $lParam)
    #forceref $Msg, $wParam, $lParam
    If BitAND($wParam, 0xFFF0) = 0xF060 Then Exit
    Return $GUI_RUNDEFMSG
EndFunc

#endregion << Internals

#region >> WinAPIEx

Func __WinAPI_CreateGUID()

    Local $tGUID, $Ret

    $tGUID = DllStructCreate($tagGUID)
    $Ret = DllCall('ole32.dll', 'uint', 'CoCreateGuid', 'ptr', DllStructGetPtr($tGUID))
    If @error Then
        Return SetError(1, 0, '')
    Else
        If $Ret[0] Then
            Return SetError(1, $Ret[0], 0)
        EndIf
    EndIf
    $Ret = DllCall('ole32.dll', 'int', 'StringFromGUID2', 'ptr', DllStructGetPtr($tGUID), 'wstr', '', 'int', 39)
    If (@error) Or (Not $Ret[0]) Then
        Return SetError(1, 0, '')
    EndIf
    Return $Ret[2]
EndFunc   ;==>__WinAPI_CreateGUID

Func _IE_SetSessionProxy ($sProxyAddress, $sBypassList="")
    Local $tProxyAddress = DllStructCreate("char[" & StringLen($sProxyAddress) + 1 & "]"), _
          $tBypassList = DllStructCreate("char[" & StringLen($sBypassList) + 1 & "]"), _
          $tINTERNET_PROXY_INFO = DllStructCreate($tagINTERNET_PROXY_INFO)
          
    DllStructSetData($tINTERNET_PROXY_INFO, "dwAccessType", 0x3)
    DllStructSetData($tProxyAddress, 1, $sProxyAddress)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxy", DllStructGetPtr($tProxyAddress))
    DllStructSetData($tBypassList, 1, $sBypassList)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxyBypass", DllStructGetPtr($tBypassList))
    
    Local $aRet = DllCall("urlmon.dll", "INT", "UrlMkSetSessionOption", _
                            "uint", 0x26, _
                            "ptr", DllStructGetPtr($tINTERNET_PROXY_INFO), _
                            "int", DllStructGetSize($tINTERNET_PROXY_INFO), _
                            "int", 0 )
    If @error OR $aRet[0] Then Return SetError(1, @error, 0)
    
    Return 1
EndFunc

#endregion << WinAPIEx

#region >> Debugging

Func _Alert ($msg, $fDialog=1)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

; ==============================================================================

Func _HtmlParser_Test ()
    $_HtmlParser_Debug = True
    
    _Critical( _HtmlParser_Startup() )
    
    ; warming up
    Local $doc = _HtmlParser_GetDocument()
    $doc.write("Hello AutoIt World")
    _Alert("now for real")
    
    _HtmlParser_LoadScript("https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
    ; Or:
    ; _HtmlParser_LoadScript(@ScriptDir & "\jquery.min.js")
    $doc = _HtmlParser_LoadUrl("http://www.autoitscript.com", True) ; $doc is already declared above
    Local $pages = _Critical( _HtmlParser_Exec('var p=[];$("p").each(function(index) { p.push("[" + (index+1) + "] " + $(this).text());});return p') )
    Local $s = ""
    For $i = 0 To $pages.length-1
        $s &= $pages.get($i) & @CRLF
    Next
    _Alert($s)
    
    Local $saved = $doc.body.parentElement.outerHTML
    Local $divs = _HtmlParser_Exec('return $(".featitem.clearfix")')
    Local $div, $features = ""
    For $i = 0 To $divs.length-1
        $div = $divs.get($i)
        $features &= $div.outerHTML
    Next
    $doc.body.innerHTML = $features
    _Alert("A lot faster parsing in javascript though")
    
    _HtmlParser_LoadHtml($saved, True)
    $features = _HtmlParser_Exec('var s=""; $(".featitem.clearfix").each(function(){ s += this.outerHTML }); document.body.innerHTML=s; return s') ; $features is already declared above
    _Alert($features)
    
EndFunc
If @ScriptName = $_HtmlParser_ScriptName Then _HtmlParser_Test()

Edit: added $fRemoveScriptTags option for _HtmlParser_LoadUrl and _HtmlParser_LoadHtml
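
For readers who want to call the wrapper from their own script rather than run the bundled test, a minimal and untested sketch follows. It assumes the code above was saved as "HtmlParser.au3" next to the calling script. The HTML string and the JavaScript snippet are made up for illustration.

```autoit
#include "HtmlParser.au3"

; Start the wrapper before doing anything else: it relaunches this same script
; as the hidden IE host, and that copy only branches off inside _HtmlParser_Startup().
Local $iStarted = _HtmlParser_Startup()
If Not $iStarted Then Exit MsgBox(16, "HtmlParser", "Startup failed")

; Parse a literal HTML string; no URL is fetched here.
_HtmlParser_LoadHtml('<ul><li>one</li><li>two</li></ul>')

; The script body is wrapped in a JavaScript function, so it needs an explicit return.
Local $iCount = _HtmlParser_Exec('return document.getElementsByTagName("li").length')
If @error Then Exit MsgBox(16, "HtmlParser", "Script returned nothing usable, @error = " & @error)

MsgBox(0, "HtmlParser", "List items found: " & $iCount)
```

Note that _HtmlParser_Exec reports an error when the script returns a falsy value such as 0 or an empty string, so a legitimate "nothing found" result and a failure look the same to the caller.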

eimhym

Here's another, IE-less, example. It needs libtidy (attached) to clean pages and passes the well-formed HTML into the MSXML ActiveX control (see the MSDN documentation here).

pros: 1) lightweight, lightning fast. 2) XPath available by default. 3) Script tags won't be executed; CSS and images won't be loaded. 4) No problem deleting SCRIPT tags.

cons: 1) It does fail with some pages, so always check @error after calling _HXmlParser_LoadUrl or _HXmlParser_LoadHtml. 2) libtidy crashes on HTML5 pages, and you have to reload the DLL. 3) It doesn't handle HTML tags within a textarea correctly; suggestions for a workaround are welcome. 4) Can't use a JS framework.

The sample code does the same as the HtmlParser sample, for comparison.

#include-once
#include <WinAPI.au3> ; for _WinAPI_GetLastError and _WinAPI_GetLastErrorMessage used in _Critical
#include "libtidy.au3"

; Opt("MustDeclareVars", 1)

Global Const $_HXmlParser_ScriptName = "HXmlParser.au3"
Global $_HXmlParser_DOM = 0

Func _HXmlParser_Startup ( $sConfFilename="tidy-xml-settings.cfg" )
    _LibTidy_Startup()
    If @error Then Return SetError(@error, @extended, 0)
    _LibTidy_LoadConfig($sConfFilename)
    If @error Then
        _LibTidy_Shutdown()
        Return SetError(@error, @extended, 0)
    EndIf
    $_HXmlParser_DOM = ObjCreate("MSXML2.DOMDocument")
    OnAutoItExitRegister("__HXmlParser_Shutdown")
    $_HXmlParser_DOM.validateOnParse = False
    $_HXmlParser_DOM.resolveExternals = False
    Return 1
EndFunc

Func __HXmlParser_Shutdown ()
    $_HXmlParser_DOM = 0
    OnAutoItExitUnRegister("__HXmlParser_Shutdown")
EndFunc

Func _HXmlParser_GetErrorString ()
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return "Error loading page " & _
                " (" & Hex($_HXmlParser_DOM.parseError.errorCode) & _
                ") at line: " & $_HXmlParser_DOM.parseError.line & _
                ", position: " & $_HXmlParser_DOM.parseError.linepos & _
                ", reason: " & $_HXmlParser_DOM.parseError.reason
    EndIf
    Return 0
EndFunc

Func _HXmlParser_LoadHtml ( $sHtml )
    If NOT $sHtml Then Return SetError(4, 0, 0)
    _LibTidy_LoadString( $sHtml )
    If @error Then Return SetError(@error, @extended, 0)
    $sHtml = _LibTidy_CleanAndRepair()
    If @error OR NOT $sHtml Then Return SetError(5, @error, 0)
    
    $sHtml = StringRegExp(StringMid($sHtml, StringInStr($sHtml, "<html")), "(?s)^<html[^>]*>[^<]*(<.*)", 1)
    If @error Then Return SetError(6, @error, 0)

    $sHtml = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [' _
        & '<!ENTITY nbsp " "><!ENTITY iexcl "¡"><!ENTITY cent "¢"><!ENTITY pound "£"><!ENTITY curren "¤"><!ENTITY yen "¥"><!ENTITY brvbar "¦"><!ENTITY sect "§"><!ENTITY uml "¨"><!ENTITY copy "©"><!ENTITY ordf "ª"><!ENTITY laquo "«"><!ENTITY not "¬"><!ENTITY shy "­"><!ENTITY reg "®"><!ENTITY macr "¯"><!ENTITY deg "°"><!ENTITY plusmn "±"><!ENTITY sup2 "²"><!ENTITY sup3 "³"><!ENTITY acute "´"><!ENTITY micro "µ"><!ENTITY para "¶"><!ENTITY middot "·"><!ENTITY cedil "¸"><!ENTITY sup1 "¹"><!ENTITY ordm "º"><!ENTITY raquo "»"><!ENTITY frac14 "¼"><!ENTITY frac12 "½"><!ENTITY frac34 "¾"><!ENTITY iquest "¿"><!ENTITY times "×"><!ENTITY divide "÷">' _
        & '<!ENTITY Agrave "À"><!ENTITY Aacute "Á"><!ENTITY Acirc "Â"><!ENTITY Atilde "Ã"><!ENTITY Auml "Ä"><!ENTITY Aring "Å"><!ENTITY AElig "Æ"><!ENTITY Ccedil "Ç"><!ENTITY Egrave "È"><!ENTITY Eacute "É"><!ENTITY Ecirc "Ê"><!ENTITY Euml "Ë"><!ENTITY Igrave "Ì"><!ENTITY Iacute "Í"><!ENTITY Icirc "Î"><!ENTITY Iuml "Ï"><!ENTITY ETH "Ð"><!ENTITY Ntilde "Ñ"><!ENTITY Ograve "Ò"><!ENTITY Oacute "Ó"><!ENTITY Ocirc "Ô"><!ENTITY Otilde "Õ"><!ENTITY Ouml "Ö"><!ENTITY Oslash "Ø"><!ENTITY Ugrave "Ù"><!ENTITY Uacute "Ú"><!ENTITY Ucirc "Û"><!ENTITY Uuml "Ü"><!ENTITY Yacute "Ý"><!ENTITY THORN "Þ"><!ENTITY szlig "ß"><!ENTITY agrave "à"><!ENTITY aacute "á"><!ENTITY acirc "â"><!ENTITY atilde "ã"><!ENTITY auml "ä"><!ENTITY aring "å"><!ENTITY aelig "æ"><!ENTITY ccedil "ç"><!ENTITY egrave "è"><!ENTITY eacute "é"><!ENTITY ecirc "ê"><!ENTITY euml "ë"><!ENTITY igrave "ì"><!ENTITY iacute "í"><!ENTITY icirc "î"><!ENTITY iuml "ï"><!ENTITY eth "ð"><!ENTITY ntilde "ñ"><!ENTITY ograve "ò"><!ENTITY oacute "ó"><!ENTITY ocirc "ô"><!ENTITY otilde "õ"><!ENTITY ouml "ö"><!ENTITY oslash "ø"><!ENTITY ugrave "ù"><!ENTITY uacute "ú"><!ENTITY ucirc "û"><!ENTITY uuml "ü"><!ENTITY yacute "ý"><!ENTITY thorn "þ"><!ENTITY yuml "ÿ">' _
        & ']>' & @CRLF _
        & '<html>' & @CRLF _
        & $sHtml[0]

    $_HXmlParser_DOM.loadXML($sHtml)
    $_HXmlParser_DOM.setProperty("SelectionLanguage", "XPath")

    ; Report parse failures to the caller. The DOM is still returned so parseError can be inspected.
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return SetError(7, $_HXmlParser_DOM.parseError.errorCode, $_HXmlParser_DOM)
    EndIf

    Return $_HXmlParser_DOM
EndFunc

Func _HXmlParser_LoadUrl ( $sUrl )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Local $ret = _HXmlParser_LoadHtml($http.Responsetext)
    If @error Then Return SetError(@error, @extended, $ret)
    Return $ret
EndFunc


#region >> Debugging

Func _Alert ($msg, $fDialog=1, $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
  If $err Then Return SetError($err, $ext, $ln)
  Return 0
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    $msg &= @CRLF & @CRLF & _HXmlParser_GetErrorString()
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

Func _HXmlParser_Test ()
    
    _Critical( _HXmlParser_Startup() )
    Local $dom = _Critical( _HXmlParser_LoadUrl("http://www.AutoItScript.com") )
   
    _Alert("Removing all SCRIPT tags")
    Local $begin = TimerInit()
    Local $nodes = $dom.selectNodes("//script")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $node.parentNode.removeChild($node)
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    
    _Alert("Collecting all P tags")
    Local $count = 1, $s = ""
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//p")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= "[" & $count & "] " & $node.text & @CRLF
            $count += 1
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)

    $s = ""
    _Alert("Collecting all feature DIVs")
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//div[contains(@class, 'featitem')]")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= $node.xml & @CRLF
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)
    
    ClipPut($dom.xml)
    _Alert("HTML content is in clipboard")
    
EndFunc
If @ScriptName = $_HXmlParser_ScriptName Then _HXmlParser_Test()

libtidy.7z
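
The XPath half of this approach can be tried on its own, without libtidy, on markup that is already well-formed. The sketch below is a minimal illustration under that assumption. The XML string is invented for the example and stands in for libtidy's cleaned output.

```autoit
; A small, well-formed snippet stands in for libtidy's cleaned output.
Local $sXml = '<html><body>' & _
        '<div class="featitem clearfix"><p>First</p></div>' & _
        '<div class="other"><p>Second</p></div>' & _
        '</body></html>'

Local $oDom = ObjCreate("MSXML2.DOMDocument")
If Not IsObj($oDom) Then Exit MsgBox(16, "MSXML", "MSXML is not available")
$oDom.async = False
$oDom.setProperty("SelectionLanguage", "XPath")

; loadXML returns False when the markup is not well-formed.
If Not $oDom.loadXML($sXml) Then Exit MsgBox(16, "MSXML", $oDom.parseError.reason)

; Same style of query as the sample above: match on part of the class attribute.
Local $oNodes = $oDom.selectNodes("//div[contains(@class, 'featitem')]/p")
Local $sOut = ""
For $i = 0 To $oNodes.length - 1
    $sOut &= $oNodes.item($i).text & @CRLF
Next
MsgBox(0, "Matches", $sOut)
```

Real pages are rarely well-formed, which is why the full script runs them through libtidy first.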

Edited by eimhym

Share this post


Link to post
Share on other sites
level20peon

I get a "libtidy.au3(106,68) : ERROR: $tidyLoadConfig: undeclared global variable." error.

What am I doing wrong here?
