Decipher

HTML Parser UDF

Hi,

I'm inviting all AutoIt forum members to contribute to an HTML parser UDF. I'm going to attempt to replicate a Python module called BeautifulSoup. It would be greatly appreciated if some senior AutoIt programmers took an interest in this topic. There is no template other than the module written in Python located here and the documentation here.

I can't wait to see what this develops into. :robot:


I don't see the need. There is already the _IE UDF, and with the IEbyXPATH link below you can focus on any node(s) through an XPath.


IEbyXPATH: Grab IE DOM objects by XPath
IEscriptRecord: Makings of an IE script recorder
ExcelFromXML: Create Excel docs without Excel installed
GetAllWindowControls: Output all control data on a given window

I see the need for simplicity. Would you care to give an example of how to use the functions mentioned above for data extraction?
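For readers following along, here is a minimal sketch of what data extraction with the standard _IE UDF (IE.au3) can look like. It is not jdelaney's code and it does not use the IEbyXPATH UDF, whose API is not shown in this thread; the target URL and the choice of the P tag are only assumptions for illustration.

```autoit
#include <IE.au3>

; Hypothetical target page; substitute the page you want to read.
; Second argument 0 = do not attach to an existing window, third 0 = keep IE hidden.
Local $oIE = _IECreate("http://www.autoitscript.com", 0, 0)
If @error Then Exit 1

; Collect every <p> element on the page and print its text.
Local $oParagraphs = _IETagNameGetCollection($oIE, "p")
Local $iIndex = 1
For $oParagraph In $oParagraphs
    ConsoleWrite("[" & $iIndex & "] " & _IEPropertyGet($oParagraph, "innertext") & @CRLF)
    $iIndex += 1
Next

_IEQuit($oIE)
```

This drives a full IE instance, so scripts, images and stylesheets on the page are loaded as usual.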


I stand corrected: there are already templates for parsing HTML, and in UDF format too. Thank you, @Jdelaney and KaFu.

If anyone else has UDFs or examples they'd like to share for anyone following this path, that would be great. :)


#6 ·  Posted (edited)

I finally found a way to "mute" IE: it is unable to load external resources, apart from ones already in the cache, which makes pages load faster. The code below does this by setting a session-only proxy of 127.0.0.1:<port> for the helper process. Other IE instances are not affected, only the one used as the HTML parser.

The wrapper is a set of simple functions:

_HtmlParser_Startup

_HtmlParser_GetDocument ; the IHtmlDocument ref

_HtmlParser_LoadHtml ; load plain text, optionally remove all script tags

_HtmlParser_LoadUrl

_HtmlParser_LoadScript ; to use jquery, xpath, etc.

_HtmlParser_ClearScript

_HtmlParser_Exec ; execute js and get return value, see sample

Save it as "HtmlParser.au3" and run it; the sample code is included. Clear the IE cache first for the best results. Hope it helps :)
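As a rough sketch of how the functions listed above might be called from a separate script (the caller script and the HTML fragment are made up for illustration; only the _HtmlParser_* names come from the listing below):

```autoit
; Hypothetical caller script. Assumes HtmlParser.au3 sits next to this file.
; _HtmlParser_Startup() should be the first real work the script does, because
; the wrapper relaunches this same script as a helper process, and that helper
; enters _HtmlParser_Startup() and never returns from it.
#include "HtmlParser.au3"

If Not _HtmlParser_Startup() Then
    ConsoleWrite("Startup failed" & @CRLF)
    Exit 1
EndIf

_HtmlParser_LoadHtml("<ul><li>one</li><li>two</li><li>three</li></ul>")

; The JavaScript body is wrapped in a function by _HtmlParser_Exec, so "return" is required.
Local $sItems = _HtmlParser_Exec('var s = ""; var li = document.getElementsByTagName("li"); ' & _
        'for (var i = 0; i < li.length; i++) { s += li[i].innerText + "\n"; } return s;')
If @error Then
    ConsoleWrite("Exec returned nothing usable" & @CRLF)
    Exit 1
EndIf

ConsoleWrite($sItems)
```

Note that _HtmlParser_Exec reports @error = 2 for any result that evaluates to False (0 or an empty string), so a script that legitimately returns 0 needs its own convention, for example returning a string.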

#include-once
#include <WinAPI.au3>
#include <WindowsConstants.au3>
#include <GuiConstantsEx.au3>
#include <IE.au3>

; Opt("MustDeclareVars", 1)

Global $_HtmlParser_Debug = False
Global $_HtmlParser_Script = ""

Global Const $_HtmlParser_ScriptName = "HtmlParser.au3"
Global Const $tagINTERNET_PROXY_INFO = "dword dwAccessType; ptr lpszProxy; ptr lpszProxyBypass"
Global Const $tagCOPYDATA = _
  "ULONG_PTR;" & _  ; dwData, The data to be passed to the receiving application
  "DWORD;" & _    ; cbData, The size, in bytes, of the data pointed to by the lpData member
  "PTR"          ; lpData, The data to be passed to the receiving application. This member can be NULL.

Func _HtmlParser_Startup ($iPort=843)
    If IsDeclared("_HtmlParser_IE") AND IsObj($_HtmlParser_IE) Then Return 1

    Global $_HtmlParser_Port = $iPort
    __HtmlParser_ParseCmdLine()
    
    Global $_HtmlParser_HWND = WinWait($_HtmlParser_GUID, "", 5)
    If NOT $_HtmlParser_HWND Then Return SetError(1, 0, 0) ; Daemon failed to start
    WinSetTitle($_HtmlParser_HWND, "", "")
    
    ; Attach IE (use the handle found above; the GUID window title has just been cleared)
    Global $_HtmlParser_IE = _IEAttach($_HtmlParser_HWND, "embedded")
    If NOT IsObj(_HtmlParser_GetDocument()) Then Return SetError(2, 0, 0)
    
    OnAutoItExitRegister("__HtmlParser_Shutdown")
    Return 1
EndFunc

Func _HtmlParser_GetDocument ()
    If IsObj($_HtmlParser_IE) Then Return _IEDocGetObj($_HtmlParser_IE)
    Return SetError(1, 0, 0)
EndFunc

Func _HtmlParser_LoadHtml ( $sHtml, $fRemoveScriptTags=0 )
    _IENavigate($_HtmlParser_IE, "about:blank")
    Local $doc = _HtmlParser_GetDocument()
    If $fRemoveScriptTags Then $sHtml = StringRegExpReplace($sHtml, "<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>", "")
    $doc.Write($sHtml & @CRLF & '<script language="javascript">' & @CRLF & $_HtmlParser_Script & @CRLF & 'Array.prototype.set=function(i,v){this[i]=v};Array.prototype.get=function(i){return this[i]};document.scripts[document.scripts.length-1].removeNode(false)</script>')
    Return $doc
EndFunc

Func _HtmlParser_LoadUrl ( $sUrl, $fRemoveScriptTags=0 )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Return _HtmlParser_LoadHtml($http.Responsetext, $fRemoveScriptTags)
EndFunc

Func _HtmlParser_LoadScript ( $sFilename )
    Local $script = ""
    If FileExists ($sFilename) Then
        $script = FileRead($sFilename)
    Else
        If StringInStr($sFilename, "http://") OR StringInStr($sFilename, "https://") OR StringInStr($sFilename, "ftp://") Then
            Local $http = ObjCreate("winhttp.winhttprequest.5.1")
            $http.Open("GET", $sFilename)
            $http.Send()
            $script = $http.Responsetext
        EndIf
    EndIf
    If $script Then
        $_HtmlParser_Script &= @CRLF & @CRLF & $script
        Return 1
    EndIf
    Return 0
EndFunc

Func _HtmlParser_ClearScript () ; discards every script added with _HtmlParser_LoadScript
    $_HtmlParser_Script = ""
EndFunc

Func _HtmlParser_Exec ( $sScript )
    Local $doc = _HtmlParser_GetDocument()
    If IsObj($doc) AND IsObj($doc.parentWindow) Then
        Local $window = $doc.parentWindow
        $window.execScript('window._HtmlParser_Result=(function(){' & $sScript & '})();', 'Javascript')
        If $window._HtmlParser_Result OR IsObj($window._HtmlParser_Result) Then
            Return $window._HtmlParser_Result
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return SetError(1, 0, 0)
EndFunc

#region >> Internals

Func __HtmlParser_Shutdown ()
    Global $_HtmlParser_IE = 0
    If IsDeclared("_HtmlParser_HWND") Then WinClose($_HtmlParser_HWND)
    OnAutoItExitUnregister("__HtmlParser_Shutdown")
EndFunc

Func __HtmlParser_ParseCmdLine ()
    If @compiled Then
      Global Const $_HtmlParser_Exec = '"' & @ScriptFullPath & '"'
    Else
      Global Const $_HtmlParser_Exec = '"' & @AutoItExe & '" "' & @ScriptFullPath & '"'
    EndIf

    Local $cmd = StringInStr($CmdLineRaw, "/_HtmlParser_", 1), $val
    If $cmd > 0 Then
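        Opt("TrayIconHide", 1) ; hide the tray icon for the helper process only (#NoTrayIcon is a directive and applies to the whole script)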
        $cmd = StringRegExp(StringMid($CmdLineRaw, $cmd), '/(_HtmlParser_[a-zA-Z]+)\:"([^"]*)"', 3)
        For $i = 0 To UBound($cmd)-1 Step 2
            $val = $cmd[$i+1]
            If $val = ("" & Number($val)) Then $val = Number($val)
            Assign($cmd[$i], $val, 2)
        Next
        __HtmlParser_DaemonRun()
        Exit
    Else
        Global $_HtmlParser_GUID = __WinAPI_CreateGUID()
        __HtmlParser_Daemonstart()
    EndIf
EndFunc

Func __HtmlParser_Daemonstart ()
    Run ( $_HtmlParser_Exec & _
        ' /_HtmlParser_GUID:"' & $_HtmlParser_GUID & '"' & _
        ' /_HtmlParser_Debug:"' & (0 + $_HtmlParser_Debug) & '"' & _
        ' /_HtmlParser_Port:"' & $_HtmlParser_Port & '"' )
EndFunc

Func __HtmlParser_DaemonRun ()
    ; Initialize GUI
    Local $hGUI = GUICreate("", 500, 400, 10, 10, $WS_SIZEBOX, $WS_EX_TOOLWINDOW)
    Global $_HtmlParser_IE = _IECreateEmbedded()
    Local $hIE = GUICtrlCreateObj($_HtmlParser_IE, 0, 0, _WinAPI_GetClientWidth($hGUI), _WinAPI_GetClientHeight($hGUI))
    GUICtrlSetResizing($hIE, $GUI_DOCKAUTO)
    GUIRegisterMsg($WM_SYSCOMMAND, "__HtmlParser_SysCommand")
    If $_HtmlParser_Debug Then GUISetState()
    
    ; Initialize IE
    _IE_SetSessionProxy("127.0.0.1:" & $_HtmlParser_Port)
    _IENavigate($_HtmlParser_IE, "about:blank")
    _IEPropertySet($_HtmlParser_IE, "silent", True)
    WinSetTitle($hGUI, "", $_HtmlParser_GUID)
    
    Local $timeout = TimerInit()
    Do
        If TimerDiff($timeout) > 5000 Then Exit
    Until WinGetTitle($hGUI) <> $_HtmlParser_GUID
    If $_HtmlParser_Debug Then WinSetTitle($hGUI, "", "HtmlParser Debug Window")
    
    While 1
        Sleep(100)
    WEnd
EndFunc

Func __HtmlParser_SysCommand ($hWnd, $Msg, $wParam, $lParam)
    #forceref $Msg, $wParam, $lParam
    If BitAND($wParam, 0xFFF0) = 0xF060 Then Exit
    Return $GUI_RUNDEFMSG
EndFunc

#endregion << Internals

#region >> WinAPIEx

Func __WinAPI_CreateGUID()

    Local $tGUID, $Ret

    $tGUID = DllStructCreate($tagGUID)
    $Ret = DllCall('ole32.dll', 'uint', 'CoCreateGuid', 'ptr', DllStructGetPtr($tGUID))
    If @error Then
        Return SetError(1, 0, '')
    Else
        If $Ret[0] Then
            Return SetError(1, $Ret[0], 0)
        EndIf
    EndIf
    $Ret = DllCall('ole32.dll', 'int', 'StringFromGUID2', 'ptr', DllStructGetPtr($tGUID), 'wstr', '', 'int', 39)
    If (@error) Or (Not $Ret[0]) Then
        Return SetError(1, 0, '')
    EndIf
    Return $Ret[2]
EndFunc   ;==>__WinAPI_CreateGUID

Func _IE_SetSessionProxy ($sProxyAddress, $sBypassList="")
    Local $tProxyAddress = DllStructCreate("char[" & StringLen($sProxyAddress) + 1 & "]"), _
          $tBypassList = DllStructCreate("char[" & StringLen($sBypassList) + 1 & "]"), _
          $tINTERNET_PROXY_INFO = DllStructCreate($tagINTERNET_PROXY_INFO)
          
    DllStructSetData($tINTERNET_PROXY_INFO, "dwAccessType", 0x3)
    DllStructSetData($tProxyAddress, 1, $sProxyAddress)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxy", DllStructGetPtr($tProxyAddress))
    DllStructSetData($tBypassList, 1, $sBypassList)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxyBypass", DllStructGetPtr($tBypassList))
    
    Local $aRet = DllCall("urlmon.dll", "INT", "UrlMkSetSessionOption", _
                            "uint", 0x26, _
                            "ptr", DllStructGetPtr($tINTERNET_PROXY_INFO), _
                            "int", DllStructGetSize($tINTERNET_PROXY_INFO), _
                            "int", 0 )
    If @error OR $aRet[0] Then Return SetError(1, @error, 0)
    
    Return 1
EndFunc

#endregion << WinAPIEx

#region >> Debugging

Func _Alert ($msg, $fDialog=1)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

; ==============================================================================

Func _HtmlParser_Test ()
    $_HtmlParser_Debug = True
    
    _Critical( _HtmlParser_Startup() )
    
    ; warming up
    Local $doc = _HtmlParser_GetDocument()
    $doc.write("Hello AutoIt World")
    _Alert("now for real")
    
    _HtmlParser_LoadScript("https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
    ; Or:
    ; _HtmlParser_LoadScript(@ScriptDir & "\jquery.min.js")
    $doc = _HtmlParser_LoadUrl("http://www.autoitscript.com", True) ; $doc is already declared above
    Local $pages = _Critical( _HtmlParser_Exec('var p=[];$("p").each(function(index) { p.push("[" + (index+1) + "] " + $(this).text());});return p') )
    Local $s = ""
    For $i = 0 To $pages.length-1
        $s &= $pages.get($i) & @CRLF
    Next
    _Alert($s)
    
    Local $saved = $doc.body.parentElement.outerHTML
    Local $divs = _HtmlParser_Exec('return $(".featitem.clearfix")')
    Local $div, $features = ""
    For $i = 0 To $divs.length-1
        $div = $divs.get($i)
        $features &= $div.outerHTML
    Next
    $doc.body.innerHTML = $features
    _Alert("A lot faster parsing in javascript though")
    
    _HtmlParser_LoadHtml($saved, True)
    $features = _HtmlParser_Exec('var s=""; $(".featitem.clearfix").each(function(){ s += this.outerHTML }); document.body.innerHTML=s; return s') ; $features is already declared above
    _Alert($features)
    
EndFunc
If @ScriptName = $_HtmlParser_ScriptName Then _HtmlParser_Test()

Edit: added $fRemoveScriptTags option for _HtmlParser_LoadUrl and _HtmlParser_LoadHtml
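To see what the $fRemoveScriptTags option does, the regular expression from _HtmlParser_LoadHtml can be tried on its own. This is a small standalone sketch; the sample HTML string is made up.

```autoit
; Same pattern as in _HtmlParser_LoadHtml above.
Local $sPattern = "<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>"

; Made-up input: one script block containing a "<" that is not a closing tag.
Local $sHtml = '<p>keep</p><script type="text/javascript">alert("x < y");</script><p>also keep</p>'

Local $sClean = StringRegExpReplace($sHtml, $sPattern, "")
ConsoleWrite($sClean & @CRLF) ; prints: <p>keep</p><p>also keep</p>
```

The pattern is case-sensitive as written, so an upper-case <SCRIPT> block would be left in place; prefixing the pattern with (?i) is one way to cover that case.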

Edited by eimhym

#7 ·  Posted (edited)

Here's another, IE-less, example. It needs libtidy (attached) to clean up pages and pass the well-formed HTML into the MSXML ActiveX control (see MSDN documentation here).

Pros: 1) Lightweight and lightning fast. 2) XPath is available by default. 3) Script tags are not executed; CSS and images are not loaded. 4) No problem deleting SCRIPT tags.

Cons: 1) It does fail with some pages, so always check @error after calling _HXmlParser_LoadUrl or _HXmlParser_LoadHtml. 2) libtidy crashes on HTML5 pages, and you have to reload the DLL. 3) It doesn't handle HTML tags within a textarea correctly; suggestions for a workaround are welcome. 4) JS frameworks can't be used.

The sample code does the same as the HtmlParser sample, for comparison.
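The XPath part can be tried without libtidy, as long as the input is already well-formed. The following is a small standalone sketch under that assumption; the markup is made up and only the MSXML calls match the ones used in the listing below.

```autoit
Local $oDom = ObjCreate("MSXML2.DOMDocument")
If Not IsObj($oDom) Then Exit 1

; Made-up, already well-formed markup, so no tidy step is needed here.
$oDom.loadXML('<html><body>' & _
        '<div class="featitem clearfix"><p>First</p></div>' & _
        '<div class="other"><p>Second</p></div>' & _
        '</body></html>')
If $oDom.parseError.errorCode Then
    ConsoleWrite("Parse error: " & $oDom.parseError.reason & @CRLF)
    Exit 1
EndIf

; Switch the query language to XPath so that contains() is understood.
$oDom.setProperty("SelectionLanguage", "XPath")

Local $oNodes = $oDom.selectNodes("//div[contains(@class, 'featitem')]/p")
For $i = 0 To $oNodes.length - 1
    ConsoleWrite($oNodes.item($i).text & @CRLF) ; prints: First
Next
```

Real-world pages are rarely well-formed XML, which is why the full example runs them through libtidy first.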

#include-once
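#include <WinAPI.au3> ; _Critical() below calls _WinAPI_GetLastError() and _WinAPI_GetLastErrorMessage()
#include "libtidy.au3"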

; Opt("MustDeclareVars", 1)

Global Const $_HXmlParser_ScriptName = "HXmlParser.au3"
Global $_HXmlParser_DOM = 0

Func _HXmlParser_Startup ( $sConfFilename="tidy-xml-settings.cfg" )
    _LibTidy_Startup()
    If @error Then Return SetError(@error, @extended, 0)
    _LibTidy_LoadConfig($sConfFilename)
    If @error Then
        _LibTidy_Shutdown()
        Return SetError(@error, @extended, 0)
    EndIf
    $_HXmlParser_DOM = ObjCreate("MSXML2.DOMDocument")
    OnAutoItExitRegister("__HXmlParser_Shutdown")
    $_HXmlParser_DOM.validateOnParse = False
    $_HXmlParser_DOM.resolveExternals = False
    Return 1
EndFunc

Func __HXmlParser_Shutdown ()
    $_HXmlParser_DOM = 0
    OnAutoItExitUnRegister("__HXmlParser_Shutdown")
EndFunc

Func _HXmlParser_GetErrorString ()
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return "Error loading page " & _
                " (" & Hex($_HXmlParser_DOM.parseError.errorCode) & _
                ") at line: " & $_HXmlParser_DOM.parseError.line & _
                ", position: " & $_HXmlParser_DOM.parseError.linepos & _
                ", reason: " & $_HXmlParser_DOM.parseError.reason
    EndIf
    Return 0
EndFunc

Func _HXmlParser_LoadHtml ( $sHtml )
    If NOT $sHtml Then Return SetError(4, 0, 0)
    _LibTidy_LoadString( $sHtml )
    If @error Then Return SetError(@error, @extended, 0)
    $sHtml = _LibTidy_CleanAndRepair()
    If @error OR NOT $sHtml Then Return SetError(5, @error, 0)
    
    $sHtml = StringRegExp(StringMid($sHtml, StringInStr($sHtml, "<html")), "(?s)^<html[^>]*>[^<]*(<.*)", 1)
    If @error Then Return SetError(6, @error, 0)

    $sHtml = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [' _
        & '<!ENTITY nbsp " "><!ENTITY iexcl "¡"><!ENTITY cent "¢"><!ENTITY pound "£"><!ENTITY curren "¤"><!ENTITY yen "¥"><!ENTITY brvbar "¦"><!ENTITY sect "§"><!ENTITY uml "¨"><!ENTITY copy "©"><!ENTITY ordf "ª"><!ENTITY laquo "«"><!ENTITY not "¬"><!ENTITY shy "­"><!ENTITY reg "®"><!ENTITY macr "¯"><!ENTITY deg "°"><!ENTITY plusmn "±"><!ENTITY sup2 "²"><!ENTITY sup3 "³"><!ENTITY acute "´"><!ENTITY micro "µ"><!ENTITY para "¶"><!ENTITY middot "·"><!ENTITY cedil "¸"><!ENTITY sup1 "¹"><!ENTITY ordm "º"><!ENTITY raquo "»"><!ENTITY frac14 "¼"><!ENTITY frac12 "½"><!ENTITY frac34 "¾"><!ENTITY iquest "¿"><!ENTITY times "×"><!ENTITY divide "÷">' _
        & '<!ENTITY Agrave "À"><!ENTITY Aacute "Á"><!ENTITY Acirc "Â"><!ENTITY Atilde "Ã"><!ENTITY Auml "Ä"><!ENTITY Aring "Å"><!ENTITY AElig "Æ"><!ENTITY Ccedil "Ç"><!ENTITY Egrave "È"><!ENTITY Eacute "É"><!ENTITY Ecirc "Ê"><!ENTITY Euml "Ë"><!ENTITY Igrave "Ì"><!ENTITY Iacute "Í"><!ENTITY Icirc "Î"><!ENTITY Iuml "Ï"><!ENTITY ETH "Ð"><!ENTITY Ntilde "Ñ"><!ENTITY Ograve "Ò"><!ENTITY Oacute "Ó"><!ENTITY Ocirc "Ô"><!ENTITY Otilde "Õ"><!ENTITY Ouml "Ö"><!ENTITY Oslash "Ø"><!ENTITY Ugrave "Ù"><!ENTITY Uacute "Ú"><!ENTITY Ucirc "Û"><!ENTITY Uuml "Ü"><!ENTITY Yacute "Ý"><!ENTITY THORN "Þ"><!ENTITY szlig "ß"><!ENTITY agrave "à"><!ENTITY aacute "á"><!ENTITY acirc "â"><!ENTITY atilde "ã"><!ENTITY auml "ä"><!ENTITY aring "å"><!ENTITY aelig "æ"><!ENTITY ccedil "ç"><!ENTITY egrave "è"><!ENTITY eacute "é"><!ENTITY ecirc "ê"><!ENTITY euml "ë"><!ENTITY igrave "ì"><!ENTITY iacute "í"><!ENTITY icirc "î"><!ENTITY iuml "ï"><!ENTITY eth "ð"><!ENTITY ntilde "ñ"><!ENTITY ograve "ò"><!ENTITY oacute "ó"><!ENTITY ocirc "ô"><!ENTITY otilde "õ"><!ENTITY ouml "ö"><!ENTITY oslash "ø"><!ENTITY ugrave "ù"><!ENTITY uacute "ú"><!ENTITY ucirc "û"><!ENTITY uuml "ü"><!ENTITY yacute "ý"><!ENTITY thorn "þ"><!ENTITY yuml "ÿ">' _
        & ']>' & @CRLF _
        & '<html>' & @CRLF _
        & $sHtml[0]

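    $_HXmlParser_DOM.loadXML($sHtml)
    ; Select XPath before the error check so that no method call sits between SetError() and Return
    $_HXmlParser_DOM.setProperty("SelectionLanguage", "XPath")

    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        ; Still hand back the DOM, but flag the parse failure in @error / @extended
        Return SetError(7, $_HXmlParser_DOM.parseError.errorCode, $_HXmlParser_DOM)
    EndIf

    Return $_HXmlParser_DOM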
EndFunc

Func _HXmlParser_LoadUrl ( $sUrl )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Local $ret = _HXmlParser_LoadHtml($http.Responsetext)
    If @error Then Return SetError(@error, @extended, $ret)
    Return $ret
EndFunc


#region >> Debugging

Func _Alert ($msg, $fDialog=1, $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
  If $err Then Return SetError($err, $ext, $ln)
  Return 0
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    $msg &= @CRLF & @CRLF & _HXmlParser_GetErrorString()
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

Func _HXmlParser_Test ()
    
    _Critical( _HXmlParser_Startup() )
    Local $dom = _Critical( _HXmlParser_LoadUrl("http://www.AutoItScript.com") )
   
    _Alert("Removing all SCRIPT tags")
    Local $begin = TimerInit()
    Local $nodes = $dom.selectNodes("//script")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $node.parentNode.removeChild($node)
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    
    _Alert("Collecting all P tags")
    Local $count = 1, $s = ""
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//p")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= "[" & $count & "] " & $node.text & @CRLF
            $count += 1
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)

    $s = ""
    _Alert("Collecting all feature DIVs")
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//div[contains(@class, 'featitem')]")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= $node.xml & @CRLF
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)
    
    ClipPut($dom.xml)
    _Alert("HTML content is in clipboard")
    
EndFunc
If @ScriptName = $_HXmlParser_ScriptName Then _HXmlParser_Test()
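Since some pages fail to load, a caller will usually want to check for errors before touching the DOM. Here is one possible sketch of that check from a separate script. The caller script and the HTML string are hypothetical, and it assumes HXmlParser.au3, libtidy.au3, the libtidy DLL and tidy-xml-settings.cfg can all be found from where the script runs.

```autoit
; Hypothetical caller script; only the _HXmlParser_* functions come from the listing above.
#include "HXmlParser.au3"

If Not _HXmlParser_Startup() Then
    ConsoleWrite("Startup failed" & @CRLF)
    Exit 1
EndIf

Local $oDom = _HXmlParser_LoadHtml("<html><body><p>Hello</p><p>World</p></body></html>")
Local $iErr = @error                           ; capture before another call resets it
Local $sReason = _HXmlParser_GetErrorString()  ; 0 when MSXML reported no parse error

If $iErr Or IsString($sReason) Then
    ConsoleWrite("Load failed, @error = " & $iErr & @CRLF)
    If IsString($sReason) Then ConsoleWrite($sReason & @CRLF)
    Exit 1
EndIf

; Safe to query now.
Local $oNodes = $oDom.selectNodes("//p")
For $i = 0 To $oNodes.length - 1
    ConsoleWrite($oNodes.item($i).text & @CRLF)
Next
```

Both HtmlParser.au3 and HXmlParser.au3 define _Alert and _Critical, so including the two files in the same script would need one pair renamed or removed.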

libtidy.7z

Edited by eimhym

#8 ·  Posted (edited)

I get this error: "libtidy.au3(106,68) : ERROR: $tidyLoadConfig: undeclared global variable."

What am I doing wrong here?

Edited by level20peon

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now

  • Similar Content

    • FrancescoDiMuro
      By FrancescoDiMuro
      Good evening everyone
      Before all, I want to say that I'm doing this script to see how _IE* functions work, and see if my studs can hack a quiz I'm working on.
      I want to clarify that I'm not automating any game, bypassing any CAPTCHAs, or anything that could damage anyone.
      I was trying to autofill a form, based on which question is displayed.
      The question is always stored in here:
      <header> <h1><span class="questionid">1. </span>Here goes the question</h1> </header> And answers are stored in here:
      <ul class="answers"> <li><label><span><input id="answer_0" name="answer[]" type="radio" value="0">Answer 1</span></label></li> <li><label><span><input id="answer_1" name="answer[]" type="radio" value="1">Answer 2</span></label></li> <li><label><span><input id="answer_2" name="answer[]" type="radio" value="2">Anwser 3</span></label></li> <li><label><span><input id="answer_3" name="answer[]" type="radio" value="3">Answer 4</span></label></li> </ul></fieldset></form></div> And, there are 15 questions like this.
      How can automatically fill my form?
      Thanks in advance
      Francesco
    • houser747
      By houser747
      I have previously used _IEFormElementGetObjByName and _IEFormElementSetValue to enter text into a search box on a form and then submit the form.
      I am now trying to enter text into a search box which is not part of a form. 
      Here is the HTML from the website that i'm trying to enter the data on and then submit the search.
      <div class="row">
          <div class="form-group col-xs-12">
              <span id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_lblSearchText" for="input-search">Registreringsbeteckning</span>
              <div class="input-group col-xs-12">
                  <span id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_preSearchText" class="input-group-addon">SE -</span>
                  <input name="ctl00$FullWidthWithSubmenuContent$FullWidthContent$MainContent$AircraftRegistry$txtSearchText" type="text" value="DTH" id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_txtSearchText" class="form-control" />
              </div>
          </div>
      </div>
      <div class="row">
          <div class="form-group col-xs-12">
              <label class="sr-only" for="">Sök</label>
              <input type="submit" name="ctl00$FullWidthWithSubmenuContent$FullWidthContent$MainContent$AircraftRegistry$btnSearch" value="Sök" id="FullWidthWithSubmenuContent_FullWidthContent_MainContent_AircraftRegistry_btnSearch" class="btn btn-primary ladda-button" data-style="expand-right" />
          </div>
      </div>
      Many thanks in advance
      cheers
      Roger
    • FrancescoDiMuro
      By FrancescoDiMuro
      Good morning everyone
      I'm working on a little project, and, I encountered a little strange error when I try to add some data to an array...
      The code I wrote is this:
      Func _WMI_Get_Win32_TemperatureProbe($blnCanUseWMI, $blnCustomArrayDisplay = False, $blnReturnEU = False) If $blnCanUseWMI Then Local $objWMI_QueryResult = $objWMI.ExecQuery("SELECT * FROM Win32_TemperatureProbe", "WQL", 32) If @error Then __ConsoleWrite("Error executing the query on Win32_TemperatureProbe class.", @error, 9999) Else Local $arrWin32_TemperatureProbe[1][3] _ArrayDelete($arrWin32_TemperatureProbe, 0) If @error Then __ConsoleWrite("Error deleting the 0st element $arrWin32_TemperatureProbe array.", @error, 9999) Else Local $objWMI_Variable = Null, $strWMI_QueryResult = "", $i = 0 For $objWMI_Variable In $objWMI_QueryResult $strWMI_QueryResult &= "QUERY RESULT" & "|# " & $i & "|/" & @CRLF & _ "Accuracy" & "|" & $objWMI_Variable.Accuracy & "|" & "[sint32]" & @CRLF & _ "Availability" & "|" & $objWMI_Variable.Availability & "|" & "[uint16]" & @CRLF & _ "Caption" & "|" & $objWMI_Variable.Caption & "|" & "[string]" & @CRLF & _ "ConfigManagerErrorCode" & "|" & $objWMI_Variable.ConfigManagerErrorCode & "|" & "[uint32]" & @CRLF & _ "ConfigManagerUserConfig" & "|" & $objWMI_Variable.ConfigManagerUserConfig & "|" & "[boolean]" & @CRLF & _ "CreationClassName" & "|" & $objWMI_Variable.CreationClassName & "|" & "[string]" & @CRLF & _ "CurrentReading" & "|" & $objWMI_Variable.CurrentReading & "|" & "[sint32]" & @CRLF & _ "Description" & "|" & $objWMI_Variable.Description & "|" & "[string]" & @CRLF & _ "DeviceID" & "|" & $objWMI_Variable.DeviceID & "|" & "[string]" & @CRLF & _ "ErrorCleared" & "|" & $objWMI_Variable.ErrorCleared & "|" & "[boolean]" & @CRLF & _ "ErrorDescription" & "|" & $objWMI_Variable.ErrorDescription & "|" & "[string]" & @CRLF & _ "InstallDate" & "|" & $objWMI_Variable.InstallDate & "|" & "[datetime]" & @CRLF & _ "IsLinear" & "|" & $objWMI_Variable.IsLinear & "|" & "[boolean]" & @CRLF & _ "LastErrorCode" & "|" & $objWMI_Variable.LastErrorCode & "|" & "[uint32]" & @CRLF & _ "LowerThresholdCritical" & "|" & 
$objWMI_Variable.LowerThresholdCritical & "|" & "[sint32]" & @CRLF & _ "LowerThresholdFatal" & "|" & $objWMI_Variable.LowerThresholdFatal & "|" & "[sint32]" & @CRLF & _ "LowerThresholdNonCritical" & "|" & $objWMI_Variable.LowerThresholdNonCritical & "|" & "[sint32]" & @CRLF & _ "MaxReadable" & "|" & $objWMI_Variable.MaxReadable & "|" & "[sint32]" & @CRLF & _ "MinReadable" & "|" & $objWMI_Variable.MinReadable & "|" & "[sint32]" & @CRLF & _ "Name" & "|" & $objWMI_Variable.Name & "|" & "[string]" & @CRLF & _ "NominalReading" & "|" & $objWMI_Variable.NominalReading & "|" & "[sint32]" & @CRLF & _ "NormalMax" & "|" & $objWMI_Variable.NormalMax & "|" & "[sint32]" & @CRLF & _ "NormalMin" & "|" & $objWMI_Variable.NormalMin & "|" & "[sint32]" & @CRLF & _ "PNPDeviceID" & "|" & $objWMI_Variable.PNPDeviceID & "|" & "[string]" & @CRLF & _ "PowerManagementCapabilities" & "|" & $objWMI_Variable.PowerManagementCapabilities & "|" & "[uint16]" & @CRLF & _ "PowerManagementSupported" & "|" & $objWMI_Variable.PowerManagementSupported & "|" & "[boolean]" & @CRLF & _ "Resolution" & "|" & $objWMI_Variable.Resolution & "|" & "[uint32]" & @CRLF & _ "Status" & "|" & $objWMI_Variable.Status & "|" & "[string]" & @CRLF & _ "StatusInfo" & "|" & $objWMI_Variable.StatusInfo & "|" & "[uint16]" & @CRLF & _ "SystemCreationClassName" & "|" & $objWMI_Variable.SystemCreationClassName & "|" & "[string]" & @CRLF & _ "SystemName" & "|" & $objWMI_Variable.SystemName & "|" & "[string]" & @CRLF & _ "Tolerance" & "|" & $objWMI_Variable.Tolerance & "|" & "[sint32]" & @CRLF & _ "UpperThresholdCritical" & "|" & $objWMI_Variable.UpperThresholdCritical & "|" & "[sint32]" & @CRLF & _ "UpperThresholdFatal" & "|" & $objWMI_Variable.UpperThresholdFatal & "|" & "[sint32]" & @CRLF & _ "UpperThresholdNonCritical" & "|" & $objWMI_Variable.UpperThresholdNonCritical & "|" & "[sint32]" $i+=1 Next ConsoleWrite($strWMI_QueryResult & @CRLF) _ArrayAdd($arrWin32_TemperatureProbe, $strWMI_QueryResult) ; I'll wait for an answer... 
See you later :) If @error Then __ConsoleWrite("Error inserting item #" & $i & " in the $arrWin32_TemperatureProbe array.", @error, 9999) Else If $blnCustomArrayDisplay Then _ArrayDisplay($arrWin32_TemperatureProbe, "Win32_TemperatureProbe:", "", 64 + 32 + 4, "|", "VARIABLE NAME|ACTUAL VALUE|ENGINEERING UNIT", 350, 0xD3D3D3) If @error Then __ConsoleWrite("Error displaying the $arrWin32_TemperatureProbe array.", @error, 9999) EndIf EndIf If $blnReturnEU = False Then _ArrayColDelete($arrWin32_TemperatureProbe, 2) If @error Then __ConsoleWrite("Error deleting the column #2 of $arrWin32_TemperatureProbe array.") EndIf EndIf If IsArray($arrWin32_TemperatureProbe) Then Return $arrWin32_TemperatureProbe Else Return False EndIf EndIf EndIf EndIf EndIf EndFunc And I get this error ( undocumented in the Help File on _ArrayAdd() function ):
      [15/09/2017 10:24:46] : Error inserting item #4 in the $arrWin32_TemperatureProbe array. > Error: 0 Adding a ConsoleWrite() before the _ArrayAdd() function, I can see the content of $strWMI_QueryResult, and, here it is:
      QUERY RESULT|# 0|/
      Accuracy|32768|[sint32]
      Availability||[uint16]
      Caption|Sensore numerico|[string]
      ConfigManagerErrorCode||[uint32]
      ConfigManagerUserConfig||[boolean]
      CreationClassName|Win32_TemperatureProbe|[string]
      CurrentReading||[sint32]
      Description|CPU Thermal Probe|[string]
      DeviceID|root\cimv2 0|[string]
      ErrorCleared||[boolean]
      ErrorDescription||[string]
      InstallDate||[datetime]
      IsLinear||[boolean]
      LastErrorCode||[uint32]
      LowerThresholdCritical||[sint32]
      LowerThresholdFatal||[sint32]
      LowerThresholdNonCritical||[sint32]
      MaxReadable|1270|[sint32]
      MinReadable|64266|[sint32]
      Name|Sensore numerico|[string]
      NominalReading||[sint32]
      NormalMax||[sint32]
      NormalMin||[sint32]
      PNPDeviceID||[string]
      PowerManagementCapabilities||[uint16]
      PowerManagementSupported||[boolean]
      Resolution|1000|[uint32]
      Status|Unknown|[string]
      StatusInfo||[uint16]
      SystemCreationClassName|Win32_ComputerSystem|[string]
      SystemName|DESKTOP-25LFPVU|[string]
      Tolerance|32768|[sint32]
      UpperThresholdCritical||[sint32]
      UpperThresholdFatal||[sint32]
      UpperThresholdNonCritical||[sint32]QUERY RESULT|# 1|/
      Accuracy|32768|[sint32]
      Availability||[uint16]
      Caption|Sensore numerico|[string]
      ConfigManagerErrorCode||[uint32]
      ConfigManagerUserConfig||[boolean]
      CreationClassName|Win32_TemperatureProbe|[string]
      CurrentReading||[sint32]
      Description|True Ambient Thermal Probe|[string]
      DeviceID|root\cimv2 1|[string]
      ErrorCleared||[boolean]
      ErrorDescription||[string]
      InstallDate||[datetime]
      IsLinear||[boolean]
      LastErrorCode||[uint32]
      LowerThresholdCritical||[sint32]
      LowerThresholdFatal||[sint32]
      LowerThresholdNonCritical||[sint32]
      MaxReadable|1270|[sint32]
      MinReadable|64266|[sint32]
      Name|Sensore numerico|[string]
      NominalReading||[sint32]
      NormalMax||[sint32]
      NormalMin||[sint32]
      PNPDeviceID||[string]
      PowerManagementCapabilities||[uint16]
      PowerManagementSupported||[boolean]
      Resolution|1000|[uint32]
      Status|Unknown|[string]
      StatusInfo||[uint16]
      SystemCreationClassName|Win32_ComputerSystem|[string]
      SystemName|DESKTOP-25LFPVU|[string]
      Tolerance|32768|[sint32]
      UpperThresholdCritical||[sint32]
      UpperThresholdFatal||[sint32]
      UpperThresholdNonCritical||[sint32]QUERY RESULT|# 2|/
      Accuracy|32768|[sint32]
      Availability||[uint16]
      Caption|Sensore numerico|[string]
      ConfigManagerErrorCode||[uint32]
      ConfigManagerUserConfig||[boolean]
      CreationClassName|Win32_TemperatureProbe|[string]
      CurrentReading||[sint32]
      Description|Memory Module Thermal Probe|[string]
      DeviceID|root\cimv2 2|[string]
      ErrorCleared||[boolean]
      ErrorDescription||[string]
      InstallDate||[datetime]
      IsLinear||[boolean]
      LastErrorCode||[uint32]
      LowerThresholdCritical||[sint32]
      LowerThresholdFatal||[sint32]
      LowerThresholdNonCritical||[sint32]
      MaxReadable|1270|[sint32]
      MinReadable|64266|[sint32]
      Name|Sensore numerico|[string]
      NominalReading||[sint32]
      NormalMax||[sint32]
      NormalMin||[sint32]
      PNPDeviceID||[string]
      PowerManagementCapabilities||[uint16]
      PowerManagementSupported||[boolean]
      Resolution|1000|[uint32]
      Status|Unknown|[string]
      StatusInfo||[uint16]
      SystemCreationClassName|Win32_ComputerSystem|[string]
      SystemName|DESKTOP-25LFPVU|[string]
      Tolerance|32768|[sint32]
      UpperThresholdCritical||[sint32]
      UpperThresholdFatal||[sint32]
      UpperThresholdNonCritical||[sint32]
      QUERY RESULT|# 3|/
      Accuracy|32768|[sint32]
      Availability||[uint16]
      Caption|Sensore numerico|[string]
      ConfigManagerErrorCode||[uint32]
      ConfigManagerUserConfig||[boolean]
      CreationClassName|Win32_TemperatureProbe|[string]
      CurrentReading||[sint32]
      Description|Video Card Thermal Probe|[string]
      DeviceID|root\cimv2 3|[string]
      ErrorCleared||[boolean]
      ErrorDescription||[string]
      InstallDate||[datetime]
      IsLinear||[boolean]
      LastErrorCode||[uint32]
      LowerThresholdCritical||[sint32]
      LowerThresholdFatal||[sint32]
      LowerThresholdNonCritical||[sint32]
      MaxReadable|1270|[sint32]
      MinReadable|64266|[sint32]
      Name|Sensore numerico|[string]
      NominalReading||[sint32]
      NormalMax||[sint32]
      NormalMin||[sint32]
      PNPDeviceID||[string]
      PowerManagementCapabilities||[uint16]
      PowerManagementSupported||[boolean]
      Resolution|1000|[uint32]
      Status|Unknown|[string]
      StatusInfo||[uint16]
      SystemCreationClassName|Win32_ComputerSystem|[string]
      SystemName|DESKTOP-25LFPVU|[string]
      Tolerance|32768|[sint32]
      UpperThresholdCritical||[sint32]
      UpperThresholdFatal||[sint32]
      UpperThresholdNonCritical||[sint32]
       
      Could anyone please help me out?
      Thanks in advance
      Francesco
    • TheDcoder
      By TheDcoder
      Hello, I recently opened a bug report without reading the Helpfile... My bad. After @Melba23's gentle reminder, I was curious about why it was like that.
      It is about SetError's behaviour. This is the example from the bug report:
      Example()
      If @error Then
          ConsoleWrite("Error" & @CRLF)
      Else
          ConsoleWrite("No Error" & @CRLF)
      EndIf

      Func Example()
          SetError(1)
          Sleep(1000)
      EndFunc

      What I tried to do is set Example's (my user-defined function's) @error value to 1, but the value set by SetError is cleared after calling a function. I wonder why. Why should calling an external function affect my function's @error, which is set when my function returns?
      Setting the error of a UDF in advance by using SetError makes sense, but I cannot find a reason why calling a function should clear it. Please note that I am not talking about @error, I am talking about the @error set by my function when it ends/returns!
      I hope someone can enlighten me, thanks for the answers in advance!
      P.S. I tried my best to explain, but my English is not very good and I don't feel I did a good job explaining today, so please pardon any mistakes that I have made.
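      For comparison, here is a minimal sketch of the two orderings. It assumes the behaviour described in the Helpfile, namely that @error is reset to 0 whenever a function is entered; the names Bad() and Good() are hypothetical and only used for illustration.

      ```autoit
      ; Sketch only: Bad() and Good() are hypothetical names, not part of any UDF.
      Bad()
      ConsoleWrite("After Bad():  @error = " & @error & @CRLF)

      Good()
      ConsoleWrite("After Good(): @error = " & @error & @CRLF)

      Func Bad()
          SetError(1) ; @error is 1 at this point...
          Sleep(10)   ; ...but it is reset as soon as another function is entered
      EndFunc

      Func Good()
          Sleep(10)
          ; Set @error as the very last step, together with the return value.
          Return SetError(1, 0, 0)
      EndFunc
      ```

      With this ordering nothing is called between SetError and the return, so the caller should see the value that was set.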
    • pboom
      By pboom
      I am looking for a way to retrieve filtered messages from the 'system debug channel', also known as 'kernel-mode debug output'.

      AutoIt must do the capture in real time. The following AutoIt UDF almost does what is required, but it only captures application-level (Win32) debug output.

      https://www.autoitscript.com/forum/topic/82889-capture-debug-information-udf/#comment-593268

      The DebugView utility by Sysinternals captures the information as required when Capture Kernel is turned on and, in my case, the following Filter include is used:

      *Incoming connection*

      The use of DebugView to do this is covered in the following tech note:

      https://www.tacticalsoftware.com/support/tech-notes/logging-com-port-activity.htm

      https://technet.microsoft.com/en-us/sysinternals/debugview.aspx

      However, making the information from DebugView available to my AutoIt script required DebugView to capture to a text file and my AutoIt script to monitor that file for changes. The use of DebugView to capture the system debug channel could be made to work, but it was less than reliable and difficult to get started. The startup wasn't something that could be easily automated, not even with AutoIt.

      If you understood what I am talking about and made it this far, I think an explanation of the application is in order. There are lots of details here; sorry, I am trying to answer questions in advance.

      I support a large installation of General Electric MUSE application. MUSE is a Windows-based medical application that processes and archives ECGs (electrocardiograms) taken on dedicated hardware (ECG Carts). Several methods exist on the cart to get the ECG from the Carts to the MUSE system; they range from floppies (on old obsolete hardware), memory cards, RS232 serial ports, and hardwired network connections.

      In our installation, we choose not to use the vendor-supplied network solution due to a variety of reasons I won’t get into here.  Instead, we have designed our own connection solution.

      We use a wireless serial server mounted on the ECG carts connecting to a server running a Serial/IP COM Port Redirector. The ECG cart and MUSE application think they are talking to each other via an RS232 port and, as far as they are concerned, they are. However, this RS232 cable happens to run through our province-wide (think state-wide) Health Care WAN. The hardware and software used can be seen on these two sites:

      http://www.bb-elec.com/Products/Wireless-Cellular/AirborneM2M-802-11-a-b-g-n-Dual-Band-Wireless/AirborneM2M-Industrial-Dual-Band-Wi-Fi-Router-Brid.aspx

      https://www.tacticalsoftware.com/virtual-serial-port-redirector/serial-ip.htm

      This setup works well; we have over 130 ECG carts connecting this way. However, the end users are not technical, and there is a lot that can go wrong with wireless connections. So we do get complaints, often after the fact, that the ECG cart would not connect. A log of which ECG carts connected and when would be very helpful.

      The serial redirector software can be configured to log all activity to the kernel-mode debug output, the serial redirector itself being kernel-level software. For configuration of the wireless modules, we have custom-written software (written in AutoIt) that, amongst other things, can display relevant configuration information for a wireless module given its IP address.

      By extracting messages like the ones below from the kernel-mode debug channel:

      COM56 : ½ Incoming connection from 10.158.188.172:51562

      COM18 : ½ Incoming connection from 10.158.188.200:50896

      COM19 : ½ Incoming connection from 10.158.188.180:59074

      COM68 : ½ Incoming connection from 142.239.15.82:34322

      We can have the module configuration program retrieve the configuration. The retrieved configuration contains more information such as the module ID number and wireless signal strength. This information is then logged to a file which is later loaded into a database. We can then query the database for connections made by a particular module within a specified time frame. The results of these Queries help us determine if the module was connected or is having problems connecting. Problems are usually indicated by poor signal strength and frequent re-connecting.
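
      As a rough sketch of the extraction step only (not the capture itself), a line like the ones above could be split into COM port, IP address and TCP port with a regular expression. The function name _ParseIncoming is hypothetical, and the pattern assumes the messages always follow the format shown above.

      ```autoit
      #include <StringConstants.au3>

      ; Hypothetical helper: returns an array [COM port, IP address, TCP port],
      ; or sets @error to 1 if the line is not an "Incoming connection" message.
      Func _ParseIncoming($sLine)
          Local $sPattern = "(COM\d+)\s*:.*Incoming connection from (\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
          Local $aMatch = StringRegExp($sLine, $sPattern, $STR_REGEXPARRAYMATCH)
          If @error Then Return SetError(1, 0, 0)
          Return $aMatch
      EndFunc

      ; Usage with one of the sample messages from above.
      Local $aInfo = _ParseIncoming("COM56 : ½ Incoming connection from 10.158.188.172:51562")
      If Not @error Then
          ConsoleWrite($aInfo[0] & " <- " & $aInfo[1] & ":" & $aInfo[2] & @CRLF)
      EndIf
      ```

      The IP address in the second element is what the module configuration program would then use to retrieve the module's configuration.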

      So what I am looking for is a way for our module configuration program (written in AutoIt) to retrieve filtered kernel-level debug messages directly, without using the DebugView application.

      The forum post linked at the start of this message includes the source code for the DLL. So if you are versed in these matters and Visual Studio, this may be an easy task. I looked at what needed to be done, but I was in way over my head. If you look up the price of the Serial/IP redirector software, you can see that there is some money in our project for such things; however, I do have a spending limit for purchases such as this.