HTML Parser UDF

Decipher

Hi,

I'm inviting all AutoIt forum members to contribute to an HTML parser UDF. I'm going to attempt to replicate a Python module called BeautifulSoup. It would be greatly appreciated if some senior AutoIt programmers took an interest in this topic. There is no template other than the module written in Python located here and the documentation here.

I can't wait to see what this develops into. :robot:


jdelaney

I don't see the need: there is the _IE UDF, and with my link below you can focus on any node(s) through an XPath.
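
For readers unfamiliar with the _IE UDF, extraction with it might look roughly like the sketch below. It uses only functions from the standard IE.au3 include, not the IEbyXPATH helper, whose API is not shown in this thread. The URL and the choice of collecting `<a>` tags are assumptions made for illustration.

```autoit
#include <IE.au3>

; Assumption: the URL and the tag name are illustrative only.
; Open an invisible IE window (no attach attempt, not visible) and wait for the page.
Local $oIE = _IECreate("http://www.autoitscript.com", 0, 0)
If @error Then Exit 1

; Collect every <a> element and print its text and target.
Local $oLinks = _IETagNameGetCollection($oIE, "a")
For $oLink In $oLinks
    ConsoleWrite($oLink.innerText & " -> " & $oLink.href & @CRLF)
Next

_IEQuit($oIE)
```

This loads the full page in a real browser instance, including scripts and images, which is the overhead the later posts in this thread try to avoid.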


IEbyXPATH: grab IE DOM objects by XPath. IEscriptRecord: makings of an IE script recorder. ExcelFromXML: create Excel docs without Excel installed. GetAllWindowControls: output all control data on a given window.

Decipher

I see the need for simplicity. Would you care to give an example of how to use the functions mentioned above for data extraction?


KaFu
Decipher

I stand corrected. There are templates for parsing HTML, and in UDF format. Thank you, jdelaney and KaFu.

If anyone else has UDFs or examples they'd like to share for anyone following this path, that would be great. :)


eimhym

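I finally found a way to "mute" IE, making it unable to load external resources other than those already cached, which makes pages load faster. The code does this by pointing only the parser's embedded IE session at a local proxy address (127.0.0.1 on the port passed to _HtmlParser_Startup), so other IE instances are not affected.

The wrapper consists of these simple functions: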

_HtmlParser_Startup

_HtmlParser_GetDocument ; the IHtmlDocument ref

_HtmlParser_LoadHtml ; load plain text, optionally remove all script tags

_HtmlParser_LoadUrl

_HtmlParser_LoadScript ; to use jquery, xpath, etc.

_HtmlParser_ClearScript

_HtmlParser_Exec ; execute js and get return value, see sample

Save as "HtmlParser.au3" and run; the sample code is included. Clear the IE cache first for the best result. Hope it helps :)
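
As a quick orientation before the full listing, a calling script might use the wrapper roughly as sketched below. This assumes the listing is saved as "HtmlParser.au3" next to the caller; the inline HTML and the JavaScript snippets are made up for illustration.

```autoit
#include "HtmlParser.au3"

; Call _HtmlParser_Startup() first: the UDF re-launches this same script as a
; hidden daemon process, and that copy stops inside this call.
If Not _HtmlParser_Startup() Then Exit 1

; Assumption: illustrative markup only.
_HtmlParser_LoadHtml("<html><body><h1 id='t'>Hello</h1><p>one</p><p>two</p></body></html>")

; The script body is wrapped in a function, so it must use "return".
Local $sHeading = _HtmlParser_Exec('return document.getElementById("t").innerText')
If Not @error Then ConsoleWrite("Heading: " & $sHeading & @CRLF)

Local $iCount = _HtmlParser_Exec('return document.getElementsByTagName("p").length')
If Not @error Then ConsoleWrite("Paragraphs: " & $iCount & @CRLF)
```

Note that _HtmlParser_Exec as posted sets @error to 2 when the JavaScript result is falsy (0, an empty string, or false), so a legitimate zero count would also be reported as an error.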

#include-once
#include <WinAPI.au3>
#include <WindowsConstants.au3>
#include <GuiConstantsEx.au3>
#include <IE.au3>

; Opt("MustDeclareVars", 1)

Global $_HtmlParser_Debug = False
Global $_HtmlParser_Script = ""

Global Const $_HtmlParser_ScriptName = "HtmlParser.au3"
Global Const $tagINTERNET_PROXY_INFO = "dword dwAccessType; ptr lpszProxy; ptr lpszProxyBypass";
Global Const $tagCOPYDATA = _
  "ULONG_PTR;" & _  ; dwData, The data to be passed to the receiving application
  "DWORD;" & _    ; cbData, The size, in bytes, of the data pointed to by the lpData member
  "PTR"          ; lpData, The data to be passed to the receiving application. This member can be NULL.

Func _HtmlParser_Startup ($iPort=843)
    If IsDeclared("_HtmlParser_IE") AND IsObj($_HtmlParser_IE) Then Return 1

    Global $_HtmlParser_Port = $iPort
    __HtmlParser_ParseCmdLine()
    
    Global $_HtmlParser_HWND = WinWait($_HtmlParser_GUID, "", 5)
    If NOT $_HtmlParser_HWND Then Return SetError(1, 0, 0) ; Daemon failed to start
    WinSetTitle($_HtmlParser_HWND, "", "")
    
    ; Attach IE
    Global $_HtmlParser_IE = _IEAttach(WinGetHandle($_HtmlParser_GUID), "embedded")
    If NOT IsObj(_HtmlParser_GetDocument()) Then Return SetError(2, 0, 0)
    
    OnAutoItExitRegister("__HtmlParser_Shutdown")
    Return 1
EndFunc

Func _HtmlParser_GetDocument ()
    If IsObj($_HtmlParser_IE) Then Return _IEDocGetObj($_HtmlParser_IE)
    Return SetError(1, 0, 0)
EndFunc

Func _HtmlParser_LoadHtml ( $sHtml, $fRemoveScriptTags=0 )
    _IENavigate($_HtmlParser_IE, "about:blank")
    Local $doc = _HtmlParser_GetDocument()
    If $fRemoveScriptTags Then $sHtml = StringRegExpReplace($sHtml, "<script\b[^<]*(?:(?!<\/script>)<[^<]*)*<\/script>", "")
    $doc.Write($sHtml & @CRLF & '<script language="javascript">' & @CRLF & $_HtmlParser_Script & @CRLF & 'Array.prototype.set=function(i,v){this[i]=v};Array.prototype.get=function(i){return this[i]};document.scripts[document.scripts.length-1].removeNode(false)</script>')
    Return $doc
EndFunc

Func _HtmlParser_LoadUrl ( $sUrl, $fRemoveScriptTags=0 )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Return _HtmlParser_LoadHtml($http.Responsetext, $fRemoveScriptTags)
EndFunc

Func _HtmlParser_LoadScript ( $sFilename )
    Local $script = ""
    If FileExists ($sFilename) Then
        $script = FileRead($sFilename)
    Else
        If StringInStr($sFilename, "http://") OR StringInStr($sFilename, "https://") OR StringInStr($sFilename, "ftp://") Then
            Local $http = ObjCreate("winhttp.winhttprequest.5.1")
            $http.Open("GET", $sFilename)
            $http.Send()
            $script = $http.Responsetext
        EndIf
    EndIf
    If $script Then
        $_HtmlParser_Script &= @CRLF & @CRLF & $script
        Return 1
    EndIf
    Return 0
EndFunc

Func _HtmlParser_ClearScript () ; takes no arguments; drops everything queued by _HtmlParser_LoadScript
    $_HtmlParser_Script = ""
EndFunc

Func _HtmlParser_Exec ( $sScript )
    Local $doc = _HtmlParser_GetDocument()
    If IsObj($doc) AND IsObj($doc.parentWindow) Then
        Local $window = $doc.parentWindow
        $window.execScript('window._HtmlParser_Result=(function(){' & $sScript & '})();', 'Javascript')
        If $window._HtmlParser_Result OR IsObj($window._HtmlParser_Result) Then
            Return $window._HtmlParser_Result
        Else
            Return SetError(2, 0, 0)
        EndIf
    EndIf
    Return SetError(1, 0, 0)
EndFunc

#region >> Internals

Func __HtmlParser_Shutdown ()
    Global $_HtmlParser_IE = 0
    If IsDeclared("_HtmlParser_HWND") Then WinClose($_HtmlParser_HWND)
    OnAutoItExitUnregister("__HtmlParser_Shutdown")
EndFunc

Func __HtmlParser_ParseCmdLine ()
    If @compiled Then
      Global Const $_HtmlParser_Exec = '"' & @ScriptFullPath & '"'
    Else
      Global Const $_HtmlParser_Exec = '"' & @AutoItExe & '" "' & @ScriptFullPath & '"'
    EndIf

    Local $cmd = StringInStr($CmdLineRaw, "/_HtmlParser_", 1), $val
    If $cmd > 0 Then
        #NoTrayIcon
        $cmd = StringRegExp(StringMid($CmdLineRaw, $cmd), '/(_HtmlParser_[a-zA-Z]+)\:"([^"]*)"', 3)
        For $i = 0 To UBound($cmd)-1 Step 2
            $val = $cmd[$i+1]
            If $val = ("" & Number($val)) Then $val = Number($val)
            Assign($cmd[$i], $val, 2)
        Next
        __HtmlParser_DaemonRun()
        Exit
    Else
        Global $_HtmlParser_GUID = __WinAPI_CreateGUID()
        __HtmlParser_Daemonstart()
    EndIf
EndFunc

Func __HtmlParser_Daemonstart ()
    Run ( $_HtmlParser_Exec & _
        ' /_HtmlParser_GUID:"' & $_HtmlParser_GUID & '"' & _
        ' /_HtmlParser_Debug:"' & (0 + $_HtmlParser_Debug) & '"' & _
        ' /_HtmlParser_Port:"' & $_HtmlParser_Port & '"' )
EndFunc

Func __HtmlParser_DaemonRun ()
    ; Initialize GUI
    Local $hGUI = GUICreate("", 500, 400, 10, 10, $WS_SIZEBOX, $WS_EX_TOOLWINDOW)
    Global $_HtmlParser_IE = _IECreateEmbedded()
    Local $hIE = GUICtrlCreateObj($_HtmlParser_IE, 0, 0, _WinAPI_GetClientWidth($hGUI), _WinAPI_GetClientHeight($hGUI))
    GUICtrlSetResizing($hIE, $GUI_DOCKAUTO)
    GUIRegisterMsg($WM_SYSCOMMAND, "__HtmlParser_SysCommand")
    If $_HtmlParser_Debug Then GUISetState()
    
    ; Initialize IE
    _IE_SetSessionProxy("127.0.0.1:" & $_HtmlParser_Port)
    _IENavigate($_HtmlParser_IE, "about:blank")
    _IEPropertySet($_HtmlParser_IE, "silent", True)
    WinSetTitle($hGUI, "", $_HtmlParser_GUID)
    
    Local $timeout = TimerInit()
    Do
        If TimerDiff($timeout) > 5000 Then Exit
    Until WinGetTitle($hGUI) <> $_HtmlParser_GUID
    If $_HtmlParser_Debug Then WinSetTitle($hGUI, "", "HtmlParser Debug Window")
    
    While 1
        Sleep(100)
    WEnd
EndFunc

Func __HtmlParser_SysCommand ($hWnd, $Msg, $wParam, $lParam)
    #forceref $Msg, $wParam, $lParam
    If BitAND($wParam, 0xFFF0) = 0xF060 Then Exit
    Return $GUI_RUNDEFMSG
EndFunc

#endregion << Internals

#region >> WinAPIEx

Func __WinAPI_CreateGUID()

    Local $tGUID, $Ret

    $tGUID = DllStructCreate($tagGUID)
    $Ret = DllCall('ole32.dll', 'uint', 'CoCreateGuid', 'ptr', DllStructGetPtr($tGUID))
    If @error Then
        Return SetError(1, 0, '')
    Else
        If $Ret[0] Then
            Return SetError(1, $Ret[0], 0)
        EndIf
    EndIf
    $Ret = DllCall('ole32.dll', 'int', 'StringFromGUID2', 'ptr', DllStructGetPtr($tGUID), 'wstr', '', 'int', 39)
    If (@error) Or (Not $Ret[0]) Then
        Return SetError(1, 0, '')
    EndIf
    Return $Ret[2]
EndFunc   ;==>__WinAPI_CreateGUID

Func _IE_SetSessionProxy ($sProxyAddress, $sBypassList="")
    Local $tProxyAddress = DllStructCreate("char[" & StringLen($sProxyAddress) + 1 & "]"), _
          $tBypassList = DllStructCreate("char[" & StringLen($sBypassList) + 1 & "]"), _
          $tINTERNET_PROXY_INFO = DllStructCreate($tagINTERNET_PROXY_INFO)
          
    DllStructSetData($tINTERNET_PROXY_INFO, "dwAccessType", 0x3)
    DllStructSetData($tProxyAddress, 1, $sProxyAddress)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxy", DllStructGetPtr($tProxyAddress))
    DllStructSetData($tBypassList, 1, $sBypassList)
    DllStructSetData($tINTERNET_PROXY_INFO, "lpszProxyBypass", DllStructGetPtr($tBypassList))
    
    Local $aRet = DllCall("urlmon.dll", "INT", "UrlMkSetSessionOption", _
                            "uint", 0x26, _
                            "ptr", DllStructGetPtr($tINTERNET_PROXY_INFO), _
                            "int", DllStructGetSize($tINTERNET_PROXY_INFO), _
                            "int", 0 )
    If @error OR $aRet[0] Then Return SetError(1, @error, 0)
    
    Return 1
EndFunc

#endregion << WinAPIEx

#region >> Debugging

Func _Alert ($msg, $fDialog=1)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

; ==============================================================================

Func _HtmlParser_Test ()
    $_HtmlParser_Debug = True
    
    _Critical( _HtmlParser_Startup() )
    
    ; warming up
    Local $doc = _HtmlParser_GetDocument()
    $doc.write("Hello AutoIt World")
    _Alert("now for real")
    
    _HtmlParser_LoadScript("https://ajax.googleapis.com/ajax/libs/jquery/1.8.3/jquery.min.js")
    ; Or:
    ; _HtmlParser_LoadScript(@ScriptDir & "\jquery.min.js")
    $doc = _HtmlParser_LoadUrl("http://www.autoitscript.com", True) ; reuse $doc declared above
    Local $pages = _Critical( _HtmlParser_Exec('var p=[];$("p").each(function(index) { p.push("[" + (index+1) + "] " + $(this).text());});return p') )
    Local $s = ""
    For $i = 0 To $pages.length-1
        $s &= $pages.get($i) & @CRLF
    Next
    _Alert($s)
    
    Local $saved = $doc.body.parentElement.outerHTML
    Local $divs = _HtmlParser_Exec('return $(".featitem.clearfix")')
    Local $div, $features = ""
    For $i = 0 To $divs.length-1
        $div = $divs.get($i)
        $features &= $div.outerHTML
    Next
    $doc.body.innerHTML = $features
    _Alert("A lot faster parsing in javascript though")
    
    _HtmlParser_LoadHtml($saved, True)
    $features = _HtmlParser_Exec('var s=""; $(".featitem.clearfix").each(function(){ s += this.outerHTML }); document.body.innerHTML=s; return s') ; reuse $features declared above
    _Alert($features)
    
EndFunc
If @ScriptName = $_HtmlParser_ScriptName Then _HtmlParser_Test()

Edit: added $fRemoveScriptTags option for _HtmlParser_LoadUrl and _HtmlParser_LoadHtml

Edited by eimhym

eimhym

Here's another, IE-less example. It needs libtidy (attached) to clean pages and pass the well-formed HTML into the MSXML ActiveX control (see the MSDN documentation here).

Pros: 1) Lightweight, lightning fast. 2) XPath available by default. 3) Script tags won't be executed; CSS and images won't be loaded. 4) No problem deleting SCRIPT tags.

Cons: 1) It does fail with some pages; always check @error after calling _HXmlParser_LoadUrl or _HXmlParser_LoadHtml. 2) libtidy crashes on HTML5 pages; you have to reload the DLL. 3) It doesn't handle HTML tags within a textarea correctly; suggestions for a workaround are welcome. 4) Can't use a JS framework.

The sample code does the same as HtmlParser, for comparison.
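
To show the MSXML half of this approach in isolation, here is a minimal sketch that skips libtidy entirely. It assumes the markup is already well-formed; the sample document and the XPath expression are illustrative only.

```autoit
; Assumption: the input is already well-formed, so no libtidy cleanup is needed.
Local $oDom = ObjCreate("MSXML2.DOMDocument")
If Not IsObj($oDom) Then Exit 1

$oDom.validateOnParse = False
$oDom.resolveExternals = False
; Select XPath explicitly so that functions such as contains() are available.
$oDom.setProperty("SelectionLanguage", "XPath")

Local $sXml = '<html><body>' & _
        '<div class="featitem clearfix"><p>First</p></div>' & _
        '<div class="other"><p>Second</p></div>' & _
        '</body></html>'

; loadXML returns False on failure; parseError then describes the problem.
If Not $oDom.loadXML($sXml) Then
    ConsoleWrite("Parse error: " & $oDom.parseError.reason & @CRLF)
    Exit 2
EndIf

; Print the text of every <p> inside a div whose class contains "featitem".
Local $oNodes = $oDom.selectNodes("//div[contains(@class, 'featitem')]/p")
For $i = 0 To $oNodes.length - 1
    ConsoleWrite($oNodes.item($i).text & @CRLF)
Next
```

Real pages are rarely well-formed XML, which is why the listing below runs them through libtidy first and declares the HTML entities in an inline DTD.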

#include-once
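#include <WinAPI.au3> ; _Critical() below calls _WinAPI_GetLastError and _WinAPI_GetLastErrorMessage
#include "libtidy.au3"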

; Opt("MustDeclareVars", 1)

Global Const $_HXmlParser_ScriptName = "HXmlParser.au3"
Global $_HXmlParser_DOM = 0

Func _HXmlParser_Startup ( $sConfFilename="tidy-xml-settings.cfg" )
    _LibTidy_Startup()
    If @error Then Return SetError(@error, @extended, 0)
    _LibTidy_LoadConfig($sConfFilename)
    If @error Then
        _LibTidy_Shutdown()
        Return SetError(@error, @extended, 0)
    EndIf
    $_HXmlParser_DOM = ObjCreate("MSXML2.DOMDocument")
    OnAutoItExitRegister("__HXmlParser_Shutdown")
    $_HXmlParser_DOM.validateOnParse = False;
    $_HXmlParser_DOM.resolveExternals = False;
    Return 1
EndFunc

Func __HXmlParser_Shutdown ()
    $_HXmlParser_DOM = 0
    OnAutoItExitUnRegister("__HXmlParser_Shutdown")
EndFunc

Func _HXmlParser_GetErrorString ()
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return "Error loading page " & _
                " (" & Hex($_HXmlParser_DOM.parseError.errorCode) & _
                ") at line: " & $_HXmlParser_DOM.parseError.line & _
                ", position: " & $_HXmlParser_DOM.parseError.linepos & _
                ", reason: " & $_HXmlParser_DOM.parseError.reason
    EndIf
    Return 0
EndFunc

Func _HXmlParser_LoadHtml ( $sHtml )
    If NOT $sHtml Then Return SetError(4, 0, 0)
    _LibTidy_LoadString( $sHtml )
    If @error Then Return SetError(@error, @extended, 0)
    $sHtml = _LibTidy_CleanAndRepair()
    If @error OR NOT $sHtml Then Return SetError(5, @error, 0)
    
    $sHtml = StringRegExp(StringMid($sHtml, StringInStr($sHtml, "<html")), "(?s)^<html[^>]*>[^<]*(<.*)", 1)
    If @error Then Return SetError(6, @error, 0)

    $sHtml = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd" [' _
        & '<!ENTITY nbsp " "><!ENTITY iexcl "¡"><!ENTITY cent "¢"><!ENTITY pound "£"><!ENTITY curren "¤"><!ENTITY yen "¥"><!ENTITY brvbar "¦"><!ENTITY sect "§"><!ENTITY uml "¨"><!ENTITY copy "©"><!ENTITY ordf "ª"><!ENTITY laquo "«"><!ENTITY not "¬"><!ENTITY shy "­"><!ENTITY reg "®"><!ENTITY macr "¯"><!ENTITY deg "°"><!ENTITY plusmn "±"><!ENTITY sup2 "²"><!ENTITY sup3 "³"><!ENTITY acute "´"><!ENTITY micro "µ"><!ENTITY para "¶"><!ENTITY middot "·"><!ENTITY cedil "¸"><!ENTITY sup1 "¹"><!ENTITY ordm "º"><!ENTITY raquo "»"><!ENTITY frac14 "¼"><!ENTITY frac12 "½"><!ENTITY frac34 "¾"><!ENTITY iquest "¿"><!ENTITY times "×"><!ENTITY divide "÷">' _
        & '<!ENTITY Agrave "À"><!ENTITY Aacute "Á"><!ENTITY Acirc "Â"><!ENTITY Atilde "Ã"><!ENTITY Auml "Ä"><!ENTITY Aring "Å"><!ENTITY AElig "Æ"><!ENTITY Ccedil "Ç"><!ENTITY Egrave "È"><!ENTITY Eacute "É"><!ENTITY Ecirc "Ê"><!ENTITY Euml "Ë"><!ENTITY Igrave "Ì"><!ENTITY Iacute "Í"><!ENTITY Icirc "Î"><!ENTITY Iuml "Ï"><!ENTITY ETH "Ð"><!ENTITY Ntilde "Ñ"><!ENTITY Ograve "Ò"><!ENTITY Oacute "Ó"><!ENTITY Ocirc "Ô"><!ENTITY Otilde "Õ"><!ENTITY Ouml "Ö"><!ENTITY Oslash "Ø"><!ENTITY Ugrave "Ù"><!ENTITY Uacute "Ú"><!ENTITY Ucirc "Û"><!ENTITY Uuml "Ü"><!ENTITY Yacute "Ý"><!ENTITY THORN "Þ"><!ENTITY szlig "ß"><!ENTITY agrave "à"><!ENTITY aacute "á"><!ENTITY acirc "â"><!ENTITY atilde "ã"><!ENTITY auml "ä"><!ENTITY aring "å"><!ENTITY aelig "æ"><!ENTITY ccedil "ç"><!ENTITY egrave "è"><!ENTITY eacute "é"><!ENTITY ecirc "ê"><!ENTITY euml "ë"><!ENTITY igrave "ì"><!ENTITY iacute "í"><!ENTITY icirc "î"><!ENTITY iuml "ï"><!ENTITY eth "ð"><!ENTITY ntilde "ñ"><!ENTITY ograve "ò"><!ENTITY oacute "ó"><!ENTITY ocirc "ô"><!ENTITY otilde "õ"><!ENTITY ouml "ö"><!ENTITY oslash "ø"><!ENTITY ugrave "ù"><!ENTITY uacute "ú"><!ENTITY ucirc "û"><!ENTITY uuml "ü"><!ENTITY yacute "ý"><!ENTITY thorn "þ"><!ENTITY yuml "ÿ">' _
        & ']>' & @CRLF _
        & '<html>' & @CRLF _
        & $sHtml[0]

    $_HXmlParser_DOM.loadXML($sHtml)
    $_HXmlParser_DOM.setProperty("SelectionLanguage", "XPath")

    ; Report a parse failure through @error, but still hand back the DOM object
    If IsObj($_HXmlParser_DOM.parseError) AND $_HXmlParser_DOM.parseError.errorCode Then
        Return SetError(7, $_HXmlParser_DOM.parseError.errorCode, $_HXmlParser_DOM)
    EndIf

    Return $_HXmlParser_DOM
EndFunc

Func _HXmlParser_LoadUrl ( $sUrl )
    Local $http = ObjCreate("winhttp.winhttprequest.5.1")
    $http.Open("GET", $sUrl)
    $http.Send()
    Local $ret = _HXmlParser_LoadHtml($http.Responsetext)
    If @error Then Return SetError(@error, @extended, $ret)
    Return $ret
EndFunc


#region >> Debugging

Func _Alert ($msg, $fDialog=1, $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $fDialog Then
    MsgBox(0, @ScriptName, $msg)
  Else
    TrayTip(@ScriptName, $msg, 5000)
  EndIf
  If $err Then Return SetError($err, $ext, $ln)
  Return 0
EndFunc

Func _Critical ($ret, $rel=0, $msg="Fatal Error", $err=@error, $ext=@extended, $ln = @ScriptLineNumber)
  If $err Then
    $ln += $rel
    Local $LastError = _WinAPI_GetLastError(), _
          $LastErrorMsg = _WinAPI_GetLastErrorMessage(), _
          $LastErrorHex = Hex($LastError)
    $LastErrorHex = "0x" & StringMid($LastErrorHex, StringInStr($LastErrorHex, "0", 1, -1)+1)
    $msg &= @CRLF & "at line " & $ln & @CRLF & @CRLF & "AutoIt Error: " & $err & " (0x" & Hex($err)  & ") Extended: " & $ext
    If $LastError Then $msg &= @CRLF & "WinAPI Error: " & $LastError & " (" & $LastErrorHex & ")" & @CRLF & $LastErrorMsg
    $msg &= @CRLF & @CRLF & _HXmlParser_GetErrorString()
    ClipPut($msg)
    MsgBox(270352, "Fatal Error - " & @ScriptName, $msg)
    Exit
  EndIf
  Return $ret
EndFunc

#endregion << Debugging

Func _HXmlParser_Test ()
    
    _Critical( _HXmlParser_Startup() )
    Local $dom = _Critical( _HXmlParser_LoadUrl("http://www.AutoItScript.com") )
   
    _Alert("Removing all SCRIPT tags")
    Local $begin = TimerInit()
    Local $nodes = $dom.selectNodes("//script")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $node.parentNode.removeChild($node)
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    
    _Alert("Collecting all P tags")
    Local $count = 1, $s = ""
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//p")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= "[" & $count & "] " & $node.text & @CRLF
            $count += 1
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)

    $s = ""
    _Alert("Collecting all feature DIVs")
    $begin = TimerInit()
    $nodes = $dom.selectNodes("//div[contains(@class, 'featitem')]")
    If $nodes.length Then
        For $i = 0 To $nodes.length - 1
            Local $node = $nodes.item($i)
            $s &= $node.xml & @CRLF
        Next
    EndIf
    _Alert("Done in " & TimerDiff($begin) & " ms")
    _Alert($s)
    
    ClipPut($dom.xml)
    _Alert("HTML content is in clipboard")
    
EndFunc
If @ScriptName = $_HXmlParser_ScriptName Then _HXmlParser_Test()

libtidy.7z

Edited by eimhym

level20peon

I get the error "libtidy.au3(106,68) : ERROR: $tidyLoadConfig: undeclared global variable."

What am I doing wrong here?

Edited by level20peon


Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Similar Content

    • kcvinu
      By kcvinu
      Hi all,
      I am playing with _GUICtrlButton_Create function. How can i change this button's (or the entire form's) font ?. The in-built GUICtrlSetFont function is not working even when i convert the control handle to control ID with _WinAPI_GetDlgCtrlID ( ) function.  Do i need to use CreateFont api finction and send WM_SETFONT message ? Or is there any other easy and safe ways to do this ?. Thanks in advance.
      Note : This window is created by CreateWindowEx function, not by GUICreate function. 
    • okolaris
      By okolaris
      Hey everyone,
      I thought I might share my little Language UDF plus the more powerful SciTE Tool to ship Strings from SciTE into the xml file. While I haven't had the time to fully adapt the small UDF to one of the big XML-UDFs the so called "Language Transmitter" that basically writes the XML file for you runs mostly on "XML DOM wrapper (COM)" by eltorro. The Transmitter should work with other XML-Language-UDFs depending on their encoding.
       
      First let's start with the UDF, there are two functions of interest: _LangInit($sFilePath) and s($sString) As you will have guessed, _LangInit($sFilePath) is called once to initialize the UDF and s($sString) is used to receive the string to your key. Plain and simple.
      Now to the actual "new" part, the Language Transmitter. It basically allows you to transfer a selected String from SciTE into a xml file. While doing so it will scan for AutoIt variables, macros etc. and parse the string to fit StringFormat(). It then saves the formatted string in the xml file and returns the formatted call into SciTE. If the selected string is already defined it will directly parse the key into SciTE. To change the default output file, you can either edit the ini-file in the @ScriptDir of the LanguageTransmitter.exe or press Alt+A on empty space again and keep clicking cancel/no until the Transmitter let's you select the current output file. Standard output is strings.xml in the current opened AutoIt Script.
      Example:
      ; given the line: MsgBox(16, 'Error', 'Error message') ; select 'Error' run the Transmitter follow the instructions, repeat with 'Error message'. Outcome (e.g.): MsgBox(16, s('Error'), s('Error_msg')) ; Variables and Macro example: $sString = "Value: " & $iValue & @CRLF & 'Another value: '& $iValue2 ; Select the full string including all AutoIt variables and macros etc. in SciTE and press Alt+A (default) to run the LanguageTransmitter ; follow the instructions and it will then paste a formatted string like that into SciTE: $sVar = StringFormat(s('Key'), $iValue, $iValue2) ; the correlating xml entry should look like that: ; <string name="Key">Value: %s\r\nAnother value: %s</string> ; as you can see @CRLF has been replaced with \r\n as well. If you are working on a project and want to directly add a string to the xml file just start the Language Transmitter without selecting any text, enter your string and a key.
      SetUp/Installation
      Examples
      Since xml files are required all examples can be found in the zip file. There are two examples, one includes a language selection interface.
      Language File Checker
      I added a script to check whether the xml file contains all required strings or even unnecessary strings.
       
      Hope you like my little helper!
       
      UDF - LanguageSupport.zip
    • natedog102
      By natedog102
      Hi everyone. I want to format the output of _INetGetSource to look nice and pretty. 
      Example google.com source output: 
      <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:'DJtTWvCOI6WGjwSE9JrICg',kEXPI:'18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,3700304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,4132956,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,19000423,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,1900469
2,41317155',authuser:0,kscs:'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',u:'c9c918f0',kGL:'US'};google.kHL='en';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.wl=function(a,b){try{google.ml(Error(a),!1,b)}catch(d){}};google.time=function(){return(new Date).getTime()};google.log=function(a,b,d,c,g){if(a=google.logUrl(a,b,d,c,g)){b=new Image;var e=google.lc,f=google.li;e[f]=b;b.onerror=b.onload=b.onabort=function(){delete e[f]};google.vel&&google.vel.lu&&google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,d,c,g){var e="",f=google.ls||"";d||-1!=b.search("&ei=")||(e="&ei="+google.getEI(c),-1==b.search("&lei=")&&(c=google.getLEI(c))&&(e+="&lei="+c));c="";!d&&google.cshid&&-1==b.search("&cshid=")&&(c="&cshid="+google.cshid);a=d||"/"+(g||"gen_204")+"?atyp=i&ct="+a+"&cad="+b+e+f+"&zx="+google.time()+c;/^http:/i.test(a)&&google.https()&&(google.ml(Error("a"),!1,{src:a,glmm:1}),a="");return a};}).call(this);(function(){google.y={};google.x=function(a,b){if(a)var c=a.id;else{do c=Math.random();while(google.y[c])}google.y[c]=[a,b];return!1};google.lm=[];google.plm=function(a){google.lm.push.apply(google.lm,a)};google.lq=[];google.load=function(a,b,c){google.lq.push([[a],b,c])};google.loadAll=function(a,b){google.lq.push([a,b])};}).call(this);google.f={};var a=window.location,b=a.href.indexOf("#");if(0<=b){var c=a.href.substring(b+1);/(^|&)q=/.test(c)&&-1==c.indexOf("#")&&a.replace("/search?"+c.replace(/(^|&)fp=[^&]*/g,"")+"&cad=h")};</script><style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid 
#c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important} But I want it outputted like this:
      <!doctype html> <html itemscope="" itemtype="http://schema.org/WebPage" lang="en"> <head> <meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"> <meta content="noodp" name="robots"> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> <meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"> <title>Google</title> <script> (function() { window.google = { kEI: 'DJtsdfgWGjwSE9JrICg', kEXPI: '18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,37sdfg0304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,413sdfg56,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,190sdfg23,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004
668,19004670,19004692,41317155', authuser: 0, kscs: 'c9c918f0_DJtTWvCOI6WGjwSE9JrICg', u: 'c9c918f0', kGL: 'US' }; google.kHL = 'en'; })(); ....... I checked the forums and did not see any UDFs that allow for this. I see the Chilkat UDF but that only supports JSON. Any help would be greatly appreciated.
    • islandspapand
      By islandspapand
      Hi all
      i am currently trying to click on an element in a HTML Table, but just can get it to work.
      i am able to click the top of the table so it changes to sort  but just can't click on the element in the table.
      an i need to click on element to continue in the site.
      i have attached the code so far and pictures of the table  element want to click plus the source of the table.
      i am able to get data in the table with $oTable = _IETableGetCollection($oIE, 2) but not able to click on them.
       
      Help is very much appreciated
       
      #cs ----------------------------------------------------------------------------

       AutoIt Version: 3.3.14.2
       Author:         myName

       Script Function:
          Template AutoIt script.

      #ce ----------------------------------------------------------------------------

      ; Script Start - Add your code below here
      #include <IE.au3>
      #include "DOM.au3"
      #include <Array.au3>
      #include <MsgBoxConstants.au3>

      Global $oIE = _IECreate("*")
      _IELoadWait($oIE)
      Sleep(2000)
      _PageLogin($oIE)
      _PageLoadWait()
      _PageNewReq($oIE)
      _PageLoadWait()
      _InputModelInf($oIE)
      _PageLoadWait()
      Sleep(1000)

      $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr/td[.='Name Of user']", 2000)
      ;~ $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr", 2000)
      ;~ _ArrayDisplay($aTableLink,"$aTableLink")
      If IsArray($aTableLink) Then
          ConsoleWrite("Able to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF)
          For $i = 0 To UBound($aTableLink)-1
              ConsoleWrite(" OuterHTML : " & $aTableLink[$i].outerHTML & @CRLF)
              ConsoleWrite(" Parentnode : " & $aTableLink[$i].parentnode & @CRLF)
              ConsoleWrite(" Parentnode.click : " & $aTableLink[$i].parentnode.fireEvent("onclick","click") & @CRLF)
              $objClick = $aTableLink[$i].parentnode
              ;~ _IEAction($aTableLink[$i] , "focus")
              _IEAction($objClick , "focus")
              ;~ If _IEAction($aTableLink[$i], "click") Then
              If _IEAction($objClick, "click") Then
                  ConsoleWrite("Able to _IEAction($objClick, 'click')" & @CRLF)
                  _IELoadWait($oIE)
              Else
                  ConsoleWrite("Unable to _IEAction($objClick, 'click')" & @CRLF)
                  Exit 3
              EndIf
          Next
      Else
          ConsoleWrite("Unable to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF)
          Exit 2
      EndIf
      _PageLoadWait()

      Func _InputModelInf($oTmpIE)
          ; Add Var for Model & Serial in Func
          $oModelInput = _IEGetObjById($oTmpIE,"model")
          _IEAction($oModelInput,"focus")
          _IEDocInsertText($oModelInput, "*")
          $oSerialInput = _IEGetObjById($oTmpIE,"serial")
          ; Focus the serial field (the original focused $oModelInput again here)
          _IEAction($oSerialInput,"focus")
          _IEDocInsertText($oSerialInput, "*")
          $links = $oTmpIE.document.getElementsByClassName("btn btn-primary ng-scope")
          For $link In $links
              If $link.innertext = "Søg" Or $link.innertext = "Search" Then
                  $link.click()
                  ExitLoop
              EndIf
          Next
          Return True
      EndFunc

      Func _PageNewReq($oTmpIE)
          $links = $oTmpIE.document.getElementsByClassName("ng-scope k-link")
          For $link In $links
              If $link.innertext = "Send ny fejlmelding" Or $link.innertext = "Submit a New Service Request" Then
                  $link.click()
                  ExitLoop
              EndIf
          Next
          Return True
      EndFunc

      Func _PageLogin($oTmpIE)
          $oUserInput = _IEGetObjById($oTmpIE,"loginid")
          _IEDocInsertText($oUserInput, "*")
          $oPasswordInput = _IEGetObjById($oTmpIE,"password")
          _IEDocInsertText($oPasswordInput, "*")
          $links = $oTmpIE.document.getElementsByClassName("btn btn-primary login ng-scope")
          For $link In $links
              If $link.innertext = "Sign in" Then
                  $link.click()
                  ExitLoop
              EndIf
          Next
          Return True
      EndFunc

      Func _PageLoadWait()
          Local $PageLoadWait = False
          ;~ nav navbar-nav navbar-right ng-hide
          ;~ nav navbar-nav navbar-right
          $tags = $oIE.document.GetElementsByTagName("ul")
          For $tag in $tags
              $class_value = $tag.GetAttribute("class")
              If $class_value = "nav navbar-nav navbar-right" Then
                  ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage loading :) ' & @CRLF) ;### Debug Console
                  $PageLoadWait = True
                  ExitLoop
              EndIf
          Next
          Do
              Sleep(250)
              For $tag in $tags
                  $class_value = $tag.GetAttribute("class")
                  If $class_value = "nav navbar-nav navbar-right ng-hide" Then
                      ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage load finished :)'& @CRLF) ;### Debug Console
                      $PageLoadWait = False
                      ExitLoop
                  EndIf
              Next
          Until $PageLoadWait = False
      EndFunc
      Thanks in advance
       
       


    • imitto
      By imitto
      Hello all!
      I have been using AutoIt for a while and have already built some automation for a TV station's master control room with it. I made a UDF for working with PAL timecode and time with milliseconds: it converts between the two and adds or subtracts them. Feel free to use it if you need something like this.
      #Region ;**** Directives created by AutoIt3Wrapper_GUI ****
      #AutoIt3Wrapper_Res_Description=PAL Timecode Calculator UDF
      #AutoIt3Wrapper_Res_LegalCopyright=horvath.imre@gmail.com
      #EndRegion ;**** Directives created by AutoIt3Wrapper_GUI ****
      ;
      ; #FUNCTION#
      ; Name...........: _tcAdd
      ; Description....: Returns the sum of two timecodes
      ; Syntax.........: _tcAdd($fTc1, $fTc2 [, $fFormat = "P" [, $fHourFormat = 1]])
      ;
      ; Parameters.....: $fTc1        - First timecode in hh:mm:ss.ff format
      ;                  $fTc2        - Second timecode in hh:mm:ss.ff format
      ;                  $fFormat     - Time base - "P" (default): PAL (25 fps)
      ;                                             "M"          : millisecond
      ;                  $fHourFormat - Hour format, passed through to _msToTc
      ;
      ; Return value...: Sum of the two timecodes in the selected format
      Func _tcAdd($fTc1, $fTc2, $fFormat = "P", $fHourFormat = 1)
          Local $fMs1 = _tcToMs($fTc1)
          Local $fMs2 = _tcToMs($fTc2)
          Local $fSumMs = $fMs1 + $fMs2
          Return _msToTc($fSumMs, $fFormat, $fHourFormat)
      EndFunc

      ; #FUNCTION#
      ; Name...........: _tcSub
      ; Description....: Returns the difference of two timecodes
      ; Syntax.........: _tcSub($fTc1, $fTc2 [, $fFormat = "P"])
      ;
      ; Parameters.....: $fTc1    - First timecode in hh:mm:ss.ff format
      ;                  $fTc2    - Second timecode in hh:mm:ss.ff format
      ;                  $fFormat - Time base - "P" (default): PAL (25 fps)
      ;                                         "M"          : millisecond
      ;
      ; Return value...: $fTc1 minus $fTc2 in the selected format; a negative result counts back from 24:00:00.00
      Func _tcSub($fTc1, $fTc2, $fFormat = "P")
          Local $fMs1 = _tcToMs($fTc1)
          Local $fMs2 = _tcToMs($fTc2)
          Local $fSumMs = $fMs1 - $fMs2
          If $fSumMs < 0 Then
              $fSumMs = _tcToMs("24:00:00.00") - ($fSumMs * -1)
          EndIf
          Return _msToTc($fSumMs, $fFormat)
      EndFunc

      ; #FUNCTION#
      ; Name...........: _tcToMs
      ; Description....: Returns timecode converted to total milliseconds
      ; Syntax.........: _tcToMs($fTc)
      ;
      ; Parameters.....: $fTc - Timecode in hh:mm:ss.ff or hh:mm:ss.xxx format, where xxx are milliseconds
      ;
      ; Return value...: Milliseconds as an integer value
      Func _tcToMs($fTc)
          Local $fTemp = StringSplit($fTc, ":.")
          Local $fChr = StringLen($fTemp[4])
          Switch $fChr
              Case 2 ; two digits: PAL frames, 40 ms each at 25 fps
                  Return ($fTemp[4] * 40) + ($fTemp[3] * 1000) + ($fTemp[2] * 60000) + ($fTemp[1] * 3600000)
              Case 3 ; three digits: milliseconds
                  Return ($fTemp[4]) + ($fTemp[3] * 1000) + ($fTemp[2] * 60000) + ($fTemp[1] * 3600000)
          EndSwitch
      EndFunc

      ; #FUNCTION#
      ; Name...........: _msToTc
      ; Description....: Converts total milliseconds to timecode
      ; Syntax.........: _msToTc($fIn, $fFormat = "P", $fHourFormat = 1)
      ;
      ; Parameters.....: $fIn         - Time in milliseconds
      ;                  $fFormat     - Output format "P": PAL TC (default)
      ;                                               "M": hh:mm:ss.xxx where xxx are milliseconds
      ;                  $fHourFormat - Hour format "1": max. value is 23, then starts from 0 (default)
      ;                                             "0": hours can be more than 23
      ;
      ; Return value...: Timecode as string in the selected format
      Func _msToTc($fIn, $fFormat = "P", $fHourFormat = 1)
          Switch $fFormat
              Case "P"
                  Local $fFr = StringFormat("%02i", (StringRight($fIn, 3) - Mod(StringRight($fIn, 3), 40)) / 40)
              Case "M"
                  Local $fFr = StringFormat("%03i", StringRight($fIn, 3))
          EndSwitch
          $fIn = StringTrimRight($fIn, 3)
          Local $fSec = StringFormat("%02i", Mod($fIn, 60))
          $fIn -= $fSec
          Local $fMinTot = $fIn / 60
          Local $fMin = StringFormat("%02i", Mod($fMinTot, 60))
          $fIn -= $fMin*60
          Local $fHourTot = $fIn / 60 / 60
          Local $fHour
          Switch $fHourFormat
              Case 1
                  $fHour = StringFormat("%02i", Mod($fHourTot, 24))
              Case 0
                  $fHour = StringFormat("%02i", $fHourTot)
          EndSwitch
          Return($fHour & ":" & $fMin & ":" & $fSec & "." & $fFr)
      EndFunc

      ; #FUNCTION#
      ; Name...........: _tcFormatChange
      ; Description....: Toggle TC format
      ; Syntax.........: _tcFormatChange($fTc)
      ;
      ; Parameters.....: $fTc - Timecode in hh:mm:ss.ff or hh:mm:ss.xxx format, where xxx are milliseconds
      ;
      ; Return value...: PAL timecode or time with milliseconds as string, depends on input
      Func _tcFormatChange($fTc)
          Local $fTemp = StringSplit($fTc, ":.")
          Local $fChr = StringLen($fTemp[4])
          Switch $fChr
              Case 2
                  Return $fTemp[1]&":"&$fTemp[2]&":"&$fTemp[3]&"."&StringFormat("%03i", $fTemp[4]*40)
              Case 3
                  Return $fTemp[1]&":"&$fTemp[2]&":"&$fTemp[3]&"."&StringFormat("%02i", ($fTemp[4]-Mod($fTemp[4], 40))/40)
          EndSwitch
      EndFunc

      And the example script:
      #include <_PAL_TC_Calc.au3>

      $palTC1 = "00:01:12.20"
      $palTC2 = "23:59:50.02"
      $msTC1 = "00:01:12.800"
      $msTC2 = "23:59:50.120"

      MsgBox(0, "1", _tcAdd($palTC1, $palTC2)) ; Adds $palTC1 to $palTC2, turns hour back to 0 after 23, returns PAL TC format
      MsgBox(0, "2", _tcAdd($palTC1, $palTC2, "M")) ; Adds $palTC1 to $palTC2, turns hour back to 0 after 23, returns time with milliseconds format
      MsgBox(0, "3", _tcAdd($palTC1, $palTC2, "M", 0)) ; Adds $palTC1 to $palTC2, hours are not wrapped at 24, returns time with milliseconds format
      MsgBox(0, "4", _tcAdd($msTC1, $msTC2)) ; Adds $msTC1 to $msTC2, turns hour back to 0 after 23, returns PAL TC format
      MsgBox(0, "5", _tcAdd($msTC1, $msTC2, "M")) ; Adds $msTC1 to $msTC2, turns hour back to 0 after 23, returns time with milliseconds format
      MsgBox(0, "6", _tcAdd($msTC1, $msTC2, "M", 0)) ; Adds $msTC1 to $msTC2, hours are not wrapped at 24, returns time with milliseconds format
      MsgBox(0, "7", _tcSub($palTC2, $palTC1)) ; Subtracts $palTC1 from $palTC2, returns PAL TC format
      MsgBox(0, "8", _tcSub($palTC2, $palTC1, "M")) ; Subtracts $palTC1 from $palTC2, returns time with milliseconds format
      MsgBox(0, "9", _tcSub($msTC1, $msTC2)) ; Subtracts $msTC2 from $msTC1, returns PAL TC format - the result is below zero, so it counts back from 24:00:00.00
      MsgBox(0, "10", _tcSub($msTC1, $msTC2, "M")) ; Subtracts $msTC2 from $msTC1, returns time with milliseconds format - the result is below zero, so it counts back from 24:00:00.000
      MsgBox(0, "11", _tcFormatChange($palTC2)) ; Converts PAL TC to time with milliseconds
      MsgBox(0, "12", _tcFormatChange($msTC2)) ; Converts time with milliseconds to PAL TC
      TC_CALC_example.au3
      _PAL_TC_Calc.au3
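For anyone who wants to check the arithmetic by hand: at PAL's 25 fps one frame lasts 40 ms, so "00:01:12.20" is 1*60000 + 12*1000 + 20*40 = 72800 ms. The sketch below is only an illustration of that conversion and is not part of _PAL_TC_Calc.au3. The names _Sketch_TcToMs and _Sketch_MsToTc are made up for this post, and it uses integer arithmetic (Mod and Int) for the way back instead of the string slicing the UDF uses.

```autoit
; Hypothetical helper names for illustration; not part of _PAL_TC_Calc.au3.
Global Const $MS_PER_FRAME = 40 ; 1000 ms / 25 fps (PAL)

; Convert a PAL timecode "hh:mm:ss.ff" to total milliseconds.
; Returns -1 and sets @error if the string does not have four fields.
Func _Sketch_TcToMs($sTc)
    Local $aPart = StringSplit($sTc, ":.") ; $aPart[0] holds the field count
    If $aPart[0] <> 4 Then Return SetError(1, 0, -1)
    Return Int($aPart[1]) * 3600000 + Int($aPart[2]) * 60000 + Int($aPart[3]) * 1000 + Int($aPart[4]) * $MS_PER_FRAME
EndFunc

; Convert total milliseconds to "hh:mm:ss.ff", wrapping hours at 24.
Func _Sketch_MsToTc($iMs)
    Local $iFrames = Int(Mod($iMs, 1000) / $MS_PER_FRAME) ; leftover ms, rounded down to whole frames
    Local $iTotalSec = Int($iMs / 1000)
    Local $iSec = Mod($iTotalSec, 60)
    Local $iMin = Mod(Int($iTotalSec / 60), 60)
    Local $iHour = Mod(Int($iTotalSec / 3600), 24)
    Return StringFormat("%02i:%02i:%02i.%02i", $iHour, $iMin, $iSec, $iFrames)
EndFunc

Local $iMs = _Sketch_TcToMs("00:01:12.20")
ConsoleWrite($iMs & @CRLF)                 ; 72800
ConsoleWrite(_Sketch_MsToTc($iMs) & @CRLF) ; 00:01:12.20
```

Working on whole numbers avoids depending on how many digits the millisecond total has, at the cost of supporting only the PAL output format in this sketch.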