Sign in to follow this  
Followers 0
Zedna

RegExp - effective HTML table parsing (rows,columns)

16 posts in this topic

#1 ·  Posted (edited)

I'm doing parsing of HTML file with <table>. I need to go through rows and columns of table, ideally to get two dimensional array.
I use this way with simple two levels of calling StrinRegExp() for rows and columns:

;~ $html = FileRead('table.html')
$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$rows = StringRegExp($html, '(?s)(?i)<tr>(.*?)</tr>', 3)
For $i = 0 to UBound($rows) - 1
    $row = $rows[$i]
    ConsoleWrite("Row " & $i & ': ' & $row & @CRLF)

    $cols = StringRegExp($row, '(?s)(?i)<td>(.*?)</td>', 3)
    For $j = 0 to UBound($cols) - 1
        $col = $cols[$j]
        ConsoleWrite("  Col " & $j & ': ' & $col & @CRLF)
    Next
Next

Output:

Row 0:  <td>r1c1</td> <td>r1c2</td>
  Col 0: r1c1
  Col 1: r1c2
Row 1:  <td>r2c1</td> <td>r2c2</td>
  Col 0: r2c1
  Col 1: r2c2
Row 2:  <td>r3c1</td> <td>r3c2</td>
  Col 0: r3c1
  Col 1: r3c2

 

 

In my example there is called StringRegExp() for each row of table which is ineffective for many rows.

It works fine, but my question is if there is better and more effective approach, maybe some clever the only one RegExp pattern?

Or maybe using StringRegExp with option=4? I 'm not experienced with this option (array in array) and example in helpfile is not very clear to me so I don't know if this option=4 can be used also for HTML table parsing.

Edited by Zedna

Share this post


Link to post
Share on other sites

#2 ·  Posted (edited)

I believe you can get rid of the question mark in the group (.*?) [but only if you use  (?U) at the start of your pattern :whistle:]. Your approach is the same as I would use. I've never used option=4. It would be nice to see more examples.

Edited by czardas

Share this post


Link to post
Share on other sites

#3 ·  Posted (edited)

Well, not really sure, but here is a start point :

#include <Array.au3>


$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'


$aRes = StringRegExp($html, "(?is)<tr.*?>.*?(?:<td.*?>(.*?)<\/td>\s*)(?=(?:<td.*?>(.*?)<\/td>)?).*?<\/tr>", 4)

For $i = 0 To UBound($aRes) - 1
    If IsArray($aRes[$i]) Then
        $tab = $aRes[$i]
        _ArrayDisplay($tab)
    EndIf
Next


_ArrayDisplay($aRes)

I don't know why there is an empty last result...

Edit : I use <td.*?> to match this king of tag : <td id='toto' ....> (so out of your example)

Edited by jguinch
1 person likes this

Share this post


Link to post
Share on other sites

#4 ·  Posted (edited)

Edit : I use <td.*?> to match this king of tag : <td id='toto' ....> (so out of your example)

 

Nice example. Also silly me. The question mark is needed if you don't use (?U) at the start of the pattern. I tend to do that a lot which is why I thought the question mark wasn't needed. Sorry for the misinformation. :doh:

Edited by czardas

Share this post


Link to post
Share on other sites

AFAIK there is no way to get a 2D array from a single regex :)

I personally use something like this

#include <Array.au3>

;~ $html = FileRead('table.html')
$html = '<tr><td>r1c1</td> <td>r1c2</td> </tr>  <tr><td>r2c1   </td> <td>r2c2</td><td>r2c3</td></tr>  <tr><td></td><td>r3c2</td> <td>   r3c3</td></tr>  <tr><td></td><td>r4c2</td> <td> </td></tr>'

$rows = StringRegExp($html, '(?is)<tr>(.*?)</tr>', 3)
Local $a[UBound($rows)][100], $icol = 0

For $i = 0 to UBound($rows) - 1
    $cols = StringRegExp($rows[$i], '(?is)<td>(.*?)</td>', 3)
    $icol = ($icol > UBound($cols)) ? $icol : UBound($cols)
    For $j = 0 to UBound($cols) - 1
        $a[$i][$j] = StringStripWS($cols[$j], 3)
    Next
Next
Redim $a[UBound($rows)][$icol]
_ArrayDisplay($a)

Share this post


Link to post
Share on other sites

#7 ·  Posted (edited)

I couldn't get option=4 to work. :( Here's a slightly different approach.

#include <Array.au3>

Local $html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'
Local $aRes = StringRegExp($html, "(?isU)<tr>|(?:<td>)(.*)(?:</td>)", 3)
_ArrayDisplay($aRes)
Edited by czardas
1 person likes this

Share this post


Link to post
Share on other sites

Every options return an 1 dimension array, so you cannot get a 2D array, unless parsing yourself :(

Option 4 is not very interesting. Think like that, the "pattern" can be broken into "part". With option 3, you find "part". With option 4, you find entire "pattern", which returned at the element 0 of the nested array. And other "parts" is returned at other index.

I think the snippet you wrote is the best we can do. If performance is too important, you can use a C HTML/XML parsing library and DllCall().


99 little bugs in the code

99 little bugs!

Take one down, patch it around

117 little bugs in the code!

Share this post


Link to post
Share on other sites

#9 ·  Posted (edited)

@czardas : very nice !

I'm suprised about the <tr> capturing, but AutoIt captures everything if no parenthese is used, so everything is <tr> :geek:

BTW, I think you are using useless non-capturing groups. It's also OK with this: (?isU)<tr>|<td>(.*)</td>

(?|(<tr>)|<td>(.*)</td>) for compatibility with regex101.com for example

Edited by jguinch

Share this post


Link to post
Share on other sites

#10 ·  Posted (edited)

Not surprising at all as it captures both sides of the alternation

More noticeable trying this (?isU)<tr|<td>(.*)</td>

:)

Edited by mikell

Share this post


Link to post
Share on other sites

Nice examples jguinch and mikell. I think you're always learning new things with regexp - I know I am. :)

Share this post


Link to post
Share on other sites

#12 ·  Posted (edited)

...

I think the snippet you wrote is the best we can do. If performance is too important, you can use a C HTML/XML parsing library and DllCall().

 

Here is another parsing snipet optimized for speed, only with one StringRegExp()

It's based on the premise of known number of columns.

Number of columns ($cols_on_row) can be checked by StringRegExp() before main For/Next loop if needed.

Rows needn't to be parsed by StringRegExp(), instead rows can be calculated by Mod() function.

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$cols_on_row = 2 ; known number of columns
$row = 0
$cols = StringRegExp($html, '(?s)(?i)<td>(.*?)</td>', 3)
For $i = 0 to UBound($cols) - 1
    $col = $cols[$i]
    If Mod($i,$cols_on_row) = 0 Then $row += 1
    ConsoleWrite("Row " & $row & "  Col " & Mod($i,$cols_on_row) & ': ' & $col & @CRLF)
Next

Output:

 

Row 1  Col 0: r1c1

Row 1  Col 1: r1c2

Row 2  Col 0: r2c1

Row 2  Col 1: r2c2

Row 3  Col 0: r3c1

Row 3  Col 1: r3c2

Edited by Zedna
1 person likes this

Share this post


Link to post
Share on other sites

#13 ·  Posted (edited)

And here is slightly modified version to distinguish code part for row and for column

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$cols_on_row = 2 ; known number of columns
$row = 0
$cols = StringRegExp($html, '(?s)(?i)<td>(.*?)</td>', 3)
For $i = 0 to UBound($cols) - 1
    $col = $cols[$i]
    If Mod($i,$cols_on_row) = 0 Then
        $row += 1
        ConsoleWrite("Row " & $row & @CRLF)
    EndIf
    ConsoleWrite("  Col " & Mod($i,$cols_on_row) & ': ' & $col & @CRLF)
Next

Output:

Row 1
  Col 0: r1c1
  Col 1: r1c2
Row 2
  Col 0: r2c1
  Col 1: r2c2
Row 3
  Col 0: r3c1
  Col 1: r3c2

Edited by Zedna
1 person likes this

Share this post


Link to post
Share on other sites

#14 ·  Posted (edited)

It's very interesting, but you will also not get a 2D array :)
And this is a little modified version, dynamic column:

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr><tr><td>r4c1</td> <td>r4c2</td></tr>'

Local $aArray = StringRegExp($html, '(<tr>)*<td>(.*?)</td>', 4)
Local $aMatch = 0
Local $row = -1
Local $nCounter = 0
For $i = 0 To UBound($aArray) - 1
    $aMatch = $aArray[$i]
    If (StringLeft($aMatch[0], 3) = '<tr') Then
        $row += 1
        $nCounter = 0
        ConsoleWrite("Row " & $row & @CRLF)
    EndIf
    $col = $aMatch[2]
    ConsoleWrite("  Col " & $nCounter & ': ' & $col & @CRLF)
    $nCounter += 1
Next

Or better, check close tag instead of open tag, and forget about the id/class/attributes.... Also, it will result smaller array.

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr><tr><td>r4c1</td> <td>r4c2</td></tr>'

Local $aArray = StringRegExp($html, '<td>(.*?)</td>(</tr>)*', 4)
Local $aMatch = 0
Local $row = 0
Local $nCounter = 0
ConsoleWrite("Row " & $row & @CRLF)

For $i = 0 To UBound($aArray) - 1
    $aMatch = $aArray[$i]
    $col = $aMatch[1]
    ConsoleWrite("  Col " & $nCounter & ': ' & $col & @CRLF)
    $nCounter += 1

    If (StringRight($aMatch[0], 5) = '</tr>') Then
        $row += 1
        $nCounter = 0
        ; Last ConsoleWrite should be ignored
        ConsoleWrite("Row " & $row & @CRLF)
    EndIf
Next

Output uglier string, but more performance in works
 
Or more better, use For... In to eliminate the array copy when assign variable $aMatch:

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr><tr><td>r4c1</td> <td>r4c2</td></tr>'

Local $aArray = StringRegExp($html, '<td>(.*?)</td>(</tr>)*', 4)
Local $aMatch = 0
Local $row = 0
Local $nCounter = 0
ConsoleWrite("Row " & $row & @CRLF)

If (IsArray($aArray)) Then
    For $aMatch In $aArray
        $col = $aMatch[1]
        ConsoleWrite("  Col " & $nCounter & ': ' & $col & @CRLF)
        $nCounter += 1

        If (StringRight($aMatch[0], 5) = '</tr>') Then
            $row += 1
            $nCounter = 0
            ; Last ConsoleWrite should be ignored
            ConsoleWrite("Row " & $row & @CRLF)
        EndIf
    Next
EndIf

And don't need flag=4, we can use flag=3, too, shorter (and maybe more performance) version:

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr><tr><td>r4c1</td> <td>r4c2</td></tr>'

Local $aArray = StringRegExp($html, '<td>(.*?)</td>(</tr>)*', 3)
Local $aMatch = 0
Local $row = 0
Local $col = 0
ConsoleWrite("Row " & $row & @CRLF)

If (IsArray($aArray)) Then
    For $ele In $aArray
        If ($ele <> '</tr>') Then
            ConsoleWrite("  Col " & $col & ': ' & $ele & @CRLF)
            $col += 1
        Else
            $row += 1
            ConsoleWrite("Row " & $row & @CRLF)
            $col = 0
        EndIf
    Next
EndIf
Edited by binhnx

99 little bugs in the code

99 little bugs!

Take one down, patch it around

117 little bugs in the code!

Share this post


Link to post
Share on other sites

#15 ·  Posted (edited)

Here is another parsing snipet optimized for speed, only with one StringRegExp()

It's based on the premise of known number of columns.

Number of columns ($cols_on_row) can be checked by StringRegExp() before main For/Next loop if needed.

Rows needn't to be parsed by StringRegExp(), instead rows can be calculated by Mod() function.

$html = '<tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$cols_on_row = 2 ; known number of columns
$row = 0
$cols = StringRegExp($html, '(?s)(?i)<td>(.*?)</td>', 3)
For $i = 0 to UBound($cols) - 1
    $col = $cols[$i]
    If Mod($i,$cols_on_row) = 0 Then $row += 1
    ConsoleWrite("Row " & $row & "  Col " & Mod($i,$cols_on_row) & ': ' & $col & @CRLF)
Next

Output:

 

Here is modified version also with dynamic checking for number of columns from table header tags <th> </th>:

;~ $html = FileRead('table.html')
$html = '<tr><th>column1</th> <th>column2</th></tr>  <tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$colnames = StringRegExp($html, '(?s)(?i)<th>(.*?)</th>', 3)
For $j = 0 to UBound($colnames) - 1
    ConsoleWrite("Col name " & $j & ': ' & $colnames[$j] & @CRLF)
Next
ConsoleWrite("Number of columns: " & UBound($colnames) & @CRLF & @CRLF)

$cols_on_row = UBound($colnames)
$row = 0
$cols = StringRegExp($html, '(?s)(?i)<td>(.*?)</td>', 3)

For $i = 0 to UBound($cols) - 1
    $col = $cols[$i]
    If Mod($i,$cols_on_row) = 0 Then
        $row += 1
        ConsoleWrite("Row " & $row & @CRLF)
    EndIf
    ConsoleWrite("  Col " & Mod($i,$cols_on_row) & ': ' & $col & @CRLF)
Next

Output:

Col name 0: column1

Col name 1: column2

Number of columns: 2

Row 1

  Col 0: r1c1

  Col 1: r1c2

Row 2

  Col 0: r2c1

  Col 1: r2c2

Row 3

  Col 0: r3c1

  Col 1: r3c2

 

 

This is final version which I will use in my project, because there is table with table header tags included.

Anyway thanks to all for given interesting RegExp ideas, feel free to add another ones ...  :-)

Edited by Zedna

Share this post


Link to post
Share on other sites

A last one for me, with a mix of other codes :

#Include <Array.au3>

$sHtml = '<tr><th>column1</th> <th>column2</th></tr>  <tr><td>r1c1</td> <td>r1c2</td></tr>  <tr><td>r2c1</td> <td>r2c2</td></tr>  <tr><td>r3c1</td> <td>r3c2</td></tr>'

$aRes = StringRegExp($sHtml, "(?isU)(?|<(/)tr>\s*|<t[dh].*>(.*)</t[dh]>)", 3)

Local $aResult[ UBound($aRes) ] [ UBound($aRes) ]
Local $iRow = 0, $iCol = 0, $iMaxRow = 0

For $i = 0 To UBound($aRes) - 1
    If $aRes[$i] = "/" Then
        $iRow += 1
        $iCol = 0
    Else
        $aResult[$iRow][$iCol] = $aRes[$i]
        $iCol += 1
        If $iCol > $iMaxRow Then $iMaxRow = $iCol
    EndIf
Next

Redim $aResult[$iRow][$iMaxRow]

_ArrayDisplay($aResult)

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now
Sign in to follow this  
Followers 0

  • Similar Content

    • Ascer
      By Ascer
      1. Description.
      Udf working with MSDN System.Collections.ArrayList. Allow you to make fast operations on huge arrays, speed is even x10 better than basic _ArrayAdd.  Not prefered for small arrays < 600 items. 2. Requirements
      .NET Framework 1.1+ System Windows 3. Possibilities.
      ;=============================================================================================================== ; UDF Name: List.au3 ; ; Date: 2018-02-17, 10:52 ; Description: Simple udf to create System Collections as ArrayList and make multiple actions on them. ; ; Function(s): _ListCreate -> Creates a new list ; _ListCapacity -> Gets a list size in bytes ; _ListCount -> Gets items count in list ; _ListIsFixedSize -> Get bool if list if fixed size ; _ListIsReadOnly -> Get bool if list is read only ; _ListIsSynchronized -> Get bool if list is synchronized ; _ListGetItem -> Get item on index ; _ListSetItem -> Set item on index ; ; _ListAdd -> Add item at end of list ; _ListClear -> Remove all list items ; _ListClone -> Duplicate list in new var ; _ListContains -> Get bool if item is in list ; _ListGetHashCode -> Get hash code for list ; _ListGetRange -> Get list with items between indexs ; _ListIndexOf -> Get index of item ; _ListInsert -> Insert a new item on index ; _ListInsertRange -> Insert list into list on index ; _ListLastIndexOf -> Get index last of item ; _ListRemove -> Remove first found item ; _ListRemoveAt -> Remove item in index ; _ListRemoveRange -> Remove items between indexs ; _ListReverse -> Reverse all items in list ; _ListSetRange -> Set new value for items in range ; _ListSort -> Sort items in list (speed of reading) ; _ListToString -> Get list object name ; _ListTrimToSize -> Remove unused space in list ; ; Author(s): Ascer ;=============================================================================================================== 4. Downloads
      List.au3 5. Examples
      SpeedTest _ArrayAdd vs ListAdd SpeedTest ArraySearch vs ListIndexOf Basic usage - crating guild with members  
    • Miliardsto
      By Miliardsto
      Hello . How to do that
      $regexp = starts from "abcdef" and after this could be anything in name
      WinActivate($regexp)
    • natedog102
      By natedog102
      Hi everyone. I want to format the output of _INetGetSource to look nice and pretty. 
      Example google.com source output: 
      <!doctype html><html itemscope="" itemtype="http://schema.org/WebPage" lang="en"><head><meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"><meta content="noodp" name="robots"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script>(function(){window.google={kEI:'DJtTWvCOI6WGjwSE9JrICg',kEXPI:'18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,3700304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,4132956,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,19000423,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,41317155',authuser:0,kscs:'c9c918f0_DJtTWvCOI6WGjwSE9JrICg',u:'c9c918f0',kGL:'US'};google.kHL='en';})();(function(){google.lc=[];google.li=0;google.getEI=function(a){for(var b;a&&(!a.getAttribute||!(b=a.getAttribute("eid")));)a=a.parentNode;return b||google.kEI};google.getLEI=function(a){for(var b=null;a&&(!a.getAttribute||!(b=a.getAttribute("leid")));)a=a.parentNode;return b};google.https=function(){return"https:"==window.location.protocol};google.ml=function(){return null};google.wl=function(a,b){try{google.ml(Error(a),!1,b)}catch(d){}};google.time=function(){return(new Date).getTime()};google.log=function(a,b,d,c,g){if(a=google.logUrl(a,b,d,c,g)){b=new Image;var e=google.lc,f=google.li;e[f]=b;b.onerror=b.onload=b.onabort=function(){delete e[f]};google.vel&&google.vel.lu&&google.vel.lu(a);b.src=a;google.li=f+1}};google.logUrl=function(a,b,d,c,g){var e="",f=google.ls||"";d||-1!=b.search("&ei=")||(e="&ei="+google.getEI(c),-1==b.search("&lei=")&&(c=google.getLEI(c))&&(e+="&lei="+c));c="";!d&&google.cshid&&-1==b.search("&cshid=")&&(c="&cshid="+google.cshid);a=d||"/"+(g||"gen_204")+"?atyp=i&ct="+a+"&cad="+b+e+f+"&zx="+google.time()+c;/^http:/i.test(a)&&google.https()&&(google.ml(Error("a"),!1,{src:a,glmm:1}),a="");return a};}).call(this);(function(){google.y={};google.x=function(a,b){if(a)var c=a.id;else{do c=Math.random();while(google.y[c])}google.y[c]=[a,b];return!1};google.lm=[];google.plm=function(a){google.lm.push.apply(google.lm,a)};google.lq=[];google.load=function(a,b,c){google.lq.push([[a],b,c])};google.loadAll=function(a,b){google.lq.push([a,b])};}).call(this);google.f={};var a=window.location,b=a.href.indexOf("#");if(0<=b){var c=a.href.substring(b+1);/(^|&)q=/.test(c)&&-1==c.indexOf("#")&&a.replace("/search?"+c.replace(/(^|&)fp=[^&]*/g,"")+"&cad=h")};</script><style>#gbar,#guser{font-size:13px;padding-top:1px !important;}#gbar{height:22px}#guser{padding-bottom:7px !important;text-align:right}.gbh,.gbd{border-top:1px solid #c9d7f1;font-size:1px}.gbh{height:0;position:absolute;top:24px;width:100%}@media all{.gb1{height:22px;margin-right:.5em;vertical-align:top}#gbar{float:left}}a.gb1,a.gb4{text-decoration:underline !important}a.gb1,a.gb4{color:#00c !important}.gbi .gb4{color:#dd8e27 !important}.gbf .gb4{color:#900 !important} But I want it outputted like this:
      <!doctype html> <html itemscope="" itemtype="http://schema.org/WebPage" lang="en"> <head> <meta content="Search the world's information, including webpages, images, videos and more. Google has many special features to help you find exactly what you're looking for." name="description"> <meta content="noodp" name="robots"> <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"> <meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"> <title>Google</title> <script> (function() { window.google = { kEI: 'DJtsdfgWGjwSE9JrICg', kEXPI: '18167,1354277,1354916,1355218,1355675,1355793,1356171,1356806,1357219,1357326,37sdfg0304,3700519,3700521,4003510,4029815,4031109,4043492,4045841,4048347,4081038,4081164,4095909,4096834,4097153,4097195,4097922,4097929,4098733,4098740,4098752,4102237,4102827,4103475,4103845,4106084,4107914,4109316,4109490,4112770,4113217,4115697,4116349,4116724,4116731,4116926,4116927,4116935,4117980,4118798,4119032,4119034,4119036,4120285,4120286,4120660,4121175,4121518,4122511,4123830,4123850,4124091,4124850,4125837,4126202,4126754,4126869,4127262,4127418,4127473,4127744,4127863,4128586,4128622,4129001,4129520,4129556,4129633,4130362,4130783,4131247,4131834,413sdfg56,4133114,4133509,4135025,4135088,4135249,4135934,4136073,4136092,4136137,4137597,4137646,4140792,4140849,4141281,4141707,4141915,4142071,4142328,4142420,4142443,4142503,4142678,4142729,4142829,4142834,4142847,4143278,4143527,4143902,4144442,4144550,4144704,4145074,4145075,4145082,4145088,4145461,4145485,4145622,4145688,4145713,4145836,4146146,4146183,4146874,4147032,4147043,4147096,4147443,4147800,4147951,4148257,4148304,4148436,4148498,4148573,6512220,10200083,10202524,10202562,15807763,19000288,190sdfg23,19000427,19001999,19002287,19002288,19002366,19002548,19002880,19003321,19003323,19003325,19003326,19003328,19003329,19003330,19003407,19003408,19003409,19004309,19004516,19004517,19004518,19004519,19004520,19004521,19004531,19004656,19004668,19004670,19004692,41317155', authuser: 0, kscs: 'c9c918f0_DJtTWvCOI6WGjwSE9JrICg', u: 'c9c918f0', kGL: 'US' }; google.kHL = 'en'; })(); ....... I checked the forums and did not see any UDFs that allow for this. I see the Chilkat UDF but that only supports JSON. Any help would be greatly appreciated.
    • islandspapand
      By islandspapand
      Hi all
      i am currently trying to click on an element in a HTML Table, but just can get it to work.
      i am able to click the top of the table so it changes to sort  but just can't click on the element in the table.
      an i need to click on element to continue in the site.
      i have attached the code so far and pictures of the table  element want to click plus the source of the table.
      i am able to get data in the table with $oTable = _IETableGetCollection($oIE, 2) but not able to click on them.
       
      Help is very much appreciated
       
      #cs ---------------------------------------------------------------------------- AutoIt Version: 3.3.14.2 Author: myName Script Function: Template AutoIt script. #ce ---------------------------------------------------------------------------- ; Script Start - Add your code below here #include <IE.au3> #include "DOM.au3" #include <Array.au3> #include <MsgBoxConstants.au3> Global $oIE = _IECreate("*") _IELoadWait($oIE) Sleep(2000) _PageLogin($oIE) _PageLoadWait() _PageNewReq($oIE) _PageLoadWait() _InputModelInf($oIE) _PageLoadWait() Sleep(1000) $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr/td[.='Name Of user']", 2000) ;~ $aTableLink = BGe_IEGetDOMObjByXPathWithAttributes($oIE, "//table/tbody/tr", 2000) ;~ _ArrayDisplay($aTableLink,"$aTableLink") If IsArray($aTableLink) Then ConsoleWrite("Able to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF) For $i = 0 To UBound($aTableLink)-1 ConsoleWrite(" OuterHTML : " & $aTableLink[$i].outerHTML & @CRLF) ConsoleWrite(" Parentnode : " & $aTableLink[$i].parentnode & @CRLF) ConsoleWrite(" Parentnode.click : " & $aTableLink[$i].parentnode.fireEvent("onclick","click") & @CRLF) $objClick = $aTableLink[$i].parentnode ;~ _IEAction($aTableLink[$i] , "focus") _IEAction($objClick , "focus") ;~ If _IEAction($aTableLink[$i], "click") Then If _IEAction($objClick, "click") Then ConsoleWrite("Able to _IEAction($aForumLink[0], 'click')" & @CRLF) _IELoadWait($oIE) Else ConsoleWrite("UNable to _IEAction($aForumLink[0], 'click')" & @CRLF) Exit 3 EndIf Next Else ConsoleWrite("Unable to BGe_IEGetDOMObjByXPathWithAttributes($oIE, //table/tbody/tr/td[.='Name Of user'])" & @CRLF) Exit 2 EndIf _PageLoadWait() Func _InputModelInf($oTmpIE) ; Add Var for Model & Serial in Func $oModelInput = _IEGetObjById($oTmpIE,"model") _IEAction($oModelInput,"focus") _IEDocInsertText($oModelInput, "*") $oSerialInput = _IEGetObjById($oTmpIE,"serial") _IEAction($oModelInput,"focus") _IEDocInsertText($oSerialInput, "*") $links = $oTmpIE.document.getElementsByClassName("btn btn-primary ng-scope") For $link In $links If $link.innertext = "Søg" Or $link.innertext = "Search" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageNewReq($oTmpIE) $links = $oTmpIE.document.getElementsByClassName("ng-scope k-link") For $link In $links If $link.innertext = "Send ny fejlmelding" Or $link.innertext = "Submit a New Service Request" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageLogin($oTmpIE) $oUserInput = _IEGetObjById($oTmpIE,"loginid") _IEDocInsertText($oUserInput, "*") $oPasswordInput = _IEGetObjById($oTmpIE,"password") _IEDocInsertText($oPasswordInput, "*") $links = $oTmpIE.document.getElementsByClassName("btn btn-primary login ng-scope") For $link In $links If $link.innertext = "Sign in" Then $link.click() ExitLoop EndIf Next Return True EndFunc Func _PageLoadWait() Local $PageLoadWait = False ;~ nav navbar-nav navbar-right ng-hide ;~ nav navbar-nav navbar-right $tags = $oIE.document.GetElementsByTagName("ul") For $tag in $tags $class_value = $tag.GetAttribute("class") If $class_value = "nav navbar-nav navbar-right" Then ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage loading :) ' & @CRLF) ;### Debug Console $PageLoadWait = True ExitLoop EndIf Next Do sleep(250) For $tag in $tags $class_value = $tag.GetAttribute("class") If $class_value = "nav navbar-nav navbar-right ng-hide" Then ConsoleWrite('@@ Debug(' & @ScriptLineNumber & ') : Webpage load finished :)'& @CRLF) ;### Debug Console $PageLoadWait = False ExitLoop EndIf Next Until $PageLoadWait = False EndFunc  
      Thanks in advance