Regex for HTML name= id= values


LAGuy

Hi everyone,

I'm enjoying AutoIt scripting and now see that I need to get up to speed on parsing HTML on web pages to find hyperlinks, images, buttons etc. relative to some HTML text on a page.

$SampleHTML = "<td><input id=""TARGETID"" type=""image"" style=''border-width: 0px;'' onclick=''javascript:return confirm('Are You Sure You Wish To Delete This Item?');'' alt=''Delete'' src=''/DNN462/images/delete.gif'' title=''Delete'' name=''TARGETNAME''/></td><td>GainUser</td>"

e.g. I would look for $searchtext = "<td>GainUser</td>" and then find the input tag with name= or id= immediately preceding it in the html.

So I could use function: StringInStr to find this text and then need to scan backwards for name= or id= to then click on the image.

So I would like to learn the REGEX to get the value of the name="..." or id="..."

Ideally I could also search for the html tag: <input tag

name="TARGETNAME" or id="TARGETID"

Many Thanks, LA Guy


Why not just use _IETagNameGetCollection()?

Thanks for the tip !

Well, if I did that I wouldn't know which input tag was closest (immediately preceding ) to my target string <td>GainUser</td> in the html

e.g. I have a Table where I want to click the Icon1 ( input image ) immediately preceding GainUser

Icon1 ABCUser xxxx yyyy zzzz

Icon1 ABCUser xxxx yyyy zzzz

Icon1 GainUser xxxx yyyy zzzz

Icon1 ABCUser xxxx yyyy zzzz

So, can I do this with _IETagNameGetCollection() ?

Thanks, LA Guy


I think you may be able to. Just get the collections of both input and td tags, search the returned td tag array for the string you want, and compare the position to the input array. Dale is the resident IE expert so if he sees this he may be able to help you more than I can, but I hope this helps at least send you in the right direction.


The Document Object Model is hierarchical, so you can scope your element searches within other elements. There could be easier ways to do this with your page if some elements have names or ID's you can count on, but assuming that this is the first table on the page, the following should at least be close based on the information you gave:

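#include <IE.au3>
$oIE = _IECreate("your-url") ; placeholder: put the page URL here
$oTable = _IETableGetCollection($oIE, 0) ; first table on the page
$oTRs = _IETagnameGetCollection($oTable, "tr")
For $oTR in $oTRs
    If StringInStr(_IEPropertyGet($oTR, "innertext"), "GainUser") Then
        ; the icon in the sample HTML is an <input type="image">, so take the first input in this row
        $oImg = _IETagnameGetCollection($oTR, "input", 0)
        _IEAction($oImg, "click")
        ExitLoop
    EndIf
Next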

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Thank you dbzFanatic and Dale for helping on a late Sunday Night / Mon am.

I like this approach below.

Since the web page has multiple tables on it, I would need another For loop to loop thru the table tags and find the table with a TR with a TD = GainUser.

So the code below is a good start.

Appreciate a few suggestions on the coding :)

#include <IE.au3>
$oIE = _IECreate("your-url") ; placeholder: put the page URL here

$oTable = _IETableGetCollection($oIE, 0)
; Change to loop thru all tables
; ...
$oTRs = _IETagnameGetCollection($oTable, "tr")
For $oTR in $oTRs
    ; change to loop thru the TDs in each TR to find GainUser
    ; ...
    If StringInStr(_IEPropertyGet($oTR, "innertext"), "GainUser") Then
        $oImg = _IETagnameGetCollection($oTR, "input", 0) ; first input in this row
        _IEAction($oImg, "click")
        ExitLoop
    EndIf
Next

Thank you, LA Guy

You actually don't have to scope to a single table. The following looks at all TRs on the page; it is just less efficient:

$oTRs = _IETagnameGetCollection($oIE, "tr")
For $oTR in $oTRs
    ; StringInStr tests whether the row text contains GainUser (the row also holds other cells)
    If StringInStr(String(_IEPropertyGet($oTR, "innertext")), "GainUser") Then
        $oImg = _IETagnameGetCollection($oTR, "input", 0) ; first input in this row
        _IEAction($oImg, "click")
        ExitLoop
    EndIf
Next

This will click on the first input element (the image button) in the table row that contains the string GainUser.

Dale


Hi Dale,

Ok, this is quite an "elegant" solution ( doesn't have to be efficient since speed is not the issue )

I will try this out tomorrow.

This will be a nice technique for locating text in tables and finding the related links/images.

Again, thanks for all your help past midnight in Colorado !

Regards, LA Guy


Hi all,

I'm back. :)

The web page I'm processing has about 4 levels of nested Tables.

So Dale's code will not work because the TR is looping thru the outermost table.

I don't want to have to hardcode to know how many levels of nesting.

The search problem is that I can have multiple rows with the same input image with an embedded Delete Gif on each row, and there is no ID or NAME attribute to find the image specifically.

If I search by Image src="delete.gif" , I also need to know that the adjacent td has GainUser in it.

I could also try to locate the input image tag and click that ( but I still need to search for the adjacent td somehow )

$SampleHTML = "<td><input id=""TARGETID"" type=""image"" style=''border-width: 0px;'' onclick=''javascript:return confirm('Are You Sure You Wish To Delete This Item?');'' alt=''Delete'' src=''/DNN462/images/delete.gif'' title=''Delete'' name=''TARGETNAME''/></td><td>GainUser</td>"

Any ideas ?

Thanks, LA Guy


$SampleHTML = "<td><input id=""TARGETID"" type=""image"" style=''border-width: 0px;'' onclick=''javascript:return confirm('Are You Sure You Wish To Delete This Item?');'' alt=''Delete'' src=''/DNN462/images/delete.gif'' title=''Delete'' name=''TARGETNAME''/></td><td>GainUser</td>" ;; This is horrible by the way
$sSearch = "GainUser"
$id = ""
$aRegEx = StringRegExp($SampleHTML, "(?i)<input\s*(?:id|name)\s*=\s*\W*(\w+)\W*.*<td>" & $sSearch & "</td>", 1)
If IsArray($aRegEx ) Then $id = $aRegEx [0]
If $id Then MsgBox(0,"Test Return", "The return value is " & $id)

You can just replace the $SampleHTML line with

$SampleHTML = _InetGetSource("some url")

Edit: Added MsgBox for testing.
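
A possible variation on the regex above, offered as a sketch rather than a tested fix. On a page with several rows, the greedy .* starts at the first <input in the source, so it could return that row's id instead of the one next to the search text. The pattern below uses [^>]* so the match cannot cross a tag boundary, and it requires the </td><td> pair to sit directly between the input and the target cell. The helper name _FindInputIdBefore and the sample rows are made up for illustration; it also assumes the id value is quoted and that no attribute value contains a > character.

```autoit
; Hypothetical helper: returns the id of the <input> in the cell directly
; before <td>$sSearch</td>, or "" if there is no such pair.
Func _FindInputIdBefore($sHTML, $sSearch)
    ; [^>]* keeps the match inside a single <input ...> tag.
    ; \Q...\E treats the search text literally (assumes it does not contain "\E").
    Local $sPattern = "(?i)<input\b[^>]*?\bid\s*=\s*[""']([^""']+)[""'][^>]*>\s*</td>\s*<td>\Q" & $sSearch & "\E</td>"
    Local $aMatch = StringRegExp($sHTML, $sPattern, 1) ; flag 1: array of captured groups
    If @error Then Return "" ; no match
    Return $aMatch[0]
EndFunc

; Made-up sample rows modelled on the markup in this thread
Local $sSample = '<tr><td><input type="image" id="ID_ctl03" src="/images/delete.gif" /></td><td>ABCUser</td></tr>' & _
        '<tr><td><input type="image" id="ID_ctl04" src="/images/delete.gif" /></td><td>GainUser</td></tr>'

ConsoleWrite("Matched id: " & _FindInputIdBefore($sSample, "GainUser") & @CRLF)
```

Swapping id for name in the pattern would capture the name attribute instead. Regexes on HTML stay fragile either way, so the DOM route discussed elsewhere in this thread is probably the safer default.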

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"


Too many variables involved to make a generalized solution using the DOM without seeing the page -- unless the following works for you... this adds a check to see if there are any nested tables in the current element and skips it if there are...

$oTRs = _IETagnameGetCollection($oIE, "tr")
For $oTR in $oTRs
    $oTables = _IETagnameGetCollection($oTR, "table")
    If @extended Then ContinueLoop ; if @extended > 0, there are nested tables
    ; StringInStr tests whether the row text contains GainUser
    If StringInStr(String(_IEPropertyGet($oTR, "innertext")), "GainUser") Then
        $oImg = _IETagnameGetCollection($oTR, "input", 0) ; first input in this row
        _IEAction($oImg, "click")
        ExitLoop
    EndIf
Next

Dale


Thanks Dale and Geosoft,

Hi all,

Hope everyone is well and getting together with family and friends !

I'm almost there with all the wonderful help here ! :)

I'm learning DOM navigation from you guys the AutoIT way !

Recap: I'm trying to click on the input image ( to Delete a row ) in a web page table but I don't want to use ID / NAME references.

I scan each TR row for a TD with a UserRoleName match and then try to click on the Delete input image in the first TD of that row.

BUT, the input image click logic below is picking the first TR in the table and not the row with the matching userrole name.

I had to loop thru the TDs of the row I found to drill down into the TDs.

Thanks for your help !

Regards, LA Guy

; ==== Check if the UserRole is added for this user on the page
$sHTML = _IEBodyReadHTML($oIE) ; get all the HTML
$userroletd = "<td>" & $DNNAddUserRole & "</td>" ; search for this
$pos = StringInStr($sHTML, $userroletd) ; test for HTML match
If $pos > 0 Then ; this is the right page
    ; loop thru all TRs on the innermost TABLE
    $oTRs = _IETagnameGetCollection($oIE, "tr")
    For $oTR in $oTRs
        $oTables = _IETagnameGetCollection($oTR, "table")
        If @extended Then ContinueLoop ; if @extended > 0, there are nested tables
        ; Search for a TR with <td>GainUser</td>
        If StringInStr(_IEPropertyGet($oTR, "innerhtml"), $userroletd) > 0 Then
            $oTDs = _IETagnameGetCollection($oTR, "td")
            For $oTD in $oTDs
                ; get the first TD and click on the input image
                MsgBox(0, "TD info", "Tagname: " & $oTD.tagname & @CR & "innerhtml: " & $oTD.innerhtml)
                _IEFormImageClick($oTD, "delete.gif", "src") ; clicks on the very first TR TD Delete.Gif in the TABLE ( not the right one ! )
                ;; _IEImgClick ($oTD, "delete.gif", "src") ; doesn't work
                ExitLoop ; TD loop
            Next
            ExitLoop ; TR loop
        EndIf
    Next
EndIf

HTML here:

<table cellspacing="0" cellpadding="4" border="0" summary="Security Roles Design Table" border="0" id="LongLongID" style="border-width:0px;border-style:None;width:100%;border-collapse:collapse;">
<tr class="NormalBold">
<td>&nbsp;</td><td>Security Role</td><td>Effective Date</td><td>Expiry Date</td>
</tr><tr class="Normal">
<td>
<!-- [DNN-4285] Hide the button if the user cannot be removed from the role -->
</td><td>Registered Users</td><td>
<span id="LongLongID_ctl02_Label2" class="Normal" name="Label1"></span>
</td><td>
<span id="LongLongID_ctl02_Label1" class="Normal" name="Label1"></span>
</td>
</tr><tr class="Normal">
<td>
<!-- [DNN-4285] Hide the button if the user cannot be removed from the role -->
<input type="image" name="LongLongNAME$ctl03$cmdDeleteUserRole" id="LongLongID_ctl03_cmdDeleteUserRole" title="Delete" src="/DNN462/images/delete.gif" alt="Delete" onclick="javascript:return confirm('Are You Sure You Wish To Delete This Item?');" style="border-width:0px;" />
</td><td>VocationalAssessmentUser</td><td>
<span id="LongLongID_ctl03_Label2" class="Normal" name="Label1"></span>
</td><td>
<span id="LongLongID_ctl03_Label1" class="Normal" name="Label1"></span>
</td>
</tr><tr class="Normal">
<td>
<!-- [DNN-4285] Hide the button if the user cannot be removed from the role -->
<input type="image" name="LongLongNAME$ctl04$cmdDeleteUserRole" id="LongLongID_ctl04_cmdDeleteUserRole" title="Delete" src="/DNN462/images/delete.gif" alt="Delete" onclick="javascript:return confirm('Are You Sure You Wish To Delete This Item?');" style="border-width:0px;" />
</td><td>GainUser</td><td>
<span id="LongLongID_ctl04_Label2" class="Normal" name="Label1"></span>
</td><td>
<span id="LongLongID_ctl04_Label1" class="Normal" name="Label1"></span>
</td>
</tr>
</table>


The first part is easy enough, so let's take it a step at a time.

Replace

$sHTML = _IEBodyReadHTML ($oIE); get all the HTML
$userroletd = "<td>" & $DNNAddUserRole & "</td>"; search for this
$pos = StringInStr ( $sHTML, $userroletd ); Test for HTML match
if ( $pos > 0 ) Then; this is the right page

With

$sHTML = _IEBodyReadHTML($oIE) ; get all the HTML
If StringRegExp($sHTML, "(?i)<td>" & $DNNAddUserRole & "</td>") Then ; this is the right page
    ; <more code here> This will be stage 2, which we will work on next
EndIf

I'm a bit slow today (okay, most days) so just give it to us in stages. What will stage 2 need?

George


Hi Geosoft !

Thankfully, Stage 1 is already working ! but your StringRegExp is much more elegant

Stage 2: Problem: Image Click is happening on the first input image in the table and not the one I want !

Once I find a TR row with the matching html, I want to click on the Image in the first TD of that matching row.

The code below is not working :)

_IEFormImageClick($oTD, "delete.gif", "src")

Isn't this supposed to find the input image inside the current TD tag on the current TR row ?

Here is the TD

<td>
<!-- [DNN-4285] Hide the button if the user cannot be removed from the role --> 
<input type="image" name="LongLongNAME$ctl04$cmdDeleteUserRole" id="LongLongID_ctl04_cmdDeleteUserRole" title="Delete" src="/DNN462/images/delete.gif" alt="Delete" onclick="javascript:return confirm('Are You Sure You Wish To Delete This Item?');" style="border-width:0px;" />
</td><td>GainUser</td><td>

Here's the code again:

; loop thru all TRs on the innermost TABLE
$oTRs = _IETagnameGetCollection($oIE, "tr")
For $oTR in $oTRs
    $oTables = _IETagnameGetCollection($oTR, "table")
    If @extended Then ContinueLoop ; if @extended > 0, there are nested tables

    ; Search for a TR with <td>GainUser</td>
    If StringInStr(_IEPropertyGet($oTR, "innerhtml"), $userroletd) > 0 Then
        $oTDs = _IETagnameGetCollection($oTR, "td")
        For $oTD in $oTDs
            ; get the first TD and click on the input image
            MsgBox(0, "TD info", "Tagname: " & $oTD.tagname & @CR & "innerhtml: " & $oTD.innerhtml)

            _IEFormImageClick($oTD, "delete.gif", "src") ; clicks on the very first TR TD Delete.Gif in the TABLE ( not the right one ! )

            ExitLoop ; TD loop
        Next
        ExitLoop ; TR loop
    EndIf
Next

Ok, here is the confusion... _IEFormImageClick works in a document context... from help:

$o_object Object variable of any DOM element (will be converted to the associated document object)

For you, do this instead:

$oInput = _IETagnameGetCollection($oTD, "input", 0)

_IEAction($oInput, "click")

Dale
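
Pulling the pieces of this thread together, the row scan might be sketched as below. It is untested against the real page and rests on a few assumptions: the function name _ClickInputInRowWithCell is invented here, the target cell's text matches exactly once surrounding whitespace is trimmed, and the delete button is the first input element in the matching row. It compares cell by cell instead of testing the whole row's text, and it scopes the input lookup to the row element rather than the document.

```autoit
#include <IE.au3>

; Hypothetical helper: clicks the first <input> in the row that has a cell
; whose text equals $sCellText. Returns True on success; otherwise returns
; False and sets @error (1 = no rows, 2 = row has no input, 3 = no matching cell).
Func _ClickInputInRowWithCell($oIE, $sCellText)
    Local $oTRs = _IETagNameGetCollection($oIE, "tr")
    If @error Then Return SetError(1, 0, False)

    For $oTR In $oTRs
        ; Skip outer rows that merely wrap a nested table
        _IETagNameGetCollection($oTR, "table")
        If @extended Then ContinueLoop

        Local $oTDs = _IETagNameGetCollection($oTR, "td")
        If @error Then ContinueLoop

        For $oTD In $oTDs
            ; Flag 3 strips leading and trailing whitespace before comparing
            If StringStripWS(_IEPropertyGet($oTD, "innertext"), 3) = $sCellText Then
                ; Scoped to this row, so other rows' buttons are never considered
                Local $oInput = _IETagNameGetCollection($oTR, "input", 0)
                If @error Then Return SetError(2, 0, False)
                _IEAction($oInput, "click")
                Return True
            EndIf
        Next
    Next
    Return SetError(3, 0, False)
EndFunc

; Usage (placeholder URL):
; Local $oIE = _IECreate("your-url")
; _ClickInputInRowWithCell($oIE, "GainUser")
```

If the click opens a JavaScript confirm dialog, the "click" action may not return until the dialog is dismissed, so the dialog has to be handled separately.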


Okay, it looks like Dale beat me to it, but if there is a need to get the actual Image name associated with $DNNAddUserRole so that you can use _IEImgClick(), then I also have that RegExp ready to go.

EDIT: Actually I'll give it to you anyway Just so I don't lose it. A bit ugly but it works.

$sHTML = _IEBodyReadHTML ($oIE); get all the HTML
$aRegEx = StringRegExp($sHTML, "(?i)<input type=\s?\W?image\W?.*name=.?\b(.*)\b.{2}?id=.*(?:\r\n|\r|\n)?.*?<td>\b" & $DNNAddUserRole & "\b</td>", 1)
If IsArray($aRegEx) Then
  ;;Array[0] will contain the image name associated with the input in the <td>
  ;; just prior to the one which contains $DNNAddUserRole, Tested with "VocationalAssessmentUser" as $DNNAddUserRole
   _IEImgClick ($oIE, $aRegEx[0], "Name");; This not tested for obvious reasons.
EndIf

Dale can correct the _IEImgClick() if it's wrong but I think it's fine.

George

Hi Dale,

Thanks so much for your help ! :)

That did the trick ( clicking on the Delete Image ).

But I then had to automate clicking the Delete Confirmation Javascript Alert Box, so I set focus to the Image and hit the Enter Key.

Is this the best way to do this ? ( lots of code )

$oInput = _IETagnameGetCollection($oTD, "input", 0)
;;;;; _IEAction($oInput, "click") ; we need to set focus and hit the Enter key to trap the javascript Alert Box

$hwnd = _IEPropertyGet($oIE, "hwnd")
_IEAction($oInput, "focus")
ControlSend($hwnd, "", "[CLASS:Internet Explorer_Server; INSTANCE:1]", "{Enter}")

; Alert Box: Windows Internet Explorer - Are you sure you want to delete this item ?
; Wait for Javascript Alert window, then click on OK
$alerttext = "Are You Sure You Wish To Delete This Item?"
WinWaitActive("Windows Internet Explorer", $alerttext, 3)
ControlClick("Windows Internet Explorer", $alerttext, "[CLASS:Button; TEXT:OK; Instance:1;]")

_IELoadWait($oIE, 1000) ; wait 1 sec

Thanks again, LA Guy


Is this the best way to do this ?

Unfortunately it is the best way I know of. You have to get yourself out of the DOM execution thread, as it is waiting for the alert to be processed before returning.

Dale
