werd

Regex help

7 posts in this topic

#1 ·  Posted (edited)

Total regex newbie here. Been banging my head on this for a while and would appreciate any help or guidance...

I have a snippet of HTML that I'm trying to parse for the 'id=' values (alphanumeric). Here's the HTML:

<TD id=AB029E vAlign=top><WBR></WBR><SPAN id=WD029F-r class=aBarItmBtn cancollapse="false" show="true"><A onkeydown="return me.ur_Button_keypress(event);" id=AF00dF class=urBtnStd onclick="return me.ur_Button_click(event);" href="javascript:void(0);" ct="B" st="" ocl="return sapWD_Standard_Button_click(this,event);">Button1Text</A></SPAN><WBR></WBR><SPAN id=WD02A0 class=urTSep cancollapse="false" show="true"></SPAN><WBR></WBR><SPAN id=WD02A1-r class=urTbarItmBtn cancollapse="false" show="true"><A onkeydown="return me.ur_Button_keypress(event);" id=WD02A1 class=urBtnStd title="Complete the data input for a single evidence in expense settlement and reflect it in the Detail List." onclick="return me.ur_Button_click(event);" href="javascript:void(0);" ct="B" st="" ocl="return sapWD_Standard_Button_click(this,event);">Button2Text</A></SPAN><WBR></WBR><SPAN id=WD02A2-r class=urTbarItmBtn etc...

So, I'm trying to capture the 6-character alphanumeric id value associated with "Button2Text", which in this case would be "WD02A1". That is the "id=" that most closely precedes the "Button2Text" string. My attempt at a RegEx pattern so far is:

\bid=([[:alnum:]]{6,6})\b(?:[[:print:]]{1,999})>Button2Text</A>

However, this captures the first "id=" value, which is AB029E. Since I'm trying to create a generalized function to capture "id=" values, I can't be certain which other attributes (e.g. onclick=, title=, ct=, ocl=) are going to appear. Any help is appreciated. Thanks in advance.

Edited by werd

#2 ·  Posted

Here is my version:

$html = '<TD id=AB029E vAlign=top><WBR></WBR><SPAN id=WD029F-r class=aBarItmBtn cancollapse="false" show="true"><A onkeydown="return me.ur_Button_keypress(event);" id=AF00dF class=urBtnStd onclick="return me.ur_Button_click(event);" href="javascript:void(0);" ct="B" st="" ocl="return sapWD_Standard_Button_click(this,event);">Button1Text</A></SPAN><WBR></WBR><SPAN id=WD02A0 class=urTSep cancollapse="false" show="true"></SPAN><WBR></WBR><SPAN id=WD02A1-r class=urTbarItmBtn cancollapse="false" show="true"><A onkeydown="return me.ur_Button_keypress(event);" id=WD02A1 class=urBtnStd title="Complete the data input for a single evidence in expense settlement and reflect it in the Detail List." onclick="return me.ur_Button_click(event);" href="javascript:void(0);" ct="B" st="" ocl="return sapWD_Standard_Button_click(this,event);">Button2Text</A></SPAN><WBR></WBR><SPAN id=WD02A2-r class=urTbarItmBt'
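$chk = "Button2Text"
; The leading greedy .* backtracks from the end of the string, so the id= closest to $chk is captured
; (this assumes the id attribute is directly followed by class)
$aString = StringRegExp($html, ".*id=(.*) class.*>" & $chk & "<.*", 3)
If Not @error Then MsgBox(0, "Test", $aString[0])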

I'm not sure whether I understood you properly.

Br,

UEZ



#3 ·  Posted (edited)

To capture only that id in an array:

; The leading greedy .* pushes the match to the last id= that still has Button2Text after it
$aSRE = StringRegExp($sHTML, "(?i).*id=[\x22\x27\s]*([[:alnum:]]{6}).+?Button2Text", 1)
If Not @error Then
    MsgBox(4096, "Result", $aSRE[0])
EndIf

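To capture all of the ids in an array:

#include <Array.au3>
; Flag 3 returns every match; ids with a suffix such as WD029F-r are captured by their first six characters
$aSRE = StringRegExp($sHTML, "(?i)\bid=[\x22\x27\s]*([[:alnum:]]{6})", 3)
If Not @error Then
    _ArrayDisplay($aSRE, "Results")
EndIf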

To return a string with only that id

$sStr = StringRegExpReplace($sHTML, "(?i).*id=[\x22\x27\s]*([[:alnum:]]{6}).+?Button2Text.+", "$1")
If NOT @Error Then
    MsgBox(4096, "Result", $sStr)
Else
    MsgBox(4096, "Error", "Error: " & @Error & @CRLF & "Extended: " & @Extended)
EndIf

All were tested using your input string and the PCRE Toolkit in my signature so if you get any errors it's because you didn't give us the right information.
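
Since the original question asked for a generalized function, here is a minimal sketch of one way it might look. The function name _GetIdByText and its parameters are made up for illustration and do not come from any UDF. It assumes the id is exactly 6 alphanumeric characters, that no attribute value inside the tag contains a ">" character, and that the text sits directly between the opening tag's ">" and the next "<". Instead of relying on a greedy .*, it uses [^>]* so the match cannot leave the tag that carries the id.

```autoit
; Hypothetical helper (not part of the thread or of any standard UDF).
; Returns the 6-character id of the tag that is immediately followed by $sText,
; or "" with @error = 1 if nothing matches.
Func _GetIdByText($sHTML, $sText)
    ; \Q...\E makes regex metacharacters in $sText literal
    ; (assumes $sText does not itself contain the sequence \E).
    ; [^>]* cannot cross a ">", so id= and $sText must belong to the same tag.
    Local $sPattern = "(?i)\bid=[\x22\x27]?([[:alnum:]]{6})\b[^>]*>\Q" & $sText & "\E<"
    Local $aMatch = StringRegExp($sHTML, $sPattern, 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc

; Shortened sample based on the HTML from the first post
Local $sSample = '<SPAN id=WD02A1-r class=urTbarItmBtn><A id=WD02A1 class=urBtnStd href="javascript:void(0);">Button2Text</A></SPAN>'
MsgBox(4096, "Result", _GetIdByText($sSample, "Button2Text"))
```

With the shortened sample, the SPAN's id (WD02A1-r) is rejected because its tag closes before the text appears, so the id taken is the one on the A tag. Which other attributes appear in the tag, and in what order, should not matter as long as the assumptions above hold.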

Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

#4 ·  Posted

Thank you all for the input. I'll need to review and learn from this. This regex syntax is quite confusing for me, but I appreciate your helpful suggestions.

#5 ·  Posted

No problem

Try the tool I mentioned above for learning, it's in my signature. You will be able to get the source of a URL and work with it directly.


George


#6 ·  Posted (edited)

[deleted]

Edited by werd

#7 ·  Posted

That is going about it the hard way.

Parsing HTML to get attributes of various elements is what the browser is for. Let it do all the work for you with the IE.au3 UDF:

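#include <IE.au3>

; $oIE is assumed to be an IE object you already have, e.g. from _IECreate() or _IEAttach()
$colLinks = _IELinkGetCollection($oIE)
For $oLink In $colLinks
    ; Appending "" forces innerText to a string before the comparison
    If $oLink.innerText & "" = "Button2Text" Then
        MsgBox(64, "Found", "Button2Text ID = " & $oLink.id)
        ExitLoop
    EndIf
Next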
