String between <a> tag

sammy1983

Hi,

Below is the "a" tag

<a title="Service Request Details" href="URL.do?srIndex=R0&amp;srId=10847908&amp;uniqueId=1395005125122-1747401740-SrSearchResults&amp;_flowId=ServiceRequestUpdateFlow&amp;_eventId=getServiceRequest&amp;c=csit_key%3Apq3ZMo6qFZF78kDBz%2BJDvDhz18Y%3D&amp;l=srIndex:srId:_flowId:_eventId:u:s">10847908</a>

I want to extract just "10847908". Honestly, I am very poor at StringRegExp. I tried going through a tutorial (http://www.codeproject.com/Articles/9099/The-Minute-Regex-Tutorial) and using Expresso, but couldn't get it to work. After clicking the link and landing on the next page, this number (10847908) is available and I am able to extract it using the code below:

$sHTML = _IEDocReadHTML($oIE)
$SR = StringRegExp($sHTML, "(\d{8})", 3)
$SRnumber = $SR[0]

However, I don't want to go to the next page just to extract this. Can anyone help me? Thanks.

jchd

Like this?

Local $string = '<a title="Service Request Details" href="URL.do?srIndex=R0&amp;srId=10847908&amp;uniqueId=1395005125122-1747401740-SrSearchResults&amp;_flowId=ServiceRequestUpdateFlow&amp;_eventId=getServiceRequest&amp;c=csit_key%3Apq3ZMo6qFZF78kDBz%2BJDvDhz18Y%3D&amp;l=srIndex:srId:_flowId:_eventId:u:s">10847908</a>'
Local $value = StringRegExp($string, ">(\d+)</a>", 3)
If Not @error Then ConsoleWrite($value[0] & @LF)

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

sammy1983

I tried this further:

$sHTML = _IEDocReadHTML($oIE)
$SR = StringRegExp($sHTML, '(?i)u:s">\d(.*?)</(?i)a>', 3)
For $b = 1 To UBound($SR) - 1
    $Test = $SR[$b]
    MsgBox(0, "", $Test)
Next
Exit

But the output is 0847908. The first digit is missing. Any idea? I am still checking myself as well.

jchd

Because in your pattern the \d is outside the capture group. So the first digit is matched and discarded, and only the remaining digits are captured, which is what you observe.
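
To make the difference concrete, here is a minimal sketch. The shortened subject string is an assumption (trimmed from the original tag for readability), and the variable names are illustrative only:

```autoit
; Shortened subject string (assumption: only the tail of the original tag is kept).
Local $sSubject = 'l=srIndex:srId:_flowId:_eventId:u:s">10847908</a>'

; \d sits outside the parentheses: the first digit is matched but not captured.
Local $aOutside = StringRegExp($sSubject, '(?i)u:s">\d(.*?)</a>', 3)
If Not @error Then ConsoleWrite($aOutside[0] & @LF) ; 0847908

; \d+ sits inside the parentheses: every digit is captured.
Local $aInside = StringRegExp($sSubject, '(?i)u:s">(\d+)</a>', 3)
If Not @error Then ConsoleWrite($aInside[0] & @LF) ; 10847908
```

Note that with flag 3 the first match is at index 0 of the returned array, so a loop over the results should start at 0.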

Did my sample code fail?


jchd

The second (?i) is redundant. Case insensitivity remains in effect until it is negated later in the pattern, which you can do with (?-i).
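
A small sketch of that behaviour, using made-up subject strings. With the default flag, StringRegExp returns 1 for a match and 0 for no match:

```autoit
; (?i) at the start applies to the rest of the pattern, so one occurrence is enough.
ConsoleWrite(StringRegExp("</A>", "(?i)</a>") & @LF)        ; 1

; (?-i) switches case sensitivity back on for what follows it.
ConsoleWrite(StringRegExp("ABC", "(?i)ab(?-i)c") & @LF)     ; 0, the final c must be lowercase
ConsoleWrite(StringRegExp("ABc", "(?i)ab(?-i)c") & @LF)     ; 1
```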


sammy1983

The second (?i) is redundant. Case insensitivity remains in effect until it is negated later in the pattern, which you can do with (?-i).

 

I'll keep this in mind from now on. Many thanks again, but how do I learn StringRegExp? It is very useful, but somehow I am not understanding the logic. I probably just have to keep exploring it. Because I don't know how to use it, I have been avoiding it. I will try it going forward.

jchd

Try the links in my signature. Enjoy and keep on learning! Regexps are very powerful when used smartly but can turn into a maintenance nightmare if overused.


jchd

Much preferable indeed.


sammy1983

 

Since _IEDocReadHTML is used, why not use the dedicated _IE* functions?

Local $sTxt = "", $oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    $text = Number($oLink.innertext)
    If $text > 0 Then $sTxt &= $text & @CRLF
Next
MsgBox(0, "", $sTxt)

 

Hi Mikell,

Somehow that didn't work. I received an empty MsgBox.

sammy1983

jchd, thanks for solving my problem with digits. How about the below one?

<a title="Customer Account Details" href="URL?summary=true&amp;userName=account_credentials|10005668_v2:2cb34a98f78018943c2b0cb8180604b9&amp;c=csit_key_0%3ANJ38ktbqUGX2eXd0f51rcvKCfLs%3D&amp;l=summary:userName:u:s">Samuel Jeyaseelan</a>

jchd

This should work with digits as well as any text:

#include <Array.au3> ; needed for _ArrayDisplay

Local $strings = [ _
    '<a title="Service Request Details" href="URL.do?srIndex=R0&amp;srId=10847908&amp;uniqueId=1395005125122-1747401740-SrSearchResults&amp;_flowId=ServiceRequestUpdateFlow&amp;_eventId=getServiceRequest&amp;c=csit_key%3Apq3ZMo6qFZF78kDBz%2BJDvDhz18Y%3D&amp;l=srIndex:srId:_flowId:_eventId:u:s">10847908</a>', _
    '<a title="Customer Account Details" href="URL?summary=true&amp;userName=account_credentials|10005668_v2:2cb34a98f78018943c2b0cb8180604b9&amp;c=csit_key_0%3ANJ38ktbqUGX2eXd0f51rcvKCfLs%3D&amp;l=summary:userName:u:s">Samuel Jeyaseelan</a>', _
    '<a title="Customer Account Details" href="URL?summary=true&amp;userName=account_credentials|10005668_v2:2cb34a98f78018943c2b0cb8180604b9&amp;c=csit_key_0%3ANJ38ktbqUGX2eXd0f51rcvKCfLs%3D&amp;l=summary:userName:u:s"></a>', _
    '<a title="Customer Account Details" href="URL?summary=true&amp;userName=account_credentials|10005668_v2:2cb34a98f78018943c2b0cb8180604b9&amp;c=csit_key_0%3ANJ38ktbqUGX2eXd0f51rcvKCfLs%3D&amp;l=summary">Samuel Jeyaseelan</a>' _
]
Local $values[UBound($strings)][3], $res
For $i = 0 To UBound($strings) - 1
    $res = StringRegExp($strings[$i], '(?i):u:s">(.*?)(?=</a>)', 3)
    If @error Then
        $values[$i][1] = False
        $values[$i][2] = ""
    Else
        $values[$i][1] = True
        $values[$i][2] = $res[0]
    EndIf
    $values[$i][0] = $strings[$i]
Next
_ArrayDisplay($values, "Matched values", Default, Default, Default, "Subject string|Valid|Value found")

mikell

This should work with any string/number

Local $string = '<a title="Customer Account Details" href="URL?summary=true&amp;userName=account_credentials|10005668_v2:2cb34a98f78018943c2b0cb8180604b9&amp;c=csit_key_0%3ANJ38ktbqUGX2eXd0f51rcvKCfLs%3D&amp;l=summary:userName:u:s">Samuel Jeyaseelan</a>'
Local $value = StringRegExp($string, ">([^<]+)</a>", 3)
If Not @error Then MsgBox(0, "", $value[0])
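
One caveat, shown as a rough sketch: because [^<]+ requires at least one character, this pattern does not match a link with empty text, so @error should be checked before reading the result. The two short tags below are made-up test inputs, not taken from the real page:

```autoit
; Hypothetical test inputs: one link with text, one with empty text.
Local $aTags[2] = ['<a href="x">Samuel Jeyaseelan</a>', '<a href="x"></a>']

For $i = 0 To UBound($aTags) - 1
    Local $aRes = StringRegExp($aTags[$i], '>([^<]+)</a>', 3)
    If @error Then
        ; No match: [^<]+ needs at least one non-"<" character between the tags.
        ConsoleWrite("no match" & @LF)
    Else
        ConsoleWrite($aRes[0] & @LF)
    EndIf
Next
```

The first tag prints Samuel Jeyaseelan and the second prints no match.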

But this one works too:

Local $sTxt = "", $oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    $sTxt &= $oLink.innertext & @CRLF
Next
MsgBox(0, "", $sTxt)

Edit

jchd already answered :)
