benners

Improving my StringRegExp


#1 ·  Posted

I am trying to build a link to a web page from properties in that page's source, following on from my thread here. Danp2 suggested another method, which I have taken on board.

I just need help refining the regex. The code below produces an array of matches, and I return either the first or second element depending on the method used. What I want to do is replace StringRegExp with StringRegExpReplace and return its result directly, so I am not relying on specifying array elements.

There are multiple matches for the regex pattern and currently they are all the same, so I could just return the first match found. Alternatively, I can narrow the pattern so it returns only one result, which should be the correct one. Is this possible to achieve with StringRegExpReplace?

#include <Inet.au3>
#include <Array.au3>

Local $s_Title = 'Update for Microsoft Office 2010 (KB3114555) 64-Bit Edition'
MsgBox(0, 'Url', Web_MUCGetUpdateDetailsUrl($s_Title))

Func Web_MUCGetUpdateDetailsUrl($s_Title)
    Local $s_Pattern = '[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}'
    Local $s_Source = _INetGetSource('https://catalog.update.microsoft.com/v7/site/Search.aspx?q=' & $s_Title)
    If @error Then Return SetError(@error, 1, '')

    ; I can get the first instance of the match with this. There are multiple matches for the pattern
    ; and they are currently all the same (this may be changed by MS), so I can just take the first instance
;~  Local $as_SRE = StringRegExp($s_Source, $s_Pattern, 3) ; returns multiple matches all the same
    Local $as_SRE = StringRegExp($s_Source, $s_Pattern, 1)

    ; To be sure it is the correct string I am better off using this, which returns 2 captures for the only match
    ; on the page: $as_SRE[0] is the whole onclick text and $as_SRE[1] is the string I require
;~  Local $as_SRE = StringRegExp($s_Source, '(onclick=''goToDetails\("' & '(' & $s_Pattern & ')' & '"\))', 1)

    ; Check @error directly: StringRegExp sets it to 1 when nothing matches and 2 when the pattern is invalid
    If @error Then Return SetError(@error, 2, '')

    _ArrayDisplay($as_SRE)

    Return 'https://catalog.update.microsoft.com/v7/site/ScopedViewInline.aspx?updateid=' & $as_SRE[0]
;~  Return 'https://catalog.update.microsoft.com/v7/site/ScopedViewInline.aspx?updateid=' & $as_SRE[1]
EndFunc   ;==>Web_MUCGetUpdateDetailsUrl

 


#2 ·  Posted

You can try this (untested)

$s_SRE = StringRegExpReplace($s_Source, '(?s).*?(' & $s_Pattern & ').*', "$1")
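
To see why this returns only the first match, here is a minimal self-contained sketch. The sample text is made up for illustration and is not real catalog output.

```autoit
; Hypothetical sample text standing in for the page source (an assumption, not real data)
Local $s_Sample = 'abc' & @CRLF & 'id=01234567-89ab-cdef-0123-456789abcdef' & @CRLF & 'xyz'
Local $s_Pattern = '[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}'

; (?s) lets . match line breaks, .*? lazily skips everything before the first match,
; (...) captures that match, and the trailing .* consumes the rest of the text.
; The whole input is therefore replaced by the contents of capture group 1.
Local $s_SRE = StringRegExpReplace($s_Sample, '(?s).*?(' & $s_Pattern & ').*', '$1')
Local $i_Count = @extended ; number of replacements performed; 0 would mean no match

ConsoleWrite($s_SRE & ' (' & $i_Count & ' replacement)' & @CRLF)
```

Because the pattern consumes the entire string in one go, later occurrences of the same pattern never get a chance to match, which is what removes the need to pick an array element.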

 


#3 ·  Posted (edited)

Well, it's tested now and works. I altered it slightly to include the onclick portion and use the 2nd capture group.

#include <Inet.au3>
#include <Array.au3>

Local $s_Title = 'Update for Microsoft Office 2010 (KB3114555) 64-Bit Edition'
MsgBox(0, '', Web_MUCGetUpdateDetailsUrl($s_Title))

Func Web_MUCGetUpdateDetailsUrl($s_Title)
    Local $s_Pattern = _
            'onclick=''goToDetails\("' & _ ; literal prefix, inside the outer (1st) capture group added in the call below
            '([[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12})' & _ ; 2nd capture group: the update ID
            '"\)'

    Local $s_Source = _INetGetSource('https://catalog.update.microsoft.com/v7/site/Search.aspx?q=' & $s_Title)
    If @error Then Return SetError(@error, 1, '')

    Return 'http://www.catalog.update.microsoft.com/ScopedViewInline.aspx?updateid=' & _
    StringRegExpReplace($s_Source, '(?s).*?(' & $s_Pattern & ').*', "$2")
EndFunc   ;==>Web_MUCGetUpdateDetailsUrl

Cheers mikell
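
One thing to watch with this approach: when the pattern does not match, StringRegExpReplace returns the input string unchanged, so the function above would append the whole page source to the URL. A minimal sketch of a guard, assuming it is enough to check @extended (the replacement count) right after the call. The helper name _ExtractUpdateId is hypothetical and not from this thread, and the outer capture group is dropped here so the ID is group 1.

```autoit
; Hypothetical helper (illustrative name): returns the update ID, or '' with @error set
Func _ExtractUpdateId($s_Source)
    Local $s_Guid = '[[:xdigit:]]{8}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{4}-[[:xdigit:]]{12}'
    Local $s_Pattern = 'onclick=''goToDetails\("(' & $s_Guid & ')"\)'

    Local $s_Id = StringRegExpReplace($s_Source, '(?s).*?' & $s_Pattern & '.*', '$1')
    If @error Then Return SetError(@error, 1, '') ; invalid pattern
    If @extended = 0 Then Return SetError(1, 2, '') ; nothing replaced, so nothing matched
    Return $s_Id
EndFunc   ;==>_ExtractUpdateId

; Usage with a made-up snippet of markup (an assumption about the page's shape)
Local $s_Html = '<a onclick=''goToDetails("01234567-89ab-cdef-0123-456789abcdef")''>details</a>'
Local $s_Id = _ExtractUpdateId($s_Html)
If @error Then
    ConsoleWrite('No update ID found' & @CRLF)
Else
    ConsoleWrite($s_Id & @CRLF)
EndIf
```

The @error and @extended macros are read before any other function call, since a later call would overwrite them.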

Edited by benners
