molotofc

Searching for a string, after the string

31 posts in this topic

Dear all,

I'm not sure if I can explain this correctly.

Take this for example: http://www.bbc.co.uk/weather/2643743

On this webpage there is a part which says 'Sunrise' and then gives the time of sunrise (at the moment 07:43). The sunrise value obviously changes every day. I've not tried it, but StringInStr should have no problem finding the word 'Sunrise'; it can't search for a value that changes, though. What's the easiest and quickest way to find and return the sunrise value, or any other 'string of varying value'?

I think in VBA you can do something like this; not sure if AutoIt can.




Look up StringRegExp, especially the version that comes with the beta.


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Because in the beta, there is a brand new and pretty niiiice help page for StringRegExp :)

$code = BinaryToString(InetRead("http://www.bbc.co.uk/weather/2643743"))

$res = StringRegExpReplace($code, '(?s).+>Sunrise\h*([^<]+).+', "$1")
Msgbox(0,"", "Sunrise at :  " & $res)


abberration,
The sunrise thing was just an example; the real purpose is to find an unknown string which follows a known string.
In this case the regular expression finds the characters other than '<' that follow the string '>Sunrise '.


That's cool. I suggested that because I have seen people ask about something specific because they did not know that another way was possible.




Indeed it was just an example, but thanks for trying anyway :)

mikell: that works great! but could you please explain: 

$res = StringRegExpReplace($code, '(?s).+>Sunrise\h*([^<]+).+', "$1")

how exactly does it specify the next word to return - and if I want to go further, how could I look for the next next word?

Thank you


It's this bit: ([^<]+), which is captured in backreference $1

the meaning is:

Match the regular expression below and capture its match into backreference number 1 «([^<]+)»
   Match any character that is NOT a “<” «[^<]+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match any single character «.+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
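
To see why the pattern is wrapped in '(?s).+' and '.+', here is a minimal sketch on a made-up HTML fragment (the markup is an assumption, not the real BBC page). StringRegExpReplace replaces only what the pattern matched, so the wrappers make the match swallow the whole input and leave just the captured group. It also uses \h, so it assumes an AutoIt version whose regex engine supports it, like the beta mentioned above.

```autoit
; Hypothetical fragment used only for illustration
Local $sHtml = '<p>Today</p><span>Sunrise 07:43</span><span>Sunset 16:05</span>'

; The surrounding .+ parts consume everything before and after the capture,
; so the entire input is replaced by group 1.
Local $sWhole = StringRegExpReplace($sHtml, '(?s).+>Sunrise\h*([^<]+).+', "$1")
ConsoleWrite($sWhole & @CRLF) ; 07:43

; Without them only the matched part is replaced and the rest of the input stays.
Local $sPartial = StringRegExpReplace($sHtml, '>Sunrise\h*([^<]+)', "$1")
ConsoleWrite($sPartial & @CRLF) ; <p>Today</p><span07:43</span><span>Sunset 16:05</span>
```

One thing to keep in mind: when the pattern does not match at all, StringRegExpReplace hands back the input unchanged, so it is worth checking the result before trusting it.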


For this, the relevant help file page becomes your best friend, as there are several ways to build a regular expression :)

In this particular case the string you want to get is matched by the part of expression between parentheses

>Sunrise\h* : the string from which the search begins, meaning '>Sunrise' followed by 0 or more horizontal whitespace characters

[^<]+ means : "one or more characters which are not the '<' character " , this definition obviously includes the digits and the colon and allows the search to stop at the first '<' character found

So you understand that the expression will probably need to be adapted if you need to match something different
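
As for going further and picking up a second value, one possible approach is to add a second pair of parentheses and let StringRegExp return the captured groups as an array (flag 1). This is only a sketch: the HTML fragment and the 'Sunset' label are assumptions made for the example, and \h needs an AutoIt version that supports it.

```autoit
; Hypothetical fragment; the real page markup may differ
Local $sHtml = '<span class="sunrise">Sunrise 07:43</span><span class="sunset">Sunset 16:05</span>'

; Flag 1 returns an array with the captured groups of the first match.
; Group 1 = text after '>Sunrise', group 2 = text after '>Sunset'.
Local $aTimes = StringRegExp($sHtml, '(?s)>Sunrise\h*([^<]+).*?>Sunset\h*([^<]+)', 1)
If @error Then
    ConsoleWrite("Pattern not found" & @CRLF)
Else
    ConsoleWrite("Sunrise: " & $aTimes[0] & @CRLF)
    ConsoleWrite("Sunset: " & $aTimes[1] & @CRLF)
EndIf
```

With StringRegExp there is no need for the leading and trailing '.+', because the function returns the groups directly instead of rewriting the input.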


Hi all,

This thread is looking at a problem that seems very similar to one I'm trying to solve myself at the moment. Apologies if this is a hijack but perhaps this discussion could also help the original poster.

I'm brand new to AutoIt but seem to have got to grips with it pretty quickly, I'm really pleased to see such an active user base.

OK, so in my work we buy client information from a third party. They are market leaders and unfortunately they know it: their site hasn't been enhanced in over 10 years. It works fine, but given the amount of data we buy from them, an API would be a great help. Currently we have to enter this data manually, which is a complete waste of time for the member of staff doing it, so I've been trying to automate it.

So what happens is the provider emails us a snippet of the customer data we've bought; the full data is on their site, which we have to log in to view and then copy to our DB.

My solution so far uses a VBA macro in Outlook to read the unique ID of this 'snippet' email and write it to a text file then call the AutoIt executable which loads up IE, logs-in, navigates to the customer data we want by reading the unique ID from the text file and then clicking the link that matches the ID. So far so good.

However the full customer data is displayed in a human readable format and not so great for programmatic integration.

Luckily, for each page containing the customer data the tags either side are static. Here's an example:

<span id="NameLabel">Mr Joe Bloggs</span>

What I need to do is read the first and last name and write that to a file, I don't need the title. Obviously the first and last name can vary in length.

I have been looking at StringRegExp and believe the answer lies there. So I guess I need to find the first group of characters after the first white-space and before the second white-space, then the second group of characters after the second white-space and before the '<' character.

Another problem could potentially be that the title will differ e.g. Mr/Mrs/Miss/Dr./Father etc. so perhaps it needs to find the '<span id="NameLabel">' then ignore the characters after this string until the first white-space?

Any and all help graciously received.

John.


While grabbing data through regexps is a possibility, it won't be as robust as doing the same with the _IE* functions, which I strongly recommend.



Using regex, the safest way here would be to grab the full identity string

$res = StringRegExpReplace($code, '(?s).+<span id="NameLabel">([^<]+).+', "$1")

then parse the result with String* funcs, StringSplit or so, to prevent possible errors with names like "Sir John William Smith", "Mrs Dr. H. Smith" etc
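
A rough sketch of that two-step idea, run against the sample span from the question rather than the real page. The rule used here (the first token is the title, the second is the first name, the last token is the surname) is an assumption and will need adjusting for names like "Mrs Dr. H. Smith".

```autoit
; Sample markup taken from the question above
Local $sHtml = '<span id="NameLabel">Mr Joe Bloggs</span>'

; Step 1: grab everything between the opening tag and the next '<'
Local $aMatch = StringRegExp($sHtml, '<span id="NameLabel">([^<]+)', 1)
If @error Then
    ConsoleWrite("NameLabel not found" & @CRLF)
Else
    ; Step 2: trim, collapse repeated spaces (flag 7 = 1 + 2 + 4), then split on spaces
    Local $aParts = StringSplit(StringStripWS($aMatch[0], 7), " ")
    ; $aParts[0] holds the number of tokens
    If $aParts[0] >= 3 Then
        ConsoleWrite("First name: " & $aParts[2] & @CRLF)
        ConsoleWrite("Last name: " & $aParts[$aParts[0]] & @CRLF)
    Else
        ConsoleWrite("Unexpected name format: " & $aMatch[0] & @CRLF)
    EndIf
EndIf
```

Keeping the split separate from the regex means the name rules can be changed later without touching the pattern.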


Thanks guys. I haven't had time to return to this project just yet but I'm hoping to have some time for it later today. Jchd, regarding the _IE* functions, I was looking at them but couldn't find anything that would specifically read what I needed from the page. That's probably my lack of experience, so I'll read up on those sections again. Thanks also to mikell for providing that line of code.


#include <IE.au3>
$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743")

 ConsoleWrite(_getText('span', 'sunrise') & @CRLF)
 ConsoleWrite(_getText('span', 'sunset') & @CRLF)

Func _getText($sTagName = 'div', $className = '')
    Local $oCorrectObj = ''
    ; Relies on the global $oIE created above
    Local $oElements = $oIE.document.GetElementsByTagName($sTagName)
    ; Loop variable renamed so it no longer overwrites the tag name parameter
    For $oElement In $oElements
;~  $class_value = $oElement.GetAttribute("class")
        $class_value = $oElement.className
        If String($class_value) = $className Then
            $oCorrectObj = $oElement
;~      MsgBox(0, "Level: ", "Level found :)")
            ExitLoop
        EndIf
    Next
    If IsObj($oCorrectObj) Then
        Return _IEPropertyGet($oCorrectObj, "innertext")
    EndIf
    Return -1 ; no element with that class was found
EndFunc   ;==>_getText


Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times


However you do it, the _IE* way is (on my PC) at least about 12 times slower than the InetRead+regex one ^^


But it also works. :-) It was just a try to get it done.



However you do it, the _IE* way is (on my PC) at least about 12 times slower than the InetRead+regex one ^^

True, but the _IE* functions are immune to meaningless changes in the HTML flow, any of which can easily break regexp-based code.



StringInStr etc. based

$source = "On this webpage there is a part which says 'Sunrise' and then it gives the time of sunrise (at the moment 07:43). The sunrise value changes everyday obviously."
$substring = 'moment'
$iStart = StringInStr($source, $substring) + StringLen($substring) + 1 ; first character after the space that follows the substring
$iEnd = StringInStr($source, " ", 0, 1, $iStart) ; next space, i.e. the end of the following word
ConsoleWrite('The word after "' & $substring & '" is this -->' & @TAB & StringMid($source, $iStart, $iEnd - $iStart) & @CRLF)

just a proof of concept
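
The same idea can be wrapped in a small reusable function. This is only a sketch: _TextAfter is a made-up name (not part of any standard UDF) and the HTML fragment is invented for the example.

```autoit
; Hypothetical helper: returns the text between $sPrefix and the next $sStop.
; On failure it returns "" and sets @error (1 = prefix missing, 2 = stop string missing).
Func _TextAfter($sText, $sPrefix, $sStop)
    Local $iPos = StringInStr($sText, $sPrefix)
    If $iPos = 0 Then Return SetError(1, 0, "")
    Local $iStart = $iPos + StringLen($sPrefix) ; first character after the prefix
    Local $iEnd = StringInStr($sText, $sStop, 0, 1, $iStart) ; first stop string at or after $iStart
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringMid($sText, $iStart, $iEnd - $iStart)
EndFunc   ;==>_TextAfter

; Made-up fragment for illustration
Local $sHtml = '<span class="sunrise">Sunrise 07:43</span>'
Local $sSunrise = _TextAfter($sHtml, ">Sunrise ", "<")
ConsoleWrite($sSunrise & @CRLF) ; 07:43
```

Note that StringInStr is case-insensitive by default, so the prefix should be specific enough (here it includes the '>' and the trailing space) to avoid matching the wrong spot.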





If you use the UDF in my sig:

$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743", 0, 0)
_IELoadWait($oIE)
$aSunrise = BGe_IEGetDOMObjByXPathWithAttributes($oIE,"//span[@class='sunrise']")
$sTime = StringRegExpReplace($aSunrise[0].innertext,"[^\d:]","")
ConsoleWrite($sTime & @CRLF)
_IEQuit($oIE)

or, with just ie.au3:

#include <ie.au3>
$oIE = _IECreate("http://www.bbc.co.uk/weather/2643743", 0, 1)
_IELoadWait($oIE)

$oSpans = _IETagNameGetCollection($oIE, "span")
For $oSpan In $oSpans
    If String($oSpan.className) = "sunrise" Then
        $sFullText = String($oSpan.innertext)
        ConsoleWrite($sFullText & @CRLF)
        $sTime = StringRegExpReplace($sFullText,"[^\d:]","")
        ConsoleWrite($sTime & @CRLF)
        ExitLoop
    EndIf
Next
_IEQuit($oIE)

haha, just noticed an example was already provided...

Edited by jdelaney

IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.


True, but the _IE* functions are immune to meaningless changes in the HTML flow, any of which can easily break regexp-based code.

 

jc, if such changes are likely to occur I can hardly imagine any really immune solution <_<
