Searching for a string, after the string


You're misled by your assumptions. A new version of some server component, or a parameter change, can insert or remove any number of whitespace characters in the flow, at just about every point where they are allowed (yet meaningless) but uncommon. That breaks regexps, unless you insert [\h\s]* in UTF mode almost everywhere in your patterns, without ever missing a single place where whitespace can appear.

I had a few live examples of web services which would return differing versions almost every time you made the same request, even several times in a row within the same session. While the active content was exactly identical, the flow was incredibly different, with sometimes dozens or hundreds of spaces, linefeeds, tabs or other meaningless allowed whitespace at some place where another run had nothing at all.

I was totally unaware such a thing could happen and had to change hundreds of regexps to keep things working. IE functions are completely immune to such behavior.
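
A minimal sketch of what whitespace-tolerant patterns look like in practice. The HTML fragments, the class name and the variable names are made up for illustration and do not come from any real service; the idea is only that \s* (or \s+ where at least one blank is mandatory) goes at every point where whitespace is allowed.

```autoit
#include <StringConstants.au3>

; Two hypothetical renderings of the same content: one compact, one padded with blanks
Local $sTight = '<td class="val">42</td>'
Local $sLoose = '<td   class="val" >' & @CRLF & @TAB & '42' & @CRLF & '</td>'
Local $aSamples[2] = [$sTight, $sLoose]

; \s+ where a separator is required, \s* wherever whitespace is merely allowed
Local $sPattern = '<td\s+class="val"\s*>\s*(\d+)\s*</td>'

For $sHtml In $aSamples
    ; returns an array holding the captured group, or sets @error when nothing matches
    Local $aMatch = StringRegExp($sHtml, $sPattern, $STR_REGEXPARRAYMATCH)
    If @error Then
        ConsoleWrite("no match" & @CRLF)
    Else
        ConsoleWrite($aMatch[0] & @CRLF)
    EndIf
Next
```

Both samples should yield 42, whereas a pattern written against the compact form only would miss the padded one.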

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


An easy way to find the word after (and also the word before, if needed) a reference word within the whole text content of the web page.

Example:

#include <IE.au3>
Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743")
Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
_IEQuit($oIE)

$substring = "Sunset" ; <-- change the reference word here
$x = _WordAfter($source, $substring)
MsgBox(0, "Word after " & $substring, $x)

Func _WordBefore($source, $substring) ; returns the word before
    Local $a = StringInStr($source, $substring) ; position where the found string starts
    ; Local $b = $a + StringLen($substring) ; position just past the searched string
    Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
    Local $d = StringMid($source, $c + 1, $a - $c - 2) ; word before
    Return $d
EndFunc   ;==>_WordBefore

Func _WordAfter($source, $substring) ; returns the word after
    Local $a = StringInStr($source, $substring) ; position where the found string starts
    Local $b = $a + StringLen($substring) ; position just past the searched string
    Local $c = StringInStr($source, " ", 0, 2, $b) ; position of the space following the word after (second space)
    Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
    Return $d
EndFunc   ;==>_WordAfter
Chimp

small minds discuss people average minds discuss events great minds discuss ideas.... and use AutoIt....


  • 6 months later...

 

This is great, thanks. But how can I get it to work on words that have multiple occurrences, for example the word 'Range'?


molotofc,

It would be much easier if you pointed out exactly what you want to get.

$code = BinaryToString(InetRead("http://www.bbc.co.uk/weather/2643743"))

; isolate the needed part
$a = StringRegExpReplace($code, '(?s).+?environmental-summary.+?<h4>(.+?)[^>]+?clear.+', '$1')
;Msgbox(0,"", $a)

; remove tags and strip multiple spaces
$res = StringStripWS(StringRegExpReplace($a, '(?s)<.*?>', ""), 4)
Msgbox(0,"", $res)

It would be smart of people asking for help to specify completely, exactly and up front what they want to achieve and the context they are working in, rather than refining their "specifications" right after every answer given to a previous incomplete request.

Otherwise I'm afraid that helpers will become more and more reluctant to offer advice, noting that their efforts systematically miss the ever-moving targets imposed by those they are helping.


I agree with jchd when questions are the fruit of laziness. Still, sometimes it is not even clear to ourselves what we want to achieve, and the goal becomes clear only as we go along. Learning from mistakes is also a way to learn (when we act in good faith and not out of laziness).

Here is a slightly revised version of those functions,
debugged a little and with an optional "occurrence" parameter added, to be used when needed:

_WordBefore($source, $substring, [$occurrence = 1])
_WordAfter($source, $substring, [$occurrence = 1])

#include <IE.au3>
#include <StringConstants.au3>

Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743") ; ,0,0) add this to hide browser
; Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
Local $source = _IEPropertyGet($oIE, "innertext")
_IEQuit($oIE) ; quit the browser

$substring = "Sunset" ; <-- change the reference word here

$x = _WordAfter($source, $substring)
MsgBox(0, "Check Word after", "Word after " & $substring & " is:" & @CRLF & @CRLF & $x)
$x = _WordBefore($source, $substring)
MsgBox(0, "Check Word before", "Word before " & $substring & " is:" & @CRLF & @CRLF & $x)

Func _WordBefore($source, $substring, $occurrence = 1) ; returns the word before
    ;
    ; ------------ clean up the $source string a bit
    ; replaces characters Chr(9) thru Chr(13) (which are HorizontalTab, LineFeed, VerticalTab, FormFeed, and CarriageReturn) with a space
    For $i = 9 To 13
        $source = StringReplace($source, Chr($i), " ")
    Next
    ; this removes leading/trailing/double spaces
    $source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
    ; ------------
    If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
    Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
    If $a = 1 Or Not $a Then Return SetError(1, 0, "") ; searched word is the first in the string, or was not found
    Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
    Local $d = StringMid($source, $c + 1, $a - $c - 1) ; word before
    Return $d
EndFunc   ;==>_WordBefore

Func _WordAfter($source, $substring, $occurrence = 1) ; returns the word after
    ;
    ; ------------ clean up the $source string a bit
    ; replaces characters Chr(9) thru Chr(13) (which are HorizontalTab, LineFeed, VerticalTab, FormFeed, and CarriageReturn) with a space
    For $i = 9 To 13
        $source = StringReplace($source, Chr($i), " ")
    Next
    ; this removes leading/trailing/double spaces
    $source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
    ; ------------
    If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
    Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
    If Not $a Then Return SetError(1, 0, "") ; searched word not found
    Local $b = $a + StringLen($substring) ; position just past the searched string
    Local $c = StringInStr($source, " ", 0, 2, $b) ; position of the space following the word after (second space)
    Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
    Return $d
EndFunc   ;==>_WordAfter

Please, could someone who knows how to use regexps tell me how to translate this snippet into a single regexp?

; ------------ clean up the $source string a bit
; replaces characters Chr(9) thru Chr(13) (which are HorizontalTab, LineFeed, VerticalTab, FormFeed, and CarriageReturn) with a space
For $i = 9 To 13
    $source = StringReplace($source, Chr($i), " ")
Next
; this removes leading/trailing/double spaces
$source = StringStripWS($source, $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
; ------------

thanks

 


Chimp,

StringStripWS: "WS includes Chr(9) thru Chr(13)", so $source = StringStripWS($source, 7) is enough.

 

Hi mikell

I would rather replace those Chr() characters with spaces, not remove them.

thanks


Oh, I understand: you don't want to keep the newlines.

StringRegExpReplace($source, '\s', " ") should work.

 

Thanks mikell, it works :)

It replaces only Chr 9, 10, 12 and 13 (not 11), but I think that will do anyway.

This simplifies my code a bit.

#include <IE.au3>
#include <StringConstants.au3>

Local $oIE = _IECreate("http://www.bbc.co.uk/weather/2643743") ; ,0,0) add this to hide browser
; Local $source = _IEBodyReadText($oIE) ; retrieves the whole text content of the web page
Local $source = _IEPropertyGet($oIE, "innertext")
_IEQuit($oIE) ; quit the browser

$substring = "sunset" ; <-- change the reference word here

$x = _WordAfter($source, $substring, 1) ; 1 = find the first occurrence of the substring and return the word after it
MsgBox(0, "Check Word after", "Word after " & $substring & " is:" & @CRLF & @CRLF & $x, 5)
$x = _WordBefore($source, $substring, 1) ; 1 = find the first occurrence of the substring and return the word before it
MsgBox(0, "Check Word before", "Word before " & $substring & " is:" & @CRLF & @CRLF & $x, 5)

Func _WordBefore($source, $substring, $occurrence = 1) ; returns the word before
    ; this replaces Chr 9, 10, 12, 13 with spaces, then strips leading/trailing/double spaces
    $source = StringStripWS(StringRegExpReplace($source, '\s', " "), $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
    If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
    Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
    If $a = 1 Or Not $a Then Return SetError(1, 0, "") ; searched word is the first in the string, or was not found
    Local $c = StringInStr($source, " ", 0, -2, $a) ; position of the space preceding the word before
    Local $d = StringMid($source, $c + 1, $a - $c - 1) ; word before
    Return $d
EndFunc   ;==>_WordBefore

Func _WordAfter($source, $substring, $occurrence = 1) ; returns the word after
    ; this replaces Chr 9, 10, 12, 13 with spaces, then strips leading/trailing/double spaces
    $source = StringStripWS(StringRegExpReplace($source, '\s', " "), $STR_STRIPLEADING + $STR_STRIPTRAILING + $STR_STRIPSPACES)
    If $substring = " " Then Return SetError(1, 0, "") ; single space search not allowed
    Local $a = StringInStr($source, $substring, 0, $occurrence) ; position where the found string starts
    If Not $a Then Return SetError(1, 0, "") ; searched word not found
    Local $b = $a + StringLen($substring) ; position just past the searched string
    Local $c = StringInStr($source, " ", 0, 2, $b) ; position of the space following the word after (second space)
    Local $d = StringMid($source, $b + 1, $c - $b - 1) ; word after
    Return $d
EndFunc   ;==>_WordAfter

Thanks again.
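
For comparison, the "word after the Nth occurrence" lookup can also be sketched with a single regexp instead of position arithmetic. This is only an illustration under stated assumptions: _WordAfterRE and its parameter names are hypothetical and not part of the functions above, and the sample text is made up. Note that it counts only occurrences that are followed by whitespace and another word, so its numbering can differ from the StringInStr-based version, and a search word containing \E would break the \Q...\E quoting.

```autoit
#include <StringConstants.au3>

; Hypothetical helper: returns the word following the $iOccurrence-th match of $sWord,
; or "" with @error = 1 when there is no such match.
Func _WordAfterRE($sText, $sWord, $iOccurrence = 1)
    ; (?i) mirrors the case-insensitive default of StringInStr; \Q...\E takes the word literally;
    ; \s+ absorbs any run of blanks, tabs or newlines; (\S+) captures the next word
    Local $aHits = StringRegExp($sText, '(?i)\Q' & $sWord & '\E\s+(\S+)', $STR_REGEXPARRAYGLOBALMATCH)
    If @error Or $iOccurrence < 1 Or $iOccurrence > UBound($aHits) Then Return SetError(1, 0, "")
    Return $aHits[$iOccurrence - 1]
EndFunc   ;==>_WordAfterRE

; Made-up sample: the second "Range" is separated from its value by a tab
Local $sSample = "Range 10 km" & @CRLF & "Range" & @TAB & "25 km"
ConsoleWrite(_WordAfterRE($sSample, "Range", 2) & @CRLF) ; prints 25
```

Because \s+ already tolerates any mix of whitespace, no separate cleanup pass over the source text is needed in this variant.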

 


Thanks a lot. I agree: sometimes I cannot find the solution myself because I am not actually clear about what I want, and I realise that being clear would help me get the appropriate help. What Chimp and mikell have posted is pretty much what I was looking for.

Many thanks.
