Posted

Hi folks,

I have managed to use _INetGetSource to fetch a page's source, which I have then parsed down to a string like:

<a href="/name/lockie" class="plain">Lockie</a> <a href="/name/aram-1" class="plain">Aram</a>

Now what I'm after is a way to reduce that to:

Lockie Aram

Those words will be random (it's a name) and I'd like the possibility of eventually having three or more names included. Each will be in that format in one string as supplied by the website.

Func _GetName()
    HttpSetProxy(0)

    $sSource = _INetGetSource('http://www.behindthename.com/random/random.php?number=2&gender=m&surname=&all=yes')
    ConsoleWrite($sSource & @CRLF)
    If StringInStr($sSource, 'Cache Access Denied.') then 
        MsgBox(0, "Proxy Blocked", "Cache Access Denied.",2)
        Return -1
    EndIf
    
    $sNamehtml = StringRegExp($sSource, '(<a href="/name/).*(" class="plain">).*(</a>)', 2)
    For $i = 0 To Ubound($sNamehtml)-1
        MsgBox(0, $i,$sNamehtml[$i])    
    Next
    $name = StringRegExpReplace($sNamehtml[0], '<.*">', '')
    MsgBox(0, 'Name', $name)
;~  Return $name
    
EndFunc;==> _GetName

Can anyone please help with this?

Posted

Hi nikink,

Just a note on Regular Expressions and AutoIt in case you didn't know.

AutoIt uses the PCRE (Perl Compatible Regular Expressions) engine.

If you are using Google for tutorials or just searching for specifics, keep this in mind when looking for expressions or help with expressions.

Some things that may help you along with future questions or concerns:

QuickStart

Tutorials

RegExCoach

Posted

Well, this seems to work for now, but I think there should be a more efficient way. Anyone? Anyone? Bueller?

Func _GetName()
    HttpSetProxy(0)

    $sSource = _INetGetSource('http://www.behindthename.com/random/random.php?number=2&gender=m&surname=&all=yes')
;~  ConsoleWrite($sSource & @CRLF)
    If StringInStr($sSource, 'Cache Access Denied.') then 
        MsgBox(0, "Proxy Blocked", "Cache Access Denied.",2)
        Return -1
    EndIf
    
    $sNamehtml = StringRegExp($sSource, '(<a href="/name/).*(" class="plain">).*(</a>)', 2)
;~  For $i = 0 To Ubound($sNamehtml)-1
;~      MsgBox(0, $i,$sNamehtml[$i])    
;~  Next

;
;I would think that these steps can be done much more elegantly... am I right?
;
    ConsoleWrite('html string='&$sNamehtml[0] & @CRLF)
    $name = StringRegExpReplace($sNamehtml[0], '</a>', ''); Strip the </a>
    ConsoleWrite('Name 1='&$name & @CRLF)
    $name = StringRegExpReplace($name, '<.*?">', ''); Strip the Preceding html markup
    ConsoleWrite('Name 2='&$name & @CRLF)
    $name = StringRegExpReplace($name, '\h{2}', ' '); Reduce multiple spaces to one.
    ConsoleWrite('Name 3='&$name & @CRLF)

;~  Return $name
    
EndFunc;==> _GetName
Posted

#include <Inet.au3>

ConsoleWrite(_GetName())

Func _GetName()
    HttpSetProxy(0)

    Local $sSource = _INetGetSource('http://www.behindthename.com/random/random.php?number=2&gender=m&surname=&all=yes')
    
    If StringInStr($sSource, 'Cache Access Denied.') Then 
        MsgBox(16, "Proxy Blocked", "Cache Access Denied.", 2)
        Return -1
    EndIf
    
    Local $sRetName = StringRegExpReplace($sSource, _
        '(?s).*?<a href="/name/.*?" class="plain">(.*?)</a> +<a href="/name/.*?" class="plain">(.*?)</a>.*$', '\1 \2')
    
    Return $sRetName
EndFunc  ;==> _GetName

 


AutoIt is simple, subtle, elegant. © AutoIt Team

Posted
  MrCreatoR said:

Local $sRetName = StringRegExpReplace($sSource, _
        '(?s).*?<a href="/name/.*?" class="plain">(.*?)</a> +<a href="/name/.*?" class="plain">(.*?)</a>.*$', '\1 \2')
    
    Return $sRetName
Thank you very much! (I don't suppose you (or someone else) could explain what it's doing for me? I am trying to learn this stuff...) :P
  • Moderators
Posted
  nikink said:

Thank you very much! (I don't suppose you (or someone else) could explain what it's doing for me? I am trying to learn this stuff...) :P

You were given 3 links in the 2nd post that could explain what it is doing.

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.

Posted

  SmOke_N said:

You were given 3 links in the 2nd post that could explain what it is doing.

And I've read them and reread them, and still have trouble understanding.

Oh well, thanks anyway. I'm sure I'll get it eventually! :P

  • Moderators
Posted

  nikink said:

And I've read them and reread them, and still have trouble understanding.

Oh well, thanks anyway. I'm sure I'll get it eventually! :P

'(?s).*?<a href="/name/.*?" class="plain">(.*?)</a> +<a href="/name/.*?" class="plain">(.*?)</a>.*$'

Only a few regular-expression constructs are used here.

(?s)

.*?

(.*?)

.*

$

That's what you would look up to understand.
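As a rough illustration of the first few constructs, here is a minimal sketch. The sample strings are made up for the example and are not taken from the website's source.

```autoit
Local $sText = "one" & @LF & "two"

; With the default flag (0), StringRegExp returns 1 for a match and 0 for no match.
ConsoleWrite(StringRegExp($sText, "one.two") & @CRLF)     ; 0: "." does not match the line feed
ConsoleWrite(StringRegExp($sText, "(?s)one.two") & @CRLF) ; 1: with (?s), "." matches the line feed too

; Greedy .* takes as much as it can; lazy .*? takes as little as it can.
ConsoleWrite(StringRegExpReplace("<b>x</b>", "<.*>", "") & @CRLF)  ; the whole string matches, so an empty line is printed
ConsoleWrite(StringRegExpReplace("<b>x</b>", "<.*?>", "") & @CRLF) ; x
```

The greedy versus lazy difference is also why a pattern like '<.*">' can swallow more of the HTML than intended when two links sit on one line.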


Posted

(?s) : a flag that makes "." match any character, including newline

.*? : match any single character except newline, repeat Zero or more times, as few times as possible for a match

(.*?) : a group of characters, match any single character except newline, repeat Zero or more times, as few times as possible for a match

.* : match any single character except newline, repeat Zero or more times

$ : end of string

Right...?

So:

'match anything including new line

match any single character except newline, repeat Zero or more times, as few times as possible for a match

match <a href="/name/

match any single character except newline, repeat Zero or more times, as few times as possible for a match

match " class="plain">

a group of characters, match any single character except newline, repeat Zero or more times, as few times as possible for a match

match </a>

match space one or more times

match <a href="/name/

match any single character except newline, repeat Zero or more times, as few times as possible for a match

match " class="plain">

a group of characters, match any single character except newline, repeat Zero or more times, as few times as possible for a match

match </a>

match any single character except newline, repeat Zero or more times

match end of string'

I long for the day when this becomes something easier than trudging up hill through molasses for me.

So that line of regex returns 2 (and only 2) groups of characters. Why is that StringRegExpReplace?

The first (.*?) group gets put into \1, right? and the second (.*?) into \2?

(So it won't work as written for more than 2 names at a time.)

Why are the matching patterns not replaced by '\1 \2'? And why does the StringRegExpReplace function return two strings (ok, one string made of two matches separated by a space)?

As my fumbling attempt shows, I thought StringRegExp would be the way to go as it can return an arbitrarily sized array of matches.

Am I overthinking this? Am I making an incorrect assumption somewhere?

Anyway, I do appreciate the help given in this forum. :P

Posted

bumping cuz I really would like an answer to this.

Simply put, why does stringregexpreplace('pattern1sourcepattern2', 'patterntomatch', '\1 \2') return 'pattern1 pattern2' instead of '\1source\2'?

StringRegExpReplace: Replace text in a string based on regular expressions.

Return Value

@Error Meaning

0 Executed properly. Check @Extended for the number of replacements performed.

2 Pattern invalid. @Extended = offset of error in pattern.

from the help file.

Am I just dumb?

Posted

  Quote

why does stringregexpreplace('pattern1sourcepattern2', 'patterntomatch', '\1 \2') return 'pattern1 pattern2' instead of '\1source\2'?

Who says it should return '\1source\2'? :P

Your pattern is not matching the string, and frankly I don't understand what you are expecting from it. What is «\1 \2»? Are you trying to get the string between pattern1 and pattern2? Then try this:

$String = StringRegExpReplace('pattern1sourcepattern2', 'pattern1(.*)pattern2', '\1')
ConsoleWrite($String)

 


Posted

  Quote

Why are the matching patterns not replaced by '\1 \2'?

You should group them:

$String = StringRegExpReplace('My Group10 String Group20', '.*(Group\d+).*(Group\d+)', '\1 and \2')

ConsoleWrite($String)

 


Posted

  MrCreatoR said:

Who says it should return '\1source\2'? :P

Your pattern is not matching the string, and frankly I don't understand what you are expecting from it. What is «\1 \2»? Are you trying to get the string between pattern1 and pattern2? Then try this:

$String = StringRegExpReplace('pattern1sourcepattern2', 'pattern1(.*)pattern2', '\1')
ConsoleWrite($String)

Hm. I guess what I'm expecting is for that part of the source that matches the supplied pattern to be *replaced* by characters specified and thus the source returned with the matched pattern replaced.

I'm getting confused by StringRegExpReplace *not* returning the original string (with replacements if any), but returning the actual parts of the string that match the pattern.

Does that make more sense as to my confusion?

(Thank you, btw, I will play around with the examples of code you've given... :()

  • Moderators
Posted (edited)

  nikink said:

Hm. I guess what I'm expecting is for that part of the source that matches the supplied pattern to be *replaced* by characters specified and thus the source returned with the matched pattern replaced.

I'm getting confused by StringRegExpReplace *not* returning the original string (with replacements if any), but returning the actual parts of the string that match the pattern.

Does that make more sense as to my confusion?

(Thank you, btw, I will play around with the examples of code you've given... :P)

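StringRegExpReplace returns the whole string, with each part that the pattern (the 2nd parameter) matches replaced by the 3rd parameter. Text outside the match is left alone, so when the pattern matches the entire string, only the replacement comes back.

Everything captured in parentheses can be put back into the string that is returned.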

$s_string = "Iamastring"

$s_temp = StringRegExpReplace($s_string, "(.am)(.)(string)", "\1")

$s_temp will = Iam

$s_temp = StringRegExpReplace($s_string, "(.am)(.)(string)", "\2")

$s_temp will = a

$s_temp = StringRegExpReplace($s_string, "(.am)(.)(string)", "\3")

$s_temp will = string

$s_temp = StringRegExpReplace($s_string, "(.)(am)(.)(string)", "\1 \2 \3 \4")

$s_temp will = I am a string

^^ I added spaces in the replacement param...

So you see, for every parenthesis setup, we can add back to the string what we replaced up to 9 deep (back referencing with StringRegExpReplace() is only capable of 9 back references).

@extended for StringRegExpReplace will return the number of replacements made. (I don't use anything else but that, if it's zero, I know it failed, or there was nothing in the string I passed).

So though the two are similar (StringRegExp and StringRegExpReplace), you'll see they accomplish two different things.
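To make the difference between a partial match and a whole-string match concrete, here is a small sketch. The "xx" and "yy" padding is made up for the example; the patterns are the ones used above.

```autoit
; The pattern covers only "Iamastring", so the surrounding "xx" and "yy" survive.
ConsoleWrite(StringRegExpReplace("xxIamastringyy", "(.am)(.)(string)", "\1") & @CRLF)     ; xxIamyy

; Padding the pattern with .* makes the match span the whole input,
; so nothing but the replacement is left.
ConsoleWrite(StringRegExpReplace("xxIamastringyy", ".*(.am)(.)(string).*", "\1") & @CRLF) ; Iam
```

The name-extraction pattern earlier in the thread works the second way: its leading .*? and trailing .*$ swallow everything around the two links, which is why only '\1 \2' comes back.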

Edited by SmOke_N


Posted

Btw, we can (re)use \1 inside the pattern:

$sWeb_Site = 'somewebsite.com'
$sString = '<a href="http://' & $sWeb_Site & '">' & $sWeb_Site & '</a>'

$sRet = StringRegExpReplace($sString, '(?i)<(\w+) href="http://(.*?)">\2</\1>', '\2')

ConsoleWrite($sRet & @CRLF & @extended & @CRLF)

Not sure why we need it, but it's good to know :P

 


Posted

Thank you very much MrCreatoR and SmOke_N, you've given me a lot to think about and play with. :P It's making a bit more sense (in that I can see what it's doing and imagine a lot of usefulness arising), but I feel the function name is a little... misleading (not that I have a better one)... and the help file explanation and example are not very clear.

I appreciate your patience and helpfulness! :(

Posted

  Quote

but I feel the function name is a little... misleading

How?

StringRegExpReplace...

String - Representing the first parameter (the initial string to test :( ).

RegExp - Representing the second parameter (Regular Expression - the pattern).

Replace - Representing the 3rd parameter (the replacement string, which supports a limited kind of expression: group back-references).

So.. what is misleading here? :P

 


Posted (edited)

  MrCreatoR said:

How?

StringRegExpReplace...

String - Representing the first parameter (the initial string to test :( ).

RegExp - Representing the second parameter (Regular Expression - the pattern).

Replace - Representing the 3rd parameter (the replacement string, which supports a limited kind of expression: group back-references).

So.. what is misleading here? :P

Well, here's the example code:

MsgBox(0, "Regular Expression Replace Test", StringRegExpReplace("Where have all the flowers gone, long time passing?", "[aeiou]", "@"))

String = "Where have all the flowers gone, long time passing?"

RegExp = "[aeiou]"

Replace = "@"

And it displays "Wh@r@ h@v@ @ll th@ fl@w@rs g@n@, l@ng t@m@ p@ss@ng?"

So it has Replaced all characters that match the RegExp in the String, and returned the String with all the Replacements done.

That seems nice and straightforward.

So when it returns only the part of the string that matches the pattern, it seems a bit odd, as it runs counter to the example given.

To me anyway. :idea: Obviously others have understood from that example and the help page exactly what the function does. I'm not ashamed to admit confusion or ignorance though. :) I am but a beginner...

Edited by nikink
Posted
  Quote

when it returns only the part of the string that matches the pattern, it seems a bit odd, as it runs counter to the example given.

It returns the initial string with the replacements applied as defined by the pattern, just as StringReplace() does, except that here you can use a regular expression, that's it :P . Can you tell me please what the expected return is in this example?

 


Posted

Going back to the original problem, here is a non-regexp way of doing this using the little-used _StringBetween....

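#include <Array.au3>
#include <String.au3>

$string = '<a href="/name/lockie" class="plain">Lockie</a> <a href="/name/aram-1" class="plain">Aram</a>'
; _StringBetween returns an array of every piece of text found between ">" and "<",
; including the single space that sits between the two links.
$array = _StringBetween($string, ">", "<")
; Join the pieces back together without adding a delimiter.
$string = _ArrayToString($array, "")

MsgBox(0, "", $string)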
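For three or more names, one regexp alternative is to let StringRegExp collect every match into an array and then join the array. This is only a sketch: the third link ("John") is made up for the example, and it assumes every name link keeps the href="/name/..." class="plain" shape shown in the first post.

```autoit
#include <Array.au3>

; Sample input in the format from the first post, with a made-up third name added.
Local $sHtml = '<a href="/name/lockie" class="plain">Lockie</a> ' & _
        '<a href="/name/aram-1" class="plain">Aram</a> ' & _
        '<a href="/name/john" class="plain">John</a>'

; Flag 3 returns an array holding the captured group from every match.
Local $aNames = StringRegExp($sHtml, '<a href="/name/[^"]*" class="plain">([^<]*)</a>', 3)

If @error Then
    ConsoleWrite("No names found" & @CRLF)
Else
    ; Join the captured names with a single space.
    ConsoleWrite(_ArrayToString($aNames, " ") & @CRLF) ; Lockie Aram John
EndIf
```

Because the pattern describes a single link and flag 3 repeats it across the input, the same code should cope with two, three or more names without changing the pattern.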
