PMac

Struggling with StringRegExp


PMac

Hi, I'm new to AutoIt, with some limited experience of Python, and I've been trying to figure out the text-handling side of it. I've written a script that opens Firefox, logs into my Travian (an online game) account, copies the page source to the clipboard and assigns it to a variable, but I've gotten stuck trying to parse the page source and extract the information I want. I followed the StringRegExp tutorial in the AutoIt Help file and was able to pull out a string, but when I tried to process that string through StringRegExp a second time, it threw an error.

Here's some code to illustrate the problem:

;Get page source and parse it for resource amounts and other info
Sleep(3000)
;opens the page source viewer in firefox
Send("^u")
;Give the browser time to respond
Sleep(100)
;Highlight the contents
Send("^a")
;Copy them to the clipboard
Send("^c")
$pagesource = ClipGet()

The section of HTML source I'm interested in looks like this:

<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>
<td id=l4 title=8>768/800</td>
<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>
<td id=l3 title=8>768/800</td>
<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>

<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>
<td id=l1 title=10>773/800</td>

To process it, I tried to isolate each resource by including its unique ID in the search pattern, which worked fine. This pulls out the stats for the wood resource, along with the HTML that identifies it as the wood resource:

$wood1 = StringRegExp($pagesource, "(id=l4 title=8>[0-9]{3,9}/[0-9]{3,9})", 1)
MsgBox(0, "Wood1", $wood1[0])

This returns "id=l4 title=8>768/800"

The problem came when I tried to remove the extraneous HTML in a second processing step:

$wood2 = StringRegExp($wood1, "([0-9]{3,9}/[0-9]{3,9})", 1)
MsgBox(0, "Wood2", $wood2[0])

This returns the following error:

"MsgBox(0, "Wood2", $wood2[0])

MsgBox(0, "Wood2", $wood2^ ERROR

Error: Subscript used with non-Array variable."

Can someone please point out what I'm doing wrong? Also, is there any good beginner material online? I've searched, but I've only found a handful of tutorials, and I've mostly been stumbling around in the dark, trying to work out how to do things from the official documentation.

Paulie

First of all, you are much better off using _INetGetSource() (from #include <INet.au3>) than copying the source out of the browser.

Secondly, try this:

#include <Array.au3>

; Sample of the HTML from the first post
Local $String = '<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>' & @CRLF & _
        '<td id=l4 title=8>768/800</td>' & @CRLF & _
        '<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>' & @CRLF & _
        '<td id=l3 title=8>768/800</td>' & @CRLF & _
        '<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>' & @CRLF & _
        '<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>' & @CRLF & _
        '<td id=l1 title=10>773/800</td>'

Local $NumberPattern = "\d{3,9}/\d{3,9}"       ; resource amounts, e.g. 768/800
Local $TitlePattern = '(?: title=")(.{4,6})"'  ; resource names, e.g. Wood

; Flag 3 returns an array of all matches in the string
Local $Result2 = StringRegExp($String, $NumberPattern, 3)
Local $Result1 = StringRegExp($String, $TitlePattern, 3)

; Pair each name with its amount; stop at the shorter list so a missing match can't overrun an array
Local $Bound = UBound($Result1)
If UBound($Result2) < $Bound Then $Bound = UBound($Result2)
Local $Combo[$Bound][2]
For $x = 0 To $Bound - 1
    $Combo[$x][0] = $Result1[$x]
    $Combo[$x][1] = $Result2[$x]
Next

_ArrayDisplay($Combo)

Or, if you want to do it your way and look up each value by its ID, what you need is a non-capturing group: (?: ...)
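A minimal sketch of the ID-based approach with a non-capturing group, assuming the HTML looks like the sample in the first post (l4 = Wood, l3 = Clay, l2 = Iron, l1 = Wheat). The array names $aIds, $aNames and $aStock are hypothetical, chosen only for illustration. Because the ID markup sits inside (?: ...), it has to match but is not returned, so flag 1 hands back only the "768/800" part from the capturing group:

```autoit
; Sample of the HTML from the first post, standing in for $pagesource
Local $pagesource = '<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>' & @CRLF & _
        '<td id=l4 title=8>768/800</td>' & @CRLF & _
        '<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>' & @CRLF & _
        '<td id=l3 title=8>768/800</td>' & @CRLF & _
        '<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>' & @CRLF & _
        '<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>' & @CRLF & _
        '<td id=l1 title=10>773/800</td>'

; Hypothetical lookup tables: cell ID and the resource it belongs to
Local $aIds[4] = ["l4", "l3", "l2", "l1"]
Local $aNames[4] = ["Wood", "Clay", "Iron", "Wheat"]
Local $aStock[4][2]
Local $aMatch

For $i = 0 To 3
    ; (?:...) must match the ID markup but is not returned; only (\d+/\d+) is captured
    $aMatch = StringRegExp($pagesource, "(?:id=" & $aIds[$i] & " title=\d+>)(\d+/\d+)", 1)
    $aStock[$i][0] = $aNames[$i]
    ; IsArray() guards against a failed match before indexing
    If IsArray($aMatch) Then
        $aStock[$i][1] = $aMatch[0]
    Else
        $aStock[$i][1] = "not found"
    EndIf
    ConsoleWrite($aStock[$i][0] & ": " & $aStock[$i][1] & @LF)
Next
```

Since only the numbers are captured, this gets the amount in one StringRegExp call per resource, without a second clean-up pass.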

Edited by Paulie

PsaltyDS

I don't have any problem running this:
$wood1 = "id=l4 title=8>768/800"
$wood2 = StringRegExp($wood1, "([0-9]{3,9}/[0-9]{3,9})", 1)
If @error Then
    MsgBox(16, "Error", "StringRegExp() failed, @error = " & @error & ", @extended = " & @extended & @LF)
Else
    MsgBox(0, "Wood2", $wood2[0])
EndIf
It returns "768/800".

Valuater's AutoIt 1-2-3, Class... Is now in Session! For those who want somebody to write the script for them: RentACoder "Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

PMac

Thanks. I tried _INetGetSource(), but the site requires cookies, so it returns the source of the login page instead, and at this point I've no idea where to begin with cookie handling to make it work. I've only been learning the language for the past couple of days, so I'm quite limited in what I can do, and learning to navigate with my browser seemed like a good place to start.

I'm not set on doing it any particular way, but I'll look up non-capturing groups and try to figure out how the code you posted works. Thanks for the pointers.

Edited by PMac

PMac

That works for me too, but when I change $wood1 to StringRegExp(<HTML source from clipboard>, <search pattern>), it causes an error when fed to $wood2, though I don't know why.
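A likely explanation, not confirmed in the thread: with flag 1, StringRegExp returns an array, so $wood1 is an array rather than a string. Passing the array itself to the second call means the pattern is not tested against the matched text (as far as I know, AutoIt treats an array used as a string as an empty string), so the second call finds no match, sets @error and does not return an array. Indexing it as $wood2[0] then gives "Subscript used with non-Array variable". That would also explain why the version with $wood1 set to a string literal works. A minimal sketch of the two-step version that indexes the first result, using one line of the posted HTML as sample input:

```autoit
; One line of the page source from the first post, standing in for $pagesource
Local $pagesource = '<td id=l4 title=8>768/800</td>'

; Step 1: with flag 1, StringRegExp returns an ARRAY of captured text
Local $wood1 = StringRegExp($pagesource, "(id=l4 title=8>[0-9]{3,9}/[0-9]{3,9})", 1)
If @error Then
    ConsoleWrite("First StringRegExp() failed, @error = " & @error & @LF)
    Exit 1
EndIf

; Step 2: pass the matched string $wood1[0], not the array $wood1 itself
Local $wood2 = StringRegExp($wood1[0], "([0-9]{3,9}/[0-9]{3,9})", 1)
If @error Then
    ConsoleWrite("Second StringRegExp() failed, @error = " & @error & @LF)
    Exit 1
EndIf

ConsoleWrite("Wood: " & $wood2[0] & @LF)
```

Checking @error after each call, as in PsaltyDS's example, turns the crash into a readable message that points at the call that failed.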

PsaltyDS

$wood1 = '<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>' & @CR & _
'<td id=l4 title=8>768/800</td>' & @CR & _
'<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>' & @CR & _
'<td id=l3 title=8>768/800</td>' & @CR & _
'<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>' & @CR & _
'<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>' & @CR & _
'<td id=l1 title=10>773/800</td>'

$wood2 = StringRegExp($wood1, "([0-9]{3,9}/[0-9]{3,9})", 1)
If @error Then
    MsgBox(16, "Error", "StringRegExp() failed, @error = " & @error & ", @extended = " & @extended & @LF)
Else
    MsgBox(0, "Wood2", $wood2[0])
EndIf

Still works fine...

PMac

However, this doesn't:

[screenshot not preserved]

PsaltyDS

What version of AutoIt are you running?

PMac

Version 3.2.12.1. I just installed the beta (v3.2.13.4) and tried that, but it gives the same error.

PsaltyDS

Can't help you. I get no errors running the code I posted with the same version of AutoIt.

PMac

Strange. I was using MsgBox() because it was the closest thing I could find to Python's print statement for getting feedback while testing my code, but today I found ConsoleWrite(), which is closer to what I wanted, and it works without any problems. The issue seems to be specific to MsgBox().

;This code works fine
$pagesource = '<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>' & @CR & _
'<td id=l4 title=8>768/800</td>' & @CR & _
'<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>' & @CR & _
'<td id=l3 title=8>768/800</td>' & @CR & _
'<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>' & @CR & _
'<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>' & @CR & _
'<td id=l1 title=10>773/800</td>'

$NumberPattern = "\d{3,9}/\d{3,9}"
$TitlePattern = '(?:title=")(.{4,6})"'

$Result2 = StringRegExp($pagesource, $NumberPattern, 3)
$Result1 = StringRegExp($pagesource, $TitlePattern, 3)

ConsoleWrite($Result2[0] & @LF)
ConsoleWrite($Result2[1] & @LF)
ConsoleWrite($Result2[2] & @LF)
ConsoleWrite($Result2[3] & @LF)
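A small extension of the ConsoleWrite version above: rather than hard-coding $Result2[0] to $Result2[3], this sketch loops over the matches with UBound() and prints each resource name next to its amount. It checks IsArray() first, so a failed match prints a message instead of raising a subscript error. The sample HTML again stands in for the real page source, and $iCount is a name introduced only for illustration.

```autoit
; Sample of the HTML from the first post, standing in for the clipboard contents
Local $pagesource = '<td><img class="res" src="img/un/r/1.gif" title="Wood"></td>' & @CR & _
        '<td id=l4 title=8>768/800</td>' & @CR & _
        '<td class="s7"> <img class="res" src="img/un/r/2.gif" title="Clay"></td>' & @CR & _
        '<td id=l3 title=8>768/800</td>' & @CR & _
        '<td class="s7"> <img class="res" src="img/un/r/3.gif" title="Iron"></td>' & @CR & _
        '<td id=l2 title=8>768/800</td><td class="s7"> <img class="res" src="img/un/r/4.gif" title="Wheat"></td>' & @CR & _
        '<td id=l1 title=10>773/800</td>'

Local $Result2 = StringRegExp($pagesource, "\d{3,9}/\d{3,9}", 3)        ; amounts
Local $Result1 = StringRegExp($pagesource, '(?:title=")(.{4,6})"', 3)   ; names

; Only index the arrays if both searches actually matched something
If Not IsArray($Result1) Or Not IsArray($Result2) Then
    ConsoleWrite("No matches found" & @LF)
    Exit 1
EndIf

; Loop over however many pairs were found instead of hard-coding [0]..[3]
Local $iCount = UBound($Result1)
If UBound($Result2) < $iCount Then $iCount = UBound($Result2)
For $i = 0 To $iCount - 1
    ConsoleWrite($Result1[$i] & ": " & $Result2[$i] & @LF)
Next
```

With the sample above this prints one line per resource (Wood, Clay, Iron, Wheat), and it keeps working if the page ever shows more or fewer resources.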
