
[SOLVED] Regular Expression Help



Ok, so um,

$array = StringRegExp($sHTML, '<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl4_articleLink" title="(.*?)" class="small_link" href="(.*?)">(.*?)</a>', 1, $nOffset)

Now, what I want to do is just grab the part between the > and </a>, lol, but I'm not sure how to do that. Here's an unformatted string:

<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink" title="INFORMATION ON THE TITLE" class="small_link" href="http://kb.mysite.com/blah">INFORMATION I WANT</a>

So, um, er, did I even come close to doing it right?

Edited by zackrspv

-_-------__--_-_-____---_-_--_-__-__-_ ^^€ñ†®øÞÿ ë×阮§ wï†høµ† ƒë@®, wï†høµ† †ïmë, @ñd wï†høµ† @ †ïmïdï†ÿ ƒø® !ïƒë. €×阮 ñø†, bµ† ïñ§†ë@d wï†hïñ, ñ@ÿ, †h®øµghøµ† †hë 맧ëñ§ë øƒ !ïƒë.


  • Moderators

First, learn about escape characters and anything else you need to know:

http://perldoc.perl.org/perlre.html#Regular-Expressions

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


I will not lie, I know JACK about regular expressions; in reality, I need something like Regular Expressions for Dummies, lol. That link you sent me is all well and fine, but since I'm not used to them, I have no clue how to go about doing what I want to do.

I do know, from past help, that (.*?) is something that I want, lol, and in the help file I can see why, but as for how to get just the data I want? No idea.

I know it's tedious to keep asking these questions, and sorry for it, but I'm just not sure what else to do :)


  • Moderators


That showed no desire to even learn what it is doing or how it is doing it... just wanting a quick fix.

Now the crappy part about that is that this is your 2nd thread on regexp, so this is where I say study and learn, because it won't be the last time you want to use them.

Me giving you the answer (which you almost have anyway; you are just grabbing too many things) is not going to make you learn something you are obviously using quite a bit.


I don't disagree with you, but I do not think that showed a lack of desire to learn. After all, I did download 'The Regex Coach' application to help me out here. My problem is that the link you gave me was way too... how do I say this... detailed for me to learn much from. The reason this is really my 3rd question on regexes is that I just don't understand the various operators and how they interact.

Right now, i have this:

[<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink"]>(.*?)<

According to 'The Regex Coach', that will display the information I need, but when I implement it, the result is not right: while it does show the information I need, it also shows tons of other information which I didn't (to my understanding) ask for.

So right now, it's not a matter of wanting the exact code I need, it's wanting to know exactly what I'm doing wrong and why it's displaying incorrectly. Knowing that, I can learn from my mistakes instead of just throwing random operators in there until I get what I want.

I'm sorry that you didn't get that from my post. I've been nothing but nice in all of my posts here, and have tried my darnedest to learn the programming language, GUI design requirements, etc. I would hope that I can extend that same mentality to this problem too. Sorry that you did not see that.


  • Moderators

Escaping special characters... A \ in front of a special character tells the engine to match that character literally, so I write \< and not just <

Also, making the expression as simple as possible will speed up the process and make it (believe it or not) less error-prone (on your part).

#include <Array.au3>
; Sample input: the anchor tag copied from the page source
$s = '<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink" title="INFORMATION ON THE TITLE" class="small_link" href="http://kb.mysite.com/blah">INFORMATION I WANT</a>'
; Flag 3 returns an array of all captured groups; the only group, (.*?), is the link text
$a = StringRegExp($s, "(?s)(?i)\<a id=._ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink.*?http:.*?>(.*?)\<", 3)
_ArrayDisplay($a)
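
A side note on the earlier attempt with square brackets, offered as a sketch and not as part of the original reply: in a regular expression, [...] is a character class that matches one single character out of the set, not the literal text inside it. That is probably why it pulled in so much extra text. The page fragment and the variable names below ($sPage, $sBad, $sGood) are made up for illustration.

```autoit
#include <Array.au3>

; Hypothetical page fragment: an unrelated tag followed by the link from this thread
$sPage = '<b id="x">NOT WANTED</b>' & _
        '<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink" title="INFORMATION ON THE TITLE" class="small_link" href="http://kb.mysite.com/blah">INFORMATION I WANT</a>'

; [ ... ] is a character class, so this really means:
; "any ONE of those characters, then >, then capture up to the next <"
$sBad = '[<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink"]>(.*?)<'

; Without the brackets the id is matched as literal text
$sGood = '(?s)<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink".*?>(.*?)<'

$aBad = StringRegExp($sPage, $sBad, 3)
$aGood = StringRegExp($sPage, $sGood, 3)

; The bracketed pattern also picks up NOT WANTED, because the quote in front of
; its > is one of the characters in the class
If IsArray($aBad) Then _ArrayDisplay($aBad, "With brackets")
If IsArray($aGood) Then _ArrayDisplay($aGood, "Without brackets")
```

On a single-link test string the two patterns can return the same thing, which may be why the problem only showed up on the real page.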


Aha, see, now that I can learn from. I've already been taught what (?s) and (?i) are used for. And now that I understand about escaping the <'s, it should be easier for me to find what I need. I note, however, that you also kept exactly what I wanted to see, i.e. search for the specified string, and then include whatever comes after it.

But, why .*?http:.*?

If I read the instructions right, it's asking it to find my exact string there (not caring what's between it and 'http'), and then not caring what's between 'http' and the >. But why was the > not escaped?

See, I do want to learn, and examples like these teach me so much more than a manual ever could.


  • Moderators


Actually, < isn't listed as a special character either, but it does have a special use sometimes, so out of habit I escape it. The > is fine unescaped.

http://www.cs.tut.fi/~jkorpela/perl/regexp.html

Look at the meta (special) characters, those are what you have to escape.

As far as what I did with .*? goes, I told it to match anything after

<a id=._ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink

and anything after

http

That in itself is enough (from the string you provided) to know if we have the right spot or not... _ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink being the deciding factor on that.

If it was the URL you wanted, then obviously you would keep that in.
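
To see what the ? after .* changes, here is a small sketch comparing the lazy and greedy forms. The two-link string $sTwo and its URLs are made up for the example and are not from the original page.

```autoit
#include <Array.au3>

; Hypothetical input with two links on one line
$sTwo = '<a href="http://a">FIRST</a><a href="http://b">SECOND</a>'

; Lazy: .*? stops at the first place the rest of the pattern can match
$aLazy = StringRegExp($sTwo, '<a.*?>(.*?)<', 3)

; Greedy: .* grabs as much as it can and only gives back what it must
$aGreedy = StringRegExp($sTwo, '<a.*>(.*)<', 3)

_ArrayDisplay($aLazy, "Lazy")     ; FIRST and SECOND
_ArrayDisplay($aGreedy, "Greedy") ; only SECOND; the greedy .* swallowed the first link
```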


It was awesome of you to help explain that. I was able to find a flaw in my requirement, so I changed the code to this:

(?s)(?i)\<a class=small_link id=_ctl0_LatestAdditions_GenericdBGrid__ctl.*?>(.*?)\<

I noted that each link has its own ctl code (ctl3, ctl4, etc.), so by cutting the pattern off just before the number, I got it to display all the listings.

Is this still valid? I've tested it and it seems to work, but should I actually put this notation in there?

(?s)(?i)\<a class=small_link id=_ctl0_LatestAdditions_GenericdBGrid__ctl[0-9].*?>(.*?)\<
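
One possible way to pin the number down, offered as a sketch and not as the thread's answer: \d+ matches one or more digits, so ctl3 and ctl10 are both accepted and the _articleLink suffix can stay in the pattern. The two-link string $sList and its href values are invented for the example.

```autoit
#include <Array.au3>

; Hypothetical fragment with a one-digit and a two-digit ctl number
$sList = '<a class=small_link id=_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink href="http://kb.mysite.com/a">FIRST ARTICLE</a>' & _
        '<a class=small_link id=_ctl0_LatestAdditions_GenericdBGrid__ctl10_articleLink href="http://kb.mysite.com/b">SECOND ARTICLE</a>'

; \d+ stands for the changing number; everything else in the id is literal
$sPattern = '(?s)(?i)<a class=small_link id=_ctl0_LatestAdditions_GenericdBGrid__ctl\d+_articleLink.*?>(.*?)<'

; Flag 3 collects the captured link text of every match
$aTitles = StringRegExp($sList, $sPattern, 3)
If IsArray($aTitles) Then _ArrayDisplay($aTitles, "Link text")
```

Both versions in the post above also match here, since the .*? after the number absorbs any further digits; the tighter pattern mainly guards against matching ids that merely start the same way.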


  • Moderators

All a backslash does in front of a special character is tell it to look for that character literally.

However, putting it in front of ordinary characters such as z, b, r, n or others (check the site I gave) gives them an entirely different meaning.

\s says look for a whitespace character (space, tab, line break)

\n says look for a line feed

\r says look for a carriage return

\z says look for end of string

etc..
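
A quick sketch of those escapes in use. The sample string $sText is made up; with no flag given, StringRegExp simply returns 1 for a match and 0 for none.

```autoit
; Hypothetical sample: two words, a Windows line break, then a third word
$sText = "foo bar" & @CRLF & "baz"

; \s matches one whitespace character, here the space between the words
ConsoleWrite(StringRegExp($sText, "foo\sbar") & @CRLF)   ; 1

; \r\n matches the carriage return + line feed pair
ConsoleWrite(StringRegExp($sText, "bar\r\nbaz") & @CRLF) ; 1

; \z only matches at the very end of the string
ConsoleWrite(StringRegExp($sText, "baz\z") & @CRLF)      ; 1
ConsoleWrite(StringRegExp($sText, "bar\z") & @CRLF)      ; 0, "bar" is not at the end
```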


If you happen to be wanted by certain people who want to pain you, you might not be the kind of person who trusts regular expressions; for us - another option:

$result = "What the flock went wrong!?"
$myStr = '<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink" title="INFORMATION ON THE TITLE" class="small_link" href="http://kb.mysite.com/blah">INFORMATION I WANT</a>'
$beginStr = StringInStr($myStr, ">")
$beginStr += 1
$endStr = StringInStr($myStr, "</a>")
$countChars = $endStr - $beginStr
$result = StringMid($myStr, $beginStr, $countChars)
MsgBox(0, "My Data", $result)

Das Häschen benutzt Radar


  • Moderators

1. I used to say the same thing about RegExp, and I've found others do too, until they understand how to use them.

2. $result will never = "What the flock went wrong!?" after your StringMid() call, because StringMid() always overwrites it

3. You can't really think that he's not parsing an HTML file, so your attempt would have little effect; in fact, I'm sure it would return the wrong result 99.9% of the time :)
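
To illustrate point 3, a sketch with a made-up page fragment in which the link is not the first tag. $sPage and the variable names are hypothetical.

```autoit
; Hypothetical page fragment: the wanted link is preceded by another tag
$sPage = '<p>Intro</p><a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink" href="http://kb.mysite.com/blah">INFORMATION I WANT</a>'

; Substring approach: starts right after the FIRST ">" anywhere in the page
$iBegin = StringInStr($sPage, ">") + 1
$iEnd = StringInStr($sPage, "</a>")
ConsoleWrite(StringMid($sPage, $iBegin, $iEnd - $iBegin) & @CRLF)
; prints everything from "Intro" through the link text, markup included

; Regex approach: anchored on the id, so the leading tag is skipped
$aMatch = StringRegExp($sPage, '(?s)<a id="_ctl0_LatestAdditions_GenericdBGrid__ctl3_articleLink".*?>(.*?)<', 1)
If Not @error Then ConsoleWrite($aMatch[0] & @CRLF) ; INFORMATION I WANT
```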


@smokeN - Just off the top of my head, I'm inclined to feel myself officially corrected, as by any AutoIt MVP, especially considering that it was good looks that the gods gave to me - see my avatar - and it is only ever august, refined sense that the gods would have given to any genuine MVP. But what I was responding to was the first post in this thread, where he said he wanted to "grab the part between the > and </a>" with respect to the string given in that post, which I supposed wasn't expected to change much in its general structure (an HTML hyperlink).

And my rather elaborate code was intended as a quick cut and paste, for purposes of demonstrating the principle of never stumbling over regexes without having duly been issued a permit to be frustrated.

(A retort which is sometimes employed against me, and which I hereby suggest to you: "Yup, your stove pipe won't shut.")

Edited by Squirrely1


  • Moderators


I was just trying to explain the error in your original thought about the first post, point out the error in your code, and then point out the error in your thinking about regular expressions :) .

I agree that regular expressions can be a pain sometimes; I certainly didn't get them right away. Once you have a "basic" grasp of the syntax, though, it is fun turning 5 or 20 line string manipulation functions into one :).

Just a side note: I still don't have regular expressions down completely. StringRegExpReplace throws me for a loop often.
