Sparrowlord

Need StringRegExp Help


I'm trying to get the content between two tags using StringRegExp, but it's not working. :D

<h1 class="r_outline">
 // lots of random crap between
</h1>

I tried the following with no luck:

StringRegExp($source, '<(?i)h1 class="r_outline">(.*?)</(?i)h1>')

Help? :D


The "." metacharacter does not match newlines by default. Try adding (?s) in front of your regexp.
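
To illustrate the point about "." and newlines, here is a minimal sketch. The contents of $sSample are made up for the demonstration, and it assumes the tag text in the real source matches the pattern character for character. Note that flag 1 is passed as the third parameter so the captured text comes back in an array.

```autoit
; Hypothetical sample: the tag content spans several lines.
Local $sSample = '<h1 class="r_outline">' & @CRLF & '  some heading text' & @CRLF & '</h1>'

; Without (?s), "." stops at line breaks, so the lazy group cannot reach </h1>.
Local $aPlain = StringRegExp($sSample, '(?i)<h1 class="r_outline">(.*?)</h1>', 1)
ConsoleWrite("Without (?s): " & (IsArray($aPlain) ? "match" : "no match") & @CRLF)

; With (?s), "." also matches newline characters.
Local $aDotAll = StringRegExp($sSample, '(?si)<h1 class="r_outline">(.*?)</h1>', 1)
If IsArray($aDotAll) Then ConsoleWrite("With (?s): [" & StringStripWS($aDotAll[0], 3) & "]" & @CRLF)
```

StringStripWS with flag 3 trims the leading and trailing whitespace that surrounds the captured text.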

I tried adding "(?s)" in front of my regexp, but that didn't work. Any more suggestions?


Try the _StringBetween function:

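#include <String.au3>
#include <Array.au3>
; Sample input: the tag with its content on a single line
$source = '<h1 class="r_outline">// lots of random crap between</h1>'
; _StringBetween returns an array of the substrings found between the two delimiters
$aArray1 = _StringBetween($source, '<h1 class="r_outline">', '</h1>')
_ArrayDisplay($aArray1)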

It works fine on my PC.


I can't get any of the suggestions to work, and I'm not sure whether that's because there is some whitespace in front of the content.

<h1 class="r_outline">
        // stuff I want here
      </h1>

That's exactly how it appears when I view the page source.


Try this:

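#include <String.au3>
#include <Array.au3>
#include <INet.au3>
; Download the page source (placeholder URL, replace it with the real page)
$source = _INetGetSource("http://www.xxxxx.com")
; Collect everything found between the opening and closing tag
$aArray1 = _StringBetween($source, '<h1 class="r_outline">', '</h1>')
_ArrayDisplay($aArray1)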

If you don't show us the page source, it's hard to help you.


Don't forget the third parameter of StringRegExp(). If you omit it, the flag defaults to 0 and the function only returns 1 or 0 for found or not found. (Use it like Manadar has his.)

And then try this expression:

"(?s)(?i)<h1\W*class=\x22r_outline\x22>.+?//\W*(.+?)\s*</h1>"
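
As a rough sketch of how the flag changes the result, the snippet below runs that expression against the sample posted earlier in the thread. The variable names are only for illustration, and the expression relies on the literal "//" that appears in the posted snippet, so it would need adjusting for real page content.

```autoit
; Sample copied from the snippet posted earlier in the thread.
Local $sSource = '<h1 class="r_outline">' & @CRLF & '        // stuff I want here' & @CRLF & '      </h1>'
Local $sPattern = "(?s)(?i)<h1\W*class=\x22r_outline\x22>.+?//\W*(.+?)\s*</h1>"

; Flag omitted (defaults to 0): only reports whether the pattern matched.
ConsoleWrite("Flag 0: " & StringRegExp($sSource, $sPattern) & @CRLF)

; Flag 1: returns an array holding the captured groups of the first match.
Local $aMatch = StringRegExp($sSource, $sPattern, 1)
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite("Flag 1: " & $aMatch[0] & @CRLF)
EndIf
```

Checking @error right after the call avoids indexing into a non-array when nothing matches.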



I figured out the problem: it appears that _IEDocReadHTML() returns markup that differs from the page source I had copied from Firefox. Once I wrote the output of _IEDocReadHTML() to a file, the difference was noticeable.

It had converted my tag to capital letters and removed the quotes around "r_outline". I adjusted for this and all is working well now.
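
For anyone hitting the same thing, here is one possible pattern that tolerates both forms. It is only a sketch: the second sample string is a guess at what the _IEDocReadHTML() output looked like, based on the description above, and the pattern and variable names are not taken from the thread.

```autoit
; Hypothetical samples: one as copied from Firefox, one shaped like the
; _IEDocReadHTML() output described above (uppercase tag, quotes removed).
Local $aSamples[2] = ['<h1 class="r_outline">' & @CRLF & '  Heading' & @CRLF & '</h1>', _
        '<H1 class=r_outline>' & @CRLF & '  Heading' & @CRLF & '</H1>']

; (?i) ignores case, (?s) lets "." cross line breaks, and the
; [\x22\x27]? parts make the quotes around the class value optional.
Local $sPattern = "(?si)<h1\s+class=[\x22\x27]?r_outline[\x22\x27]?\s*>(.*?)</h1>"

For $i = 0 To UBound($aSamples) - 1
    Local $aMatch = StringRegExp($aSamples[$i], $sPattern, 1)
    If @error Then
        ConsoleWrite("Sample " & $i & ": no match" & @CRLF)
    Else
        ConsoleWrite("Sample " & $i & ": " & StringStripWS($aMatch[0], 3) & @CRLF)
    EndIf
Next
```

Writing the quotes as \x22 and \x27 keeps them out of the AutoIt string literal, so no quote doubling is needed.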

Many thanks. :D
