Sparrowlord Posted August 31, 2009
I'm trying to get the content between two tags using StringRegExp, and I'm running into a problem: it's not working.
    <h1 class="r_outline">
    // lots of random crap between
    </h1>
I tried the following with no luck:
    StringRegExp($source, '<(?i)h1 class="r_outline">(.*?)</(?i)h1>')
Help?
jvanegmond Posted August 31, 2009
The dot (.) does not match newlines by default. Try adding (?s) in front of your regexp.
github.com/jvanegmond
Sparrowlord (Author) Posted August 31, 2009
I tried adding "(?s)" in front of my regexp, but that didn't work. Any more suggestions?
jvanegmond Posted August 31, 2009
It works when I try it:
    #include <Array.au3>
    $source = '<h1 class="r_outline">' & @CRLF & _
            ' // lots of random crap between' & @CRLF & _
            '</h1>' & @CRLF
    $regexp = StringRegExp($source, '(?s)<(?i)h1 class="r_outline">(.*?)</(?i)h1>', 3)
    If @error Then MsgBox(0, "", @error)
    _ArrayDisplay($regexp)
AuToItItAlIaNlOv3R Posted August 31, 2009
Try the _StringBetween function:
    #include <String.au3>
    #include <Array.au3>
    $source = '<h1 class="r_outline">// lots of random crap between</h1>'
    $aArray1 = _StringBetween($source, '<h1 class="r_outline">', '</h1>')
    _ArrayDisplay($aArray1)
It works fine on my PC.
Sparrowlord (Author) Posted August 31, 2009
I can't get any of the suggestions to work, and I'm not sure whether it's because there are some sort of spaces in front of it.
    <h1 class="r_outline">
    // stuff I want here
    </h1>
That's exactly how it appears when I view the page source.
AuToItItAlIaNlOv3R Posted August 31, 2009
Try this:
    #include <String.au3>
    #include <Array.au3>
    #include <INet.au3>
    $source = _INetGetSource("http://www.xxxxx.com")
    $aArray1 = _StringBetween($source, '<h1 class="r_outline">', '</h1>')
    _ArrayDisplay($aArray1)
If you don't show us the page source, it's hard to help you.
SmOke_N (Moderators) Posted August 31, 2009
Don't forget the 3rd parameter of StringRegExp(). If you leave it out, it defaults to zero and the function only returns a boolean for found or not found. Use it the way Manadar does in his example.
And then try this expression:
    "(?s)(?i)<h1\W*class=\x22r_outline\x22>.+?//\W*(.+?)\s*</h1>"
Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.
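To illustrate the point about the 3rd parameter, here is a minimal sketch that runs the same pattern with the default flag and with flag 3. The sample string and the variable names ($sSource, $sPattern, $iFound, $aMatches) are assumptions made up for this illustration, not taken from the original poster's page.

```autoit
#include <Array.au3>

; Sample input standing in for the real page source (an assumption for illustration).
Local $sSource = '<h1 class="r_outline">' & @CRLF & ' // text to capture' & @CRLF & '</h1>'
Local $sPattern = '(?is)<h1 class="r_outline">(.*?)</h1>'

; Flag omitted (defaults to 0): returns 1 if the pattern matches, 0 if it does not.
Local $iFound = StringRegExp($sSource, $sPattern)
ConsoleWrite("Match found: " & $iFound & @CRLF)

; Flag 3: returns an array of all global matches (the captured groups).
Local $aMatches = StringRegExp($sSource, $sPattern, 3)
If @error Then
    ConsoleWrite("No match, @error = " & @error & @CRLF)
Else
    _ArrayDisplay($aMatches)
EndIf
```

With the flag left out, there is nothing to pass to _ArrayDisplay(), which may be why a pattern that matches can still look like it "doesn't work".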
Sparrowlord (Author) Posted August 31, 2009
I figured out the problem. It appears that _IEDocReadHTML() changes the source code when it is executed (I had copied mine from Firefox). Once I wrote the output of _IEDocReadHTML() to a file, the difference was noticeable: it made my tag all capital letters and removed the quotes around "r_outline". I adjusted my pattern accordingly and all is working well now. Many thanks.
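For anyone hitting the same difference, one possible way to cope with both forms of the markup is a pattern that ignores tag case and treats the quotes around the attribute value as optional. This is only a sketch: the two sample strings are hypothetical reconstructions of what the poster describes (the Firefox view and the _IEDocReadHTML() output), and the pattern is not the one the poster ended up using.

```autoit
; Two hypothetical variants of the same markup: as seen in Firefox's
; "view source", and as _IEDocReadHTML() reportedly returned it.
Local $aSamples[2] = [ _
        '<h1 class="r_outline">' & @CRLF & ' // stuff I want here' & @CRLF & '</h1>', _
        '<H1 class=r_outline>' & @CRLF & ' // stuff I want here' & @CRLF & '</H1>']

; (?i) ignores letter case, (?s) lets . match newlines, and
; [\x22\x27]? makes the quote around the attribute value optional.
Local $sPattern = '(?is)<h1\s+class=[\x22\x27]?r_outline[\x22\x27]?\s*>(.*?)</h1>'

For $i = 0 To UBound($aSamples) - 1
    Local $aMatch = StringRegExp($aSamples[$i], $sPattern, 3)
    If @error Then
        ConsoleWrite("Sample " & $i & ": no match" & @CRLF)
    Else
        ; Strip leading and trailing whitespace from the captured text.
        ConsoleWrite("Sample " & $i & ": " & StringStripWS($aMatch[0], 3) & @CRLF)
    EndIf
Next
```

Writing the retrieved source to a file first, as described above, is still the most reliable way to see what the pattern actually has to match.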