motionman95

Read text from between html tags?

19 posts in this topic

I searched the documentation, and most of the examples I found involve launching the browser and then reading the source code. What I wanted to do was use _INetGetSource to get the source and then use some IE function to read some code from the page. Would that work? In all the examples I found, you had to use an open browser or a browser object.

Thanks for any replies!





Is there any special reason not to use the IE object? You can run it hidden in the background; you don't have to show a GUI.

_IEDocReadHTML ( ByRef $o_object )


Is it faster than using _INetGetSource and StringRegExp?



I'm not sure, especially as StringRegExp() will use some CPU power, but on the other hand I'd guess that's how it's done in IE.au3 as well (I haven't looked). Maybe it's a bit slower. But it has already been done, and you'll save a lot of coding time by using it instead of reinventing the wheel :)


I'm having problems. I'm using this code:

$results = StringRegExp($html,"(?U)<a>(.*)</a>",3)

But where it says

<a(HERE)>(.*)</a>

I want to put something that means "anything can go here".

Help please?



Hey,

This should work.

(?s) lets "." match newlines as well; it might not be needed.

$results = StringRegExp($html,"(?s)(?U)<a.*>(.*)</a>",3)
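
One caveat with the pattern above: `<a.*>` relies on (?U) to stop at the first ">", and it will also match any tag that merely starts with "a", such as <abbr>. Here is a minimal sketch of a slightly stricter variant, assuming the goal is the text of every <a> element. The sample $html string is made up for illustration and is not from the thread.

```autoit
; Hypothetical sample input; in the thread, $html comes from _INetGetSource().
Local $html = '<a href="one.html" class="l">First</a>' & @CRLF & _
        '<abbr>skip</abbr>' & @CRLF & _
        '<a>Second</a>'

; \b keeps "<a" from matching "<abbr"; [^>]* consumes any attributes up to ">".
; (.*?) is lazy, so each match ends at the nearest "</a>"; (?s) lets "." span newlines.
Local $results = StringRegExp($html, '(?s)<a\b[^>]*>(.*?)</a>', 3)

If @error Then
    ; @error is non-zero when nothing matched or the pattern is invalid
    ConsoleWrite("No matches found" & @CRLF)
Else
    For $i = 0 To UBound($results) - 1
        ConsoleWrite($results[$i] & @CRLF)
    Next
EndIf
```

With flag 3, StringRegExp returns one array element per capturing group per match, so this pattern yields a flat list of link texts.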


Thanks, that worked! The only problem I'm having now is when the text between the <a></a> tags contains a "|", as in

<a>ABC | 123</a>

It only shows "ABC ". Why is that? And is there a solution?



$html = '<a>ABC | 123</a>'

$results = StringRegExp($html,'(?s)(?U)<a.*>(.*)</a>', 1)
Dim $Text = ''
For $i = 0 To UBound($results)-1
    $Text &= $results[$i] & @CRLF
Next
MsgBox(0x40, 'Title', $Text)

It does match the '|' and ' 123' as well, but _ArrayDisplay() treats '|' as a special character.


Ahhh, thank you, Authenticity. This would have taken me all night if I hadn't refreshed this page.

$asResults = StringRegExp($sTest,"(?i:<a.*>)(.*?)(?i:</a>)",3)

0x576520616C6C206469652C206C697665206C69666520617320696620796F75207765726520696E20746865206C617374207365636F6E642E


Here is the correct way of viewing your data:

#include <array.au3>

$sTest = "<a src=""bitch.bitch.com"">ABC | 123</a>"

$asResults = StringRegExp($sTest,"(?i:<a.*>)(.*?)(?i:</a>)",3)

_ArrayDisplay($asResults, "Test String", -1, 0, "", "")

But that deletes the " " from your findings



What I'm trying to do is read the result titles from a Google search results page. This is the code I'm using:

$html = _INetGetSource("http://www.google.com/search?hl=en&q=spiderman")

$results = StringRegExp($html,"(?s)(?U)<li.*><h3.*><a.*>(.*)</a>", 3)
_ArrayDisplay($results)

But this is what it returns, and it is not completely correct:

[Attached image: post-44565-1234066454_thumb.gif]




So you need something more like ((?:(<em>)|(<b>)))?

Didn't test it though.

Edit: This is a bit more sophisticated:

#include <INET.au3>
#include <Array.au3>

$html = _INetGetSource("http://www.google.com/search?hl=en&q=spiderman")
$results = StringRegExp($html,"(?s)(?U)<li.*><h3.*><a.*>(.*)</a>", 3)

For $i = 0  To UBound($results)-1
    $results[$i] = StringRegExpReplace($results[$i], '((</?b>)|(</?em>))', '')
Next

_ArrayDisplay($results)
Edited by Authenticity


I have nothing against StringRegExp, but this is how it would be done with IE.au3...

#include <IE.au3>

$oIE = _IECreate("http://www.google.com/search?hl=en&q=spiderman")
$oLinks = _IELinkGetCollection($oIE)
For $oLink in $oLinks
    If String($oLink.classname) = "l" Then ConsoleWrite(_IEPropertyGet($oLink, "innertext") & @CRLF)
Next

Dale


Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

"Doesn't work" needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


I have nothing against StringRegExp

:):)^_^

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Is there a way to make it invisible?


Yup, but the object is still going to be loaded into memory anyway: $oIE = _IECreate(Default, 0, 0, 0, 0)
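
For reference, here is a sketch that combines the two replies: the same link loop as Dale's example, with the browser window hidden. It assumes the parameter order documented for IE.au3, _IECreate(url, try-attach, visible, wait, take-focus), where the third parameter controls visibility. The @error check and the _IEQuit call are additions that are not in the thread.

```autoit
#include <IE.au3>

; Third parameter 0 = create the browser window hidden.
; Wait (4th parameter) keeps its default so the page has loaded before we read it.
Local $oIE = _IECreate("http://www.google.com/search?hl=en&q=spiderman", 0, 0)
If @error Then Exit 1 ; could not create the browser object

Local $oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    ; "l" is the class name used for result links in the example earlier in the thread
    If String($oLink.className) = "l" Then ConsoleWrite(_IEPropertyGet($oLink, "innertext") & @CRLF)
Next

; A hidden instance has no window to close by hand, so shut it down explicitly.
_IEQuit($oIE)
```

As noted above, the browser object is still loaded into memory, so this hides the window but does not avoid the cost of starting IE.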


Is there a way to make it invisible?

Yes, look at the docs for _IECreate()

Dale

@Sm0ke - honest!




Thanks, guys, for being patient and for all the help. I promise this is the last question, LOL. Anyway, I'm trying to detect a double-click on a ListView item and then output (using ConsoleWrite) the name of the item. For instance: I double-click an item named "Free Games" and "Free Games" is the output. I tried searching the forum, but the code I found was way outdated and threw a bunch of errors. Any ideas? Once again, thanks guys!

Edited by motionman95
