
Read text from between html tags?



I searched the documentation, and most of the examples I found involve launching the browser and then reading the source code. What I wanted to do was use _INetGetSource to, well, get the source, and then use some IE function to read some code from the page. Would that work? In all the examples I found, you had to use an open browser or a browser object.

Thanks for any replies!


I'm not sure, escp. StringRegExp() will use some CPU power, but on the other hand I guess that's the same way it's done in IE.au3 (I haven't looked). Maybe it's a bit slower. But it has been done, and you'll save a lot of coding time by using it instead of reinventing the wheel :)


Hey,

This should work.

(?s) lets "." match newlines; it might not be needed here.

$results = StringRegExp($html,"(?s)(?U)<a.*>(.*)</a>",3)
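To see what (?s) changes, here is a minimal sketch that runs the same pattern with and without it on a small in-memory string. The sample HTML and the variable names are made up for illustration; only StringRegExp and UBound are real AutoIt functions.

```autoit
; Hypothetical sample: the first link's text spans two lines.
Local $sHtml = "<a href='#'>first" & @CRLF & "line</a><a href='#'>second</a>"

; Flag 3 returns an array of all capture-group matches.
Local $aWith = StringRegExp($sHtml, "(?s)(?U)<a.*>(.*)</a>", 3)
Local $aWithout = StringRegExp($sHtml, "(?U)<a.*>(.*)</a>", 3)

; Without (?s), "." stops at the line break, so the first link is skipped.
ConsoleWrite("With (?s):    " & UBound($aWith) & " match(es)" & @CRLF)
ConsoleWrite("Without (?s): " & UBound($aWithout) & " match(es)" & @CRLF)
```

If the link text on your page never contains a line break, both patterns should behave the same and (?s) can be dropped.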
Thanks, that worked! The only problem I'm having now is that when the text between the <a></a> tags contains a "|", as in

<a>ABC | 123</a>

it only shows "ABC ". Why is that? And is there a solution?


Here is the correct way of viewing your data:

#include <array.au3>

$sTest = "<a src=""bitch.bitch.com"">ABC | 123</a>"

$asResults = StringRegExp($sTest,"(?i:<a.*>)(.*?)(?i:</a>)",3)

_ArrayDisplay($asResults, "Test String", -1, 0, "", "")

But that deletes the " " from your findings
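One way to check whether the "|" is lost by the regular expression or only by the display is to skip _ArrayDisplay and print each match directly. This is only a sketch: the test string and the slightly stricter pattern (`[^>]*` instead of `.*`) are my own choices, not taken from the posts above.

```autoit
; Hypothetical test string containing a pipe inside the link text.
Local $sTest = '<a href="#">ABC | 123</a>'

; Flag 3 returns every capture-group match as an array.
Local $asResults = StringRegExp($sTest, "(?i)<a[^>]*>(.*?)</a>", 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ; Print the raw matches so no display function can alter them.
    For $i = 0 To UBound($asResults) - 1
        ConsoleWrite("[" & $i & "] " & $asResults[$i] & @CRLF)
    Next
EndIf
```

If the console shows the full "ABC | 123", the match itself is fine and the truncation comes from how the array is displayed.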



What I'm trying to do is read the result titles from a Google search results page. This is the code I'm using:

#include <INet.au3>
#include <Array.au3>

$html = _INetGetSource("http://www.google.com/search?hl=en&q=spiderman")

$results = StringRegExp($html, "(?s)(?U)<li.*><h3.*><a.*>(.*)</a>", 3)
_ArrayDisplay($results)

But this is what it returns, and it is not completely correct:

[Attached screenshot: post-44565-1234066454_thumb.gif]


So you need something more like ((?:)(<em>)|(<b>)))?

Didn't test it though.

Edit: This is a bit more sophisticated:

#include <INET.au3>
#include <Array.au3>

$html = _INetGetSource("http://www.google.com/search?hl=en&q=spiderman")
$results = StringRegExp($html,"(?s)(?U)<li.*><h3.*><a.*>(.*)</a>", 3)

For $i = 0  To UBound($results)-1
    $results[$i] = StringRegExpReplace($results[$i], '((</?b>)|(</?em>))', '')
Next

_ArrayDisplay($results)
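The tag-stripping step can be tried offline on a single string, without fetching anything from Google. A minimal sketch, assuming a made-up title string; the pattern is a shorter equivalent of the one in the loop above, using a non-capturing group.

```autoit
; Hypothetical result title with inline <em> and <b> markup.
Local $sTitle = "<em>Spider-Man</em> (2002 <b>film</b>)"

; Remove opening and closing <b> and <em> tags, keep the text between them.
Local $sPlain = StringRegExpReplace($sTitle, "</?(?:b|em)>", "")

ConsoleWrite($sPlain & @CRLF)
```

Other inline tags would need to be added to the alternation, or the pattern widened, depending on what the page actually contains.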
Edited by Authenticity

I have nothing against StringRegExp, but this is how it would be done with IE.au3...

#include <IE.au3>

$oIE = _IECreate("http://www.google.com/search?hl=en&q=spiderman")
$oLinks = _IELinkGetCollection($oIE)
For $oLink in $oLinks
    If String($oLink.classname) = "l" Then ConsoleWrite(_IEPropertyGet($oLink, "innertext") & @CRLF)
Next

Dale

Free Internet Tools: DebugBar, AutoIt IE Builder, HTTP UDF, MODIV2, IE Developer Toolbar, IEDocMon, Fiddler, HTML Validator, WGet, curl

MSDN docs: InternetExplorer Object, Document Object, Overviews and Tutorials, DHTML Objects, DHTML Events, WinHttpRequest, XmlHttpRequest, Cross-Frame Scripting, Office object model

Automate input type=file (Related)

Alternative to _IECreateEmbedded? better: _IECreatePseudoEmbedded  Better Better?

IE.au3 issues with Vista - Workarounds

SciTe Debug mode - it's magic: #AutoIt3Wrapper_run_debug_mode=Y

Doesn't work needs to be ripped out of the troubleshooting lexicon. It means that what you tried did not produce the results you expected. It begs the questions 1) what did you try?, 2) what did you expect? and 3) what happened instead?

Reproducer: a small (the smallest?) piece of stand-alone code that demonstrates your trouble


Is there a way to make it invisible?

Yes, look at the docs for _IECreate()

Dale

@Sm0ke - honest!
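For reference, here is a sketch of what the docs point to: the third parameter of _IECreate() controls visibility, so passing 0 there should keep the browser window hidden. The rest is the same loop as in the earlier post, and the explicit _IEQuit() call is my addition.

```autoit
#include <IE.au3>

; Second argument 0: do not attach to an existing window.
; Third argument 0: create the browser window hidden.
Local $oIE = _IECreate("http://www.google.com/search?hl=en&q=spiderman", 0, 0)

Local $oLinks = _IELinkGetCollection($oIE)
For $oLink In $oLinks
    If String($oLink.classname) = "l" Then ConsoleWrite(_IEPropertyGet($oLink, "innertext") & @CRLF)
Next

; A hidden window is easy to forget, so close it explicitly.
_IEQuit($oIE)
```

Without the _IEQuit() call, the hidden iexplore.exe process would likely keep running after the script ends.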



Thanks, guys, for being patient and for all the help. I promise, this is the last question, LOL. Anyway, I'm trying to detect a double-click on a ListView item and then output (using ConsoleWrite) the name of the item. For instance: I double-click an item named "Free Games" and "Free Games" is the output. I tried searching the forum, but the code I found was way outdated and threw a bunch of errors. Any ideas? Once again, thanks guys!

Edited by motionman95