arunachandu

How to extract the text between the HTML tags

6 posts in this topic

Hi,

I am trying to extract the text that sits between HTML tags, for example the page title:

<title>

eGrabber - Prospects lists lead generation software | business lead generation program | address | list

| email processing

</title>

The code I wrote is:

$title=_StringBetween($html,'<title>','</title>')

MsgBox(0,"title",$title)

I also tried using a regular expression:

$title= StringRegExp($html, '<title>(.*?)</title>',3)

But neither approach worked.

Can someone help me extract the text between the tags?

Thanks
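
For reference, a likely reason the regular expression attempt returns nothing: by default "." does not match line breaks, and the title in this page is spread over several lines. The sketch below is only an illustration under that assumption. _ExtractTitleRegex and the sample string are made-up names, not part of any UDF or of the original page.

```autoit
; Sketch only: _ExtractTitleRegex is a hypothetical helper name, not a UDF function.
Func _ExtractTitleRegex($sHtml)
    ; (?i) ignores tag case, (?s) lets "." match line breaks inside the title.
    ; Flag 3 returns an array holding every captured group.
    Local $aMatch = StringRegExp($sHtml, '(?is)<title>(.*?)</title>', 3)
    If @error Then Return SetError(1, 0, "")
    ; Trim leading and trailing whitespace (flag 3 = leading + trailing).
    Return StringStripWS($aMatch[0], 3)
EndFunc

; Made-up input that mimics a title spread over several lines.
Local $sSample = "<head><title>" & @CRLF & "Sample page | demo" & @CRLF & "</title></head>"
ConsoleWrite(_ExtractTitleRegex($sSample) & @CRLF)
```

Note that with flag 3 the result is an array, so it has to be indexed before it can be shown in a MsgBox.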




Hey, I was able to extract the string between the tags.

I modified the code a little:

$title=_StringBetween($html,'<title>','</title>')

$page_title=$title[0]

MsgBox(0,"title",$page_title)
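
The change works because _StringBetween returns an array of matches, not a single string, so the first match has to be read with [0]. A minimal sketch of the same idea with an error check follows. _GetFirstBetween is a hypothetical wrapper name and the sample string is made up for illustration.

```autoit
#include <String.au3>

; Sketch only: _GetFirstBetween is a hypothetical wrapper, not part of String.au3.
Func _GetFirstBetween($sText, $sStart, $sEnd)
    ; _StringBetween returns an array of matches and sets @error when none are found.
    Local $aFound = _StringBetween($sText, $sStart, $sEnd)
    If @error Then Return SetError(1, 0, "")
    Return $aFound[0]
EndFunc

; Made-up input for illustration.
Local $sSample = "<title>Sample page</title>"
ConsoleWrite(_GetFirstBetween($sSample, '<title>', '</title>') & @CRLF)
```

Checking @error before indexing avoids a subscript error when the page has no matching tags.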

The issue now is that it also extracts the line numbers.

For example:

5 <title>

5 eGrabber - Prospects lists lead generation software | business lead generation program | address | list

5 | email processing

5 </title>

The result it displays is:

5 eGrabber - Prospects lists lead generation software | business lead generation program | address | list

5 | email processing

How do I remove the line numbers from the text?

Thanks


Good to see you figured it out.

Look up StringTrimLeft in the help file.


#include <ByteMe.au3>


Hi,

I tried trimming the line numbers using that function.

Here is the code:

$title=_StringBetween($html,'<title>','</title>')

$page_title=$title[0]

$Final_Title=StringTrimLeft($page_title, 6)

$Final_Title_1 =StringTrimRight($Final_Title,10)

MsgBox(0,"title",$Final_Title_1)

But here I have guessed and hardcoded the number of characters to trim.

Is there a function that works out the range automatically and trims it?

Thanks
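
One possible way to avoid hardcoded counts, if the line numbers really are part of the string, is to remove a leading number from every line with StringRegExpReplace. This is only a sketch: _StripLineNumbers and the sample text are hypothetical, and it assumes each line number is followed by a space or tab. A title line that genuinely starts with a number would lose that number too.

```autoit
; Sketch only: _StripLineNumbers is a hypothetical helper name.
Func _StripLineNumbers($sText)
    ; (?m) makes ^ match at the start of every line; remove optional indentation,
    ; the digits, and the spaces or tabs that follow them.
    Local $sClean = StringRegExpReplace($sText, '(?m)^[ \t]*\d+[ \t]+', '')
    ; Collapse line breaks and repeated spaces into single spaces, then trim both ends.
    $sClean = StringRegExpReplace($sClean, '\s+', ' ')
    Return StringStripWS($sClean, 3)
EndFunc

; Made-up input that mimics numbered lines inside a title.
Local $sSample = "5 Sample title | part one" & @CRLF & @CRLF & "5 | part two"
ConsoleWrite(_StripLineNumbers($sSample) & @CRLF)
```

If the numbers come from the way the page source was captured, fetching the raw source instead is probably the cleaner fix.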


I would think you would only need StringTrimLeft($page_title, 2), no StringTrimRight. But you shouldn't need StringTrimLeft either. What is $html, and where/how do you get it? I've added an example. No line numbers show up.

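#include <String.au3>
#include <Inet.au3>

; Download the page source and collect every <title>...</title> match (returned as an array)
$html = _StringBetween(_INetGetSource('http://somdcomputerguy.com'), '<title>', '</title>')
; Show the first match
MsgBox(0, "title", $html[0])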

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


This works for me.

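#include <String.au3>

; The title is supplied as a literal string here, so no line numbers can creep in
$html = "<title>eGrabber - Prospects lists lead generation software | business lead generation program | address | list| email processing</title>"
; _StringBetween returns an array; element 0 is the first match
$title = _StringBetween($html, '<title>', '</title>')
MsgBox(0, "title", $title[0])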

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.

