
How to extract the text between the HTML tags



Hi,

I am trying to extract the text that sits between HTML tags:

<title>

eGrabber - Prospects lists lead generation software | business lead generation program | address | list

| email processing

</title>

The code I wrote is:

$title=_StringBetween($html,'<title>','</title>')

MsgBox(0,"title",$title)

I also tried using a regular expression:

$title= StringRegExp($html, '<title>(.*?)</title>',3)

But neither approach worked.

Can someone help me extract the text between the tags?

Thanks


Hey, I was able to extract the string between the tags.

I modified the code a little bit:

$title=_StringBetween($html,'<title>','</title>')

$page_title=$title[0]

MsgBox(0,"title",$page_title)

But the issue now is that it also extracts the line numbers.

For example:

5 <title>

5 eGrabber - Prospects lists lead generation software | business lead generation program | address | list

5 | email processing

5 </title>

The result it displays is:

5 eGrabber - Prospects lists lead generation software | business lead generation program | address | list

5 | email processing

How do I remove the line numbers from the text?

Thanks


Hi,

I tried trimming the line numbers using the StringTrimLeft and StringTrimRight functions.

The code is as follows:

$title=_StringBetween($html,'<title>','</title>')

$page_title=$title[0]

$Final_Title=StringTrimLeft($page_title, 6)

$Final_Title_1 =StringTrimRight($Final_Title,10)

MsgBox(0,"title",$Final_Title_1)

But here I have guessed and hardcoded the number of characters to trim.

I would like to know whether there is a function that automatically works out the range and trims it.

Thanks


I would think you would only need StringTrimLeft($page_title, 2), no StringTrimRight. But you shouldn't need StringTrimLeft either. What is $html, and where/how do you get it? I've added an example. No line numbers show up.

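#Include <String.au3>
#Include <INET.au3>

; Download the page source and collect every string found between the two tags.
$html = _StringBetween(_INetGetSource('http://somdcomputerguy.com'), '<title>', '</title>')
; _StringBetween returns an array; the first match is at index 0.
MsgBox(0, "title", $html[0])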
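
One detail worth spelling out: _StringBetween returns an array, which is why passing its result straight to MsgBox (as in the first post) shows nothing, and why indexing with [0] works. When no match is found it sets @error instead of returning an array, so indexing blindly can stop the script. A minimal sketch of a guarded version follows; the variable names $sSource and $aTitle are illustrative, and the URL is simply the one reused from the example above.

```autoit
#include <String.au3>
#include <Inet.au3>

; URL reused from the example above; substitute the page you actually need.
Local $sSource = _INetGetSource('http://somdcomputerguy.com')
Local $aTitle = _StringBetween($sSource, '<title>', '</title>')

If @error Then
    ; No match was found (this also happens if the download failed and $sSource is empty).
    MsgBox(0, "title", "No <title> element found.")
Else
    ; Index 0 holds the first match.
    MsgBox(0, "title", $aTitle[0])
EndIf
```

Checking @error immediately after the call matters, because the next function call resets it.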

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.


This works for me.

#Include <String.au3>

$html = "<title>eGrabber - Prospects lists lead generation software | business lead generation program | address | list| email processing</title>"
$title=_StringBetween($html, '<title>', '</title>')
MsgBox(0, "title", $title[0])
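
If the line numbers really are part of $html (the thread never establishes where they come from), one possible way to avoid hardcoded StringTrimLeft/StringTrimRight counts is to strip them with a regular expression. The sketch below is only an illustration under that assumption: the sample string is modelled on the lines quoted earlier in the thread, and the names $sHtml, $aTitle and $sTitle are made up for the example.

```autoit
#include <String.au3>

; Sample input modelled on the thread: every line starts with a line number.
Local $sHtml = "5 <title>" & @CRLF & _
        "5 eGrabber - Prospects lists" & @CRLF & _
        "5 | email processing" & @CRLF & _
        "5 </title>"

Local $aTitle = _StringBetween($sHtml, '<title>', '</title>')
If Not @error Then
    ; (?m) lets ^ match at the start of each line; drop leading digits and the spaces around them.
    Local $sTitle = StringRegExpReplace($aTitle[0], '(?m)^\h*\d+\h*', '')
    ; Join the remaining lines with single spaces.
    $sTitle = StringRegExpReplace($sTitle, '\s*[\r\n]+\s*', ' ')
    ; Flag 3 = 1 (strip leading whitespace) + 2 (strip trailing whitespace).
    $sTitle = StringStripWS($sTitle, 3)
    MsgBox(0, "title", $sTitle)
EndIf
```

Note the trade-off: this pattern would also remove a number that legitimately begins a line of the title, so fixing whatever adds the line numbers to $html in the first place is the safer route if that is possible.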

- Bruce /*somdcomputerguy */  If you change the way you look at things, the things you look at change.
