arunachandu

Extract data from the web page and store it in notepad

11 posts in this topic

Can someone tell me how to extract data from a web page and store it in Notepad?




Hello arunachandu,

Take a look at the help file and see the documentation for InetRead or InetGet. You can save the received data as a text file with FileWrite, FileWriteLine, ...
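A minimal sketch of that suggestion, assuming the page can be fetched without a browser. The helper names `_SaveTextToFile` and `_SavePage`, the URL and the output file name are all made up for illustration; only InetRead, BinaryToString, FileOpen, FileWrite and FileClose are AutoIt built-ins.

```autoit
; Sketch: download a page with InetRead and save the text with FileWrite.
; The helper names, URL and file name are placeholders, not taken from the thread.

; Write $sText to $sFile, replacing any existing content.
; Returns True on success, False if the file could not be opened or written.
Func _SaveTextToFile($sFile, $sText)
    Local $hFile = FileOpen($sFile, 2) ; 2 = open for writing and erase previous content
    If $hFile = -1 Then Return False
    Local $iOk = FileWrite($hFile, $sText)
    FileClose($hFile)
    Return ($iOk = 1)
EndFunc

; Download $sUrl and store the page source in $sFile.
Func _SavePage($sUrl, $sFile)
    Local $dData = InetRead($sUrl) ; InetRead returns binary data
    If @error Then Return False
    ; BinaryToString defaults to ANSI; pass 4 as the second argument for UTF-8 pages
    Return _SaveTextToFile($sFile, BinaryToString($dData))
EndFunc

; Usage (needs network access):
; If Not _SavePage("http://www.autoitscript.com/", @ScriptDir & "\page.txt") Then
;     MsgBox(16, "Error", "Could not download or save the page.")
; EndIf
```

The saved file is plain text, so it opens directly in Notepad.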



Regards, Hannes
If you can't convince them, confuse them!


You can use _INetGetSource or some of the IE functions.

What type of data?


AutoIt 3.3.14.2 X86 - SciTE 3.6.0 - WIN 8.1 X64 - Other Example Scripts



I pass a string to Google search and the search results are displayed. I want to store all the links (URLs) from the search results in Notepad or an Excel sheet.
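For the link case, one route that avoids driving a browser is to fetch the page source with _INetGetSource (from Inet.au3) and pull the href values out with StringRegExp. The sketch below is only that: `_ExtractLinks` is a hypothetical helper, the pattern only handles double-quoted href attributes, and whether a search engine returns useful HTML to a plain download is an assumption you would have to check.

```autoit
#include <Inet.au3>

; Return an array with the href value of every <a> tag found in $sHtml.
; Sets @error to 1 and returns an empty string when nothing matches.
; Only double-quoted href attributes are recognised.
Func _ExtractLinks($sHtml)
    ; Flag 3 returns every match of the capture group as an array
    Local $aLinks = StringRegExp($sHtml, '(?i)<a\s[^>]*?href\s*=\s*"([^"]*)"', 3)
    If @error Then Return SetError(1, 0, "")
    Return $aLinks
EndFunc

; Usage (needs network access; the search URL is only an example):
; Local $sHtml = _INetGetSource("http://www.google.com/search?q=AutoIt")
; Local $aLinks = _ExtractLinks($sHtml)
; If Not @error Then
;     For $sLink In $aLinks
;         ConsoleWrite($sLink & @CRLF)
;     Next
; EndIf
```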



Can you show me your script so that I can adapt it?

Edit: I remember you!

Try this:

#include <IE.au3>

; Open Google, fill in the search box and submit the form
$oIE = _IECreate ("http://www.google.com")
$oForm = _IEFormGetObjByName ($oIE, "f")
$oQuery = _IEFormElementGetObjByName ($oForm, "q")
_IEFormElementSetValue ($oQuery, "AutoIt IE.au3")
_IEFormSubmit ($oForm )

; Collect every link on the results page; @extended holds the link count
$oLinks = _IELinkGetCollection ($oIE)
$iNumLinks = @extended
MsgBox(0, "Link Info", $iNumLinks & " links found")

; Show each link URL in turn
For $oLink In $oLinks
    MsgBox(0, "Link Info", $oLink.href)
Next
Edited by wakillon


A way to filter links:

#include <IE.au3>
#include <Array.au3>

$oIE = _IECreate ("http://www.google.com")
$oForm = _IEFormGetObjByName ($oIE, "f")
$oQuery = _IEFormElementGetObjByName ($oForm, "q")
_IEFormElementSetValue ($oQuery, "AutoIt IE.au3")
_IEFormSubmit ($oForm )
$oLinks = _IELinkGetCollection ($oIE)
$iNumLinks = @extended
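$_Display=0 ; becomes 1 once the result links start
Dim $_LinkArray[1]

; Keep only the links that come after the "advanced_search?" link
; and stop at the "google.com/support" link
For $oLink In $oLinks
    If $_Display Then _ArrayAdd ( $_LinkArray, $oLink.href )
    If StringInStr ( $oLink.href, 'advanced_search?' ) <> 0 Then $_Display=1
    If StringInStr ( $oLink.href, 'google.com/support' ) <> 0 Then ExitLoop
Next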

$_LinkArray = _DeleteArrayElementWithStringInstr ( $_LinkArray, 'webcache.' )
$_LinkArray = _DeleteArrayElementWithStringInstr ( $_LinkArray, 'search?' )
_ArrayDisplay ( $_LinkArray )
_IEQuit ( $oIE )

Exit

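; Return a copy of $_Array without the elements that contain $_String
Func _DeleteArrayElementWithStringInstr ( $_Array, $_String )
    Local $_Item = 0 ; index of the current element; start at 0 explicitly
    For $_Element In $_Array
        If StringInStr ( $_Element, $_String ) <> 0 Then
            ; The next element shifts into this index, so do not advance
            _ArrayDelete ( $_Array, $_Item )
        Else
            $_Item+=1
        EndIf
    Next
    Return $_Array
EndFunc ;==> _DeleteArrayElementWithStringInstr ( )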
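To get from the array to the Notepad file the original question asked for, the filtered links can be written out one per line. This is a sketch only: `_SaveLinks` and the file name `links.txt` are hypothetical, and it skips empty elements because `Dim $_LinkArray[1]` in the script above leaves an empty first slot.

```autoit
; Write every non-empty element of $aLinks to $sFile, one per line.
; Returns the number of lines written, or sets @error to 1 if the file cannot be opened.
Func _SaveLinks($sFile, $aLinks)
    Local $hFile = FileOpen($sFile, 2) ; 2 = open for writing and erase previous content
    If $hFile = -1 Then Return SetError(1, 0, 0)
    Local $iCount = 0
    For $i = 0 To UBound($aLinks) - 1
        If $aLinks[$i] = "" Then ContinueLoop ; skip empty slots
        FileWriteLine($hFile, $aLinks[$i]) ; appends a line break after each URL
        $iCount += 1
    Next
    FileClose($hFile)
    Return $iCount
EndFunc

; Usage, placed after the filtering calls in the script above:
; _SaveLinks(@ScriptDir & "\links.txt", $_LinkArray)
; ShellExecute("notepad.exe", '"' & @ScriptDir & '\links.txt"')
```

A text file with one URL per line can also be opened in Excel, which puts each URL in its own row.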


Thanks a lot for the reply. I have just started learning scripting, so I have a lot of silly doubts.

I tried extracting the headings from the web page.

The code is as follows:

$oIE = _IECreate ("$url")

; to extract only the h2-level heading; its id is "summary"
$heading=_IEGetObjById ($oIE, "summary")

$result=_IEPropertyGet($heading, "innertext")

MsgBox(0,"heading",$result)

The result shown in the message box is "0".

How do I get the content of the h2 tag?
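Two things in that snippet are worth checking, though without the real page this is a guess. First, `"$url"` in quotes is the literal text `$url`, not the contents of a variable. Second, the IE.au3 functions return 0 and set @error when they fail, so if _IEGetObjById does not find an element with the id "summary", the later _IEPropertyGet call fails as well and the message box shows 0. A sketch that reads every h2 element by tag name instead, with `_GetH2Text` and the URL as made-up placeholders:

```autoit
#include <IE.au3>

; Return the text of every <h2> element on the page, one heading per line.
; $oIE is an object returned by _IECreate. Sets @error to 1 if the lookup fails.
Func _GetH2Text(ByRef $oIE)
    Local $sOut = ""
    Local $oHeadings = _IETagNameGetCollection($oIE, "h2")
    If @error Then Return SetError(1, 0, "")
    For $oHeading In $oHeadings
        $sOut &= $oHeading.innerText & @CRLF
    Next
    Return $sOut
EndFunc

; Usage ($sUrl is a placeholder for the real address):
; Local $sUrl = "http://www.example.com/"
; Local $oIE = _IECreate($sUrl) ; note: no quotes around the variable
; MsgBox(0, "Headings", _GetH2Text($oIE))
; _IEQuit($oIE)
```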


Without the URL, I can't try anything...



This is the code I wrote:

#include <IE.au3>

$oIE = _IECreate("www.google.com")
$html=_IEBodyReadText($oIE)

$htmlfile=FileOpen("..\Htmlfile.txt",1)
If $htmlfile = -1 Then
    MsgBox(0, "Error", "Unable to open file.")
    Exit
EndIf

FileWrite($htmlfile, $html)

$testurl=StringRegExp($htmlfile, '(?i)(?s)A faster way to browse the web',1)
MsgBox(0,"data",$testurl)

I was able to open the page and write it to the file, but I am not able to pick out the text and display it in the message box.

The message box shows 0.

Can you please help me with this?

Thanks

Aruna



Unless you need the page text later, there's no need to write it to a file, but that is pretty much where the problem stems from.

StringRegExp() will not open a file to do a search. In the usage above, the function never sees the page text: it is matching against the value of $htmlfile, which is the handle returned by FileOpen() for ..\Htmlfile.txt, not the file's contents.

The help file explains what the return value of "0" refers to, though it depends on the flag (mode) you're matching with.

I'm not sure exactly what you're trying to match, but here is an example using Google:

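#include <IE.au3>
#include <Array.au3>

; Open Google in a hidden IE window (no attach, not visible) and read the body text
$oIE = _IECreate("www.google.com",0,0)
$html=_IEBodyReadText($oIE)

MsgBox(0,"",$html)

; Flag 3 returns an array of every match; here, each line that contains "search"
$testurl=StringRegExp($html, '(?i).*search.*',3)

_IEQuit($oIE)
_ArrayDisplay($testurl)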
Edited by bwochinski
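If the text really does have to come from the saved file, it has to be read back into a string first, because StringRegExp only works on the string it is given. A small sketch of that step, with `_FileContainsPattern` as a hypothetical helper; it assumes the file has already been closed with FileClose after writing, which the script in the question does not do.

```autoit
; Return True if $sPattern matches somewhere in the contents of $sFile.
; Sets @error to 1 if the file cannot be read.
Func _FileContainsPattern($sFile, $sPattern)
    Local $sText = FileRead($sFile) ; FileRead accepts a file name and returns the contents
    If @error = 1 Then Return SetError(1, 0, False)
    ; With the default flag 0, StringRegExp returns 1 for a match and 0 for no match
    Return (StringRegExp($sText, $sPattern) = 1)
EndFunc

; Usage with the file and text from the question:
; If _FileContainsPattern("..\Htmlfile.txt", "(?i)A faster way to browse the web") Then
;     MsgBox(0, "data", "Text found")
; EndIf
```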


bwochinski is right, there is no need to write to a file.

I have tried _IEBodyReadText and _IEBodyReadHTML and I can't find the string you want.

Could you be more precise?


