Copy paste data from a web page


Duff360

Hello,

I'm still very new to AutoIt.

I would like to copy and paste data from a website page to Excel (or Notepad).

 

Example with this page: http://www.sciencedirect.com/science/article/pii/S1750946716301660

I would like to copy the title, volume/issue and link to an Excel file automatically.

Column A: The New Zealand minds for minds autism spectrum disorder self-reported cohort

Column B: Volume 36, April 2017, Pages 1–7

Column C: http://dx.doi.org/10.1016/j.rasd.2016.12.003

 

Thank you very much!


Unfortunately, Column C would be difficult to get, as it's a dynamic link and not within the page source. The others are fairly easy to get using IE's getElementsByTagName, and the results can then be added to Excel. For example:

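#include <Array.au3>
#include <IE.au3>

; Open the article in a hidden IE window and wait for it to finish loading.
Global $oIE = _IECreate("http://www.sciencedirect.com/science/article/pii/S1750946716301660", 0, 0, 1, 0)
If Not IsObj($oIE) Then Exit

; One row, three columns, in the order requested: title, volume/issue, link.
Local $aValues[1][3]
$aValues[0][0] = _GetInnerText('h1', 'svTitle')
$aValues[0][1] = _GetInnerText('p', 'volIssue')
$aValues[0][2] = _GetInnerText('a', 'cLink')
_IEQuit($oIE)
_ArrayDisplay($aValues)

; Return the href (for <a> tags) or the innerText (for any other tag) of the
; first $sTagName element whose class name or id equals $sTagValue.
; Returns an empty string if nothing matches.
Func _GetInnerText($sTagName, $sTagValue, $sTagType = 'className')
    Local $sResult = '', $oIETags
    $oIETags = $oIE.document.GetElementsByTagName($sTagName)
    For $oIEItem In $oIETags
        ; Read the attribute we are matching against.
        Switch $sTagType
            Case 'className'
                $sResult = $oIEItem.className
            Case 'id'
                $sResult = $oIEItem.id
        EndSwitch
        ; Links return their target, everything else returns its text.
        Switch $sTagName
            Case 'a'
                If $sResult = $sTagValue Then Return $oIEItem.href
            Case Else
                If $sResult = $sTagValue Then Return $oIEItem.innerText
        EndSwitch
    Next
    Return ''
EndFunc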
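
To get the values into Excel rather than just displaying them, something along these lines should work with the Excel UDF that ships with current AutoIt versions. This is only a sketch: the array is filled with the sample values from the first post instead of a live scrape, and the output file name articles.xlsx is made up for the example.

```autoit
#include <Excel.au3>

; Sample row taken from the first post (title, volume/issue, link).
Local $aValues[1][3]
$aValues[0][0] = "The New Zealand minds for minds autism spectrum disorder self-reported cohort"
$aValues[0][1] = "Volume 36, April 2017, Pages 1–7"
$aValues[0][2] = "http://dx.doi.org/10.1016/j.rasd.2016.12.003"

; Start Excel (or connect to a running instance).
Local $oExcel = _Excel_Open()
If @error Then Exit MsgBox(16, "Error", "Could not start Excel")

; Create a new workbook and write the 2D array starting at cell A1
; of the active sheet, so the three values land in columns A, B and C.
Local $oWorkbook = _Excel_BookNew($oExcel)
If @error Then Exit MsgBox(16, "Error", "Could not create a workbook")
_Excel_RangeWrite($oWorkbook, Default, $aValues, "A1")
If @error Then Exit MsgBox(16, "Error", "Could not write the data")

; Hypothetical output path; overwrite the file if it already exists.
_Excel_BookSaveAs($oWorkbook, @ScriptDir & "\articles.xlsx", Default, True)
```

For several articles you would size the array with one row per page and write the whole array in a single _Excel_RangeWrite call.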

 


Hello @Subz

Thank you very much for your answer. It is much more complicated than I expected.
Your example works perfectly with ScienceDirect, but I wasn't able to adapt your script to my other application.

First, my page is protected by a password, so I don't think I can use $oIE = _IECreate.
Is there a function like WinActive but for IE?

Second, if my HTML is:

<div class="file-entry">
<div class="section title">
<div class="data"> <h3>Green Technologies for Sustainable Water Management</h3>

would my code be this?

    $aValues[0][2] = _GetInnerText('h3',  'section title')

  Thank you very much for your help! :)


Hi Duff360

You just need to load the page visibly; see the code below. You can then enter your username and password, and add a WinWaitActive for the title of the page that appears after login before grabbing the data.

$oIE = _IECreate ("http://www.sciencedirect.com/science/article/pii/S1750946716301660", 0)
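
Put together, the login flow might look roughly like this. The URL and the window title are placeholders I made up, so swap in your own, and the 120-second timeout is an arbitrary choice.

```autoit
#include <IE.au3>

; Hypothetical values: use your own login URL and the text that appears
; in the browser title bar once you are logged in.
Local $sLoginUrl = "http://www.example.com/login"
Local $sTitleAfterLogin = "My Documents"

; Open IE visibly so the username and password can be typed by hand.
Local $oIE = _IECreate($sLoginUrl)
If Not IsObj($oIE) Then Exit

; Match window titles on any substring, then wait up to 120 seconds
; for the post-login window to become active.
Opt("WinTitleMatchMode", 2)
If Not WinWaitActive($sTitleAfterLogin, "", 120) Then
    Exit MsgBox(16, "Error", "Timed out waiting for the page after login")
EndIf

; Make sure the page has finished loading before reading anything from it.
_IELoadWait($oIE)
```

If the browser is already open and logged in, _IEAttach with part of the window title gives you an $oIE object for that existing window, which is the closest thing to a "WinActive for IE".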

Generally speaking, you want to use an id rather than a class, since an id should be unique on a page, whereas a class can be used many times. If no id is available you can use a class. Your h3 has no class of its own, so in your example above you would use the tag name div and the class data:

$aValues[0][2] = _GetInnerText('div',  'data')

innerText strips the HTML markup, so it returns only "Green Technologies for Sustainable Water Management", without the <h3> tags.
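
If you want to try this without your real site, here is a small sketch that writes the fragment from your post into a blank page and reads it back. I added the closing tags myself, and loading the HTML with _IEDocWriteHTML is only a stand-in for your actual page.

```autoit
#include <IE.au3>

; Load the HTML fragment from the question into a hidden blank page.
; The closing </div> tags are an assumption; the post only showed the opening ones.
Local $oIE = _IECreate("about:blank", 0, 0)
If Not IsObj($oIE) Then Exit
_IEDocWriteHTML($oIE, '<div class="file-entry"><div class="section title">' & _
        '<div class="data"> <h3>Green Technologies for Sustainable Water Management</h3>' & _
        '</div></div></div>')

; Walk every <div> and stop at the first one whose class is exactly "data".
Local $sTitle = ""
Local $oDivs = _IETagNameGetCollection($oIE, "div")
For $oDiv In $oDivs
    If $oDiv.className = "data" Then
        ; innerText drops the <h3> markup; 3 = trim leading and trailing whitespace.
        $sTitle = StringStripWS($oDiv.innerText, 3)
        ExitLoop
    EndIf
Next

_IEQuit($oIE)
ConsoleWrite($sTitle & @CRLF)
```

Note that the class comparison is against the whole class attribute, so 'section title' only matches an element whose class is written exactly that way.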

Anyway I hope that kind of makes sense.
