chynawhyte

Is there a better way to do this?


I have a collection of about 700-900 old static HTML pages that I need to pull information from and add to an Excel spreadsheet. The layout of the information is fairly consistent on 80-90% of the pages; however, some pages may differ.

Examples of the code in the html files:

<h1 align="center">DPI  4728 - AL </h1>
<p align="center">3350 Boyington Drive Suite 200<br />
Carrollton, TX 75006</p>
<p align="center">Repair: 800-414-2065 </p>
<hr />
<table width="100%" border="0">
  <tr>
    <td align="left" valign="top" bordercolor="#bddc7e" bgcolor="#bddc7e"><p><strong>BTN:</strong></p>
        <p>6110684728,050908</p></td>
    <td align="left" valign="top" bordercolor="#bddc7e" bgcolor="#bddc7e"><p><strong>BTN:</strong></p>
        <p>&nbsp;</p></td>
    <td align="left" valign="top" bgcolor="#bddc7e"><p><strong>BTN:</strong></p>
        <p>&nbsp;</p></td>
  </tr>
  <tr>
    <td align="left" valign="top" bordercolor="#bddc7e" bgcolor="#bddc7e"><p><strong>TXJUR:</strong></p>
        <p>XM07</p></td>
    <td align="left" valign="top" bordercolor="#bddc7e" bgcolor="#bddc7e"><p><strong>Vendor SPID:</strong></p>
        <p>N/A</p></td>
    <td align="left" valign="top" bgcolor="#bddc7e"><p><strong>Vendor Type:</strong></p>
        <p>Resale</p></td>
  </tr>

Ideally, I would like to pull the text from each page and enter each item into its own cell on a spreadsheet. For example, "DPI 4728 - AL" would go in one cell, "3350 Boyington Drive Suite 200" in another, "Carrollton, TX 75006" in the next, and so on until all of the information is pulled.

I am fairly new to AutoIt and have created a few scripts, but I want to make sure there isn't an easier way before I spend too much time writing a lengthy script.

I've used the _INetGetSource function with the _StringBetween function to search for specific text, then copied the result to the clipboard and pasted it into an Excel spreadsheet. With all of the information on the page, this seems pretty lengthy:

#include <Array.au3>
#include <Excel.au3>
#include <Inet.au3>
#include <String.au3>

; $excelfile and $row are set earlier in the script
$s_URL = _ExcelReadCell($excelfile, $row, 1)
$html = _INetGetSource($s_URL)
$s_VendorTypeStart = "<p><strong>Vendor Type:</strong></p><p>"
$s_VendorTypeEnd = "</p>"
$Array1 = _StringBetween($html, $s_VendorTypeStart, $s_VendorTypeEnd)
_ArrayToClip($Array1)
$result1 = ClipGet()

WinWaitActive("Microsoft Excel - Vendors.xls")
$vendortype = _ExcelWriteCell($excelfile, $result1, $row, 9)

Is there an easier and faster way to do this?

Any help or guidance would be much appreciated!

Thanks in advance!


If everything is inside a table...

Look at _IETableWriteToArray. This could be better: it will get all the information from the table and dump it to an array, which you can then manipulate into Excel.
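
A rough sketch of that idea, in case it helps. It assumes the page can be opened in Internet Explorer, that the data table is the first <table> on the page, and it uses a made-up file path ($sFile) that you would replace with one of your own pages. Treat it as a starting point, not tested code for your files.

```autoit
#include <IE.au3>
#include <Array.au3>

; Hypothetical path: replace with one of the saved pages (a URL should work too).
Local $sFile = "C:\pages\vendor4728.html"

; Open the page in a hidden IE instance (arguments: address, try-attach, visible).
Local $oIE = _IECreate($sFile, 0, 0)

; Index 0 asks for the first <table> on the page; assumed here to be the data table.
Local $oTable = _IETableGetCollection($oIE, 0)

If @error Then
    MsgBox(16, "Error", "No table found in " & $sFile)
Else
    ; Dump every cell of the table into a 2D array.
    ; An optional second parameter transposes rows and columns if the
    ; orientation of the result is not the one you want.
    Local $aCells = _IETableWriteToArray($oTable)

    ; Inspect the result before deciding how to map it to spreadsheet cells.
    _ArrayDisplay($aCells, "Table contents")
EndIf

_IEQuit($oIE)
```

Going by the sample markup, each cell holds both the label and the value (for example "Vendor Type:" and "Resale"), so each array element would probably still need to be split before it is written to Excel. The heading and address lines sit outside the table, so they would still need something like the _StringBetween approach.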


They call me MrRegExpMan
