Need help using html commands


lilx

I need to collect information about the products on a site. Instead of typing it all over or spending hours copying and pasting, I thought AutoIt might be able to help. I need to do the following.

1. I need to go to Website

All their products are listed there. For example, we start at the first product.

2. It's called 1_4_gn_265x162mm, and the link for that article is Website at first article. As you can see, the site has a simple linking system (home site + product).

3. On that page, the information I want is located in <div class="middle" ...>. Inside that div there is <td align="center" ...> and <td align="middle" ...>. This part holds the article information.

4. Now I want to copy that information and save it in a .txt file. I know enough to do that part.

5. Go to the next product and repeat.

Some of you may be wondering whether this is allowed, or what I am doing it for.

Well, I don't know if it is allowed, but I am not doing any damage to the site, and there are no copyrights on those products as far as I know. Second, I need to build a database with all these products in it, and this is the most efficient way for me.

My question for you is this: I have tried a couple of times to work with the HTML functions in AutoIt, but I never managed to understand them. A small example, or the commands I will need, would be appreciated.

Greetings


My first plan was to use the _IE functions, but that had some drawbacks, so I decided this was a better method.

It is terribly slow, but I think that has to do with the website. I've commented as clearly as I could.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>

$Links = _GetLinks() ;get the links we want
For $i = 0 To UBound($Links)-1 ;execute the following function once for each link
    $Details = _GetDetails($Links[$i])  ;get all the details for one item at a time
    _ArrayDisplay($Details) ;Display the results for testing
    ; Save your data the way you like
Next

; Returns an array of relative links to items (with leading "/")
Func _GetLinks()
    Local $Source = _INetGetSource('http://www.horecaworld.biz/a_z.php') ;download the source containing the links
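    Local $Links = _StringBetween($Source, '<Table>', '</Table>')   ;Look for the table containing the links
    If Not IsArray($Links) Then Return SetError(1, 0, $Links)   ;no table found: return the failure value (UBound of a non-array is 0, so the caller's loop is skipped)
    $Links = _StringBetween($Links[0],'<div><a href="', '">')   ;Create an array of all the links in the table
    Return $Links   ;return the array of links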
EndFunc

; Returns an array of details of an item
Func _GetDetails($link)
    Local $Details[5] ;create an array to hold the data
    Local $Source = _INetGetSource('http://www.horecaworld.biz' & $Link)    ;get the source of an item page
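    $Source = _StringBetween($Source, '<td align="center">', '</table>')    ; get the table that contains the details (or most of the table anyways)
    If Not IsArray($Source) Then Return SetError(1, 0, $Details)    ;page did not match the expected layout: return the empty array instead of crashing on $Source[0]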
    Local $TempArray =  _StringBetween($Source[0], '<a href="', '"')    ;get the link to the product image
    $Details[0] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_name">', '</div>') ;get the product name
    $Details[1] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_art">', '</div>')   ;get the product number (You can use StringReplace to replace &nbsp with " ")
    $Details[2] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_price">', '</div>') ; Get the price. (Stringreplace to add € instead of the code.)
    $Details[3] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_details">', '</div>') ;get the details. (Stringreplace to cut out unwanted <br />)
    $Details[4] = $TempArray[0]
    Return $Details
EndFunc
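
For the "save your data" step, a minimal sketch could look like the one below. It assumes a $Details array shaped like the one _GetDetails() returns. The helper name _SaveDetails, the file name products.txt, the tab-separated layout and the sample values are all made up for illustration, so adjust them to whatever you actually need.

```autoit
; Hypothetical helper: appends one product as a tab-separated line to a text file.
; $aDetails is assumed to be the 5-element array returned by _GetDetails().
Func _SaveDetails($sFile, $aDetails)
    Local $hFile = FileOpen($sFile, 1) ; 1 = append mode
    If $hFile = -1 Then Return SetError(1, 0, False)

    Local $sLine = "", $sField
    For $i = 0 To UBound($aDetails) - 1
        $sField = StringReplace($aDetails[$i], "&nbsp;", " ")        ; HTML non-breaking space -> normal space
        $sField = StringRegExpReplace($sField, "(?i)<br\s*/?>", " ") ; drop <br> / <br /> tags
        $sLine &= StringStripWS($sField, 3) & @TAB                   ; 3 = strip leading and trailing whitespace
    Next

    FileWriteLine($hFile, StringTrimRight($sLine, 1)) ; remove the trailing tab
    FileClose($hFile)
    Return True
EndFunc

; Usage with made-up values
Local $aDemo[5] = ["/img/demo.jpg", "Demo product", "Art Nr:&nbsp;12345", "9,95", "Line one<br />Line two"]
_SaveDetails(@ScriptDir & "\products.txt", $aDemo)
```

Calling _SaveDetails() inside the For loop, in place of _ArrayDisplay(), would give one line per product.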

Thanks for your reply, Tvern.

Thank you for the example; it saves me a lot of boring manual work.

I am not a pro in AutoIt, but I know many of these commands, and I will study the code and try to work from it.

Greetings

That's a good script. I was just wondering what you see as the drawbacks of the _IE functions.


You may find this a bit faster for getting the links, and there is no need to include INet.au3.

$sURL = "http://www.horecaworld.biz/a_z.php"
$sSource = BinaryToString(InetRead($sURL)) ;; Read the source

$aTable = StringRegExp($sSource, "(?i)(?s)<table>(.+?<a href.+?)\s*</table>", 1) ;; Get the table, this appears to be the right table.
If NOT @Error Then
    $aLinks = StringRegExp($aTable[0], "(?i)div>\s*<a.+\x22(.+?)\x22>", 3) ;; Get the links into an array
EndIf

After that point I'm not quite sure what you want to do. Also, if you need the URL AND the text from those links, that can be done with a slight modification to the regular expression (SRE).
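
As a rough sketch of what capturing both could look like: with two capture groups and flag 3, StringRegExp returns one flat array in which URL and link text alternate. The sample HTML below is made up, and the pattern assumes href is the first attribute of each <a> tag, which may not hold for the real page.

```autoit
; Made-up sample standing in for the table source
Local $sTable = '<div><a href="/item_one">Item One</a></div>' & @CRLF & _
        '<div><a href="/item_two">Item Two</a></div>'

; Group 1 = URL, group 2 = link text; flag 3 returns all matches in one flat array
Local $aPairs = StringRegExp($sTable, '(?i)<a\s+href="(.+?)"[^>]*>(.*?)</a>', 3)
If Not @error Then
    ; Elements come in pairs: [0] = URL, [1] = text, [2] = URL, [3] = text, ...
    For $i = 0 To UBound($aPairs) - 2 Step 2
        ConsoleWrite($aPairs[$i] & " -> " & $aPairs[$i + 1] & @CRLF)
    Next
EndIf
```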

George


Hi guys, thanks for the replies.

I worked through the first example that Tvern gave me, and I must say it works quite nicely for me.

When I posted my question I was still thinking about how I wanted to store the information.

I settled on an Excel file, so it can be imported into the database when needed.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>
#include <Excel.au3>

HotKeySet ( "{ESC}", "Sluit" )

;Open Excel File
$bestandslocatie = FileSaveDialog( "Choose a name.", @DesktopDir, "All (*.*)", 2)
Sleep ( 1000 )
Local $oExcel = _ExcelBookOpen($bestandslocatie)

Local $iRow = 2 ; The first row in the Excel file holds the headers, so data starts at row 2.
$Links = _GetLinks() ;get the links we want
For $i = 0 To UBound($Links)-1 ;execute the following once for each link
    $Details = _GetDetails($Links[$i])  ;get all the details for one item at a time
    ;_ArrayDisplay($Details) ;Display the results for testing
    ; Save your data the way you like
    InetGet ( $Details[0], "C:\Users\....\Images\" & $Details[2] & ".jpg" ) ; download product image and rename to product number
    _ExcelWriteCell($oExcel, $Details[2] , $iRow, 1) ;Write Product Number
    _ExcelWriteCell($oExcel, $Details[1] , $iRow, 2) ;Write Product Name
    _ExcelWriteCell($oExcel, $Details[4] , $iRow, 3) ;Write Product Details
    _ExcelWriteCell($oExcel, $Details[3] , $iRow, 4) ;Write Product Price
    $iRow += 1 ; separate row counter: changing $i here would make the loop skip every second link
Next

; Returns an array of relative links to items (with leading "/")
Func _GetLinks()
    Local $Source = _INetGetSource('http://www.horecaworld.biz/a_z.php') ;download the source containing the links
    Local $Links = _StringBetween($Source, '<Table>', '</Table>')   ;Look for the table containing the links
    $Links = _StringBetween($Links[0],'<div><a href="', '">')   ;Create an array of all the links in the table
    Return $Links   ;return the table
EndFunc

; Returns an array of details of an item
Func _GetDetails($link)
    Local $Details[5] ;create an array to hold the data
    Local $Source = _INetGetSource('http://www.horecaworld.biz' & $Link)    ;get the source of an item page
    $Source = _StringBetween($Source, '<td align="center">', '</table>')    ; get the table that contains the details (or most of the table anyways)
    Local $TempArray =  _StringBetween($Source[0], '<a href="', '"')    ;get the link to the product image
    $Details[0] = $TempArray[0]


    $TempArray = _StringBetween($Source[0], '<div class="pd_name">', '</div>') ;get the product name
    $Details[1] = $TempArray[0]

    $TempArray = _StringBetween($Source[0], '<div class="pd_art">', '</div>')   ;get the product number (You can use StringReplace to replace &nbsp; with " ")
    $StringRepl = StringTrimLeft($TempArray[0], 13)                             ;keep only the product number, without "Art Nr:&nbsp;"
    $Details[2] = $StringRepl

    $TempArray = _StringBetween($Source[0], '<div class="pd_price">', '</div>') ; Get the price. (Stringreplace to add € instead of the code.)
    $StringRepl = StringTrimLeft($TempArray[0], 14)                             ; keep only the price, without the leading "Prijs: €" text
    $Details[3] = $StringRepl

    $TempArray = _StringBetween($Source[0], '<div class="pd_details">', '</div>') ;get the details. (Stringreplace to cut out unwanted <br />)
    $StringRepl = StringReplace($TempArray[0], "<br />", @CR)                     ; replace each <br /> in the details with a line break
    $Details[4] = $StringRepl
    Return $Details
EndFunc

Func Sluit()
    TrayTip ( "", "Program Closed", 2 )
    Sleep ( 2000 )
    Exit
EndFunc
Edited by lilx

You asked what I see as the drawbacks of the _IE functions:

1. _IETableGetCollection() was able to create an object for the table containing the links, but when I used _IELinkClickByIndex() on that object it would click the first link in the page, not the first link in the table.

2. Using _IELinkClickByIndex() would navigate away from the main page, meaning that after extracting the information I'd need to navigate back to the main page, rather than going to the next link directly.

3. I had some trouble extracting the data from the item pages using _IE functions.

Most of these problems are probably my own shortcomings. Nonetheless, it was more practical for me to evaluate the source this way.
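
For anyone who still wants to try the _IE route, here is an untested sketch of one way around points 1 and 2: read the href values from the table element itself, then navigate to each one directly instead of clicking. It assumes the links table is the first table on the page (index 0), which I have not verified, and it leaves the actual extraction as a placeholder.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("http://www.horecaworld.biz/a_z.php")

; Assumption: index 0 is the table that holds the product links.
Local $oTable = _IETableGetCollection($oIE, 0)
If IsObj($oTable) Then
    ; Ask the table element for its own <a> children (plain DOM call),
    ; so links elsewhere on the page are ignored.
    Local $oAnchors = $oTable.getElementsByTagName("a")
    Local $sHrefs = ""
    For $oAnchor In $oAnchors
        $sHrefs &= $oAnchor.href & "|"
    Next

    If $sHrefs <> "" Then
        Local $aHrefs = StringSplit(StringTrimRight($sHrefs, 1), "|", 2) ; 2 = no count in element 0
        Local $sHtml
        ; Go to each product page directly; no need to return to the list page.
        For $i = 0 To UBound($aHrefs) - 1
            _IENavigate($oIE, $aHrefs[$i])
            $sHtml = _IEBodyReadHTML($oIE)
            ; ... extract the details from $sHtml here, e.g. with _StringBetween
        Next
    EndIf
EndIf

_IEQuit($oIE)
```

This still loads every page in a browser window, so it will likely be slower than reading the source directly.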
