lilx Posted March 31, 2010

I need to collect information about some products on a website. Instead of typing it all out or spending hours copying and pasting, I thought AutoIt might be able to help. I need to do the following:

1. Go to Website, where all their products are listed. For example, we start at the first product.
2. It's called 1_4_gn_265x162mm, and the link for that article is Website at first article. As you can see, the site has a simple linking system (home site + product).
3. On that page, the information I want is located in <div class="middle" ...>. In that div there is a <td align="center" ...> and a <td align="middle" ...>; the article information is in this part.
4. Copy the information and save it in a .txt file. I know enough to do that part.
5. Go to the next product and repeat.

Some of you may be wondering whether this is allowed and what I am doing it for. Well, I don't know if it is allowed, but I am not doing any damage to the site and there are no copyrights on those products as far as I know. Also, I need to build a database with all these products in it, and this is the most efficient way for me.

My question for you is this: I have tried a couple of times to work with HTML in AutoIt, but I never managed to understand it. A little example, or the commands I will need, would be appreciated.

Greetings

Tvern Posted March 31, 2010

My first plan was to use the _IE functions, but that had some drawbacks, so I decided this was a better method. It is terribly slow, but I think that has to do with the website. I've commented as clearly as I could.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>

$Links = _GetLinks() ;get the links we want
For $i = 0 To UBound($Links) - 1 ;execute the following once for each link
    $Details = _GetDetails($Links[$i]) ;get all the details for one item at a time
    _ArrayDisplay($Details) ;display the results for testing
    ; Save your data the way you like
Next

; Returns an array of relative links to items (with leading "/")
Func _GetLinks()
    Local $Source = _INetGetSource('http://www.horecaworld.biz/a_z.php') ;download the source containing the links
    Local $Links = _StringBetween($Source, '<Table>', '</Table>') ;look for the table containing the links
    $Links = _StringBetween($Links[0], '<div><a href="', '">') ;create an array of all the links in the table
    Return $Links ;return the array of links
EndFunc

; Returns an array of details of an item
Func _GetDetails($Link)
    Local $Details[5] ;create an array to hold the data
    Local $Source = _INetGetSource('http://www.horecaworld.biz' & $Link) ;get the source of an item page
    $Source = _StringBetween($Source, '<td align="center">', '</table>') ;get the table that contains the details (or most of the table anyway)
    Local $TempArray = _StringBetween($Source[0], '<a href="', '"') ;get the link to the product image
    $Details[0] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_name">', '</div>') ;get the product name
    $Details[1] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_art">', '</div>') ;get the product number (you can use StringReplace to replace &nbsp; with " ")
    $Details[2] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_price">', '</div>') ;get the price (use StringReplace to put € in place of the entity code)
    $Details[3] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_details">', '</div>') ;get the details (use StringReplace to cut out unwanted <br />)
    $Details[4] = $TempArray[0]
    Return $Details
EndFunc
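
The StringReplace clean-up mentioned in the comments of the script above could look roughly like the sketch below. The helper name _CleanField and the sample string are made up for illustration, and it assumes the page uses the entities &nbsp; and &euro;; check the real page source and adjust the search strings if it uses something else.

```autoit
; Hypothetical helper (not part of any UDF): tidy one extracted field
Func _CleanField($sText)
    $sText = StringReplace($sText, "&nbsp;", " ")   ; assumed entity for non-breaking spaces
    $sText = StringReplace($sText, "&euro;", "€")   ; assumed entity for the euro sign
    $sText = StringReplace($sText, "<br />", @CRLF) ; turn HTML line breaks into real ones
    Return StringStripWS($sText, 3)                 ; 3 = strip leading and trailing whitespace
EndFunc

; Made-up sample value, only to show the call
ConsoleWrite(_CleanField("Prijs:&nbsp;&euro;&nbsp;12,50<br />") & @CRLF)
```

Each $Details[...] assignment could then be wrapped in a call such as _CleanField($TempArray[0]).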

lilx Posted March 31, 2010

Thanks for your reply, Tvern, and thank you for the example. It saves me a lot of boring manual work. I am not a pro in AutoIt, but I know many of these commands, and I will look at the code and try to work from that.

Greetings

bo8ster Posted March 31, 2010

That's a good script. I was just wondering what you see as the drawbacks of the _IE functions.

GEOSoft Posted March 31, 2010

You may find this a bit faster up to the point of getting the links, and there is no need to include INet.au3.

$sURL = "http://www.horecaworld.biz/a_z.php"
$sSource = BinaryToString(InetRead($sURL)) ;; Read the source
$aTable = StringRegExp($sSource, "(?i)(?s)<table>(.+?<a href.+?)\s*</table>", 1) ;; Get the table, this appears to be the right table.
If Not @error Then
    $aLinks = StringRegExp($aTable[0], "(?i)div>\s*<a.+\x22(.+?)\x22>", 3) ;; Get the links into an array
EndIf

After that point I'm not quite sure what you want to do. Also, if you need the URL and the text from those links, it can be done with a slight modification to the SRE.

George
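
One way the "URL and text" variant of the SRE might look is sketched below. This is not GEOSoft's pattern: the regular expression and the sample markup are assumptions for illustration, and the real table on a_z.php may need a different pattern. With flag 3, StringRegExp returns all captured groups in one flat array, so two groups per match come back as url, text, url, text, and so on.

```autoit
; Made-up sample markup standing in for the table on a_z.php
Local $sTable = '<div><a href="/item_one">Item One</a></div>' & @CRLF & _
        '<div><a href="/item_two">Item Two</a></div>'

; Two capture groups per link: (1) the href value, (2) the visible link text
Local $aPairs = StringRegExp($sTable, '(?i)<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>', 3)
If @error Then
    ConsoleWrite("No links found" & @CRLF)
Else
    ; The array is flat, so step through it two elements at a time
    For $i = 0 To UBound($aPairs) - 2 Step 2
        ConsoleWrite($aPairs[$i] & " -> " & $aPairs[$i + 1] & @CRLF)
    Next
EndIf
```

In a real run, $sTable would be replaced by $aTable[0] from the snippet above.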

lilx Posted April 1, 2010 (edited April 1, 2010)

Hi guys, thanks for the replies. I worked from the first example that Tvern gave me, and I must say it works quite nicely for me. When I posted my question I was still thinking about how I wanted to store the information. I settled on an Excel file, so it can be imported into the database when needed.

#include <INet.au3>
#include <String.au3>
#include <Array.au3>
#include <Excel.au3>

HotKeySet("{ESC}", "Sluit") ;press ESC to stop the script

;Open Excel file
$bestandslocatie = FileSaveDialog("Choose a name.", @DesktopDir, "All (*.*)", 2)
Sleep(1000)
Local $oExcel = _ExcelBookOpen($bestandslocatie)
$iRow = 2 ;the first row in the Excel file holds the headings, so data starts at row 2

$Links = _GetLinks() ;get the links we want
For $i = 0 To UBound($Links) - 1 ;execute the following once for each link
    $Details = _GetDetails($Links[$i]) ;get all the details for one item at a time
    ;_ArrayDisplay($Details) ;display the results for testing
    InetGet($Details[0], "C:\Users\....\Images\" & $Details[2] & ".jpg") ;download the product image and name it after the product number
    _ExcelWriteCell($oExcel, $Details[2], $iRow, 1) ;write product number
    _ExcelWriteCell($oExcel, $Details[1], $iRow, 2) ;write product name
    _ExcelWriteCell($oExcel, $Details[4], $iRow, 3) ;write product details
    _ExcelWriteCell($oExcel, $Details[3], $iRow, 4) ;write product price
    $iRow = $iRow + 1 ;separate row counter, so the loop variable $i is not modified inside the loop
Next

; Returns an array of relative links to items (with leading "/")
Func _GetLinks()
    Local $Source = _INetGetSource('http://www.horecaworld.biz/a_z.php') ;download the source containing the links
    Local $Links = _StringBetween($Source, '<Table>', '</Table>') ;look for the table containing the links
    $Links = _StringBetween($Links[0], '<div><a href="', '">') ;create an array of all the links in the table
    Return $Links ;return the array of links
EndFunc

; Returns an array of details of an item
Func _GetDetails($Link)
    Local $Details[5] ;create an array to hold the data
    Local $Source = _INetGetSource('http://www.horecaworld.biz' & $Link) ;get the source of an item page
    $Source = _StringBetween($Source, '<td align="center">', '</table>') ;get the table that contains the details (or most of the table anyway)
    Local $TempArray = _StringBetween($Source[0], '<a href="', '"') ;get the link to the product image
    $Details[0] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_name">', '</div>') ;get the product name
    $Details[1] = $TempArray[0]
    $TempArray = _StringBetween($Source[0], '<div class="pd_art">', '</div>') ;get the product number
    $StringRepl = StringTrimLeft($TempArray[0], 13) ;keep only the product number, without "Art Nr:"
    $Details[2] = $StringRepl
    $TempArray = _StringBetween($Source[0], '<div class="pd_price">', '</div>') ;get the price
    $StringRepl = StringTrimLeft($TempArray[0], 14) ;keep only the price, without "Prijs: €"
    $Details[3] = $StringRepl
    $TempArray = _StringBetween($Source[0], '<div class="pd_details">', '</div>') ;get the details
    $StringRepl = StringReplace($TempArray[0], "<br />", @CR) ;replace each <br /> with a line break
    $Details[4] = $StringRepl
    Return $Details
EndFunc

; Hotkey handler ("Sluit" is Dutch for "Close")
Func Sluit()
    TrayTip("", "Program Closed", 2)
    Sleep(2000)
    Exit
EndFunc

Tvern Posted April 1, 2010

bo8ster said: "That's a good script. I was just wondering what you see as the drawbacks in the IE functions."

1. _IETableGetCollection() was able to create an object for the table containing the links, but when I used _IELinkClickByIndex() on that object it clicked the first link in the page, not the first link in the table.
2. Using _IELinkClickByIndex() navigates away from the main page, meaning that after extracting the information I would need to navigate back to the main page, rather than going to the next link directly.
3. I had some trouble extracting the data from the item pages using the _IE functions.

Most of these problems are probably my own shortcomings; nonetheless, it was more practical for me to evaluate the source this way.
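
For anyone who still wants to try the _IE route, one possible way around points 1 and 2 is to read the href values out of the table object first and then navigate to each one directly, instead of clicking links. The following is a rough, untested sketch and not Tvern's method. The table index 0 is a guess, and it relies on the table object exposing the standard DOM method getElementsByTagName through COM.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("http://www.horecaworld.biz/a_z.php")

; Assumption: the links are in the first table on the page (index 0)
Local $oTable = _IETableGetCollection($oIE, 0)
If @error Then Exit

; Only the anchors inside this table, not every link on the page
Local $oAnchors = $oTable.getElementsByTagName("a")
Local $iCount = $oAnchors.length
If $iCount = 0 Then Exit

; Copy the hrefs into an array first: the link objects are no longer
; usable once the browser has navigated away from this page
Local $aHrefs[$iCount]
For $i = 0 To $iCount - 1
    $aHrefs[$i] = $oAnchors.item($i).href
Next

; Go to each item page directly, with no need to return to the index page
For $i = 0 To $iCount - 1
    _IENavigate($oIE, $aHrefs[$i])
    Local $sHtml = _IEBodyReadHTML($oIE)
    ; ... extract the details from $sHtml, e.g. with _StringBetween() as in the script above
Next

_IEQuit($oIE)
```

This still loads every page in a browser window, so it is unlikely to be faster than reading the source with _INetGetSource() or InetRead().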