Matterz
Posted October 15, 2009 (edited)

I know that this topic has probably been discussed ad nauseam, but I've been searching the forums for a while and haven't found a clear answer. Basically, I want to read information off multiple web pages and store the relevant data in a text file. There are about 26,000 pages in total, which takes a long time to cycle through. Most of the time is spent waiting for the pages to load, so it seemed like a good idea to break the search up into 26 IE tabs and have each tab scan 1,000 pages. I've written the following script to do this:

    #include <IE.au3>
    #include <string.au3>
    #include <Array.au3>
    #include <File.au3>

    Global $startPage = "http://search.com.jsp?query=a&lq=a&sta=00&zone=00&srt=rank&sid=&irc=n&gid=0&fex1=1&fex2=&fex3=&fex4=&pos=filter&site=&etc="
    Const $navOpenInNewTab = 0x0800
    Const $navOpenInBackgroundTab = 0x1000
    Dim $aIE[27] ; indices 1..26 are used (one per letter), so 27 slots are needed
    $oIE = _IECreate($startPage, 1, 0)
    _IEErrorNotify(False)

    ; Return only the digits contained in a string, concatenated
    Func stringStripAlpha($alphNumString)
        Local $aItems = StringRegExp($alphNumString, "(\d+)", 3)
        If @error Then Return ""
        Local $sRet = ""
        For $i = 0 To UBound($aItems) - 1
            $sRet &= $aItems[$i]
        Next
        Return $sRet
    EndFunc

    ; Map a letter to its position in the alphabet (a = 1 ... z = 26)
    Func lookupLetter($letterToCheck)
        Local $letterToNumber
        Select
            Case $letterToCheck = "a"
                $letterToNumber = 1
            Case $letterToCheck = "b"
                $letterToNumber = 2
            Case $letterToCheck = "c"
                $letterToNumber = 3
            Case $letterToCheck = 'd'
                $letterToNumber = 4
            Case $letterToCheck = 'e'
                $letterToNumber = 5
            Case $letterToCheck = 'f'
                $letterToNumber = 6
            Case $letterToCheck = 'g'
                $letterToNumber = 7
            Case $letterToCheck = 'h'
                $letterToNumber = 8
            Case $letterToCheck = 'i'
                $letterToNumber = 9
            Case $letterToCheck = 'j'
                $letterToNumber = 10
            Case $letterToCheck = 'k'
                $letterToNumber = 11
            Case $letterToCheck = 'l'
                $letterToNumber = 12
            Case $letterToCheck = 'm'
                $letterToNumber = 13
            Case $letterToCheck = 'n'
                $letterToNumber = 14
            Case $letterToCheck = 'o'
                $letterToNumber = 15
            Case $letterToCheck = 'p'
                $letterToNumber = 16
            Case $letterToCheck = 'q'
                $letterToNumber = 17
            Case $letterToCheck = 'r'
                $letterToNumber = 18
            Case $letterToCheck = 's'
                $letterToNumber = 19
            Case $letterToCheck = 't'
                $letterToNumber = 20
            Case $letterToCheck = 'u'
                $letterToNumber = 21
            Case $letterToCheck = 'v'
                $letterToNumber = 22
            Case $letterToCheck = 'w'
                $letterToNumber = 23
            Case $letterToCheck = 'x'
                $letterToNumber = 24
            Case $letterToCheck = 'y'
                $letterToNumber = 25
            Case $letterToCheck = 'z'
                $letterToNumber = 26
        EndSelect
        Return $letterToNumber
    EndFunc

    Func nameSearch($letter)
        Local $letterNumber = lookupLetter($letter)
        Local $urlPage = 00
        Local $nextPage = "http://search.com.jsp?query="&$letter&"&lq=a&sta="&$urlPage&"&zone=00&srt=rank&sid=&irc=n&gid=0&fex1=1&fex2=&fex3=&fex4=&pos=filter&site=&etc="
        Local $maxPage
        Local $aStringReturned
        Local $numberReturned
        Local $oInputs
        Local $oMaxChars
        Local $string
        Local $class
        Local $aNames[1]
        Local $aLevel[1]
        Local $aClass[1]

        ;$oIE.Navigate2($nextPage, $navOpenInNewTab)
        __IENavigate($oIE, $nextPage, 1, $navOpenInBackgroundTab)
        ; Poll until the new background tab can be attached by URL
        Do
            $aIE[$letterNumber] = _IEAttach($nextPage, "url")
            Sleep(100)
        Until IsObj($aIE[$letterNumber])

        ;Initial Page
        ;#####################################################################################
        $oMaxChars = _IETagNameGetCollection($aIE[$letterNumber], "P")
        For $oMaxChar In $oMaxChars
            If (StringInStr($oMaxChar.innertext, "Results")) Then
                $aStringReturned = StringSplit($oMaxChar.innertext, " )")
                $numberReturned = Int(stringStripAlpha($aStringReturned[4]))
                $maxPage = Int(($numberReturned + 9) / 10) ; 10 results per page, rounded up
            EndIf
        Next

        ; Read from this letter's tab rather than the start page in $oIE
        $oInputs = _IETagNameGetCollection($aIE[$letterNumber], "TBody")
        For $oInput In $oInputs
            $string = StringSplit($oInput.innertext, @CRLF&" ")
            $class = StringSplit($string[15], "L", 1)
            If (StringInStr($string[1], "A", 1)) Then
                If ($class[1] > 17) Then
                    _ArrayAdd($aNames, $string[1])
                    _ArrayAdd($aLevel, $string[4])
                    _ArrayAdd($aClass, $class[1])
                EndIf
            EndIf
        Next

        ;Multi-page Loop
        ;#####################################################################################
    ;~  For $urlPage=10 to $maxPage Step 10
    ;~      $nextPage = "http://search.com.jsp?query="&$letter&"&lq=a&sta="&$urlPage&"&zone=00&srt=rank&sid=&irc=n&gid=0&fex1=1&fex2=&fex3=&fex4=&pos=filter&site=&etc="
    ;~      _IENavigate($aIE[$letterNumber],$nextPage)
    ;~      $oInputs = _IETagNameGetCollection($aIE[$letterNumber],"TBody")
    ;~      For $oInput In $oInputs
    ;~          $string = StringSplit($oInput.innertext,@CRLF&" ")
    ;~          $class = StringSplit($string[15],"L",1)
    ;~          If(StringInStr($string[1],"A",1)) Then
    ;~              If($class[1]>17) Then
    ;~                  _ArrayAdd($aNames,$string[1])
    ;~                  _ArrayAdd($aLevel,$string[4])
    ;~                  _ArrayAdd($aClass,$class[1])
    ;~              EndIf
    ;~          EndIf
    ;~      Next
    ;~  Next

        ;Write to Text File
        ;#####################################################################################
        ; Drop the empty placeholder element at index 0 of each array
        _ArrayDelete($aNames, 0)
        _ArrayDelete($aLevel, 0)
        _ArrayDelete($aClass, 0)

        $file = FileOpen("C:\Documents and Settings\user\Desktop\File"&$letter, 10)
        If $file = -1 Then
            ConsoleWrite("Unable to open file."&@CRLF)
            Exit
        EndIf

        For $row = 0 To UBound($aNames) - 1
            FileWrite($file, $aNames[$row]&@TAB)
            FileWrite($file, $aLevel[$row]&@TAB)
            FileWrite($file, $aClass[$row]&@CRLF)
        Next
        FileClose($file)
    EndFunc

My code can probably be optimized (especially converting letters to numbers), but my main issue is getting 26 nameSearch() functions to run at the same time. Obviously, if I can't run them all simultaneously, then I save nothing by breaking the scan into chunks. The only solution I've read about so far is to make an exe for each letter and run them all separately. I'm hoping there is a cleaner method.

Thanks

Edited October 15, 2009 by Matterz
Melba23 (Moderator)
Posted October 15, 2009

Matterz,

AutoIt is single-threaded: one function at a time. Even if you think you are running lots of functions simultaneously (e.g. via Adlib), you still only get one running at any one time.

Search for multi-threading - you will then know more about the subject than you ever wished to!

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.

My UDFs:
ArrayMultiColSort - Sort arrays on multiple columns
ChooseFileFolder - Single and multiple selections from specified path treeview listing
Date_Time_Convert - Easily convert date/time formats, including the language used
ExtMsgBox - A highly customisable replacement for MsgBox
GUIExtender - Extend and retract multiple sections within a GUI
GUIFrame - Subdivide GUIs into many adjustable frames
GUIListViewEx - Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx - Check/clear parent and child checkboxes in a TreeView
Marquee - Scrolling tickertape GUIs
NoFocusLines - Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify - Small notifications on the edge of the display
Scrollbars - Automatically sized scrollbars with a single command
StringSize - Automatically size controls to fit text
Toast - Small GUIs which pop out of the notification area
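
Since a single AutoIt process runs only one function at a time, the "one exe per letter" idea from the first post does not necessarily need 26 separate scripts. A minimal sketch, under stated assumptions, of one script that relaunches itself once per letter and passes the letter on the command line is shown here. The names `scanLetter` and `letterToNumber` are hypothetical and not from the thread: `scanLetter` is only a placeholder for the poster's `nameSearch()`, and `letterToNumber` is one possible shorter replacement for `lookupLetter()`. Each worker would have to create its own IE window rather than share `$oIE`, and 26 simultaneous browser processes may well be too heavy for a given machine.

```autoit
; Sketch only: the same script acts as launcher or worker, depending on
; whether a letter was passed on the command line.

If $CmdLine[0] >= 1 Then
    ; Worker mode: the first argument is the letter to scan.
    scanLetter($CmdLine[1])
    Exit
EndIf

; Launcher mode: start one worker process per letter.
Global $aPids[26]
Global $sArg, $sCmd
For $i = 0 To 25
    $sArg = Chr(Asc("a") + $i)
    If @Compiled Then
        ; Compiled: the exe itself is the script.
        $sCmd = '"' & @ScriptFullPath & '" ' & $sArg
    Else
        ; Uncompiled: hand the .au3 file to the interpreter.
        $sCmd = '"' & @AutoItExe & '" "' & @ScriptFullPath & '" ' & $sArg
    EndIf
    $aPids[$i] = Run($sCmd) ; Run returns the PID, or 0 on failure
Next

; Wait until every worker that started has exited.
For $i = 0 To 25
    If $aPids[$i] Then ProcessWaitClose($aPids[$i])
Next

; Hypothetical replacement for lookupLetter(): "a".."z" (either case) -> 1..26.
; Returns 0 for anything that is not a single ASCII letter.
Func letterToNumber($sLetter)
    If Not StringRegExp($sLetter, "^[A-Za-z]\z") Then Return 0
    Return Asc(StringLower($sLetter)) - Asc("a") + 1
EndFunc

; Hypothetical placeholder for the real per-letter scan (nameSearch in the thread).
Func scanLetter($sLetter)
    ConsoleWrite("Scanning letter " & $sLetter & " (index " & letterToNumber($sLetter) & ")" & @CRLF)
EndFunc
```

The workers are ordinary separate processes, so the operating system schedules them in parallel. They share no variables with the launcher, which is why the letter travels on the command line and each worker writes its own output file.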