Falling
Posted July 13, 2004 (edited)

I used an earlier guy's code posted in scripts and scrapes and tried to make it recursively call all following pages as well. Eventually I wanted to make a tree-structured layout of a webpage using the recursive call. But I've run into some errors. Here is the code. Thanks for any help.

* EDITED CODE FROM PREVIOUS RESPONSE ADVICE. STILL BROKEN.

    $sURL = InputBox("What webpage?", "Enter the webpage")
    Foo_($sURL)

    Func Foo_($url)
        $urlInTextFile = 0
        $file = FileOpen("temp.txt", 2)
        $a = 1
        While 1
            $line = FileReadLine($file, $a)
            $a = $a + 1
            If $line = $url Then
                FileWrite($file, $url & @CRLF)
                $urlInTextFile = 1
            EndIf
            If @error = -1 Then ExitLoop
        Wend
        FileClose($file)
        IF $urlInTextFile = 0 Then
            $file = FileOpen("temp.txt", 2)
            FileWrite($file, $url & @CRLF)
            FileClose($file)
            local $linklist = _GetLinks($url)
            For $nZ = 1 to $linklist[0]
                Foo_($linklist[$nZ])
            Next
        endif
    EndFunc

    Func _GetLinks($psURL)
        ;Returns an array of links from a webpage
        ;------------------------------------------------------------------------------
        ;Download the HTML to a temporary file
        $sTempFile = "$tridsf13.htm"
        URLDownloadToFile($psURL, $sTempFile)
        $sHTML = FileRead($sTempFile, FileGetSize($sTempFile))
        FileDelete($sTempFile)

        ;Cleanup the HTML for better consumption
        $sHTML = StringReplace($sHTML, @CR, "")
        $sHTML = StringReplace($sHTML, @LF, "")
        $sHTML = StringReplace($sHTML, @TAB, " ")

        ;Break it into chewable bytes
        $sHTML = StringReplace($sHTML, "href=", @LF & "href=")
        $sHTML = StringReplace($sHTML, "</a>", @LF & "scrap")
        $asHTML = StringSplit($sHTML, @LF)

        ;Spit out the bones
        $sLinks = ""
        For $nX = 1 to $asHTML[0]
            ;Process only "href=" lines
            If StringLeft($asHTML[$nX],5) = "href=" then
                $asLink = StringSplit($asHTML[$nX], ">")
                $sLinks = $sLinks & @LF & $asLink[1]
            Endif
        Next

        ;Return the juicy links
        Return StringSplit(StringTrimLeft($sLinks,1), @LF)
    EndFunc

Edited July 14, 2004 by Falling

Jos (Developers)
Posted July 13, 2004 (edited)

> I used a earlier guys code posted in scripts and scrapes and tried to make it recursivly call all following pages also... eventually i wanted to make a tree structured layout of a webpage using the recursive call. But I've run into some errors here is the code... Thanks for any help.

I didn't run it to test, but what is your error?

One thing you will run into for sure is calling Foo_() again without closing the file "temp.txt". I don't think you can keep on opening that file without closing it first.

Edited July 13, 2004 by JdeB

Falling
Posted July 14, 2004

Well, I put all the reading and writing before the recursion goes off now (see below), BUT it still just stalls out and never completes. Any advice would be appreciated. Here is the new code:

    $sURL = InputBox("What webpage?", "Enter the webpage")
    Foo_($sURL)

    Func Foo_($url)
        $urlInTextFile = 0
        $file = FileOpen("temp.txt", 2)
        $a = 1
        While 1
            $line = FileReadLine($file, $a)
            $a = $a + 1
            If $line = $url Then
                FileWrite($file, $url & @CRLF)
                $urlInTextFile = 1
            EndIf
            If @error = -1 Then ExitLoop
        Wend
        FileClose($file)
        IF $urlInTextFile = 0 Then
            $file = FileOpen("temp.txt", 2)
            FileWrite($file, $url & @CRLF)
            FileClose($file)
            local $linklist = _GetLinks($url)
            For $nZ = 1 to $linklist[0]
                Foo_($linklist[$nZ])
            Next
        endif
    EndFunc

    Func _GetLinks($psURL)
        ;Returns an array of links from a webpage
        ;------------------------------------------------------------------------------
        ;Download the HTML to a temporary file
        $sTempFile = "$tridsf13.htm"
        URLDownloadToFile($psURL, $sTempFile)
        $sHTML = FileRead($sTempFile, FileGetSize($sTempFile))
        FileDelete($sTempFile)

        ;Cleanup the HTML for better consumption
        $sHTML = StringReplace($sHTML, @CR, "")
        $sHTML = StringReplace($sHTML, @LF, "")
        $sHTML = StringReplace($sHTML, @TAB, " ")

        ;Break it into chewable bytes
        $sHTML = StringReplace($sHTML, "href=", @LF & "href=")
        $sHTML = StringReplace($sHTML, "</a>", @LF & "scrap")
        $asHTML = StringSplit($sHTML, @LF)

        ;Spit out the bones
        $sLinks = ""
        For $nX = 1 to $asHTML[0]
            ;Process only "href=" lines
            If StringLeft($asHTML[$nX],5) = "href=" then
                $asLink = StringSplit($asHTML[$nX], ">")
                $sLinks = $sLinks & @LF & $asLink[1]
            Endif
        Next

        ;Return the juicy links
        Return StringSplit(StringTrimLeft($sLinks,1), @LF)
    EndFunc

Falling
Posted July 14, 2004

Anyone got any ideas?

Jos (Developers)
Posted July 14, 2004 (edited)

> Anyone got any ideas?

Don't think this will ever complete:

    For $NZ = 1 To $LINKLIST[0]
        Foo_($LINKLIST[$NZ])
    Next

Every time you restart Foo_() it will start at 1, so it keeps on recursing till you hit the max recursion level...

EDIT: spoke too soon... $LINKLIST is local... checking a bit more...

EDIT2: you open a file for WRITE but do a FileReadLine without error checking. Could that be it?

    $URLINTEXTFILE = 0
    $FILE = FileOpen("temp.txt", 2)
    $A = 1
    While 1
        $LINE = FileReadLine($FILE, $A)
        $A = $A + 1

Edited July 14, 2004 by JdeB
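
For readers following along, one way to keep a recursive crawl like this bounded is to pass the depth into each call, stop at a limit, and remember which URLs were already visited. The following is only a sketch under stated assumptions: it assumes a recent AutoIt v3 (for `&=`, `ConsoleWrite`, and array initializers), and `_Crawl`, `_FakeLinks`, `$g_sVisited`, the depth limit of 5, and the page names "a", "b", "c" are all hypothetical and not from the thread. `_FakeLinks` is a stand-in so the sketch runs without a network; it would be swapped for `_GetLinks()` once that returns usable URLs.

```autoit
; Assumption: in-memory visited list instead of temp.txt, for illustration only.
Global $g_sVisited = @LF ; newline-delimited list of URLs already crawled

_Crawl("a", 0, 5, "")

Func _Crawl($sUrl, $iDepth, $iMaxDepth, $sIndent)
    If $iDepth > $iMaxDepth Then Return ; depth guard bounds the recursion
    If StringInStr($g_sVisited, @LF & $sUrl & @LF) Then Return ; skip pages already seen
    $g_sVisited &= $sUrl & @LF

    ConsoleWrite($sIndent & $sUrl & @CRLF) ; one line per page, indented by depth

    Local $aLinks = _FakeLinks($sUrl) ; hypothetical stand-in for _GetLinks($sUrl)
    For $i = 1 To $aLinks[0]
        _Crawl($aLinks[$i], $iDepth + 1, $iMaxDepth, $sIndent & "  ")
    Next
EndFunc

; Hypothetical three-page site: "a" links to "b" and itself, "b" links to "a" and "c".
Func _FakeLinks($sUrl)
    If $sUrl = "a" Then Return StringSplit("b|a", "|")
    If $sUrl = "b" Then Return StringSplit("a|c", "|")
    Local $aNone[1] = [0] ; same shape as StringSplit output: element 0 is the count
    Return $aNone
EndFunc
```

Because each call marks its URL as visited before recursing, the links that point back to "a" return immediately, and the indentation gives a rough version of the tree layout mentioned in the first post.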

Falling
Posted July 15, 2004

> EDIT2: you open a file for WRITE but do a FileReadLine without error checking. Could that be it ??

There is error checking with that FileReadLine. I used the exact sample from the help file, which is not saying much because there are a lot of errors in that help file. Also, I open the file beforehand, so if it was going to throw an error it should throw it then. I might have the syntax wrong for FileReadLine, though. I think it said it needed the full path, but I used $file. Anyway, I'll mess with it again tomorrow.

    While 1
        $line = FileReadLine($file, $a)
        $a = $a + 1
        If $line = $url Then
            FileWrite($file, $url & @CRLF)
            $urlInTextFile = 1
        EndIf
        If @error = -1 Then ExitLoop
    Wend

Jos (Developers)
Posted July 15, 2004

> EDIT2: you open a file for WRITE but do a FileReadLine without error checking. Could that be it ??

> there is error checking with that filereadline... i used the exact sample from the help file... which is not saying much cause there are lot of errors in that help file. also i open the file before so if it was going to throw a error it should throw it then. I might have the syntax wrong for filereadline though...i think it said it needed full path but i used $file... anyway i'll mess with it again tommorow.

I guess I was stating two things in my EDIT2:

1. You open the file for WRITE but are reading AND writing to it using the same file handle. Not sure if this works correctly.
2. You are not doing proper error checking. You need to do the @ERROR test right after the FileReadLine operation, like the help file shows in the examples!

The error thrown at open time is all about opening the file. The error thrown at reading time is there to check whether you have reached end-of-file (and also whether the file exists).

> which is not saying much cause there are lot of errors in that help file

By the way, just send/post the list of errors you have found that aren't reported yet so the help file can be made better for the next release...
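
The two points above can be sketched roughly as follows: read through a handle opened in read mode, test @error on the very next statement after FileReadLine, close the handle, and only then reopen in append mode to write. This is a minimal sketch, not the thread's final code. `_UrlSeen`, `_RememberUrl`, and the log file name are hypothetical, and it assumes a recent AutoIt v3 where FileOpen modes are 0 = read, 1 = append, 2 = overwrite, and where FileReadLine sets @error to -1 at end-of-file and to 1 for other failures such as a handle not opened for reading. If that assumption holds, it would also explain the stall: a test for exactly -1 never fires on a write-mode handle.

```autoit
; Hypothetical helper: returns True if $sUrl is already a line in $sLogFile.
Func _UrlSeen($sUrl, $sLogFile)
    Local $hFile = FileOpen($sLogFile, 0) ; 0 = read mode
    If $hFile = -1 Then Return False      ; no log yet (or unreadable): nothing seen
    Local $bFound = False
    While 1
        Local $sLine = FileReadLine($hFile) ; no line number: reads the next line
        If @error Then ExitLoop             ; checked immediately; covers EOF (-1) and failures (1)
        If $sLine = $sUrl Then              ; "=" compares strings case-insensitively
            $bFound = True
            ExitLoop
        EndIf
    WEnd
    FileClose($hFile)
    Return $bFound
EndFunc

; Hypothetical helper: appends $sUrl to the log using a separate handle.
Func _RememberUrl($sUrl, $sLogFile)
    Local $hFile = FileOpen($sLogFile, 1) ; 1 = append, so earlier entries survive
    If $hFile = -1 Then Return False
    FileWriteLine($hFile, $sUrl)
    FileClose($hFile)
    Return True
EndFunc

; Usage: record a URL only the first time it is encountered.
Global $sLog = @TempDir & "\seen_urls.txt" ; assumed location, any writable path works
If Not _UrlSeen("http://example.com/", $sLog) Then _RememberUrl("http://example.com/", $sLog)
```

Keeping the read and the write in separate, short-lived handles also means no handle is left open when the function recurses, which was the concern raised in the first reply.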