ParoXsitiC Posted July 18, 2005

#include <File.au3>
#include <Array.au3>

Dim $Links[1]
Dim $FileLinks[1]
Global $LastLink = 0

$LOGFILE = "Links.txt"
$LOGFILE = @ScriptDir & "\" & $LOGFILE

ExtractLinks('http://newslink.org/')
_ArrayDelete($Links, 0)
_ArrayDisplay($Links, "Links Array")
DeleteDuplicates()
LogLinks()

;==========================================================================================
Func ExtractLinks($Site)
    Local $Source, $PageLinks = 0
    $Source = _INetGetSource($Site)
    $PageLinks = StringRegExp($Source, '(?i)<A href="(.*?)">', 3)
    For $NUM = 0 To UBound($PageLinks) - 1
        _ArrayAdd($Links, $PageLinks[$NUM])
    Next
    _ArraySort($Links)
    $LastLink = UBound($Links) - 1
EndFunc

;==========================================================================================
Func DeleteDuplicates()
    LoadLOG()
    Local $Link = 0
    While $Link < $LastLink
        ; Check for duplicates
        If $Links[$Link] = $Links[$Link + 1] Then
            _ArrayDelete($Links, $Link + 1)
        EndIf

        ; Load the LOG and check for duplicates
        MsgBox(64, "Searching...", $Links[$Link])
        _ArrayBinarySearch($FileLinks, $Links[$Link])
        If Not @error Then ; If found
            _ArrayDelete($Links, $Link)
            MsgBox(64, "Result:", "FOUND")
        Else
            MsgBox(64, "Result:", "NOT FOUND")
        EndIf

        ; Update the Link and LastLink
        $Link = $Link + 1
        $LastLink = UBound($Links) - 1
    WEnd
EndFunc

;==========================================================================================
Func LoadLOG()
    FileOpen($LOGFILE, 1)
    _FileReadToArray($LOGFILE, $FileLinks)
    _ArrayDelete($FileLinks, $FileLinks[0])
    _ArrayDelete($FileLinks, 0)
    _ArrayDisplay($FileLinks, "FileLinks Array")
EndFunc

;==========================================================================================
Func LogLinks()
    FileOpen($LOGFILE, 1)
    For $Link = 0 To $LastLink
        FileWriteLine($LOGFILE, $Links[$Link])
    Next
    FileClose($LOGFILE)
EndFunc

;==========================================================================================
Func _INetGetSource($s_URL)
    $o_HTTP = ObjCreate("winhttp.winhttprequest.5.1")
    $o_HTTP.open("GET", $s_URL)
    $o_HTTP.send()
    Return $o_HTTP.ResponseText
EndFunc

"; Load the LOG and check for duplicates" is where my trouble is. I use _ArrayDisplay to debug and view the arrays. The $Links and $FileLinks arrays work as they should, but I can't seem to get the binary search to find any matches between $Links and $FileLinks. Of course, the first run sets the log up, and the second run shouldn't log any links, because the links are already in the log. Any feedback is welcome.
ParoXsitiC (Author) Posted July 18, 2005

Please help.
blindwig Posted July 18, 2005 (edited)

What's the problem? I ran the program and I don't see any duplicate links in the output file.

Edit: Oh, I get it: if you run it more than once, the file contains another copy of the list, because you append to it. Hmm... let me look into this...
blindwig Posted July 18, 2005

OK, two problems I found:

First, _FileReadToArray() leaves the EOL markers on the end of the strings when it puts them in the array. You need to run through the array and call StringStripCR() on each element after you read the file.

Second, your searching logic is flawed: when you find a duplicate link, you delete the current link and then move on to the next link. The problem is that you just skipped what was the next link, because when you delete the current link, the next link becomes the current link. The solution is to either not move to the next link until the current link is not found, or move through the array backwards (from end to beginning) so that deleted elements don't affect where you are in the array. See the sketch below.
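A minimal sketch of both fixes, keeping the globals from the original script ($Links, $FileLinks, $LOGFILE); it is illustrative and untested, not blindwig's actual code:

; Sketch: strip the CR left behind by _FileReadToArray(), then drop
; the line-count element so $FileLinks holds only links.
Func LoadLOG()
    _FileReadToArray($LOGFILE, $FileLinks)
    For $i = 1 To $FileLinks[0]
        $FileLinks[$i] = StringStripCR($FileLinks[$i]) ; remove trailing EOL marker
    Next
    _ArrayDelete($FileLinks, 0) ; element [0] held the line count
EndFunc

; Sketch: walk $Links from end to beginning, so deleting element
; $Link only shifts elements that have already been visited.
Func DeleteDuplicates()
    LoadLOG()
    For $Link = UBound($Links) - 1 To 0 Step -1
        If $Link > 0 And $Links[$Link] = $Links[$Link - 1] Then
            _ArrayDelete($Links, $Link) ; duplicate within this page's links
            ContinueLoop
        EndIf
        _ArrayBinarySearch($FileLinks, $Links[$Link])
        If Not @error Then _ArrayDelete($Links, $Link) ; already in the log
    Next
    $LastLink = UBound($Links) - 1
EndFunc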
Jos (Developers) Posted July 18, 2005

Quoting blindwig:
"First, the _FileReadToArray() leaves the EOL markers on the end of the strings when it puts them in the array. You need to run through the array and run StringStripCR() on each element after you read the file."

Will be fixed in the next UDF version... tnx
ParoXsitiC (Author) Posted July 19, 2005

Yes, I found the problems before I looked here... thanks anyhow! I noticed the search was flawed too. Everything is working now.