zackrspv Posted March 5, 2008 (edited)

Hello,

I'm writing a program for personal use, to see if I can better understand how to parse page sources. This is not for commercial use, and I will not be using it against the TOS of the site it pulls info from, nor will I be using the information in any way that violates their TOS. I just want to make sure it works and that I have a local copy of the dictionary on MY system in case their system goes down.

The problems with the code below:

1. I wrote it, so of course it is very basic and messy.
2. I doubt I did any of the regexps right, lol.
3. While the sources look the same for every $line that it grabs, it doesn't always grab the information, and it often skips over information.

What in the world am I missing?

#include <INet.au3>
#include <GUIConstants.au3>

FileDelete("defs.txt")
FileDelete("terms.txt")

$line = ""
$str = ""
$source = ""

GUICreate("Hello World", 600, 500)
;~ AutoItSetOption("GUICoordMode", "0")
GUISetState(@SW_SHOW)
;~ $item = InputBox("Search", "Enter search phrase")
;~ $item = StringReplace($item," ","+")
$item = ""

; Collect every /terms/<item>/....asp link from one index page into terms.txt
Func getTerms()
    $source = (_INetGetSource("http://www.investopedia.com/terms/" & $item & "/"))
    ;~ MsgBox(0,"test",$source)
    $nOffset = 1
    $str = ""
    While 1
        $array = StringRegExp($source, '<(?i)a href="(.*?)">', 1, $nOffset)
        If @error = 0 Then
            $nOffset = @extended
        Else
            ExitLoop
        EndIf
        For $i = 0 To UBound($array) - 1
            If StringLeft($array[$i], 9) = "/terms/" & $item & "/" Then
                $testme = StringInStr($array[$i], ".asp")
                If $testme Then
                    $str = $str & $array[$i] & @CRLF & @CRLF
                EndIf
            EndIf
        Next
    WEnd
    FileWrite("terms.txt", $str)
    ;~ GUICtrlCreateEdit($str, -1, 0,600,500,BitOR($WS_VSCROLL,$ES_READONLY))
    ;~ Do
    ;~ $msg = GUIGetMsg()
    ;~ Until $msg = $GUI_EVENT_CLOSE
EndFunc

; Run getTerms() for each index page: "1", then "a" through "z"
Func startTerms()
    GUICtrlCreateLabel("Do: ", 0, 32, 32, 32)
    $pages = StringSplit("1abcdefghijklmnopqrstuvwxyz", "")
    For $p = 1 To $pages[0]
        GUICtrlCreateLabel($pages[$p], 32, 32, 32, 32)
        $item = $pages[$p]
        Call("getTerms")
    Next
EndFunc

; Fetch the page for the current $line and append its definitions to defs.txt
Func getDefs()
    $source = ""
    $array = ""
    $str = ""
    $source = (_INetGetSource("http://www.investopedia.com/" & $line))
    $nOffset = 1
    While 1
        $array = StringRegExp($source, 'dic_termdefs">(.*?)<', 1, $nOffset)
        If @error = 0 Then
            $nOffset = @extended
        Else
            ExitLoop
        EndIf
        For $i = 0 To UBound($array) - 1
            ;~ msgbox(0,"INFO", "Array info for: $array["&$i&"]"&@LF&$array[$i])
            If $array[$i] = "" Then
                MsgBox(0, "error", "Array is blank for: $array[" & $i & "]")
            Else
                FileWrite("defs.txt", $line & "," & $array[$i] & @CRLF & @CRLF)
            EndIf
        Next
    WEnd
EndFunc

Call("startTerms")

$url = "http://www.investopedia.com/"
$file = FileOpen("terms.txt", 0)
While 1
    $line = FileReadLine($file)
    If @error Then ExitLoop ; stop at end of file (or if the file cannot be read)
    If $line <> "" Then
        GUICtrlCreateLabel("Do: " & $line, 0, 32, 600, 32)
        Call("getDefs")
    EndIf
WEnd
FileClose($file)

Edited March 30, 2008 by zackrspv

-_-------__--_-_-____---_-_--_-__-__-_
^^€ñ†®øÞÿ ë×阮§ wï†høµ† ƒë@®, wï†høµ† †ïmë, @ñd wï†høµ† @ †ïmïdï†ÿ ƒø® !ïƒë. €×阮 ñø†, bµ† ïñ§†ë@d wï†hïñ, ñ@ÿ, †h®øµghøµ† †hë 맧ëñ§ë øƒ !ïƒë.

zackrspv Posted March 6, 2008

So, I've been going over this again and again, and I still can't figure out why it keeps skipping over some of the links in the terms file. Anyone have any idea?

zackrspv Posted March 6, 2008

Ha, I got it. It just wasn't processing properly. I changed the regexp to:

$array = StringRegExp($source, '(?i)class="dic_termdefs">(.*?)\n', 1, $nOffset)

and boom, it works. I did make some modifications to the underlying script, though: I removed the function and moved its body into the primary call loop, so it now looks like:

$url = "http://www.investopedia.com/"
$file = FileOpen("terms.txt", 0)
While 1
    $line = FileReadLine($file)
    If @error Then ExitLoop ; stop at end of file (or if the file cannot be read)
    If $line <> "" Then
        GUICtrlSetData($info, $line)
        $source = ""
        $array = ""
        $str = ""
        $source = (_INetGetSource("http://www.investopedia.com/" & $line))
        GUICtrlSetData($redit, "Grabbing source for: " & $line)
        ;~ sleep(4000)
        GUICtrlSetData($edit, $source)
        $nOffset = 1
        While 1
            $array = StringRegExp($source, '(?i)class="dic_termdefs">(.*?)\n', 1, $nOffset)
            $nOffset = @extended
            If @error = 1 Then
                MsgBox(0, "error", "array didn't return result")
                Exit
            EndIf
            GUICtrlSetData($redit, "Set array extended for offset for: " & $line)
            ;~ sleep(3000)
            For $i = 0 To UBound($array) - 1
                GUICtrlSetData($redit, "Going to write data for: " & $line)
                $str = $array[$i]
                ; Strip numeric entities, tags, and any remaining entities
                $str = StringRegExpReplace($str, "&#(.*?);", "")
                $str = StringRegExpReplace($str, "<(.*?)>", "")
                $str = StringRegExpReplace($str, "</(.*?)>", "")
                $str = StringRegExpReplace($str, "&(.*?);", "")
                ;~ sleep(3000)
                FileWrite("defs.txt", $line & "," & $str & @CRLF & @CRLF)
                GUICtrlSetData($redit, $str)
                ;~ sleep(4000)
            Next
            ExitLoop ; only the first definition match per page is processed
        WEnd
    EndIf
WEnd
FileClose($file)

So, at least it's working.
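
For anyone comparing the two patterns, here is a minimal sketch of one likely reason the first one came back blank. It assumes the definition text can start with an HTML tag right after the dic_termdefs attribute; the sample string is made up for illustration and is not the site's real markup. The variable names ($sSample, $aOld, $aNew, $sClean) are likewise hypothetical.

```autoit
; Hypothetical stand-in for one line of page source; the real markup may differ.
Local $sSample = '<div class="dic_termdefs"><b>Alpha</b> is a measure of risk&#45;adjusted return.' & @LF

; Pattern from the first post: the lazy group stops at the first "<".
Local $aOld = StringRegExp($sSample, 'dic_termdefs">(.*?)<', 1)
; Pattern from the fix: the lazy group runs up to the end of the line.
Local $aNew = StringRegExp($sSample, '(?i)class="dic_termdefs">(.*?)\n', 1)

; Both patterns match this sample; bail out if that ever stops being true.
If Not IsArray($aOld) Or Not IsArray($aNew) Then Exit

ConsoleWrite("old: [" & $aOld[0] & "]" & @CRLF)
ConsoleWrite("new: [" & $aNew[0] & "]" & @CRLF)

; Same clean-up idea as in the post: drop tags, then drop entities.
Local $sClean = StringRegExpReplace($aNew[0], "<(.*?)>", "")
$sClean = StringRegExpReplace($sClean, "&(.*?);", "")
ConsoleWrite("clean: [" & $sClean & "]" & @CRLF)
```

On this sample the first pattern should capture an empty string, because the very next character after the attribute is a "<", which is the case the "Array is blank" message box in the first script reports. The second pattern should capture the whole line, tags included, which is why the clean-up replacements are needed afterwards. Note that the clean-up deletes entities such as &#45; rather than decoding them, so some characters are lost from the saved definition.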