zackrspv Posted March 5, 2008 (edited)

Hello,

I'm writing a program for personal use, to see if I can understand how to parse sources better using other sources, etc. It is not for commercial use, and I will not be using it, or the information it pulls, in any way that violates the TOS of the site it pulls from. I just want to make sure that I have a local copy of the dictionary on MY system if their system goes down.

The problems with the code below:

1. I wrote it, so of course it is very, very basic and messy.
2. I doubt I did any of the regexps right, lol.
3. While the sources look the same for every $line that it grabs, it doesn't always grab the information, and often skips over information.

What in the world am I missing?

```autoit
#include <INet.au3>
#include <GUIConstants.au3>

filedelete("defs.txt")
filedelete("terms.txt")
$line = ""
$str = ""
$source = ""
GUICreate("Hello World", 600, 500)
;~ AutoItSetOption("GUICoordMode", "0")
GUISetState(@SW_SHOW)
;~ $item = InputBox("Search", "Enter search phrase")
;~ $item = StringReplace($item," ","+")
$item = ""

Func getTerms()
    $source = (_INetGetSource("http://www.investopedia.com/terms/"&$item&"/"))
    ;~ MsgBox(0,"test",$source)
    $nOffset = 1
    $str = ""
    while 1
        $array = StringRegExp($source, '<(?i)a href="(.*?)">', 1, $nOffset)
        if @error = 0 Then
            $nOffset = @extended
        Else
            ExitLoop
        EndIf
        for $i = 0 to UBound($array) - 1
            if StringLeft($array[$i],9) = "/terms/"&$item&"/" Then
                $testme = StringInStr($array[$i], ".asp")
                if $testme then
                    $str = $str & $array[$i] & @CRLF & @CRLF
                Else
                endif
            Else
            EndIf
        Next
    WEnd
    filewrite("terms.txt", $str)
    ;~ GUICtrlCreateEdit($str, -1, 0,600,500,BitOR($WS_VSCROLL,$ES_READONLY))
    ;~ Do
    ;~ $msg = GUIGetMsg()
    ;~ Until $msg = $GUI_EVENT_CLOSE
EndFunc

func startTerms()
    guictrlcreatelabel("Do: ",0,32,32,32)
    guictrlcreatelabel("1",32,32,32,32)
    $item = "1"
    call("getTerms")
    guictrlcreatelabel("a",32,32,32,32)
    $item = "a"
    call("getTerms")
    guictrlcreatelabel("b",32,32,32,32)
    $item = "b"
    call("getTerms")
    guictrlcreatelabel("c",32,32,32,32)
    $item = "c"
    call("getTerms")
    guictrlcreatelabel("d",32,32,32,32)
    $item = "d"
    call("getTerms")
    guictrlcreatelabel("e",32,32,32,32)
    $item = "e"
    call("getTerms")
    guictrlcreatelabel("f",32,32,32,32)
    $item = "f"
    call("getTerms")
    guictrlcreatelabel("g",32,32,32,32)
    $item = "g"
    call("getTerms")
    guictrlcreatelabel("h",32,32,32,32)
    $item = "h"
    call("getTerms")
    guictrlcreatelabel("i",32,32,32,32)
    $item = "i"
    call("getTerms")
    guictrlcreatelabel("j",32,32,32,32)
    $item = "j"
    call("getTerms")
    guictrlcreatelabel("k",32,32,32,32)
    $item = "k"
    call("getTerms")
    guictrlcreatelabel("l",32,32,32,32)
    $item = "l"
    call("getTerms")
    guictrlcreatelabel("m",32,32,32,32)
    $item = "m"
    call("getTerms")
    guictrlcreatelabel("n",32,32,32,32)
    $item = "n"
    call("getTerms")
    guictrlcreatelabel("o",32,32,32,32)
    $item = "o"
    call("getTerms")
    guictrlcreatelabel("p",32,32,32,32)
    $item = "p"
    call("getTerms")
    guictrlcreatelabel("q",32,32,32,32)
    $item = "q"
    call("getTerms")
    guictrlcreatelabel("r",32,32,32,32)
    $item = "r"
    call("getTerms")
    guictrlcreatelabel("s",32,32,32,32)
    $item = "s"
    call("getTerms")
    guictrlcreatelabel("t",32,32,32,32)
    $item = "t"
    call("getTerms")
    guictrlcreatelabel("u",32,32,32,32)
    $item = "u"
    call("getTerms")
    guictrlcreatelabel("v",32,32,32,32)
    $item = "v"
    call("getTerms")
    guictrlcreatelabel("w",32,32,32,32)
    $item = "w"
    call("getTerms")
    guictrlcreatelabel("x",32,32,32,32)
    $item = "x"
    call("getTerms")
    guictrlcreatelabel("y",32,32,32,32)
    $item = "y"
    call("getTerms")
    guictrlcreatelabel("z",32,32,32,32)
    $item = "z"
    call("getTerms")
EndFunc

Func getDefs()
    $source = ""
    $array = ""
    $str = ""
    $source = (_INetGetSource("http://www.investopedia.com/"&$line))
    $nOffset = 1
    while 1
        $array = StringRegExp($source, 'dic_termdefs">(.*?)<', 1, $nOffset)
        if @error = 0 Then
            $nOffset = @extended
        Else
            ExitLoop
        EndIf
        for $i = 0 to UBound($array) - 1
            ;~ msgbox(0,"INFO", "Array info for: $array["&$i&"]"&@LF&$array[$i])
            if $array[$i] = "" Then
                msgbox(0,"error", "Array is blank for: $array["&$i&"]")
            Else
                filewrite("defs.txt", $line & "," & $array[$i] & @CRLF & @CRLF)
            EndIf
        Next
    WEnd
EndFunc

call("startTerms")
$url = "http://www.investopedia.com/"
$file = FileOpen("terms.txt", 0)
while 1
    $line = FileReadLine($file)
    if $line = "" then
    Else
        guictrlcreatelabel("Do: "&$line, 0, 32, 600, 32)
        call("getDefs")
    EndIf
WEnd
FileClose($file)
```

Edited March 30, 2008 by zackrspv

-_-------__--_-_-____---_-_--_-__-__-_ ^^€ñ†®øÞÿ ë×阮§ wï†høµ† ƒë@®, wï†høµ† †ïmë, @ñd wï†høµ† @ †ïmïdï†ÿ ƒø® !ïƒë. €×阮 ñø†, bµ† ïñ§†ë@d wï†hïñ, ñ@ÿ, †h®øµghøµ† †hë ë§§ëñ§ë øƒ !ïƒë.
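The link-collection step in the script above could be written more compactly. The following is only a sketch under assumptions, not a diagnosis of the skipped links: `extractTermLinks()`, `fetchAllTerms()` and `$sSections` are hypothetical names that do not appear in the original script, and the `href="([^"]*)"` pattern is swapped in for the original `href="(.*?)">` so that extra attributes after the URL are not captured. It uses `StringRegExp` with flag 3, which returns all matches in one array, so no offset bookkeeping is needed.

```autoit
#include <INet.au3>

; Hypothetical helper: returns every link in $sHtml that starts with
; "/terms/<section>/" and contains ".asp", one per line.
Func extractTermLinks($sHtml, $sSection)
    Local $sPrefix = "/terms/" & $sSection & "/"
    Local $sOut = ""
    ; Flag 3 = return an array of all matches of the capture group.
    Local $aLinks = StringRegExp($sHtml, '(?i)<a href="([^"]*)"', 3)
    If @error Then Return $sOut ; no links found at all
    For $i = 0 To UBound($aLinks) - 1
        If StringLeft($aLinks[$i], StringLen($sPrefix)) = $sPrefix And StringInStr($aLinks[$i], ".asp") Then
            $sOut &= $aLinks[$i] & @CRLF
        EndIf
    Next
    Return $sOut
EndFunc

; Hypothetical replacement for the 27 repeated blocks in startTerms():
; walks "1" and then a..z, in the same order as the original.
Func fetchAllTerms()
    Local $sSections = "1abcdefghijklmnopqrstuvwxyz"
    For $i = 1 To StringLen($sSections)
        Local $sSection = StringMid($sSections, $i, 1)
        Local $sHtml = _INetGetSource("http://www.investopedia.com/terms/" & $sSection & "/")
        FileWrite("terms.txt", extractTermLinks($sHtml, $sSection))
    Next
EndFunc
```

Keeping the matching in a function that only takes a string means it can be tried on a saved copy of one page before anything is fetched from the site.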
zackrspv (Author) Posted March 6, 2008

So, I've been going over this over and over and over, and I still can't seem to figure out why it keeps skipping over some of the links in the terms file. Anyone have any idea?
zackrspv (Author) Posted March 6, 2008

Ha, I got it. It was just not processing properly. I changed the regexp to:

```autoit
$array = StringRegExp($source, '(?i)class="dic_termdefs">(.*?)\n', 1, $nOffset)
```

and boom, it works. I did make some modifications, though, to the underlying script: I removed the function and moved its body into the primary call loop, so it looks like:

```autoit
$url = "http://www.investopedia.com/"
$file = FileOpen("terms.txt", 0)
while 1
    $line = FileReadLine($file)
    if $line = "" then
    Else
        guictrlsetdata($info, $line)
        $source = ""
        $array = ""
        $str = ""
        $source = (_INetGetSource("http://www.investopedia.com/"&$line))
        guictrlsetdata($redit, "Grabbing source for: " & $line)
        ;~ sleep(4000)
        guictrlsetdata($edit, $source)
        $nOffset = 1
        while 1
            $array = StringRegExp($source, '(?i)class="dic_termdefs">(.*?)\n', 1, $nOffset)
            $nOffset = @extended
            if @error = "1" Then
                MsgBox(0, "error", "array didn't return result")
                Exit
            EndIf
            guictrlsetdata($redit, "Set array extended for offset for: " & $line)
            ;~ sleep(3000)
            for $i = 0 to UBound($array) - 1
                guictrlsetdata($redit, "Going to write data for: " & $line)
                $str = $array[$i]
                $str = StringRegExpReplace($str, "&#(.*?);", "" )
                $str = StringRegExpReplace($str, "<(.*?)>", "" )
                $str = StringRegExpReplace($str, "</(.*?)>", "" )
                $str = StringRegExpReplace($str, "&(.*?);", "" )
                ;~ sleep(3000)
                filewrite("defs.txt", $line & "," & $str & @CRLF & @CRLF)
                guictrlsetdata($redit, $str)
                ;~ sleep(4000)
            Next
            ExitLoop
        WEnd
    EndIf
WEnd
FileClose($file)
```

So, at least it is working.
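Two parts of the loop above may be worth isolating: the clean-up of each definition, and the reading of terms.txt, which as posted has no exit condition once the file runs out. A minimal sketch, assuming the hypothetical helper names `stripMarkup()` and `readTerms()` (neither is in the original script) and narrower patterns than the four posted replacements: `<[^>]*>` covers both opening and closing tags, and `&#?\w+;` covers named and numeric entities without swallowing ordinary text between a stray `&` and a later `;`.

```autoit
; Hypothetical helper: strips tags and HTML entities from one definition.
Func stripMarkup($sText)
    $sText = StringRegExpReplace($sText, "<[^>]*>", "")  ; opening and closing tags
    $sText = StringRegExpReplace($sText, "&#?\w+;", "")   ; named and numeric entities
    Return StringStripWS($sText, 3)                         ; trim leading and trailing whitespace
EndFunc

; Hypothetical helper: returns the non-blank lines of $sPath joined with "|".
Func readTerms($sPath)
    Local $hFile = FileOpen($sPath, 0)
    If $hFile = -1 Then Return SetError(1, 0, "")
    Local $sTerms = ""
    While 1
        Local $sLine = FileReadLine($hFile)
        If @error Then ExitLoop          ; FileReadLine sets @error to -1 at end of file
        If $sLine = "" Then ContinueLoop ; skip the blank separator lines
        $sTerms &= $sLine & "|"
    WEnd
    FileClose($hFile)
    Return $sTerms
EndFunc
```

Checking `@error` right after `FileReadLine` is what lets the loop tell a blank separator line apart from the end of the file, so the script can reach `FileClose` on its own.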