floodge Posted March 13, 2009

Hello, I am working on a dictionary program as a small coding project. At first, my application would literally load a web page and use a mouse macro to paste the definition. This worked, but it was really quirky and often broke. Now I am using InetGet to grab the HTML from the page. I can get and view the HTML fine, but I need to somehow specify which portion of the page is the definition and which part is HTML that isn't needed.

(The code above this is GUI stuff; here's where the meat of the issue is.)

    $mystr = "http://dictionary.reference.com/browse/" & $str
    InetGet($mystr, "C:\results.txt", 1, 0)
    $str = FileRead("C:\results.txt")
    $str = StringRegExpReplace($str, """", "")
    MsgBox(64, "Definition", $str)

Looking for pointers; I don't have much regex experience.

Edited March 13, 2009 by floodge

Authenticity Posted March 13, 2009

Hmm...? What sort of information are you trying to retrieve? I mean, anything between <> should be the HTML part you're not interested in? And which part are you interested in? ;]

floodge Posted March 14, 2009 (Author)

I'm trying to retrieve the portion with the definitions in it; I'm not sure how to single that part out of the file.

Authenticity Posted March 14, 2009

Something like this?

    #include <INet.au3>

    Dim $sSource = _INetGetSource('http://www.autoitscript.com/')
    ; Strip every HTML tag
    $sSource = StringRegExpReplace($sSource, '<[^>]++>', '')
    ; Collapse runs of consecutive line breaks into one
    $sSource = StringRegExpReplace($sSource, '(\r\n){2,}', @CRLF)
    ; Remove runs of spaces/tabs together with the line break that follows them
    $sSource = StringRegExpReplace($sSource, '(?>[[:blank:]]+)\r\n', '')
    ; Trim leading and trailing whitespace
    $sSource = StringStripWS($sSource, 3)
    ConsoleWrite($sSource & @LF)

    ; Write the result to a file next to the script (mode 2 = overwrite)
    $hFile = FileOpen(@ScriptDir & '\TempHTML.txt', 2)
    If $hFile = -1 Then Exit
    FileWrite($hFile, $sSource)
    FileClose($hFile)

floodge Posted March 18, 2009 (Author)

I just need a line of code that will start at "1." and end at the next period.

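A minimal sketch of that "start at 1. and stop at the next period" idea, assuming the HTML tags have already been stripped. The sample text in $sText is made up for illustration and is not real page output; note that a definition containing an abbreviation such as "e.g." would be cut short by this pattern.

```autoit
; Made-up sample input (an assumption, not taken from the real page)
Local $sText = "1. a tool for driving nails. 2. any of various instruments."

; Flag 1 returns an array of the captured groups from the first match only.
; "1\." matches the literal "1.", and "(.*?\.)" lazily captures up to the next period.
Local $aMatch = StringRegExp($sText, "1\.\s*(.*?\.)", 1)

If @error Then
    MsgBox(16, "Definition", "No match found")
Else
    MsgBox(64, "Definition", $aMatch[0])
EndIf
```

With real page content the pattern would probably need adjusting, since the dot does not match line breaks unless (?s) is added.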
GEOSoft Posted March 18, 2009

Not exactly sure if you want all the definitions returned or not. This returns the full definition as shown on that page. You can parse the portion you want from the return.

    $Str = StringRegExp($Str, "(?i)<td width=.* class=\x22?dnindex\x22?>(1\..*)</table>", 1)
    If Not @Error Then
        $str = $Str[0]
    Else
        MsgBox(0, "Oooops!", "Houston, we have a problem")
    EndIf

If you don't need the whole page for other reasons, why not use _InetGetSource() instead of InetGet(), as Authenticity shows?

Edited March 18, 2009 by GEOSoft

floodge Posted March 18, 2009 (Author)

    $mystr = "http://dictionary.reference.com/browse/" & $str
    $str = _InetGetSource($mystr)
    $Str = StringRegExp($Str, "(?i)<td width=.* class=\x22?dnindex\x22?>(1\..*)</table>", 1)
    ;StringRegExpReplace($str, "</td>", "")
    MsgBox(48, "Definition", $str)

My problem is at the commented-out line. Excuse my noobiness, but that should white out all of the "</td>" in the HTML, right? I am having trouble.

Authenticity Posted March 18, 2009

StringRegExpReplace returns the modified string. It doesn't modify the string you pass in, only a copy of it, so you have to use the return value.

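A small sketch of the difference, using a made-up one-line string in place of the downloaded page (the sample value is an assumption for illustration only):

```autoit
; Made-up sample standing in for the page source
Local $sHtml = "<td>hammer</td>"

; The return value is discarded here, so $sHtml itself is left untouched.
StringRegExpReplace($sHtml, "</?td>", "")
ConsoleWrite("Without assignment: " & $sHtml & @CRLF)

; Assigning the return value back to the variable keeps the change.
$sHtml = StringRegExpReplace($sHtml, "</?td>", "")
ConsoleWrite("With assignment: " & $sHtml & @CRLF)
```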
GEOSoft Posted March 18, 2009

You were close:

    $Str = StringRegExpReplace($str, "</td>", "")

floodge Posted March 19, 2009 (Author)

    Local $a
    $a = StringRegExp($str, "(.*)", 3)
    _ArrayDisplay($a, "Definition of - " & $str2)

Trimmed down all the HTML (woot!). Now I'm figuring out how to line up all of the definitions in an array. The definitions go as "1. blah blah 2. blah blah" etc., so are there certain arguments that I can place here

    $a = StringRegExp($str, "(.*)", 3)

that will place each definition in a separate column?

EDIT: I am making progress, almost have it.

Edited March 19, 2009 by floodge

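One possible way to get each numbered definition into its own array element, sketched on made-up text. This assumes the tags are already gone and that every definition starts with a number followed by a period and a space; a definition that itself contains something like "2. " would be split in the wrong place.

```autoit
#include <Array.au3>

; Made-up sample in the "1. blah blah 2. blah blah" shape described above
Local $sDefs = "1. a tool for driving nails 2. any of various instruments 3. a lever in a piano"

; Flag 3 returns an array holding the captured group of every match.
; The lookahead stops each capture just before the next "N. " marker or at the end of the text.
Local $aDefs = StringRegExp($sDefs, "\d+\.\s*(.*?)(?=\s*\d+\.\s|$)", 3)

If @error Then
    MsgBox(16, "Definitions", "No numbered definitions found")
Else
    _ArrayDisplay($aDefs, "Definitions")
EndIf
```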
martin Posted March 19, 2009

It might be easier to filter out the bit you want first; then, when you have an array of the lines, go through each line to remove the unwanted bits.

    #include <array.au3>
    #include <string.au3>
    #include <INet.au3>

    $mystr="http://dictionary.reference.com/browse/search"; & $str;
    $str = _InetGetSource($mystr)
    ; Keep only the part between the first definition and the Synonyms section
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"Synonyms:")
    ; Split on the markup that separates one definition from the next
    $lines = StringSplit($str[0],'</span></td> </tr> </table> <table class="luna-Ent"> <tr> <td width="35" class="dnindex">',1)
    _ArrayDisplay($lines)

floodge Posted March 19, 2009 (Author)

That code filters well, but I can't do a

    $Str = StringRegExpReplace($str, "</td>", "")

because it either freezes the app or says it is an undefined array variable. Using the other code I have it filtered this far. Attached is a picture of what that spits out right now; I'm working on merging your code and removing the lines, etc.

EDIT: Making some progress. The problem (dare I ask this many questions) is that I am having trouble filtering from this point, using the code:

    $mystr="http://dictionary.reference.com/browse/" & $str2
    $str = _InetGetSource($mystr)
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"Synonyms:")
    $lines = StringSplit($str[0],'</span></td> </tr> </table> <table class="luna-Ent"> <tr> <td width="35" class="dnindex">',1)
    _ArrayDisplay($lines, "Definition of "& $str2)

I am soooo close!!

Edited March 19, 2009 by floodge

Szhlopp Posted March 20, 2009

Just use

    StringRegExpReplace($lines[$i], "<.*?>", "")

floodge Posted March 20, 2009 (Author)

> Just use StringRegExpReplace($lines[$i], "<.*?>", "")

Doesn't work. I have tried using this code:

    $mystr="http://dictionary.reference.com/browse/" & $str2
    $str = _InetGetSource($mystr)
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"Synonyms:")
    $lines = StringSplit($str[0],'</span></td> </tr> </table> <table class="luna-Ent"> <tr> <td width="35" class="dnindex">',1)
    #Region HTML Filter
    $lines = StringRegExpReplace($lines, "</td>", "")
    $lines = StringRegExpReplace($lines, "<td>", "")
    $lines = StringRegExpReplace($lines, "<tr>", "")
    $lines = StringRegExpReplace($lines, "</tr>", "")
    $lines = StringRegExpReplace($lines, "<class=>", "")
    $lines = StringRegExpReplace($lines, "</table>", "")
    $lines = StringRegExpReplace($lines, "<span>", "")
    $lines = StringRegExpReplace($lines, "</span>", "")
    $lines = StringRegExpReplace($lines, "<table class=""luna-Ent"">", "")
    $lines = StringRegExpReplace($lines, "</div>", "")
    ; etc. etc. etc.
    #EndRegion
    _ArrayDisplay($lines, "Definition of "& $str2)

Nothing comes up when I press the button; the window just stays there.

GEOSoft Posted March 20, 2009

That will never work because you have declared $lines as an array and didn't reference the elements. This will be close, but it's untested.

    $mystr="http://dictionary.reference.com/browse/" & $str2
    $str = _InetGetSource($mystr)
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"Synonyms:")
    If IsArray($str) Then
        MsgBox(0, "Results", _StripHTML($str[0]))
    EndIf

    Func _StripHTML($sStr)
        ; Decode common HTML entities and turn line-break tags into real line breaks
        $sStr = StringReplace($sStr, "&lt;", "<")
        $sStr = StringReplace($sStr, "&gt;", ">")
        $sStr = StringReplace($sStr, "<br />", @CRLF)
        $sStr = StringReplace($sStr, "<p>", @CRLF & @CRLF)
        $sStr = StringReplace($sStr, "&nbsp;", " ")
        $sStr = StringReplace($sStr, "&amp;", "&")
        ; Decode numeric entities of the form &#NNN;
        $aStr = StringRegExp($sStr, "&#(\d+);", 3)
        If NOT @Error Then
            For $i = 0 To Ubound($aStr) -1
                $sStr = StringReplace($sStr, "&#" & $aStr[$i] & ";", Chr($aStr[$i]))
            Next
        EndIf
        ; Remove all remaining tags
        $sStr = StringRegExpReplace($sStr, "(?i)(?s)<.+?>", "")
        Return $sStr
    EndFunc

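To illustrate the point about referencing the elements: StringSplit returns an array, so a replace has to be applied to each element in turn. This is a sketch on made-up input with "|" as a stand-in delimiter (both are assumptions; the real page uses the long markup delimiter shown earlier in the thread).

```autoit
#include <Array.au3>

; Made-up sample standing in for the extracted definitions block
Local $sSample = "<td>first meaning</td>|<td>second <i>meaning</i></td>"

; With flag 1 the whole delimiter string is treated as one separator.
; Element [0] of the result holds the number of pieces.
Local $aLines = StringSplit($sSample, "|", 1)

; Apply the replace to every element, not to the array variable itself.
For $i = 1 To $aLines[0]
    $aLines[$i] = StringRegExpReplace($aLines[$i], "<[^>]+>", "")
Next

_ArrayDisplay($aLines, "Stripped lines")
```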
floodge Posted March 20, 2009 (Author)

Yeah, it works similarly to mine in a MsgBox, but it would be neat to have had it in an array.

EDIT: I don't need an array; I am focusing on another method. Will post results. Thank you for all of your help.

Edited March 20, 2009 by floodge

martin Posted March 20, 2009

Try this:

    #include <array.au3>
    #include <string.au3>
    #include <INet.au3>

    $tofind = "hammer"
    $mystr = "http://dictionary.reference.com/browse/" & $tofind
    $str = _INetGetSource($mystr)
    ; Drop everything before the first definition
    $str = stringtrimleft($str,StringInStr($str,'<td width="35" class="dnindex">1.</td> <td>')-1)
    ; If the page has an Origin section, rename it so the end marker below always exists
    $str = StringReplace($str,'<div class="ety"> <b>Origin:','<span class="sectionLabel">Synonyms:')
    ConsoleWrite(@extended & @CRLF)
    $str = _StringBetween($str, '<td width="35" class="dnindex">1.</td> <td>','<span class="sectionLabel">Synonyms:')
    $lines = StringSplit($str[0], '<td width="35" class="dnindex">', 1)
    $lines[1] = "1. " & $Lines[1]
    _ArrayDisplay($lines)
    ; Strip the remaining tags from each element
    For $n = 1 To $lines[0]
        $lines[$n] = StringRegExpReplace($lines[$n], "(<.*?>)", "")
    Next
    _ArrayDisplay($lines);<--now gives 14 results

    ;version II
    $lines = "1. " & StringReplace($str[0], '<td width="35" class="dnindex">', @CRLF)
    $lines = StringRegExpReplace($lines, "(<.*?>)", "")
    MsgBox(262144, "result ", $lines)

EDIT: changed because not all words searched have Synonyms.

Edited March 20, 2009 by martin

floodge Posted March 20, 2009 (Author)

It works!!! Added to the program, which is almost done.

floodge Posted March 20, 2009 (Author)

Alright, curious if there is a way to automatically resize the array window.

    $str2 = IniRead("dictionary.ini", "words", "word2", "NotFound")
    If $str2 = "" Then
        Exit
    EndIf
    $mystr="http://dictionary.reference.com/browse/" & $str2
    $str = _InetGetSource($mystr)
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"</td> ")
    _process($str[0])

    $str2 = IniRead("dictionary.ini", "words", "word3", "NotFound")
    If $str2 = "" Then
        Exit
    EndIf
    $mystr="http://dictionary.reference.com/browse/" & $str2
    $str = _InetGetSource($mystr)
    $str = _stringbetween($str,'<td width="35" class="dnindex">1.</td> <td>',"</td> ")
    _process($str[0])

I am pulling from an ini now (10 iterations of this), and I am looking for a way to continue, or somehow skip the code, when no word is entered in the ini. Right now I just use an If $str2 = "" Then Exit, which sort of works.

martin Posted March 20, 2009

> Alright, curious if there is a way to automatically resize the array window.

Do you mean the _ArrayDisplay window?

> Right now I just use an if ="" then exit, which sort of works

If you have a default of "NotFound", then shouldn't you have

    If $str2 = "NotFound" Then

?

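A rough sketch of how the ten repeated blocks could become one loop that skips blank or missing entries instead of exiting. The key names word1 to word10 are assumed from the word2/word3 pattern in the post above, and _Lookup() is a hypothetical stand-in for the existing fetch-and-process code.

```autoit
; Loop over word1 .. word10 (key names assumed from the pattern in the thread)
For $i = 1 To 10
    ; Using "" as the default means a missing key and an empty key look the same
    Local $sWord = IniRead("dictionary.ini", "words", "word" & $i, "")

    ; Skip this entry and carry on with the next one instead of exiting
    If StringStripWS($sWord, 3) = "" Then ContinueLoop

    _Lookup($sWord)
Next

; Hypothetical placeholder: the real version would call _InetGetSource,
; _StringBetween and _process as in the code posted above.
Func _Lookup($sWord)
    ConsoleWrite("Would look up: " & $sWord & @CRLF)
EndFunc
```

Keeping "NotFound" as the default would also work, as long as the check compares against that same value.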