dantay9, posted May 7, 2009

I know there's already a good dictionary out there, but I wanted to make my own. I am fairly new to StringRegExp and I am having trouble with it. I used Expresso and it seemed to turn out OK there, but it doesn't seem to work in my script. I am trying to keep all lines with:

1. a single-digit number followed by a period
2. a two-digit number followed by a period
3. a letter followed by a period (for subdefinitions)
4. "--" as the first two characters (for the part of speech)

Please help me point out the problem here.

    #include <Array.au3>

    $Word = "test"
    $IE = ObjCreate("InternetExplorer.Application")
    If Not IsObj($IE) Then
        MsgBox(0, "ERROR", "Object is not a variable.")
        Exit
    EndIf
    $IE.navigate("http://dictionary.reference.com/browse/" & $Word)
    Do
        Sleep(500)
    Until $IE.document.readyState = "complete"
    $text = $IE.document.body.innertext
    $text = StringTrimLeft($text, StringInStr($text, "Show IPA") + 7)
    $text = StringTrimRight($text, StringLen($text) - StringInStr($text, "Dictionary.com Unabridged") + 1)
    $Array = StringSplit($text, @CR)
    $x = 2
    While 1
        If $x = UBound($Array) Then ExitLoop
        $Temp = StringStripWS($Array[$x], 8)
        If Not StringRegExp($Temp, "^(--|\d\.|\d\d\.|[a-zA-Z])") Then
            _ArrayDelete($Array, $x)
        Else
            $x += 1
        EndIf
    WEnd
    _ArrayDisplay($Array)

Aceguy, posted May 7, 2009

Could you post a snippet of your text so we can test StringRegExp, please?

dantay9 (author), posted May 7, 2009

So far, that is the whole script. The text comes from the body of the website. Just change the word to change the output. The text is basically the source from the website.

Skizmata, posted May 7, 2009

    MsgBox(0, "", StringRegExp("1.", '[0-9A-Za-z][0-9.]\.?|^--'))
    MsgBox(0, "", StringRegExp("12.", '[0-9A-Za-z][0-9.]\.?|^--'))
    MsgBox(0, "", StringRegExp("A.", '[0-9A-Za-z][0-9.]\.?|^--'))
    MsgBox(0, "", StringRegExp("--", '[0-9A-Za-z][0-9.]\.?|^--'))

This matches all your cases, but I expect it is actually a little sloppy. If you want exactly what you asked for, I think the above has it covered, but it would also match things you didn't ask for. The -- has to be at the start of the input (that's what the ^ before it denotes), but as for the rest of them, you didn't say anything about them being at the start of the line.

If the regex doesn't make sense, let me know; I would be happy to help break it down. If the patterns are too sloppy, you will have to get us some better example cases with some more specific rules.

Aceguy, posted May 7, 2009

Try:

    "(\d{1,2}|\--)\.\s.*\r"

Skizmata, posted May 7, 2009

    "(\d{1,2}|\--)\.\s.*\r"

This would require the . even after the --. I guess I really have no idea what he is after with no examples, but "--." was not in the four rules he gave. Also, the \r requires a trailing carriage return, so the pattern would not match if the line was the last line on the page. $ anchors the end of the input (or the end of each line with the (?m) option) and might be more appropriate for web parsing. But again, I don't know...

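The points raised so far can be checked with a small harness like the one below. This is only a sketch: the sample lines are made up for illustration and are not taken from the dictionary page, and the last pattern is one possible anchored variant (an assumption, not something posted in the thread).

```autoit
; Hypothetical sample lines, NOT taken from the actual page
Local $aLines[6] = ["1. a trial", "12. an exam", "a. a subdefinition", _
        "--noun", "Origin: Latin", "see 1.5 above"]

; Three patterns from the thread, plus one anchored variant (an assumption)
Local $aPatterns[4] = [ _
        "^(--|\d\.|\d\d\.|[a-zA-Z])", _
        "[0-9A-Za-z][0-9.]\.?|^--", _
        "(\d{1,2}|\--)\.\s.*\r", _
        "^(?:--|\d{1,2}\.|[a-zA-Z]\.)"]

For $i = 0 To UBound($aPatterns) - 1
    ConsoleWrite("Pattern: " & $aPatterns[$i] & @CRLF)
    For $j = 0 To UBound($aLines) - 1
        ; With the default flag (0), StringRegExp returns 1 on a match and 0 otherwise
        ConsoleWrite("  " & StringRegExp($aLines[$j], $aPatterns[$i]) & "  " & $aLines[$j] & @CRLF)
    Next
Next
```

With these samples, the first pattern also accepts "Origin: Latin" and "see 1.5 above", because its [a-zA-Z] branch has no \. after it and so matches any line that starts with a letter. The third pattern rejects every sample, since none of them contains a carriage return. The anchored variant accepts only the first four lines.
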
SmOke_N (Moderator), posted May 7, 2009 (edited May 7, 2009 by SmOke_N)

1. The thing you think is a hyphen before the part of speech is actually character 150 (decimal) in the Windows ANSI code page, which is an en dash, and some of the "periods" are character 183 (decimal), a middle dot.
2. I never saw more than one char 150 in a row, but the code below allows for it.

You could shorten everything quite a bit, I think:

    #include <Array.au3>
    #include <IE.au3>

    Global $s_word = "test"
    Global $o_ie = _IECreate("http://dictionary.reference.com/browse/" & $s_word, 0, 0)
    Global $s_text = StringRegExpReplace(_IEBodyReadText($o_ie), _
            "(?i)(?s)(.*?Show IPA)(.*?)(Dictionary\.com Unabridged.*?)\z", "\2")
    _IEQuit($o_ie)
    Global $a_result = StringRegExp($s_text, "(?:\A|\v)((?:(?:–|-)+\w|\d+(?:\xB7|\.)|[a-zA-Z](?:\xB7|\.)).+?)\v", 3)
    _ArrayDisplay($a_result)

Edit: BTW, for some odd reason, I couldn't get \x96 to work for decimal 150!

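One plausible explanation for the \x96 remark, offered as an assumption rather than something confirmed in the thread: AutoIt strings are Unicode, so the dash read from the page is probably stored as U+2013 (en dash) rather than as byte 0x96, and \x96 in a pattern refers to the code point U+0096 instead. The sketch below builds the character with ChrW so the result does not depend on the script file's encoding, and it assumes a current AutoIt release in which StringRegExp accepts the \x{hhhh} escape.

```autoit
; Sketch only: assumes the dash copied from the page is U+2013 (en dash)
Local $sDash = ChrW(0x2013)          ; built by code point to avoid file-encoding issues
Local $sLine = $sDash & "noun"       ; hypothetical part-of-speech line

; Code point of the character actually stored in the string
ConsoleWrite("Code point: " & AscW($sDash) & @CRLF)

; \x96 names code point U+0096, which is a different character from U+2013
ConsoleWrite("\x96 match: " & StringRegExp($sLine, "^\x96") & @CRLF)

; \x{2013} names the en dash by its Unicode code point
ConsoleWrite("\x{2013} match: " & StringRegExp($sLine, "^\x{2013}") & @CRLF)

; Character class accepting either an en dash or a plain hyphen, one or more times
ConsoleWrite("Class match: " & StringRegExp($sLine, "^[\x{2013}-]+\w") & @CRLF)
```

If the assumption holds, the character class in the last call could stand in for the literal (?:–|-)+ group in the pattern above, which avoids pasting a non-ASCII character into the script.
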
dantay9 (author), posted May 7, 2009

Thanks, everyone. I learned a little more about StringRegExp now. Nice work, SmOke_N. That worked great.