dantay9

StringRegExp

I know there's already a good dictionary out there, but I wanted to make my own. I am fairly new to StringRegExp and am having trouble with it. I tested the pattern in Expresso and it seemed to work there, but it doesn't work in my script. I am trying to keep all lines with:

1. a single digit number followed by a period

2. a two digit number followed by a period

3. a letter followed by a period (for subdefinitions)

4. the first two characters are "--" (for the part of speech)

Please help me point out the problem here.

#include <Array.au3>
$Word = "test"
$IE = ObjCreate("InternetExplorer.Application")
If Not IsObj($IE) Then
    MsgBox(0, "ERROR", "Object is not a variable.")
    Exit
EndIf
$IE.navigate("http://dictionary.reference.com/browse/" & $Word)
Do
    Sleep(500)
Until $IE.document.readyState = "complete"
$text = $IE.document.body.innertext
$text = StringTrimLeft($text, StringInStr($text, "Show IPA") + 7)
$text = StringTrimRight($text, StringLen($text) - StringInStr($text, "Dictionary.com Unabridged") + 1)
$Array = StringSplit($text, @CR)
$x = 2

While 1
    If $x = UBound($Array) Then ExitLoop
    $Temp = StringStripWS($Array[$x], 8)
    If Not StringRegExp($Temp, "^(--|\d\.|\d\d\.|[a-zA-Z])") Then
        _ArrayDelete($Array, $x)
    Else
        $x += 1
    EndIf
WEnd

_ArrayDisplay($Array)

"Failure is not an option -- it comes packaged with Windows"
Gecko Web Browser, Yahtzee!, Toolbar Launcher (like RocketDock), Internet Blocker, Simple Calculator, Local Weather, Easy GDI+ GUI, Triangle Solver, TCP File Transfer, Valuater's Autoit Wrappers, OOP In AutoIt
Using Windows XP SP3, 1GB RAM, AMD Athlon Processor @ 2.1 GHz. Check me out at gadgets.freehostrocket.com


So far, that is the whole script. The text comes from the body of the website. Just change the word to change the output. The text is basically the source from the website.



MsgBox(0,"",StringRegExp("1.",'[0-9A-Za-z][0-9.]\.?|^--'))
MsgBox(0,"",StringRegExp("12.",'[0-9A-Za-z][0-9.]\.?|^--'))
MsgBox(0,"",StringRegExp("A.",'[0-9A-Za-z][0-9.]\.?|^--'))
MsgBox(0,"",StringRegExp("--",'[0-9A-Za-z][0-9.]\.?|^--'))

This matches all of your cases, but I expect it is a little sloppy. It covers exactly what you asked for, but it would also match things you didn't ask for.

The -- has to be at the start of the input; that's what the ^ before it denotes. The other alternatives are not anchored, because you didn't say anything about them being at the start of the line. If the regex doesn't make sense, let me know and I would be happy to help break it down. If it is too sloppy, you will have to give us some better example cases with more specific rules.
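
For comparison, a stricter pattern that anchors all four rules to the start of the line could look like the sketch below. The sample strings are made up for illustration and are not taken from the dictionary page, and the pattern assumes the input uses plain ASCII hyphens and periods.

```autoit
; Hypothetical sample lines, invented for this sketch.
Local $aSamples[6] = ["1. first sense", "12. twelfth sense", "a. subdefinition", _
                      "--noun", "apple", "123. too many digits"]

; ^ anchors every alternative: "--", one or two digits plus a period,
; or a single letter plus a period.
Local $sPattern = "^(?:--|\d{1,2}\.|[a-zA-Z]\.)"

For $i = 0 To UBound($aSamples) - 1
    ; With the default flag, StringRegExp returns 1 for a match and 0 otherwise.
    ConsoleWrite($aSamples[$i] & " -> " & StringRegExp($aSamples[$i], $sPattern) & @CRLF)
Next
```

This prints 1 for the first four samples and 0 for "apple" and "123. too many digits", because the letter alternative requires the period and the digit alternative allows at most two digits.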


AutoIt changed my life.


"(\d{1,2}|\--)\.\s.*\r"
This would require the . even after the --. I really have no idea what he is after without examples, but "--." was not in the four rules he gave. Also, the \r requires a carriage return after the match, so it would not work if the line were the last line on the page. $ is the end-of-line anchor (when multiline mode is enabled with (?m)) and might be more appropriate for web parsing.

But again I dont know...
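
A small sketch of the last-line problem, using a made-up three-line string rather than real page text. The second pattern is only one possible alternative: it swaps the trailing \r for a lookahead that accepts either vertical whitespace or the end of the string.

```autoit
; Hypothetical input: the last line has no trailing line break.
Local $sText = "1. first" & @CRLF & "note" & @CRLF & "2. last"

; Requires a carriage return after the line, so "2. last" is skipped.
Local $aWithCR = StringRegExp($sText, "(\d{1,2}\.\s.*)\r", 3)

; Accepts a line break or the end of the string after the line.
Local $aWithEnd = StringRegExp($sText, "(\d{1,2}\.\s.*?)(?=\v|\z)", 3)

ConsoleWrite("Trailing \r:  " & UBound($aWithCR) & " match(es)" & @CRLF)
ConsoleWrite("Lookahead:    " & UBound($aWithEnd) & " match(es)" & @CRLF)
```

The first call finds only "1. first", while the second also picks up "2. last".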



1. The thing you think is a hyphen before the part of speech is actually character 150 (decimal), which is an en dash in the Windows-1252 code page rather than an ASCII hyphen, and some of the "periods" are character 183, a middle dot.

2. I never saw more than one char 150 in a row, but the code below allows for several.
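
One way to spot look-alike characters like these is to print the code of every character in a suspect line. The sketch below builds a hypothetical line with ChrW instead of reading the real page, and AscW reports Unicode code points, so the en dash shows up as 8211 (U+2013) and the middle dot as 183.

```autoit
; Hypothetical line: en dash, a word, a digit, then a middle dot.
Local $sLine = ChrW(0x2013) & "noun 1" & ChrW(0xB7)

For $i = 1 To StringLen($sLine)
    Local $sChar = StringMid($sLine, $i, 1)
    ; AscW returns the Unicode code point of the character.
    ConsoleWrite("Position " & $i & ": code " & AscW($sChar) & @CRLF)
Next
```

An ASCII hyphen would print 45 and an ASCII period 46, so any other value at those positions means the pattern has to account for a different character.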

You could shorten everything quite a bit I think:

#include <Array.au3>
#include <IE.au3>

Global $s_word = "test"
Global $o_ie = _IECreate("http://dictionary.reference.com/browse/" & $s_word, 0, 0)
Global $s_text = StringRegExpReplace(_IEBodyReadText($o_ie), _
                    "(?i)(?s)(.*?Show IPA)(.*?)(Dictionary\.com Unabridged.*?)\z", "\2")
_IEQuit($o_ie)
Global $a_result = StringRegExp($s_text, "(?:\A|\v)((?:(?:–|-)+\w|\d+(?:\xB7|\.)|[a-zA-Z](?:\xB7|\.)).+?)\v", 3)
_ArrayDisplay($a_result)

Edit:

BTW, for some odd reason, I couldn't get \x96 to work for decimal 150!
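
One possible explanation, offered as an assumption rather than a confirmed cause: AutoIt strings are Unicode, so the en dash is stored as U+2013, and \x96 would look for a different character. A sketch that sidesteps escape sequences by building the pattern with ChrW, tested against made-up sample strings:

```autoit
; Hypothetical sample line: an en dash (U+2013) followed by a part of speech.
Local $sLine = ChrW(0x2013) & "noun"

; One or more en dashes or ASCII hyphens at the start, then a word character.
Local $sPattern = "^(?:" & ChrW(0x2013) & "|-)+\w"

ConsoleWrite(StringRegExp($sLine, $sPattern) & @CRLF)    ; en dash prefix
ConsoleWrite(StringRegExp("--noun", $sPattern) & @CRLF)  ; ASCII hyphen prefix
ConsoleWrite(StringRegExp("noun", $sPattern) & @CRLF)    ; no prefix
```

The first two calls should report a match and the third should not. Depending on the AutoIt version, the escape \x{2013} may also work in place of the ChrW concatenation, but that is untested here.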

Edited by SmOke_N

Common sense plays a role in the basics of understanding AutoIt... If you're lacking in that, do us all a favor, and step away from the computer.


Thanks, everyone. I've learned a little more about StringRegExp now. Nice work, SmOke_N. That worked great.


