Scottswan

Parsing text

14 posts in this topic

Hi,

I'm new here and with AutoIt, but quite interested in it all.

I've been thumbing through the incredibly well-written help file and the help section of the main site so far, but haven't begun to write anything yet.

One thing I am wondering is whether AutoIt can parse text.

I'm thinking of writing a script that can search through a page of text looking for keywords, and then copy the entire sentence or paragraph containing a given keyword to the clipboard (or a notepad file).

Any thoughts will be appreciated.

-Scott


#2 ·  Posted (edited)

This can be done quite easily, and there are some nice scraps on regular expressions with AutoIt as well (which give better searching).

As far as your question goes, it makes a difference whether it is an HTML page or a text file. You can set up a quick reader if all your sentences are on one line each. If you split on the period, you can get the sentence as well.

Ex: (off the top of my head, so it might have a typo or something).

Highlight the text above and copy it to the clipboard.

; Split the clipboard text into "sentences" at every period.
$lines = StringSplit(ClipGet(), ".")
$word = InputBox("Question", "What word?", "Question")

$found = 0
For $i = 1 To $lines[0] ; $lines[0] holds the number of pieces
    If StringInStr($lines[$i], $word) > 0 Then
        MsgBox(1, "Sentence is", $lines[$i] & ".")
        $found = 1
    EndIf
Next

If $found = 0 Then MsgBox(1, "Error", "No sentence with " & $word & " found")

This is just an example. Sentences end with all sorts of things, and you would have to factor all of those into it.

Edited by scriptkitty
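One way to handle the "sentences end with all sorts of things" caveat is a regular expression. This is only a rough sketch, assuming a current AutoIt v3 release where StringRegExp is built in. The helper name _FindSentences is made up for illustration, and the pattern is one possible definition of a sentence, not the only one.

```autoit
; Sketch only. _FindSentences is a hypothetical helper, not part of AutoIt.
Func _FindSentences($sText, $sWord)
    ; A "sentence" starts at a non-space character and runs up to a . ! or ?
    ; that is followed by whitespace or the end of the text (or to the end of
    ; the text if no terminator is left), so 3.14 or 10.0.0.1 is not split.
    Local $aSentences = StringRegExp($sText, "(?s)\S.*?(?:[.!?](?=\s|$)|$)", 3)
    If @error Then Return "" ; no sentences found (or a bad pattern)

    Local $sHits = ""
    For $i = 0 To UBound($aSentences) - 1 ; flag 3 returns a 0-based array
        ; StringInStr is not case sensitive by default.
        If StringInStr($aSentences[$i], $sWord) Then $sHits &= $aSentences[$i] & @CRLF
    Next
    Return $sHits
EndFunc

; Usage: show every clipboard sentence that mentions "AutoIt".
MsgBox(0, "Matches", _FindSentences(ClipGet(), "AutoIt"))
```

Abbreviations such as "e.g. " will still be treated as sentence ends, so this is a starting point rather than a full parser.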

AutoIt3, the MACGYVER Pocket Knife for computers.


Excellent, thanks! :D

That should give me a good start.

What I'm going to attempt to make is a search assistant....

I am a Google junkie (who isn't?), but all too often I find myself wasting time trying to find a quick answer to something that I know is out there, usually buried down in a blog or message board somewhere. Sometimes I even use Google's cached pages because they highlight my search terms, rather than hitting ctrl + f and searching a page myself.

Google is great, and fast, but I'm thinking I could automate the task of finding the text for a quick answer to a question that I know has an answer.

So AutoIt will do the work for me, and probably a lot faster. :)

Thinking out loud here....

I start with an input box for my search string (ctrl + g will pop up a jpg that shows the Google operators in case I need a more specific search). That gets passed to Mozilla's URL bar, which I have set to do a Google search. When the Google results load, I jump down to the first actual result link and load that page. To avoid wading through HTML, JavaScript, CSS and whatnot (which I also might be searching for), I simply run a macro of ctrl + a and ctrl + c to snipe all of the page text, without the formatting, into the clipboard. I then run ScriptKitty's parser to suck out the sentence or paragraph that has my search term and paste that into a blank notepad file. Then I go back to Mozilla, hit the back button (or just open the Google links in a new tab each time so I will have them in front of me if I need them) and go on to the next result link. Do that about 5 times and then prompt to do 5 more.

When done I will have a notepad file open with 5 blocks of text that will hopefully either have the answer to my question or enough info to ask a better version of the question. Not sure if I want to copy the URLs over to the notepad file.... might take too long.

This might have already been done or even posted here, but I need this 'search assistant' to parse this message board in order to find it! :)

I welcome any and all thoughts on this idea.

As I was reading elsewhere on this board.... even though I'm new at scripting with AutoIt, having a project that you really desire is one of the best ways to get started. ;)

-Scott


Depending on the browser you are using, after you do your ^a^c to copy the page text:

$mytext = ClipGet()

you could send a !d to it (to focus the address bar), then ^a^c to copy the URL:

$bar = ClipGet()

Then parse your text.

For starters, if you are going to use this, I would do a few substitutions:

; Turn every sentence ending ("! ", "? ", ". ") into a single | marker.
$replacements = StringSplit('! |? |. ', '|')
For $i = 1 To $replacements[0]
    $mytext = StringReplace($mytext, $replacements[$i], '|')
Next
; Then split on the marker.
$lines = StringSplit($mytext, '|')
; This defines a sentence as ending in a period, question mark, or ! followed by a space.
; Good for numbers or IP addresses, which stay in one piece.

just a thought.



#5 ·  Posted (edited)

Looks like a good thought, I'll let you know after I can understand it. :)

I'm having trouble writing to the same active notepad twice.... I've tried a few things but it keeps ignoring them.

If ProcessExists("notepad.exe") Then 
WinSetState("new file - ", "", @SW_SHOW)
; WinActivate("new file - ", "")
Else
; Run Notepad
Run("notepad.exe")

; Wait for Notepad become active.
WinWaitActive("new file - ")
EndIf

Side note:

Anybody use Ultra Edit as their editor?

Anybody re-tune their Wordfile.txt to correctly highlight AutoIt's syntax?

Here's what I have so far...

;Search Assistant
;11-10-04 by Scott Swanson

Do

  ; Ask the question.
   $lines=StringSplit(Clipget(),".")
   $word = InputBox("Search Assistant", "Enter search string", "" )
   If @Error = 1 Then 
      MsgBox(0, "AutoIt Example", "OK... C-ya!")
      Exit
   EndIf

  ; Parse the clipboard.
   $found=0
   for $i=1 to $lines[0]
      If StringInStr($lines[$i],$word)>0 Then
   
         If ProcessExists("notepad.exe") Then 
            WinSetState("new file - ", "", @SW_SHOW)
           ; WinActivate("new file - ", "")
         Else
           ; Run Notepad
            Run("notepad.exe")

           ; Wait for Notepad become active.
            WinWaitActive("new file - ")
         EndIf

        ; Send the answer to Notepad...
         Send($lines[$i]&".")
         $found=1
      EndIf
   Next

  ; ...unless we didn't find anything.   
   If $found=0 Then MsgBox(1,"Error","No sentence with "&$word&" found")

  ;Try again? If no then end Do-Until loop.
   $answer = MsgBox(4, "Search Assistant", "Search again?")
Until $answer = 7

I'm not exactly a coder, but I'm finding this quite easy to learn...and fun!

-Scott

edit: code indent fixed by Larry

Edited by Larry
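If driving a Notepad window turns out to be fiddly, one alternative is to append the matches straight to a text file and open it afterwards. A minimal sketch, assuming that a plain file is acceptable as the "notepad file"; the helper name _AppendResult and the file name results.txt are made up for illustration.

```autoit
; Sketch only. _AppendResult is a hypothetical helper, not part of AutoIt.
Func _AppendResult($sFile, $sSentence)
    ; Mode 1 opens the file for writing at the end (append),
    ; creating the file first if it does not exist yet.
    Local $hFile = FileOpen($sFile, 1)
    If $hFile = -1 Then Return False ; the file could not be opened

    FileWriteLine($hFile, $sSentence)
    FileWriteLine($hFile, "") ; blank line between results
    FileClose($hFile)
    Return True
EndFunc

; Usage: the path is an assumption, adjust as needed.
If Not _AppendResult(@ScriptDir & "\results.txt", "A sentence that matched.") Then
    MsgBox(16, "Search Assistant", "Could not write to results.txt")
EndIf
```

This avoids depending on window titles and focus at all, at the cost of not seeing the results appear live.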


Yea, I will post up my UltraEdit files



Re-looking at my notepad/Exists problem code, I nested an 'If' statement within an 'If' statement, didn't I?

Prolly can't do that huh?

Error checking didn't bark at me!

-Scott


Hokay,

after some very patient testing I found the bug with not being able to re-focus notepad.

Fact #1: I replaced notepad with metapad some years ago... I don't even have a copy of the original.

Fact #2: Any edits made in metapad will change the window name from

"new file - metapad" to

"* new file - metapad"

until the file is saved. Pretty common among more advanced editors.

Fact #3: This is what had me stumped for quite a while.... None of these functions will allow the use of an asterisk in the parameters:

MsgBox(1, "", "Ready to focus notepad?")

; ControlGetFocus("text.txt - metapad")

WinActivate("* text.txt - metapad", "")

; ControlFocus("* new file - metapad", "", "metapad")

; ControlShow("* new file - metapad", "", "metapad")

Send("If you can read this then it worked")

MsgBox(1, "", "If it didn't work (highly likely) then see if it exists")

If WinExists("* text.txt - metapad") Then
    MsgBox(0, "", "Window exists")
EndIf

Excuse the mess, that's where I left off when I realized that the asterisk was definitely the problem.

So,

Perl has an issue like that with an apostrophe: one has to add a \ in front of any apostrophes in a quoted text line or Perl will ignore everything after it.

Is there a workaround for this in AutoIt? Or am I going to have to code around it by saving the files so the metapad window name drops the asterisk? (I suppose I could just get the original MS Notepad and call it a day, but I like a challenge, plus I need to know these little quirks.)

Oh, why the asterisk? I couldn't find anything on this in the help file.

-Scott
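One possible way around the changing title is to match on part of it instead of the whole string. A small sketch, assuming the only difference between the saved and unsaved title is the leading "* " described above; whether this helps with metapad specifically is not confirmed anywhere in this thread.

```autoit
; Sketch only: match on a substring so the leading "* " no longer matters.
Opt("WinTitleMatchMode", 2) ; 2 = match any substring of the title

; "new file - metapad" is the title reported earlier in this post.
If WinExists("new file - metapad") Then
    WinActivate("new file - metapad")
    ; Wait up to 5 seconds for the window to actually get focus.
    If WinWaitActive("new file - metapad", "", 5) Then
        Send("If you can read this then it worked")
    EndIf
Else
    MsgBox(0, "", "No window with that text in its title was found")
EndIf
```

With the default match mode the title has to match from its first character, which is why a title that suddenly starts with "* " stops matching.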


#9 ·  Posted (edited)

I would suggest using the handle, or the class. Ex:

Sleep(1000)
$title = WinGetTitle("", "") ; title of the currently active window
Opt("WinTitleMatchMode", 4) ; advanced mode that supports handles
$h = WinGetHandle($title)
WinSetTitle("handle=" & $h, "", "New Title")
Sleep(1000)
WinSetTitle("handle=" & $h, "", "Another New Title")
Sleep(1000)
WinSetTitle("handle=" & $h, "", $title) ; put the original title back

The handle uniquely identifies the window that is open. Unlike the title, the handle doesn't change, so you don't have to wait for (or guess) the name. You may still want to monitor the title to see whether the program is done doing what it was doing, however. :)

Edited by scriptkitty
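For readers on a current AutoIt v3 release, the same idea can be written without the "handle=" prefix, because the value returned by WinGetHandle can be passed wherever a window title is expected. A minimal sketch under that assumption; the title "Search Results" is just the one used elsewhere in this thread.

```autoit
; Sketch only, assuming a current AutoIt v3 release.
Sleep(1000) ; time to click on the window you want to control

; Grab the handle of whatever window has focus right now.
Local $hWnd = WinGetHandle("[ACTIVE]")
If @error Then
    MsgBox(16, "Search Assistant", "No active window found")
    Exit
EndIf

; The handle goes directly in the title parameter.
WinSetTitle($hWnd, "", "Search Results")

; Later on the title may have changed (for example gained a "*"),
; but the handle still points at the same window.
WinActivate($hWnd)
If WinWaitActive($hWnd, "", 5) Then Send("Still the same window.")
```

The handle stays valid only while that window exists, so it needs to be fetched again if the editor is closed and reopened.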


#10 ·  Posted (edited)

WinSetTitle("handle=" & $h, "", "New Title")

Seems like a bandaid to me but hey, it works!

Now on to figure out how to get control of the URL bar in the browser....

Edit: Actually, ScriptKitty showed how to do that, but I'm having trouble finding out how he knew that. Works like a charm tho... :)

Send("!d")

Here's my updated working script so far...

; Run Notepad
Run("notepad.exe")
; Wait for Notepad to become active.
WinWaitActive("new file - ")
Send("Here are your results:{Enter 2}")
WinSetTitle("", "", "Search Results")

; Start the loop
Do

    ; Ask for a search string.
    $lines = StringSplit(ClipGet(), ".")
    $word = InputBox("Search Assistant", "Enter search string", "")
    If @error = 1 Then
        MsgBox(0, "Search Assistant", "OK... C-ya!")
        Exit
    EndIf

    ; Still to do...
    ; Launch browser.
    ; Run Google search.
    ; Return first result page.
    ; Copy whole page to clipboard.

    ; Parse the clipboard.
    $found = 0
    For $i = 1 To $lines[0]
        If StringInStr($lines[$i], $word) > 0 Then
            ; Send the answer to Notepad...
            WinActivate("Search Results", "")
            Send($lines[$i] & ".{Enter 2}")
            $found = 1
        EndIf
    Next

    ; Change window title to drop the asterisk so we can find it next time.
    WinSetTitle("", "", "Search Results")

    ; ...unless we didn't find anything.
    If $found = 0 Then MsgBox(1, "Error", "No sentence with " & $word & " found")

    ; Try again? If no then end Do-Until loop.
    $answer = MsgBox(4, "Search Assistant", "Search again?")
Until $answer = 7

-Scott

Edited by Scottswan
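One thing to watch for when passing clipboard text to Send: by default Send treats !, +, ^, #, { and } as special keys, so web page text containing them can trigger shortcuts instead of being typed. A short sketch of the raw mode, where the sample sentence is made up for illustration; whether this explains any odd behaviour of the script above is an assumption.

```autoit
; Sketch only: the sample text is invented to show the special characters.
Local $sSentence = "Press Ctrl+C! Then {paste} item #1"

; Flag 1 = raw mode: every character is typed exactly as written.
Send($sSentence & ".", 1)

; Key names such as {ENTER} only work in the default mode (flag 0),
; so the line breaks are sent in a separate call.
Send("{ENTER 2}")
```

Splitting the call in two keeps the literal text and the key presses from interfering with each other.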


edit: OMG, they did leave it as "metapad"... goofy

Lar.


I think the asterisk bug was specific to metapad. I replaced the non-standard metapad with Notepad2 which also puts an asterisk in the title, but AutoIt is able to WinActivate("* Untitled - ") it just fine.

Notepad2 looks like a pretty handy editor.

-Scott


#12 ·  Posted (edited)

Seems like a bandaid to me but hey, it works!

Now on to figure out how to get control of the URL bar in the browser....

Edit: Actually, ScriptKitty showed how to do that, but I'm having trouble finding out how he knew that.  Works like a charm tho... :)

Send("!d")


He knew that from experience. Also, in Firefox you can press F6 if the focus is right :-P so that one is tricky. I have forgotten the one for Explorer as I haven't used it in a while.

Edit: I have also started using !d; it works much better.

JS

Edited by JSThePatriot

AutoIt Links

File-String Hash Plugin Updated! 04-02-2008 Plugins have been discontinued. I just found out.

ComputerGetInfo UDF's Updated! 11-23-2006

External Links

Vortex Revolutions Engineer / Inventor (Web, Desktop, and Mobile Applications, Hardware Gizmos, Consulting, and more)


Done with ver 1 :)

It has some unpredictability, like forgetting to stop writing to notepad once it hits a period, but I think that adds character. :">

Some of my most interesting results were from:

"d-360" Gave me the camera info I expected.

"revelations" Wrote a book on this one!... I need to add a cancel hot key.

"bin laden" I didn't know he was left handed.

I added all my testing snippets along with the research_v1.au3 script in the file manager here... http://www.autoitscript.com/fileman/index.php?dir=ScottSwan

I think it would be very helpful to have more runnable snippets in with the Examples included with AutoIt3. The help file is incredible, but looking at code, and then running the code.... that's the ticket!

Much thanks for the help!

In only 3 lunch hours (give or take) I was able to learn enough to make this script do exactly what I wanted it to do. :)

I look forward to improving on this one and working on about 500 others I have in mind.

AutoIt Rocks!

-Scott

research_v1.au3


Hey Scottswan,

Make sure you go post this in the Scripts and Scraps section. It looks really good. I am thinking of making some modifications to it. :) I will let you know what I do.

JS

