Multiple Word Find

18 posts in this topic

I wanted to be able to search through text and find all results, and for more than one word.

This does that for 1 to 5 words, and displays the results in a webpage.

After you run it, you must hit CTRL+ALT+F for the GUI to show up.

Let me know what you think!

multiWord.au3

Sounds interesting. How would the search differ from, say, a regular expression like "(Word|AnotherWord|MyText)", besides the obvious webpage results (which is cool)?

How would you use the results from the regular expression to pick out the best-matching original passages that include all the search terms? Basically, what I'm asking is: how would you use regular expressions to generate the same webpage?

Actually I guess I don't understand what is the whole intent of the program. You could use 'StringReplace' on files and get the count from @extended if you are just counting each string..
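
A minimal sketch of that counting idea, assuming the text is already in a string. `_CountTerm` and the sample text are made up for illustration and are not part of multiWord.au3. It relies on StringReplace setting @extended to the number of replacements performed.

```autoit
; Hypothetical helper: count case-insensitive occurrences of a term.
Func _CountTerm($sText, $sTerm)
    If $sTerm = '' Then Return 0
    ; Replacing the term with itself leaves the text unchanged;
    ; @extended then holds the number of replacements performed.
    StringReplace($sText, $sTerm, $sTerm)
    Return @extended
EndFunc

; Sample text invented for the example
Local $sSample = 'Yellow and green. Then yellow again, then blue.'
ConsoleWrite('yellow: ' & _CountTerm($sSample, 'yellow') & @CRLF)
ConsoleWrite('blue: ' & _CountTerm($sSample, 'blue') & @CRLF)
```

This only gives totals per word. It says nothing about where the words sit relative to each other.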

I guess that'd be my fault for not explaining the whole thing, lol. Basically, I sometimes find myself opening a webpage, a document, or anything else with text, and wanting to find where it mentions a topic, for research for example. Say I have a 120-page document and I'm looking for where it mentions yellow, green, and blue. I can't search for all three at once. Instead I'd hit CTRL+F, type one of the three, keep hitting Find Next, and read the surrounding paragraph each time the word is found. In the end, this is quite time-consuming.

So this is where my program comes in. It copies all the text in the currently focused window, searches for 1 to 5 words of your choice, and displays all the results at once, ordered from best to worst match. So it lets me search for words that appear near each other, not in an exact order, and it shows all results at the same time instead of one at a time. Make sense?
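
To make the "best to worst match" idea concrete, here is a rough sketch. It is not the scoring multiWord.au3 uses (the attachment's source isn't shown in this thread). It assumes each line of the text is one candidate result and scores it by how many of the search terms it contains. `_RankLines` is a made-up name.

```autoit
#include <Array.au3>

; Hypothetical helper: score each line by how many search terms it contains,
; then sort so the best-scoring lines come first.
Func _RankLines($sText, $aTerms)
    Local $aLines = StringSplit(StringStripCR($sText), @LF, 2) ; flag 2: no count in element 0
    Local $aRanked[UBound($aLines)][2]
    For $i = 0 To UBound($aLines) - 1
        Local $iScore = 0
        For $sTerm In $aTerms
            ; StringInStr is case-insensitive by default
            If $sTerm <> '' And StringInStr($aLines[$i], $sTerm) Then $iScore += 1
        Next
        $aRanked[$i][0] = $iScore      ; column 0: number of distinct terms found
        $aRanked[$i][1] = $aLines[$i]  ; column 1: the line itself
    Next
    _ArraySort($aRanked, 1) ; descending, sorted on column 0
    Return $aRanked
EndFunc

Local $aTerms[3] = ['yellow', 'green', 'blue']
Local $sText = 'only yellow here' & @CRLF & 'yellow, green and blue together' & @CRLF & 'nothing'
Local $aResult = _RankLines($sText, $aTerms)
ConsoleWrite($aResult[0][0] & ' terms: ' & $aResult[0][1] & @CRLF)
```

A real ranking would probably also weigh how close together the words are, which this sketch ignores.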

Not yet. What results will it return? Just the words or the line numbers that they appear on or some portion of the text?

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

If I understand correctly, it should find each of up to four or five search words on the current page and write out the page's HTML structure as it was, but with the found words wrapped in "<font color=red><b>...</b></font>" tags. The script is great work and the idea is quite handy, but it would be better to use StringRegExpReplace instead, which replaces globally.

That makes some more sense. I forgot about the HTML


George

I feel I'm misleading you. It just sends ctrl+a, ctrl+c and wraps the clipped string in a generic "<html><title></title><body></body></html>" page; it doesn't read the actual HTML. Reading the HTML would probably be simple and nicer, but matching tags and changing their structure is not safe or useful, so you need a pattern that matches the word only in text that isn't inside a tag: once a '<' is seen, nothing should be captured until its closing '>'. Anyway, I'll try to make it simpler, few mins. ^^
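
One way to sketch the "don't touch anything inside a tag" idea is a negative lookahead. This is an illustration under assumptions, not the script from this thread: `_HighlightOutsideTags` is a made-up name, and it assumes reasonably well-formed HTML with no '>' inside attribute values.

```autoit
; Hypothetical helper: wrap a term in a highlight span, skipping text inside tags.
Func _HighlightOutsideTags($sHTML, $sTerm)
    If $sTerm = '' Then Return $sHTML
    ; \Q...\E treats the term as literal text.
    ; (?![^<>]*>) rejects a match when the next angle bracket is '>',
    ; i.e. when the term sits inside a tag such as <a href="...">.
    Return StringRegExpReplace($sHTML, '(?i)(\Q' & $sTerm & '\E)(?![^<>]*>)', _
            '<span style="background-color:#BCD0ED;">$1</span>')
EndFunc

; The 'car' inside the href is left alone; the visible 'car' is wrapped.
ConsoleWrite(_HighlightOutsideTags('<a href="car.html">A car</a>', 'car') & @CRLF)
```

Because the inserted span is itself a tag, calling the helper again for a second word will not match inside the markup added for the first.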

It's okay, I have the general concept figured out. Just don't know when or if I'll need it. Might be handy in the future.


George

#11 ·  Posted (edited)

#include <WindowsConstants.au3>
Opt('GuiOnEventMode', 1)

HotKeySet('^+f', '_Find')
HotKeySet('{ESC}', '_EXIT')

Dim $hGUI = GUICreate('Find multiple words', 300, 250, Default, Default, BitOR($WS_POPUP, $WS_CAPTION), BitOR($WS_EX_TOOLWINDOW, $WS_EX_TOPMOST))
Dim $Inputs[4]
GUICtrlCreateButton('&Search', 115, 200, 70, 23)
GUICtrlSetOnEvent(-1, '_Search')

For $i = 0 To 3
    $Inputs[$i] = GUICtrlCreateInput('', 30, $i*30+25, 230, 25)
Next


While 1
    Sleep(100)
WEnd


Func _EXIT()
    GUIDelete($hGUI)
    Exit
EndFunc


Func _Find()
    GUISetState()
EndFunc


Func _Search()
    
    GUISetState(@SW_HIDE)
    
    Local $Clipboard
    Local $sFile = @ScriptDir & '\HTML' & @HOUR & '-' & @MIN & '-' & @SEC & '.html'
    Local $sTitle = WinGetTitle('[ACTIVE]')
    
    $sTitle = StringRegExpReplace($sTitle, '(.*) -.*', '\1')
    
    Send('^a^c')
    $Clipboard = ClipGet()
    
    For $i = 0 To 3
        If GUICtrlRead($Inputs[$i]) = '' Then ContinueLoop
        $Clipboard = StringRegExpReplace($Clipboard, '(?i)(\Q' & GUICtrlRead($Inputs[$i]) & _
           '\E)', '<span style="background-color:#BCD0ED;color:#0000FF;font-size:22px;font-weight:bold;font-style:italic;"><b>\1</b></span>')
    Next
       
    $Clipboard = StringRegExpReplace($Clipboard, '\r\n', '<BR>')
    $Clipboard = '<html><head><title>' & $sTitle & '</title></head><body bgcolor=#E4EAF2>' & _
                            '<span style="font-family:verdana">' & $Clipboard & '</span></body></html>'
    
    FileWrite($sFile, $Clipboard)
    ShellExecute($sFile)
    Sleep(1000)
    FileDelete($sFile)
    ClipPut('')
EndFunc

It'd be much handier to work directly with the HTML source, but this is sufficient too. It just needs a little make-up like a background color etc... (have no idea) ;].

Edit: Added some nice things, much nicer.

Edit2: Thanks GEOSoft, ;]

Edited by Authenticity

#12 ·  Posted (edited)

Agreed. I was thinking more along the lines of reading a large text file. I would just convert it to HTML after the word search using something like

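$Word = "something"
$Word2 = "more"
$sFile = @ScriptDir & "\input.txt" ; hypothetical source text file
$sHtmlFile = @ScriptDir & "\result.html" ; hypothetical temporary output file
$sStr = FileRead($sFile)
$sStr = StringReplace($sStr, $Word, "<font color=red><b>" & $Word & "</b></font>")
$sStr = StringReplace($sStr, $Word2, "<font color=blue><b>" & $Word2 & "</b></font>")
$Html = "<html><head></head><body>" & $sStr & "</body></html>"
FileWrite($sHtmlFile, $Html)
ShellExecute($sHtmlFile)
Sleep(1000) ; give the browser time to load the file before it is deleted
FileDelete($sHtmlFile)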

Of course the html might need a bit more formatting. This is just a concept done on the fly.

EDIT: Since <font> is considered outdated, the replace strings would be more along the lines of

'<span style="color:blue;"><b>' & $Word2 & '</b></span>'

Edited by GEOSoft

George

#13 ·  Posted (edited)

Hmm..., you mentioned parsing the HTML source from a file, but how would you get it in the first place if you're not already integrated with the browser? I think attaching to the active IE browser isn't a big task, but in the case of FF it would require the user to have the MozRepl add-on; otherwise you need to integrate into it yourself (not fun). ;[ It may seem like I'm only talking about browsers as the place to find multiple words, but I can hardly believe it's intended for editors in the first place.

Edit: lol thanks for the span tag, it's much more versatile than font, etc..

Edited by Authenticity

I was referring to reading from a text file and then creating the HTML from that. Yes, the span tag is far better to use than the font tag. <span style="color:#0000ff;font-weight:bold;font-style:italic;">some text</span>

George

#15 ·  Posted (edited)

O.K.

I wish I knew why 'content.document.location.href="javascript: content.document=(\"Something\")"' doesn't work, or why FF doesn't have _FFWriteHTML, lol. Anyway, it works great with IE and anything else ;]

#include <FF.au3>
#include <IE.au3>
#include <WindowsConstants.au3>

Opt('GuiOnEventMode', 1)

$_FF_COM_TRACE = False  ; No Console debug output

HotKeySet('^+f', '_Find')
HotKeySet('{ESC}', '_EXIT')

Dim $hDLL = DllOpen('user32.dll')
Dim $sIEClassName = 'IEFrame'
Dim $sFFClassName = 'MozillaUIWindowClass'

Dim $hGUI = GUICreate('Find multiple words', 300, 250, Default, Default, BitOR($WS_POPUP, $WS_CAPTION), BitOR($WS_EX_TOOLWINDOW, $WS_EX_TOPMOST))
Dim $Inputs[4]
GUICtrlCreateButton('&Search', 115, 200, 70, 23)
GUICtrlSetOnEvent(-1, '_Search')

For $i = 0 To 3
    $Inputs[$i] = GUICtrlCreateInput('', 30, $i*30+25, 230, 25)
Next


While 1
    Sleep(100)
WEnd


Func _EXIT()
    GUIDelete($hGUI)
    DllClose($hDLL)
    Exit
EndFunc


Func _Find()
    GUISetState()
EndFunc


Func _Search()
    
    GUISetState(@SW_HIDE)
    Local $hActiveWnd = WinGetHandle('[ACTIVE]')
    Local $sWndClassName = _GetClassName($hActiveWnd)
    
    Switch $sWndClassName
        Case $sIEClassName
            _SearchIE($hActiveWnd)
            
        Case $sFFClassName
            _SearchFF($hActiveWnd)
        
        Case Else
            _SearchTxt()
    EndSwitch
    
EndFunc


Func _SearchIE($hwndIE)

    Local $o_IE = _IEAttach($hwndIE, 'HWND')
    If Not IsObj($o_IE) Then Return
    
    Local $sTitle = _IEPropertyGet($o_IE, 'title')
    Local $sHTML = _IEBodyReadHTML($o_IE)
    
    For $i = 0 To 3
        If GUICtrlRead($Inputs[$i]) <> '' Then
            $sHTML = StringRegExpReplace($sHTML, '(?i)(?:(<[^>]*+>)|(\Q' & GUICtrlRead($Inputs[$i]) & _
            '\E))', '\1<span style="background-color:#08FF58;font-style:italic;">\2</span>')
        EndIf
    Next
    
    
    _IEBodyWriteHTML($o_IE, $sHTML)
EndFunc


Func _SearchFF($hwndFF)
    Local $FF_Socket = _FFStart(Default, Default, 2)
        If $FF_Socket = -1 Then Return

    Local $sFile = @ScriptDir & '\HTML(tmp).html'
    Local $sTitle = WinGetTitle($hwndFF)
    Local $sHTML = _FFReadHTML($FF_Socket)
    
    $sTitle = StringRegExpReplace($sTitle, '(.*) -.*', '\1')
    
    For $i = 0 To 3
        If GUICtrlRead($Inputs[$i]) <> '' Then
            $sHTML = StringRegExpReplace($sHTML, '(?i)(?:(<[^>]*+>)|(\Q' & GUICtrlRead($Inputs[$i]) & _
            '\E))', '\1<span style="background-color:#38D878;font-style:italic;">\2</span>')
        EndIf
    Next
    ; Keep the opening <title> tag and replace only the title text
    $sHTML = StringRegExpReplace($sHTML, '(?i)(<title>)[^<]*+', '${1}' & $sTitle, 1)
    
    FileWrite($sFile, $sHTML)
    $sFile = StringRegExpReplace($sFile, '\\', '/')
    _FFTabAdd($FF_Socket, 'file:///' & $sFile)
    Sleep(1000)
    FileDelete($sFile)
    _FFDisConnect($FF_Socket)
EndFunc


Func _SearchTxt()
    
    Local $Clipboard
    Local $sFile = @ScriptDir & '\HTML(tmp).html'
    Local $sTitle = WinGetTitle('[ACTIVE]')
    
    $sTitle = StringRegExpReplace($sTitle, '(.*) -.*', '\1')
    
    Send('^a^c')
    $Clipboard = ClipGet()
    
    For $i = 0 To 3
        If GUICtrlRead($Inputs[$i]) = '' Then ContinueLoop
        $Clipboard = StringRegExpReplace($Clipboard, '(?i)(\Q' & GUICtrlRead($Inputs[$i]) & _
           '\E)', '<span style="background-color:#BCD0ED;color:#0000FF;font-weight:bold' & _
           ';font-style:italic;"><b>\1</b></span>')
    Next
       
    $Clipboard = StringRegExpReplace($Clipboard, '\r\n', '<BR>')
    $Clipboard = '<html><head><title>' & $sTitle & '</title></head><body bgcolor=#E4EAF2>' & _
                 '<span style="font-family:verdana">' & $Clipboard & '</span></body></html>'
    
    FileWrite($sFile, $Clipboard)
    ShellExecute($sFile)
    Sleep(1000)
    FileDelete($sFile)
    ClipPut('')
EndFunc


Func _GetClassName($hWnd)
    If Not IsHWnd($hWnd) Then Return ''
    Local $Ret = DllCall($hDLL, 'int', 'GetClassName', 'hwnd', $hWnd, 'str', '', _
                         'int', 80)
    
    If Not @error Then Return $Ret[2]
    Return ''
EndFunc
Edited by Authenticity

@Authenticity:

You can write into FF, too. E.g.

_FFOpenURL($Socket,"about:blank")
_FFSetGet($Socket,".body.innerHTML = '<b>test</b>';")

Actually I guess I don't understand what is the whole intent of the program. You could use 'StringReplace' on files and get the count from @extended if you are just counting each string..

Have you tried running the program and doing a search on more than one word?

To clarify what this program does:

The main reason I came up with this is an open book and notes test. I had my notes for this class in a 60-page Word document and found that a simple CTRL+F would require many "Find Next" clicks, plus reading around each hit to see whether the notes there deal specifically with what I'm looking for. So I thought of making this multiple word finder. I also wanted it to work with any text, which is why it has nothing to do with finding things in HTML, just plain text. Here's an example:

Say I browsed to: http://en.wikipedia.org/wiki/Car

Then in my program I searched for: motor, vehicle, car

Here's what the top 4 results would be:

Result 1, 47.22% Match:

motor car is a wheeled motor vehicle

Result 2, 56.67% Match:

car is a wheeled motor vehicle

Result 3, 21.52% Match:

vehicle for transporting passengers, which also carries its own engine or motor

Result 4, 1.63% Match:

carts) powered by clumsy internal combustion engines.[8]In November 1881 French inventor Gustave Trouvé demonstrated a working three-wheeled automobile that was powered by electricity. This was at the International Exhibition of Electricity in Paris.[9]Although several other German engineers (including Gottlieb Daimler, Wilhelm Maybach, and Siegfried Marcus) were working on the problem at about the same time, Karl Benz generally is acknowledged as the inventor of the modern automobile.[8]An automobile powered by his own four-stroke cycle gasoline engine was built in Mannheim, Germany by Karl Benz in 1885 and granted a patent in January of the following year under the auspices of his major company, Benz & Cie., which was founded in 1883. It was an integral design, without the adaptation of other existing components and including several new technological elements to create a new concept. This is what made it worthy of a patent. He began to sell his production vehicles in 1888.Karl BenzA photograph of the original Benz Patent motor

Does it all make sense now? My program shows the context in which all the searched words appear most closely together. It has some glitches, such as carts being a result for car, but that can be fixed.
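
One possible fix for the carts/car glitch is to match whole words only. The sketch below assumes the matching can be done with a regular expression; `_CountWholeWord` is a made-up helper, not code from multiWord.au3.

```autoit
; Hypothetical helper: count whole-word, case-insensitive matches of a term.
Func _CountWholeWord($sText, $sTerm)
    If $sTerm = '' Then Return 0
    ; \b anchors the literal term (quoted with \Q...\E) at word boundaries,
    ; so 'car' no longer matches inside 'carts' or 'carries'.
    Local $aHits = StringRegExp($sText, '(?i)\b\Q' & $sTerm & '\E\b', 3) ; flag 3: array of all matches
    If @error Then Return 0 ; no match found
    Return UBound($aHits)
EndFunc

ConsoleWrite(_CountWholeWord('carts and a motor car', 'car') & @CRLF)
```

Note that \b only helps when the term starts and ends with a word character. A search term like "C++" would need different handling.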
