Retrieving certain text from a bunch of text


Just curious if this CAN be done.

Say my example is:

Bob: Hey there how are you doing today?

Don: Great-o!

Is there any way of getting JUST those names? WinGetText would return the entire text, so how would you narrow that down to just the names if the script doesn't know what it's looking for? (It doesn't know the names Bob and Don; it needs to find them out.)

(Of course the text would be much larger and more complicated, but if I could get the base idea down, I can branch off of that.)


Do you want to get info from messenger windows? You can use the ":" as a reference point.

I'd get the whole text, split it into individual lines, and check each one for a ":" or something similar.

EDIT: Yeah, kinda like Authenticity posted below, but line wrapping can screw things up.
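
The split-and-check idea can be sketched roughly as follows. This is only an illustration: it assumes the lines are separated by @CRLF (text returned by WinGetText may use @LF instead, in which case the delimiter needs adjusting) and that everything before the first colon on a line is a name.

```autoit
; Sketch: pull "Name:" prefixes out of chat-style text, line by line.
Local $sText = "Bob: Hey there how are you doing today?" & @CRLF & _
        "Don: Great-o!"

; Split on the whole @CRLF sequence (flag 1), not on each character of it.
Local $aLines = StringSplit($sText, @CRLF, 1)

For $i = 1 To $aLines[0] ; $aLines[0] holds the number of lines
    Local $iPos = StringInStr($aLines[$i], ":")
    ; Keep the line only if a colon exists and something precedes it.
    If $iPos > 1 Then
        ConsoleWrite(StringLeft($aLines[$i], $iPos - 1) & @CRLF)
    EndIf
Next
```

A wrapped line that happens to contain a colon would still be picked up, which is the line-wrapping caveat mentioned in the edit note.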

Edited by Inverted

It depends on the expected data. If the names are anchored to the beginning of a new line and are followed by a colon, then you can match them like this:

Dim $sText = "Bob: Hey there how are you doing today?" & @CRLF & _
             "Don: Great-o!"

Dim $sPattern = "(?m)^([^:]*+):"
Dim $aNames = StringRegExp($sText, $sPattern, 3)

If IsArray($aNames) Then
    For $i = 0 To UBound($aNames)-1
        ConsoleWrite($aNames[$i] & @CRLF)
    Next
EndIf

But if the data is not consistent, or something like this occurs, then you'll need to find a more specific pattern that won't produce false matches:

Bob: Hey there how are you doing today?
Don: Great-o! Bob, can you see this nice ->
colon:

:)

What exactly do you mean by "get" those names? If you just want to find certain text, for example Bob, try using:

$Something = StringRegExp($Text, "Bob", 1)

If @error = 0 Then
    MsgBox(0, "", $Something[0])
EndIf


(sorry for the delayed response... Been playing Battlefield 1943 :) )

The names I need to get are on a huge list from a company I'm doing some work for. I need to send them all an email, but typing or even copying and pasting EACH AND EVERY name would take hours... That's why I wanted to try making something that would get those names for me.

There are 2 ways to do it:

1. Somehow get the names off the page itself. It has the people's names, and you can click on them and it will bring you to an email page, blah blah... Kind of like this site's Members list. But I'm not sure how you could read a column straight down, etc.

2. Go to View > Page Source in your browser... Then you would be at the HTML code... Then you can get the exact same code for each name on the page. Here's an actual example of a snippet of code for one person's name on the page:

</tr><tr align="center">
    <td class="alt1Active" align="left" id="u94878">
        <a href="member.php?u=94878">! Fear !</a>
        <div class="smallfont">Registered User</div>

    </td>



--------- And another person... (exact same coding)


</tr><tr align="center">
    <td class="alt1Active" align="left" id="u115966">
        <a href="member.php?u=115966">! Deeplay !</a>
        <div class="smallfont">Registered User</div>

    </td>

But there might be a problem... The HTML tags in those snippets can surely be found throughout the rest of the page's code, so just searching for </a> would get you mixed results. The only other way I can think of is to read the entire block of code, THEN go from there. (Searching for member.php also wouldn't work, because the numbers are different.)


Read the help file for examples of using StringRegExp() function or any of the string manipulation functions.

Dim $sHTML = '</tr><tr align="center">' & @CRLF & _
                '<td class="alt1Active" align="left" id="u94878">' & @CRLF & _
                    '<a href="member.php?u=94878">! Fear !</a>' & @CRLF & _
                    '<div class="smallfont">Registered User</div>' & @CRLF & _
                '</td>' & @CRLF & @CRLF & _
            '</tr><tr align="center">' & @CRLF & _
                '<td class="alt1Active" align="left" id="u115966">' & @CRLF & _
                    '<a href="member.php?u=115966">! Deeplay !</a>' & @CRLF & _
                '<div class="smallfont">Registered User</div>' & @CRLF & _
                '</td>'

Dim $sPatt = '(?s)(?i)<td.*?id="([^"]*).*?<a.*?href="([^"]*)[^>]*>([^<]*)'
Dim $aMatch = StringRegExp($sHTML, $sPatt, 3)

If IsArray($aMatch) Then
    For $i = 0 To UBound($aMatch)-1 Step 3
        Local $ID = $aMatch[$i]
        Local $Profile = $aMatch[$i+1]
        Local $Username = $aMatch[$i+2]
        
        MsgBox(0x40, 'User Info:', 'Link: ' & $Profile & @CRLF & 'User ID: ' & $ID & @CRLF & 'User Name: ' & $Username)
    Next
EndIf

Seems like this is what I need... I'm not quite this 'advanced' in my coding yet though so I do have some questions :) .

Could you explain in 'noob terms' what this means?

For $i = 0 To UBound($aMatch)-1 Step 3

Why do you have a '-1' there?

And also... Why is it Step 3? Whats the 3's significance? (Probably a noob question ><)

And one last one if you don't mind xD.

Local $ID = $aMatch[$i]
        Local $Profile = $aMatch[$i+1]
        Local $Username = $aMatch[$i+2]

Why do you do $i, $i+1, $i+2 ? >< I understand that's in the array, but how is the username set to $i+2 from $aMatch that was defined up above in the script?

[I won't even bother asking how you got $sPatt... I did look in the helpfile and it's all there, it seems... But it's too much of a jumble of letters/symbols and everything xD]

Correct me if I'm wrong... I would now need to do a WinGetText and have ALL the text it gets stored into $sHtml correct? *Edit out*

Thanks a bunch :)

Edited by UnknownWarrior

Ok. A lot of your questions will be answered once I tell you what UBound() is. UBound returns the number of elements in an array. So...

Local $A[7]
$A[0] = "test0"
$A[1] = "test1"
$A[2] = "test2"
$A[3] = "test3"
$A[4] = "test4"
$A[5] = "test5"
$A[6] = "test6"

For $x = 0 To UBound($A) - 1
    ConsoleWrite($A[$x] & @CRLF) ; visits $A[0] through $A[6]
Next

The -1 is there because we want to use each element in the array but can't go past the last index. UBound will return 7 because there are 7 elements, but indexes start at 0, so the last valid one is 6; using $A[7] would cause an error. The Step 3 means that $i will increase by 3 every time it loops. The pattern has three capturing groups, so every match adds three consecutive elements to $aMatch; $i points at the first of them, and $i+1 and $i+2 just cover the parts of the array that aren't covered by $i by itself. Sorry if I didn't explain this very well. Look in the help file. It is all there. UBound is a function. Step is a keyword.
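
The way the Step 3 loop walks the array can be seen on a smaller input. This is just a sketch with made-up data and a made-up pattern (neither comes from the script above); the only point is that flag 3 returns all captured groups of all matches in one flat array.

```autoit
; Sketch: two matches, three capturing groups each -> six array elements.
Local $sData = "1-Fear-user;2-Deeplay-user"
Local $aMatch = StringRegExp($sData, "(\d+)-(\w+)-(\w+)", 3)

If IsArray($aMatch) Then
    ; UBound($aMatch) is 6, so the valid indexes are 0 to 5.
    For $i = 0 To UBound($aMatch) - 1 Step 3
        ; $i is 0 for the first match and 3 for the second.
        ConsoleWrite("ID: " & $aMatch[$i] & ", name: " & $aMatch[$i + 1] & _
                ", rank: " & $aMatch[$i + 2] & @CRLF)
    Next
EndIf
```

The array holds 1, Fear, user, 2, Deeplay, user in that order, so each pass of the loop reads one complete match.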


Nope. If I'm not wrong, the data you get is already a string. The @CRLF in my script just stands for the \r\n line breaks that almost every Microsoft Windows application uses, so you can expect the data to be one long string with carriage returns and line feeds here and there, without having to deal with these technical subtleties yourself.

About the $i+1 and $i+2 thing: I think you may want to start with AutoIt 1-2-3, because I see you aren't yet accustomed enough to the language and its features to jump right into these "advanced" things. If you read the names of the variables and understand the StringRegExp() function, you'll understand what capturing groups are and where everything ends up, depending on whether there is a match or not. :) ...sorry for the long boring post.

Not if you use WinGetText. It would just reply to you with the text of the window (der xD)... Why would it write @CRLF and the ' ' in there for you? ><

And I've just never got into this string stuff, that's why I have no idea how to use StringRegExp... I see what all those characters mean from the HelpFile, but it's just over-complicated having them all mushed together in one line ><.


Heh, there is a difference between strings and string literals, but it's not important for the RegExp pattern because the pattern doesn't rely on these things. Typing '1' in the SciTE editor and having a string whose content is '1' (quotes included) are different things, because they are interpreted differently. That's the basics, though, so I guess you should read the AutoIt 1-2-3 link I posted before, or go to the Examples Forum, where it's a sticky.


Go to SciTE editor and type:

Dim $sString1 = '1'
Dim $sString2 = "'1'"

MsgBox(0, '', $sString1 & @TAB & $sString2)

and watch the difference. If the data you're reading into a variable has " or ' in it, it won't break the functionality shown so far, because both are simply part of the string; with that specific script, the AutoIt engine doesn't interpret them as anything other than string content.

You can save the original data you posted before my reply to a file, read it into a variable, and use the remainder of the script to see that the original data doesn't need to include any special @CRLF or '.
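
Reading the saved data from a file could look roughly like this. It is a sketch under assumptions: "members.html" is a hypothetical file name, and the pattern is a simplified variant of the one above that captures only the name inside each member.php link (the \d+ lets the user number differ from link to link).

```autoit
; Sketch: read saved page source from disk and extract the member names.
; "members.html" is a hypothetical file name.
Local $sFile = @ScriptDir & "\members.html"
Local $sHTML = FileRead($sFile)

If @error Then
    ConsoleWrite("Could not read " & $sFile & @CRLF)
Else
    ; Capture only the text between <a href="member.php?u=NUMBER"> and </a>.
    Local $aNames = StringRegExp($sHTML, '(?i)<a\s+href="member\.php\?u=\d+">([^<]*)</a>', 3)
    If IsArray($aNames) Then
        For $i = 0 To UBound($aNames) - 1
            ConsoleWrite($aNames[$i] & @CRLF)
        Next
    EndIf
EndIf
```

No @CRLF or quote handling is needed here, because the file content arrives as one ordinary string.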

Ok ok I getcha... So the ' ' and the @CRLF wouldn't even be needed (I tested it myself as well before I saw this).

Now I am running into the last and final problem (I hope ><)...

I searched it up, and not even SmOke_N could even figure it out as it seems.

http://www.autoitscript.com/forum/index.php?showtopic=29989&st=0&p=214280&hl=WinGetText%20problem&fromsearch=1&#entry214280

I'm having the EXACT same problem as that guy. I can get the Window name in every form I want... But whenever I try getting the text from it... I get nothing... I tried a MsgBox, its blank... I tried writing the text in an .ini, its blank.

I also copied and pasted the entire page of coding that I'm reading the names from and put it in a notepad file to check the size of it (So it wasn't over 64 KB)... And it wasn't.

Maybe you guys can spot something SmOke couldn't? ><


Some controls or windows cannot be automated because they don't respond to, or handle, the messages a generic control usually handles. Owner-drawn controls are one example, and it's enough to subclass a control and return nothing when handling the WM_GETTEXTLENGTH or WM_GETTEXT messages without calling the default control procedure. If the function call was perfectly correct, then it's not possible to automate the control this way (it may be automatable in other ways, though), and it's expected that AutoIt Window Info would fail too.

Anyway, this approach is also not optimal, because you can use _INetGetSource() (which makes an HTTP request internally) to request the page content directly.

Edit: Here's a simple example; I hope you understand enough of it to get the idea :)

#include <Constants.au3>
#include <WinAPI.au3>
#include <WindowsConstants.au3>

Global Const $tagPAINTSTRUCT = _
    'hwnd hdc;' & _
    'int fErase;' & _
    $tagRECT & _
    ';int fRestore;' & _
    'int fIncUpdate;' & _
    'byte rgbReserved[32];'

Global $hGUI, $Label
Global $hFunc, $pFunc, $hWndProc
Global $hLabel

$hGUI = GUICreate('AutoIt YAY', 100, 100)
$Label = GUICtrlCreateLabel('', 0, 0, 100, 100)
$hLabel = GUICtrlGetHandle($Label)

GUISetBkColor(0xFFFFFF)
GUISetState()
$hFunc = DllCallbackRegister('LabelProc', 'lresult', 'hwnd;uint;wparam;lparam')
$pFunc = DllCallbackGetPtr($hFunc)
$hWndProc = _WinAPI_SetWindowLong($hLabel, $GWL_WNDPROC, $pFunc)
_WinAPI_InvalidateRect($hLabel)

Do
Until GUIGetMsg() = -3
GUIDelete()
DllCallbackFree($hFunc)
Exit

Func LabelProc($hWnd, $iMsg, $iwParam, $ilParam)
    Local $tBuff, $pBuff, $iLen
    Local $hDC, $tPS, $pPS
    
    Switch $iMsg
        Case $WM_PAINT
            $tPS = DllStructCreate($tagPAINTSTRUCT)
            $pPS = DllStructGetPtr($tPS)
            $hDC = _WinAPI_BeginPaint($hWnd, $pPS)
            If $hDC Then
                $tBuff = DllStructCreate('wchar[10]')
                $pBuff = DllStructGetPtr($tBuff)
                DllStructSetData($tBuff, 1, '123456789')
                $iLen = 9
        
                DllCall('gdi32.dll', 'int', 'TextOutW', 'hwnd', $hDC, 'int', 3, 'int', 3, 'ptr', $pBuff, 'int', $iLen)
                _WinAPI_EndPaint($hWnd, $pPS)
            EndIf
            
        Case $WM_GETTEXTLENGTH
            Return 0
            
        Case $WM_GETTEXT
            Return 0
    EndSwitch
    
    Return _WinAPI_CallWindowProc($hWndProc, $hWnd, $iMsg, $iwParam, $ilParam)
EndFunc

Func _WinAPI_BeginPaint($hWnd, $pPS)
    Local $aResult
    
    $aResult = DllCall('user32.dll', 'hwnd', 'BeginPaint', 'hwnd', $hWnd, 'ptr', $pPS)
    If @error Then Return SetError(1, 0, 0)
    Return $aResult[0]
EndFunc

Func _WinAPI_EndPaint($hWnd, $pPS)
    Local $aResult
    
    $aResult = DllCall('user32.dll', 'hwnd', 'EndPaint', 'hwnd', $hWnd, 'ptr', $pPS)
    If @error Then Return SetError(1, 0, 0)
    Return $aResult[0]
EndFunc
Edited by Authenticity

Bleh, you're right, I did not understand your script too well... But I did find this and read up on it (the _INetGetSource).

Still though, no reply back, just a blank.

#include <INet.au3>

Sleep(5000)
Global $Title = WinGetTitle("[active]")

Dim $sHTML = _INetGetSource($Title)

MsgBox(0, "", $Title)

The MsgBox gives me the title correctly as it should. When I switch $Title with $sHTML in the MsgBox... Comes out blank again :).
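
One likely cause, offered as a guess from the code shown: _INetGetSource() expects a URL, while $Title holds the window title, so there is nothing valid to download. A minimal sketch with a placeholder address (the URL below is hypothetical and has to be replaced with the real member-list page):

```autoit
#include <INet.au3>

; Hypothetical address; replace with the real member-list URL.
Local $sURL = "http://www.example.com/memberlist.php"
Local $sHTML = _INetGetSource($sURL)

If @error Or $sHTML = "" Then
    MsgBox(0, "", "Could not download " & $sURL)
Else
    ; Show only the first 500 characters; the full page would overflow a MsgBox.
    MsgBox(0, "", StringLeft($sHTML, 500))
EndIf
```

If the list is only visible after logging in, the downloaded source may be a login page instead, since the request does not share the browser's session.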
