Find a match with StringRegExp


I am trying to grab "NOLNB01" from "<TD vAlign=top width="8%"><FONT face=Arial size=2>CN=NOLNB01/O=CAAL</FONT></TD>"

and "2422C134" from "<TD vAlign=top width="10%"><FONT face=Arial size=2>2422C134</FONT></TD>"

"NOLNB01" and "2422C134" will vary in value, but I think they will always be in the 3rd and 4th instances of "<TD vAlign=top..."

Here's what I have so far, which doesn't quite work (I tried to just get the values themselves; I'm not sure how to pick out the nth instance):

$server_array = StringRegExp($lines, '(?i)(?s)(<TD vAlign=top width=\W.\d*%\W\W<FONT face=Arial size=2>)', 3)
$pin_array = StringRegExp($lines, '(?i)(?s)(<TD vAlign=top width=\W.\d*%\W\W<FONT face=Arial size=2>)', 3)

$server = $server_array[2]
$pin = $pin_array[3]

<FORM name=_DominoForm action=/pro.nsf/bessearch?SearchView&amp;Query=FIELD%20UserInits%3D&amp;Seq=1 method=post><INPUT type=hidden value=0 name=__Click>

<TABLE>

<TBODY>

<TR>

<TD width="80%"><LINK href="http://sn1.roup.com/sime/stlinks/stlinks.css" type=text/css rel=stylesheet>

<script src="http://sn1.roup.com/sime/stlinks/stlinks.js"></SCRIPT>

<script>setSTLinksURL("http://sn1.roup.com/sime/stlinks");</SCRIPT>

<script src="http://sn1.roup.com/sime/stlinks/en/res.js"></SCRIPT>

<script src="http://sn1.roup.com/sime/stlinks/hostInfo.js"></SCRIPT>

<script>writeSTLinksApplet("", ""); </SCRIPT>

<SPAN style="LEFT: 0px; VISIBILITY: hidden; POSITION: absolute; TOP: 0px"><APPLET codeBase=http://sn1.roup.com/sime/stlinks height=10 width=10 code=com.lotus.sime.stlinks.client.STLinksApplet.class name=STLinksApp MAYSCRIPT><PARAM NAME="_cx" VALUE="265"><PARAM NAME="_cy" VALUE="265"></APPLET></SPAN> </TD>

<TD align=middle width="20%"><BUTTON onclick=window.print()>Print</BUTTON>&nbsp;<BUTTON onclick=window.close()>Close</BUTTON> </TD></TR></TBODY></TABLE>

<TABLE>

<TBODY>

<TR>

<TD width="10%"><B><FONT face=Arial size=2>Initials</FONT></B></TD>

<TD width="8%"><B><FONT face=Arial size=2>Username</FONT></B></TD>

<TD width="8%"><B><FONT face=Arial size=2>Server</FONT></B></TD>

<TD width="15%"><B><FONT face=Arial size=2>PIN</FONT></B></TD>

<TD width="20%"><B><FONT face=Arial size=2>Company Name</FONT></B></TD>

<TD width="34%"><B><FONT face=Arial size=2>Department</FONT></B></TD></TR>

<P>

<TR>

<TD vAlign=top width="10%"><FONT face=Arial size=2><A href="/pro.nsf/crswebsearchl/1CA2A393C8C686F0882576E100478535?OpenDocument" target=_top>MKHG</A></FONT></TD>

<TD vAlign=top width="8%"><FONT face=Arial size=2>CN=Mike Ike/OU=CDS/OU=CG/O=CAAL</FONT></TD>

<TD vAlign=top width="8%"><FONT face=Arial size=2>CN=NOLNB01/O=CAAL</FONT></TD>

<TD vAlign=top width="10%"><FONT face=Arial size=2>2422C134</FONT></TD>

<TD vAlign=top width="15%"><FONT face=Arial size=2>DEC</FONT></TD>

<TD vAlign=top width="34%"><FONT face=Arial size=2>MaDelivery</FONT></TD></TR></P></TBODY></TABLE><BR>

<META http-equiv=Pragma content=no-cache>

<META http-equiv=Expires content=-1></FORM>

Edited by gcue

Ok, so this is dry coded, so bear with any errors that will inevitably be in here:

; $sourceCode is assumed to already hold your source code text

$sourceCode = StringReplace($sourceCode, @CRLF, @LF)
Local $sourceArray = StringSplit($sourceCode, @LF)

Local $firstVal = "", $secondVal = ""
; Implement a simple state machine
Local $state = 0
; state -1 = error
; state 0 = seeking
; state 1,2 = found first and second "<TD vAlign"
; state 3 = found third "<TD vAlign" and extracted the first value (NOLNB01 in the example)
; state 4 = found fourth "<TD vAlign" and extracted the second value (2422C134 in the example); successful exit
For $line = 1 To $sourceArray[0]
    If StringStripWS($sourceArray[$line], 8) = "" Then ContinueLoop ; skip blank lines between cells
    If $state = 0 Then
        If StringInStr($sourceArray[$line], "<TD vAlign") Then $state = 1 ; flag as found and ignore this line
    ElseIf $state = 1 Then
        If StringInStr($sourceArray[$line], "<TD vAlign") Then
            $state += 1 ; flag as found and ignore this line, incrementing the count
        Else
            $state = 0 ; flag as end of possible input and continue seeking
        EndIf
    ElseIf $state = 2 Then
        If StringInStr($sourceArray[$line], "<TD vAlign") Then
            ; third cell: keep only the text between "CN=" and the next "/"
            $firstVal = StringRegExpReplace($sourceArray[$line], "^.*>CN=([^/<]+).*$", "\1")
            $state += 1 ; flag as found, incrementing the count
        Else
            $state = 0 ; flag as end of possible input and continue seeking
        EndIf
    ElseIf $state = 3 Then
        If StringInStr($sourceArray[$line], "<TD vAlign") Then
            ; fourth cell: skip the leading tags and keep the text up to the next "<"
            $secondVal = StringRegExpReplace($sourceArray[$line], "^(?:\s*<[^>]*>)+([^<]+).*$", "\1")
            $state += 1 ; flag as found, incrementing the count
        Else
            $state = -1 ; flag as error and bail
        EndIf
    Else
        ExitLoop
    EndIf
Next

If $state = 4 Then
    MsgBox(0, "Found Values", $firstVal & @LF & $secondVal)
ElseIf $state = -1 Then
    MsgBox(0, "ERROR", $firstVal & " found, but matching value not found")
Else
    MsgBox(0, "ERROR", "Unknown error")
EndIf


You'll have to troubleshoot this one yourself; I'm not anywhere I can test it at the moment.


Avoid trying to match the exact attribute values in the TDs. The website might change some detail (e.g. font name or size, centering, width...) and your pattern falls out of sync.

Also, many sites insert spurious whitespace (CR and/or LF included), mostly between tags. I've found that it's good to allow for the possibility, even if it's remote.

BTW, I don't know how or why the results of a typical SQL query will sometimes end up with whitespace inserted more or less randomly. This is a bit of a pain.

Try these patterns, assuming that CN= is a constant:

$lines = '<FORM name=_DominoForm action=/pro.nsf/bessearch?SearchView&amp;Query=FIELD%20UserInits%3D&amp;Seq=1 method=post><INPUT type=hidden value=0 name=__Click>' & _
'<TABLE><TBODY><TR><TD width="80%"><LINK href="http://sn1.roup.com/sime/stlinks/stlinks.css" type=text/css rel=stylesheet><script src="http://sn1.roup.com/sime/stlinks/stlinks.js"></SCRIPT>' & _
'<script>setSTLinksURL("http://sn1.roup.com/sime/stlinks");</SCRIPT><script src="http://sn1.roup.com/sime/stlinks/en/res.js"></SCRIPT><script src="http://sn1.roup.com/sime/stlinks/hostInfo.js"></SCRIPT>' & _
'<script>writeSTLinksApplet("", ""); </SCRIPT><SPAN style="LEFT: 0px; VISIBILITY: hidden; POSITION: absolute; TOP: 0px"><APPLET codeBase=http://sn1.roup.com/sime/stlinks height=10 width=10 code=com.lotus.sime.stlinks.client.STLinksApplet.class name=STLinksApp MAYSCRIPT><PARAM NAME="_cx" VALUE="265"><PARAM NAME="_cy" VALUE="265"></APPLET></SPAN> </TD>' & _
'<TD align=middle width="20%"><BUTTON onclick=window.print()>Print</BUTTON>&nbsp;<BUTTON onclick=window.close()>Close</BUTTON> </TD></TR></TBODY></TABLE><TABLE><TBODY><TR>' & _
'<TD width="10%"><B><FONT face=Arial size=2>Initials</FONT></B></TD><TD width="8%"><B><FONT face=Arial size=2>Username</FONT></B></TD><TD width="8%"><B><FONT face=Arial size=2>Server</FONT></B></TD>' & _
'<TD width="15%"><B><FONT face=Arial size=2>PIN</FONT></B></TD><TD width="20%"><B><FONT face=Arial size=2>Company Name</FONT></B></TD><TD width="34%"><B><FONT face=Arial size=2>Department</FONT></B></TD></TR>' & _
'<P><TR><TD vAlign=top width="10%"><FONT face=Arial size=2><A href="/pro.nsf/crswebsearchl/1CA2A393C8C686F0882576E100478535?OpenDocument" target=_top>MKHG</A></FONT></TD>' & _
'<TD vAlign=top width="8%"><FONT face=Arial size=2>CN=Mike Ike/OU=CDS/OU=CG/O=CAAL</FONT></TD><TD vAlign=top width="8%"><FONT face=Arial size=2>CN=NOLNB01/O=CAAL</FONT></TD>' & _
'<TD vAlign=top width="10%"><FONT face=Arial size=2>2422C134</FONT></TD><TD vAlign=top width="15%"><FONT face=Arial size=2>DEC</FONT></TD><TD vAlign=top width="34%"><FONT face=Arial size=2>MaDelivery</FONT></TD></TR></P></TBODY></TABLE><BR>' & _
'<META http-equiv=Pragma content=no-cache><META http-equiv=Expires content=-1></FORM>'
Local $ofs = 0, $server, $pin
$server = StringRegExp($lines, '(?i)(?s)</TABLE>.*</TR>.*</TD>.*</TD>\s*<TD [^>]*>\s*<FONT [^>]*>CN=([^/]*)/', 1, $ofs)
If Not @error Then
    $ofs = @extended
    $server = $server[0]
    $pin = StringRegExp($lines, '(?i)(?s)<TD [^>]*>\s*<FONT [^>]*>([^<]*)<', 1, $ofs)
    If Not @error Then
        $pin = $pin[0]
        ConsoleWrite("Computer name = " & $server & " pin = " & $pin & @LF)
    Else            ; error processing
    EndIf
Else                ; error processing
EndIf
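
For comparison, here is a minimal sketch closer to the original "3rd and 4th instance" idea. It is not from the posts above: `_NthVAlignCell` is a hypothetical helper name, and it assumes `$lines` holds the page source as in the code above. It collects the text of every `<TD vAlign...>` cell with one global match (flag 3) and then picks a cell by position. Keep in mind that indexing by position is exactly the kind of assumption that breaks when the page layout changes.

```autoit
; Hypothetical helper (illustration only): returns the text of the Nth
; "<TD vAlign...>" cell (1-based). On failure it returns "" and sets @error.
Func _NthVAlignCell($sHtml, $iIndex)
    ; Skip any tags nested inside the cell, then capture the text up to the next "<".
    Local $aCells = StringRegExp($sHtml, '(?is)<TD\s+vAlign[^>]*>\s*(?:<[^>]*>\s*)*([^<]*)<', 3)
    If @error Then Return SetError(1, 0, "")                ; no such cell at all
    If $iIndex < 1 Or $iIndex > UBound($aCells) Then Return SetError(2, 0, "")
    Return StringStripWS($aCells[$iIndex - 1], 3)           ; trim leading/trailing whitespace
EndFunc

; Assumption: $lines holds the page source, as in the code above.
Local $sServerCell = _NthVAlignCell($lines, 3)              ; the server cell in the sample
Local $sPinCell = _NthVAlignCell($lines, 4)                 ; the PIN cell in the sample

; Keep only the part between "CN=" and the first "/".
Local $aName = StringRegExp($sServerCell, '^CN=([^/]+)', 1)
If Not @error Then ConsoleWrite("Server = " & $aName[0] & " pin = " & $sPinCell & @LF)
```

The helper reports failure through `@error` instead of throwing an array-subscript error, so a layout change shows up as an empty result you can test for.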

This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)


Of course it works ... today. The problem with matching such text in an ocean of markup is that the design may change a little (or worse, depend on the server answering you) and defeat your previous analysis/regexp without any notice. That's why staying away from matching specific parts makes this kind of search a little less fragile (hint: I don't write "more robust"!). So don't omit serious error checking and further validation of the extracted data.

I guess I should launch yet another GIYF (Google is your friend), but look for a wonderful tool called RegExCoach (my favorite), the Regex Guru site (advanced), and zillions of other places.

EDIT: above all, one mostly needs to practice regexps.
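
As a rough illustration of that kind of validation, here is a minimal sketch. The function names are hypothetical, and the expected formats (a short alphanumeric server name, an 8-character hexadecimal PIN) are guesses based on the single sample row, not documented facts about the site. It assumes `$server` and `$pin` were filled in by the extraction code above.

```autoit
; Hypothetical validators: the formats below are guessed from one sample row only.
Func _LooksLikeServer($sName)
    ; Letters, digits, "_" or "-", 1 to 32 characters (assumed format).
    Return StringRegExp($sName, '^[A-Za-z0-9_-]{1,32}$') = 1
EndFunc

Func _LooksLikePin($sPin)
    ; Exactly 8 hexadecimal characters, like "2422C134" (assumed format).
    Return StringRegExp($sPin, '^[0-9A-Fa-f]{8}$') = 1
EndFunc

; Assumption: $server and $pin come from the extraction code above.
If _LooksLikeServer($server) And _LooksLikePin($pin) Then
    ConsoleWrite("Values look plausible: " & $server & " / " & $pin & @LF)
Else
    ConsoleWrite("Extracted values look wrong; the page layout may have changed." & @LF)
EndIf
```

A check like this will not make the scraping robust, but it turns a silent mismatch into a visible failure.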

Edited by jchd

