
StringRegExp and pulling from same HTML, different results


Zehnpai

I need a little help figuring out why my expression doesn't work...

I have three switches of the same model from two clients.  They run different firmware versions, but the HTML is the same.

Here's from Customer A's switch.

<tr>
<td class="head">Software Version</td>
<td class="data">PB.03.04</td>
</tr>

And the two from Customer B (same switches, though one is on a different firmware version):

 
Switch 1:
<tr>
<td class="head">Software Version</td>
<td class="data">PB.03.04</td>
</tr>

Switch 2:

<tr>
<td class="head">Software Version</td>
<td class="data">PA.03.04</td>
</tr>
Long story short, each one is buried in a separate, larger HTML document and I need to pull the version number: "PA.03.04" or "PB.03.04" in this case.
 
I was using
$Arr = StringRegExp(_IEBodyReadHTML($oIE), '(?i)Software Version.*\n<td class=.*>(\S*)<',1)
$ipArray[$i][1] = $Arr[0]

to grab the firmware version for the first client and it works fine.  However, when I use it for the two switches for my second client, it returns empty white space.  It doesn't error out so I know it's grabbing something, that something just appears to be nothing.  I tried changing the flag to '2' and it returns everything from "Software Version" on to the end of the page so I know it's at least starting in the right place.  It also didn't show any of the quotation marks in the HTML source.

So I figured there was just something wrong with my expression.  I tried this instead:

local $Arr = StringRegExp(_IEBodyReadHTML($oIE), '(?i)(?s)Software.*class=data>(\S*)</td>',1)
$ipArray[$i][1] = $Arr[0]

And it gives me the MAC address from over a dozen lines lower.

<td class="head">Software Version</td>
<td class="data">PA.03.04</td>
</tr>
<tr>
<td class="head">Serial Number</td>
<td class="data">XXXXXXXX</td>
</tr>
<tr><td>&nbsp;</td></tr>
<tr>
<td class="mainhead" colspan="2">Address Information</td>
</tr>
<tr>
<td class="head">Management VLAN</td>
<td class="data">1</td>
</tr>
<tr>
<td class="head">IP Address</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">Subnet Mask</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">Gateway IP Address</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">MAC Address</td>
<td class="data">XX-XX-XX-XX-XX-XX</td>

Any idea what I'm doing wrong?  Is it the expression?  Am I going about this the wrong way?  These switches are cheap and don't support a CLI, but I have to go through 100+ switches pulling the firmware version every few weeks.  I've figured out every switch type except these last three ProCurves and it's driving me batty.

I can post more of the code or HTML if necessary.  Post was just getting long enough as is.

Thanks in advance!

 
Edited by Zehnpai

This?

#Include <Array.au3>

$txt = StringRegExpReplace(FileRead(@scriptfullpath), '(?s).+#cs(.+)#ce.*', "$1")   ; reads the comment

local $Arr = StringRegExp($txt, '(?is)Software\h+Version.+?data">([^<]+)', 3)
_ArrayDisplay($Arr)

;===================================
#cs
<td class="head">Software Version</td>
<td class="data">PA.03.04</td>
</tr>
<tr>
<td class="head">Serial Number</td>
<td class="data">XXXXXXXX</td>
</tr>
<tr><td>&nbsp;</td></tr>
<tr>
<td class="mainhead" colspan="2">Address Information</td>
</tr>
<tr>
<td class="head">Management VLAN</td>
<td class="data">1</td>
</tr>
<tr>
<td class="head">IP Address</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">Subnet Mask</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">Gateway IP Address</td>
<td class="data">XXX.XXX.XXX.XXX</td>
</tr>
<tr>
<td class="head">MAC Address</td>
<td class="data">XX-XX-XX-XX-XX-XX</td>
#ce

I had to remove the " after the word data because _IEBodyReadHTML strips the quotation marks for some reason, but other than that it works perfectly.
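For anyone who needs one pattern that handles both forms, the quote can be made optional instead of being removed. This is only a sketch: the two sample strings and the helper name _GetVersion are made up for illustration, and the stripped form is assumed to look like class=data.

```autoit
; Hypothetical samples: the raw source form and the form with quotes stripped.
Local $sQuoted = '<td class="head">Software Version</td>' & @LF & '<td class="data">PA.03.04</td>'
Local $sBare = '<TD class=head>Software Version</TD>' & @LF & '<TD class=data>PA.03.04</TD>'

ConsoleWrite(_GetVersion($sQuoted) & @CRLF) ; PA.03.04
ConsoleWrite(_GetVersion($sBare) & @CRLF)   ; PA.03.04

; _GetVersion is an illustrative helper, not part of any UDF.
Func _GetVersion($sHtml)
    ; "? matches the quote when it is present and nothing when it is not.
    Local $aMatch = StringRegExp($sHtml, '(?is)Software\h+Version.+?class="?data"?>([^<]+)', 1)
    If @error Then Return "" ; no match found
    Return $aMatch[0]
EndFunc
```

The rest of the pattern is the same as the one posted above; only the two "? tokens are new.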

I don't suppose you feel up to educating me on why my string grabs nothing?  I've just started learning regular expressions (my wife is the linguist, not me!) so I'm guessing I'm just not understanding the differences between whitespace characters.

Thanks either way, this saves me a lot of heartache.
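The thread never shows what _IEBodyReadHTML actually returned for Customer B's switches, so the following is only one plausible explanation, not a confirmed diagnosis. If the closing </td> and </tr> tags end up on the same line, the greedy .* in the original pattern runs to the last > on that line that is directly followed by <, and (\S*) captures nothing. The two layouts below are assumed for illustration.

```autoit
; The original pattern from the first post, unchanged.
Local $sPattern = '(?i)Software Version.*\n<td class=.*>(\S*)<'

; Hypothetical layout 1: the closing </tr> sits on its own line.
Local $sSplit = '<td class=head>Software Version</td>' & @LF & _
        '<td class=data>PB.03.04</td>' & @LF & '</tr>'

; Hypothetical layout 2: </td></tr> share a line (assumed, not taken from the thread).
Local $sJoined = '<td class=head>Software Version</td>' & @LF & _
        '<td class=data>PB.03.04</td></tr>'

Local $aSplit = StringRegExp($sSplit, $sPattern, 1)
Local $aJoined = StringRegExp($sJoined, $sPattern, 1)

; Brackets make an empty capture visible.
If IsArray($aSplit) Then ConsoleWrite('[' & $aSplit[0] & ']' & @CRLF)   ; [PB.03.04]
If IsArray($aJoined) Then ConsoleWrite('[' & $aJoined[0] & ']' & @CRLF) ; []
```

In the second layout the match succeeds, so there is no error, but the captured group is an empty string, which fits the "grabbing something that appears to be nothing" symptom.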


You would need to invert the "greediness" with (?U):

local $aData = StringRegExp($html, '(?U)(?i)(?s)Software.*class=.?data.?>(\S*)</td>',1)
_ArrayDisplay($aData)

mikell's is great, but to modify yours, I just blindly allow any single character on either side of 'data'.
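A small sketch of the difference, using a shortened, made-up version of the table with the quotes already stripped. With (?s) and a greedy .*, the match stretches to the last class=data> in the text, which is why the MAC address came back. With (?U) the same quantifier stops at the first one.

```autoit
; Shortened hypothetical input: only the first and last rows of the table.
Local $sHtml = '<td class=head>Software Version</td>' & @LF & _
        '<td class=data>PA.03.04</td>' & @LF & _
        '<td class=head>MAC Address</td>' & @LF & _
        '<td class=data>XX-XX-XX-XX-XX-XX</td>'

; Greedy: .* consumes as much as possible, then backs up to the LAST class=data>.
Local $aGreedy = StringRegExp($sHtml, '(?i)(?s)Software.*class=data>(\S*)</td>', 1)

; Ungreedy: (?U) makes every quantifier lazy, so .* stops at the FIRST class=data>.
Local $aLazy = StringRegExp($sHtml, '(?U)(?i)(?s)Software.*class=data>(\S*)</td>', 1)

If IsArray($aGreedy) Then ConsoleWrite($aGreedy[0] & @CRLF) ; XX-XX-XX-XX-XX-XX
If IsArray($aLazy) Then ConsoleWrite($aLazy[0] & @CRLF)     ; PA.03.04
```

Writing .*? for a single quantifier has the same effect without flipping the whole pattern, which is what the .+? in mikell's version does.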

Edited by jdelaney
