orange

RegExp --- cannot figure this out...

<td nowrap class="crn" onclick="show_title(2);">12345</td>

That line is repeated about 30 times in an HTML file that I have to parse.

I can't for the life of me get this regexp to work.

I need an array of all the strings between the > and the </td>. Each one can be a 5-digit number, or a letter followed by 4 digits: 12345 or A1234.

For whatever reason I can't do this.

Is there anyone who knows this better than I do who can write this line for me?

By the way, I need the entire line to be part of the search pattern, and the number in show_title(2) varies from 0 to 30.

Can anyone help me with this?

#2 ·  Posted (edited)

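Here is a rough example:

#include <File.au3>
#include <Array.au3>

Local $aLines
; Read the file into a 1-based array; $aLines[0] holds the line count.
If Not _FileReadToArray("myhtml.txt", $aLines) Then Exit MsgBox(16, "Error", "Could not read myhtml.txt")

Local $aResults[$aLines[0] + 1] = [$aLines[0]]
For $i = 1 To $aLines[0]
    $aResults[$i] = _ParseLine($aLines[$i])
Next
_ArrayDisplay($aResults, "")

; Return the text between the first '>' and the next '<' on the line.
Func _ParseLine($sLine)
    Local $aSplit1 = StringSplit($sLine, '>')
    If $aSplit1[0] < 2 Then Return "" ; no '>' on this line (e.g. a blank line)
    Local $aSplit2 = StringSplit($aSplit1[2], '<')
    Return $aSplit2[1]
EndFunc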

and the myhtml.txt file:

<td nowrap class="crn" onclick="show_title(2);">1234A</td>
<td nowrap class="crn" onclick="show_title(2);">1234B</td>
<td nowrap class="crn" onclick="show_title(2);">1234C</td>
<td nowrap class="crn" onclick="show_title(2);">1234D</td>
<td nowrap class="crn" onclick="show_title(2);">1234E</td>
<td nowrap class="crn" onclick="show_title(2);">1234F</td>
Edited by CHRIS95219


This pattern may help

$string = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(3);">123456</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(4);">A12345</td>'
$pattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'
$result = StringRegExp($string, $pattern, 3)
If Not @error Then
    For $i = 0 To UBound($result) -1
        MsgBox(0, '', $result[$i])
    Next
EndIf

:lmao:
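
A tighter variant, as a minimal sketch rather than a tested drop-in: it assumes the markup is exactly as in the first post, matches the changing show_title index with \(\d+\), and only accepts the two code shapes asked for (five digits, or a letter followed by four digits). The names $sHtml, $sPattern and $aCodes are made up for this example.

```autoit
; Sample input in the shape described in the first post (assumed, not the real page).
Local $sHtml = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(3);">A1234</td>'

; \(\d+\) matches the literal parentheses around the changing index.
; The capture group accepts one letter or digit followed by exactly four digits.
Local $sPattern = '<td nowrap class="crn" onclick="show_title\(\d+\);">([A-Za-z\d]\d{4})</td>'

; Flag 3 returns an array of all captured groups across the whole string.
Local $aCodes = StringRegExp($sHtml, $sPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aCodes) - 1
        ConsoleWrite($aCodes[$i] & @CRLF)
    Next
EndIf
```

Compared with ".*", the explicit classes stop the match from swallowing more than one cell if several cells ever end up on the same line.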



I'm not in a place to test this right now, but the RegExp is what I was going for. Thanks for the quick response.


[Code block unrecoverable due to encoding damage. Per the reply that follows, it passed the HTML through StringStripWS(..., 8) before calling StringRegExp().]

$result is always 1.

Any ideas?


StringStripWS(..., 8) strips all whitespace out of the string, so it is no wonder the match fails: the pattern expects three spaces within each line. :lmao:
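
To illustrate the difference, here is a small sketch (the sample line and the variable names are assumptions for the example). Flag 8 removes every whitespace character, while flag 3 (1 + 2) only trims leading and trailing whitespace, so the spaces inside the tag survive.

```autoit
; Sample line with padding on both sides (assumed input).
Local $sLine = '  <td nowrap class="crn" onclick="show_title(2);">12345</td>  '
Local $sPattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'

; Flag 8 strips ALL whitespace, so "<td nowrap class" becomes "<tdnowrapclass".
Local $sAll = StringStripWS($sLine, 8)
; Flag 3 = 1 (leading) + 2 (trailing): the spaces inside the tag are kept.
Local $sEnds = StringStripWS($sLine, 3)

; With the default flag 0, StringRegExp returns 1 for a match and 0 for no match.
ConsoleWrite("Flag 8 matches: " & StringRegExp($sAll, $sPattern) & @CRLF)
ConsoleWrite("Flag 3 matches: " & StringRegExp($sEnds, $sPattern) & @CRLF)
```

The first call should report no match and the second a match, since only the second string still contains the spaces the pattern spells out.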



Well, that was one problem. It is fixed now, but there are still no matches. The return value is still 1.


Can you attach or PM the testpage.html so I can look into it further?


Thanks for attaching "Copy_of_class.html" via PM to me.

Your source remains intact if you use FileRead(), but I assume this HTML page may be on the net, so using _IEBodyReadHTML() is perhaps preferred. _IEBodyReadHTML() strips quotes and rearranges the attributes inside the HTML tags in the string it returns, so the regex correctly fails to match.
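
For the FileRead() route, a minimal sketch. It assumes the page is saved next to the script as Copy_of_class.html and that the raw file still has the attribute order and quoting from the first post; $sHtml and $aCodes are illustrative names.

```autoit
#include <Array.au3>

; FileRead returns the file contents unchanged, so the original markup is preserved.
Local $sHtml = FileRead(@ScriptDir & "\Copy_of_class.html")
If @error Then Exit MsgBox(16, "Error", "Could not read Copy_of_class.html")

; Same pattern as earlier in the thread; flag 3 collects every captured cell value.
Local $aCodes = StringRegExp($sHtml, '<td nowrap class="crn" onclick=".*;">(.*)</td>', 3)
If Not @error Then _ArrayDisplay($aCodes, "Codes")
```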

Here is the modified pattern. I also noticed that the case of the characters was different, so I added case insensitivity by using "(?i)".

#include <IE.au3>

$ie_object = _IECreate(@ScriptDir & "\Copy_of_class.html", 0, 0)
$string = StringStripWS(_IEBodyReadHTML($ie_object), 3)

; Actual Line to catch: "<TD class=crn onclick=show_course(0); noWrap>47738</TD>"
$pattern = '(?i)<td class=crn onclick=.*; noWrap>(.*)</td>'
$result = StringRegExp($string, $pattern, 3)
If Not @error Then
    For $i = 0 To UBound($result) - 1
        MsgBox(0x40000, $i, $result[$i])
    Next
EndIf

:lmao:
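
If the attribute order or quoting might change again, a more tolerant pattern is one option. This is only a sketch, under the assumption that the cell is identified by class crn and contains no nested tags; $sHtml, $sPattern and $aCodes are illustrative names, and the two sample lines are the two forms quoted in this thread.

```autoit
; One line as written in the source file, one as returned by _IEBodyReadHTML().
Local $sHtml = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
        '<TD class=crn onclick=show_course(0); noWrap>47738</TD>'

; [^>]* skips any other attributes, whatever their order; "? makes the quotes optional;
; [^<]* captures the cell text without running past the closing tag.
Local $sPattern = '(?i)<td[^>]*class="?crn"?[^>]*>([^<]*)</td>'

Local $aCodes = StringRegExp($sHtml, $sPattern, 3)
If Not @error Then
    For $i = 0 To UBound($aCodes) - 1
        ConsoleWrite($aCodes[$i] & @CRLF)
    Next
EndIf
```

The trade-off is that this no longer checks the onclick attribute at all, so it would also pick up any other cell with class crn.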



Thanks very much, it looks like everything is in order!
