orange Posted October 18, 2006

    <td nowrap class="crn" onclick="show_title(2);">12345</td>

That line is repeated about 30 times in an HTML file that I have to parse, and I can't for the life of me get a regexp for it to work. I need an array of all the strings between the > and the </td>. Each one is either a 5-digit number or a letter followed by 4 digits: 12345 or A1234.

Is there anyone who knows this better than I do who can write this line for me? By the way, the entire line needs to be part of the search pattern, and the number in show_title(2) changes from 0 to 30. Can anyone help me with this?
cppman Posted October 18, 2006

Here is somewhat of an example:

    #include <File.au3>
    #include <Array.au3>

    Local $aLines
    ; Read the file into an array; $aLines[0] holds the number of lines
    If Not _FileReadToArray("myhtml.txt", $aLines) Then Exit

    ; Same layout for the results: element 0 holds the count
    Local $aResults[$aLines[0] + 1] = [$aLines[0]]
    For $i = 1 To $aLines[0]
        $aResults[$i] = _ParseLine($aLines[$i])
    Next
    _ArrayDisplay($aResults, "")

    ; Return the text between the first '>' and the following '<'
    Func _ParseLine($sLine)
        Local $aSplit1 = StringSplit($sLine, '>')
        If $aSplit1[0] < 2 Then Return "" ; no '>' on this line
        Local $aSplit2 = StringSplit($aSplit1[2], '<')
        Return $aSplit2[1]
    EndFunc

and the myhtml.txt file:

    <td nowrap class="crn" onclick="show_title(2);">1234A</td>
    <td nowrap class="crn" onclick="show_title(2);">1234B</td>
    <td nowrap class="crn" onclick="show_title(2);">1234C</td>
    <td nowrap class="crn" onclick="show_title(2);">1234D</td>
    <td nowrap class="crn" onclick="show_title(2);">1234E</td>
    <td nowrap class="crn" onclick="show_title(2);">1234F</td>

Edited October 18, 2006 by CHRIS95219
MHz Posted October 18, 2006

This pattern may help:

    $string = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
            '<td nowrap class="crn" onclick="show_title(3);">123456</td>' & @CRLF & _
            '<td nowrap class="crn" onclick="show_title(4);">A12345</td>'
    $pattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'
    $result = StringRegExp($string, $pattern, 3)
    If Not @error Then
        For $i = 0 To UBound($result) - 1
            MsgBox(0, '', $result[$i])
        Next
    EndIf
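The pattern above captures whatever sits between the tags. If only the two formats from the first post should match (5 digits, or a letter followed by 4 digits), a stricter variant might look like the sketch below. It assumes a current AutoIt release, whose StringRegExp uses PCRE syntax; the variable names and sample values are made up for illustration.

```autoit
; Sample input modeled on the line from the first post (values are made up)
Local $sHtml = '<td nowrap class="crn" onclick="show_title(0);">12345</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(1);">A1234</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(2);">123456</td>'

; \d+ covers the changing show_title index;
; the group accepts either 5 digits or one letter followed by 4 digits
Local $sPattern = '<td nowrap class="crn" onclick="show_title\(\d+\);">(\d{5}|[A-Za-z]\d{4})</td>'

Local $aCodes = StringRegExp($sHtml, $sPattern, 3) ; flag 3 = array of all matches
If @error Then
    ConsoleWrite('No matching lines found' & @CRLF)
Else
    For $i = 0 To UBound($aCodes) - 1
        ConsoleWrite($aCodes[$i] & @CRLF)
    Next
EndIf
```

With this input, the third line (six digits) should be skipped, because the capture group has to be followed directly by </td>.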
orange Posted October 18, 2006 Author

I'm not in a place to test this right now, but the RegExp is what I was going for. Thanks for the quick response.
orange Posted October 18, 2006 Author

[The code posted here was garbled by the forum archive and cannot be recovered.]

$result = 1 always. Any ideas?
MHz Posted October 19, 2006

StringStripWS(..., 8) strips all whitespace out of the string, so it is no wonder the match fails: the pattern expects 3 spaces inside the tag.
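A small sketch of that effect, using the sample line and pattern from earlier in the thread (the variable names are made up). StringRegExp is called without a flag here, so it returns only 1 for a match or 0 for no match.

```autoit
Local $sLine = '<td nowrap class="crn" onclick="show_title(2);">12345</td>'
Local $sPattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'

; Flag 8 removes every whitespace character, including the spaces inside the tag
Local $sStripped = StringStripWS($sLine, 8)
ConsoleWrite($sStripped & @CRLF) ; <tdnowrapclass="crn"onclick="show_title(2);">12345</td>

; Flag 3 (1 + 2) trims only leading and trailing whitespace
Local $sTrimmed = StringStripWS($sLine, 3)

; Default flag 0: returns 1 on a match, 0 otherwise
ConsoleWrite(StringRegExp($sStripped, $sPattern) & @CRLF) ; 0, the spaces are gone
ConsoleWrite(StringRegExp($sTrimmed, $sPattern) & @CRLF)  ; 1, the tag is intact
```

Trimming with flag 3 keeps the interior of each tag untouched, which is why it is the safer choice before running a pattern that contains literal spaces.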
orange Posted October 19, 2006 Author

Well, that was a problem. Fixed now, but still nothing. The return value is 1...
MHz Posted October 19, 2006

Can you attach or PM the testpage.html so I can look further into it?
MHz Posted October 19, 2006

Thanks for sending "Copy_of_class.html" to me via PM. Your source remains intact if you use FileRead(), but I assume this HTML page may be on the net, so using _IEBodyReadHTML() is perhaps preferred. _IEBodyReadHTML() strips the quotes and rearranges the attributes in the string it returns, so the regex, correctly, no longer matches. Below is the modified pattern. I also noticed that the case of the characters was different, so I added case insensitivity with "(?i)".

    #include <IE.au3>

    $ie_object = _IECreate(@ScriptDir & "\Copy_of_class.html", 0, 0)
    $string = StringStripWS(_IEBodyReadHTML($ie_object), 3)
    ; Actual line to catch: "<TD class=crn onclick=show_course(0); noWrap>47738</TD>"
    $pattern = '(?i)<td class=crn onclick=.*; noWrap>(.*)</td>'
    $result = StringRegExp($string, $pattern, 3)
    If Not @error Then
        For $i = 0 To UBound($result) - 1
            MsgBox(0x40000, $i, $result[$i])
        Next
    EndIf
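Since the markup differs depending on whether it comes from FileRead() or _IEBodyReadHTML(), one option is a pattern that tolerates both forms. The following is only a sketch under that assumption, not the pattern used in the thread: it relies on PCRE syntax as found in current AutoIt releases, the variable names are made up, and the two sample strings are the lines quoted in the posts above.

```autoit
; Both serializations seen in the thread: the file on disk and the _IEBodyReadHTML() output
Local $sFromFile = '<td nowrap class="crn" onclick="show_title(2);">12345</td>'
Local $sFromIE = '<TD class=crn onclick=show_course(0); noWrap>47738</TD>'

; (?i) ignores case, "? makes the quotes optional,
; [^>]* skips any other attributes regardless of their order
Local $sPattern = '(?i)<td\b[^>]*\bclass="?crn"?[^>]*>([^<]*)</td>'

Local $aInputs[2] = [$sFromFile, $sFromIE]
For $i = 0 To UBound($aInputs) - 1
    Local $aMatch = StringRegExp($aInputs[$i], $sPattern, 3)
    If Not @error Then ConsoleWrite($aMatch[0] & @CRLF)
Next
```

The trade-off is that this version keys only on the crn class, so it no longer checks the onclick attribute at all. If the whole line must take part in the match, as asked in the first post, the exact pattern above is the better fit.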
orange Posted October 19, 2006 Author

Thanks very much, it looks like everything is in order!