orange, posted October 18, 2006:

<td nowrap class="crn" onclick="show_title(2);">12345</td>

That line is repeated about 30 times in an HTML file that I have to parse, and I can't for the life of me get the regexp to work. I need an array of all the strings between the > and the </td>. Each one is either a five-digit number or a letter followed by four digits, e.g. 12345 or A1234.

Is there anyone who knows this better than I do who can write this line for me? I need the entire line to be part of the search pattern, and the argument to show_title() changes from 0 to 30. Can anyone help me with this?
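A minimal sketch of a pattern that follows the requirements above exactly, assuming the markup is character-for-character as quoted (same attribute order, double quotes, single spaces). `_GetCrnValues` is a hypothetical helper name. `\d{1,2}` accepts the show_title(0) to show_title(30) argument (it would also accept up to 99), and the capture group accepts either five digits or one letter followed by four digits.

```autoit
; Hypothetical helper: returns an array of the captured cell values,
; or sets @error and returns 0 when nothing matches.
Func _GetCrnValues($sHtml)
    ; show_title\(\d{1,2}\)  -> the 0..30 argument (the parentheses must be escaped)
    ; (\d{5}|[A-Za-z]\d{4})  -> "12345" or "A1234"
    Local $sPattern = '<td nowrap class="crn" onclick="show_title\(\d{1,2}\);">(\d{5}|[A-Za-z]\d{4})</td>'
    Local $aMatches = StringRegExp($sHtml, $sPattern, 3) ; 3 = array of all global matches
    If @error Then Return SetError(1, 0, 0)
    Return $aMatches
EndFunc

; Usage with two sample lines
Local $sSample = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(30);">A1234</td>'
Local $aValues = _GetCrnValues($sSample)
For $i = 0 To UBound($aValues) - 1
    ConsoleWrite($aValues[$i] & @CRLF)
Next
```

Anchoring the whole line, as asked, means a cell whose value breaks the 5-digit / letter-plus-4-digit rule (for example 123456) is simply skipped rather than captured.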
cppman, posted October 18, 2006 (edited the same day):

Here is somewhat of an example that splits each line instead of using a regexp:

#include <file.au3>
#include <array.au3>

Local $aLines
; Read the file into an array; element [0] holds the line count
If Not _FileReadToArray("myhtml.txt", $aLines) Then
    MsgBox(16, "Error", "Could not read myhtml.txt")
    Exit
EndIf

Local $aResults[$aLines[0] + 1] = [$aLines[0]]
For $i = 1 To $aLines[0]
    $aResults[$i] = _ParseLine($aLines[$i])
Next
_ArrayDisplay($aResults, "")

; Return the text between the first '>' and the following '<', or "" if the line has no '>'
Func _ParseLine($sLine)
    Local $aSplit1 = StringSplit($sLine, '>')
    If $aSplit1[0] < 2 Then Return ""
    Local $aSplit2 = StringSplit($aSplit1[2], '<')
    Return $aSplit2[1]
EndFunc

And the myhtml.txt file:

<td nowrap class="crn" onclick="show_title(2);">1234A</td>
<td nowrap class="crn" onclick="show_title(2);">1234B</td>
<td nowrap class="crn" onclick="show_title(2);">1234C</td>
<td nowrap class="crn" onclick="show_title(2);">1234D</td>
<td nowrap class="crn" onclick="show_title(2);">1234E</td>
<td nowrap class="crn" onclick="show_title(2);">1234F</td>
MHz, posted October 18, 2006:

This pattern may help:

$string = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(3);">123456</td>' & @CRLF & _
        '<td nowrap class="crn" onclick="show_title(4);">A12345</td>'
$pattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'
; Flag 3 returns an array holding the capture group of every match
$result = StringRegExp($string, $pattern, 3)
If Not @error Then
    For $i = 0 To UBound($result) - 1
        MsgBox(0, '', $result[$i])
    Next
EndIf
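A pattern like this relies on each cell sitting on its own line, because `.` does not match a newline by default. If the HTML ever arrives as one long line, a greedy `.*` runs on to the last `;">` and `</td>` it can find, while a lazy `.*?` stops at the first. A quick comparison, using made-up single-line input:

```autoit
; Two cells on ONE line (hypothetical input)
Local $sOneLine = '<td nowrap class="crn" onclick="show_title(2);">12345</td>' & _
        '<td nowrap class="crn" onclick="show_title(3);">A1234</td>'

; Greedy: .* and (.*) each grab as much as they can, so only one match is found
Local $aGreedy = StringRegExp($sOneLine, '<td nowrap class="crn" onclick=".*;">(.*)</td>', 3)
; Lazy: .*? and (.*?) stop at the first ;"> and </td>, so each cell matches separately
Local $aLazy = StringRegExp($sOneLine, '<td nowrap class="crn" onclick=".*?;">(.*?)</td>', 3)

ConsoleWrite("greedy: " & UBound($aGreedy) & " match(es), first = " & $aGreedy[0] & @CRLF)
ConsoleWrite("lazy:   " & UBound($aLazy) & " match(es), first = " & $aLazy[0] & @CRLF)
```

With this input the greedy pattern returns a single element, A1234, so the first cell is lost; the lazy pattern returns both 12345 and A1234.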
orange, posted October 18, 2006:

I'm not in a place to test this right now, but the RegExp is what I was going for. Thanks for the quick response.
orange, posted October 18, 2006:

[The code in this post did not survive archiving; the replies indicate that it passed the page through StringStripWS(..., 8) before calling StringRegExp.]

$result = 1 always. Any ideas?
MHz, posted October 19, 2006:

StringStripWS(..., 8) strips all whitespace out of the string, so it is no wonder the match fails: the pattern expects three spaces in the string.
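To see the difference, here is a small comparison of StringStripWS flags on a made-up line: flag 8 removes every whitespace character, including the spaces the pattern relies on, while flag 3 (1 + 2) only trims leading and trailing whitespace.

```autoit
Local $sLine = '  <td nowrap class="crn">12345</td>  '

; Flag 8: strip ALL whitespace -> the spaces between attributes disappear
Local $sAll = StringStripWS($sLine, 8)
; Flag 3 (1 + 2): strip only leading and trailing whitespace
Local $sEnds = StringStripWS($sLine, 3)

ConsoleWrite($sAll & @CRLF)  ; <tdnowrapclass="crn">12345</td>
ConsoleWrite($sEnds & @CRLF) ; <td nowrap class="crn">12345</td>
```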
orange, posted October 19, 2006:

Well, that was a problem. It's fixed now, but still nothing. The return value is still 1.
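The code from the earlier post did not survive, so this is only a guess: one way to get 1 every time is to call StringRegExp without its flag argument. The default flag 0 only reports whether the pattern matched (1 or 0); flag 3 returns the captured text as an array. A minimal comparison on a made-up line:

```autoit
Local $sHtml = '<td nowrap class="crn" onclick="show_title(2);">12345</td>'
Local $sPattern = '<td nowrap class="crn" onclick=".*;">(.*)</td>'

; Default flag 0: returns 1 (match found) or 0 (no match), never the captured text
Local $iFound = StringRegExp($sHtml, $sPattern)
; Flag 3: returns an array with the capture group of every match
Local $aCells = StringRegExp($sHtml, $sPattern, 3)

ConsoleWrite($iFound & @CRLF)    ; 1
ConsoleWrite($aCells[0] & @CRLF) ; 12345
```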
MHz, posted October 19, 2006:

Can you attach or PM the testpage.html and I'll look further into it?
MHz, posted October 19, 2006:

Thanks for sending "Copy_of_class.html" to me via PM. Your source remains intact if you read it with FileRead(), but I assume this HTML page may be on the net, so using _IEBodyReadHTML() is perhaps preferred. _IEBodyReadHTML() returns the HTML with the quotes stripped from attribute values and the attributes rearranged, so the original pattern rightly fails to match. Below is the modified pattern. I also noticed that the case of the characters was different, so I made the match case-insensitive with "(?i)".

#include <IE.au3>

; Open the page in a hidden IE instance (no attach, not visible)
$ie_object = _IECreate(@ScriptDir & "\Copy_of_class.html", 0, 0)
; Trim leading and trailing whitespace (flag 3 = 1 + 2) from the body HTML
$string = StringStripWS(_IEBodyReadHTML($ie_object), 3)
; Close the hidden IE instance once the HTML has been read
_IEQuit($ie_object)
; Actual line to catch: "<TD class=crn onclick=show_course(0); noWrap>47738</TD>"
$pattern = '(?i)<td class=crn onclick=.*; noWrap>(.*)</td>'
$result = StringRegExp($string, $pattern, 3)
If Not @error Then
    For $i = 0 To UBound($result) - 1
        MsgBox(0x40000, $i, $result[$i])
    Next
EndIf
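If the same script has to work on both the raw file (quoted attributes, nowrap first) and the HTML returned by _IEBodyReadHTML() (unquoted attributes, noWrap last), a looser pattern can ignore attribute order and quoting. This is a sketch, assuming the class crn is the only thing that identifies these cells; `_GetCrnCells` is a hypothetical helper name.

```autoit
; Hypothetical helper: matches <td ... class=crn ...> or <td ... class="crn" ...>
; in any attribute order and any letter case, capturing "12345" or "A1234".
Func _GetCrnCells($sHtml)
    ; [^>]*     skips other attributes without leaving the tag
    ; "?crn\b   accepts the class value with or without quotes
    Local $sPattern = '(?i)<td[^>]*\bclass="?crn\b[^>]*>(\d{5}|[a-z]\d{4})</td>'
    Local $aCells = StringRegExp($sHtml, $sPattern, 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aCells
EndFunc

; Both forms of a cell: the raw file and the IE-rewritten HTML
Local $sRaw = '<td nowrap class="crn" onclick="show_title(2);">12345</td>'
Local $sIE = '<TD class=crn onclick=show_course(0); noWrap>47738</TD>'
Local $aBoth = _GetCrnCells($sRaw & @CRLF & $sIE)
For $i = 0 To UBound($aBoth) - 1
    ConsoleWrite($aBoth[$i] & @CRLF)
Next
```

Keying on the class rather than the whole tag makes the pattern less strict than the one in the post, so it will also pick up any other crn cell whose content has the same shape.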
orange, posted October 19, 2006:

Thanks very much, it looks like everything is in order!