Trystian Posted November 10, 2005

For some reason I just can't seem to get StringRegExpReplace to work right. As far as I can tell, I'm following the help file correctly, but it still doesn't work. The regular expression below pegs my CPU at 100% when I run it. I'm using AutoIt (v3.1.1.87).

    $strInput = "[misc html code] 1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]"
    $strTitle = StringRegExpReplace($strInput,"\.\s*1-\s*2.+<a.+>(.*)</a>","\1")
    ConsoleWrite($strTitle)

Breakdown of the regular expression:

    \.    = Matches a period
    \s*   = Matches 0 or more whitespace characters
    1     = Matches the number 1
    -     = Matches a dash
    \s*   = Matches 0 or more whitespace characters
    2     = Matches the number 2
    .+    = Matches 1 or more characters
    <a    = Matches <a
    .+    = Matches 1 or more characters
    >     = Matches >
    (.*)  = Matches and captures 0 or more characters
    </a>  = Matches </a>

Sample source input: http://epguides.com/NCIS/

Once again, any help would be greatly appreciated.

Thank you in advance,
-Trystian
jpm Posted November 10, 2005

I hope Nutster is working on all these StringRegExp issues. He has had a bad time with his computer for a while. I hope he has recovered now.
Trystian (Author) Posted November 11, 2005

I'm just wondering if it's a problem with my regular expression, or the StringRegExpReplace function.
HighGuy Posted November 11, 2005

Quote (Trystian): "I'm just wondering if it's a problem with my regular expression, or the StringRegExpReplace function."

Well, first of all, you're using multiple " characters inside $strInput. Set the string to

    $strInput = '[misc html code]1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]'

Second, if I'm right that you want to extract the title from that string, why not use

    $strTitle = StringRegExpReplace($strInput,'.*">(.*)</a>.*',"\1")

Hope that helps.
Trystian (Author) Posted November 11, 2005

Sorry, I copied and pasted the $strInput sample out of a webpage without reformatting the quotes. I gave a bad example.

As for the regular expression, I am attempting to extract the "Title", but in order to do that, it first has to find the "1- 2" that precedes the <a href. The 1 and 2 would be variables taken from user input, so the regular expression string would probably look something like this:

    $intSeason = "1"
    $intEpisode = "2"
    $strTitle = StringRegExpReplace($strInput,"\.\s*" & $intSeason & "-" & $intEpisode & ".+<a.+>(.*)</a>","\1")

Of course it would look a little different, since this doesn't work.

-Trystian
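A minimal sketch of how such a pattern might be assembled from the two variables. It assumes a current AutoIt release, where StringRegExp is PCRE-based, rather than the 3.1.1.87 beta discussed in this thread, so it says nothing about why the beta hangs. The variable names ($sInput, $iSeason, $iEpisode, $sPattern, $aMatch) are illustrative, and StringRegExp is used instead of StringRegExpReplace so that only the captured title comes back.

```autoit
; Sketch only: assumes a current AutoIt release (PCRE-based StringRegExp),
; not the 3.1.1.87 beta discussed in this thread.
Local $sInput = '[misc html code] 1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]'
Local $iSeason = 1
Local $iEpisode = 2

; "\s*" after the dash allows for the padding space in "1- 2".
; "\b" keeps episode 2 from also matching "1-20".
; "[^>]*" stays inside the opening <a ...> tag, and "(.*?)" stops at the first </a>.
Local $sPattern = '\.\s*' & $iSeason & '-\s*' & $iEpisode & '\b.+?<a[^>]*>(.*?)</a>'

; Flag 1 returns an array holding the captured groups of the first match.
Local $aMatch = StringRegExp($sInput, $sPattern, 1)
If @error Then
    ConsoleWrite("[Not Found]" & @CRLF)
Else
    ConsoleWrite($aMatch[0] & @CRLF)
EndIf
```

With the sample string above, this should write "Title" to the console. Single-quoted AutoIt strings are used so that the double quotes in the HTML need no doubling.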
jefhal Posted November 11, 2005

Quote (Trystian): "For some reason I just can't seem to get StringRegExpReplace to work right."

Why not just use:

    $strInput = '[misc html code] 1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]"'
    $right = StringTrimLeft($strInput,StringInStr($strInput,'.html">')+6)
    $TITLE = StringLeft($right,StringInStr($right,"</a>")-1)
    MsgBox(64,"Here's the title:",$TITLE)

It returns just the word "Title" from your input string.

...by the way, it's pronounced: "JIF"... Bob Berry --- inventor of the GIF format
Trystian (Author) Posted November 12, 2005

Quote (jefhal): "Why not just use:

    $strInput = '[misc html code] 1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]"'
    $right = StringTrimLeft($strInput,StringInStr($strInput,'.html">')+6)
    $TITLE = StringLeft($right,StringInStr($right,"</a>")-1)
    MsgBox(64,"Here's the title:",$TITLE)

It returns just the word "Title" from your input string."

This is good, but I need it to get the title given a certain substring, i.e. "1- 2", somewhere prior to the "<a ...". And I was really hoping to be able to do it with the StringRegExp or StringRegExpReplace functionality. Thank you, though, for this alternative. I'll hold on to it in case there are no RegEx solutions.

-Trystian
jefhal Posted November 12, 2005

Quote (Trystian): "Thank you though for this alternative. I'll hold on to this just in case there are no RegEx solutions. -Trystian"

You bet! You are quite welcome...
PaulDG Posted November 12, 2005

Maybe the problem you are having is that .+ and .* are greedy: they keep matching until they can't match any more. I think "<a.+>" will match all the way to the final >, even if there are multiple ">" characters in the string.

Also, < might be being read as a control character. Try \<

Edited November 12, 2005 by PaulGX
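The greedy versus lazy difference described above can be sketched as follows. This assumes a current AutoIt release with the PCRE-based StringRegExp, not the 3.1.1.87 beta, and $sLine is a made-up input with two links on one line. In PCRE-style engines "<" is an ordinary character, so the sketch leaves it unescaped; whether the beta engine treated it specially is not something this example can show.

```autoit
; Sketch only: assumes a current AutoIt release (PCRE-based StringRegExp).
; $sLine is a made-up input with two links on one line.
Local $sLine = '<a href="first.html">First</a> and <a href="second.html">Second</a>'

; Greedy: ".+" first runs to the end of the line, then backs up to the last ">"
; that still leaves room for "(.*)</a>", so the capture comes from the second link.
Local $aGreedy = StringRegExp($sLine, '<a.+>(.*)</a>', 1)
If Not @error Then ConsoleWrite("greedy: " & $aGreedy[0] & @CRLF)

; Lazy: ".+?" and ".*?" stop at the earliest possible point, so the capture
; comes from the first link.
Local $aLazy = StringRegExp($sLine, '<a.+?>(.*?)</a>', 1)
If Not @error Then ConsoleWrite("lazy: " & $aLazy[0] & @CRLF)
```

Under those assumptions the greedy pattern should report "Second" and the lazy one "First". A negated class such as "<a[^>]*>" is another common way to keep the match inside a single tag without relying on lazy quantifiers.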
Trystian (Author) Posted November 12, 2005

I also tried putting "?" after the repeating matches to make them find the smallest match, but the script just goes into a perpetual loop when I call StringRegExpReplace.

    $strOutput = StringRegExpReplace($strInput,"1-\s*2.+?\<a.+?>(.*)\</A>","\1")

I've tried a lot of different permutations, but apparently just not the RIGHT one. =) I think I'm going to give up on RegEx for now. I've spent 3 days on this issue and am getting nowhere, so it's back to the old string manipulation (StringInStr, StringMid, StringSplit, etc.).

Thank you all for your efforts,
-Trystian

PS: I'll post my workaround here when I finish it.

Edited November 12, 2005 by TrystianSky
Trystian (Author) Posted November 13, 2005

Ok, finally finished with my alternative (NON-StringRegExpReplace) solution. I've also included my own type of '_INetGetSource' function so this actually works as is. (Yes, I love reinventing the wheel. I like them square, makes for a more interesting ride.) So here it is:

    Opt("TCPTimeout",10000)
    Dim $strIP,$strHeader,$strSocketID,$intSe,$intEp,$strData,$strTitle
    Dim $strServer,$intPort,$strMethod,$strURI

    $strServer = "epguides.com"; Server Name
    $intPort = 80 ; Port Number
    $strMethod = "GET";Request Method (GET,POST,Etc.)
    $strURI = "andromeda/"; Path to Target Destination
    $intSe = "4"
    $intEp = "1"
    $strSE = $intSe & "-" & StringFormat("%2s",String($intEp))

    $strData = fcnGetWebData($strServer,$intPort,$strURI,$strMethod)
    $strTitle = fcnGetShowName()
    ConsoleWrite($strURI & ": S" & $intSe & "E" & $intEp & " - " & $strTitle & @CRLF)
    ConsoleWrite("URL: http://www." & $strServer & "/" & $strURI & @CRLF)

    Func fcnGetShowName()
        Dim $intRow,$intPointer,$intPointer2
        $intPointer = fcnSearchTarget($strData,$strSE,0,1)
        if @error = 0 Then
            $intPointer = fcnSearchTarget($strData,">",$intPointer,0)
            if @error = 0 Then
                $intPointer2 = fcnSearchTarget($strData,"</a>",$intPointer,0)-1
                if @error = 0 Then
                    $strTitle = StringMid($strData,$intPointer,$intPointer2-$intPointer)
                    Return $strTitle
                endif
            endif
        endif
        Return "[Not Found]"
    EndFunc

    Func fcnSearchTarget($strString,$strTarget,$intPointerIn,$bitAfter)
        Dim $intPointerTemp,$intPointerOut
        $intPointerTemp = StringInStr(StringMid($strString,$intPointerIn),$strTarget,0)
        if $intPointerTemp > 0 then
            $intPointerOut = $intPointerTemp
            $intPointerOut = $intPointerOut + $intPointerIn; Adds given optional offset to Pointer location
            if $bitAfter = 1 then
                $intPointerOut = $intPointerOut + StringLen($strTarget) + 1; Sets Pointer location AFTER Target string
            endif
        else
            ;Target not found
            SetError(1)
        endif
        Return $intPointerOut
    EndFunc

    Func fcnGetWebData($strServer,$intPort,$strURI,$strMethod)
        ; GetWebData v0.1b Coded by Trystian Sky (trystiansky.[at].gmail.[d0t].com)
        ; 15 September 2005
        ; This function is used to get raw data from a web source (www),
        ; and return it as a string for later processing.
        ; Parameters:
        ;   1 = Server address (www.somewhere.com)
        ;   2 = Port number (80)
        ;   3 = URI/Path (directory/file.htm)
        ;   4 = Method (GET,POST)
        ; @error codes:
        ;   1 = No Data/Bad Response
        ;   2 = Client error/Not found (4xx)
        ;   3 = Internal Server error (5xx)
        ;   4 = Unknown error
        ; This is still in beta, so it doesn't handle failures well YET.
        ; This code is provided to you "AS IS" without warranty of any kind,
        ; either expressed or implied. Trystian Sky assumes no responsibility of the
        ; functionality or use of this software.
        ; Please give me credit if you use my code. Thanks.
        Dim $strIP,$intSocketID,$strHeader,$strData,$strDataChunk,$intTemp
        TCPStartup()
        $strIP = TCPNameToIP($strServer)
        $intSocketID = TCPConnect($strIP,$intPort)
        $strHeader = StringUpper($strMethod) & " /" & $strURI & " HTTP/1.1" & @CRLF & _
            "Host: " & $strServer & @CRLF & _
            "Connection: close" & @CRLF & _ ; close, keep-alive
            "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8" & @CRLF & _
            "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" & @CRLF & @CRLF; User-Agent string: IE 6 on XP
        TCPSend($intSocketID,$strHeader)
        Sleep(100)
        for $intTemp = 1 to 2000; Increase this number for large files
            $strDataChunk = TCPRecv($intSocketID,1024)
            $strData = $strData & $strDataChunk
            if StringInStr($strDataChunk,"</html>",0) <> 0 then; if it finds an </html> tag, it stops retrieving data
                ExitLoop
            endif
        Next
        TCPCloseSocket($intSocketID)
        TCPShutdown()
        $intStatus = int(StringMid($strData,10,3))
        Switch $intStatus
            Case 200 ;Page Found
                Return $strData
            Case 0 ;No Data or Bad Response
                SetError(1)
            Case 400 To 410 ;Client Error 4xx
                SetError(2)
            Case 500 To 505 ;Internal Server Error 5xx
                SetError(3)
            Case Else ; Unknown
                SetError(4)
        EndSwitch
    EndFunc

Edited November 13, 2005 by TrystianSky
jamesband Posted November 16, 2005

Where:

    $html = ' 6. 1- 6 30 Apr 05 <a href="http://www.tv.com/dalek/episode/407897/summary.html">Dalek</a>'
    $ret = stringregexp($html,'<a.*?summary.html">(.*?)</a>', 3)
    if (ubound($ret) > 0) then
        $epname = $ret[0]
    else
        $epname = "[no name]"
    endif

Give this a try... I think it will work.
jamesband Posted November 16, 2005

Quote (Trystian): "Ok, finally finished with my alternative (NON-StringRegExpReplace) solution. I've also included my own type of '_INetGetSource' function so this actually works as is. (Yes, I love reinventing the wheel. I like them square, makes for a more interesting ride.)"

Another thought... StringRegExp can pull all the show titles into an array. Just download the page, pull it into a variable and then use StringRegExp on it.
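A rough sketch of that idea, assuming a current AutoIt release where StringRegExp with flag 3 returns every captured group from all matches in one array. The two-line $sPage string is invented sample data standing in for a downloaded page (the example.com URLs and titles are placeholders), and the pattern is one guess at the row layout, not a tested parser for the real site.

```autoit
; Sketch only: assumes a current AutoIt release (PCRE-based StringRegExp).
; $sPage is hypothetical sample data; a real page would come from a download step.
Local $sPage = ' 1. 1- 1 100 23 Sep 03 <a href="http://example.com/a.html">First Title</a>' & @CRLF & _
               ' 2. 1- 2 101 30 Sep 03 <a href="http://example.com/b.html">Second Title</a>'

; Row number, season-episode, then the first link on the row; only the link text is captured.
Local $sPattern = '\d+\.\s*\d+-\s*\d+.*?<a[^>]*>(.*?)</a>'

; Flag 3 returns an array of all captures across the whole string.
Local $aTitles = StringRegExp($sPage, $sPattern, 3)
If @error Then
    ConsoleWrite("[no titles found]" & @CRLF)
Else
    For $i = 0 To UBound($aTitles) - 1
        ConsoleWrite($aTitles[$i] & @CRLF)
    Next
EndIf
```

With the sample data above this should list "First Title" and "Second Title", one per line. Capturing the season and episode numbers in extra groups would make it possible to look up a single row afterwards, at the cost of a flatter, interleaved result array.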