
New StringRegExpReplace issue


Trystian

For some reason I just can't seem to get StringRegExpReplace to work right. As far as I can tell, I'm following the help file correctly, but it still doesn't work. The regular expression below pegs my CPU at 100% when I run it. I'm using AutoIt (v3.1.1.87).

$strInput = "[misc html code]  1.   1- 2       101    1 Apr 00   <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]"

$strTitle = StringRegExpReplace($strInput,"\.\s*1-\s*2.+<a.+>(.*)</a>","\1")

ConsoleWrite($strTitle)

Breakdown of the Regular Expression:

\. = Matches a period

\s* = Matches 0 or more whitespaces

1 = Matches the number 1

- = Matches a dash

\s* = Matches 0 or more whitespaces

2 = Matches the number 2

.+ = Matches 1 or more characters

<a = Matches <a

.+ = Matches 1 or more characters

> = Matches >

(.*) = Matches and captures 0 or more characters

</a> = Matches </a>

Sample source input code @ http://epguides.com/NCIS/

Once again, any help would be greatly appreciated.

Thank you in advance,

-Trystian


I'm just wondering if it's a problem with my regular expression, or the StringRegExpReplace function. B)

Well, first of all, you're using multiple " in $strInput. Set the string to

$strInput = '[misc html code]1. 1- 2 101 1 Apr 00 <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]'

Second, if I'm right that you want to extract the title from that string, why not use

$strTitle = StringRegExpReplace($strInput,'.*">(.*)</a>.*',"\1")

Hope that helps.
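
A minimal sketch of why the leading and trailing .* matter, assuming a current AutoIt release with the PCRE-based engine (the 3.1.1.x beta discussed in this thread may behave differently). StringRegExpReplace only replaces the part of the string that the pattern matched, so anything outside the match comes back untouched. $sInput is a made-up sample, not the real page.

```autoit
; Hypothetical sample string, shortened from the one in the question
Local $sInput = 'before <a href="x.html">Title</a> after'

; The pattern covers only the link, so "before " and " after" stay in the result
ConsoleWrite(StringRegExpReplace($sInput, '<a.+?>(.*?)</a>', '\1') & @CRLF) ; before Title after

; The pattern spans the whole string, so only the captured group is left
ConsoleWrite(StringRegExpReplace($sInput, '.*">(.*)</a>.*', '\1') & @CRLF) ; Title
```

If only the captured text is wanted, StringRegExp with flag 1 returns it in an array and avoids having to match the surrounding text at all.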

Sorry, I copied and pasted the $input sample out of a webpage without reformatting the quotes. I gave a bad example.

As for the regular expression, I am attempting to extract the "Title", but in order to do that, it first has to find the 1- 2 that precedes the <a href. The 1 and 2 would be variables grabbed from user input, so the regular expression string would probably look something like this:

$intSeason = "1"

$intEpisode = "2"

$strTitle = StringRegExpReplace($strInput,"\.\s*" & $intSeason & "-\s*" & $intEpisode & ".+<a.+>(.*)</a>","\1")

Of course it would look a little different, since this doesn't work. B)

-Trystian


Why not just use:

$strInput = '[misc html code]  1.   1- 2       101    1 Apr 00   <a target="visit" href="http://www.someplace.com/Titleinfo.html">Title</a>[more misc html code]'
$right = StringTrimLeft($strInput,StringInStr($strInput,'.html">')+6)
$TITLE = StringLeft($right,StringInStr($right,"</a>")-1)
MsgBox(64,"Here's the title:",$TITLE)

It returns just the word "Title" from your inputstring. B)

...by the way, it's pronounced: "JIF"... Bob Berry --- inventor of the GIF format

This is good, but I need it to get the title given a certain substring, i.e. "1- 2", somewhere before the "<a ...". And I was really hoping to be able to do it with the StringRegExp or StringRegExpReplace functionality. Thank you though for this alternative. I'll hold on to this just in case there are no RegEx solutions.

-Trystian


You bet! You are quite welcome... B)

Maybe the problem is that .+ and .* are greedy: they keep matching until they can't match any more.

I think "<a.+>" will match all the way to the final > on the line, even if there are multiple ">".

Also, < might be being read as a control character; try \<
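
A small sketch of the greedy/lazy difference, assuming a current AutoIt release with the PCRE-based engine rather than the 3.1.1.x beta from this thread. $sHtml is a made-up line with two links so the effect is visible; StringRegExp with flag 1 returns the captured group of the first match.

```autoit
; Hypothetical one-line sample containing two links
Local $sHtml = '1- 1 <a href="a.html">First</a> 1- 2 <a href="b.html">Second</a>'

; Greedy: .+ runs to the end of the line and backtracks to the LAST usable ">"
Local $aGreedy = StringRegExp($sHtml, '<a.+>(.*)</a>', 1)
If IsArray($aGreedy) Then ConsoleWrite("Greedy: " & $aGreedy[0] & @CRLF) ; Greedy: Second

; Lazy: .+? and .*? stop at the FIRST ">" and the first "</a>"
Local $aLazy = StringRegExp($sHtml, '<a.+?>(.*?)</a>', 1)
If IsArray($aLazy) Then ConsoleWrite("Lazy: " & $aLazy[0] & @CRLF) ; Lazy: First
```

A negated class such as [^>]* is another way to stop at the first ">" without relying on lazy matching.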

Edited by PaulGX

I also tried using the "?" after the repeating matches to make it find the smallest match, but the script just goes into a perpetual loop when I call the StringRegExpReplace.

$strOutput = StringRegExpReplace($strInput,"1-\s*2.+?\<a.+?>(.*)\</A>","\1")

I've tried a lot of different permutations, but apparently just not the RIGHT one. =) I think I'm going to give up on RegEx for now. I've spent 3 days on this issue and am getting nowhere. So it's back to the old string manipulation (StringInStr, StringMid, StringSplit, etc.).

Thank you all for your efforts,

-Trystian

PS: I'll post my workaround here when I finish it.

Edited by TrystianSky

Ok, finally finished with my alternative (NON-StringRegExpReplace) solution.

I've also included my own type of '_INetGetSource' function so this actually works as is. (Yes, I love reinventing the wheel. I like them square, makes for a more interesting ride.) B)

So here it is:

Opt("TCPTimeout",10000)
  
  Dim $strIP,$strHeader,$strSocketID,$intSe,$intEp,$strData,$strTitle
  Dim $strServer,$intPort,$strMethod,$strURI
  
  $strServer = "epguides.com" ; Server name
  $intPort = 80 ; Port number
  $strMethod = "GET" ; Request method (GET, POST, etc.)
  $strURI = "andromeda/" ; Path to target destination
  
  $intSe = "4"
  $intEp = "1"
  
  $strSE = $intSe & "-" & StringFormat("%2s",String($intEp))
  
  $strData = fcnGetWebData($strServer,$intPort,$strURI,$strMethod)
  
  $strTitle = fcnGetShowName()
  
  ConsoleWrite($strURI & ": S" & $intSe & "E" & $intEp & " - " & $strTitle & @CRLF)
  ConsoleWrite("URL: http://www." & $strServer & "/" & $strURI & @CRLF)
 
  Func fcnGetShowName()
      Dim $intRow,$intPointer,$intPointer2
      $intPointer = fcnSearchTarget($strData,$strSE,0,1)
      if @error = 0 Then
          $intPointer = fcnSearchTarget($strData,">",$intPointer,0)
          if @error = 0 Then
              $intPointer2 = fcnSearchTarget($strData,"</a>",$intPointer,0)-1
              if @error = 0 Then
                  $strTitle = StringMid($strData,$intPointer,$intPointer2-$intPointer)
                  Return $strTitle
              endif
          endif
      endif
      Return "[Not Found]"
  EndFunc
      
  Func fcnSearchTarget($strString,$strTarget,$intPointerIn,$bitAfter)
      Dim $intPointerTemp,$intPointerOut
      $intPointerTemp = StringInStr(StringMid($strString,$intPointerIn),$strTarget,0)
      if $intPointerTemp > 0 then
          $intPointerOut = $intPointerTemp
          $intPointerOut = $intPointerOut + $intPointerIn ; Adds given optional offset to pointer location
          if $bitAfter = 1 then
              $intPointerOut = $intPointerOut + StringLen($strTarget) + 1 ; Sets pointer location AFTER target string
          endif
      else
          ; Target not found
          SetError(1)
      endif
      Return $intPointerOut
  EndFunc
  
  Func fcnGetWebData($strServer,$intPort,$strURI,$strMethod)
    ; GetWebData v0.1b Coded by Trystian Sky (trystiansky.[at].gmail.[d0t].com)
    ; 15 September 2005
    ; This function is used to get raw data from a web source (www),
    ; and return it as a string for later processing.
    ; Parameters:
    ;   1 = Server address (www.somewhere.com)
    ;   2 = Port number (80)
    ;   3 = URI/Path (directory/file.htm)
    ;   4 = Method (GET,POST)
    ; @error codes:
    ;   1 = No Data/Bad Response
    ;   2 = Client error/Not found (4xx)
    ;   3 = Internal Server error (5xx)
    ;   4 = Unknown error
    ; This is still in beta, so it doesn't handle failures well YET. 
    ; This code is provided to you "AS IS" without warranty of any kind,
    ; either expressed or implied. Trystian Sky assumes no responsibility of the
    ; functionality or use of this software.
    ; Please give me credit if you use my code. Thanks.
      Dim $strIP,$intSocketID,$strHeader,$strData,$strDataChunk,$intTemp
      TCPStartup()
      $strIP = TCPNameToIP($strServer)
      $intSocketID = TCPConnect($strIP,$intPort)
      $strHeader = StringUpper($strMethod) & " /" & $strURI & " HTTP/1.1" & @CRLF & _
          "Host: " & $strServer & @CRLF & _
          "Connection: close" & @CRLF & _ ; close, keep-alive
          "Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8" & @CRLF & _
          "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" & @CRLF & @CRLF ; User-Agent string: IE 6 on XP
      TCPSend($intSocketID,$strHeader)
      Sleep(100)
      for $intTemp = 1 to 2000; Increase this number for large files
          $strDataChunk = TCPRecv($intSocketID,1024)
          $strData = $strData & $strDataChunk
          if StringInStr($strDataChunk,"</html>",0) <> 0 then ; Stop retrieving data once an </html> tag is found
              ExitLoop
          endif
      Next
      TCPCloseSocket($intSocketID)
      TCPShutdown()
      $intStatus = int(StringMid($strData,10,3))
      Switch $intStatus
          Case 200
            ;Page Found
              Return $strData
          Case 0
            ;No Data or Bad Response
              SetError(1)
          Case 400 To 410
            ;Client Error 4xx
              SetError(2)
          Case 500 To 505
            ;Internal Server Error 5xx
              SetError(3)
          Case Else
            ; Unknown
              SetError(4)
      EndSwitch
  EndFunc
Edited by TrystianSky

where:

 $html = '  6.   1- 6               30 Apr 05   <a href="http://www.tv.com/dalek/episode/407897/summary.html">Dalek</a>'

$ret = StringRegExp($html,'<a.*?summary.html">(.*?)</a>', 3)
If UBound($ret) > 0 Then
    $epname = $ret[0]
Else
    $epname = "[no name]"
EndIf

Give this a try.... I think it will work.
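
To tie the match to a particular season and episode, the pattern could be built from variables along these lines. This is only a sketch, assuming a current AutoIt release with the PCRE-based engine; $iSeason, $iEpisode and $sPattern are illustrative names, and $sHtml is the sample line from this post.

```autoit
Local $sHtml = '  6.   1- 6               30 Apr 05   <a href="http://www.tv.com/dalek/episode/407897/summary.html">Dalek</a>'
Local $iSeason = 1, $iEpisode = 6 ; stand-ins for user input

; \b keeps "1- 6" from matching inside something like "11- 6" or "1- 60".
; [^<\r\n]* and [^>]* cannot run past the next tag or the end of the line,
; so the engine has far less backtracking to do than with .+
Local $sPattern = '\b' & $iSeason & '-\s*' & $iEpisode & '\b[^<\r\n]*<a[^>]*>([^<]*)</a>'

Local $sTitle = "[no name]"
Local $aMatch = StringRegExp($sHtml, $sPattern, 1) ; flag 1: captures of the first match
If Not @error Then $sTitle = $aMatch[0]
ConsoleWrite($sTitle & @CRLF) ; Dalek
```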


Another thought... StringRegExp can pull all the show titles into an array. Just download the page, pull it into a variable and then use StringRegExp on it.
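
Roughly what that could look like, as a sketch only: it assumes a current AutoIt release with the PCRE-based engine, and $sPage holds two made-up lines in the layout of the sample from the first post (the URLs and titles are invented). With flag 3, StringRegExp returns every capture from every match in one flat array.

```autoit
; Two hypothetical episode lines standing in for the downloaded page
Local $sPage = '  1.   1- 1       101   24 Mar 00   <a href="http://www.someplace.com/one.html">First Title</a>' & @CRLF & _
        '  2.   1- 2       102    1 Apr 00   <a href="http://www.someplace.com/two.html">Second Title</a>'

; Captures per match: season, episode, title
Local $aHits = StringRegExp($sPage, '(\d+)-\s*(\d+)[^<\r\n]*<a[^>]*>([^<]*)</a>', 3)
If @error Then
    ConsoleWrite("No episodes found" & @CRLF)
Else
    ; The array is flat, so step through it three entries at a time
    For $i = 0 To UBound($aHits) - 3 Step 3
        ConsoleWrite("S" & $aHits[$i] & "E" & $aHits[$i + 1] & " - " & $aHits[$i + 2] & @CRLF)
    Next
EndIf
```

For the two sample lines this should print "S1E1 - First Title" and "S1E2 - Second Title"; picking one episode is then a matter of comparing the first two entries of each group of three.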
