drever44

StringRegExp in a loop

24 posts in this topic

I am fairly new to AutoIt and have been reading up on it for several weeks now. Some time ago I created a neat little script using Aldo's Macro Recorder, and it all worked rather nicely.

So I am trying to convert it over to AutoIt for many reasons. More functionality and the ability to compile my script into an exe file are just a few.

I have read all that I can get my hands on when it comes to StringRegExp for AutoIt and I must say I am still in the dark. I have spent several days playing around with it and can get it to do what I want most of the time, but I still don't seem to quite have it.

I have a network server with a ton of HTML pages on it that I have created. I want to move things around and reorganize them, and of course this breaks the links. I am able to get the script to go through the list of files one at a time, but I can't get it to find links and fix them. Well, sort of: it works the first time, but if I try to loop it, the second time around things get weird.

So I am using a page from IMDb as an example, which has several links in it similar to what I am trying to do on my server.

I want to find all similar instances of this link and change it to a different link:

From: <a href="/name/nm0000001/">Fred Astaire</a>

To: <a href="Fred Astaire.html">Fred Astaire</a>

Now that sounds simple, I know, but here's the catch: the nm numbers are all different and the names are as well. Each number is assigned to a name, which is the reason for the loop.

Here is what I have written so far:

CODE
Do
    $Tfile = "temp.txt"
    $Tread = FileRead($Tfile)
    $glnum = stringregexp($Tread, '<a href="/name/*.*/">', 1, 0)
    $glnum = StringTrimLeft($glnum[0], 15)
    $glnum = StringTrimRight($glnum, 3)
    MsgBox(0, 'Link Number Result', $glnum, 1)
    $glname = StringRegExp($Tread, '<a href="/name/'& $glnum &'/">*.*</a>_', 1, 0)
    $glname = StringTrimLeft($glname[0], 3)
    $glname = StringTrimRight($glname, 4)
    MsgBox(0, 'Link Name Result', $glname, 1)
    _ReplaceStringInFile("temp.txt", '<a href="/name/'& $glnum &'/">'& $glname &'</a>', '<a href="'& $glname &'.html/">'& $glname &'</a>',0 ,1)
until $glnum = ""

Any insight as to what I am doing wrong here would be much appreciated, for I am truly here as a last-ditch effort. I can't find the answers I am looking for in other threads; some are close and I have tried them, but with no luck.

Thanks for your time.

If you upload 'temp.txt', I will test and modify it later.

I am unable to upload for some reason so this will retrieve the file that I am using and rename it temp.txt

CODE

Thanks

Do you want this?

For $i = +1 to +3 Step +1
    $Tindex = "nm" & StringRight( "000000" & $i , 7 )
    $Tfile = "C:\TEMP\" & $Tindex & ".html"

    InetGet( "http://www.imdb.com/name/" & $Tindex & "/bio" , $Tfile , 1 )

    MsgBox( 0 , $Tindex , "Saved " & FileGetSize( $Tfile ) & " bytes to " & $Tfile , 1 )
Next

Well, that is part of it. I am trying to find and change all links like this in the entire page:

<a href="/name/nm0000001/">Fred Astaire</a>

And change them to this:

<a href="Fred Astaire.html">Fred Astaire</a>

That is why I was looping until it found no more links. I was first getting the nm number and assigning it to $glnum, then getting the name associated with that number and assigning it to $glname, so that I could find the string that contained both number and name and replace it with my link, <a href="Fred Astaire.html">Fred Astaire</a>. In short, I am trying to replace <a href="/name/nm0000001/">Fred Astaire</a> with <a href="Fred Astaire.html">Fred Astaire</a>, and likewise for all other similar links. Only the number and name change from link to link, and the number of links varies from page to page, so it seemed a little tricky to me. Perhaps I was overthinking it.

Edited by drever44

Try this, friend :-)

#Include <Array.au3>

;====================================================================================================================
; AutoIt3 Forum, GoodMan
; E-Mail to ChangMin,Yang<year1969@naver.com> Republic of Korea
; Reply to http://www.autoitscript.com/forum/index.php?s=&showtopic=85645&view=findpost&p=614352
;====================================================================================================================

Local $Tindex = ""
Local $Tpath = "C:\TEMP\"
Local $Tfile = ""
Local $Turl = ""
Local $Tread = ""
Local $Tname = ""
Local $Tsave = ""

Local $TflagSpaceToDash = 0; [0]=Save name with SPACE , [1]=Save name without SPACE (replaced to under-dash)

Local $TworkBeg = 1; Save from ...
Local $TworkEnd = 3; Save to ...

For $i = $TworkBeg to $TworkEnd Step +1
    $Tindex = "nm" & StringRight( "000000" & $i , 7 )
    $Tfile = $Tpath & $Tindex & ".html"

    $Turl = "http://www.imdb.com/name/" & $Tindex & "/bio"

    InetGet( $Turl , $Tfile , 1 )

    If @Error Then
        MsgBox( 0 , $Tindex , "INet Get Error at " & $Turl , 1 )

        If $i = 1 Then
            ExitLoop
        EndIf
    Else
        $Tread = FileRead( $Tfile )

        $Tname = StringRegExp( $Tread , '(<meta name="title" content=")([[:ascii:]]+)([ ]+\-[ ]+Biography">)' , 1 , 1 )

        If IsArray( $Tname ) = 1 AND StringLen( $Tname[1] ) >= 1 Then
            If $TflagSpaceToDash Then
                $Tsave = $Tpath & StringReplace( $Tname[1] , " " , "_" , 0 , 2 ) & ".html"
            Else
                $Tsave = $Tpath & $Tname[1] & ".html"
            EndIf

            If FileExists( $Tsave ) = 1 Then
                FileDelete( $Tsave )
            EndIf

            $Tread = StringRegExpReplace( $Tread , "(<iframe [^>]+>)(<[/]iframe>|)", "" )

            FileWrite( $Tsave , '<BASE href="http://www.imdb.com/">' & @LF & $Tread )

            MsgBox( 0 , $Tindex & " => " & $Tname[1] , "Saved SRC: " & FileGetSize( $Tfile ) & " bytes" & @CRLF & "Saved DST: " & FileGetSize( $Tsave ) & " bytes " & $Tsave , 1 )

            If FileExists( $Tfile ) = 1 Then
                FileDelete( $Tfile )
            EndIf
        EndIf
    EndIf
Next

From: <a href="/name/nm0000001/">Fred Astaire</a>

To: <a href="Fred Astaire.html">Fred Astaire</a>

#include <Inet.au3>
$sURL = _InetGetSource ("whatever page");; This could also be a FileRead() if they are local files.
$aRegExp = StringRegExp($sURL, "<a\s.+\s?=.?/.+</a>", 3);; Get the links into an array.
If NOT @Error Then
 For $i = 0 To Ubound($aRegExp) -1
;;   Now we will replace them
   $sURL = StringRegExpReplace($sURL, "(?i)<a\s.+\s?=.?/.+>(.+)</a>", '<a href="$1\.html">$1</a>')
 Next
EndIf

Edit: Removed a capturing group that it didn't need and changed the back-reference number.
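
For comparison, here is a tighter variant, offered only as a sketch. StringRegExpReplace already replaces every match in one call, so the For loop is not strictly required, and a greedy ".+" can run across several links that sit on the same line. The version below assumes every link has exactly the form <a href="/name/nm<digits>/">Name</a> with double-quoted attributes, as in the example; _NameLinksToHtml is a hypothetical helper name, not something from the code above.

```autoit
; Sketch: rewrite <a href="/name/nm0000001/">Fred Astaire</a>
; into    <a href="Fred Astaire.html">Fred Astaire</a>
; _NameLinksToHtml is a hypothetical name; the pattern assumes double-quoted hrefs.
Func _NameLinksToHtml($sHtml)
    ; \d+ matches the nm number; ([^<]+) captures the link text up to the next "<"
    Local $sPattern = '(?i)<a\s+href="/name/nm\d+/">([^<]+)</a>'
    ; With the default count of 0, StringRegExpReplace replaces all matches
    Return StringRegExpReplace($sHtml, $sPattern, '<a href="$1.html">$1</a>')
EndFunc

Local $sTest = '<a href="/name/nm0000001/">Fred Astaire</a>, <a href="/name/nm0000002/">Lauren Bacall</a>'
ConsoleWrite(_NameLinksToHtml($sTest) & @CRLF)
```

Because [^<]+ cannot cross a "<", each match stays inside a single link even when several links share a line.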

Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

Wow, from the looks of this I was not even close.

It must be the downfall of knowing too many different programming languages or something, for I thought I could pick this up rather easily. After much reading and digging around, the list of terms and commands is quite overwhelming, but I will definitely continue to absorb this a little at a time. I will be able to pick this apart tonight and see if I can get it to work with my server. Thanks.

Oh, and one more thing: am I understanding this correctly? At the top of the script there is a list of local strings. Are these to be filled in with the defaults, or are you just clearing them in preparation for the rest of the script?

Many thanks.

Edited by drever44

He is just declaring the variables for the rest of the script.

Do you already have an array of files? Are there local copies of the files?

If it's only from the server then just put the code I gave you inside another loop. Shorter and faster. I can give you another example for that. If they are local then it's still shorter to use the RegExp and I would use it as a separate function in that case.

EDIT: BTW, don't forget that if you have images etc. where the HTML code contains something like src="/images/image1.jpg", then you will also have to update all of those. Most of what you want could be done in a single function.

Edited by GEOSoft

George


And thanks to you, GEOSoft; I will also use your example. It really helps me if I have something to work from or an example to learn from. Some of the terminology in the help definitions is overwhelming. I see that I will be up late tonight going over all this.

Many Thanks

No problem but make sure you check the edit on my last post.

George

Well, I am trying to accomplish several things here. I am an avid movie lover and I have created a large storage array for my store-bought movies (over 14 terabytes), and I have also created several databases with movie info, actors and so on. I want to cross-reference them so that if I click on an actor I get a list of the movies I have that the actor is in. Then I can click on a movie to get its info and even watch it from any computer in my house, or send it to my projector room for a family viewing.

Most of my files are local, and I will also be gathering some info such as actor biographies and movie plots. So my dataset is local and the data is not, if that makes sense. After all is done, all the info will be local on my server.

And I would love to look at any examples that you would like to show me. The more the better. This is the best way for me to learn; I know that there is more than one way to skin a cat.

Many Thanks.


If they are local then you don't need to use _InetGetSource()

Assuming that you already have an array of the files including the path, we will call that array $aFiles.

For $i = 1 To Ubound($aFiles) -1;;  If the array is 0 based then change "$i = 1" to  $i= 0
   $sNewCode = _ModLinks($aFiles[$i])
   If NOT @Error Then
      $oFile = FileOpen($aFiles[$i], 2)
      FileWrite($oFile, $sNewCode)
      FileClose($oFile)
   EndIf
Next

Func _ModLinks($sStr)
If FileExists($sStr) Then $sStr = FileRead($sStr)
$aRegExp = StringRegExp($sStr, "<a\s.+\s?=.?/.+</a>", 3);; Get the links into an array.
If NOT @Error Then
   For $i = 0 To Ubound($aRegExp) -1
     ;;   Now we will replace them
      $sStr = StringRegExpReplace($sStr, "(?i)<a\s.+\s?=.?/.+>(.+)</a>", '<a href="$1\.html">$1</a>')
   Next
   Return $sStr
EndIf
   Return SetError(1);; The array could not be created so set @Error to 1
EndFunc
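
If you do not have $aFiles yet, one possible way to build it is sketched below. It assumes all the pages sit in a single folder and uses _FileListToArray from File.au3, which returns the count in element [0] and the bare file names from element [1] on, so the outer loop above can keep "$i = 1". _ListHtmlFiles and the folder C:\html are hypothetical names used only for illustration.

```autoit
#include <File.au3>

; Sketch: build a 1-based array of full paths to the .html files in one folder.
; _ListHtmlFiles is a hypothetical helper; pass the folder without a trailing backslash.
Func _ListHtmlFiles($sDir)
    ; Flag 1 = files only; element [0] holds the count and the names carry no path
    Local $aNames = _FileListToArray($sDir, "*.html", 1)
    If @error Then Return SetError(1, 0, 0)
    For $i = 1 To $aNames[0]
        ; Prefix the folder so file functions receive a full path
        $aNames[$i] = $sDir & "\" & $aNames[$i]
    Next
    Return $aNames
EndFunc

Local $aFiles = _ListHtmlFiles("C:\html")
If Not @error Then ConsoleWrite($aFiles[0] & " files found" & @CRLF)
```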

George

I have been trying to understand the many handles for this function and I must say I have had little luck. Perhaps something a little simpler would help, like returning the title of a page between the <title> and </title> in the <head>. Could you provide me with links that would better explain this? Or maybe I am using totally the wrong function for this seemingly simple task. Oh, and I have picked your examples apart and fiddled with them quite extensively and just can't understand why they work.

Can you post the method you use for getting the file list into an array? Then perhaps I can add more comments. The comments in the function itself should be sufficient to understand what's happening there.

$sStr is either a block of text or a file path and name. A file name alone will seldom work with file functions.

$aRegExp is an array of the links found in the block or file ($sStr).

Then it checks to be sure that the array was created; if not, it returns an error.

If there is no error, it replaces the found links in the text with the proper string by looping through the array.

I wouldn't exactly call 2 handles "many". 5 if you count those in the code block before the function itself; still not "many".

I really suspect that your issue may be related to not sending the full path to the function.

Test it yourself by sending the file contents (use FileRead) of the first element of your file array to the clipboard and then pasting it into a new text editor window. Did you get what you expected?

You could also test my function by simply copying one of your HTML pages to the clipboard and then calling the function as below.

ClipPut (_ModLinks(ClipGet()))

Then paste the new clipboard contents into a new blank editor page.

Edited by GEOSoft

George


Sorry, when I mentioned handles I was referring to the list below "matching characters" in the function help for StringRegExp. I am getting to understand them a little better; I have been messing around with them and came up with this, but I am not sure that it is exactly correct.

So I have taken a step back from what you last posted for me and am trying to do something a little easier, so I can better understand how StringRegExp works.

For the learning experience I am trying to extract the name of an HTML page to a string. There is also an occasional &#xxx; HTML code that I will have to work out later.

FileOpen("c:/html/0012.html", 0)
$myfile = FileRead("c:/html/0012.html")
FileClose("c:/html/0012.html")
$name = StringRegExp($myfile, '<title>(.*?)</title>', 1)
MsgBox(0, "Output", $name[0], 20)

I have declared the strings before using them, and I am using a MsgBox to test whether the string is outputting what I want. However, I do not understand why I have to put [0] after the string name to view it in the message box, or to use it in combination with other strings for that matter. I have searched the help files and cannot find any definition of usage for this type of command, if you can call it that. How can I get this to work without using [0]? Sometimes it errors out for seemingly no reason, with a return error of "==> Subscript used with non-Array variable.:" on the MsgBox line. Perhaps you could point me in the right direction here.

This is something I can do in just a few lines of code with VBScript. Perhaps this is my problem, for I keep thinking in VBScript.

Many thanks.


In this case you are only expecting the title tag to appear once, so the usage of StringRegExp(String, Expression, 1) is correct. If you look at the help file you will see that any flag except 0 returns an array. RegExp arrays are zero based, which means that in this case the data is in element 0, hence the need to use [0].
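
As for the occasional "Subscript used with non-Array variable" error: when the pattern finds nothing, StringRegExp sets @error and does not return an array, so $name[0] has nothing to index. A minimal sketch of a guard is below; _GetTitle is a hypothetical helper name, and the (?is) options are my addition so the match ignores case and survives a title that wraps onto a second line.

```autoit
; Sketch: return the page title, or "" with @error = 1 when there is no match.
; _GetTitle is a hypothetical helper name.
Func _GetTitle($sHtml)
    ; (?i) ignores case, (?s) lets "." match line breaks inside the tag
    Local $aMatch = StringRegExp($sHtml, '(?is)<title>(.*?)</title>', 1)
    ; On no match the result is not an array, so test before using [0]
    If @error Or Not IsArray($aMatch) Then Return SetError(1, 0, "")
    Return StringStripWS($aMatch[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc

Local $sTitle = _GetTitle(FileRead("c:/html/0012.html"))
If @error Then
    ConsoleWrite("No <title> found" & @CRLF)
Else
    ConsoleWrite($sTitle & @CRLF)
EndIf
```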

Another tip. In most cases there is no need to open and close the file to read it. Just use

FileRead("c:/html/0012.html")

In the function I gave you, the first regexp is expecting 1 or more instances of the string, so I used flag 3.

The breakdown for that regexp is:

Find any occurrences of "<a" followed by a space (\s), then get anything (.+) until we find an equal sign, which may or may not be preceded by a space (\s?=) and possibly followed by another character (.?), which is usually a double quote but in HTML could also be a single quote, followed by a slash and anything until the end of the tag (/.+</a>).

The replace is much the same, except that we are now working only against the results found in each element of the first array. "(?i)<a\s.+\s?=.?/.+>(.+)</a>" translates to:

Case-insensitive [(?i)], start with the literal string ("<a"), which is followed by a space and anything up to the first ">", but that must include an = sign and may or may not contain spaces around the = sign. We are actually ignoring all of this because the part we want follows the ">". I enclosed that part in parentheses so we can back-reference it [(.+)], which means that we get everything up to but NOT INCLUDING the "</a>". The replacement string begins with what we want the string to start with (<a href="), followed by the part we saved. In this case I back-referenced that with $1, but I could just as easily have used \1 (no difference), and followed the back-reference with the remainder of our URL, which is just (.html">). Then I put the display text in, which is just the part we saved, again back-referenced as $1, and then closed the link with </a>.

It may seem complex, but really it's not if you compare the breakdown to the actual code I gave you. RegExps are difficult for most developers because there are so many different engines available and many things are not consistent between the engines. Also, most developers will avoid their use if there is another fairly simple method of doing the same thing. In this case RegExp was the better way to go; just don't fall into the trap of overusing them, primarily because they are generally slower than normal String functions, and there is no sense in banging your head against a wall just so you can write slower code.
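
One way to see what a capturing group picks up before replacing anything is to run the pattern with flag 3 and print the array. The sketch below is only an illustration: _LinkTexts is a hypothetical helper name, and the pattern uses [^>]* and [^<]+ in place of .+ (my substitution, not the pattern broken down above) so that each match stops at the end of its own tag.

```autoit
; Sketch: collect the display text of every <a ...>text</a> link.
; _LinkTexts is a hypothetical helper name.
Func _LinkTexts($sHtml)
    ; With flag 3 and one capturing group, each array element is one captured text
    Local $aTexts = StringRegExp($sHtml, '(?i)<a\s[^>]*>([^<]+)</a>', 3)
    If @error Then Return SetError(1, 0, 0)
    Return $aTexts
EndFunc

Local $sSample = '<a href="/name/nm0000001/">Fred Astaire</a> and <a href="/name/nm0000002/">Lauren Bacall</a>'
Local $aFound = _LinkTexts($sSample)
If Not @error Then
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite($i & ": " & $aFound[$i] & @CRLF)
    Next
EndIf
```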


George


Thanks, I can actually understand that now. I also went back and started to reread the help files and found a tutorial on StringRegExp that I must have overlooked before, and that helped a lot too. Then I came across @extended, which really shed light on the [0] deal. This is very intriguing to me. I did a 48-hour session straight through, playing around with StringRegExp in a While loop using @extended, and I must say it's a little tricky, but I can see the need for it now. I was able to extract the name from an HTML file and replace all the &#XXX; instances with their corresponding characters, so I have made some progress since my first post. However, I have experienced some issues: a few times I had a hard crash, and once in a while it seems to skip over the &#XXX; for some reason, sort of like it is running so fast it trips.

The help file said that open and close were not necessary but that it was good practice to do it that way, so that was why I did it. I was also playing around with StringInStr; am I right in understanding that this function will not produce a string but only tell you if it exists? And my last question: is it possible to do the equivalent of StringRegExp without the array? The script that I have in mind will probably contain 30 to 35 StringRegExp calls, unless you think this is overusing it.

Without seeing the actual RegExps and the page you are running it against, I won't venture a guess at why it missed some. RegExp doesn't run fast at any time so I would doubt that it "trips".
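
For comparison only, here is a minimal sketch of one way the numeric &#XXX; codes could be decoded without a While loop. It is not the code being discussed (which was not posted) and handles decimal codes only; _DecodeNumericEntities is a hypothetical helper name.

```autoit
; Sketch: replace decimal entities such as &#233; with the character they stand for.
; _DecodeNumericEntities is a hypothetical helper name; hex (&#x..;) and named
; entities (&amp;) are not handled.
Func _DecodeNumericEntities($sText)
    ; Flag 3 returns every captured number found in the text
    Local $aCodes = StringRegExp($sText, '&#(\d+);', 3)
    If @error Then Return $sText ; nothing to decode
    For $i = 0 To UBound($aCodes) - 1
        ; StringReplace swaps all occurrences of this one entity at once
        $sText = StringReplace($sText, '&#' & $aCodes[$i] & ';', ChrW(Number($aCodes[$i])))
    Next
    Return $sText
EndFunc

ConsoleWrite(_DecodeNumericEntities("Caf&#233; &#38; bar") & @CRLF)
```

Collecting the codes first and then using plain StringReplace keeps the regexp work to a single pass over the text.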

Yes, good antiquated coding practice dictates that it's better to use FileOpen(), and I guess I should use it more. I seldom use it unless I want to delete the contents of the file before writing to it again, or if the files I'm reading are very large, in which case the file handle method is faster than a simple FileRead().

$hFile = FileOpen($File, 2)

FileWrite($hFile, MyFunc($sStr))

FileClose($hFile)

One of the reasons I don't bother with the FileOpen() when doing a read is because I will often write functions like

Func MyFunc($sStr)
   If FileExists($sStr) Then $sStr = FileRead($sStr)
 ;; Do some string manipulation here
   Return $sStr
EndFunc

StringInStr does not, by itself return a string. It does return the starting position of the given string or returns 0 if the string can not be found. It can be used as a reference point in other string functions though.

$sStr = "This is some string of text."
MsgBox(4096, "RESULTS", StringMid($sStr, StringInStr($sStr, "some")))

Without using an array, you cannot RETURN the results of a RegExp. It can be used to verify a string, though, much like StringInStr().

If StringRegExp($sStr, "(?i)this\s.*\.") Then ;; do whatever.

That call could also be written as

If StringRegExp($sStr, "(?i)this\s.*\.", 0) Then ;; do whatever.

Edited by GEOSoft

George


OK, I have that finished now; I figured it out. Thank you for the help. I am now on a new issue with StringRegExp and perhaps you could help me once more.

I want to return a complete "<table> .*? </table>" from a string using StringRegExp. The problem is that the table is on multiple lines and the number of @CR in the table varies. Is this possible?

If not, what function should I use?

Thank you.
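
A possible approach, sketched under the assumption that the first table in the string is the one wanted and that tables are not nested: the (?s) option makes "." match line breaks as well, so ".*?" can span any number of lines. _FirstTable is a hypothetical helper name.

```autoit
; Sketch: return the first <table>...</table> block, line breaks included.
; _FirstTable is a hypothetical helper name; nested tables are not handled,
; because the lazy .*? stops at the first </table> it meets.
Func _FirstTable($sHtml)
    ; (?i) ignores case; (?s) lets "." match @CR and @LF
    Local $aMatch = StringRegExp($sHtml, '(?is)<table\b.*?</table>', 1)
    If @error Then Return SetError(1, 0, "")
    ; With no capturing group, element [0] is the whole match
    Return $aMatch[0]
EndFunc

Local $sPage = "before<table>" & @CRLF & "<tr><td>1</td></tr>" & @CRLF & "</table>after"
ConsoleWrite(_FirstTable($sPage) & @CRLF)
```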
