Best way to do a variable StringReplace?


So I've successfully pulled 51 links from a site and replaced what I needed, etc., but there's a more difficult replacement I need to do that I can't really figure out myself :)

This is one of the links that I start out with:


I've successfully replaced that to what I want and it comes out:


Now, I need to be able to get that to just display:

The Honey Buzz Principle

I can get it up until Episode- but nothing after that ;)

And the problem is, I need to do that for the following list:


All of those names.. is this even possible?

Thanks ;)


This looks like a job for captain regexp!

I'll have a try, but only if you promise not to bump within 24h anymore. ;)

You'll find that it's almost impossible to exclude the sub-title as there is almost no way to figure out where the title ends and the sub-title starts. (now you know someone will solve it to prove me wrong)

You could compare the words against a dictionary file, look for an invalid word and figure out which two words it could be a contraction of, but if you really want the title displayed correctly I'd advise you to extract it from the website.

This will strip the url parts and replace the dashes though:

#include <Array.au3>
#include <File.au3>

Local $aUrls
_FileReadToArray("test.log", $aUrls)
;assuming $aUrls has all the Urls and a count in element 0
$sPrefix = "\Qhttp://www.oneplace.com/player/paws-and-tales/\E" ;a quote of the startstring
$sPossible = "(?:episode-\d+-){0,1}" ;skip "episode-<digits>-" if it exists
$sCapture = "(.*)" ;capture anything until
$sEnd = "-\d+\.html" ;a dash, some digits followed by .html (dot escaped so it only matches a literal dot)
$sPattern = $sPrefix & $sPossible & $sCapture & $sEnd ;assemble the pattern
For $i = 1 To $aUrls[0]
    $aReturn = StringRegExp($aUrls[$i],$sPattern,1) ;use the pattern on each entry
    If @error Then ContinueLoop ;skip entries that do not match the pattern. (should be none)
    $aUrls[$i] = _CapsnDashes($aReturn[0])
Next
_ArrayDisplay($aUrls) ;Now we have all the names, but they are followed by the sub-title

Func _CapsnDashes($sSource)
    Local $aSource = StringSplit($sSource,"-",2) ;flag 2: no count in element 0
    Local $sReturn = ""
    For $sWord In $aSource
        $sReturn &= StringUpper(StringLeft($sWord,1)) & StringTrimLeft($sWord,1) & " " ;capitalise the first letter of each word
    Next
    Return StringTrimRight($sReturn,1) ;drop the trailing space
EndFunc

Edit: Added a function to handle capitalisation.

Edit2: now uses the same test file as UEZ

Edited by Tvern

Try this:

#include <Array.au3>
#Include <String.au3>

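$hFile = FileOpen("Test.log") ;open the file with the urls for reading
While 1
    $line = FileReadLine($hFile)
    If @error Then ExitLoop ;end of file reached (or the file could not be read)
    ;keep only the title part, then turn the dashes into spaces and capitalise each word
    $t = _StringProper(StringReplace(StringRegExpReplace($line, "(?i).+\/(episode-\d+-)*(.+)-\d+\.html", "$2"), "-", " "))
    ConsoleWrite($t & @CRLF)
WEnd
FileClose($hFile)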

Test.log content is the text from the code box!

Thanks to Oscar for helping me ;)



Edited by UEZ

Alright, this is really close, thanks a lot UEZ.

A couple of them didn't go right:


Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy.

Just a few of them do this, is there a reason for this? Thanks a bunch!

I would try and do it myself.. but I have NO idea what all of this is:

Since some of that isn't in the text document, lol ;)

Edited by Damein


Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy.

Just a few of them do this, is there a reason for this? Thanks a bunch!

Read my comment on sub-titles. It's almost impossible to recognise where they start, so you can't remove them reliably.

I've tried downloading the titles by feeding the urls through InetRead, but it seems like the website is pretty slow to respond. I tried to do all 53 downloads simultaneously to speed things up. (Tried this using WinHttp, TCP and InetGet, which was the fastest.) But it was still slow. (5-15 sec for 52 titles)

I would try and do it myself.. but I have NO idea what all of this is:


Since some of that isn't in the text document, lol ;)

You can look at StringRegExp in the helpfile. (StringRegExpReplace is related)

Regular expressions allow you to search for matching substrings based on patterns, rather than exact search strings.

The PCRE toolkit by GEOSoft offers a nice environment to practice patterns on. There is a link in his signature.

You can also take a look at the pattern in my example, which I broke up into distinct parts, making it easier to see what each does.
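
To make that a bit more concrete, here is a minimal sketch of a pattern of this kind applied to a single url. The url is the one from this thread; the variable names are only for illustration, and the pattern is a simplified take on the ones posted above rather than the definitive way to do it.

```autoit
; One sample url from this thread
Local $sUrl = "http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html"

; .+/                 everything up to the last slash
; (?:episode-\d+-)?   an optional "episode-<digits>-" part that is not captured
; (.+)                the part we want to keep
; -\d+\.html          a dash, the numeric id and the .html ending
Local $aMatch = StringRegExp($sUrl, "(?i).+/(?:episode-\d+-)?(.+)-\d+\.html", 1)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ; $aMatch[0] holds the captured group; swap the dashes for spaces
    ConsoleWrite(StringReplace($aMatch[0], "-", " ") & @CRLF)
EndIf
```

Note that the sub-title is still glued to the title ("principlegreed"), because nothing in the url marks where one ends and the other starts.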

Edit: Yours is a bit cleaner UEZ, although I think you don't need to escape the slash in your pattern. I wasn't aware of the _StringProper function. Could have saved me some work.

Edited by Tvern

I don't know how to decide that http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html should become The Honey Buzz Principle! For me it is not clear how your "algorithm" is working!

Regular expressions are really not easy to understand, and you have to work with them for a long time to understand the logic behind them. Follow the advice from Tvern!



Based on the link in your other topic I came up with this method to get the names that go with each link:

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $sFilePath, $sSource
$sFilePath = "Linkss.txt"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links were found
    MsgBox(0,"error","No links found!")
    Exit ;nothing to process
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2] ;one row per link: [url, title]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
Next

I think that should get you on your way.

You might want to consider doing something with the additional information on the page though.

Also you'll have to come up with a way to store this data efficiently. An ini file might do the trick.

Wow, that's nice Tvern.

As for saving the data.. I can't seem to get it to work. Using the var $aLinkTitles seems to come up empty unless I use it in _ArrayDisplay.

Here is what I changed it to:

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $sFilePath, $sSource, $IniFilePath
$sFilePath = "Test.txt"
$IniFilePath = "Names.ini"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links were found
    MsgBox(0,"error","No links found!")
    Exit ;nothing to process
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
Next
IniWrite($IniFilePath, "Names", "Episodes", $aLinkTitles)

And that just puts: Episodes= in the ini file, no names.

I'll look into it further, but so far this is good. Thanks ;)


Arrays don't display as text if you refer to them without using an index. Each element is like a normal variable and can hold any data a normal variable can hold.

IniWriteSection will take a 2D array and write it to a section, while IniWrite writes one value at a time. (Apart from creating the section and key if needed.)

If you want to use IniWrite to write the file, you have to use it in a loop and perform an IniWrite for each element.

The benefit of IniWriteSection is that it is probably a little faster and that you can retrieve the data with just one IniReadSection.

IniWrite has the benefit that you can save a lot of data about each title. I've made an example that just stores the url, but you could add a key to store a tooltip, summary, rating, or whatever you want.

To read the values from the IniWrite sections back, you have to use IniReadSectionNames, and then an IniRead for each section to return the address.

This script has an example of both methods, just pick the layout you like best, or come up with another layout yourself.

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $IniFilePath, $IniFilePath2, $sSource
$IniFilePath = "Names.ini"
$IniFilePath2 = "Names2.ini"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links were found
    MsgBox(0,"error","No links found!")
    Exit ;nothing to process
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
    IniWrite($IniFilePath2, $aLinkTitles[$i/2][1], "adress", $aLinkTitles[$i/2][0]) ;this saves each item as a separate section.
Next
;this saves all items in one section. Start index 0, because this array has no count row (the default start index is 1)
IniWriteSection($IniFilePath, "Links and Names", $aLinkTitles, 0)

Ps: I'm not sure if there is a limit in how long section names, key names, or key values can be. But it works, so I figure it's ok.

Edit: An additional benefit of using ini files is that you can't create duplicate entries, so if you append to the file, existing values will be overwritten with the same value, while new values are added.
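
For reading the data back, something along these lines should work. It is only a rough sketch: it assumes the two files were written by the script above (Names2.ini with one section per title and an "adress" key, Names.ini with everything in the "Links and Names" section), and the variable names are made up for the example.

```autoit
; Layout 1: one section per title (written with IniWrite)
Local $sIni = "Names2.ini"
Local $aSections = IniReadSectionNames($sIni)
If @error Then
    ConsoleWrite("Could not read " & $sIni & @CRLF)
Else
    ; element 0 holds the number of sections
    For $i = 1 To $aSections[0]
        ConsoleWrite($aSections[$i] & " -> " & IniRead($sIni, $aSections[$i], "adress", "") & @CRLF)
    Next
EndIf

; Layout 2: everything in one section (written with IniWriteSection)
Local $aPairs = IniReadSection("Names.ini", "Links and Names")
If Not @error Then
    ; [0][0] holds the count, [n][0] is the key (the url), [n][1] is the value (the title)
    For $i = 1 To $aPairs[0][0]
        ConsoleWrite($aPairs[$i][1] & " -> " & $aPairs[$i][0] & @CRLF)
    Next
EndIf
```

The second layout needs only a single call to get everything, which is the convenience mentioned above. The first one takes an IniRead per title, but leaves room for extra keys later.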

Edited by Tvern
