
Best way to do a variable StringReplace?


Damein

So I've successfully pulled 51 links from a site and replaced what I needed, etc., but there's a more difficult replacement I need to do that I can't really figure out myself :)

This is one of the links that I start out with:

http://www.oneplace.com/ministries/paws-and-tales/listen/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

I've successfully replaced that to what I want and it comes out:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

Now, I need to be able to get that to just display:

The Honey Buzz Principle

I can get it up until Episode- but nothing after that ;)

And the problem is, I need to do that for the following list:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html
http://www.oneplace.com/player/paws-and-tales/episode-11-a-race-against-timeputting-others-first-138328.html
http://www.oneplace.com/player/paws-and-tales/episode-10-the-lighthouseour-conscience-is-from-god-137373.html
http://www.oneplace.com/player/paws-and-tales/god-with-the-wind-127055.html
http://www.oneplace.com/player/paws-and-tales/a-closer-look-127054.html
http://www.oneplace.com/player/paws-and-tales/correction-course-127053.html
http://www.oneplace.com/player/paws-and-tales/standing-alone-127052.html
http://www.oneplace.com/player/paws-and-tales/episode-5-the-princessprayer-126136.html
http://www.oneplace.com/player/paws-and-tales/episode-4-high-noonovercoming-fear-126135.html
http://www.oneplace.com/player/paws-and-tales/episode-3-to-have-and-give-notsharing-126134.html
http://www.oneplace.com/player/paws-and-tales/episode-2-grace-to-hughgrace-126133.html
http://www.oneplace.com/player/paws-and-tales/episode-48-the-island-of-nedtwo-are-better-than-one-120111.html
http://www.oneplace.com/player/paws-and-tales/episode-47-the-story-of-esther-part-3god-will-never-leave-those-who-trust-him-120110.html
http://www.oneplace.com/player/paws-and-tales/episode-46-the-story-of-esther-part-2god-has-a-plan-for-our-lives-120109.html
http://www.oneplace.com/player/paws-and-tales/episode-45-the-story-of-esther-part-1god-causes-all-things-to-work-together-for-good-120108.html
http://www.oneplace.com/player/paws-and-tales/episode-44-the-plans-i-havegod-is-our-rock-in-times-of-change-120107.html
http://www.oneplace.com/player/paws-and-tales/episode-40-the-least-of-allall-things-are-possible-with-god-114389.html
http://www.oneplace.com/player/paws-and-tales/episode-39-miss-helga-grisselshowing-grace-to-others-114388.html
http://www.oneplace.com/player/paws-and-tales/episode-38-cj-ahabenvy-114387.html
http://www.oneplace.com/player/paws-and-tales/episode-62-shadow-valley-part-6our-future-is-in-christ-114386.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-5-103706.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-4-103705.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-3-103704.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-2-103703.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-1-103702.html
http://www.oneplace.com/player/paws-and-tales/goliath-100399.html
http://www.oneplace.com/player/paws-and-tales/the-tribe-99157.html
http://www.oneplace.com/player/paws-and-tales/whose-name-is-jealous-98125.html
http://www.oneplace.com/player/paws-and-tales/im-a-believer-97342.html
http://www.oneplace.com/player/paws-and-tales/cylinder-137k-95248.html
http://www.oneplace.com/player/paws-and-tales/the-gift-95247.html
http://www.oneplace.com/player/paws-and-tales/the-grecian-urn-94330.html
http://www.oneplace.com/player/paws-and-tales/eye-of-the-tiger-93150.html
http://www.oneplace.com/player/paws-and-tales/the-dedication-92508.html
http://www.oneplace.com/player/paws-and-tales/the-gift-85534.html
http://www.oneplace.com/player/paws-and-tales/the-grecian-urn-85533.html
http://www.oneplace.com/player/paws-and-tales/if-the-tooth-be-known-85532.html
http://www.oneplace.com/player/paws-and-tales/the-scarlet-stain-85531.html
http://www.oneplace.com/player/paws-and-tales/plans-in-the-breaking-85530.html
http://www.oneplace.com/player/paws-and-tales/a-pirates-life-85529.html
http://www.oneplace.com/player/paws-and-tales/the-hullabaloo-at-hunker-hill-85528.html
http://www.oneplace.com/player/paws-and-tales/blinded-by-the-sight-85527.html
http://www.oneplace.com/player/paws-and-tales/the-road-to-christmas-85526.html
http://www.oneplace.com/player/paws-and-tales/every-good-thing-85525.html
http://www.oneplace.com/player/paws-and-tales/true-riches-85524.html
http://www.oneplace.com/player/paws-and-tales/the-perfect-christmas-gift-85523.html
http://www.oneplace.com/player/paws-and-tales/stacis-dilemma-67857.html
http://www.oneplace.com/player/paws-and-tales/tiffany-cometh-67856.html
http://www.oneplace.com/player/paws-and-tales/a-conscious-effort-67855.html
http://www.oneplace.com/player/paws-and-tales/hold-the-anchovies-67854.html
http://www.oneplace.com/player/paws-and-tales/episode-14-the-great-go-cart-racecooperation-67853.html
http://www.oneplace.com/player/paws-and-tales/episode-13-snake-oil-67852.html

All of those names.. is this even possible?

Thanks ;)


*Bump*


This looks like a job for captain regexp!

I'll have a try, but only if you promise not to bump within 24h anymore. ;)

You'll find that it's almost impossible to exclude the sub-title as there is almost no way to figure out where the title ends and the sub-title starts. (now you know someone will solve it to prove me wrong)

You could compare the words against a dictionary file, look for an invalid word and figure out which two words it could be a contraction of, but if you really want the title displayed correctly I'd advise you to extract it from the website.

This will strip the url parts and replace the dashes though:

#include <Array.au3>
#include <File.au3>

Local $aUrls
_FileReadToArray("test.log", $aUrls)
;assuming $aUrls has all the Urls and a count in element 0
$sPrefix = "\Qhttp://www.oneplace.com/player/paws-and-tales/\E" ;the literal start of every url, quoted with \Q...\E
$sPossible = "(?:episode-\d+-){0,1}" ;skip "episode-<digits>-" if it exists
$sCapture = "(.*)" ;capture anything until
$sEnd = "-\d+\.html" ;a dash, some digits followed by .html
$sPattern = $sPrefix & $sPossible & $sCapture & $sEnd ;assemble the pattern
For $i = 1 To $aUrls[0]
    $aReturn = StringRegExp($aUrls[$i],$sPattern,1) ;use the pattern on each entry
    If @error Then ContinueLoop ;skip entries that do not match the pattern. (should be none)
    $aUrls[$i] = _CapsnDashes($aReturn[0])
Next
_ArrayDisplay($aUrls) ;Now we have all the names, but they are followed by the sub-title

Func _CapsnDashes($sSource)
    Local $aSource = StringSplit($sSource,"-",2)
    Local $sReturn = ""
    For $sWord In $aSource
        $sReturn &= StringUpper(StringLeft($sWord,1)) & StringTrimLeft($sWord,1) & " "
    Next
    Return StringTrimRight($sReturn,1)
EndFunc

Edit: Added a function to handle capitalisation.

Edit2: now uses the same test file as UEZ
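
The dictionary idea mentioned above could be sketched roughly like this. It is only an illustration: _SplitFusedWord and the tiny "|"-delimited word list are made up for this example, and with a real word list you would get false splits and misses (names such as "hugh" won't be in it), which is why the approach can't be fully reliable.

```autoit
; Hypothetical helper, not part of the scripts in this thread.
; Tries every split point of $sWord and returns a 2-element array [left, right]
; for the first split where both halves are found in $sDict, a "|"-delimited
; word list that must start and end with "|". Sets @error = 1 if nothing fits.
Func _SplitFusedWord($sWord, $sDict)
    Local $sLeft, $sRight
    For $i = 1 To StringLen($sWord) - 1
        $sLeft = StringLeft($sWord, $i)
        $sRight = StringTrimLeft($sWord, $i)
        If StringInStr($sDict, "|" & $sLeft & "|") And StringInStr($sDict, "|" & $sRight & "|") Then
            Local $aParts[2] = [$sLeft, $sRight]
            Return $aParts
        EndIf
    Next
    Return SetError(1, 0, 0)
EndFunc

; Example with a made-up word list
Local $sDict = "|the|honey|buzz|principle|greed|steals|our|joy|"
Local $aSplit = _SplitFusedWord("principlegreed", $sDict)
If Not @error Then ConsoleWrite($aSplit[0] & " + " & $aSplit[1] & @CRLF) ; principle + greed
```

Everything before the split point would then be the title and everything after it the sub-title. Titles whose last word runs straight into a name or an unusual word would still slip through.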

Edited by Tvern

Try this:

#include <Array.au3>
#Include <String.au3>

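$hFile = FileOpen("Test.log") ;open the list of urls for reading
If $hFile = -1 Then Exit MsgBox(16, "Error", "Could not open Test.log")
While 1
    $line = FileReadLine($hFile)
    If @error = -1 Then ExitLoop ;end of file reached
    ;strip the url and the optional "episode-N-" prefix, turn dashes into spaces and capitalise each word
    $t = _StringProper(StringReplace(StringRegExpReplace($line, "(?i).+\/(episode-\d+-)*(.+)-\d+.html", "$2"), "-", " "))
    ConsoleWrite($t & @CRLF)
WEnd
FileClose($hFile)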

Test.log content is the text from the code box!

Thanks to Oscar for helping me ;)

Br,

UEZ

Edited by UEZ


Alright, this is really close, thanks a lot UEZ.

A couple of them didn't go right:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy.

Just a few of them do this, is there a reason for this? Thanks a bunch!

I would try and do it myself.. but I have NO idea what all of this is:

(?i).+\/(episode-\d+-)*(.+)-\d+.htm

Since some of that isn't in the text document, lol ;)

Edited by Damein


http://www.oneplace.com/player/paws-and-...rinciplegreed-steals-our-joy-138329.html

Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy.

Just a few of them do this, is there a reason for this? Thanks a bunch!

Read my comment on sub-titles. It's almost impossible to recognise where they start, so you can't remove them reliably.

I've tried downloading the titles by feeding the urls through InetRead, but it seems like the website is pretty slow to respond. I tried to do all 53 downloads simultaneously to speed things up (I tried this using WinHttp, TCP and InetGet, which was the fastest), but it was still slow (5-15 sec for 52 titles).
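
For reference, fetching a single title with InetRead might look something like the sketch below. _GetPageTitle is a made-up helper name, and the assumption that the episode name can be taken from the page's <title> tag has not been checked against the site; this version also downloads one page at a time rather than all of them at once.

```autoit
; Hypothetical helper: download one page and return the text of its <title> tag.
; Assumption: the episode name appears in <title>. Sets @error on failure.
Func _GetPageTitle($sUrl)
    Local $dData = InetRead($sUrl, 1) ; 1 = force a reload from the remote site
    If @error Then Return SetError(1, 0, "") ; download failed
    Local $aTitle = StringRegExp(BinaryToString($dData), "(?is)<title>(.*?)</title>", 1)
    If @error Then Return SetError(2, 0, "") ; no <title> tag found
    Return StringStripWS($aTitle[0], 3) ; 3 = strip leading and trailing whitespace
EndFunc

ConsoleWrite(_GetPageTitle("http://www.oneplace.com/player/paws-and-tales/goliath-100399.html") & @CRLF)
```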

I would try and do it myself.. but I have NO idea what all of this is:

(?i).+\/(episode-\d+-)*(.+)-\d+.htm

Since some of that isn't in the text document, lol ;)

You can look at StringRegExp in the helpfile. (StringRegExpReplace is related)

Regular expressions allow you to search for matching substrings based on patterns, rather than exact search strings.

The PCRE toolkit by GEOSoft offers a nice environment to practice patterns on. There is a link in his signature.

You can also take a look at the pattern in my example which I broke up in distinct parts, making it easier to see what each does.
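
The same can be done with UEZ's pattern. In this sketch the variable names are invented for the illustration; the pattern pieces themselves are taken unchanged from UEZ's script.

```autoit
; Sketch only: UEZ's pattern rebuilt from labelled pieces.
Local $sAnyPath = ".+\/"            ; greedy: everything up to the last slash (the \ before / is optional)
Local $sEpisode = "(episode-\d+-)*" ; group 1: an optional "episode-<digits>-" prefix
Local $sTitle   = "(.+)"            ; group 2: the part we want to keep
Local $sTail    = "-\d+.html"       ; a dash, the numeric id, any one character, then "html"
Local $sPattern = "(?i)" & $sAnyPath & $sEpisode & $sTitle & $sTail ; (?i) = ignore case

Local $sUrl = "http://www.oneplace.com/player/paws-and-tales/episode-13-snake-oil-67852.html"
; The whole url matches, so it is replaced by what group 2 captured
ConsoleWrite(StringRegExpReplace($sUrl, $sPattern, "$2") & @CRLF) ; snake-oil
```

Note that the dot in front of "html" is not escaped, so it matches any character rather than only a literal dot. For these urls that makes no difference; "\." would be the stricter form.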

Edit: Yours is a bit cleaner UEZ, although I think you don't need to escape the slash in your pattern. I wasn't aware of the _StringProper function. Could have saved me some work.

Edited by Tvern


I don't know how to decide that http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html should become The Honey Buzz Principle! It is not clear to me how your "algorithm" is supposed to work!

Regular expressions are really not easy to understand, and you have to work with them for a long time to understand the logic behind them. Follow the advice from Tvern!

Br,

UEZ


Based on the link in your other topic I came up with this method to get the names that go with each link:

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $sFilePath, $sSource
$sFilePath = "Linkss.txt"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links where found
    MsgBox(0,"error","No links found!")
    Exit
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
Next
_ArrayDisplay($aLinkTitles)

I think that should get you on your way.

You might want to consider doing something with the additional information on the page though.

Also you'll have to come up with a way to store this data efficiently. An ini file might do the trick.


Wow, that's nice Tvern.

As for the saving of the data.. I can't seem to get it to work.. the var $aLinkTitles seems to come up empty unless I use it in _ArrayDisplay.

Here is what I changed it to:

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $sFilePath, $sSource, $IniFilePath
$sFilePath = "Test.txt"
$IniFilePath = "Names.ini"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links where found
    MsgBox(0,"error","No links found!")
    Exit
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
Next
IniWrite($IniFilePath, "Names", "Episodes", $aLinkTitles)

And that just puts: Episodes= in the ini file, no names.

I'll look into it further, but so far this is good. Thanks ;)


Arrays don't display as text if you refer to them without using an index. Each index is like a normal variable which can hold any data a normal variable can hold.
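
A small illustration of the difference, using a made-up array ($aDemo and its contents are only for this example):

```autoit
; Made-up data in the same url/title layout as $aLinkTitles
Local $aDemo[2][2] = [["url-1", "Title 1"], ["url-2", "Title 2"]]
ConsoleWrite("whole array: '" & $aDemo & "'" & @CRLF)       ; the array itself contributes no text
ConsoleWrite("one element: '" & $aDemo[1][1] & "'" & @CRLF) ; an indexed element behaves like a normal variable
```

That is why IniWrite ended up writing an empty value: it was handed the whole array instead of one element.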

IniWriteSection will take a 2D array and write it to a section, while IniWrite writes one value at a time (creating the section and key first if needed).

If you want to use IniWrite to write the file, you have to use it in a loop and perform an IniWrite for each element.

The benefit of IniWriteSection is that it is probably a little faster and that you can retrieve the data with a single IniReadSection.

IniWrite has the benefit that you can save a lot of data about each title. I've made an example that just stores the url, but you could add a key to store a tooltip, summary, rating, or whatever you want.

To read the values written with IniWrite back, you have to use IniReadSectionNames, and then an IniRead for each section to return the address.

This script has an example of both methods, just pick the layout you like best, or come up with another layout yourself.

#include <array.au3>
#include <file.au3>
Global $aFile, $aLinks
Global $IniFilePath, $IniFilePath2, $sSource
$IniFilePath = "Names.ini"
$IniFilePath2 = "Names2.ini"
$sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
$aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
If Not IsArray($aLinks) Then ;check if links where found
    MsgBox(0,"error","No links found!")
    Exit
EndIf

Global $aLinkTitles[UBound($aLinks)/2][2]
For $i = 0 To UBound($aLinks) -1 Step 2
    $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
    $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
    IniWrite($IniFilePath2, $aLinkTitles[$i/2][1], "adress", $aLinkTitles[$i/2][0]) ;this saves each item as a separate section.
Next
IniWriteSection($IniFilePath, "Links and Names", $aLinkTitles, 0) ;this saves all items in one section; the 0 makes it start at the first array element (the default start index is 1)

PS: I'm not sure whether there is a limit on how long section names, key names, or key values can be. But it works, so I figure it's OK.

Edit: An additional benefit when using ini files is that you can't create duplicate entries, so if you append to the file, existing values will be overwritten with the same value, while new values are added.
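
Reading the two layouts back could look roughly like this. It is a sketch that assumes Names.ini and Names2.ini were produced by the script above and sit in the script's working directory; the variable names are made up here.

```autoit
; Layout 1: everything in one section of Names.ini (key = url, value = title)
Local $aPairs = IniReadSection("Names.ini", "Links and Names")
If Not @error Then
    For $i = 1 To $aPairs[0][0] ; [0][0] holds the number of key/value pairs
        ConsoleWrite($aPairs[$i][1] & " -> " & $aPairs[$i][0] & @CRLF)
    Next
EndIf

; Layout 2: one section per title in Names2.ini, with the url under the "adress" key
Local $aTitles = IniReadSectionNames("Names2.ini")
If Not @error Then
    For $i = 1 To $aTitles[0] ; [0] holds the number of sections
        ConsoleWrite($aTitles[$i] & " -> " & IniRead("Names2.ini", $aTitles[$i], "adress", "") & @CRLF)
    Next
EndIf
```

Both loops start at index 1 because element 0 of the returned arrays holds the count, not data.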

Edited by Tvern