Damein Posted October 21, 2010

So I've successfully pulled 51 links from a site and replaced what I needed, etc., but there's a more difficult replacement I need to do that I can't really figure out myself.

This is one of the links that I start out with:

http://www.oneplace.com/ministries/paws-and-tales/listen/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

I've successfully replaced that with what I want, and it comes out as:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

Now I need to get that to display just:

The Honey Buzz Principle

I can get it up until "episode-" but nothing after that. And the problem is, I need to do that for the following list:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html
http://www.oneplace.com/player/paws-and-tales/episode-11-a-race-against-timeputting-others-first-138328.html
http://www.oneplace.com/player/paws-and-tales/episode-10-the-lighthouseour-conscience-is-from-god-137373.html
http://www.oneplace.com/player/paws-and-tales/god-with-the-wind-127055.html
http://www.oneplace.com/player/paws-and-tales/a-closer-look-127054.html
http://www.oneplace.com/player/paws-and-tales/correction-course-127053.html
http://www.oneplace.com/player/paws-and-tales/standing-alone-127052.html
http://www.oneplace.com/player/paws-and-tales/episode-5-the-princessprayer-126136.html
http://www.oneplace.com/player/paws-and-tales/episode-4-high-noonovercoming-fear-126135.html
http://www.oneplace.com/player/paws-and-tales/episode-3-to-have-and-give-notsharing-126134.html
http://www.oneplace.com/player/paws-and-tales/episode-2-grace-to-hughgrace-126133.html
http://www.oneplace.com/player/paws-and-tales/episode-48-the-island-of-nedtwo-are-better-than-one-120111.html
http://www.oneplace.com/player/paws-and-tales/episode-47-the-story-of-esther-part-3god-will-never-leave-those-who-trust-him-120110.html
http://www.oneplace.com/player/paws-and-tales/episode-46-the-story-of-esther-part-2god-has-a-plan-for-our-lives-120109.html
http://www.oneplace.com/player/paws-and-tales/episode-45-the-story-of-esther-part-1god-causes-all-things-to-work-together-for-good-120108.html
http://www.oneplace.com/player/paws-and-tales/episode-44-the-plans-i-havegod-is-our-rock-in-times-of-change-120107.html
http://www.oneplace.com/player/paws-and-tales/episode-40-the-least-of-allall-things-are-possible-with-god-114389.html
http://www.oneplace.com/player/paws-and-tales/episode-39-miss-helga-grisselshowing-grace-to-others-114388.html
http://www.oneplace.com/player/paws-and-tales/episode-38-cj-ahabenvy-114387.html
http://www.oneplace.com/player/paws-and-tales/episode-62-shadow-valley-part-6our-future-is-in-christ-114386.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-5-103706.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-4-103705.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-3-103704.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-2-103703.html
http://www.oneplace.com/player/paws-and-tales/shadow-valley-part-1-103702.html
http://www.oneplace.com/player/paws-and-tales/goliath-100399.html
http://www.oneplace.com/player/paws-and-tales/the-tribe-99157.html
http://www.oneplace.com/player/paws-and-tales/whose-name-is-jealous-98125.html
http://www.oneplace.com/player/paws-and-tales/im-a-believer-97342.html
http://www.oneplace.com/player/paws-and-tales/cylinder-137k-95248.html
http://www.oneplace.com/player/paws-and-tales/the-gift-95247.html
http://www.oneplace.com/player/paws-and-tales/the-grecian-urn-94330.html
http://www.oneplace.com/player/paws-and-tales/eye-of-the-tiger-93150.html
http://www.oneplace.com/player/paws-and-tales/the-dedication-92508.html
http://www.oneplace.com/player/paws-and-tales/the-gift-85534.html
http://www.oneplace.com/player/paws-and-tales/the-grecian-urn-85533.html
http://www.oneplace.com/player/paws-and-tales/if-the-tooth-be-known-85532.html
http://www.oneplace.com/player/paws-and-tales/the-scarlet-stain-85531.html
http://www.oneplace.com/player/paws-and-tales/plans-in-the-breaking-85530.html
http://www.oneplace.com/player/paws-and-tales/a-pirates-life-85529.html
http://www.oneplace.com/player/paws-and-tales/the-hullabaloo-at-hunker-hill-85528.html
http://www.oneplace.com/player/paws-and-tales/blinded-by-the-sight-85527.html
http://www.oneplace.com/player/paws-and-tales/the-road-to-christmas-85526.html
http://www.oneplace.com/player/paws-and-tales/every-good-thing-85525.html
http://www.oneplace.com/player/paws-and-tales/true-riches-85524.html
http://www.oneplace.com/player/paws-and-tales/the-perfect-christmas-gift-85523.html
http://www.oneplace.com/player/paws-and-tales/stacis-dilemma-67857.html
http://www.oneplace.com/player/paws-and-tales/tiffany-cometh-67856.html
http://www.oneplace.com/player/paws-and-tales/a-conscious-effort-67855.html
http://www.oneplace.com/player/paws-and-tales/hold-the-anchovies-67854.html
http://www.oneplace.com/player/paws-and-tales/episode-14-the-great-go-cart-racecooperation-67853.html
http://www.oneplace.com/player/paws-and-tales/episode-13-snake-oil-67852.html

All of those names... is this even possible? Thanks

Most recent sig. I made Quick Launcher W/ Profiles Topic Movie Database Topic & Website | LiveStreamer Pro Website | YouTube Stand-Alone Playlist Manager: Topic | Weather Desktop Widget: Topic | Flash Memory Game: Topic | Volume Control With Mouse / iTunes Hotkeys: Topic | Weather program: Topic | Paws & Tales radio drama podcast mini-player: Topic | Quick Math Calculations: Topic

Damein Posted October 21, 2010

*Bump*

Tvern Posted October 21, 2010 (edited)

This looks like a job for captain regexp! I'll have a try, but only if you promise not to bump within 24h anymore.

You'll find that it's almost impossible to exclude the sub-title, as there is almost no way to figure out where the title ends and the sub-title starts. (Now you know someone will solve it to prove me wrong.) You could compare the words against a dictionary file, look for an invalid word and figure out what two words it could be a contraction of, but if you really want the title displayed correctly I'd advise you to extract it from the website.

This will strip the url parts and replace the dashes though:

    #include <Array.au3>
    #include <File.au3>

    Local $aUrls
    _FileReadToArray("test.log", $aUrls) ;assuming $aUrls has all the Urls and a count in element 0

    $sPrefix = "\Qhttp://www.oneplace.com/player/paws-and-tales/\E" ;a quote of the startstring
    $sPossible = "(?:episode-\d+-){0,1}" ;exclude "episode-:digits:-" if it exists
    $sCapture = "(.*)" ;capture anything until
    $sEnd = "-\d+\.html" ;a dash, some digits followed by a literal ".html" (dot escaped)
    $sPattern = $sPrefix & $sPossible & $sCapture & $sEnd ;assemble the pattern

    For $i = 1 To $aUrls[0]
        $aReturn = StringRegExp($aUrls[$i], $sPattern, 1) ;use the pattern on each entry
        If @error Then ContinueLoop ;skip entries that do not match the pattern. (should be none)
        $aUrls[$i] = _CapsnDashes($aReturn[0])
    Next
    _ArrayDisplay($aUrls) ;Now we have all the names, but they are followed by the sub-title

    ;replace dashes with spaces and capitalise the first letter of each word
    Func _CapsnDashes($sSource)
        Local $aSource = StringSplit($sSource, "-", 2) ;flag 2: no count in element 0
        Local $sReturn = ""
        For $sWord In $aSource
            $sReturn &= StringUpper(StringLeft($sWord, 1)) & StringTrimLeft($sWord, 1) & " "
        Next
        Return StringTrimRight($sReturn, 1) ;drop the trailing space
    EndFunc

Edit: Added a function to handle capitalisation.
Edit2: Now uses the same test file as UEZ.

Edited October 22, 2010 by Tvern

UEZ Posted October 21, 2010 (edited)

Try this:

    #include <Array.au3>
    #include <String.au3>

    $hFile = FileOpen("Test.log")
    While 1
        $line = FileReadLine($hFile)
        If @error = -1 Then ExitLoop ;end of file reached
        ;keep only the part between the optional "episode-n-" prefix and the trailing "-id.html",
        ;then turn dashes into spaces and capitalise each word
        $t = _StringProper(StringReplace(StringRegExpReplace($line, "(?i).+\/(episode-\d+-)*(.+)-\d+\.html", "$2"), "-", " "))
        ConsoleWrite($t & @CRLF)
    WEnd
    FileClose($hFile)

Test.log content is the text from the code box!

Thanks to Oscar for helping me.

Br,
UEZ

Edited October 21, 2010 by UEZ

Please don't send me any personal message and ask for support! I will not reply! Selection of finest graphical examples at Codepen.io The own fart smells best! ✌Her 'sikim hıyar' diyene bir avuç tuz alıp koşma!¯\_(ツ)_/¯ ٩(●̮̮̃•̃)۶ ٩(-̮̮̃-̃)۶ૐ

Damein Posted October 22, 2010 (edited)

Alright, this is really close, thanks a lot UEZ. A couple of them didn't go right:

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy.

Just a few of them do this, is there a reason for this? Thanks a bunch!

I would try and do it myself, but I have NO idea what all of this is:

(?i).+\/(episode-\d+-)*(.+)-\d+.htm

Since some of that isn't in the text document, lol

Edited October 22, 2010 by Damein

Tvern Posted October 22, 2010 (edited)

Quoting Damein:

    http://www.oneplace.com/player/paws-and-...rinciplegreed-steals-our-joy-138329.html
    Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy. Just a few of them do this, is there a reason for this? Thanks a bunch!

Read my comment on sub-titles. It's almost impossible to recognise where they start, so you can't remove them reliably.

I've tried downloading the titles by feeding the urls through InetRead, but it seems like the website is pretty slow to respond. I tried to do all 53 downloads simultaneously to speed things up. (I tried this using WinHttp, TCP and InetGet, which was the fastest.) But it was still slow. (5-15 sec for 52 titles)

Quoting Damein:

    I would try and do it myself.. but I have NO idea what all of this is: (?i).+\/(episode-\d+-)*(.+)-\d+.htm Since some of that isn't in the text document, lol

You can look at StringRegExp in the helpfile. (StringRegExpReplace is related.) Regular expressions allow you to search for matching substrings based on patterns, rather than exact search strings. The PCRE toolkit by GEOSoft offers a nice environment to practice patterns in. There is a link in his signature. You can also take a look at the pattern in my example, which I broke up into distinct parts, making it easier to see what each does.

Edit: Yours is a bit cleaner UEZ, although I think you don't need to escape the slash in your pattern. I wasn't aware of the _StringProper function. Could have saved me some work.

Edited October 22, 2010 by Tvern
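
To make the pattern Damein asks about a little less mysterious, here is a minimal sketch that rebuilds it from named pieces, in the same way Tvern's broken-up example does, and runs it on one URL. The variable names are made up for this illustration, the optional prefix is written as a non-capturing group with "?" and the dot before "html" is escaped, so this is a close variant of UEZ's pattern and not his exact one.

```autoit
; Sketch only: a variant of UEZ's pattern, split into commented pieces.
; All variable names here are hypothetical, chosen for this example.
Local $sUrl = "http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html"

Local $sFlags = "(?i)"                 ; ignore upper/lower case
Local $sPath = ".+/"                   ; everything up to and including the last slash
Local $sEpisode = "(?:episode-\d+-)?"  ; an optional "episode-<digits>-" prefix, not captured
Local $sTitle = "(.+)"                 ; the part we want: captured as the first group
Local $sId = "-\d+\.html"              ; a dash, the numeric id and a literal ".html"

Local $sPattern = $sFlags & $sPath & $sEpisode & $sTitle & $sId

Local $aMatch = StringRegExp($sUrl, $sPattern, 1) ; flag 1: return the captured groups as an array
If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    ConsoleWrite($aMatch[0] & @CRLF) ; prints: the-honey-buzz-principlegreed-steals-our-joy
EndIf
```

Note that the sub-title is still glued to the title in the captured text. Nothing in the URL marks where "principle" ends and "greed" begins, which is the limitation described above.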

UEZ Posted October 22, 2010

Quoting Damein:

    Alright, this is really close, thanks a lot UEZ. A couple of them didn't go right: http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html Comes out as: The Honey Buzz PrincipleGreed Steals Our Joy. Just a few of them do this, is there a reason for this? Thanks a bunch! I would try and do it myself.. but I have NO idea what all of this is: (?i).+\/(episode-\d+-)*(.+)-\d+.htm Since some of that isn't in the text document, lol

I don't know how to decide that

http://www.oneplace.com/player/paws-and-tales/episode-12-the-honey-buzz-principlegreed-steals-our-joy-138329.html

should become The Honey Buzz Principle! It is not clear to me how your "algorithm" is supposed to work!

Regular expressions are really not easy to understand, and you have to work with them for a long time to understand the logic behind them. Follow the advice from Tvern!

Br,
UEZ

Tvern Posted October 22, 2010

Based on the link in your other topic I came up with this method to get the names that go with each link:

    #include <array.au3>
    #include <file.au3>

    Global $aFile, $aLinks
    Global $sFilePath, $sSource

    $sFilePath = "Linkss.txt"
    $sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
    $aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
    If Not IsArray($aLinks) Then ;check if links were found
        MsgBox(0,"error","No links found!")
        Exit
    EndIf

    ;$aLinks alternates between href and link text, so pair them up: [n][0] = url, [n][1] = title
    Global $aLinkTitles[UBound($aLinks)/2][2]
    For $i = 0 To UBound($aLinks) -1 Step 2
        $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
        ;drop the "Episode #n: " prefix and everything from "â€" on (apparently the page's dash as it appears when the source is read as ANSI)
        $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
    Next
    _ArrayDisplay($aLinkTitles)

I think that should get you on your way. You might want to consider doing something with the additional information on the page though. Also you'll have to come up with a way to store this data efficiently. An ini file might do the trick.

Damein Posted October 22, 2010

Wow, that's nice Tvern. As for saving the data, I can't seem to get it to work. Using the variable $aLinkTitles seems to come up empty unless I use it in _ArrayDisplay. Here is what I changed it to:

    #include <array.au3>
    #include <file.au3>

    Global $aFile, $aLinks
    Global $sFilePath, $sSource, $IniFilePath

    $sFilePath = "Test.txt"
    $IniFilePath = "Names.ini"
    $sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
    $aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
    If Not IsArray($aLinks) Then ;check if links were found
        MsgBox(0,"error","No links found!")
        Exit
    EndIf

    Global $aLinkTitles[UBound($aLinks)/2][2]
    For $i = 0 To UBound($aLinks) -1 Step 2
        $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
        $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
    Next
    IniWrite($IniFilePath, "Names", "Episodes", $aLinkTitles)

And that just puts:

Episodes=

in the ini file, no names. I'll look into it further, but so far this is good. Thanks

Tvern Posted October 22, 2010 (edited)

Arrays don't display as text if you refer to them without using an index. Each element is like a normal variable and can hold any data a normal variable can hold.

IniWriteSection will take a 2D array and write it to a section, while IniWrite writes one value at a time (apart from creating the section and key if needed). If you want to use IniWrite to write the file, you have to use it in a loop and perform an IniWrite for each element.

The benefit of IniWriteSection is that it is probably a little faster and that you can retrieve the data with a single IniReadSection. IniWrite has the benefit that you can save a lot of data about each title. I've made an example that just stores the url, but you could add a key to store a tooltip, summary, rating, or whatever you want. To read the values from the IniWrite sections back, you have to use IniReadSectionNames, and then an IniRead for each section to return the address.

This script has an example of both methods. Just pick the layout you like best, or come up with another layout yourself.

    #include <array.au3>
    #include <file.au3>

    Global $aFile, $aLinks
    Global $IniFilePath, $IniFilePath2, $sSource

    $IniFilePath = "Names.ini"
    $IniFilePath2 = "Names2.ini"
    $sSource = BinaryToString(InetRead("http://www.oneplace.com/ministries/paws-and-tales/listen/broadcast-archives.html")) ;download the page source
    $aLinks = StringRegExp($sSource,'(?i)(?s)<h5><a href="(.*?)">(.*?)</a></h5>',3) ;put all matching links in an array
    If Not IsArray($aLinks) Then ;check if links were found
        MsgBox(0,"error","No links found!")
        Exit
    EndIf

    Global $aLinkTitles[UBound($aLinks)/2][2]
    For $i = 0 To UBound($aLinks) -1 Step 2
        $aLinkTitles[$i/2][0] = StringReplace($aLinks[$i],"/ministries/paws-and-tales/listen/","/player/paws-and-tales/")
        $aLinkTitles[$i/2][1] = StringRegExpReplace($aLinks[$i+1],"(?i)(?:Episode #\d+: ){0,1}(.*?)(?:â€.*){0,1}","$1")
        IniWrite($IniFilePath2, $aLinkTitles[$i/2][1], "adress", $aLinkTitles[$i/2][0]) ;this saves each item as a separate section.
    Next
    ;this saves all items in one section. The last parameter is the start index: it defaults to 1,
    ;which would skip the first row of a 0-based array like this one, so start at 0.
    IniWriteSection($IniFilePath, "Links and Names", $aLinkTitles, 0)

Ps: I'm not sure if there is a limit on how long section names, key names, or key values can be. But it works, so I figure it's ok.

Edit: An additional benefit of using ini files is that you can't create duplicate entries, so if you append to the file, existing values will be overwritten with the same value, while new values are added.

Edited October 22, 2010 by Tvern
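
For completeness, reading the data back could look roughly like the sketch below. It assumes the two files were written by the script above, i.e. "Names.ini" holds one "Links and Names" section with url=title pairs and "Names2.ini" holds one section per title with the url under the "adress" key (spelled as in that script). The variable names are made up for this example.

```autoit
; Sketch only: read back both ini layouts written by the script above.

; Layout 1: a single section of url=title pairs (written with IniWriteSection)
Local $aPairs = IniReadSection("Names.ini", "Links and Names")
If @error Then
    ConsoleWrite("Could not read the section from Names.ini" & @CRLF)
Else
    ; $aPairs[0][0] is the number of pairs; [n][0] is the key (url), [n][1] the value (title)
    For $i = 1 To $aPairs[0][0]
        ConsoleWrite($aPairs[$i][1] & " -> " & $aPairs[$i][0] & @CRLF)
    Next
EndIf

; Layout 2: one section per title, with the url stored under the "adress" key (written with IniWrite)
Local $aTitles = IniReadSectionNames("Names2.ini")
If @error Then
    ConsoleWrite("Could not read section names from Names2.ini" & @CRLF)
Else
    ; $aTitles[0] is the number of sections; each further element is a section name (title)
    For $i = 1 To $aTitles[0]
        Local $sUrl = IniRead("Names2.ini", $aTitles[$i], "adress", "")
        ConsoleWrite($aTitles[$i] & " -> " & $sUrl & @CRLF)
    Next
EndIf
```

The first layout needs only one read call for the whole list, while the second leaves room for extra keys per title, such as a summary or rating, at the cost of one IniRead per value.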