
Searching a String for Multiple Occurrences


bvr


Basically, I gather data from the Google Trends RSS feed into one string. I tell it to grab everything from the "<ol" tag onward, so it fetches the ordered list of the 10 trending keywords, each on its own line. The problem is that it's all one string, and I want to be able to search that string for words or phrases that occur more than once.

Of course there won't be repeats within the first hour, but I plan on gathering keywords every hour, comparing the new keywords to the old ones, finding the matches, and separating them out. It would be easy to search for a specific keyword; the problem is I never know what the keywords are going to be, so I can't specify what to search for. Could I search each line separately from the string, make each line a substring, or use an array? I'm kind of confused about how to handle the data.


  • Moderators

Hi, bvr. You're not the only one confused here :) It's hard to give suggestions on how to perform your search when you say you don't know what you'll be searching for.

"Profanity is the last vestige of the feeble mind. For the man who cannot express himself forcibly through intellect must do so through shock and awe" - Spencer W. Kimball

How to get your question answered on this forum!


Local $Url = 'http://www.google.com/trends/hottrends/atom/hourly'
Local $Html = BinaryToString(InetRead($Url))

; Get the <ol> code from Google
$Html = StringMid($Html, StringInStr($Html, '<ol'))

; Remove the javascript blocks
$Html = StringRegExpReplace($Html, '(?si)<script.+?script>', '')

; Retrieve the visible text
Local $Text = StringRegExpReplace($Html, '(?s)<.+?>', '')

; Get rid of the "]]>" that's left over after the tags are stripped
Local $Text1 = StringReplace($Text, "]]>", " ")

; Save the visible text
FileWrite("keywords.txt", @CRLF & $Text1)

So after I pull the string of keywords, it gets saved as one long string in a text file. I want to break that string up into one keyword per line, then search through those keywords every time I fetch the new keyword list, and group the recurring ones together to see which keywords are valuable. After I figure that out, I'm going to check the competition for those keywords and phrases. This is meant for SEO; I thought it would be fun to try after I found out how to pull data from the web.
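A minimal sketch of that comparison step, assuming each hourly scan has already been saved one keyword per line (the file names here are just placeholders):

#include <Array.au3>

; Hypothetical file names: the previous and current scans, one keyword per line
Local $aOld = StringSplit(StringStripWS(FileRead("keywords_old.txt"), 3), @CRLF, 1)
Local $aNew = StringSplit(StringStripWS(FileRead("keywords_new.txt"), 3), @CRLF, 1)

; Collect the keywords that appear in both scans
Local $sMatches = ""
For $i = 1 To $aNew[0]
    If _ArraySearch($aOld, $aNew[$i], 1) > 0 Then $sMatches &= $aNew[$i] & @CRLF
Next
MsgBox(0, "Recurring keywords", $sMatches)

Since the search compares whole lines rather than fixed words, there's no need to know the keywords ahead of time.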


StringSplit() gets you a nice array to play with. I'm not sure I'm clear enough on your goal to suggest whether another array column would be useful for keeping counts of how many times the strings are encountered, or whether you'd want to merge newer scans into this array, etc. (there's a sketch of the count-column idea after the code below).

#include <Array.au3>

Local $Url = 'http://www.google.com/trends/hottrends/atom/hourly'
Local $Html = BinaryToString(InetRead($Url))

; Get the <ol> code from Google
$Html = StringMid($Html, StringInStr($Html, '<ol'))

; Remove the javascript blocks
$Html = StringRegExpReplace($Html, '(?si)<script.+?script>', '')

; Retrieve the visible text
Local $Text = StringRegExpReplace($Html, '(?s)<.+?>', '')

; Get rid of the leftover "]]>" and trim surrounding whitespace
Local $Text1 = StringStripWS(StringReplace($Text, "]]>", ""), 3)

; Split into one keyword per element (flag 1 splits on the whole @CRLF pair)
Local $aText = StringSplit($Text1, @CRLF, 1)

; Display the keywords
_ArrayDisplay($aText)
;FileWrite("keywords.txt", @CRLF & $Text1)
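A rough sketch of the count-column idea, assuming a 2D master array where column 0 holds the keyword and column 1 holds how many scans it has appeared in ($aMaster, _MergeScan() and the sample data are all made up for illustration):

#include <Array.au3>

; Master list: row 0 is reserved, with $aMaster[0][0] holding the row count
Local $aMaster[1][2] = [[0, 0]]

; Fold one scan (a StringSplit-style array) into the master list
Func _MergeScan(ByRef $aList, Const ByRef $aScan)
    For $i = 1 To $aScan[0]
        Local $iRow = 0
        For $j = 1 To $aList[0][0] ; look for an existing entry
            If $aList[$j][0] = $aScan[$i] Then
                $iRow = $j
                ExitLoop
            EndIf
        Next
        If $iRow > 0 Then
            $aList[$iRow][1] += 1 ; seen before, bump the count
        Else
            ReDim $aList[UBound($aList) + 1][2] ; append a new row
            $aList[0][0] += 1
            $aList[$aList[0][0]][0] = $aScan[$i]
            $aList[$aList[0][0]][1] = 1
        EndIf
    Next
EndFunc

; Example with two fake scans: "banana" ends up with a count of 2
Local $aScan1 = StringSplit("apple" & @CRLF & "banana", @CRLF, 1)
Local $aScan2 = StringSplit("banana" & @CRLF & "cherry", @CRLF, 1)
_MergeScan($aMaster, $aScan1)
_MergeScan($aMaster, $aScan2)
_ArrayDisplay($aMaster)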

Here is Google's code: view-source:http://www.google.com/trends/hottrends/atom/hourly

The only way to tell the keywords apart would be by their classes. Some are medium, no change, low, and so on. So maybe I could just grab the medium or low keywords and then look for the ones that recur in new scans?
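Something like this might work as a starting point, though it's only a sketch: it assumes each trend sits in an element whose class attribute ("Medium", "Low", etc.) directly precedes the keyword text, which would need verifying against the actual page source:

Local $Url = 'http://www.google.com/trends/hottrends/atom/hourly'
Local $Html = BinaryToString(InetRead($Url))

; Flag 4 returns one sub-array per match: [0] = full match, [1] = class, [2] = text
Local $aMatches = StringRegExp($Html, '(?i)class="(\w+)"[^>]*>([^<]+)<', 4)
If Not @error Then
    For $i = 0 To UBound($aMatches) - 1
        Local $aPair = $aMatches[$i]
        If $aPair[1] = "Medium" Or $aPair[1] = "Low" Then ; keep only the hotness levels of interest
            ConsoleWrite($aPair[1] & @TAB & $aPair[2] & @CRLF)
        EndIf
    Next
EndIf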

