XeroFx Posted April 18, 2012 (edited)

Ok, so I'm working on a school project about companies and where their headquarters are located. The goal is to create either a heat map or a cluster map showing where companies are headquartered, created, etc.

I have used Wikipedia to get the company names. What I'm looking to do now is pull the headquarters location out of each company's Wikipedia page.

Now, I know that I'm going to need to use something like what somdcomputerguy posted:

#include <INet.au3>
$aPasses = StringRegExp(_INetGetSource('http://www.generate-password.com'),"value=........", 3)
MsgBox(0, "Generated Passwords", @TAB & StringReplace($aPasses[0],'value="', "") & ' : ' & StringReplace($aPasses[1],'value="', ""))

However, I will modify the script to take whatever is in my clipboard and use it as the website passed to _INetGetSource (when I press a hotkey). That part I've got down. The issue I'm having is that I don't fully understand how StringRegExp works...

Here is an example Wikipedia page that I would like to pull information out of: Wikipedia:AGCO

I took a gander at the page structure, and there is no specific name for the headquarters field other than "Headquarters". However, the string after it can differ by MANY different letters and marks...

Information:

<tr class="">
<th scope="row" style="text-align: left;">Headquarters</th>
<td class="label" style=""><a href="/wiki/Duluth,_Georgia" title="Duluth, Georgia">Duluth</a>, <a href="/wiki/Georgia_%28U.S._state%29" title="Georgia (U.S. state)">Georgia</a>, <a href="/wiki/USA" title="USA" class="mw-redirect">USA</a></td>
</tr>

The information that I would need is: "Duluth, Georgia, USA"

If someone could point me in a direction that would help me understand this better, possibly with some examples, or even write up a hint about what I need to do to get the headquarters working, I am pretty confident that I can get the other fields working.

Thank you. If you need more information, please let me know!
Edited April 18, 2012 by XeroFx
Realm Posted April 18, 2012

Hello XeroFx,

I am no expert with SRE either, and there is probably a better way to gather your information. However, after testing this SRE on the full Wiki page that you provided, this example worked as expected:

#include <Array.au3> ; needed for _ArrayDisplay

; Sample row, with the lines separated by @CRLF
$text = '<tr class="">' & @CRLF _
    & '<th scope="row" style="text-align: left;">Headquarters</th>' & @CRLF _
    & '<td class="label" style=""><a href="/wiki/Duluth,_Georgia" title="Duluth, Georgia">Duluth</a>, <a href="/wiki/Georgia_%28U.S._state%29" title="Georgia (U.S. state)">Georgia</a>, <a href="/wiki/USA" title="USA" class="mw-redirect">USA</a></td>' & @CRLF _
    & '</tr>'

; \r\n matches the @CRLF line breaks; only the three (.*?) groups are returned
$sre = StringRegExp($text, '<tr class="">\r\n<th (?:.*?)>Headquarters</th>\r\n<td (?:.*?)" title="(?:.*?)">(.*?)</a>, <a href="(?:.*?)" title="(?:.*?)">(.*?)</a>, <a href="(?:.*?)" title="(?:.*?)" class="(?:.*?)">(.*?)</a></td>\r\n</tr>', 3)
If @error Then ConsoleWrite( '- Error: ' & @error &', Extended: ' & @extended & @LF )
_ArrayDisplay($sre)

My Contributions:
Unix Timestamp: Calculate Unix time, or seconds since Epoch, accounting for your local timezone and daylight savings time.
RegEdit Jumper: A Small & Simple interface based on Yashied's Reg Jumper Function, for searching Hives in your registry.
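Since the number of links in the Headquarters cell can vary from company to company, one possible alternative is to capture the whole cell first and then strip the tags from it. The following is only a minimal sketch, assuming the Headquarters <th> is followed directly by a single <td> as in the sample row from this thread; the variable names ($sHtml, $aCell, $sPlace) are hypothetical and it has not been checked against other Wikipedia pages.

```autoit
; Sample row from this thread (an assumption about the page layout)
Local $sHtml = '<tr class="">' & @CRLF _
        & '<th scope="row" style="text-align: left;">Headquarters</th>' & @CRLF _
        & '<td class="label" style=""><a href="/wiki/Duluth,_Georgia" title="Duluth, Georgia">Duluth</a>, <a href="/wiki/Georgia_%28U.S._state%29" title="Georgia (U.S. state)">Georgia</a>, <a href="/wiki/USA" title="USA" class="mw-redirect">USA</a></td>' & @CRLF _
        & '</tr>'

; Step 1: capture everything inside the <td> that follows the Headquarters header.
; (?s) lets . match line breaks; \s* tolerates any whitespace between the tags.
Local $aCell = StringRegExp($sHtml, '(?s)>Headquarters</th>\s*<td[^>]*>(.*?)</td>', 1)
If @error Then
    ConsoleWrite('- No Headquarters row found, @error = ' & @error & @LF)
Else
    ; Step 2: remove every <...> tag so only the visible text remains.
    Local $sPlace = StringRegExpReplace($aCell[0], '<[^>]*>', '')
    ConsoleWrite($sPlace & @LF)
EndIf
```

For the sample row this should write Duluth, Georgia, USA to the console, regardless of how many links the cell contains.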
XeroFx (Author) Posted April 18, 2012

Thank you for such a quick response, I'll test this as soon as I get home tomorrow!

So, when you are not sure of the length of an item, you use (?:.*?), correct?
Realm Posted April 18, 2012 (edited)

Not exactly.

?: at the start of a group tells SRE not to include what that group matches in your results (a non-capturing group).
. will match any single character except newline (@LF).
* tells it to repeat the previous criteria zero or more times, in this case more single characters.
? when placed after a repeating character, will find the smallest match.

Edit: Extra info: If we didn't include the ending '?', it would have given us the largest possible match and, in your case, unexpected results. For example:

#include <Array.au3> ; needed for _ArrayDisplay

$text = 'Test Text<Need This Text>and <Do not Need this text>'
; Greedy: .* runs to the last '>' in the string
$SRE = StringRegExp($text, 'Test Text<(.*)>', 1)
_ArrayDisplay($SRE)
; Lazy: .*? stops at the first '>'
$SRE = StringRegExp($text, 'Test Text<(.*?)>', 1)
_ArrayDisplay($SRE)

The first example returns: Need This Text>and <Do not Need this text

When we instruct it to return the shortest match by adding the '?' after the repeating character '*', we get: Need This Text

Edited April 18, 2012 by Realm
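To see what ?: changes in practice, the two patterns below can be compared on a fragment of the sample link. This is only an illustrative sketch; the variable names ($sText, $aBoth, $aOne) are made up for the example.

```autoit
Local $sText = 'title="Duluth, Georgia">Duluth</a>'

; Both groups capture, so the array holds the title text and the link text.
Local $aBoth = StringRegExp($sText, 'title="(.*?)">(.*?)</a>', 3)
ConsoleWrite(UBound($aBoth) & ': ' & $aBoth[0] & ' | ' & $aBoth[1] & @LF)

; The first group is non-capturing, so only the link text is returned.
Local $aOne = StringRegExp($sText, 'title="(?:.*?)">(.*?)</a>', 3)
ConsoleWrite(UBound($aOne) & ': ' & $aOne[0] & @LF)
```

The first call should report 2 elements (Duluth, Georgia | Duluth) and the second only 1 (Duluth). The group still has to match in both cases; ?: only keeps its text out of the returned array.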