[ANGRY SOLVE] Regular Expression Help


I have a page source, pulled from a website, and I need to find every occurrence of <br /> and </p> so that I can pull out the text.

It would be ideal if I could find ="def_p"> and then grab everything after it up to the next </p>, but that doesn't seem to work. Just finding <br /> and </p> works great, though, because only the text I want is wrapped in them.

So, did I type this right?

StringRegExp($source, '(.*?)<[[br /]|[/p]]>', 1, $nOffset)

or, is it:

StringRegExp($source, '(.*?)<[br|/p]>', 1, $nOffset)

or, is it:

StringRegExp($source, '(.*?)[<br /> | </p>]', 1, $nOffset)

Thanks!

Edited by zackrspv

-_-------__--_-_-____---_-_--_-__-__-_ ^^€ñ†®øÞÿ ë×阮§ wï†høµ† ƒë@®, wï†høµ† †ïmë, @ñd wï†høµ† @ †ïmïdï†ÿ ƒø® !ïƒë. €×阮 ñø†, bµ† ïñ§†ë@d wï†hïñ, ñ@ÿ, †h®øµghøµ† †hë 맧ëñ§ë øƒ !ïƒë.
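A note on the three attempts above: square brackets define a character class (a set of single characters), not a list of alternatives, so [br|/p] matches exactly one character out of b, r, |, / and p. Alternatives belong in a group separated by |. A minimal sketch, assuming a made-up sample string ($sSample is hypothetical and not taken from the real page):

```autoit
; Hypothetical sample input; not taken from the real page.
Local $sSample = "first line<br />second line</p>"

; (?:...|...) is a non-capturing group holding the two alternatives.
; Flag 3 returns every match of the capturing group as an array.
Local $aParts = StringRegExp($sSample, "(?s)(.*?)(?:<br />|</p>)", 3)

If @error Then
    ConsoleWrite("No match" & @CRLF)
Else
    For $i = 0 To UBound($aParts) - 1
        ConsoleWrite($i & ": " & $aParts[$i] & @CRLF)
    Next
EndIf
```

With this sample the array should hold the two pieces of text that precede <br /> and </p>.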

$pat="test1 <br />test"&@CRLF&"string and another "&@LF&" new "&@CR&" new line</p> test2"
$reg=StringRegExp($pat,"(?s)<br \/>(.)+<\/p>" , 2)
MsgBox(0 , "" , StringTrimLeft(StringTrimRight($reg[0],4),6))

Explanation:

(?s) = makes . match newlines as well :)

\/ = matches a literal /

(.)+ = matches one or more of any character except newlines (but with (?s) it matches newlines too ;))

Cheers:)
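One detail about (.)+ that is easy to miss: it repeats a one-character capturing group, so the group itself ends up holding only the last character matched. That is why the example above reads the whole match from $reg[0] and trims the tags off by hand. A sketch of an alternative, assuming the same test string: put the quantifier inside the group and make it lazy, so the group holds the text directly. The variable names $sPat and $aReg are only for illustration.

```autoit
Local $sPat = "test1 <br />test" & @CRLF & "string and another " & @LF & " new " & @CR & " new line</p> test2"

; (.+?) puts the quantifier inside the group; the ? makes it lazy,
; so it stops at the first </p> instead of the last one.
; Flag 1 returns the captured groups of the first match.
Local $aReg = StringRegExp($sPat, "(?s)<br />(.+?)</p>", 1)

If Not @error Then MsgBox(0, "", $aReg[0])
```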

Edited by alexmadman

Only two things are infinite, the universe and human stupidity, and I'm not sure about the former. -Albert Einstein
Practice makes perfect! But nobody's perfect, so why practice at all?
http://forum.ambrozie.ro

Can we see the website source you want to pull from? Maybe not the whole thing, but just the relevant area.

Per your request:

<div class="def_p">
         <p>Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.</p>

         <p style="font-style: italic">RSI has the best of the best.</p>                  <div class="tags">by <a href="/author.php?author=nerdish">nerdish</a> Jan 2, 2005 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 973915, 'http://www.urbanup.com/973915', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F973915&title=&notes=')">email it</a></div>
               </div>

I'll try the other solution as well, to see if I can get it to work. But if you can help me figure this out, that would be great. Each item I want to extract is wrapped in the div with class def_p.

Hi,

this way? (?<="def_p">)(.*)(?=<\/p>)

Mega

Scripts & functions Organize Includes Let Scite organize the include files

Yahtzee The game "Yahtzee" (Kniffel, DiceLion)

LoginWrapper Secure scripts by adding a query (authentication)

_RunOnlyOnThis UDF Make sure that a script can only be executed on ... (Windows / HD / ...)

Internet-Café Server/Client Application Open CD, Start Browser, Lock remote client, etc.

MultipleFuncsWithOneHotkey Start different funcs by hitting one hotkey different times

Hiya,

Thanks for the response, but that didn't pull up any results.

Hi,

???

This ...

<div class="def_p">
          <p>Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.</p>
 
          <p style="font-style: italic">RSI has the best of the best.</p>                 <div class="tags">by <a href="/author.php?author=nerdish">nerdish</a> Jan 2, 2005 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 973915, 'http://www.urbanup.com/973915', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F973915&title=&notes=')">email it</a></div>
                </div>

and you need this?

<p>Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.</p>
  
           <p style="font-style: italic">RSI has the best of the best.</p>

right?

Then this works for me : (?-i)(?s)(?<="def_p">)(.*)(?=<\/p>)

Mega

P.S.: I tested the pattern only in a tool not in Autoit.

Edited by Xenobiologist

Hum,

Implementing alexmadman's changes causes the script to close on its own as soon as it reaches the regular expression.

Here's the modified code:

Func sSearch()
    $source = ""
    $str = ""
    $str2 = ""
    GUICtrlSetState($Edit1, $GUI_SHOW)
    GUICtrlSetState($Combo1, $GUI_HIDE)
    GUICtrlSetState($Terms, $GUI_HIDE)
    GUICtrlSetState($Defs, $GUI_HIDE)
    GUICtrlSetState($comboLable1, $GUI_HIDE)
    GUICtrlSetState($Progress1, $GUI_SHOW)
    $item = StringReplace($item2, " ", "+")
    
    $source = (_INetGetSource("http://www.urbandictionary.com/define.php?term=" & $item))
    If StringInStr($source, "isn't defined", 0) > "0" Then
        GUICtrlSetData($Edit1, "Term not found, sorry, try doing a Google Search instead!")
        GUICtrlSetState($Progress1, $GUI_HIDE)
    Else
        GUICtrlSetData($Edit1, "Term found, loading definition!")
        For $i = 0 To 100
            GUICtrlSetData($Progress1, $i)
            Sleep(5)
            GUICtrlSetData($Progress1, 0)
        Next
        GUICtrlSetState($Progress1, $GUI_HIDE)
        $nOffset = 1
        $str = ""
        While 1
            $array = StringRegExp($source, "(?s)<br \/>(.)+<\/p>", 1, $nOffset)
            If @error = 0 Then
                $nOffset = @extended
            Else
                ExitLoop
            EndIf
            For $i = 0 To UBound($array) - 1
                $str = $array[$i]
                $str = StringRegExpReplace($str, "&#(.*?);", "")
                $str = StringRegExpReplace($str, "<(.*?)>", "")
                $str = StringRegExpReplace($str, "</(.*?)>", "")
                $str = StringRegExpReplace($str, "&(.*?);", "")
                $str = StringRegExpReplace($str, "\t", "")
            Next
        WEnd
        GUICtrlSetData($Edit1, $str)
        
    EndIf
EndFunc  ;==>sSearch

Why is it closing like that?
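One plausible cause (an assumption, not confirmed anywhere in this thread): with (?s), (.)+ runs greedily across the whole page and records a capture for every single character, which can exhaust the regex engine's stack on a large input and take the script down with it. A sketch that avoids the per-character capture and keeps the same offset loop; $sSource here is a hypothetical stand-in for the downloaded page:

```autoit
; Hypothetical stand-in for the downloaded page source.
Local $sSource = "<p>one<br />two</p><p>three<br />four</p>"
Local $nOffset = 1, $aMatch

While 1
    ; (.+?) captures the whole run once instead of once per character,
    ; and the lazy ? stops at the nearest </p>.
    $aMatch = StringRegExp($sSource, "(?s)<br />(.+?)</p>", 1, $nOffset)
    If @error Then ExitLoop ; 1 = no (more) matches, 2 = bad pattern
    $nOffset = @extended ; position to resume searching from
    ConsoleWrite($aMatch[0] & @CRLF)
WEnd
```

Checking @error before touching the array also matters: when nothing matches, the return value is not an array, and indexing it would stop the script.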

WARNING: LONG LONG POST. It includes most of the code I'm working with, the source I'm searching, and the site it comes from.

Then this works for me : (?-i)(?s)(?<="def_p">)(.*)(?=<\/p>)

P.S.: I tested the pattern only in a tool not in Autoit.

Hum, well, at least I get results, yay! However, it pulls out way too much. For example, the search I gave you was for 'RSI', and this is what your regex pulled:

Repetitive Strain Injury. 
The price you pay for over-indulging in a single form of entertainment.
         For goodness sake put down that mouse, Johnny! You'll give yourself RSI...               by CougarSW2 Nov 19, 2004 email it
                  
            
            
               
                  permalink:
                   del.icio.us
               
               
                  
                  Send to a friend
               
               
                  your email:
                  
               
               
                  their email:
                  
               
               
                  
                   send me the word of the day (it's free)
               
               
                  
                  
                     
                     
                        
                           
                           
                        
                     
                  
               
            
            
         
               
   




   2.
   RSI
   
      
         
         21 up, 6 down
         
      
   


   
   
      
         Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.
         RSI has the best of the best.                by nerdish Jan 2, 2005 email it
               
   




   3.
   RSI
   
      
         
         1 thumb up
         
      
   


   
   
      
         An acronym for Rapid Sequence Induction, an emergency medical procedure sometimes performed by paramedics prior to intubation on conscience patients. It is essentially a rapid anesthetization. 
         EMT: Resps are 6 chief!
Medic: Alright, we need to intubate. Is he still conscious?
EMT: Partially
Medic: Go ahead and set up for an RSI        tags ems emt paramedic intubation emergency         by pcbene C-ville, VA Dec 24, 2007 email it
               
   



   
   

<!--
google_ad_client = "pub-4733233155277872";
google_alternate_ad_url = "http://www.urbandictionary.com/asbackup_medrect.html";
google_ad_width = 300;
google_ad_height = 250;
google_ad_format = "300x250_as";
google_ad_type = "image";
//2007-03-31: define rectangle
google_ad_channel = "1113511432";
google_color_border = "FBFFEA";
google_color_bg = "FFF3DA";
google_color_link = "83AE84";
google_color_text = "000000";
google_color_url = "000000";
//-->

<script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js">


   



   4.
   RSi
   
      
         
         5 up, 5 down
         
      
   


   
   
      
         RSi - a small band from the Belfast area in Northern Ireland.

Currently un-signed, but have a manager.

Could be described as Space Rock or progressive


         Kid one : Hey dOOd. Heard of RSi?

Kid two : No but my older brother gets it all the time.

Note the sheer number of blank lines above; I'm not sure why that happens. Note also all the extra information it pulls. Here's the source:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"  "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Urban Dictionary: rsi</title>
<meta name="description" content="Urban Dictionary is a slang dictionary with your definitions. Define your world">
<style type="text/css">
<!--
   @import "http://static.urbandictionary.com/css/urban.css?1205139850";
   @import "http://static.urbandictionary.com/css/define.css?1205139850";
-->
</style>
<script type="text/javascript" language="javascript" src="http://static.urbandictionary.com/js/urban.js?1205139850"></script><script type="text/javascript" language="javascript" src="http://static.urbandictionary.com/js/thumbs3.js?1205139850"></script><script type="text/javascript" language="javascript" src="http://static.urbandictionary.com/js/share.js?1205139850"></script><link rel="search" type="application/opensearchdescription+xml" title="Urban Dictionary Search" href="http://www.urbandictionary.com/osd.xml" />
</head>
<body>
<div id="banner1">
   <div id="banner2">

      <div id="banner3">
         <div id="logo"><a href="/"><img src="http://static.urbandictionary.com/img/logo.gif" alt="urbandictionary.com" width="259" height="81"></a></div>
         <div id="form"><form method="get" action="http://www.urbandictionary.com/define.php" name="define">
            <table><tr><td><input type="text" name="term" size="30" tabindex="1" value="rsi"></td><td><input type="submit" value="search"></td></tr></table>
            <div id="tagline">Urban Dictionary is a slang dictionary with your definitions. <b>Define your world.</b></div>
         </form></div>
         <div id="topnav"><a href="/" class="active">browse</a> <a href="/daily.php">word of the day</a> <a href="/insert.php?word=rsi">add</a> <a href="/editor.php">edit</a> <a href="/book.php">new book</a> <a href="/news.php">press</a> <a href="/tools.php">tools</a> <a href="/chat.php">chat</a> <a href="/yesterday.php">newest</a></div>

         <div id="banner-color">&nbsp;</div>
         <div><img src="http://static.urbandictionary.com/img/banner.jpg" width="765" height="94"></div>
      </div>
   </div>
</div>

<div id="whole">
   <div id="content">
      <div id="subnav1">
         <div id="subnav2"><a href="/random.php">random</a> <a href="/browse.php?character=A">A</a> <a href="/browse.php?character=B">B</a> <a href="/browse.php?character=C">C</a> <a href="/browse.php?character=D">D</a> <a href="/browse.php?character=E">E</a> <a href="/browse.php?character=F">F</a> <a href="/browse.php?character=G">G</a> <a href="/browse.php?character=H">H</a> <a href="/browse.php?character=I">I</a> <a href="/browse.php?character=J">J</a> <a href="/browse.php?character=K">K</a> <a href="/browse.php?character=L">L</a> <a href="/browse.php?character=M">M</a> <a href="/browse.php?character=N">N</a> <a href="/browse.php?character=O">O</a> <a href="/browse.php?character=P">P</a> <a href="/browse.php?character=Q">Q</a> <a href="/browse.php?word=rsi" class="active">R</a> <a href="/browse.php?character=S">S</a> <a href="/browse.php?character=T">T</a> <a href="/browse.php?character=U">U</a> <a href="/browse.php?character=V">V</a> <a href="/browse.php?character=W">W</a> <a href="/browse.php?character=X">X</a> <a href="/browse.php?character=Y">Y</a> <a href="/browse.php?character=Z">Z</a> <a href="/browse.php?character=*">#</a></div>

      </div>
<table border="0" cellpadding="0" cellspacing="0" style="width: 805px">
<tr style="vertical-align: top">
<td style="width: 150px; background-color: #FFF3DA">

<div class="leftist-tabs">

<a href="http://www.amazon.com/gp/product/0740768751?ie=UTF8&tag=urbandictio08-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0740768751"><img src="http://static.urbandictionary.com/img/book-on-star.gif" width="137" height="105" border="0"/></a>

<a href="http://www.amazon.com/gp/product/0740768751?ie=UTF8&tag=urbandictio08-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0740768751">order <i>mo' urban</i></a><br/>
<a href="http://www.amazon.com/gp/product/0740768751?ie=UTF8&tag=urbandictio08-20&linkCode=as2&camp=1789&creative=9325&creativeASIN=0740768751">on amazon</a> and <a href="http://search.barnesandnoble.com/Mo-Urban-Dictionary/Aaron-Peckham/e/9780740768750/?itm=1&afsrc=1&lkid=J23965981&pubid=K134569&byo=1">b&amp;n</a>

<b>now shipping</b><br/>

</div>

<div style="padding: 10px;">
<ul class="leftist" id="leftist">
<li><a href="/define.php?term=RS+Ship">RS Ship</a><li><a href="/define.php?term=rs%2A">rs*</a><li><a href="/define.php?term=rs-resources">rs-resources</a><li><a href="/define.php?term=rs2">rs2</a><li><a href="/define.php?term=Rs2+Product">Rs2 Product</a><li><a href="/define.php?term=rs6">rs6</a><li><a href="/define.php?term=RSA">RSA</a><li><a href="/define.php?term=rsag">rsag</a><li><a href="/define.php?term=rsb">rsb</a><li><a href="/define.php?term=rsbo">rsbo</a><li><a href="/define.php?term=RSC">RSC</a><li><a href="/define.php?term=RsCheatNet">RsCheatNet</a><li><a href="/define.php?term=rsd">rsd</a><li><a href="/define.php?term=rsd+-+Real+Street+Drags">rsd - Real Street Drags</a><li><a href="/define.php?term=RSe">RSe</a><li><a href="/define.php?term=rsf">rsf</a><li><a href="/define.php?term=rsfx">rsfx</a><li><a href="/define.php?term=RSGB">RSGB</a><li><a href="/define.php?term=RSGC">RSGC</a><li><a href="/define.php?term=rsh">rsh</a><li><div class="active">rsi</div><li><a href="/define.php?term=rsk">rsk</a><li><a href="/define.php?term=Rskillz">Rskillz</a><li><a href="/define.php?term=RSL">RSL</a><li><a href="/define.php?term=RSM">RSM</a><li><a href="/define.php?term=RSM+International">RSM International</a><li><a href="/define.php?term=rsmami">rsmami</a><li><a href="/define.php?term=rsmv">rsmv</a><li><a href="/define.php?term=RSN">RSN</a><li><a href="/define.php?term=RSO">RSO</a><li><a href="/define.php?term=rsod">rsod</a><li><a href="/define.php?term=RSP">RSP</a><li><a href="/define.php?term=RSP%27d+off">RSP'd off</a><li><a href="/define.php?term=RSPCA">RSPCA</a><li><a href="/define.php?term=rspct">rspct</a><li><a href="/define.php?term=Rspecial">Rspecial</a><li><a href="/define.php?term=RSPW">RSPW</a><li><a href="/define.php?term=RSR">RSR</a><li><a href="/define.php?term=RSS">RSS</a><li><a href="/define.php?term=rssXz">rssXz</a><li><a href="/define.php?term=RST">RST</a></ul>

</div>
</td>

<td style="width: 465px; padding: 15px">
<table cellpadding="0" cellspacing="0" border="0" width="100%">
<tr valign="top">
   <td class="def_number" width="20">1.</td>
   <td class="def_word">RSI</td>
   <td class="def_thumbs">
      <table cellpadding="0" cellspacing="3" border="0" style="margin-left: auto"><tr>
         <td><a href="java script:void(0)" onclick="thumbs.click(906999, 1)"><img src="http://static.urbandictionary.com/thumbsup.gif" width="19" height="19" id="thumbs_906999_1_gif"></a></td>

         <td nowrap><span id="thumbs_906999"><strong>34</strong> up, <strong>3</strong> down</span></td>
         <td><a href="java script:void(0)" onclick="thumbs.click(906999, 0)"><img src="http://static.urbandictionary.com/thumbsdown.gif" width="19" height="19" id="thumbs_906999_0_gif"></a></td>
      </tr></table>
   </td>
</tr>
<tr>
   <td></td>

   <td colspan="2">
      <div class="def_p">
         <p>Repetitive Strain Injury. <br />
The price you pay for over-indulging in a single form of entertainment.</p>
         <p style="font-style: italic">&quot;For goodness sake put down that mouse, Johnny! You'll give yourself RSI...&quot;</p>                 <div class="tags">by <a href="/author.php?author=CougarSW2">CougarSW2</a> Nov 19, 2004 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 906999, 'http://www.urbanup.com/906999', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F906999&title=&notes=')">email it</a></div>

                  <div id="fold" style="display: none">
            <form onsubmit="share_send(this)" action="java script:void(0)" name="share"><input type="hidden" name="defid">
            <table width="100%" border="0" cellspacing="0" cellpadding="0">
               <tr>
                  <td class="fold-left"><span onclick="document.getElementById('permalink').click()">permalink:</span></td>
                  <td><input type="text" value="" onclick="this.focus(); this.select()" size="30" name="permalink" id="permalink"> <a href="java script:void(0)" id="delicious_fold">del.icio.us</a></td>
               </tr>

               <tr>
                  <td></td>
                  <td style="padding-top: 15px">Send to a friend</td>
               </tr>
               <tr>
                  <td class="fold-left">your email:</td>
                  <td><input type="text" size="30" name="yours" id="session_email"></td>
               </tr>

               <tr>
                  <td class="fold-left">their email:</td>
                  <td><input type="text" size="30" name="theirs"></td>
               </tr>
               <tr>
                  <td></td>
                  <td><input type="checkbox" name="subscribe"> send me the word of the day (it's free)</td>

               </tr>
               <tr>
                  <td></td>
                  <td>
                     <div class="height: 5px">&nbsp;</div>
                     <table>
                        <tr>
                           <td><input type="submit" value="Send message"></td>
                           <td class="never" id="share_status" width="180"></td>

                        </tr>
                     </table>
                  </td>
               </tr>
            </table>
            </form>
         </div>
               </div>
   </td>

</tr>


<tr valign="top">
   <td class="def_number" width="20">2.</td>
   <td class="def_word">RSI</td>
   <td class="def_thumbs">
      <table cellpadding="0" cellspacing="3" border="0" style="margin-left: auto"><tr>
         <td><a href="java script:void(0)" onclick="thumbs.click(973915, 1)"><img src="http://static.urbandictionary.com/thumbsup.gif" width="19" height="19" id="thumbs_973915_1_gif"></a></td>
         <td nowrap><span id="thumbs_973915"><strong>21</strong> up, <strong>6</strong> down</span></td>

         <td><a href="java script:void(0)" onclick="thumbs.click(973915, 0)"><img src="http://static.urbandictionary.com/thumbsdown.gif" width="19" height="19" id="thumbs_973915_0_gif"></a></td>
      </tr></table>
   </td>
</tr>
<tr>
   <td></td>
   <td colspan="2">
      <div class="def_p">
         <p>Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.</p>

         <p style="font-style: italic">RSI has the best of the best.</p>                  <div class="tags">by <a href="/author.php?author=nerdish">nerdish</a> Jan 2, 2005 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 973915, 'http://www.urbanup.com/973915', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F973915&title=&notes=')">email it</a></div>
               </div>
   </td>
</tr>


<tr valign="top">

   <td class="def_number" width="20">3.</td>
   <td class="def_word">RSI</td>
   <td class="def_thumbs">
      <table cellpadding="0" cellspacing="3" border="0" style="margin-left: auto"><tr>
         <td><a href="java script:void(0)" onclick="thumbs.click(2758354, 1)"><img src="http://static.urbandictionary.com/thumbsup.gif" width="19" height="19" id="thumbs_2758354_1_gif"></a></td>
         <td nowrap><span id="thumbs_2758354"><strong>1</strong> thumb up</span></td>

         <td><a href="java script:void(0)" onclick="thumbs.click(2758354, 0)"><img src="http://static.urbandictionary.com/thumbsdown.gif" width="19" height="19" id="thumbs_2758354_0_gif"></a></td>
      </tr></table>
   </td>
</tr>
<tr>
   <td></td>
   <td colspan="2">
      <div class="def_p">
         <p>An acronym for Rapid Sequence Induction, an emergency medical procedure sometimes performed by paramedics prior to intubation on conscience patients. It is essentially a rapid anesthetization. </p>

         <p style="font-style: italic">EMT: Resps are 6 chief!<br />
Medic: Alright, we need to intubate. Is he still conscious?<br />
EMT: Partially<br />
Medic: Go ahead and set up for an RSI</p>        <div class="tags">tags <a href="/define.php?term=ems">ems</a> <a href="/define.php?term=emt">emt</a> <a href="/define.php?term=paramedic">paramedic</a> <a href="/define.php?term=intubation">intubation</a> <a href="/define.php?term=emergency">emergency</a></div>       <div class="tags">by <a href="/author.php?author=pcbene">pcbene</a> C-ville, VA Dec 24, 2007 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 2758354, 'http://www.urbanup.com/2758354', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F2758354&title=&notes=ems+emt+paramedic+intubation+emergency+urbandictionary')">email it</a></div>

               </div>
   </td>
</tr>

<tr>
   <td></td>
   <td style="padding: 10px" colspan="2">
<center>
<script type="text/javascript"><!--
google_ad_client = "pub-4733233155277872";
google_alternate_ad_url = "http://www.urbandictionary.com/asbackup_medrect.html";
google_ad_width = 300;
google_ad_height = 250;
google_ad_format = "300x250_as";
google_ad_type = "image";
//2007-03-31: define rectangle
google_ad_channel = "1113511432";
google_color_border = "FBFFEA";
google_color_bg = "FFF3DA";
google_color_link = "83AE84";
google_color_text = "000000";
google_color_url = "000000";
//-->
</script>
<script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</center>

   </td>
</tr>

<tr valign="top">
   <td class="def_number" width="20">4.</td>
   <td class="def_word">RSi</td>
   <td class="def_thumbs">
      <table cellpadding="0" cellspacing="3" border="0" style="margin-left: auto"><tr>
         <td><a href="java script:void(0)" onclick="thumbs.click(2213018, 1)"><img src="http://static.urbandictionary.com/thumbsup.gif" width="19" height="19" id="thumbs_2213018_1_gif"></a></td>

         <td nowrap><span id="thumbs_2213018"><strong>5</strong> up, <strong>5</strong> down</span></td>
         <td><a href="java script:void(0)" onclick="thumbs.click(2213018, 0)"><img src="http://static.urbandictionary.com/thumbsdown.gif" width="19" height="19" id="thumbs_2213018_0_gif"></a></td>
      </tr></table>
   </td>
</tr>
<tr>
   <td></td>

   <td colspan="2">
      <div class="def_p">
         <p>RSi - a small band from the Belfast area in Northern Ireland.<br />
<br />
Currently un-signed, but have a manager.<br />
<br />
Could be described as <a href="/define.php?term=Space+Rock">Space Rock</a> or <a href="/define.php?term=progressive">progressive</a><br />

<br />
</p>
         <p style="font-style: italic">Kid one : Hey dOOd. Heard of RSi?<br />
<br />
Kid two : No but my older brother gets it all the time.</p>      <div class="tags">tags <a href="/define.php?term=rsi">rsi</a> <a href="/define.php?term=spacerock">spacerock</a> <a href="/define.php?term=progressive">progressive</a> <a href="/define.php?term=music">music</a> <a href="/define.php?term=belfast">belfast</a></div>         <div class="tags">by <a href="/author.php?author=mel2k7">mel2k7</a> Northern Ireland Jan 23, 2007 <a href="java script:void(0)" class="shareoff" onclick="share_toggle(this, 2213018, 'http://www.urbanup.com/2213018', 'http://del.icio.us/post?url=http%3A%2F%2Fwww.urbanup.com%2F2213018&title=&notes=rsi+spacerock+progressive+music+belfast+urbandictionary')">email it</a></div>

               </div>
   </td>
</tr>

</table><div id="strip" style="margin-top: 50px">
   <table>
      <tr id="pics"><td width="20%"><a href="/define.php?term=mi-24&i=1"><img src="http://media.urbandictionary.com/image/icon/mi-24-36293.jpg" width="64" height="37" border="0"></a></td><td width="20%"><a href="/define.php?term=poverty&i=1"><img src="http://media.urbandictionary.com/image/icon/poverty-35059.jpg" width="64" height="29" border="0"></a></td><td width="20%"><a href="/define.php?term=puppie&i=1"><img src="http://media.urbandictionary.com/image/icon/puppie-9801.jpg" width="64" height="48" border="0"></a></td><td width="20%"><a href="/define.php?term=keep+it+real&i=1"><img src="http://media.urbandictionary.com/image/icon/keepitreal-2047.jpg" width="64" height="46" border="0"></a></td><td width="20%"><a href="/define.php?term=super+chill&i=1"><img src="http://media.urbandictionary.com/image/icon/superchill-55866.jpg" width="64" height="48" border="0"></a></td></tr><tr><td><a href="/define.php?term=mi-24&i=1">mi-24</a></td><td><a href="/define.php?term=poverty&i=1">poverty</a></td><td><a href="/define.php?term=puppie&i=1">puppie</a></td><td><a href="/define.php?term=keep+it+real&i=1">keep it real</a></td><td><a href="/define.php?term=super+chill&i=1">super chill</a></td></tr>   </table>

</div>

<td id="right2" style="width: 130px; background-color: #FFF3DA">
<div style="margin: 10px 5px">
<div style="margin: 0 auto; text-align: center">

<script type="text/javascript"><!--
google_ad_client = "pub-4733233155277872";
google_alternate_ad_url = "http://www.urbandictionary.com/asbackup_skyscraper.html";
google_ad_width = 160;
google_ad_height = 600;
google_ad_format = "160x600_as";
google_ad_type = "text_image";
//2006-12-04: wide skyscraper
google_ad_channel = "9172383876";
google_color_border = "FFF3DA";
google_color_bg = "FFF3DA";
google_color_link = "DE5F25";
google_color_text = "000000";
google_color_url = "000000";
//--></script>
<script type="text/javascript"
  src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

</div>
</div>
</td>
</tr>

</table>

<div id="bottomnav"><a href="/">browse</a> <a href="/daily.php">word of the day</a> <a href="/insert.php?word=rsi">add</a> <a href="/editor.php">edit</a> <a href="/book.php">new book</a> <a href="/news.php">press</a> <a href="/tools.php">tools</a> <a href="/yesterday.php">newest</a></div>   </div>

   
   <div id="footer"><a href="http://www.urbandictionary.com/">Urban Dictionary</a> is not appropriate for all audiences. &copy;1999-2008. <a href="http://www.urbandictionary.com/tos.php">terms of service</a> | <a href="http://www.urbandictionary.com/feedback.php">feedback</a> | <a href="http://www.urbandictionary.com/tools.php">tools</a> | <a href="http://www.urbandictionary.com/tools_rss.php">rss</a> | <a href="https://adwords.google.com/select/OnsiteSignupLandingPage?client=ca-pub-4733233155277872&referringUrl=http://www.urbandictionary.com">advertise</a></div>

   <!-- 6 -->

</div>
</body>
<script type="text/javascript" src="http://www.google-analytics.com/urchin.js"></script>
<script>
 _uacct="UA-31805-1";
 _udn="urbandictionary.com";
 urchinTracker();
</script>
<!-- Start Quantcast tag -->
<script type="text/javascript" src="http://edge.quantserve.com/quant.js"></script>
<script type="text/javascript">
_qacct="p-77H27_lnOeCCI";quantserve();</script>
<noscript>
<img src="http://pixel.quantserve.com/pixel/p-77H27_lnOeCCI.gif" style="display: none" height="1" width="1" alt="Quantcast"/></noscript>

<!-- End Quantcast tag -->
</html>
<script src="/share_load.php"></script><script src="http://www.urbandictionary.com/thumbs_load.php?defid=906999,973915,2758354,2213018"></script>

Here's the function that I have to help pull that information out:

Func sSearch()
    $source = ""
    $str = ""
    $str2 = ""
    GUICtrlSetState($Edit1, $GUI_SHOW)
    GUICtrlSetState($Combo1, $GUI_HIDE)
    GUICtrlSetState($Terms, $GUI_HIDE)
    GUICtrlSetState($Defs, $GUI_HIDE)
    GUICtrlSetState($comboLable1, $GUI_HIDE)
    GUICtrlSetState($Progress1, $GUI_SHOW)
    $item = StringReplace($item2, " ", "+")
    
    $source = (_INetGetSource("http://www.urbandictionary.com/define.php?term=" & $item))
    If StringInStr($source, "isn't defined", 0) > "0" Then
        GUICtrlSetData($Edit1, "Term not found, sorry, try doing a Google Search instead!")
        GUICtrlSetState($Progress1, $GUI_HIDE)
    Else
        GUICtrlSetData($Edit1, "Term found, loading definition!")
        For $i = 0 To 100
            GUICtrlSetData($Progress1, $i)
            Sleep(5)
            GUICtrlSetData($Progress1, 0)
        Next
        GUICtrlSetState($Progress1, $GUI_HIDE)
        $nOffset = 1
        $str = ""
        While 1
            ; Lazy (.*?) stops at the first </p>; the greedy (.*) ran on to the last </p> in the page.
            $array = StringRegExp($source, '(?-i)(?s)(?<="def_p">)(.*?)(?=<\/p>)', 1, $nOffset)
            If @error = 0 Then
                $nOffset = @extended
            Else
                ExitLoop
            EndIf
            For $i = 0 To UBound($array) - 1
                $str2 = $array[$i]
                $str2 = StringRegExpReplace($str2, "<[^>]*>", "") ; strip tags such as <p> and <br />
                $str2 = StringRegExpReplace($str2, "&#?\w+;", "") ; strip entities such as &amp; or &#39;
                $str2 = StringRegExpReplace($str2, "\t", "")
                ; Append instead of assigning, so earlier definitions are not overwritten.
                $str &= StringStripWS($str2, 3) & @CRLF & @CRLF
            Next
        WEnd
        GUICtrlSetData($Edit1, $str)
        
    EndIf
EndFunc  ;==>sSearch

So, then, why, oh why, is it not pulling exactly what I need?


Hi,

Just post the source and the expected result. Then we can do the rest.

Mega


Hiya,

I posted the source above.

As for what I need out of it: I need to pull the definition of the term, as defined in the source. For example, searching for 'RSI' returns:

<div class="def_p">
         <p>Repetitive Strain Injury. <br />
The price you pay for over-indulging in a single form of entertainment.</p>

...

<div class="def_p">
         <p>Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.</p>

         <p style="font-style: italic">RSI has the best of the best.</p>

...

<div class="def_p">
         <p>An acronym for Rapid Sequence Induction, an emergency medical procedure sometimes performed by paramedics prior to intubation on conscience patients. It is essentially a rapid anesthetization. </p>

         <p style="font-style: italic">EMT: Resps are 6 chief!<br />
Medic: Alright, we need to intubate. Is he still conscious?<br />

...

 <div class="def_p">
         <p>RSi - a small band from the Belfast area in Northern Ireland.<br />
<br />
Currently un-signed, but have a manager.<br />

I only want to see:

Repetitive Strain Injury.
---The price you pay for over-indulging in a single form of entertainment.

Research Science Institute. Insanely prestigious summer research program for rising high school seniors at MIT and CalTech. Practically impossible to gain admission.
---RSI has the best of the best.

An acronym for Rapid Sequence Induction, an emergency medical procedure sometimes performed by paramedics prior to intubation on conscience patients. It is essentially a rapid anesthetization.
---EMT: Resps are 6 chief!
---Medic: Alright, we need to intubate. Is he still conscious?

RSi - a small band from the Belfast area in Northern Ireland.
---Currently un-signed, but have a manager.

Note, I don't necessarily need to see the tidbits after the ---'s, but if I can get them, fine. I really just need the main description. The whole project revolves around pulling just the definitions, but the quotes and examples are nice too.

Note, again, I do not wish to use an IE COM object to display the page; I just want to see the data.

Thanks


Hi,

is this already enough? I saved your source in source.txt.

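#include <Array.au3>
; FileRead accepts a file name directly, so no FileOpen handle is left open.
Global $source = FileRead(@ScriptDir & '\source.txt')
MsgBox(0, 0, $source)

Global $re_A = _getIt($source)
_ArrayDisplay($re_A)

Func _getIt($text)
    ; Flag 3 returns every match; the lazy (.*?) stops at the first </p> after each "def_p">.
    Local $re = StringRegExp($text, '(?-i)(?s)"def_p">(.*?)<\/p>', 3)
    If @error Then Return -1
    Return $re
EndFunc   ;==>_getIt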
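
If it helps, here is a rough sketch of the clean-up step that could follow the match, turning each captured block into the "---" layout asked for above. It is only an illustration: _CleanDef is a made-up helper name, and the sample string is cut down from the source posted earlier so the snippet runs without source.txt.

```autoit
; Sample input, shortened from the posted source (two definitions only).
Global $sSample = '<div class="def_p">' & @CRLF & _
        '         <p>Repetitive Strain Injury. <br />' & @CRLF & _
        'The price you pay for over-indulging in a single form of entertainment.</p>' & @CRLF & _
        '<div class="def_p">' & @CRLF & _
        '         <p>Research Science Institute.</p>'

; Flag 3 = return all matches; the lazy (.*?) stops at the first </p>.
Global $aDefs = StringRegExp($sSample, '(?s)"def_p">(.*?)</p>', 3)
If @error Then
    ConsoleWrite("No definitions found" & @CRLF)
    Exit
EndIf

For $i = 0 To UBound($aDefs) - 1
    ConsoleWrite(_CleanDef($aDefs[$i]) & @CRLF & @CRLF)
Next

; _CleanDef is a hypothetical helper, not part of any standard UDF.
Func _CleanDef($sHtml)
    ; Collapse each run of <br /> tags (with surrounding whitespace) into one line break plus "---".
    Local $s = StringRegExpReplace($sHtml, '(?:\s*<br\s*/?>\s*)+', @CRLF & '---')
    ; Drop every remaining tag, for example the opening <p>.
    $s = StringRegExpReplace($s, '<[^>]*>', '')
    ; Trim leading and trailing whitespace (flags 1 + 2).
    Return StringStripWS($s, 3)
EndFunc   ;==>_CleanDef
```

Because the pattern stops at the first </p>, this only picks up the main definition paragraph. The italic example paragraph that follows it in the real page is not captured, which should be fine if the definitions are the main goal.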

Mega
