John117

Need help with webparse.


John117

Hey, I need to remove the <>'s and all the info inside them.

Please show me how if you can.

I have tried StringReplace, but what I need is a wildcard.

#include <array.au3>
#include <string.au3>
#include <IE.au3>
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1,1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

Func _Run1()
    $aNameArray  = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')
    _ArrayDisplay($aNameArray)
EndFunc
_IEQuit ($oIE)

;need to strip everything within and including <>  and strip *

John117

Solved:

#include <array.au3>
#include <string.au3>
#include <IE.au3>
Dim $aNameArray
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1,1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

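Func _Run1()
    ; Grab the HTML between the "Synonyms:" label and the end of its cell
    $aNameArray  = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')
    For $i = 0 to UBound($aNameArray) -1
        ; Remove every <...> tag (even one that spans a line break) and every "*"
        $aNameArray[$i] = StringRegExpReplace($aNameArray[$i], '<[^>]*>|\*', "", 0)
    Next
EndFunc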
_ArrayDisplay($aNameArray)
_IEQuit ($oIE)

;need to strip everything within and including <>  and strip *

SmOke_N

You could also do something like:
_get_synonyms($sHTML)

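Func _get_synonyms($s_html)
    ; Drop the Antonyms block first so its links are not picked up as synonyms
    $s_html = StringRegExpReplace($s_html, "(?s)(?i)<b>Antonyms:</b></td>.*?</span></td>", "")
    ; Flag 3 returns an array of every word captured from a thesaurus link
    Local $a_name_array = StringRegExp($s_html, "(?s)(?i)\x22http://thesaurus.reference.com/browse/(\w+)\x22", 3)
    _ArrayDisplay($a_name_array)
EndFunc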
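To see what a `StringRegExp` call with flag 3 returns without launching IE, the same kind of pattern can be run against a small hand-written fragment. The fragment and the variable names below are made up for illustration; the real page markup may differ, and the dots in the URL are escaped here so they only match a literal ".".

```autoit
; Made-up fragment standing in for $sHTML (not the real page source)
Local $sSample = '<a href="http://thesaurus.reference.com/browse/man">man</a>, ' & _
        '<a href="http://thesaurus.reference.com/browse/boy">boy</a>'

; Flag 3 returns an array holding every capture-group match found in the string
Local $aWords = StringRegExp($sSample, '(?i)\x22http://thesaurus\.reference\.com/browse/(\w+)\x22', 3)

If @error Then
    ConsoleWrite("No matches" & @CRLF)
Else
    ; Print one captured word per line (man, then boy)
    For $i = 0 To UBound($aWords) - 1
        ConsoleWrite($aWords[$i] & @CRLF)
    Next
EndIf
```

Because the pattern anchors on the link URL rather than on the visible text, the words come out already free of tags, so no separate stripping step is needed.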


Valuater

Well... here's my approach...

#include <array.au3>
#include <string.au3>
#include <IE.au3>
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1, 1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

Func _Run1()
    $aNameArray = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')

    For $x = 0 To UBound($aNameArray) - 1
        $aNameArray[$x] = StringReplace($aNameArray[$x], "<TD><SPAN>", "")
        $aNameTemp = ""
        $split = StringSplit($aNameArray[$x], @CRLF)
        For $i = 1 To UBound($split) - 1
            $aNameTemp &= __Stringbetween($split[$i], ">", "</A>")
        Next
        $aNameArray[$x] = $aNameTemp
    Next

    _ArrayDisplay($aNameArray)
EndFunc   ;==>_Run1
_IEQuit($oIE)

;need to strip everything within and including <>  and strip *

Func __StringBetween($s_String, $s_Start, $s_End = 0)
    $s_Start = StringInStr($s_String, $s_Start) + StringLen($s_Start)
    Return StringMid($s_String, $s_Start, StringInStr($s_String, $s_End) - $s_Start)
EndFunc   ;==>__StringBetween
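The helper above can be tried offline on a single line. The sample line below is made up and only assumed to resemble one row of the synonym list. Note that both `StringInStr` calls search from the start of the line, so the helper returns the text between the *first* `>` and the *first* `</A>`; it relies on the link's closing `>` being the first one on the line.

```autoit
; Copy of Valuater's helper from the post above
Func __StringBetween($s_String, $s_Start, $s_End = 0)
    ; Position just past the first occurrence of the start marker
    $s_Start = StringInStr($s_String, $s_Start) + StringLen($s_Start)
    ; Take everything from there up to the first occurrence of the end marker
    Return StringMid($s_String, $s_Start, StringInStr($s_String, $s_End) - $s_Start)
EndFunc   ;==>__StringBetween

; Made-up line shaped like one row of the list (assumption: real markup may differ)
Local $sLine = '<A href="http://thesaurus.reference.com/browse/man">man</A>'
ConsoleWrite(__StringBetween($sLine, ">", "</A>") & @CRLF) ; prints: man
```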

... a little late, but it works

8)



DaleHohm

I don't understand:

;need to strip everything within and including <> and strip *

What do you mean by "strip *"?

Dale


John117

Sorry guys, after marking it as solved, I didn't check back on it.

As soon as I get to my home PC (the one with AutoIt), I will test out your methods to see what I can learn from them.

@Dale

By 'strip' I just meant remove. Remove anything like <this>, including "this", "<" and ">", and also any "*".

Ex: <this>some stuff*<that><and the other>

Result: some stuff

I am just learning 'StringRegExpReplace' so it took me a while to come up with something.
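For the example input above, a single pattern can remove both the tags and the asterisks. The helper name `_StripTagsAndStars` is hypothetical and not part of any post or UDF. It uses `[^>]*` rather than `.*?` because, without the `(?s)` option, `.` does not match a line break, so a tag split across two lines would survive `<(.*?)>`.

```autoit
; Hypothetical helper (not part of any UDF): removes every <...> tag and every "*",
; then trims leading and trailing whitespace.
Func _StripTagsAndStars($sText)
    ; <[^>]*>  matches "<", any run of characters other than ">", then ">"
    ; \*       matches a literal asterisk
    Local $sOut = StringRegExpReplace($sText, '<[^>]*>|\*', "")
    ; Flag 3 = strip leading (1) + trailing (2) whitespace
    Return StringStripWS($sOut, 3)
EndFunc   ;==>_StripTagsAndStars

; Usage with the example from this post
Local $sResult = _StripTagsAndStars("<this>some stuff*<that><and the other>")
ConsoleWrite($sResult & @CRLF) ; prints: some stuff
```

When the count argument is omitted (or 0), `StringRegExpReplace` replaces every match, so no loop over the characters of the string is needed.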
