Need help with webparse.


Hey, I need to strip out the <> tags and everything inside them.

Please show me how if you can.

I have tried StringReplace, but it needs a wildcard.

#include <array.au3>
#include <string.au3>
#include <IE.au3>
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1,1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

Func _Run1()
    $aNameArray  = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')
    _ArrayDisplay($aNameArray)
EndFunc
_IEQuit ($oIE)

;need to strip everything within and including <>  and strip *
Edited by John117

Solved:

#include <array.au3>
#include <string.au3>
#include <IE.au3>
Dim $aNameArray
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1,1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

Func _Run1()
    $aNameArray  = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')
    For $i = 0 to UBound($aNameArray) -1
        ; (?s) lets a tag span line breaks; the \* alternative also removes the asterisks
        $aNameArray[$i] = StringRegExpReplace($aNameArray[$i], '(?s)<.*?>|\*', "", 0)
    Next
EndFunc
_ArrayDisplay($aNameArray)
_IEQuit ($oIE)

;need to strip everything within and including <>  and strip *
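
One case the script above does not guard against is the page containing no Synonyms block at all. As far as I know, _StringBetween sets @error when nothing is found, so checking it before touching the result is cheap. Here is a rough sketch on a hard-coded sample string, so it runs without IE. The helper name _SynonymsAsText and the sample HTML are made up for illustration and are not from the thread.

```autoit
#include <String.au3>

; Hypothetical helper (not from the thread): returns the first Synonyms block
; as plain text, or sets @error and returns "" when the markers are missing.
Func _SynonymsAsText($sHtml)
    Local $aFound = _StringBetween($sHtml, '<b>Synonyms:</b></td>', '</span></td>')
    If @error Then Return SetError(1, 0, "")
    ; Remove every <...> tag (even across line breaks) and every "*".
    Return StringRegExpReplace($aFound[0], "(?s)<.*?>|\*", "")
EndFunc   ;==>_SynonymsAsText

; Made-up sample that only imitates the shape of the real page.
Local $sSample = '<b>Synonyms:</b></td><td><span><a href="#">man</a>*, <a href="#">guy</a></span></td>'
Local $sText = _SynonymsAsText($sSample)
If @error Then
    ConsoleWrite("No Synonyms block found" & @CRLF)
Else
    ConsoleWrite($sText & @CRLF)
EndIf
```

With the sample above this should print the two words separated by a comma. The real page markup may differ, so treat the start and end markers as assumptions.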

You could also do something like:
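; Assumes $sHTML already holds the page source, as in the script above.
_get_synonyms($sHTML)

Func _get_synonyms($s_html)
    ; Drop the Antonyms block first so its links are not picked up.
    $s_html = StringRegExpReplace($s_html, "(?s)(?i)<b>Antonyms:</b></td>.*?</span></td>", "")
    ; Flag 3 returns all matches: the word at the end of each /browse/ link.
    Local $a_name_array = StringRegExp($s_html, "(?s)(?i)\x22http://thesaurus\.reference\.com/browse/(\w+)\x22", 3)
    _ArrayDisplay($a_name_array)
EndFunc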
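
To see what the StringRegExp call with flag 3 hands back, here is a small sketch that runs on a hard-coded string instead of the live page. The sample HTML and the variable names are made up for illustration. The idea is the same as above: capture the word at the end of each /browse/ link and collect all captures in one array.

```autoit
; Made-up sample that only imitates the links on the real page.
Local $sSample = '<a href="http://thesaurus.reference.com/browse/man">man</a>, ' & _
        '<a href="http://thesaurus.reference.com/browse/guy">guy</a>'

; Flag 3 = return an array of all global matches (here: the captured words).
Local $aWords = StringRegExp($sSample, '(?i)"http://thesaurus\.reference\.com/browse/(\w+)"', 3)
If @error Then
    ConsoleWrite("No links matched" & @CRLF)
Else
    For $i = 0 To UBound($aWords) - 1
        ConsoleWrite($aWords[$i] & @CRLF)
    Next
EndIf
```

Checking @error matters here because StringRegExp does not return an array when nothing matches.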


Well... here's my approach...

#include <array.au3>
#include <string.au3>
#include <IE.au3>
$Subject = "Male"
$oIE = _IECreate("http://thesaurus.reference.com/browse/" & $Subject, 0, 1, 1, 1)
$sHTML = _IEDocReadHTML($oIE)
_Run1()

Func _Run1()
    $aNameArray = _StringBetween($sHTML, '<b>Synonyms:</b></td>', '</span></td>')

    For $x = 0 To UBound($aNameArray) - 1
        $aNameArray[$x] = StringReplace($aNameArray[$x], "<TD><SPAN>", "")
        $aNameTemp = ""
        $split = StringSplit($aNameArray[$x], @CRLF)
        For $i = 1 To UBound($split) - 1
            $aNameTemp &= __Stringbetween($split[$i], ">", "</A>")
        Next
        $aNameArray[$x] = $aNameTemp
    Next

    _ArrayDisplay($aNameArray)
EndFunc   ;==>_Run1
_IEQuit($oIE)

;need to strip everything within and including <>  and strip *

Func __StringBetween($s_String, $s_Start, $s_End = 0)
    $s_Start = StringInStr($s_String, $s_Start) + StringLen($s_Start)
    Return StringMid($s_String, $s_Start, StringInStr($s_String, $s_End) - $s_Start)
EndFunc   ;==>__StringBetween

... a little late, but it works

8)


I don't understand:

;need to strip everything within and including <> and strip *

What do you mean by "strip *"?

Dale


Sorry guys, after marking it as solved, I didn't check back on it.

As soon as I get to my home PC (which has AutoIt), I will test out your methods to see what I can learn from them.

@Dale

By 'strip' I just meant remove: remove anything like <this>, including "this", "<" and ">", and also remove any "*".

Ex: <this>some stuff*<that><and the other>

Result: some stuff

I am just learning 'StringRegExpReplace' so it took me a while to come up with something.
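
For anyone landing here later, the example above can be checked with a one-line pattern. This is only a sketch, and the function name _StripTagsAndStars is made up for illustration. The pattern has two alternatives: a lazy match for anything between "<" and ">", and a literal "*".

```autoit
; Hypothetical helper (not from the thread): remove every <...> tag and every "*".
Func _StripTagsAndStars($sText)
    ; (?s) lets "." cross line breaks; ".*?" is lazy, so each tag is matched on its own.
    Return StringRegExpReplace($sText, "(?s)<.*?>|\*", "")
EndFunc   ;==>_StripTagsAndStars

; Prints: some stuff
ConsoleWrite(_StripTagsAndStars("<this>some stuff*<that><and the other>") & @CRLF)
```

Note that a regex like this is fine for simple snippets but will trip over a literal ">" inside an attribute value.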
