Dampe

Parsing html data..

5 posts in this topic

#1 ·  Posted (edited)

I'm trying to parse raw html into a big string, just leaving the important things behind.

Func _SearchForQuest()
    Local $GlobalQuestURL = "http://thottbot.com/?s="
    
    $tSearchParam = GUICtrlRead ($GUIInpts[5])
    If $tSearchParam <> "" Then
        
        _SetStatus("Connecting to " & $GlobalQuestURL & $tSearchParam)
        
        ;//Onu is meditating
        ;$tQuestResSource = _INetGetSource ($GlobalQuestURL & $tSearchParam)
        $tQuestResSource = _INetGetSource ($GlobalQuestURL & "Onu is meditating")
        _SetStatus("Retrieved data from " & $GlobalQuestURL & $tSearchParam)
        
        If StringInStr ($tQuestResSource, "No search results found.") Then
            GUICtrlSetData ($GUILblQuestDescription, "No search results found.")
        Else
            MsgBox (32, "test", $tQuestResSource)
        EndIf
        
        
    EndIf
    
EndFunc

Anyone want to give me a hand or point me in the right direction?


#3 ·  Posted (edited)

I'm trying to parse raw html into a big string, just leaving the important things behind.

What are those important things?

This just strips all HTML tags, leaving the text unchanged.

Global $oMyError = ObjEvent("AutoIt.Error", "COMError")


InetGet("http://thottbot.com/?s=", @ScriptDir & "\html.html")
$sSource = FileRead(@ScriptDir & "\html.html")
$sPlainText = _HTML_StripTags($sSource)
ConsoleWrite($sPlainText & @CRLF)


Func _HTML_StripTags($sHTML)
    If Not StringStripWS($sHTML, 8) Then Return SetError(1, 0, "")
    Local $oHTML = ObjCreate("HTMLFILE")
    If @error Then Return SetError(2, 0, "")
    $oHTML.Open()
    $oHTML.Write($sHTML)
    If Not $oHTML.Body.InnerText Then Return SetError(3, 0, "")
    Return SetError(0, 0, $oHTML.Body.InnerText)
EndFunc   ;==>_HTML_StripTags

Func COMError()
    MsgBox(16, "AutoItCOM Test", "We intercepted a COM Error !" & @CRLF & @CRLF & _
            "err.description is: " & @TAB & $oMyError.description & @CRLF & _
            "err.windescription:" & @TAB & $oMyError.windescription & @CRLF & _
            "err.number is: " & @TAB & Hex($oMyError.number, 8) & @CRLF & _
            "err.lastdllerror is: " & @TAB & $oMyError.lastdllerror & @CRLF & _
            "err.scriptline is: " & @TAB & $oMyError.scriptline & @CRLF & _
            "err.source is: " & @TAB & $oMyError.source & @CRLF & _
            "err.helpfile is: " & @TAB & $oMyError.helpfile & @CRLF & _
            "err.helpcontext is: " & @TAB & $oMyError.helpcontext _
            )
    SetError(1)
EndFunc   ;==>COMError

Why this?

Should it not be:

?

Yep.

Accident, only put that there on the forum so people knew what I was trying to do :)

Thanks, hopefully I'll be able to do something from there.


Well, I see you're just trying to get a certain string out of a thottbot.com page for WoW info. I made a model-edit project a while back that searched thottbot.com for a certain item's ID number, so the user could model-edit the appearance of the weapon or item they wanted.

Since there's a lot of useless info in all that HTML, I found a great function for parsing the source returned by _INetGetSource(): it finds the string between two known delimiters. Basically it's a modded StringBetween function. I never got on with the original StringBetween function (as I remember it, you needed to know the string you were searching for), but this one is perfect!

Here's the modded function I use to parse HTML:

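Func _StringBetween2($s, $from, $to)
    ; Find $from; StringInStr returns 0 when it is absent
    Local $iStart = StringInStr($s, $from)
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($from)
    ; Look for $to only after $from (the 5th argument is the start position)
    Local $iEnd = StringInStr($s, $to, 0, 1, $iStart)
    If $iEnd = 0 Then Return SetError(2, 0, "")
    Return StringMid($s, $iStart, $iEnd - $iStart)
EndFunc   ;==>_StringBetween2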

And here's an example of using it:

#include <Inet.au3> ; provides _INetGetSource()

$THOTTBOT = _INetGetSource("http://www.thottbot.com/")
$String = _StringBetween2($THOTTBOT, "<Title>", "</Title>")
MsgBox(1, "ThottBot.com", $String)

See how that works? It's just a simple example, but it parses the HTML source very fast.

It shows $String, which is Thottbot: World of Warcraft, in a MsgBox(). The same approach can be used to pull multiple strings out of the source, no matter how big it is, and Thottbot's source pages aren't too big anyway. Enjoy and good luck :)
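
If you need every match rather than only the first one, one option is the built-in StringRegExp with flag 3, which returns an array of all captured groups. This is only a rough sketch: the sample HTML string and the variable names are made up for illustration, and real Thottbot markup will need its own pattern.

```autoit
; Hypothetical sample markup; in practice this would come from _INetGetSource()
Local $sHtml = '<a href="/q1">Quest one</a><a href="/q2">Quest two</a>'

; Flag 3 = return an array of all global matches.
; (?i) makes the pattern case-insensitive; (.*?) is non-greedy,
; so each link's text is captured separately.
Local $aLinks = StringRegExp($sHtml, '(?i)<a[^>]*>(.*?)</a>', 3)

If @error Then
    ; StringRegExp sets @error when nothing matched
    ConsoleWrite("No matches found" & @CRLF)
Else
    For $i = 0 To UBound($aLinks) - 1
        ConsoleWrite($aLinks[$i] & @CRLF)
    Next
EndIf
```

Run against the sample string, this writes the text of each link on its own line. A regular expression is fine for pulling a few known fields out of a page, but it is not a full HTML parser.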


