Rorka

Rip text from a website?

Okay, I want to rip text from a website. Is this possible?

I want to get the version number which is shown like this:

Current Version: 0.2

So I only require the 0.2 part.

The website link is: Clicky

Thanks in advance


IE functions will do it.

Read the helpfile: User Defined Function ==> IE Management

Mat

Edit: But InetGet is better

Edited by Mat

Try that (worked for me):

#AutoIt3Wrapper_Change2CUI=y

#include <INet.au3>

Global Const $URL = "http://www.mmowned.com/forums/bots-programs/266946-release-beta-theexplorer-updated-04-11-09-a.html"
ConsoleWrite("Loading URL: " & $URL & @CRLF)

Global $Source = _InetGetSource($URL)
If Not StringLen($Source) Then
    ;MsgBox(16, @ScriptName, StringFormat("Error loading the following page.\n\nURL: %s", $URL))
    ConsoleWrite("Error loading URL." & @CRLF) ; '" & $URL & "'" & @CRLF)

    Exit
EndIf

; Find posts div
Global $CurrentVersion = ""
Global $postPos = 0
Global $divPos = StringRegExp($Source,'<div[^>]+id\s*=\s*"posts"[^>]*>', 0)
If $divPos Then
    $divPos = @extended
    ; Find first post
    Global $PostID = StringRegExp($Source, '<div[^>]+id\s*=\s*"post_message_(\d+)"[^>]*>', 1, $divPos)
    $postPos = @extended

    If Not @error And StringIsInt($PostID[0]) Then
        ; Find Current version; keep the raw result in its own variable
        ; so $CurrentVersion stays empty when nothing matches
        Global $Match = StringRegExp($Source, "(?i)Current\s*Version[^:]*:\s*(\d+\.\d+)", 1, $postPos)
        If Not @error Then
            $CurrentVersion = $Match[0]
        EndIf
    EndIf
EndIf

If Not StringLen($CurrentVersion) Then
    ;MsgBox(16, @ScriptName, "Current Version not found.")
    ConsoleWrite("Version not found" & @CRLF)
Else
    ;MsgBox(64, @ScriptName, "Current Version: " & $CurrentVersion)
    ConsoleWrite("Current Version: " & $CurrentVersion & @CRLF)
EndIf

; Sleep a little so it's actually possible to check the version
If @Compiled Then
    Sleep(5000)
EndIf
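
If only the number is needed, the regular-expression step can be tried on its own, separate from the download. This is a minimal sketch, assuming the page text contains the label exactly as shown in the question ("Current Version: 0.2"). `_ExtractVersion` is a hypothetical helper name, not part of AutoIt or of the script above.

```autoit
; Hypothetical helper: returns the first "major.minor" number that follows
; a "Current Version:" label, or "" with @error = 1 when nothing matches.
Func _ExtractVersion($sText)
    Local $aMatch = StringRegExp($sText, "(?i)Current\s*Version[^:]*:\s*(\d+\.\d+)", 1)
    If @error Then Return SetError(1, 0, "")
    Return $aMatch[0]
EndFunc   ;==>_ExtractVersion

; Sample input copied from the question, not fetched from the real page.
ConsoleWrite(_ExtractVersion("Current Version: 0.2") & @CRLF)
```

Note that the pattern allows only whitespace between the colon and the digits, so HTML markup in that position (a closing tag, for example) would stop it from matching.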

Thanks.

Mat wrote: "But InetGet is better"

Not better, just different. If the content you want is created with dynamic HTML or is behind authentication, InetGet will not get it for you, but the IE functions will.

Dale
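
For comparison, the IE-function route mentioned above might look roughly like this. It is only a sketch, not code from any poster in this thread: `$sUrl` is a placeholder address, and the pattern is reused from the script posted earlier on the assumption that the rendered page text contains "Current Version: <number>".

```autoit
#include <IE.au3>

; Placeholder address (assumption); substitute the page you actually want.
Local Const $sUrl = "http://www.example.com/"

; Open the page in a hidden IE window and wait until it has loaded.
Local $oIE = _IECreate($sUrl, 0, 0, 1)
If @error Then
    ConsoleWrite("Could not open the page." & @CRLF)
    Exit 1
EndIf

; Read the rendered body text, which includes content written by scripts.
Local $sText = _IEBodyReadText($oIE)
_IEQuit($oIE)

; Same pattern as the script posted earlier in the thread.
Local $aMatch = StringRegExp($sText, "(?i)Current\s*Version[^:]*:\s*(\d+\.\d+)", 1)
If @error Then
    ConsoleWrite("Version not found" & @CRLF)
Else
    ConsoleWrite("Current Version: " & $aMatch[0] & @CRLF)
EndIf
```

Because `_IEBodyReadText` returns the text as the browser renders it, tags are already stripped, which can make the match simpler than working on raw HTML source. The trade-off is that a full browser instance is started for each check.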

