atnextc

Pull Data From Media Wiki Page

14 posts in this topic

#1 ·  Posted (edited)

I'm trying to pull data from our internal MediaWiki page.  I have a script that pulls the HTML, extracts the information, writes it to a file, loads it into an array, etc.  This is too slow and doesn't always work as expected.

So I'm now trying to pull the data from the wiki page directly using the code below.

I'm not sure this is the right approach, as it doesn't print anything to the console and it throws an error from IE.au3.

#include <IE.au3>

Local $oIE = _IECreate("http://interalsite/mediawiki/index.php?title=C-ASHB/MRFD-01A-Q2-2013&action=edit")
;_IELinkClickByText($oIE, "Continue to this website (not recommended).")
Local $oForm = _IEFormGetObjByName($oIE, "editform action=/mediawiki/index.php?title=C-ASHB/MRFD-01A-Q2-2013&action=submit")
Local $oQuery = _IEFormElementGetObjByName($oForm, "wpTextBox1")
ConsoleWrite($oQuery)

I grabbed the form name and element name using DebugBar, but I'm not sure if this is the correct syntax to use.

Any help would be greatly appreciated.

IE.au3 error below:

--> IE.au3 V2.4-0 Warning from function _IEFormGetObjByName, $_IEStatus_NoMatch
--> IE.au3 V2.4-0 Error from function _IEFormElementGetObjByName, $_IEStatus_InvalidDataType
Edited by atnextc




Did you try this?

_IEBodyReadHTML


I did, but that didn't really work either. It wasn't consistent, and it was slow. Plus I still had to parse the data.


I'm 99% sure that there isn't a form named "editform action=/mediawiki/index.php?title=C-ASHB/MRFD-01A-Q2-2013&action=submit". Start with that.


#5 ·  Posted (edited)

@hiho  See the attachment from DebugBar.

Looks right to me, unless I'm misreading something.

I have also found that all the data I need is in the TEXTAREA named "wpTextbox1".

See attachment.

post-26087-0-98076500-1370577813_thumb.p

post-26087-0-26524300-1370577961_thumb.p

Edited by atnextc


#6 ·  Posted (edited)

"Pull data" is a question with many answers. You do not clearly specify what you want as a result.

From MediaWiki you can directly do something like:

http://en.wikipedia.org/wiki/Special:Export/PageNameHere

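This will give you an XML dump containing the original wiki markup.

Would you like HTML output? action=render returns the rendered HTML, while action=raw returns the unrendered wikitext: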

http://en.wikipedia.org/w/index.php?title=test&action=render

http://en.wikipedia.org/w/index.php?title=test&action=raw

You can even get JSON out:

http://en.wikipedia.org/w/api.php?format=json&action=query&titles=Main%20Page&prop=revisions&rvprop=content

In my view, all of these are much simpler than parsing an edit box. You may have your reasons for doing it that way, though.

Sorry that I do not have time to post AutoIt code. The links are broken at the bold text, but I hope you get the point.

EDIT: I forgot that this would be useful: http://www.mediawiki.org/wiki/API and also this one: http://www.mediawiki.org/w/api.php?action=help
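
For anyone following along, here is a minimal AutoIt sketch of the action=raw route suggested above. The host name and page title are placeholders taken from elsewhere in this thread, and it assumes the wiki can be read without logging in and without a certificate warning; if either is not true, InetRead will probably fail and the IE-based approach may still be needed.

```autoit
; Sketch only: host and title are placeholders, adjust them to your wiki.
Local $sTitle = "C-ASHB/MRFD-01A-Q2-2013"
Local $sUrl = "https://internalsite.net/mediawiki/index.php?title=" & $sTitle & "&action=raw"

; InetRead returns binary data; option 1 forces a reload instead of using the cache.
Local $dData = InetRead($sUrl, 1)
If @error Then
    ConsoleWrite("Download failed for " & $sUrl & @CRLF)
Else
    ; Flag 4 tells BinaryToString to decode the bytes as UTF-8.
    Local $sWikiText = BinaryToString($dData, 4)
    ConsoleWrite($sWikiText & @CRLF)
EndIf
```

This skips the browser entirely, so there is no form or edit box to look up; the result is the same wikitext that the edit box would contain.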

Edited by Myicq


#7 ·  Posted (edited)

I'm 99% sure that there isn't a form named "editform action=/mediawiki/index.php?title=C-ASHB/MRFD-01A-Q2-2013&action=submit". Start with that.

 

Hiho is correct. The actual form name is "editform".
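
A sketch of the corrected lookup, assuming the textarea is named "wpTextbox1" as shown in the DebugBar screenshot and that the URL from the first post is reachable. Note that ConsoleWrite on the element object itself prints nothing useful; the value has to be read with _IEFormElementGetValue.

```autoit
#include <IE.au3>

; URL copied from the first post; adjust it to your wiki.
Local $oIE = _IECreate("http://interalsite/mediawiki/index.php?title=C-ASHB/MRFD-01A-Q2-2013&action=edit")

; The form name is just "editform"; the action=... text is a separate attribute.
Local $oForm = _IEFormGetObjByName($oIE, "editform")
If @error Then
    ConsoleWrite("Form 'editform' not found" & @CRLF)
    Exit 1
EndIf

Local $oTextArea = _IEFormElementGetObjByName($oForm, "wpTextbox1")
If @error Then
    ConsoleWrite("Element 'wpTextbox1' not found" & @CRLF)
    Exit 1
EndIf

; Read the text out of the element instead of printing the object.
ConsoleWrite(_IEFormElementGetValue($oTextArea) & @CRLF)
_IEQuit($oIE)
```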

Edited by DanP2


hiho, DanP2: I see now that I was reading that line as one name when in fact the space matters. I'll try that and report back.

Myicq, thanks for your post. I'll check those links out and post again if I need any more help.


I was able to get the contents of the "form/editbox" that I was looking for.

Having a bit of an issue getting _StringBetween to behave nicely.

 

$ZSideInterface= xe-3/4/5
_StringBetween($ZSideinterface,'-','/')
FileWrite($rdcfile, $ZSideinterface)
 
I would think that $ZSideInterface would be written to the file as "3".  Instead, the whole string in $ZSideInterface is being written.
 
Any ideas?


What does the help file say that _StringBetween returns?



Returns the string between the start search string and the end search string.

 

 

Which I've read, but it's not "returning" anything, so to speak. It's like it's not even reading that line in the script.


 

Returns the string between the start search string and the end search string.

Check again :P
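
To spell out the hint, here is a minimal sketch using the interface string from the earlier post. _StringBetween does not modify its input; it returns an array of matches (and sets @error when nothing is found), so the result has to be captured and indexed. The variable names are illustrative only.

```autoit
#include <String.au3>

Local $sZSideInterface = "xe-3/4/5"

; Capture the return value: an array of every match between "-" and "/".
Local $aMatch = _StringBetween($sZSideInterface, "-", "/")
If @error Then
    ConsoleWrite("No match found" & @CRLF)
Else
    ; The first match is at index 0; for this input it is "3".
    ConsoleWrite($aMatch[0] & @CRLF)
EndIf
```

Writing $aMatch[0] to the file instead of $sZSideInterface gives the slot number on its own.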


OK guys, I'm back. I was able to get the last issue sorted out, so thanks for that.

I have a new issue though.  My script is now doing what it's told and working. However, some of the information it pulls will, once I'm done manipulating it, be duplicate information that doesn't need to be written to my text file.  All of this happens in a loop that runs $x times, so in theory the same information could end up in my text file 14 or 15 times.

I've been looking at the idea of "arrays within arrays", but I'm still new and learning how to use them correctly.

EXAMPLE OF THE GENERATED TEXT FILE BELOW, ASSUMING WE'RE USING JUST 2 PROJECTS:

=====================================================================================

Project: Project Name Test #1

These projects are to add an additional 110GB of capacity between Router1 and Router2.

Router1: Install Line Card into slot 3

 

Project: Project Name Test #2

These projects are to add an additional 110GB of capacity between Router1 and Router2.

Router1: Install Line Card into slot 3

 
=====================================================================================
 
 
The issues are:
 
1. The line about the purpose of the projects is redundant after the first project, since all of the requested work is happening for the same location.
2. "Router1: Install Line Card into slot 3"  <------- You can only install a physical piece of hardware once, yet the line is generated again for project #2 because of the way our wiki page is written.
 
 
 
 
What I'm Hoping To Accomplish:
 
I would like to detect these duplicate and erroneous entries mid-script and keep them from ever being written to the file.
 
 
Sample Code Below:
 
 
How Data Is Being Collected And Read:
 
    ; Requires #include <IE.au3>, <String.au3> and <Array.au3> at the top of the script.

    #comments-start

        THIS PART OF THE CODE LOOKS AT THE PROJECT INPUT BOX AND READS THE PROJECT NAME.
        THEN IT LOADS THAT PROJECT PAGE IN A BROWSER INVISIBLE TO THE USER AND GRABS THE CONTENTS
        OF THE EDIT BOX, WHICH CONTAINS ALL OF THE INFORMATION FOR THE SPECIFIED PROJECT.

    #comments-end

    Local $oIE = _IECreate("https://internalsite.net/mediawiki/index.php?title=" & GUICtrlRead($ProjectNameInputBox) & "&action=edit", 0, 0, 1, 1)
    _IELinkClickByText($oIE, "Continue to this website (not recommended).")
    Local $oDiv = _IEGetObjById($oIE, "wpTextbox1")
    $HTML = _IEPropertyGet($oDiv, "innertext") & @CRLF
    _IEQuit($oIE)

    #comments-start

        THIS PORTION OF THE CODE TAKES THE DATA FROM THE WIKI PAGE AND CLEANS UP SOME FORMATTING.

        1. Replaces "= " with a carriage return / line feed
        2. Replaces "=" with a carriage return / line feed
        3. Replaces " " with nothing, essentially deleting spaces.

    #comments-end

    ; Plain-text replacements, so StringReplace is enough (no regular expression needed).
    $HTML = StringReplace($HTML, "= ", @CRLF)
    $HTML = StringReplace($HTML, "=", @CRLF)
    $HTML = StringReplace($HTML, " ", "")

    ; Flag 1 splits on the whole @CRLF string; element [0] holds the number of pieces.
    $ArrayText = StringSplit($HTML, @CRLF, 1)

    ; HERE ARE THE ARRAY INDEX VALUES FOR A C4 PROJECT
    If $ArrayText[0] = 182 Then
        $ALocation = $ArrayText[36]
        $ZLocation = $ArrayText[38]
        $ASideRouter = $ArrayText[40]
        $ZSideRouter = $ArrayText[42]
        $ASideFPCNeeded = $ArrayText[68]
    EndIf

    ; Open the output file once in append mode (1) and close it once at the end.
    $c4file = FileOpen("c4.txt", 1)
    FileWrite($c4file, @CRLF & @CRLF)
    FileWrite($c4file, "Project: ")
    FileWrite($c4file, GUICtrlRead($ProjectNameInputBox))
    FileWrite($c4file, @CRLF & @CRLF)
    FileWrite($c4file, "These projects are to add an additional ")
    FileWrite($c4file, GUICtrlRead($NumberofProjects) * 10)
    FileWrite($c4file, "GB of capacity between ")
    FileWrite($c4file, $ALocation)
    FileWrite($c4file, " and ")
    FileWrite($c4file, $ZLocation)
    FileWrite($c4file, ".")
    FileWrite($c4file, @CRLF & @CRLF)

    ; "<>" states the intent directly instead of relying on how Not binds before "=".
    If $ASideFPCNeeded <> "" Then
        FileWrite($c4file, $ALocation)
        FileWrite($c4file, ": Install FPC into slot ")
        ; $ASideinterface is not assigned in this excerpt.
        ; _StringBetween returns an array of matches and sets @error if nothing is found.
        $aArray = _StringBetween($ASideinterface, "-", "/")
        If IsArray($aArray) Then FileWrite($c4file, _ArrayToString($aArray))
        FileWrite($c4file, @CRLF)
    EndIf
    FileClose($c4file)

As always, any help is greatly appreciated. Thanks a lot.
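
One possible way to skip lines that have already been written, sketched under the assumption that a duplicate means "exactly the same text as an earlier line". The helper _WriteOnce and the global $g_oWritten are hypothetical names, not part of the script above; the lookup uses the Windows Scripting.Dictionary COM object.

```autoit
; Hypothetical helper: remembers every line written so far in a dictionary.
Global $g_oWritten = ObjCreate("Scripting.Dictionary")

; Writes $sLine to the open file handle $hFile only the first time it is seen.
Func _WriteOnce($hFile, $sLine)
    If $g_oWritten.Exists($sLine) Then Return False
    $g_oWritten.Add($sLine, True)
    FileWrite($hFile, $sLine & @CRLF)
    Return True
EndFunc

Local $hFile = FileOpen("c4.txt", 1)
_WriteOnce($hFile, "Router1: Install Line Card into slot 3") ; written
_WriteOnce($hFile, "Router1: Install Line Card into slot 3") ; skipped as a duplicate
FileClose($hFile)
```

Because the dictionary lives outside the loop, it keeps its contents across all $x runs, so the purpose line and the line-card line would each be written only for the first project that produces them.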

 


If I were to use a 2-D array to loop through this information, how would I do it?  I was able to get it to loop properly when hard-coding the values, but I'm unsure how to begin using a 2-D array to read the data in and loop through it on each run.
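
A minimal sketch of one way to lay this out, assuming one row per project and a fixed set of columns. The column choice and the placeholder values are assumptions for illustration; in the real script each row would be filled from the parsed wiki data on that run of the loop.

```autoit
Local $iCount = 2 ; hypothetical: number of projects to process
Local $aProjects[$iCount][3] ; columns: 0 = project name, 1 = A location, 2 = Z location

; Fill one row per run of the loop (placeholder values stand in for the parsed data).
For $i = 0 To $iCount - 1
    $aProjects[$i][0] = "Project Name Test #" & ($i + 1)
    $aProjects[$i][1] = "Router1"
    $aProjects[$i][2] = "Router2"
Next

; Read the rows back; UBound with dimension 1 gives the number of rows.
For $i = 0 To UBound($aProjects, 1) - 1
    ConsoleWrite("Project: " & $aProjects[$i][0] & @CRLF)
    ConsoleWrite($aProjects[$i][1] & " to " & $aProjects[$i][2] & @CRLF)
Next
```

Collecting everything first and writing the file afterwards also makes it easier to compare rows and drop repeated information before anything reaches the text file.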
