
Grab info



Hi all, I have another question for you =P

I have this HTML page:

<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">
            <TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>
            <TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>
            <TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>
            <TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>
            <TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>
        </TABLE>

Now I want to grab, for example, "MY INFO 1", "MY INFO 2", "MY INFO 3" and "MY INFO 4". Of course, the "MY INFO" values are only examples :)

Right now my script can grab only "MY INFO 1", with this code:

$MYINFO1 = _StringBetween($Source, '<FONT FACE="Arial, Helvetica"><B>', '  </B>')

But I have trouble grabbing the other values. How can I grab them?

Thanks a lot!


Use the _IE* functions of the IE.au3 UDF. See help file.

You could use _IETableWriteToArray(), or get a collection of the TD tags with _IETagNameGetCollection() and loop through it getting the value of each one with _IEPropertyGet() for "innerText".
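As a rough illustration of the second suggestion, here is a minimal sketch that loops over the TD cells and prints their text. It assumes Internet Explorer is available on the machine, and the URL is a hypothetical placeholder, not the real page. Note that "MY INFO 4" sits in the link's HREF rather than in the cell text, so the sketch reads it from the A tags instead.

```autoit
#include <IE.au3>

; Hypothetical URL, used only for illustration: replace with the real page
Local $sUrl = "http://www.example.com/page.html"

; Open the page in a hidden IE window (2nd param: do not attach, 3rd param: not visible)
Local $oIE = _IECreate($sUrl, 0, 0)
If @error Then
    ConsoleWrite("Could not open " & $sUrl & @CRLF)
    Exit
EndIf

; Walk every <TD> in the document and print its visible text
Local $sText
Local $oTDs = _IETagNameGetCollection($oIE, "td")
For $oTD In $oTDs
    ; Strip leading and trailing whitespace (flag 3 = 1 + 2)
    $sText = StringStripWS(_IEPropertyGet($oTD, "innertext"), 3)
    If $sText <> "" Then ConsoleWrite($sText & @CRLF)
Next

; The link target is an attribute, not text, so read it from the <A> tags
Local $oLinks = _IETagNameGetCollection($oIE, "a")
For $oLink In $oLinks
    ConsoleWrite($oLink.href & @CRLF)
Next

_IEQuit($oIE)
```

This prints the cells and links of the whole page; on a page with other content you would probably want to narrow the collection down first.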

If you have to string parse it, look into StringRegExp(). See help file.

Example with StringRegExp():

#include <Array.au3>

$sString = '<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">' & _
        '<TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
        '<TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
        '<TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>' & _
        '</TABLE>'

$aRET = StringRegExp($sString, '(?U)(?:>)(?:<B>)?([^<]+)(?:</B>)?(?:</FONT)', 3)
_ArrayDisplay($aRET)

:)

Edited by PsaltyDS

StungStang,

This does it: ;)

#include <Array.au3>

$sText = '<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">' & _
            '<TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
            '<TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>' & _
            '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
            '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
            '<TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>' & _
        '</TABLE>'

$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($sText, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)

$sInfo4 = StringRegExpReplace($sText, ".*http:\/\/(.*?)\.com.*", "$1")

ConsoleWrite($aResult[2] & @CRLF)
ConsoleWrite($aResult[3] & @CRLF)
ConsoleWrite($aResult[4] & @CRLF)
ConsoleWrite($sInfo4 & @CRLF)

But I am sure a real SRE guru will come along and do it in one line in a minute! :idiot:

The SREs are pretty simple, but do ask if you want them explained. :)

M23

Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind

Open spoiler to see my UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


StungStang,

In the first line, the innermost SRER removes any characters within <>. The second SRER then collapses multiple consecutive <> into a single <>. Finally the StringSplit splits the result on these <> and retrieves the remaining text which was not within <> at the start - and that is the first 3 items you are looking for.

The second line looks for any text between "http://" and ".com" - which is how you get the final item.

So as long as your items match those criteria you can pull them from any page. If you need info which does not match those criteria then you will have to develop new SREs. You will really enjoy that - they are so much fun, they make my brain bleed! :)
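To make the three steps described above easier to follow, here is a small sketch that runs them one at a time on a shortened, made-up sample string (the variable names $sStep1 and $sStep2 are only for illustration) and prints each intermediate result:

```autoit
#include <Array.au3>

; Shortened sample with two text cells (illustrative, not the full page)
Local $sText = '<TD><FONT><B>MY INFO 1</B></FONT></TD><TD><FONT>MY INFO 2</FONT></TD>'

; Step 1: replace every tag, whatever it contains, with an empty "<>" marker
Local $sStep1 = StringRegExpReplace($sText, "<(.*?)>", "<>")
ConsoleWrite($sStep1 & @CRLF) ; <><><>MY INFO 1<><><><><>MY INFO 2<><>

; Step 2: collapse each run of consecutive markers into a single "<>"
Local $sStep2 = StringRegExpReplace($sStep1, "<[>|<]+>", "<>")
ConsoleWrite($sStep2 & @CRLF) ; <>MY INFO 1<>MY INFO 2<>

; Step 3: split on the whole "<>" string (flag 1), leaving only the text
Local $aResult = StringSplit($sStep2, "<>", 1)
_ArrayDisplay($aResult)
```

Because the string starts with a marker, element [1] of the array is empty and the first real value lands in [2], which is why the script above reads $aResult[2] to $aResult[4].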

M23


PsaltyDS,

"learn to use the _IE* functions"

Quite agree, but he did ask. :)

M23


Here is an example of my HTML file: see --->HERE<---

As you can see, I have 10 identical tables, each with different "MY INFO EXAMPLE" values.

Be careful: the last MY INFO value may end in .com, .net, etc., for example http://www.MYINFO.com or http://www.MYINFO.net

I tried with:

#include <String.au3>
#include <Array.au3>
#include <File.au3>
#include <Inet.au3>

$Input = InputBox ("","Link")
$Source = _INetGetSource ($Input)
$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($Source, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)
$sInfo4 = StringRegExpReplace($Source, ".*http:\/\/(.*?).com.*", "$1")

ConsoleWrite($aResult[2] & @CRLF)
ConsoleWrite($aResult[3] & @CRLF)
ConsoleWrite($aResult[4] & @CRLF)
ConsoleWrite($sInfo4 & @CRLF)

But it doesn't work :) How can I fix that?

Thanks!

Edited by StungStang

StungStang,

This works reasonably well - although you do get some false URL matches as well: :)

; Get all text not between <>
$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($Source, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)
_ArrayDisplay($aResult)

; Get any URLs that are between "http://" and either ".com" or ".net"
$aURL = StringRegExp($Source, "(?i)http:\/\/(.*?)\.(?:com|net)", 3)
_ArrayDisplay($aURL)

But as PsaltyDS keeps pointing out - you would be much better off using the IE functions that he suggested above. I have only been playing with these SREs for my own amusement and learning. ;)

M23


@M23

It doesn't work for me :) because your function grabs all the text between the tags of the whole page. My HTML doesn't contain only these tables; the 10 tables are just one part of my source, and I am interested only in the table contents.

I looked at the IE functions in the help file, but honestly I didn't understand anything ;)

Is there another solution?

Thanks
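One possible approach, sketched here under assumptions rather than tested against the real page: first cut out only the tables of interest, then apply the patterns posted earlier to each table on its own. The sketch assumes those tables can be recognised by the WIDTH="452" attribute seen in the first post; the sample source and the $aTables, $aInfo and $aURL names are made up for illustration.

```autoit
#include <Array.au3>

; Made-up source: one table of interest surrounded by unrelated markup
Local $sSource = '<P><FONT FACE="Arial">Unrelated text</FONT></P>' & _
        '<TABLE WIDTH="452" BORDER="0">' & _
        '<TD><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
        '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
        '<TD><A HREF="http://www.MYINFO4.net"><IMG SRC="exit.gif"></A></TD>' & _
        '</TABLE>' & _
        '<P>More unrelated text</P>'

; (?i) ignores case, (?s) lets "." span line breaks, ".*?" stops at the first </TABLE>
Local $aTables = StringRegExp($sSource, '(?is)(<TABLE WIDTH="452".*?</TABLE>)', 3)
If @error Then
    ConsoleWrite("No matching table found" & @CRLF)
    Exit
EndIf

Local $aInfo, $aURL
For $i = 0 To UBound($aTables) - 1
    ; Text cells: the pattern PsaltyDS posted above, applied to this table only
    $aInfo = StringRegExp($aTables[$i], '(?U)(?:>)(?:<B>)?([^<]+)(?:</B>)?(?:</FONT)', 3)
    If Not @error Then ConsoleWrite("Table " & ($i + 1) & ": " & _ArrayToString($aInfo, " | ") & @CRLF)

    ; Link target: everything after http:// up to the closing quote, so any ending (.com, .net, ...) works
    $aURL = StringRegExp($aTables[$i], '(?i)HREF="http://([^"]+)"', 3)
    If Not @error Then ConsoleWrite("Table " & ($i + 1) & " link: " & _ArrayToString($aURL, " | ") & @CRLF)
Next
```

With the sample above, "Unrelated text" is skipped because it lies outside the table, and the link is captured without hard-coding ".com" or ".net". Replace $sSource with the result of _INetGetSource() to try it on the real page.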
