StungStang

Grab info

12 posts in this topic

Hi all, I have another question for you =P

I have this HTML page:

<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">
            <TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>
            <TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>
            <TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>
            <TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>
            <TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>
        </TABLE>

Now I want to grab, for example, "MY INFO 1", "MY INFO 2", "MY INFO 3" and "MY INFO 4"... Of course, the "MY INFO" values are only an example :)

With my script I can currently grab only "MY INFO 1", using this code:

#include <String.au3>
$MYINFO1 = _StringBetween($Source, '<FONT FACE="Arial, Helvetica"><B>', '</B>') ; returns an array of matches

But I have trouble grabbing the other values... how can I grab them?

Thanks a lot!
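Since _StringBetween returns an array of every match rather than only the first one, the same approach can be stretched to the other values by picking a different start/end pair for each. A minimal sketch, assuming the markup is exactly as posted above (the string below is a copy of it used as sample data; on a real page $sSource would come from _INetGetSource):

```autoit
#include <String.au3>

; Copy of the markup from the question (sample data only)
Local $sSource = '<TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
        '<TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>'

; Each call returns a 0-based array holding every match found in the source
Local $aBold = _StringBetween($sSource, '<FONT FACE="Arial, Helvetica"><B>', '</B>')            ; MY INFO 1
Local $aSmall = _StringBetween($sSource, '<FONT FACE="Arial, Helvetica" SIZE="-1">', '</FONT>') ; MY INFO 2, MY INFO 3
Local $aLink = _StringBetween($sSource, '<A HREF="http://', '"')                                ; MY INFO 4.com

; _StringBetween returns a non-array when nothing matches, so check before indexing
If IsArray($aBold) And IsArray($aSmall) And IsArray($aLink) Then
    ConsoleWrite($aBold[0] & @CRLF & $aSmall[0] & @CRLF & $aSmall[1] & @CRLF & $aLink[0] & @CRLF)
Else
    ConsoleWrite("One of the start/end pairs did not match" & @CRLF)
EndIf
```

Note that the link value keeps its ".com"; the start/end strings have to match the real page character for character, so any extra spaces or different attributes there will need adjusting.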




#2 ·  Posted (edited)

Use the _IE* functions of the IE.au3 UDF. See the help file.

You could use _IETableWriteToArray(), or get a collection of the TD tags with _IETagNameGetCollection() and loop through it getting the value of each one with _IEPropertyGet() for "innerText".

If you have to parse it as a plain string, look into StringRegExp(). See the help file.
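A rough sketch of the _IETagNameGetCollection / _IEPropertyGet route described above. It assumes Internet Explorer is available on the machine, and instead of navigating to the real page it writes a shortened, hypothetical copy of the table (with a placeholder example.com link) into a hidden about:blank window with _IEDocWriteHTML:

```autoit
#include <IE.au3>

; Hypothetical sample: a shortened copy of the table from the question
Local $sHTML = '<TABLE WIDTH="452" BGCOLOR="#A6B9C8">' & _
        '<TD><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
        '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
        '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
        '<TD><A HREF="http://www.example.com"><IMG SRC="exit.gif" ALT="Exit"></A></TD>' & _
        '</TABLE>'

; Open a hidden IE window (no attach, invisible) and let IE parse the markup
Local $oIE = _IECreate("about:blank", 0, 0)
_IEDocWriteHTML($oIE, $sHTML)

; Every <TD>: its visible text (the link cell only holds an image, so its text is empty)
Local $oTDs = _IETagNameGetCollection($oIE, "td")
For $oTD In $oTDs
    ConsoleWrite("TD: " & _IEPropertyGet($oTD, "innertext") & @CRLF)
Next

; Every <A>: the href attribute, which is where the fourth value lives
Local $oLinks = _IETagNameGetCollection($oIE, "a")
For $oLink In $oLinks
    ConsoleWrite("Link: " & $oLink.href & @CRLF)
Next

_IEQuit($oIE)
```

On a real page you would pass its URL to _IECreate instead of writing the HTML yourself.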

Example with StringRegExp():

#include <Array.au3>

$sString = '<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">' & _
        '<TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
        '<TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
        '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
        '<TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>' & _
        '</TABLE>'

$aRET = StringRegExp($sString, '(?U)(?:>)(?:<B>)?([^<]+)(?:</B>)?(?:</FONT)', 3)
_ArrayDisplay($aRET)

:)

Edited by PsaltyDS

Valuater's AutoIt 1-2-3, Class... Is now in Session!
For those who want somebody to write the script for them: RentACoder
"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law


StungStang,

This does it: ;)

#include <Array.au3>

$sText = '<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">' & _
            '<TD WIDTH="125" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & _
            '<TD WIDTH="5" VALIGN="TOP" BGCOLOR="#A6B9C8"><IMG SRC="../../images/hello.gif" WIDTH=4 HEIGHT=1 BORDER=0 ALT=""></TD>' & _
            '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & _
            '<TD WIDTH="121" VALIGN="TOP" BGCOLOR="#A6B9C8"><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 3</FONT></TD>' & _
            '<TD WIDTH="80" VALIGN="TOP" BGCOLOR="#A6B9C8"><A HREF="http://MY INFO 4.com"><IMG SRC="../../images/exit.gif" WIDTH=78 HEIGHT=16 BORDER=0 ALT="Exit"></A></TD>' & _
        '</TABLE>'

$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($sText, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)

$sInfo4 = StringRegExpReplace($sText, ".*http:\/\/(.*?).com.*", "$1")

ConsoleWrite($aResult[2] & @CRLF)
ConsoleWrite($aResult[3] & @CRLF)
ConsoleWrite($aResult[4] & @CRLF)
ConsoleWrite($sInfo4 & @CRLF)

But I am sure a real SRE guru will come along and do it in one line in a minute! :idiot:

The SREs are pretty simple, but do ask if you want them explained. :)

M23


Any of my own code posted anywhere on the forum is available for use by others without any restriction of any kind.
_______
My UDFs:

Spoiler

ArrayMultiColSort ---- Sort arrays on multiple columns
ChooseFileFolder ---- Single and multiple selections from specified path treeview listing
Date_Time_Convert -- Easily convert date/time formats, including the language used
ExtMsgBox --------- A highly customisable replacement for MsgBox
GUIExtender -------- Extend and retract multiple sections within a GUI
GUIFrame ---------- Subdivide GUIs into many adjustable frames
GUIListViewEx ------- Insert, delete, move, drag, sort, edit and colour ListView items
GUITreeViewEx ------ Check/clear parent and child checkboxes in a TreeView
Marquee ----------- Scrolling tickertape GUIs
NoFocusLines ------- Remove the dotted focus lines from buttons, sliders, radios and checkboxes
Notify ------------- Small notifications on the edge of the display
Scrollbars ----------Automatically sized scrollbars with a single command
StringSize ---------- Automatically size controls to fit text
Toast -------------- Small GUIs which pop out of the notification area

 


#4 ·  Posted (edited)

Oops, I have many tables structured like that in an HTML page, but all the "MY INFO" values are different. How can I adapt the script to read the contents of all the tables?

Thanks :)

Edited by StungStang
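One way to handle many tables with string functions alone is to cut the source into one piece per <TABLE>...</TABLE> block first, then run the per-table extraction on each piece and collect one row per table. A sketch under these assumptions: every table has the layout shown in the first post, tables are not nested, and the values and URLs below (built by the hypothetical helper _SampleTable) are made up:

```autoit
; Hypothetical page: some unrelated content plus two tables laid out like the one in the question
Local $sSource = '<P>Other page content</P>' & @CRLF & _
        _SampleTable("Alpha", "Beta", "Gamma", "http://www.delta.com") & _
        _SampleTable("One", "Two", "Three", "http://www.four.net")

; Step 1: one array element per table; (?is) = case-insensitive, "." also matches line breaks
Local $aTables = StringRegExp($sSource, '(?is)<TABLE[^>]*>(.*?)</TABLE>', 3)
If @error Then
    ConsoleWrite("No tables found" & @CRLF)
    Exit
EndIf

; Step 2: pull the four values out of each table into one row of a 2D array
Local $aInfo[UBound($aTables)][4], $aText, $aLink
For $i = 0 To UBound($aTables) - 1
    ; Values 1-3: the text right after each <FONT ...> tag, skipping an optional <B>
    $aText = StringRegExp($aTables[$i], '(?i)<FONT[^>]*>(?:<B>)?([^<]+)', 3)
    For $j = 0 To UBound($aText) - 1
        If $j < 3 Then $aInfo[$i][$j] = $aText[$j]
    Next
    ; Value 4: the host name in the link, whatever its top-level domain
    $aLink = StringRegExp($aTables[$i], '(?i)HREF="https?://([^"/]+)', 1)
    If Not @error Then $aInfo[$i][3] = $aLink[0]
Next

For $i = 0 To UBound($aInfo) - 1
    ConsoleWrite($aInfo[$i][0] & " | " & $aInfo[$i][1] & " | " & $aInfo[$i][2] & " | " & $aInfo[$i][3] & @CRLF)
Next

; Hypothetical helper that builds one sample table
Func _SampleTable($s1, $s2, $s3, $sUrl)
    Return '<TABLE WIDTH="452" BGCOLOR="#A6B9C8">' & @CRLF & _
            '<TD><FONT FACE="Arial, Helvetica"><B>' & $s1 & '</B></FONT></TD>' & @CRLF & _
            '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">' & $s2 & '</FONT></TD>' & @CRLF & _
            '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">' & $s3 & '</FONT></TD>' & @CRLF & _
            '<TD><A HREF="' & $sUrl & '"><IMG SRC="exit.gif" ALT="Exit"></A></TD>' & @CRLF & _
            '</TABLE>' & @CRLF
EndFunc
```

Nested tables would break the lazy (.*?)</TABLE> match, which is one more argument for the _IE* functions on pages you do not control.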


StungStang,

In the first line, the innermost SRER replaces every tag (a "<", everything up to the next ">", and that ">") with an empty <> marker. The second SRER then collapses each run of consecutive <> markers into a single <>. Finally, StringSplit splits the result on these <> markers, so what is left is the text that was outside the tags at the start. Element [1] of the array is empty because the string begins with a marker, so elements [2] to [4] are the first 3 items you are looking for.

The second line looks for any text between "http://" and ".com", which is how you get the final item.

So as long as your items match those criteria you can pull them from any page. If you need info which does not match those criteria, then you will have to develop new SREs. You will really enjoy that - they are so much fun, they make my brain bleed! :)
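To see what each stage of that one-liner does, it can be split into separate steps with the intermediate strings printed. This sketch uses a shortened, hypothetical sample; the only change to the patterns is dropping the "|" from the character class [>|<], since inside square brackets "|" is an ordinary character rather than an "or":

```autoit
; Hypothetical sample: two cells from the question, attributes trimmed
Local $sText = '<TD><FONT><B>MY INFO 1</B></FONT></TD><TD><FONT SIZE="-1">MY INFO 2</FONT></TD>'

; Step 1: replace every tag <...> with an empty "<>" marker
Local $sStep1 = StringRegExpReplace($sText, "<(.*?)>", "<>")
ConsoleWrite($sStep1 & @CRLF) ; <><><>MY INFO 1<><><><><>MY INFO 2<><>

; Step 2: collapse each run of consecutive markers into a single "<>"
Local $sStep2 = StringRegExpReplace($sStep1, "<[><]+>", "<>")
ConsoleWrite($sStep2 & @CRLF) ; <>MY INFO 1<>MY INFO 2<>

; Step 3: split on the whole "<>" string (flag 1); element [0] holds the count
Local $aResult = StringSplit($sStep2, "<>", 1)
For $i = 1 To $aResult[0]
    ConsoleWrite($i & ": [" & $aResult[$i] & "]" & @CRLF)
Next
```

The first and last elements come out empty because the collapsed string starts and ends with a marker, which is why the values start at element [2].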

M23



PsaltyDS,

"learn to use the _IE* functions"

Quite agree, but he did ask. :)

M23



#8 ·  Posted (edited)

This is an example of my HTML file: see --->HERE<---

As you can see, I have 10 identical tables but with different "MY INFO EXAMPLE" values.

Be careful: the last MY INFO value may end in .com, .net, etc. (for example http://www.MYINFO.com or http://www.MYINFO.net).

I tried this:

#include <String.au3>
#include <Array.au3>
#include <File.au3>
#include <Inet.au3>

$Input = InputBox ("","Link")
$Source = _INetGetSource ($Input)
$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($Source, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)
$sInfo4 = StringRegExpReplace($Source, ".*http:\/\/(.*?).com.*", "$1")

ConsoleWrite($aResult[2] & @CRLF)
ConsoleWrite($aResult[3] & @CRLF)
ConsoleWrite($aResult[4] & @CRLF)
ConsoleWrite($sInfo4 & @CRLF)

But it doesn't work :)... how can I fix that?

Thanks!

Edited by StungStang


StungStang,

This works reasonably well - although you do get some false URL matches as well: :)

; Get all text not between <>
$aResult = StringSplit(StringRegExpReplace(StringRegExpReplace($Source, "<(.*?)>", "<>"), "<[>|<]+>", "<>"), "<>", 1)
_ArrayDisplay($aResult)

; Get any URLs that are between "http://" and either ".com" or ".net"
$aURL = StringRegExp($Source, "(?i)http:\/\/(.*?)\.(?:com|net)", 3)
_ArrayDisplay($aURL)

But as PsaltyDS keeps pointing out - you would be much better off using the IE functions that he suggested above. I have only been playing with these SREs for my own amusement and learning. ;)
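A likely reason the ".com" line in the earlier script returns far more than one value: a real page fetched with _INetGetSource spans many lines, and without the (?s) option "." does not match line breaks. The replacement therefore only rewrites the line that holds the link and leaves every other line of the page in the result. A small sketch with a hypothetical two-line source:

```autoit
; Hypothetical two-line source, as a downloaded page would be
Local $sSource = '<P>header</P>' & @CRLF & '<A HREF="http://www.example.com">'

; Without (?s): only the line holding the link is rewritten, the first line survives
Local $sBad = StringRegExpReplace($sSource, ".*http:\/\/(.*?).com.*", "$1")

; With (?s): ".*" spans the whole string, so only the captured host is left
Local $sGood = StringRegExpReplace($sSource, "(?s).*http://(.*?)\.com.*", "$1")

; Often simpler: let StringRegExp return just the capture group for every match
Local $aHosts = StringRegExp($sSource, '(?i)http://([^"]+?)\.(?:com|net)\b', 3)

ConsoleWrite($sBad & @CRLF & "---" & @CRLF & $sGood & @CRLF & $aHosts[0] & @CRLF)
```

Returning only the captures with StringRegExp, as in the com|net line above, avoids the problem altogether.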

M23



@M23

It doesn't work for me :)... because your function grabs all the text outside the "<" and ">" tags on the whole page. But my HTML doesn't contain only these tables... these 10 tables are only part of my source, and I am interested only in the table contents...

I looked at the help for the IE functions... but honestly I didn't understand anything ;)

Another solution?

Thanks
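If other markup on the page should be ignored, the table pattern can require an attribute that only the wanted tables carry. This sketch assumes BGCOLOR="#A6B9C8" (taken from the sample in the first post) appears on the ten target tables and nowhere else; the source string is hypothetical:

```autoit
; Hypothetical source: an unrelated layout table, then one of the wanted tables
Local $sSource = '<TABLE WIDTH="100%"><TD>Menu</TD></TABLE>' & @CRLF & _
        '<TABLE WIDTH="452" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#A6B9C8">' & @CRLF & _
        '<TD><FONT FACE="Arial, Helvetica"><B>MY INFO 1</B></FONT></TD>' & @CRLF & _
        '<TD><FONT FACE="Arial, Helvetica" SIZE="-1">MY INFO 2</FONT></TD>' & @CRLF & _
        '</TABLE>'

; Only tables whose opening tag contains BGCOLOR="#A6B9C8"; [^>]* cannot run
; past the ">" of an opening tag, so the first table is never matched
Local $aTables = StringRegExp($sSource, '(?is)<TABLE[^>]*BGCOLOR="#A6B9C8"[^>]*>(.*?)</TABLE>', 3)
If @error Then
    ConsoleWrite("No matching tables found" & @CRLF)
    Exit
EndIf

Local $aText
For $i = 0 To UBound($aTables) - 1
    ; Non-empty text between a ">" and the next "<", with surrounding whitespace trimmed
    $aText = StringRegExp($aTables[$i], '>\s*([^<\s][^<]*?)\s*<', 3)
    If @error Then ContinueLoop
    For $j = 0 To UBound($aText) - 1
        ConsoleWrite("Table " & $i & ": " & $aText[$j] & @CRLF)
    Next
Next
```

"Menu" from the first table never appears in the output, because that table is excluded before any text is extracted.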


StungStang,

Look at _IETableGetCollection and run its help-file example. It should be clear what is going on, and how it fits processing your 10 tables.
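A minimal sketch of that idea, looping over every table on the page and turning each one into an array with _IETableWriteToArray. The URL is a placeholder, and this assumes the page loads normally in Internet Explorer:

```autoit
#include <IE.au3>
#include <Array.au3>

; Placeholder address of the page that holds the ten tables
Local $sURL = "http://www.example.com/page.html"

Local $oIE = _IECreate($sURL, 0, 0)          ; new invisible window, waits for the page to load
Local $oTables = _IETableGetCollection($oIE) ; collection of all tables in the document
Local $iCount = @extended                    ; @extended holds the number of tables found
ConsoleWrite("Tables found: " & $iCount & @CRLF)

Local $oTable, $aTable
For $i = 0 To $iCount - 1
    $oTable = _IETableGetCollection($oIE, $i)     ; the $i-th table (0-based index)
    $aTable = _IETableWriteToArray($oTable, True) ; True swaps rows and columns in the output
    _ArrayDisplay($aTable, "Table " & $i)
Next

_IEQuit($oIE)
```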

kylomas


Forum Rules         Procedure for posting code

"I like pigs.  Dogs look up to us.  Cats look down on us.  Pigs treat us as equals."

- Sir Winston Churchill


@kylomas

This method doesn't work... my HTML page does a redirect, and to catch my table I have to disable the automatic redirect in my browser...

Another solution? The _IE functions don't work for this page :)

Hi!
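If the redirect happens in the browser (a META refresh or JavaScript), one workaround is to skip the browser: _INetGetSource only downloads the markup and runs nothing in it, so the tables should still be in the text it returns, ready for StringRegExp. A sketch with a placeholder URL that first checks whether the wanted tables (assumed to carry BGCOLOR="#A6B9C8", as in the first post) are present in the raw source:

```autoit
#include <Inet.au3>

; Placeholder address: replace with the real page
Local $sURL = "http://www.example.com/page.html"

; _INetGetSource only downloads the text of the page: no JavaScript or
; META refresh inside it is executed, so a client-side redirect cannot fire
Local $sSource = _INetGetSource($sURL)
If @error Or $sSource = "" Then
    ConsoleWrite("Download failed" & @CRLF)
    Exit
EndIf

; Count the opening tags of the wanted tables in the raw markup
Local $aFound = StringRegExp($sSource, '(?i)<TABLE[^>]*BGCOLOR="#A6B9C8"', 3)
If @error Then
    ConsoleWrite("Wanted tables not in the raw source (maybe added later by script)" & @CRLF)
Else
    ConsoleWrite(UBound($aFound) & " wanted tables found in the downloaded source" & @CRLF)
EndIf
```

If the tables are not in the raw source, they are probably generated by script after the page loads, and string parsing of the download will not find them.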
