incepator

TD extract algorithm

10 posts in this topic

#1 ·  Posted (edited)

Hello!

I have a small problem with extracting data from a table.

I want to extract these names ...

Aer conditionat mobil

Aer conditionat

Agentii imobiliare

etc...

How can I do that?

The names sit between <TD> and </TD> tags.

This is my script:

#include <Inet.au3>
#NoTrayIcon
#include <String.au3>
#include <ButtonConstants.au3>
#include <GUIConstantsEx.au3>
#include <ListViewConstants.au3>
#include <WindowsConstants.au3>
#include <Ie.au3>
#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Form1", 404, 292, -1, -1)
$Button1 = GUICtrlCreateButton("Arata", 8, 8, 75, 25)
$ListView1 = GUICtrlCreateListView("#Nr|#Name", 8, 40, 386, 238)
GUICtrlSendMsg(-1, $LVM_SETCOLUMNWIDTH, 0, 50)
GUICtrlSendMsg(-1, $LVM_SETCOLUMNWIDTH, 1, 300)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            $link =_IECreate("file:///H:/Tabel%20De%20Facut/index.html",Default,1,1,0)
            _IELoadWait($link)
            Global $SourceText = _IEBodyReadHTML($link)
            _IEQuit($link)
            $string1 = _StringBetween($SourceText,'<TD>','</TD>')
            For $1 = 0 To 106
                GUICtrlCreateListViewItem($1+1&"|"&$string1[$1], $ListView1)
            Next
            MsgBox(0,"","Finish !")
    EndSwitch
WEnd

index.rar

Edited by incepator


Use these instead:

_IETableGetCollection

_IETableWriteToArray


IEbyXPATH-Grab IE DOM objects by XPATH IEscriptRecord-Makings of an IE script recorder ExcelFromXML-Create Excel docs without excel installed GetAllWindowControls-Output all control data on a given window.


If you had analyzed the problem I described above, you would have realized that these functions do not help me at all. The problem is that TD is everywhere...

I need an algorithm...


#4 ·  Posted (edited)

How about posting the table, with private data removed? Those functions will return what you need (once you select the proper parent table), and then you can loop through the array as necessary.

So, there are <th>, <tr>, and <td>:

th is a header cell, tr is a row, td is the data within the row.

So when you read in the proper table and write it to an array, you can loop through the array as you need.

http://www.w3schools.com/html/html_tables.asp
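
For anyone following along, here is a minimal sketch of that idea, not a tested solution: read the table into an array and loop over one column. It assumes the table of interest is the first one on the page (index 0), reuses the file path from the first post, and writes to the console instead of a ListView. Passing True as the second argument to _IETableWriteToArray transposes the result so that it is indexed [row][column].

```autoit
#include <IE.au3>

; Assumption: the table of interest is the first table on the page (index 0).
Local $oIE = _IECreate("file:///H:/Tabel%20De%20Facut/index.html", 0, 0) ; no attach, hidden window
Local $oTable = _IETableGetCollection($oIE, 0)
If @error Then
    ; No table at that index: close the browser and stop.
    _IEQuit($oIE)
    Exit 1
EndIf

; True = transpose, so the array is indexed [row][column].
Local $aTable = _IETableWriteToArray($oTable, True)
_IEQuit($oIE)

; Stop if the table has fewer than two columns.
If UBound($aTable, 2) < 2 Then Exit 2

; Column index 1 is the second cell of each row.
For $iRow = 0 To UBound($aTable, 1) - 1
    ConsoleWrite($aTable[$iRow][1] & @CRLF)
Next
```

If the page holds more than one table, the index passed to _IETableGetCollection would need to change accordingly.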

Edited by jdelaney


I understand what you are saying, but it does not work in my case. Look at the index.html file...

Thank you!


I prefer not to download things from forums. Can you post the code and wrap it in the proper [] tags (html in this case)? Otherwise, someone else will be around, probably tomorrow.



OK, this is part of the code:

<tr>
                        <td><a href="http://www.apartamentedevanzare.org" target="_blank">www.apartamentedevanzare.org</a></td>
                        <td>Apartamente de vanzare</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="16" data-ttyahoo="-" data-ttbing="-" data-ttwords="apartamente de vanzare, vanzari apartamente">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href=" http://wstats.net/ro/website/apartamentedevanzare.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/22" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/22" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.autosecondhand.org" target="_blank">www.autosecondhand.org</a></td>
                        <td>Auto second hand</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="n/a" data-ttyahoo="-" data-ttbing="-" data-ttwords="auto second hand, auto secondhand">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href="http://wstats.net/ro/website/autosecondhand.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/98" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/98" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.bijuterii-aur-argint.org" target="_blank">www.bijuterii-aur-argint.org</a></td>
                        <td>Bijuterii argint</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="n/a" data-ttyahoo="-" data-ttbing="-" data-ttwords="bijuterii argint, bijuterii aur">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href=" http://wstats.net/ro/website/bijuterii-aur-argint.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/23" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/23" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.biletavion.org" target="_blank">www.biletavion.org</a></td>
                        <td>Bilete de avion</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="27" data-ttyahoo="-" data-ttbing="-" data-ttwords="bilete de avion, bilet avion, bilete de avion low cost, bilete de avion ieftine">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href=" http://wstats.net/ro/website/biletavion.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/13" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/13" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.cabinet--stomatologic.org" target="_blank">www.cabinet--stomatologic.org</a></td>
                        <td>Cabinet stomatologic</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="n/a" data-ttyahoo="-" data-ttbing="-" data-ttwords="cabinet stomatologic">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href="http://wstats.net/ro/website/cabinet--stomatologic.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/92" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/92" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.cadouricadou.org" target="_blank">www.cadouricadou.org</a></td>
                        <td>Cadouri</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="n/a" data-ttyahoo="-" data-ttbing="-" data-ttwords="cadouri, cadou">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href=" http://wstats.net/ro/website/cadouricadou.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>
               <a href="http://secret.ideideafaceri.org/site/editare/24" class="btn small green">Editare</a>             </td>
                        <td><a href="http://secret.ideideafaceri.org/site/sterge/24" class="delete_confirm btn small orange">Sterge</a></td>
                    </tr>
                                    <tr>
                        <td><a href="http://www.camere--supraveghere.org" target="_blank">www.camere--supraveghere.org</a></td>
                        <td>Camere supraveghere</td>
                        <td>
                            <span class="btn small blue oai_seo" data-ttgoogle="n/a" data-ttyahoo="-" data-ttbing="-" data-ttwords="camere supraveghere, supraveghere video">Vezi pozitie</span>
                        </td>
                        <td>
                                                        <a class="btn small blue" href=" http://wstats.net/ro/website/camere--supraveghere.org " target="_blank">Google analytics</a>
                                                        </td>
                        <td>

For example, in this:

<td><a href="http://www.cabinet--stomatologic.org" target="_blank">www.cabinet--stomatologic.org</a></td>

<td>Cabinet stomatologic</td>

<td>

There are "td" tags everywhere. How can I delimit it so that I get only

Cabinet stomatologic


#9 ·  Posted (edited)

Seems standard to me. You are looking for the second <td> in each row... if you use _IETableWriteToArray, it should be easy.

damn autoformatter!

Anyway, look above for the solution :)
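
If a pure string "algorithm" without the IE table functions is preferred, a rough sketch could look like the one below. This is a different technique from the table functions suggested above. _SecondCellPerRow is a hypothetical helper name (not part of any UDF), and it assumes simple markup: no nested tables, and every data row has at least two <td> cells. It splits the HTML into rows, takes the second cell of each row, and strips any tags inside it.

```autoit
; Hypothetical helper: returns a 1D array with the text of the second <td> of every <tr>.
; Assumes simple markup (no nested tables). Sets @error if nothing is found.
Func _SecondCellPerRow($sHtml)
    ; (?is) = case-insensitive, dot also matches newlines. Flag 3 = array of all matches.
    Local $aRows = StringRegExp($sHtml, '(?is)<tr\b[^>]*>(.*?)</tr>', 3)
    If @error Then Return SetError(1, 0, 0)

    Local $aNames[UBound($aRows)], $iCount = 0, $aCells
    For $i = 0 To UBound($aRows) - 1
        $aCells = StringRegExp($aRows[$i], '(?is)<td\b[^>]*>(.*?)</td>', 3)
        ; Skip rows with fewer than two <td> cells (for example header rows).
        If @error Or UBound($aCells) < 2 Then ContinueLoop
        ; Remove any inner tags, then trim leading and trailing whitespace.
        $aNames[$iCount] = StringStripWS(StringRegExpReplace($aCells[1], '<[^>]+>', ''), 3)
        $iCount += 1
    Next

    If $iCount = 0 Then Return SetError(2, 0, 0)
    ReDim $aNames[$iCount]
    Return $aNames
EndFunc

; Usage with one row shaped like the sample posted earlier in the thread.
Local $sSample = '<tr><td><a href="http://www.cadouricadou.org">www.cadouricadou.org</a></td>' & _
        '<td>Cadouri</td><td>Vezi pozitie</td></tr>'
Local $aNames = _SecondCellPerRow($sSample)
If Not @error Then ConsoleWrite($aNames[0] & @CRLF)
```

Regex-based HTML parsing is fragile, so the table functions are likely the safer choice whenever the page can be loaded in IE.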

Edited by jdelaney


#NoTrayIcon
#include <ButtonConstants.au3>
#include <GUIConstantsEx.au3>
#include <ListViewConstants.au3>
#include <WindowsConstants.au3>
#include <IE.au3>
#Region ### START Koda GUI section ### Form=
$Form1 = GUICreate("Form1", 404, 292, -1, -1)
$Button1 = GUICtrlCreateButton("Arata", 8, 8, 75, 25)
$ListView1 = GUICtrlCreateListView("#Nr|#Name", 8, 40, 386, 238)
GUICtrlSendMsg(-1, $LVM_SETCOLUMNWIDTH, 0, 50)
GUICtrlSendMsg(-1, $LVM_SETCOLUMNWIDTH, 1, 300)
GUISetState(@SW_SHOW)
#EndRegion ### END Koda GUI section ###
While 1
    $nMsg = GUIGetMsg()
    Switch $nMsg
        Case $GUI_EVENT_CLOSE
            Exit
        Case $Button1
            ; Open the page in a hidden IE window
            $link = _IECreate("file:///H:/Tabel%20De%20Facut/index.html", Default, 0, 1, 0)
            _IELoadWait($link)
            ; Read the first table on the page into a 2D array indexed [column][row]
            Local $oTable = _IETableGetCollection($link, 0)
            Local $aTableData = _IETableWriteToArray($oTable)
            _IEQuit($link)
            ; Column index 1 is the second <td> of each row, which holds the name.
            ; UBound replaces the hard-coded row count of 107.
            For $i = 0 To UBound($aTableData, 2) - 1
                GUICtrlCreateListViewItem($i + 1 & "|" & $aTableData[1][$i], $ListView1)
            Next
            MsgBox(0, "Info", "Finish !")
    EndSwitch
WEnd

RESOLVED!

Thank you very much, jdelaney and DanP2.
