duzers

Convert an HTML table to text, or extract selected information as text

6 posts in this topic

Hello,

I am working with this page: URL LINK

I am trying to produce a text file (or something similar) that looks like this:

1</TD>,fin,PLNFI0600010,06MAGNA,06N,

2</TD>,dew,PLNFI0800016,08OCTAVA,08N,

3</TD>,hah,PLABCDT00014,ABCDATA,ABC,

...

I wrote the code below, but it runs very slowly (there are too many lines to process).

First I save the page as tekst.txt, then I search for the string '<TD class="left">' and write the line that follows it to a file.

The result looks like the sample above.

$file = FileOpen("C:\tekst.txt", 0)

$i = 0
$licz = 0
While $i <= 50000
    $line = FileReadLine($file, $i)
    If @error = -1 Then ExitLoop
    If StringCompare($line, '<TD class="left">', 2) == 0 Then
        $licz = $licz + 1
        $line = FileReadLine($file, $i + 1)
        MsgBox(0, "Line read:", $line & "linia: " & $licz & "mod: " & Mod($licz, 5))
        FileWrite("C:\tekst2.txt", $line & ",")
        If Mod($licz, 5) = 0 Then FileWrite("C:\tekst2.txt", @CRLF)
    EndIf

    $i = $i + 1
WEnd

FileClose($file)

How can I make this code faster?

Thanks
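
One likely cause of the slowness: the AutoIt help file notes that calling FileReadLine() with an incrementing line number makes AutoIt re-read the file from the beginning on every call. The MsgBox() inside the loop also pauses the script on every match, and FileWrite() with a file name opens and closes the output file each time. A minimal sketch of a sequential version follows, assuming the same file layout as in the post. The function name _ExtractCells and its parameters are hypothetical, and unlike the original it overwrites the output file instead of appending to it.

```autoit
#include <FileConstants.au3>

; Sketch only: _ExtractCells and its parameters are hypothetical names.
; Copies the line following each marker line to the output file,
; comma-separated, starting a new output row every $iPerRow values.
Func _ExtractCells($sInPath, $sOutPath, $sMarker = '<TD class="left">', $iPerRow = 5)
    Local $hIn = FileOpen($sInPath, $FO_READ)
    If $hIn = -1 Then Return SetError(1, 0, 0)
    Local $hOut = FileOpen($sOutPath, $FO_OVERWRITE)
    If $hOut = -1 Then
        FileClose($hIn)
        Return SetError(2, 0, 0)
    EndIf

    Local $sLine, $iCount = 0
    While 1
        $sLine = FileReadLine($hIn) ; no line number: continues from the current position
        If @error Then ExitLoop
        If StringCompare($sLine, $sMarker, 2) <> 0 Then ContinueLoop
        $sLine = FileReadLine($hIn) ; the value sits on the line after the marker
        If @error Then ExitLoop
        $iCount += 1
        FileWrite($hOut, $sLine & ",")
        If Mod($iCount, $iPerRow) = 0 Then FileWrite($hOut, @CRLF)
    WEnd

    FileClose($hIn)
    FileClose($hOut)
    Return $iCount ; number of values written
EndFunc   ;==>_ExtractCells

; Usage with the paths from the post above:
; _ExtractCells("C:\tekst.txt", "C:\tekst2.txt")
```

Each line is read exactly once, so the run time should grow roughly linearly with the file size. One behavioural difference: the line after a marker is consumed here, whereas the original loop would examine it again as a possible marker.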


Hello duzers,

You should look into the IE.au3 UDF in the AutoIt help file, especially the _IETableGetCollection() and _IETableWriteToArray() functions.

Realm


My Contributions: Unix Timestamp: Calculate Unix time, or seconds since Epoch, accounting for your local timezone and daylight savings time. RegEdit Jumper: A Small & Simple interface based on Yashied's Reg Jumper Function, for searching Hives in your registry.  


#3 ·  Posted (edited)

Thank you very much!

Here is what I have now:

#include <IE.au3>
#include <Array.au3>
$oIE = _IECreate ("http://www.gpw.pl/zrodla/gpw/spws/spws1/akc1napl.html",0,0,1,0)
$oTable = _IETableGetCollection ($oIE, 0)
$aTableData = _IETableWriteToArray ($oTable, True)
_ArrayDisplay($aTableData)
Edited by duzers


My pleasure, I had the time.


Please help!

How can I save this array to a text file?

#include<array.au3> ;; For displaying the arrays only
#include <INet.au3>

$sURL = "http://www.gpw.pl/zrodla/gpw/spws/spws1/akc1napl.html"

$sData =BinaryToString(InetRead($sURL))
;ConsoleWrite($sData)

_GetTables($sData, "");

Func _GetTables($sSource, $sTable = -1) ;
   Local $sRegExp = "(?i)(?s)(<table.+?</table>)", $iRtn = 3, $sTitle = "" ;; $sTitle stays empty when no table is specified
   If $sTable <> -1 Then ;; a particular table was specified
      $sTitle = $sTable
      $sTable = StringRegExpReplace($sTable,"\s+", "\\s*")
      $sRegExp = "(?i)(?s).*(<table.+?" & $sTable & ".+?</table>)"
      $iRtn = 1
   EndIf

   $aTables = StringRegExp($sSource, $sRegExp, $iRtn)
   If NOT @Error Then
      For $i = 0 To Ubound($aTables) -1
         $aTables[$i] = StringRegExpReplace($aTables[$i], "(?i)(</?t)h.*?>", "$1d>")
         StringRegExpReplace($aTables[$i], "</tr>", "$1")
         $iRows = @Extended
         $iCols = _ColCount($aTables[$i])
         Local $aData[$iRows][$iCols]
         _ColData($aTables[$i], $aData)
         _ArrayDisplay($aData, $iRows & " x " & $iCols & " x " & $sTitle)
         ;; If $sTable <> -1 then you can replace the _ArrayDisplay() with Return $aData
      Next
   EndIf
EndFunc   ;<==> _GetTables()

Func _ColData($sTable, ByRef $aArray)
   Local $aRows = StringRegExp($sTable, "(?i)(?s)(<tr.+?</tr>)", 3), $aValues
   If @Error Then Return SetError(1,1)
   For $i = 0 To Ubound($aRows) -1
      $aValues = StringRegExp($aRows[$i],"(?i)(?s)<td.+?</td>", 3)
      If NOT @Error Then
         For $j = 0 To Ubound($aValues) -1
            $aArray[$i][$j] = _StripTags($aValues[$j])
         Next
      EndIf
   Next
   Return $aArray
EndFunc   ;<==> _ColData()

Func _StripTags($sStr)
   $sStr = StringRegExpReplace($sStr, "(?i)(?s)<.+?>", "")
   $sStr = StringReplace($sStr, "&nbsp;", "")
   Return StringStripWS($sStr,3)
EndFunc   ;<==> _StripTags() 

Func _ColCount($sTable)
   Local $iCount = 0, $iCols
   Local $aRows = StringRegExp($sTable, "(?i)(?s)(<tr.+?</tr>)", 3)
   If @Error Then Return SetError(1,1)
   For $i = 0 To Ubound($aRows) -1
      StringRegExpReplace($aRows[$i], "(?i)(?s)(<td.+?</td>)", "$1")
      $iCols = @Extended
      If $iCols > $iCount Then $iCount = $iCols
      ;MsgBox(0, "Test Row", $aRows[$i])
   Next
   Return $iCount
EndFunc   ;<==> _ColCount()
;


#6 ·  Posted (edited)

Hello duzers,

For 1D arrays you can use _FileWriteFromArray(), but your example produces a 2D array, which that function does not support. However, I happened to write _FileWriteFromArrayEX(), which supports 2D arrays and takes a delimiter for the columns (2nd dimension). Note that it will not work with arrays of more than 2 dimensions.

#include <FileConstants.au3>
; #FUNCTION# ====================================================================================================================

; Name...........: _FileWriteFromArrayEX
; Description ...: Writes 1D and 2D Array records to the specified file.
; Syntax.........: _FileWriteFromArrayEX($File, $a_Array[, $row_Base = 0[, $row_UBound = 0[, $col_Base = 0[, $col_Ubound = 0[, $iDelimit = ';']]]]])
; Parameters ....: $File    - String path of the file to write to, or a file handle returned from FileOpen().
;                  $a_Array    - The array to be written to the file.
;                  $row_Base   - Optional: Start Array index to read, normally set to 0 or 1. Default=0
;                  $row_Ubound - Optional: Set to the last record you want to write to the File. default=0 - whole array.
;                  $col_Base   - Optional: Start Array index in the 2nd Dimension to read, normally set to 0 or 1. Default = 0
;                  $col_Ubound - Optional: Set to the last index in the 2nd Dimension you want to write to the File. Default = 0 - all columns
;                  $iDelimit   - Optional: Set the Delimiter for the 2nd Dimension. No character limits. May be set to ' ' for WS
; Return values .: Success - Returns a 1 if 1D Array written to file
;                          - Returns a 2 if 2D Array written to file
;                  Failure - Returns a 0
;                   @Error  - 0 = No error.
;                        |1 = Error opening specified file
;                        |2 = Input is not an Array
;                        |3 = Error writing to file
;                        |4 = Input is larger than a 2D Array
; Original Author: Jos van der Zande <jdeb at autoitscript dot com>
; Modified.......: Updated for file handles by PsaltyDS at the AutoIt forums.
; EX Modified....: Realm at the AutoIt forums
; Remarks .......: If a string path is provided, the file is overwritten and closed.
;                  To use other write modes, like append or Unicode formats, open the file with FileOpen() first and pass the file handle instead.
;                  If a file handle is passed, the file will still be open after writing.
; Related .......: _FileReadToArray, _FileWriteFromArray
; Link ..........:
; Example .......: No
;===============================================================================================================================
;
Func _FileWriteFromArrayEX($File, $a_Array, $row_Base = 0, $row_Ubound = 0, $col_Base = 0, $col_Ubound = 0, $iDelimit = ';')
    ; Check if we have a valid array as input
    If Not IsArray($a_Array) Then Return SetError(2, 0, 0)

    ; determine last Row
    Local $lastRow = UBound($a_Array) - 1
    If $row_Ubound < 1 Or $row_Ubound > $lastRow Then $row_Ubound = $lastRow
    If $row_Base < 0 Or $row_Base > $lastRow Then $row_Base = 0

    ; determine last Column
    Local $dim = UBound($a_Array,0)
    If $dim > 2 Then Return SetError(4, 0, 0)
    If $dim = 2 Then
        Local $lastCol = UBound($a_Array,2) - 1
        If $col_Ubound < 1 Or $col_Ubound > $lastCol Then $col_Ubound = $lastCol
        If $col_Base < 0 Or $col_Base > $lastCol Then $col_Base = 0
    EndIf

    ; Open output file for overwrite by default, or use input file handle if passed
    Local $hFile
    If IsString($File) Then
        $hFile = FileOpen($File, $FO_OVERWRITE)
    Else
        $hFile = $File
    EndIf
    If $hFile = -1 Then Return SetError(1, 0, 0)

    ; Write array data to file
    Local $ErrorSav = 0
    Local $array_Line, $dlmLen = StringLen($iDelimit)
    For $x = $row_Base To $row_UBound
        $array_Line = ''
        If $dim = 2 Then
            For $xx = $col_Base To $col_Ubound
                $array_Line &= $a_Array[$x][$xx] & $iDelimit
            Next
            $array_Line = StringTrimRight($array_Line,$dlmLen)
        Else
            $array_Line = $a_Array[$x]
        EndIf
        If FileWrite($hFile, $array_Line & @CRLF) = 0 Then
            $ErrorSav = 3
            ExitLoop
        EndIf
    Next

    ; Close file only if specified by a string path
    If IsString($File) Then FileClose($hFile)

    ; Return results
    If $ErrorSav Then Return SetError($ErrorSav, 0, 0)
    Return $dim ; 1 for a 1D array, 2 for a 2D array, as documented in the header
EndFunc
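
As a side note, more recent AutoIt releases appear to ship a _FileWriteFromArray() in File.au3 that accepts 2D arrays and a delimiter argument, so a separate helper may not be needed there. A minimal sketch under that assumption follows. The sample data and the output path are made up for illustration, and the behaviour should be checked against the help file of the installed version.

```autoit
#include <File.au3>

; Hypothetical sample data standing in for the table array.
Local $aData[2][3] = [["1", "fin", "PLNFI0600010"], ["2", "dew", "PLNFI0800016"]]
Local $sPath = @TempDir & "\table.csv" ; assumed output path

; Default base and upper bound write the whole array; "," separates the columns.
Local $iRet = _FileWriteFromArray($sPath, $aData, Default, Default, ",")
Local $iErr = @error ; saved immediately, before any other call can reset it
If $iRet = 0 Then ConsoleWrite("Write failed, @error = " & $iErr & @CRLF)
```

Each row of the array becomes one line of the file, with the column values joined by the delimiter.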

Enjoy!

Realm

Edit: Fixed Typos

Edited by Realm
