
FileReadLine problem reading a very long line


Alex117


Hi everyone,

I'm trying to create scripts that download information from a website and inject it directly into SQL.

In my scripts I download an HTML file (with InetGet) to analyze it locally.

I use FileReadLine to read a specific line, which I then split with StringSplit.

I have been using this method for a long time :-)

This time I ran into a problem because the line is very, very long (FYI, the analyzed file is attached to this topic).

The line number is 570.

I have already searched the forum but never found a similar topic.

In fact, the FileReadLine function is not able to read the entire line: it is cut off before the end.

For example, I use this code:

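; Open the downloaded page in read mode (0)
$base_illustrateur = FileOpen(@ScriptDir & "\illustrateurs.html", 0)
; Read line 571, then close the file
$line = FileReadLine($base_illustrateur, 571)
FileClose($base_illustrateur)
; Keep the text before the first </table>, then split it into rows on </tr>
$split = StringSplit($line, "</table>", 1)
$split = StringSplit($split[1], "</tr>", 1)
; Write the line out so it can be compared with the original file
FileWriteLine("debug_line.txt", $line)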

If I compare the first line of debug_line.txt with line 571 of illustrateurs.html, they differ.

The debug file is exactly 64K every time!

Is this a limit of the FileReadLine function?

Thank you very much

Have a nice day.

Regards.

Alex117

PS: Sorry for my poor English, I'm French.

Attached file: illustrateurs.zip


AutoIt's theoretical limit for a string is 2GB. Practical limits on most real machines are more like 128MB, still much more than 64K. I suspect there may be a null character in it, though. See what you get from running it this way:
$base_illustrateur = FileOpen(@ScriptDir & "\illustrateurs.html", 0)
$line = FileReadLine($base_illustrateur, 571)
FileClose($base_illustrateur)

; Print the string length and the byte length of its binary form
ConsoleWrite("Debug:  $line length = " & StringLen($line) & @LF)
ConsoleWrite("Debug:  Binary $line length = " & BinaryLen(Binary($line)) & @LF)

$split = StringSplit($line, "</table>", 1)
$split = StringSplit($split[1], "</tr>", 1)
FileWriteLine("debug_line.txt", $line)

:mellow:

Valuater's AutoIt 1-2-3, Class... Is now in Session!For those who want somebody to write the script for them: RentACoder"Any technology distinguishable from magic is insufficiently advanced." -- Geek's corollary to Clarke's law

Hi ,

Thank you for your response.

The line is generated by a While loop, so it can't contain a NULL character.

I read a line that contains an HTML table and split it on <tr>. Example lines:

<tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-1-aaron-boyd.html">Aaron Boyd</a></td></tr>
<tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-2-adam-rex.html">Adam Rex</a></td></tr>
<tr style="background-color: transparent;" onmouseover="javascript:this.style.backgroundColor='#52A1CA';" onmouseout="javascript:this.style.backgroundColor='transparent'"><td><a href="gathering-cartes-illustrateur-3-adrian-smith.html">Adrian Smith</a></td></tr>
etc...

I tried your suggested code, and the console shows this output:

Debug: $line length = 61002

Debug: Binary $line length = 61002

If I add the two debug lines to my original script (for the same file), I get a different result:

Debug: $line length = 65534

Debug: Binary $line length = 65534

In the two cases, FileWriteLine("debug_line.txt", $line) does not produce the same result :mellow:

How can the same code give a different result?

Thank you,

Have a nice day.

Alex


It's easy for the same code to produce different results under different run-time circumstances. Only if the file read is exactly the same, so that $line is exactly the same, should you expect the same result. Can you post illustrateurs.html or another file like it that produces the same symptoms for you? Without that, I don't see how we can reproduce your conditions and symptoms.

The short answer is: There is no 64KB string limit on the functions you've used in this topic, so that is not the problem.

:mellow:

Edit: I'm just all kinds of wrong here... illustrateurs.html is posted in the OP, and the FileReadLine() function does show a 64K limit that I don't think is intended.
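If you want to know in advance whether a given file will hit this cut-off, one option is to read the whole file with FileRead() and report every line longer than 65534 characters. This is a minimal sketch, assuming FileRead() returns the file unsplit: `_FindLongLines` is a hypothetical helper name, and the 65534 default is simply the split point observed in this thread, not a documented constant.

```autoit
; Sketch: list every line longer than $iLimit characters.
; _FindLongLines is a hypothetical helper, not an AutoIt built-in.
Global $iLongLines = _FindLongLines(@ScriptDir & "\illustrateurs.html")
If Not @error Then ConsoleWrite($iLongLines & " line(s) over the limit" & @LF)

Func _FindLongLines($sPath, $iLimit = 65534)
    ; Read the whole file at once instead of line by line
    Local $sText = FileRead($sPath)
    If @error = 1 Then Return SetError(1, 0, 0)
    ; Remove CRs so CRLF and LF line endings both split on @LF
    Local $aLines = StringSplit(StringStripCR($sText), @LF)
    Local $iFound = 0
    ; $aLines[0] holds the number of elements
    For $i = 1 To $aLines[0]
        If StringLen($aLines[$i]) > $iLimit Then
            ConsoleWrite("Line " & $i & ": " & StringLen($aLines[$i]) & " characters" & @LF)
            $iFound += 1
        EndIf
    Next
    Return $iFound
EndFunc
```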

Edited by PsaltyDS

There may be a far easier way to do this. Exactly what is it that you are trying to get from the table? Do you need the link and the link text of just one of them?

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"
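George's question suggests the rows may not need to be read line by line at all. As a sketch only, assuming every row contains a link shaped like the sample rows Alex posted (`<a href="...">text</a>`), a single StringRegExp() over the whole file could collect each link target and its text. `_ExtractLinks` is a hypothetical name, and the pattern is tied to that sample markup, so it would need adjusting if the page layout changes.

```autoit
; Sketch: extract href and link text pairs with one regular expression.
; _ExtractLinks is a hypothetical helper; the pattern assumes the
; <a href="...">text</a> shape seen in the sample rows.
Global $aFound = _ExtractLinks(FileRead(@ScriptDir & "\illustrateurs.html"))
If Not @error Then
    For $i = 0 To UBound($aFound) - 1
        ConsoleWrite($aFound[$i][1] & " -> " & $aFound[$i][0] & @LF)
    Next
EndIf

Func _ExtractLinks($sHtml)
    ; Flag 3 returns all global matches; with two capture groups the
    ; result is a flat array: href, text, href, text, ...
    Local $aMatch = StringRegExp($sHtml, '<a href="([^"]*)">([^<]*)</a>', 3)
    If @error Then Return SetError(1, 0, 0)
    Local $iCount = Int(UBound($aMatch) / 2)
    Local $aLinks[$iCount][2]
    For $j = 0 To $iCount - 1
        $aLinks[$j][0] = $aMatch[2 * $j]      ; link target (href)
        $aLinks[$j][1] = $aMatch[2 * $j + 1]  ; link text
    Next
    Return $aLinks
EndFunc
```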


Further testing shows that FileReadLine splits lines at 65534 characters.

Output from the test.

Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 65534
Line size: 44660

FileDelete("test.txt")
$line = ""
For $i = 1 To 700000
    $line &= "A"
Next
FileWriteLine("test.txt",$line)

$file = FileOpen("test.txt", 0)

While 1
    $line = FileReadLine($file)
    If @error = -1 Then ExitLoop
    ConsoleWrite("Line size: " & StringLen($line) & @LF)
Wend

FileClose($file)
Edited by Joon

Well... rats! :)

Confirmed on XP Pro with 3.2.12.1 and 3.2.13.9 Beta (will try .10 in just a bit):

#include <File.au3>

Global $sFile = "test.txt", $hfile
Global $sLine = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()"
FileDelete($sFile)

Do
    $sLine &= $sLine
Until Stringlen($sLine) > 2^16
ConsoleWrite("$sLine length = " & StringLen($sLine) & @LF)

For $n = 1 To 10
    FileWriteLine($sFile,$sLine)
Next
ConsoleWrite("Line count = " & _FileCountLines($sFile) & @LF)

; Test with FileReadLine()
ConsoleWrite(@LF & "Test with FileReadLine() -----------------------" & @LF)
$n = 1
While 1
    $sLine = FileReadLine($sFile, $n)
    If @error Then ExitLoop
    ConsoleWrite("Line " & $n & " length = " & StringLen($sLine) & @LF)
    $n += 1
WEnd

; Test with FileRead()
ConsoleWrite(@LF & "Test with FileRead() -----------------------" & @LF)
$sLine = FileRead($sFile)
$avLine = StringSplit($sLine, @CRLF, 1)
For $n = 1 To 10
    ConsoleWrite("Line " & $n & " length = " & StringLen($avLine[$n]) & @LF)
Next

:mellow:

Edit: Confirmed with 3.2.13.10 Beta. Output:

>Running:(3.2.13.10):C:\Program Files\AutoIt3\beta\autoit3.exe "C:\temp\Test.au3"   
$sLine length = 73728
Line count = 10

Test with FileReadLine() -----------------------
Line 1 length = 65534
Line 2 length = 8194
Line 3 length = 65534
Line 4 length = 8194
Line 5 length = 65534
Line 6 length = 8194
Line 7 length = 65534
Line 8 length = 8194
Line 9 length = 65534
Line 10 length = 8194
Line 11 length = 65534
Line 12 length = 8194
Line 13 length = 65534
Line 14 length = 8194
Line 15 length = 65534
Line 16 length = 8194
Line 17 length = 65534
Line 18 length = 8194
Line 19 length = 65534
Line 20 length = 8194

Test with FileRead() -----------------------
Line 1 length = 73728
Line 2 length = 73728
Line 3 length = 73728
Line 4 length = 73728
Line 5 length = 73728
Line 6 length = 73728
Line 7 length = 73728
Line 8 length = 73728
Line 9 length = 73728
Line 10 length = 73728
+>15:26:29 AutoIT3.exe ended.rc:0

:(

Edited by PsaltyDS

herewasplato pointed out that this limit has existed in AutoIt for a long time: see the earlier topic "FileReadLine Limit 65534 characters?".

Since that limit may have been imposed for Win9x compatibility back then, it might be worth a feature request to change it now. At the very least, it calls for a documentation update to the FileReadLine() entry in the help file.

There are work-arounds:

1. Use FileRead() and StringSplit(); this is demonstrated in my code above.

2. Use _FileReadToArray(), demonstrated in the demo below.

3. There are more options if you want to use WinAPI or FileSystemObject functions.
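Work-around 1 can be wrapped in a small function that behaves like FileReadLine($file, $n) for a single line. This is a sketch, not an AutoIt built-in: `_FileReadLineFull` is a hypothetical name, and it assumes the file fits comfortably in memory, since it reads the whole file on every call.

```autoit
; Sketch of work-around 1: FileRead() + StringSplit() returning line $iLine.
; _FileReadLineFull is a hypothetical helper, not an AutoIt built-in.
; @error = 1: file could not be read, @error = 2: line number out of range.
Global $sRow = _FileReadLineFull(@ScriptDir & "\illustrateurs.html", 571)
If Not @error Then ConsoleWrite("Line 571 length = " & StringLen($sRow) & @LF)

Func _FileReadLineFull($sPath, $iLine)
    Local $sText = FileRead($sPath)
    If @error = 1 Then Return SetError(1, 0, "")
    ; Drop the CRs so both CRLF and LF line endings split on @LF
    Local $aLines = StringSplit(StringStripCR($sText), @LF)
    ; $aLines[0] holds the element count
    If $iLine < 1 Or $iLine > $aLines[0] Then Return SetError(2, 0, "")
    Return $aLines[$iLine]
EndFunc
```

If many lines are needed, reading the file once and keeping the array (which is what _FileReadToArray() does) avoids re-reading it for every line.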

:mellow:

Demo using _FileReadToArray():

#include <File.au3>

Global $sFile = "test.txt", $avFile
Global $sLine = "abcdefghijklmnopqrstuvwxyz1234567890ABCDEFGHIJKLMNOPQRSTUVWXYZ!@#$%^&*()"
FileDelete($sFile)

Do
    $sLine &= $sLine
Until Stringlen($sLine) > 2^16
ConsoleWrite("$sLine length = " & StringLen($sLine) & @LF)

For $n = 1 To 10
    FileWriteLine($sFile,$sLine)
Next
ConsoleWrite("Line count = " & _FileCountLines($sFile) & @LF)

; Test with _FileReadToArray()
ConsoleWrite(@LF & "Test with _FileReadToArray() -----------------------" & @LF)
_FileReadToArray($sFile, $avFile)
For $n = 1 To $avFile[0]
    ConsoleWrite("Line " & $n & " length = " & StringLen($avFile[$n]) & @LF)
Next

:(
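For option 3, one possibility is the Scripting.FileSystemObject COM object, which AutoIt can reach through ObjCreate(). The sketch below returns the length of every line it reads. `_FSOLineLengths` and `$FSO_FOR_READING` are hypothetical names (1 is the ForReading mode value of OpenTextFile), and it assumes TextStream.ReadLine does not share FileReadLine()'s 65534-character cut-off, which is worth checking on your own system before relying on it.

```autoit
; Sketch of work-around 3 via Scripting.FileSystemObject.
; _FSOLineLengths and $FSO_FOR_READING are hypothetical names.
; Assumption: TextStream.ReadLine returns whole lines without a 64K cut.
Global Const $FSO_FOR_READING = 1   ; OpenTextFile IOMode: ForReading

Global $aLengths = _FSOLineLengths("test.txt")
If Not @error Then
    For $n = 1 To $aLengths[0]
        ConsoleWrite("Line " & $n & " length = " & $aLengths[$n] & @LF)
    Next
EndIf

Func _FSOLineLengths($sPath)
    Local $oFSO = ObjCreate("Scripting.FileSystemObject")
    If Not IsObj($oFSO) Then Return SetError(1, 0, 0)
    If Not FileExists($sPath) Then Return SetError(2, 0, 0)
    Local $oStream = $oFSO.OpenTextFile($sPath, $FSO_FOR_READING)
    Local $aLen[1] = [0]   ; $aLen[0] = number of lines read
    While Not $oStream.AtEndOfStream
        $aLen[0] += 1
        ReDim $aLen[$aLen[0] + 1]
        $aLen[$aLen[0]] = StringLen($oStream.ReadLine())
    WEnd
    $oStream.Close()
    Return $aLen
EndFunc
```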

Edited by PsaltyDS

Created ticket number #681 to change documentation for FileReadLine() to explain the current 64K limit.

Oops, too slow, as joon had already reported the documentation change in ticket #679.

Also created Feature Request #682 to remove that limit.

The DEVs are aware of the issue.

:mellow:

Edited by PsaltyDS
