[Solved - Sort of] InetGet downloads corrupt files



Most of the files (90%) I download are OK; the rest are corrupted. I checked the files on the server, and they are fine.

I also tried downloading just one file with a one-line script like InetGet($url, '01.jpg'), and that download was fine. So now I'm confused about what on earth is going on.

I have a cable connection.

Is there anything I can do about it?
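
A minimal sketch of the single-file test described above, with basic error checking added ($url is the image URL; the flags are 1 = force reload, 0 = wait until the download finishes):

$iBytes = InetGet($url, '01.jpg', 1, 0) ; in wait mode, returns the number of bytes downloaded
If @error Or $iBytes = 0 Then ConsoleWrite('Download failed, @error = ' & @error & @CRLF)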


Does this happen with one particular server?

Can you post some short code so we can try to replicate the issue?


Does this happen with one particular server?

Can you post some short code so we can try to replicate the issue?

It tends to happen from time to time with random servers, but one thing is common: it never happens if I use a one-line script. It always happens when I download several images one after another.

Basically it downloads all the thumbnails from the page and puts them into a folder.

; #FUNCTION# ;===============================================================================
; AutoIt Version:  3.3.6.0
; Description ...: downloads one gallery; the link must point to the thumbnail view
; Author ........: goldenix
;~  _ArrayDisplay($_Arrayline,'')
; ;==========================================================================================
#Include <Array.au3>
#include <INet.au3>

Global $Final_page   = True
Global $gallery_name = ''
Global $ln = 1

$url = 'http://lu.scio.us/hentai/albums/chobitswallpaper/page/1'


    $o = 1
    $url = StringTrimRight($url,1)
    while 1
        _Go($url & $o)

        $o = $o +1
        If $Final_page = True then ExitLoop
    WEnd


;~  =====================================================================
func _Go($url)

    $Final_page  = True

    $HTMLSource = _INetGetSource($url)

    ;## Put the source into an array $_Arrayline
    $_Arrayline = StringSplit($HTMLSource, @LF)

    for $i = 1 to $_Arrayline[0]

        ;## Get Gallery Title
        If StringInStr($_Arrayline[$i],'meta name="title') Then  _Get_gallery_name($_Arrayline[$i])

        ;## search Array Line for thumb_100_
        If StringInStr($_Arrayline[$i],'thumb_100_') Then  _Extract_img_url_from_line($_Arrayline[$i])

        ;## Check if last page
        If StringInStr($_Arrayline[$i],'last.png') Then $Final_page = False

        ;## Thumbs end here
        If StringInStr($_Arrayline[$i],'Back to the List') Then  ExitLoop

    Next
EndFunc

Func _Get_gallery_name($line)

    $Split = StringSplit($line, 'Album:',1)
    $Split = StringSplit($Split[2],' - Page',1)
;~  _ArrayDisplay($Split,'')
    $gallery_name = $Split[1]
    ConsoleWrite($Split[1] & @CRLF)

EndFunc


Func _Extract_img_url_from_line($_arr_line)

    $Split = StringSplit($_arr_line, 'src="',1)
    $Split = StringSplit($Split[2],'"',1)

        $new_src = stringreplace($Split[1], 'thumb_100_','')
        _Download($new_src)
EndFunc


Func _Download($new_src)

    $ext = StringSplit($new_src,'.',1) ; get file extension
    $ext = '.' & $ext[$ext[0]]

    $split = StringSplit($new_src,'/',1) ; get filename
    $filetosave_as = $split[$split[0]]

        If $ln < 10 Then    $filetosave_as = '00' & $ln & $ext
        If $ln >= 10 Then   $filetosave_as = '0' & $ln & $ext
        If $ln >= 100 Then  $filetosave_as = '' & $ln & $ext

        $dir = 'lu.scio.us - Downloads'

        DirCreate($dir)
        DirCreate($dir & '\' & $gallery_name)

        If Not FileExists($dir & '\' & $gallery_name & '\' & $filetosave_as) Then
            InetGet($new_src, $dir & '\' & $gallery_name & '\' & $filetosave_as,1,0)
            Sleep(500)
        EndIf

        ConsoleWrite($filetosave_as & @CRLF)

        $ln = $ln +1
EndFunc

1. If you have a router, sometimes a router will blink (reset).

2. You probably need to close your InetGet connection after each download, and put a little sleep in between to slow it down some.

Example Code:

Local $hDownload = InetGet($url_item, $sName, 1, 1) ; 1 = force reload, 1 = download in background
Sleep(250)
Do
    Sleep(250)
Until InetGetInfo($hDownload, 2) ; index 2 = True once the download is complete
Sleep(250)
InetClose($hDownload) ; close the handle after each download
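
For reference: InetGetInfo() with index 2 returns True once the transfer has finished, index 3 reports whether it was successful, and the default index of -1 returns the whole information array.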

"The mediocre teacher tells. The Good teacher explains. The superior teacher demonstrates. The great teacher inspires." -William Arthur Ward

Link to comment
Share on other sites

I've tried some variants: InetGet with options 1 and then 17, and InetRead. On the first pass with InetGet, about 1/3 of the files were incomplete. Subsequent downloads (with reload forced) were complete, except for one file fetched with InetGet.

If Not FileExists($dir & '\' & $gallery_name & '\' & $filetosave_as) Then
    Local $bin = InetRead($new_src)
    Local $hFile = FileOpen($dir & '\' & $gallery_name & '\' & $filetosave_as, 2 + 16) ; 2 = overwrite, 16 = binary mode
    FileWrite($hFile, $bin)
    FileClose($hFile)
EndIf

No Sleep is needed: your browser doesn't Sleep(500) between GETs I hope!

It's possible that the incomplete downloads are due to a problem with server caching.

You could try downloading each file twice and throwing away the first, possibly incomplete, copy. Make sure to force a reload the second time. It's ugly, but it could work more reliably.
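
A minimal sketch of that double-download idea, assuming $new_src and $dest hold the source URL and the target path as in the script above:

InetGet($new_src, $dest, 1, 0)  ; first download, waits until done
FileDelete($dest)               ; throw away the possibly incomplete first copy
InetGet($new_src, $dest, 1, 0)  ; download again; 1 = force reload, bypassing any cache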


Line 14 is missing a ' at the end:

$url = 'http://lu.scio.us/hentai/albums/chobitswallpaper/page/1

Yes, it might be a server problem.

"The mediocre teacher tells. The Good teacher explains. The superior teacher demonstrates. The great teacher inspires." -William Arthur Ward

Link to comment
Share on other sites

Did you try InetGetInfo() to check for errors? If the corrupted files give an error it will be easy to just retry those.

Okay, I rewrote the script with InetGetInfo and added a new function at the end that downloads and checks the file. Result: 95% of the files are corrupt. As you can see, there are no errors. At the end I do file-size checking, so I could rewrite it to re-download each corrupt file until I get a good one, but that would be stupid. What if I need to re-download each file 100 times?

Imagine the waste of time and resources.

Obviously the server is not going to fix itself. Also, if I use a download manager or a browser, none of the images comes out corrupt, whether displayed or downloaded.

So InetGet does not check packet integrity? Why can't it download files intact?

Since InetGet seems buggy, are there any alternatives for downloading files?
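
One possible alternative is the WinHttpRequest COM object that ships with Windows; a minimal sketch under that assumption, with $new_src and $dest holding the URL and the target path as in the script below:

$oHTTP = ObjCreate('WinHttp.WinHttpRequest.5.1')
$oHTTP.Open('GET', $new_src, False) ; synchronous GET
$oHTTP.Send()
If $oHTTP.Status = 200 Then
    $hFile = FileOpen($dest, 2 + 16) ; 2 = overwrite, 16 = binary mode
    FileWrite($hFile, $oHTTP.ResponseBody) ; ResponseBody is the raw response bytes
    FileClose($hFile)
EndIf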

A piece of the log:

011.jpg Size: 759078 Complete->: False Successful->: False @error: 0 @extended: 0 Filesize dont Match: 741.287109375 65.783203125
012.jpg Size: 855656 Complete->: False Successful->: False @error: 0 @extended: 0 Filesize dont Match: 835.6015625 65.7822265625
013.jpg Size: 992410 Complete->: False Successful->: False @error: 0 @extended: 0 Filesize dont Match: 969.150390625 702.8125
014.jpg Size: 816747 Complete->: False Successful->: False @error: 0 @extended: 0
015.jpg Size: 880380 Complete->: False Successful->: False @error: 0 @extended: 0

Rewritten script:

#Include <Array.au3>
#include <INet.au3>

Global $Final_page   = True
Global $gallery_name = ''
Global $ln = 1

$url = 'http://lu.scio.us/hentai/albums/chobitswallpaper/page/1'


    $o = 1
    $url = StringTrimRight($url,1)
    while 1
        _Go($url & $o)

        $o = $o +1
        If $Final_page = True then ExitLoop
    WEnd


;~  =====================================================================
func _Go($url)

    $Final_page  = True

    $HTMLSource = _INetGetSource($url)

    ;## Put the source into an array $_Arrayline
    $_Arrayline = StringSplit($HTMLSource, @LF)

    for $i = 1 to $_Arrayline[0]

        ;## Get Gallery Title
        If StringInStr($_Arrayline[$i],'meta name="title') Then  _Get_gallery_name($_Arrayline[$i])

        ;## search Array Line for thumb_100_
        If StringInStr($_Arrayline[$i],'thumb_100_') Then  _Extract_img_url_from_line($_Arrayline[$i])

        ;## Check if last page
        If StringInStr($_Arrayline[$i],'last.png') Then $Final_page = False

        ;## Thumbs end here
        If StringInStr($_Arrayline[$i],'Back to the List') Then  ExitLoop

    Next
EndFunc

Func _Get_gallery_name($line)

    $Split = StringSplit($line, 'Album:',1)
    $Split = StringSplit($Split[2],' - Page',1)
;~  _ArrayDisplay($Split,'')
    $gallery_name = $Split[1]
;~  ConsoleWrite($Split[1] & @CRLF)

EndFunc


Func _Extract_img_url_from_line($_arr_line)

    $Split = StringSplit($_arr_line, 'src="',1)
    $Split = StringSplit($Split[2],'"',1)

        $new_src = stringreplace($Split[1], 'thumb_100_','')
        _Download_prepare($new_src)
EndFunc


Func _Download_prepare($new_src)

    $ext = StringSplit($new_src,'.',1) ; get file extension
    $ext = '.' & $ext[$ext[0]]

    $split = StringSplit($new_src,'/',1) ; get filename
    $filetosave_as = $split[$split[0]]

        If $ln < 10 Then    $filetosave_as = '00' & $ln & $ext
        If $ln >= 10 Then   $filetosave_as = '0' & $ln & $ext
        If $ln >= 100 Then  $filetosave_as = '' & $ln & $ext

        $dir = 'lu.scio.us - Downloads'

        DirCreate($dir)
        DirCreate($dir & '\' & $gallery_name)

        If Not FileExists($dir & '\' & $gallery_name & '\' & $filetosave_as) Then
             _Download($new_src, $dir & '\' & $gallery_name & '\' & $filetosave_as, $filetosave_as)
            Sleep(200)
        EndIf

        $ln = $ln +1
EndFunc


Func _Download($new_src, $dest, $filetosave_as)

    $hDownload  = InetGet($new_src, $dest,1,1)

    Do
        $aData = InetGetInfo($hDownload)  ; Get all information.

        $trim_adata = stringleft($aData[0]/1024/1024, 5)
        ToolTip($trim_adata & ' Mb' , 0, 0)

    Sleep(250)
    Until InetGetInfo($hDownload, 2)    ; Check if the download is complete.

    InetClose($hDownload)   ; Close the handle to release resources.

    ;## check if filesize from server & after DL match
    $size = FileGetSize($dest)

    If $size = $aData[1] Then ; numeric comparison, not string ==
        ConsoleWrite($filetosave_as & _
            " Size: " & $aData[1] & _
            " Complete->: " & $aData[2] & _
            " Successful->: " & $aData[3] &  _
            " @error: " & $aData[4] &  _
            " @extended: " & $aData[5] & @CRLF)
    Else
        ConsoleWrite('!' & $filetosave_as & _
            " Size: " & $aData[1] & _
            " Complete->: " & $aData[2] & _
            " Successful->: " & $aData[3] &  _
            " @error: " & $aData[4] &  _
            " @extended: " & $aData[5] & _
            " Filesize dont Match: " & $aData[1]/1024 & ' ' & $size/1024 & @CRLF) ; convert to KiloBytes
    EndIf
EndFunc

It's not obvious that the way you check is correct.

I've had correct and repeatable results with the following, aimed at a server that did have random latency issues.

You'll have to adapt it, but you get the idea. There's no risk in trying, after all.

#include <File.au3>                                     ; for _TempFile()

Local $TmpFile = _TempFile(@TempDir, '~basename~', '.html', 20)
Local $timer, $hInet, $status, $text
For $retry = 1 To 10
    $timer = TimerInit()
    $hInet = InetGet($url & $NumColis, $TmpFile, 17, 1) ; background direct download, no cache
    Do
        Sleep(50)
        $status = InetGetInfo($hInet, -1)
        If $status[2] Then ExitLoop                     ; download is complete
    Until TimerDiff($timer) >= 30000                    ; allow 30 s for the download
    InetClose($hInet)                                   ; free the handle
    If $status[3] Then                                  ; download is said to be successful
        $text = FileRead($TmpFile)
;~      ConsoleWrite($text & @LF)
;~      $str = StringRight($text, 10)
;~      If StringRegExp($str, "(\}\s*){4}", 0) Then     ; my own check that the HTML page arrived in full; that won't work for you
;~          ExitLoop
;~      EndIf
    Else
        ConsoleWrite("Website didn't answer within the allowed time." & @LF)
    EndIf
;~  Sleep(500)
Next
FileDelete($TmpFile)
If $retry > 10 Then
    _WarnBox("Server is stoned")                        ; a warning MsgBox (user-defined function)
    Return 1
EndIf


It's not obvious that the way you check is correct.

I've had correct and repeatable results with the following, aimed at a server that did have random latency issues.

You'll have to adapt it, but you get the idea. There's no risk in trying, after all.

So a 30-second latency allowance? I tried it, and every file was still corrupt, and the download took insanely long: 20 minutes to download 10 files of ~1 MB each, or so :(


No, that seems to mean that for some reason the server never raises the "download is complete" flag, aka $status[2], and that's probably why it times out.

IMHO, the right way to understand what happens is to capture the full download session with Wireshark and dissect the protocol to see what goes wrong.

I don't believe InetGet and InetRead are that broken, just to annoy you. Moreover, these must be wrappers around Windows functions, and it's unlikely that everything is dead buggy at this level.

OTOH, broken or poorly set-up servers are legion, so...


OK, re-downloading the same file (up to 61 times) seems to have solved the problem. But that's still almost 60 MB of extra traffic per 30 files :(
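
A less wasteful variant might be to retry with forced reload only until the on-disk size matches the size reported by InetGetInfo, with a capped retry count. A sketch under those assumptions ($new_src and $dest as in the scripts above, and assuming the server reports a usable size):

For $try = 1 To 10
    $hDl = InetGet($new_src, $dest, 1, 1) ; 1 = force reload, 1 = download in background
    Do
        Sleep(100)
    Until InetGetInfo($hDl, 2)            ; index 2 = download complete
    $aInfo = InetGetInfo($hDl)            ; full information array
    InetClose($hDl)
    If $aInfo[3] And FileGetSize($dest) = $aInfo[1] Then ExitLoop ; successful and sizes match
Next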


That's strange. Must be an overloaded server.

