
File Scanning


Pandemic

I'm entirely confused... I've been looking at this for a bit, did a lot of debugging, and I'm still baffled.

The goal of this script is to go through and download all the xkcd comics, as well as the alt text. It gets the image for the comic perfectly fine, but when it goes to grab the alt text, it fails horribly. Sometimes it gets too much, sometimes it doesn't get enough. Funny thing is, I used the same method for getting the alt text as I did for getting the comic URL, and the comic URL hasn't broken yet.

$num = 1
$max = 677

While $num < $max
    ToolTip("" & $num & '/' & $max, 0,0)
    Sleep(2000) ;KEEP THIS LINE. DoS attacks are not win.
    
    If $num <> 404 Then
        InetGet("http://xkcd.com/" & $num & "/", "htm") ;Gets the page code
        $str = FileReadLine("htm", 77) ;Line 77 is the line with the comic image (well, it's actually 75 in Chrome, but AutoIt is 2 lines off for who knows what reason...)
        $str = StringTrimLeft($str, 10) ;Trimming 10 characters eliminates the <img src="
        
        $f = FileOpen("htm", 2) ;write
        FileWriteLine($f, $str)
        FileClose($f)
        $f = FileOpen("htm", 0) ;read
        $k = 0
        $c = " "
        
        while $c <> '"' ;Ends the URL for the comic
            $c = FileRead($f, 1)
            $k += 1
        WEnd

        ;CODE BREAKS SOMEWHERE AFTER THIS POINT

        $addr = StringLeft($str, $k-1) ;Comic address
        $str = FileRead($f)
        $str = StringTrimLeft($str, 8) ;Trims off the title="
        FileWrite($f, $str)
        FileClose($f)
        $f = FileOpen("htm", 0) ;read
        $k = 0
        $c = ""

        While $c <> '"' ;Gets the alt text
            $c = FileRead($f, 1)
            $k += 1
        WEnd
        
        $alt = StringLeft($str, $k-1)
        FileClose($f)
        
        $f = FileOpen($num & ".txt", 2) ;write
        FileWrite($f, $alt) ;write the alt-text
        FileClose($f)
        
        MsgBox(0, "Alt Text", "" & $alt & @LF & "Length: " & StringLen($alt))
        
        If $num > 116 Then ;117 changes from .jpg to .png
            InetGet($addr, $num & ".png")
        Else
            InetGet($addr, $num & ".jpg")
        EndIf
    EndIf
    $num += 1
WEnd
FileDelete("htm")

On a side note, if you know of a string function that scans a string for a character, please let me know ;). I looked for a bit, but then figured that I need to download the comic's HTML code anyway, so I might as well keep using files.

-Pandemic
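
On the side-note question: AutoIt's built-in StringInStr returns the position of a substring within a string (0 if it is not found), and StringMid copies a slice of a string. A minimal sketch of how the two could pull a quoted attribute value out of a line is below. The helper name _GetQuoted and the sample line are made up for illustration, and the sketch assumes the tag has the form src="..." title="...", which may not match the real page exactly.

```autoit
; Hypothetical helper (not a built-in): returns the text between $sPrefix
; and the next double quote. Returns "" and sets @error to 1 if either
; delimiter is missing.
Func _GetQuoted($sLine, $sPrefix)
    Local $iStart = StringInStr($sLine, $sPrefix) ; 1-based position, 0 if absent
    If $iStart = 0 Then Return SetError(1, 0, "")
    $iStart += StringLen($sPrefix) ; first character of the value
    Local $iEnd = StringInStr($sLine, '"', 0, 1, $iStart) ; closing quote, searching from $iStart
    If $iEnd = 0 Then Return SetError(1, 0, "")
    Return StringMid($sLine, $iStart, $iEnd - $iStart)
EndFunc

; Sample line in the assumed format (not copied from the real page).
Local $sLine = '<img src="http://example.com/comic.png" title="Some alt text" alt="Comic" />'
Local $sAddr = _GetQuoted($sLine, 'src="')
Local $sAlt = _GetQuoted($sLine, 'title="')
MsgBox(0, "Result", $sAddr & @LF & $sAlt)
```

In the script above, the same helper could be applied directly to the string returned by FileReadLine, so the line would not need to be written back to "htm" and reopened. That may matter here: the FileWrite just after the marked comment is called on a handle opened with mode 0 (read), so it would not be expected to change the file, and the second loop would then be counting characters from the unchanged file contents rather than from the trimmed string.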


Ya'know, Randall Munroe releases his comics under Copyleft (CC Attribution-NonCommercial), so there may not be any issue with downloading the comics, but he supports the site by selling XKCD collections 'n stuff, too...

...and it is close to Christmas.

I'm just say'n.

;)
