Optimizing _INetGetSource: is it possible?


pillbug

Hi, I've been working on optimizing _INetGetSource() and I am not seeing much improvement.

If I have a 1 MB/s download connection, and each HTML file is about 10 KB, I should be able to download 100 search results per second. Unfortunately, I am not even close to this.

I eventually plan to run this against a list of websites rather than Google searches, but as an example I've written a loop of Google searches for the word "hello".

I suspect the main problem is having to connect to wininet.dll.

If anyone sees ways to improve the program, please let me know how to speed up this process.

$begin = TimerInit()

; Open wininet.dll and a single session handle, reused for every request
$h_DLL = DllOpen('wininet.dll')
$ai_IO = DllCall($h_DLL, 'int', 'InternetOpen', 'str', 'AutoIt v3', 'int', 0, 'int', 0, 'int', 0, 'int', 0)
If @error Or $ai_IO[0] = 0 Then
    DllClose($h_DLL)
    Exit 1 ; no session handle: stop the script
EndIf
$v_Struct = DllStructCreate('udword'); udword=unsigned 32 bit integer
For $i = 0 To 399

    $s_URL = 'http://www.google.com/search?hl=en&q=hello&start=' & $i * 10 & '&sa=N'
    ; 0x80000000 = INTERNET_FLAG_RELOAD: fetch from the server, not the cache
    $ai_IOU = DllCall($h_DLL, 'int', 'InternetOpenUrl', 'int', $ai_IO[0], 'str', $s_URL, 'str', '', 'int', 0, 'int', 0x80000000, 'int', 0)
    If @error Or $ai_IOU[0] = 0 Then
        $source = ''
        ContinueLoop ; no URL handle, so there is nothing to read or close
    EndIf

    $s_Buf = ''
    DllStructSetData($v_Struct, 1, 1)
    ; Reads at most 3 chunks of 4096 bytes; anything beyond 12 KB is dropped
    For $z = 1 To 3
        $ai_IRF = DllCall($h_DLL, 'int', 'InternetReadFile', 'int', $ai_IOU[0], 'str', '', 'int', 4096, 'ptr', DllStructGetPtr($v_Struct))
        $s_Buf &= StringLeft($ai_IRF[2], DllStructGetData($v_Struct, 1))
    Next
    DllCall($h_DLL, 'int', 'InternetCloseHandle', 'int', $ai_IOU[0])
    $source = $s_Buf

    TrayTip('', 'Source: ' & $i, 3)

    If $i = 5 Then TrayTip('', 'Source: ' & StringLeft($source, 5), 3)

Next

DllCall($h_DLL, 'int', 'InternetCloseHandle', 'int', $ai_IO[0])
DllClose($h_DLL)

$dif = TimerDiff($begin)
MsgBox(0, "Time Difference", $dif)
Edited by pillbug

"If I have a 1 MB/s download connection, and each HTML file is about 10 KB, I should be able to download 100 search results per second."

Way to oversimplify! I don't have the knowledge to help you; maybe someone else will provide more useful info. I just wanted to say that your reasoning is wrong: there are more connection latencies involved, and the fraction of a second needed to interface with a DLL is negligible.
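
To put rough numbers on the latency point, here is a small back-of-the-envelope sketch. The 100 ms round-trip time is purely an assumption for illustration (it is not a measurement from this thread), and DNS lookups and server processing time are ignored.

```autoit
; Rough per-request time for sequential downloads.
; The round-trip time is an assumed figure, not a measurement.
Local $fRTT_ms = 100        ; assumed round-trip time to the server
Local $fPage_KB = 10        ; page size from the original post
Local $fLink_KBps = 1000    ; 1 MB/s link, taken as 1000 KB/s

; Time the page itself spends on the wire: 10 ms
Local $fTransfer_ms = $fPage_KB / $fLink_KBps * 1000

; One round trip for the TCP handshake, one for the HTTP request/response
Local $fRequest_ms = 2 * $fRTT_ms + $fTransfer_ms ; 210 ms

ConsoleWrite('Pages per second: ' & Round(1000 / $fRequest_ms, 1) & @CRLF) ; about 4.8
```

Under these assumptions the transfer itself is only 10 ms of each 210 ms request, so a sequential loop is limited by latency rather than bandwidth.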


Maybe something like this, but it was still way slower than what you posted. Maybe explore the forums and see if there are any other examples of getting the source of a page... :)

$begin = TimerInit()

$oHTTP = ObjCreate("winhttp.winhttprequest.5.1")

$term = "hello"

For $i = 0 To 10
    $s_URL = 'http://www.google.com/search?hl=en&q=' & $term & '&start=' & $i * 10 & '&sa=N'
    ClipPut ($s_URL)
    $oHTTP.Open("GET", $s_URL, False)
    $oHTTP.Send()
    $source = $oHTTP.ResponseText
    ConsoleWrite ("Source for document " & $i & @CRLF)
Next

$dif = TimerDiff($begin)
MsgBox(0, "Time Difference", $dif & @CRLF & "Average (ms) = " & Int ($dif/$i))

I already tried that one, Brett. For me, it ended up being 17 seconds for all 10, when his code took about 4 seconds for 10, so I didn't bother posting.

My only suggestion would be to make your own function using the TCP* functions.

Something like:

TCPConnect() ; connect once, before the loop

For $i = 0 To 10
    TCPSend() ; send the request for page $i
    While 1
        ; something to collect the source, exiting the loop when the page is complete
    WEnd
Next

That way you're not having to reopen a connection every time. I don't know if that works, though, as I've never tried it.

But I think it should work.


Remember that this is single-threaded code you're working with, and it takes time to execute, so right there your download rate is slowing down, or I should say not being fully used. First off, how fast is your computer? Every ms adds up: 100 requests at about 10 ms each is 1000 ms, a second more or less.

Anyway, you're forgetting the handshake you must go through, and then the data you're sending, each taking up those precious ms. In the end you're likely to get 5-25 but no more. Remember, the more you request, the more time it takes to retrieve them, unless you go with threads. Go to search and look up "Thread"; that should point you in the right direction to make full use of your download rate.
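
AutoIt itself has no native threads, so one way to overlap several transfers from a single script is InetGet() in background mode. The following is only a minimal sketch, assuming an AutoIt version in which InetGet() returns a download handle when run in the background; the page count and temporary file names are hypothetical.

```autoit
; Sketch: start several downloads in the background, then wait for all of them.
; Assumes InetGet() returns a handle in background mode (newer AutoIt versions).
Local $iCount = 5 ; hypothetical number of pages
Local $ahDownload[$iCount], $asFile[$iCount]

For $i = 0 To $iCount - 1
    Local $sURL = 'http://www.google.com/search?hl=en&q=hello&start=' & $i * 10 & '&sa=N'
    $asFile[$i] = @TempDir & '\page_' & $i & '.html' ; hypothetical file names
    ; options = 1 forces a reload from the server; background = 1 returns immediately
    $ahDownload[$i] = InetGet($sURL, $asFile[$i], 1, 1)
Next

; Poll until every download reports completion (InetGetInfo index 2)
Local $bAllDone
Do
    Sleep(50)
    $bAllDone = True
    For $i = 0 To $iCount - 1
        If Not InetGetInfo($ahDownload[$i], 2) Then $bAllDone = False
    Next
Until $bAllDone

; Release the handles and report what arrived
For $i = 0 To $iCount - 1
    InetClose($ahDownload[$i])
    ConsoleWrite('Page ' & $i & ': ' & FileGetSize($asFile[$i]) & ' bytes' & @CRLF)
Next
```

Because all requests are in flight at once, the total time should be closer to that of the slowest single request than to the sum of all of them, though the server may throttle many simultaneous requests.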

Check out this topic on that. It is for HTTP, though, whereas you want it to act like FTP:

http://www.autoitscript.com/forum/index.ph...p;hl=tcpCONNECT


So I looked into TCP.

I modified some of the code from http://www.autoitscript.com/forum/index.ph...0617&hl=tcp

However, TCP is 2x slower (TimerDiff() values, in milliseconds):

TCP:6035.04243275544 DLL:3014.04479785984

Do any TCP experts see any ways to make TCP faster or the DLL faster?

$begin_tcp = TimerInit()
TCPFUNCTION()
$dif_tcp =Timerdiff($begin_tcp)

$begin_dll = TimerInit()
DLLFUNCTION()
$dif_dll =Timerdiff($begin_dll)

Consolewrite("TCP:" & $dif_tcp & "   DLL:" & $dif_dll)






Func TCPFunction()
    TCPStartup(); initializing service
    
for $i = 0 to 4

    $URL= 'http://www.google.com/search?hl=en&q=hello&start=' & $i*10 & '&sa=N' 
    
    $URL = StringRegExpReplace($URL, '\A(http://|https://)(.*?)/?\Z', '$2'); drop a leading http:// or https://, e.g. leaving www.autoitscript.com/forum/index.php?showforum=9

    Local $dom = StringRegExpReplace($URL, '\A(.*?)/.*', '$1'); the domain name (www.autoitscript.com)
    Local $ip = TCPNameToIP($dom); needed to connect to the server
    If $ip = "" Then Return -1
    Local $get = StringRegExpReplace($URL, '\A(.*?)/(.*)', '$2'); the path we want (forum/index.php?showforum=9)
    If $get = $dom Then $get = ''; when requesting the main page

    Local $header = 'GET /' & $get & ' HTTP/1.1' & @CRLF _
             & 'User-Agent: Test' & @CRLF _
             & 'Host: ' & $dom & @CRLF & @CRLF; request headers: Host names the target server, and a blank line (@CRLF & @CRLF) ends them

    Local $socket = TCPConnect($ip, 80); connecting to server
    If $socket = -1 Then Return -2; no further error checking from here on
   
    TCPSend($socket, $header); sending request

    Local $rcv, $out, $x, $sw, $r, $lenght
   
    While 1
       
        If $rcv <> '' Then
            If $x <> 1 Then

                $lenght = Number(StringRegExpReplace($rcv, '(?s)(.*?)Content-Length: (\d+)(.*)', '$2') + StringLen(StringLeft($rcv, StringInStr($rcv, @CRLF & @CRLF))) + 3)
            EndIf
            $x = 1
        EndIf
       
        $rcv = TCPRecv($socket, 1024); receiving data from server
 
        $out &= $rcv; adding to what we already have
       
        If $x = 1 Then
            If $rcv = '' Then
                $sw += 1
                If $sw = 10000 Then ExitLoop; sometimes there is no end, so we'll have to end it
            Else
                $sw = 0
            EndIf
        EndIf
       
        If $lenght <> 0 Then
            If StringLen($out) = $lenght Then; some servers are done once they have sent the amount of data they declared
                ExitLoop
            EndIf
        EndIf
       
        If StringRight($rcv, 5) = 0 & @CRLF & @CRLF Then; some servers will end with this
            ExitLoop
        EndIf
       
    WEnd
   

    TCPCloseSocket($socket); closing socket
  ;Return $out
   TrayTip('', 'Source: ' & $i & $out, 3)
next

    TCPShutdown(); stopping service
EndFunc

Func DLLFUNCTION()
    ; Open wininet.dll and a single session handle, reused for every request
    $h_DLL = DllOpen('wininet.dll')
    $ai_IO = DllCall($h_DLL, 'int', 'InternetOpen', 'str', 'AutoIt v3', 'int', 0, 'int', 0, 'int', 0, 'int', 0)
    If @error Or $ai_IO[0] = 0 Then
        DllClose($h_DLL)
        Return SetError(1, 0, '')
    EndIf
    $v_Struct = DllStructCreate('udword'); udword=unsigned 32 bit integer
    For $i = 0 To 4

        $s_URL = 'http://www.google.com/search?hl=en&q=hello&start=' & $i * 10 & '&sa=N'
        ; 0x80000000 = INTERNET_FLAG_RELOAD: fetch from the server, not the cache
        $ai_IOU = DllCall($h_DLL, 'int', 'InternetOpenUrl', 'int', $ai_IO[0], 'str', $s_URL, 'str', '', 'int', 0, 'int', 0x80000000, 'int', 0)
        If @error Or $ai_IOU[0] = 0 Then
            $source = ''
            ContinueLoop ; no URL handle, so there is nothing to read or close
        EndIf

        $s_Buf = ''
        DllStructSetData($v_Struct, 1, 1)
        ; Reads at most 3 chunks of 4096 bytes; anything beyond 12 KB is dropped
        For $z = 1 To 3
            $ai_IRF = DllCall($h_DLL, 'int', 'InternetReadFile', 'int', $ai_IOU[0], 'str', '', 'int', 4096, 'ptr', DllStructGetPtr($v_Struct))
            $s_Buf &= StringLeft($ai_IRF[2], DllStructGetData($v_Struct, 1))
        Next
        DllCall($h_DLL, 'int', 'InternetCloseHandle', 'int', $ai_IOU[0])
        $source = $s_Buf

        TrayTip('', 'Source: ' & $i, 3)

        ; never true here, since $i only runs 0 To 4
        If $i = 5 Then TrayTip('', 'Source: ' & StringLeft($source, 5), 3)

    Next

    DllCall($h_DLL, 'int', 'InternetCloseHandle', 'int', $ai_IO[0])
    DllClose($h_DLL)

EndFunc
Edited by pillbug

Try to split the DLL-function into 3 parts:

1): open DLL and call InternetOpen

2): Open URL, download and Close URL-Handle

2a): repeat step 2 as often as you want.

3): at the end of the Script, close InternetOpen and DLL
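
One possible reading of these three steps, written as separate functions, is sketched below. It reuses the DllCall lines from the first post; the function and variable names (_NetOpen, _NetGet, _NetClose, $g_hDLL, $g_hSession) are hypothetical, 'dword' is used for the byte counter, and the read loop runs until no more data arrives, which differs from the fixed three reads in the posted code.

```autoit
Global $g_hDLL = -1, $g_hSession = 0 ; hypothetical names for the shared handles

; Step 1: open the DLL and the session once
Func _NetOpen()
    $g_hDLL = DllOpen('wininet.dll')
    Local $aRet = DllCall($g_hDLL, 'int', 'InternetOpen', 'str', 'AutoIt v3', 'int', 0, 'int', 0, 'int', 0, 'int', 0)
    If @error Or $aRet[0] = 0 Then
        DllClose($g_hDLL)
        Return SetError(1, 0, False)
    EndIf
    $g_hSession = $aRet[0]
    Return True
EndFunc

; Step 2: open one URL, download it, close the URL handle (call as often as needed)
Func _NetGet($sURL)
    Local $aOpen = DllCall($g_hDLL, 'int', 'InternetOpenUrl', 'int', $g_hSession, 'str', $sURL, 'str', '', 'int', 0, 'int', 0x80000000, 'int', 0)
    If @error Or $aOpen[0] = 0 Then Return SetError(1, 0, '')
    Local $tRead = DllStructCreate('dword'), $sBuf = '', $aRead
    While 1
        $aRead = DllCall($g_hDLL, 'int', 'InternetReadFile', 'int', $aOpen[0], 'str', '', 'int', 4096, 'ptr', DllStructGetPtr($tRead))
        ; stop on a failed call or when zero bytes were read (end of data)
        If @error Or $aRead[0] = 0 Or DllStructGetData($tRead, 1) = 0 Then ExitLoop
        $sBuf &= StringLeft($aRead[2], DllStructGetData($tRead, 1))
    WEnd
    DllCall($g_hDLL, 'int', 'InternetCloseHandle', 'int', $aOpen[0])
    Return $sBuf
EndFunc

; Step 3: close the session and the DLL at the end of the script
Func _NetClose()
    DllCall($g_hDLL, 'int', 'InternetCloseHandle', 'int', $g_hSession)
    DllClose($g_hDLL)
EndFunc

; Usage: open once, fetch several pages, close once
If _NetOpen() Then
    For $i = 0 To 4
        $source = _NetGet('http://www.google.com/search?hl=en&q=hello&start=' & $i * 10 & '&sa=N')
        ConsoleWrite('Source ' & $i & ': ' & StringLen($source) & ' chars' & @CRLF)
    Next
    _NetClose()
EndIf
```

Structurally this does the same work as the loop in the first post, which already opens the DLL and session once; the split mainly makes step 2 reusable from anywhere in a script.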

I am not sure how my code is different from what you said...

Could you give me some code to show what you mean?
