How to detect if a URL is a download link before using _IENavigate?

Herb191

Is there a fast way to check whether a URL is a download link before I use _IENavigate?

Also, does anyone know how to disable all popup windows (like the "Are you sure you want to navigate away?" windows) without disabling scripting?

Thanks

Robjong

Hey,

What do the links look like? Are they just URIs to a file? Do you know what types of files to expect?

If so, you could just use StringRegExp. Something like this, for example:

$file_url = "http://www.example.com/path/to/file.zip"
If StringRegExp($file_url, "/[^/]+\.(rar|zip)\z") Then
    ConsoleWrite("File URL: " & $file_url & @CRLF) ; do what you need with the file URL here
EndIf

Herb191

Hi Robjong,

Thanks for the response. I am actually trying to weed out any URLs that are download links. Unfortunately, I never know what the URLs are going to look like because I am using a web crawler to collect them from completely random websites. I have tried something similar to what you have above, but there are thousands of possible download file types, and my script inevitably hits one and stops working.

Robjong

What do you do with valid URLs? Show them in an IE window? Get the source?

If you want to parse the URLs in a crawler-like fashion, or show only HTML pages, for example, you could check the content type of each URL by inspecting the response headers.

Here is a rough example:

Global $aAcceptedContentTypes = "text/\w+" ; a regex pattern: allows any text content type
;~ Global $aAcceptedContentTypes[2] = ["text/html", "text/plain"] ; or an array of patterns
; If a type string contains regex metacharacters, escape it by wrapping it in \Q and \E (example: \Qtext/foo\E)

$url_text = "http://www.autoitscript.com"
$url_file = "http://www.autoitscript.com/cgi-bin/getfile.pl?autoit3/autoit-v3-setup.exe"

$result = _CheckContentType($url_text, $aAcceptedContentTypes)
ConsoleWrite("- " & $result & @CRLF)

$result = _CheckContentType($url_file, $aAcceptedContentTypes)
ConsoleWrite("- " & $result & @CRLF)


Func _CheckContentType($sURL, $mContentTypes)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    If Not IsObj($oHTTP) Then Return SetError(2, 0, False) ; the COM object could not be created
    $oHTTP.Open("HEAD", $sURL) ; HEAD asks for the headers only, not the body
    $oHTTP.Send()
    Local $sHeaders = $oHTTP.GetAllResponseHeaders() ; read the headers once
    ; The type may be followed by ";" (parameters such as charset), by whitespace, or by the end of the headers.
    ; Requiring ";" alone would reject a plain "Content-Type: text/html" header.
    Local $sTail = ")(?:;|\s|\z)"
    If IsArray($mContentTypes) Then
        For $i = 0 To UBound($mContentTypes) - 1
            If StringRegExp($sHeaders, "(?i)content-type:\s*(?:" & $mContentTypes[$i] & $sTail) Then Return SetError(0, 0, True)
        Next
    ElseIf StringRegExp($sHeaders, "(?i)content-type:\s*(?:" & $mContentTypes & $sTail) Then
        Return SetError(0, 0, True)
    EndIf
    Return SetError(1, 0, False) ; no accepted content type found
EndFunc

If you want to download the files/source, you could use 'GET' instead of 'HEAD' in the open call and keep the response only if it passes the check; this would save you a request.
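
A minimal sketch of that GET variant, assuming the same WinHttpRequest COM object as in the example above. The names _GetIfContentType and _HeaderTypeMatches are hypothetical helpers made up for illustration, not part of any UDF, and no COM error handler is registered here.

```autoit
; Hypothetical helper: returns True if the raw header block declares a
; content type matching the regex pattern $sTypePattern.
Func _HeaderTypeMatches($sHeaders, $sTypePattern)
    ; The type may be followed by ";" (e.g. charset), whitespace, or the end of the string.
    Return StringRegExp($sHeaders, "(?i)content-type:\s*(?:" & $sTypePattern & ")(?:;|\s|\z)") = 1
EndFunc

; Hypothetical helper: fetches $sURL with GET and returns the response text
; only if the content type passes the check.
; @error = 1: COM object not created, @error = 2: content type rejected.
Func _GetIfContentType($sURL, $sTypePattern)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    If Not IsObj($oHTTP) Then Return SetError(1, 0, "")
    $oHTTP.Open("GET", $sURL, False) ; False = synchronous request
    $oHTTP.Send()
    If Not _HeaderTypeMatches($oHTTP.GetAllResponseHeaders(), $sTypePattern) Then
        Return SetError(2, 0, "")
    EndIf
    Return $oHTTP.ResponseText ; page source as text
EndFunc
```

It could be called as $sSource = _GetIfContentType("http://www.autoitscript.com", "text/\w+"), checking @error afterwards. Keep in mind that a GET transfers the whole response body before the check runs, so a large file is still fetched even when it is then rejected; the HEAD version avoids that at the cost of a second request for accepted URLs.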

Herb191

That is a nice bit of coding, but I need to be able to show the pages in an IE window because I am processing some data after the server-side scripts run. I also need to be able to run on just about any kind of page (except PDF).

Robjong

In that case the example I provided should help you out, since pages are served as HTML?!
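
One way this could look in practice, as a rough sketch: send a HEAD request first and call _IENavigate only when the server reports an HTML content type. The helpers _HeadersSayHtml, _IsHtmlUrl and _NavigateIfHtml are hypothetical names made up for this illustration, and the list of accepted types (text/html, application/xhtml+xml) is an assumption you may need to widen.

```autoit
#include <IE.au3>

; Hypothetical helper: True if the raw header block declares an HTML page.
Func _HeadersSayHtml($sHeaders)
    Return StringRegExp($sHeaders, "(?i)content-type:\s*(?:text/html|application/xhtml\+xml)(?:;|\s|\z)") = 1
EndFunc

; Hypothetical helper: HEAD request, then the header check above.
Func _IsHtmlUrl($sURL)
    Local $oHTTP = ObjCreate("winhttp.winhttprequest.5.1")
    If Not IsObj($oHTTP) Then Return SetError(1, 0, False)
    $oHTTP.Open("HEAD", $sURL, False) ; headers only, synchronous
    $oHTTP.Send()
    Return _HeadersSayHtml($oHTTP.GetAllResponseHeaders())
EndFunc

; Hypothetical wrapper: navigates only when the URL looks like an HTML page.
; Returns True if navigation was attempted, False (and @error = 1) if skipped.
Func _NavigateIfHtml($oIE, $sURL)
    If Not _IsHtmlUrl($sURL) Then Return SetError(1, 0, False)
    _IENavigate($oIE, $sURL)
    Return True
EndFunc
```

In a crawler loop you would create the browser once with $oIE = _IECreate("about:blank") and call _NavigateIfHtml($oIE, $sURL) for each URL, skipping the ones that return False. This is not bulletproof: some servers answer HEAD requests incorrectly or not at all, and a response can still trigger a download dialog if it carries a Content-Disposition: attachment header, so that header may be worth checking as well.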
