Trong, posted February 21, 2015

Input (from _INetGetSource('www.autoitscript.com')):
sdfuykrytxmlns:fb="http://www.facebook.com/2008/fbml".href='http://www.ciao.com/?favicon.ico' ,sdfg,content="ftp://www.ciao.com/?topic-to-it/" .sdfgfgjhdfg,src='http://www.ciao.com/?public/min/index.php?ipb1mp;g=js'.ipb.vars['base_url']= 'https://www.ciao.com/?index.php?s=fff586ef951e8e568150db1f085a2028&';ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';src='//static1.ciao.com/?public/style_images/master/bullet_black.png'sdfkj

Desired output (ConsoleWrite($Link[$n] & @CRLF)):
http://www.facebook.com/2008/fbml
http://www.ciao.com/?favicon.ico
ftp://www.ciao.com/?topic-to-it/
http://www.ciao.com/?public/min/index.php?ipb1mp;g=js
https://www.ciao.com/?index.php?s=fff586ef951e8e568150db1f085a2028&
google-analytics.com/ga.js
//static1.ciao.com/?public/style_images/master/bullet_black.png

How can I extract these links?

Regards,
water, posted February 21, 2015

What have you tried so far?
Trong (author), posted February 21, 2015

On 2/21/2015 at 8:58 PM, water said: What have you tried so far?

I ran into too many problems; extracting the URLs is very hard.

Regards,
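mikell, posted February 21, 2015

Maybe...

#Include <Array.au3>
$txt = FileRead("1.txt") ; source text saved to a file
; Match text that starts with http, https or ftp, or that directly follows src=' or '.
; and continues for at least 10 characters up to the next quote.
; Flag 3 ($STR_REGEXPARRAYGLOBALMATCH) returns an array of all matches.
$res = StringRegExp($txt, '((?:https?|ftp|(?<=src=''|''\.))[^''"]{10,})', 3)
_ArrayDisplay($res)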
Trong (author), posted February 21, 2015

On 2/21/2015 at 9:46 PM, mikell said: Maybe...

Good job!

#Include <Array.au3>
#include <Inet.au3>
$txt = _INetGetSource('http://www.autoitscript.com/forum/topic/167636-list-of-link/') ; source
$res = StringRegExp($txt, '((?:https?|ftp|(?<=src=''|''\.))[^''"]{10,})', 3)
_ArrayDisplay($res)

It has some bugs, but it's OK! Thanks.

Regards,
kylomas, posted February 22, 2015

Trong,

Using the IE* library...

#include <ie.au3>
; Open the page in a hidden IE instance (no attach, invisible, wait for load)
$oIE = _IECreate('http://www.google.com',0,0,1)
; Collection of all links on the page
$oLNKS = _IELinkGetCollection($oIE)
if isobj($oLNKS) then
    for $oLNK in $oLNKS
        ConsoleWrite($oLNK.href & @CRLF)
    Next
endif
_IEQuit($oIE) ; close the hidden IE instance

kylomas
mikell, posted February 22, 2015 (marked as solution; edited February 22, 2015)

kylomas, _IELinkGetCollection only matches <a> and <area> elements, so it fails on the string provided in post #1.

Edit: this one is (a little) better:

#Include <Array.au3>
#include <Inet.au3>
$txt = _INetGetSource('http://www.autoitscript.com/forum/topic/167636-list-of-link/') ; source
; Require ":/" after the scheme, and also stop at <, ;, and line breaks
$res = StringRegExp($txt, '((?:https?:/|ftp:/|(?<=src=''|''\.))[^<''";\r\n]{10,})', 3)
_ArrayDisplay($res)
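To see why the regex approach catches things _IELinkGetCollection misses, here is a minimal offline sketch that runs the improved pattern on a small made-up HTML snippet (the snippet is an assumption for illustration, not taken from the thread). The scheme-relative URL after src=' sits in an <img> tag, which is not an <a> or <area> element, yet the pattern still picks it up.

```autoit
#include <StringConstants.au3>

; Made-up HTML snippet (assumption, for illustration only): one ordinary link
; and one scheme-relative image URL that follows src='
Local $sHtml = "<a href='http://www.ciao.com/?favicon.ico'>x</a>" & _
        "<img src='//static1.ciao.com/?public/style_images/master/bullet_black.png'>"

; Same pattern as the improved version above;
; $STR_REGEXPARRAYGLOBALMATCH (3) returns an array with every match
Local $aUrls = StringRegExp($sHtml, '((?:https?:/|ftp:/|(?<=src=''|''\.))[^<''";\r\n]{10,})', $STR_REGEXPARRAYGLOBALMATCH)

; Print each extracted URL on its own line
For $i = 0 To UBound($aUrls) - 1
    ConsoleWrite($aUrls[$i] & @CRLF)
Next
```

Replacing $sHtml with the string from post #1 (or with FileRead of a saved page) lets you check the pattern without downloading anything.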
kylomas, posted February 22, 2015

@mikell - Yes, the example is based on post #5, where the OP is using a URL...

#include <ie.au3>
$oIE = _IECreate('http://www.autoitscript.com/forum/topic/167636-list-of-link/',0,0,1)
$oLNKS = _IELinkGetCollection($oIE)
if isobj($oLNKS) then
    for $oLNK in $oLNKS
        ConsoleWrite($oLNK.href & @CRLF)
    Next
endif

The function does return more than I expected, for example:

javascript:void('')
javascript:void('Remove Format')
javascript:void('Special BBCode')
javascript:void('Font')
javascript:void('Size')
javascript:void('Text Color')
javascript:void('Smiley')
javascript:void('My Media')
javascript:void('Find')
javascript:void('Replace')

I have no clue what these are. However, it is easy enough to post-process the output to eliminate them.
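One possible way to do that post-processing, sketched under the assumption that the href values have first been collected into an array: _FilterLinks below is a hypothetical helper (not part of the IE UDF) that returns a copy of the array without entries starting with "javascript:".

```autoit
; Hypothetical helper (not part of the IE UDF): returns a copy of $aLinks
; without "javascript:" pseudo-links such as javascript:void('Font')
Func _FilterLinks($aLinks)
    Local $iCount = 0
    Local $aOut[UBound($aLinks)]
    For $i = 0 To UBound($aLinks) - 1
        ; StringStripWS flag 1 strips leading whitespace; "=" compares
        ; strings case-insensitively, so "JavaScript:" is dropped too
        If StringLeft(StringStripWS($aLinks[$i], 1), 11) = "javascript:" Then ContinueLoop
        $aOut[$iCount] = $aLinks[$i]
        $iCount += 1
    Next
    ; Shrink the result to the number of links actually kept
    ReDim $aOut[$iCount]
    Return $aOut
EndFunc
```

For a one-off script, the same check can go directly inside the For...In loop: only call ConsoleWrite when StringLeft($oLNK.href, 11) is not "javascript:".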