Trong
Posted February 21, 2015

Input, from _INetGetSource('www.autoitscript.com'):

    sdfuykrytxmlns:fb="http://www.facebook.com/2008/fbml".href='http://www.ciao.com/?favicon.ico' ,sdfg,content="ftp://www.ciao.com/?topic-to-it/" .sdfgfgjhdfg,src='http://www.ciao.com/?public/min/index.php?ipb1mp;g=js'.ipb.vars['base_url']= 'https://www.ciao.com/?index.php?s=fff586ef951e8e568150db1f085a2028&';ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';src='//static1.ciao.com/?public/style_images/master/bullet_black.png'sdfkj

Desired output, from ConsoleWrite($Link[$n] & @CRLF):

    http://www.facebook.com/2008/fbml
    http://www.ciao.com/?favicon.ico
    ftp://www.ciao.com/?topic-to-it/
    http://www.ciao.com/?public/min/index.php?ipb1mp;g=js
    https://www.ciao.com/?index.php?s=fff586ef951e8e568150db1f085a2028&
    google-analytics.com/ga.js
    //static1.ciao.com/?public/style_images/master/bullet_black.png

How can I do this?

Regards,

water
Posted February 21, 2015

What have you tried so far?

Trong (Author)
Posted February 21, 2015

Quoting water: "What have you tried so far?"

I ran into too many problems. Extracting the URLs is very hard.

Regards,

mikell
Posted February 21, 2015

Maybe...

    #include <Array.au3>

    $txt = FileRead("1.txt") ; source
    $res = StringRegExp($txt, '((?:https?|ftp|(?<=src=''|''\.))[^''"]{10,})', 3)
    _ArrayDisplay($res)

Trong (Author)
Posted February 21, 2015

Quoting mikell: "Maybe..."

Good job! Here it is with the forum page as the source:

    #include <Array.au3>
    #include <Inet.au3>

    $txt = _INetGetSource('http://www.autoitscript.com/forum/topic/167636-list-of-link/') ; source
    $res = StringRegExp($txt, '((?:https?|ftp|(?<=src=''|''\.))[^''"]{10,})', 3)
    _ArrayDisplay($res)

It still has some bugs, but it is OK. Thanks!

Regards,

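The script above assumes that both the download and the match succeed. A minimal sketch of the same steps with error checks follows; the variable names and exit codes are illustrative assumptions, not part of the original posts. It relies on _INetGetSource and StringRegExp setting @error on failure.

```autoit
#include <Array.au3>
#include <Inet.au3>

; Same URL and pattern as in the post above.
Local $sUrl = 'http://www.autoitscript.com/forum/topic/167636-list-of-link/'

; _INetGetSource returns "" and sets @error when the download fails.
Local $sHtml = _INetGetSource($sUrl)
If @error Then
    ConsoleWrite("Download failed: " & $sUrl & @CRLF)
    Exit 1 ; illustrative exit code
EndIf

; With flag 3 (global match), StringRegExp sets @error when nothing
; matches or when the pattern is invalid.
Local $aLinks = StringRegExp($sHtml, '((?:https?|ftp|(?<=src=''|''\.))[^''"]{10,})', 3)
If @error Then
    ConsoleWrite("No links matched, or the pattern is invalid." & @CRLF)
    Exit 2 ; illustrative exit code
EndIf

_ArrayDisplay($aLinks)
```
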
kylomas
Posted February 22, 2015

Trong,

Using the IE* library...

    #include <IE.au3>

    $oIE = _IECreate('http://www.google.com', 0, 0, 1)
    $oLNKS = _IELinkGetCollection($oIE)
    If IsObj($oLNKS) Then
        For $oLNK In $oLNKS
            ConsoleWrite($oLNK.href & @CRLF)
        Next
    EndIf

kylomas

mikell (Solution)
Posted February 22, 2015 (edited)

kylomas,

_IELinkGetCollection only matches <a> and <area> elements, so it fails on the string provided in post #1.

Edit: This one is (a little) better.

    #include <Array.au3>
    #include <Inet.au3>

    $txt = _INetGetSource('http://www.autoitscript.com/forum/topic/167636-list-of-link/') ; source
    $res = StringRegExp($txt, '((?:https?:/|ftp:/|(?<=src=''|''\.))[^<''";\r\n]{10,})', 3)
    _ArrayDisplay($res)

Edited February 22, 2015 by mikell

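To make the revised pattern easier to read, here is a sketch that annotates its parts and runs it on a shortened excerpt of the string from post #1. The excerpt and the variable names are assumptions chosen for illustration. The pattern is written in a double-quoted string here, so the single quotes no longer need doubling and the double quote is doubled instead.

```autoit
#include <Array.au3>

; Shortened excerpt of the sample string from post #1 (illustrative).
Local $sSample = "href='http://www.ciao.com/?favicon.ico' ,sdfg," & _
        "content=""ftp://www.ciao.com/?topic-to-it/"" .sdfg," & _
        "src='//static1.ciao.com/?public/style_images/master/bullet_black.png'sdfkj"

; Parts of the pattern:
;   https?:/  or  ftp:/   the match starts with an explicit scheme
;   (?<=src='|'\.)        or it starts right after  src='  or  '.
;   [^<'";\r\n]{10,}      then 10 or more characters that are not
;                         <  '  "  ;  CR or LF
Local $sPattern = "((?:https?:/|ftp:/|(?<=src='|'\.))[^<'"";\r\n]{10,})"

; Flag 3 returns every match of the capturing group as an array.
Local $aLinks = StringRegExp($sSample, $sPattern, 3)
If IsArray($aLinks) Then _ArrayDisplay($aLinks)
```

Because `;` is in the excluded character class, a URL that contains a semicolon is cut off at that character.
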
kylomas
Posted February 22, 2015

@mikell - Yes, the example is based on post #5, where the OP is using a URL...

    #include <IE.au3>

    $oIE = _IECreate('http://www.autoitscript.com/forum/topic/167636-list-of-link/', 0, 0, 1)
    $oLNKS = _IELinkGetCollection($oIE)
    If IsObj($oLNKS) Then
        For $oLNK In $oLNKS
            ConsoleWrite($oLNK.href & @CRLF)
        Next
    EndIf

The function is indeed returning more than I expected, stuff like this...

    javascript:void('')
    javascript:void('Remove Format')
    javascript:void('Special BBCode')
    javascript:void('Font')
    javascript:void('Size')
    javascript:void('Text Color')
    javascript:void('Smiley')
    javascript:void('My Media')
    javascript:void('Find')
    javascript:void('Replace')

which I have no clue as to what it is. However, it is easy enough to post-process the output to eliminate these.

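One way to do that post-processing is to skip every href that starts with "javascript:". The sketch below assumes that all unwanted entries share that prefix, as they do in the list above. _IsScriptLink is a hypothetical helper introduced for this example and is not part of IE.au3.

```autoit
#include <IE.au3>

; Hidden IE instance (visible = 0) that waits for the page to load (wait = 1).
Local $oIE = _IECreate('http://www.autoitscript.com/forum/topic/167636-list-of-link/', 0, 0, 1)
If Not IsObj($oIE) Then Exit 1 ; illustrative exit code

Local $oLinks = _IELinkGetCollection($oIE)
If IsObj($oLinks) Then
    For $oLink In $oLinks
        ; Print only the links that are not javascript: pseudo-links.
        If Not _IsScriptLink($oLink.href) Then ConsoleWrite($oLink.href & @CRLF)
    Next
EndIf

; Close the hidden browser window when done.
_IEQuit($oIE)

; Hypothetical helper: True when the href starts with "javascript:".
; The = operator compares strings case-insensitively in AutoIt.
Func _IsScriptLink($sHref)
    Return StringLeft($sHref, 11) = "javascript:"
EndFunc   ;==>_IsScriptLink
```
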