AuToItItAlIaNlOv3R Posted March 23, 2009

Hi, I have this problem: I would like to capture a line from a text. For example, I've copied the source of a web page into a .txt file, and I would like to copy one part of that source, the part enclosed in a tag. For example, I have this source:

<html><title>Test</title> <font color="ff0000"><div id="link">THIS IS A TEST</div></font></center></body></html>

I would like to copy the text "THIS IS A TEST" into another .txt file. How can I do that?

Bye
Mat Posted March 23, 2009

I'm not 100% sure, but could you use _StringBetween with start = <div id="link"> and end = </div>? Check the help file under String Management (UDF) for the full details.

Good luck
MDiesel
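A minimal sketch of the _StringBetween suggestion above, not a tested solution. The file names source.txt and result.txt are hypothetical, and the delimiters assume the tag is written exactly as in the first post (lowercase, with quotes around the id).

```autoit
#include <String.au3>

; Hypothetical file names: adjust them to your own files.
Local $sSourceFile = @ScriptDir & "\source.txt"
Local $sTargetFile = @ScriptDir & "\result.txt"

; Read the whole saved page into one string.
Local $sSource = FileRead($sSourceFile)
If @error Then
    MsgBox(16, "Error", "Could not read " & $sSourceFile)
    Exit
EndIf

; _StringBetween returns a 0-based array of every text found
; between the two delimiters, and sets @error when nothing matches.
Local $aMatches = _StringBetween($sSource, '<div id="link">', '</div>')
If @error Then
    MsgBox(16, "Error", "No text found between the two tags")
    Exit
EndIf

; Write the first match, trimmed of leading and trailing whitespace.
FileWriteLine($sTargetFile, StringStripWS($aMatches[0], 3))
```

If the real page writes the tag differently (for example <DIV id=link> without quotes), the start delimiter has to be changed to match, which is where a regular expression becomes more convenient.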
PhilRip Posted March 23, 2009 (edited)

#include <File.au3>

Dim $html
$file = @ScriptDir & "\html.txt" ; the file containing the HTML
_FileReadToArray($file, $html)
For $i = 1 To $html[0]
    If StringLeft($html[$i], 1) <> "<" Then
        FileWriteLine("new.txt", $html[$i]) ; output goes to new.txt
        ExitLoop ; delete this to copy every line that does not start with "<"
    EndIf
Next
AuToItItAlIaNlOv3R Posted March 24, 2009 (Author)

I would like to copy into another file the string that starts with <div id="link"> and ends with </div>. How do I do it?
AuToItItAlIaNlOv3R Posted March 24, 2009 (Author)

Up
GEOSoft Posted March 24, 2009 (edited)

Please wait 24 hours before bumping a post.

Using this as the source:

<html>
<title>Test</title>
<font color="ff0000"><div id="link">
THIS IS A TEST
</div></font></center>
<font color="ff0000"><div id="link2">
THIS IS ANOTHER TEST
</div></font></center>
</body>
</html>

$sStr = FileRead("Somefile.htm")
; Flag 3 returns an array of all matches of the capturing group
$aStr = StringRegExp($sStr, "(?i)(?s)<div id=\x22?link.*?\x22?>\v?(.*?)\v?</?", 3)
For $i = 0 To UBound($aStr) - 1
    MsgBox(0, "RESULTS", $aStr[$i])
Next

NOTE: As you may notice, it will only work if the id contains the word "link". If that's not going to work, then we will need to see some actual HTML code.

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.
Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***
The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.
Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. It is soon to be closed and replaced with something else.
"Old age and treachery will always overcome youth and skill!"
AuToItItAlIaNlOv3R Posted March 24, 2009 (Author) (edited)

I have this source code of a web page:

<HTML><HEAD><TITLE>Insert image</TITLE><LINK href="www.imageshack.us" rel="SHORTCUT ICON">
<script language=javascript src="Footer.js" type=text/javascript></SCRIPT>
<LINK title="Default Theme" media=all href="style.css" type=text/css rel=stylesheet></HEAD>
<BODY onload=addFooter();>
<CENTER>
<DIV id=container><IMG style="WIDTH: 342px; HEIGHT: 38px" alt="" src="http://img136.imageshack.us/img/img.png"> </DIV>
<DIV id=container>
<META content=reeky name=copyright>
<STYLE type=text/css> /*<![CDATA[*/ body,input{font: small Verdana, Geneva, Arial, Helvetica, sans-serif;} a{color:#00C;} h2{margin-bottom:0;} /*]]>*/ </STYLE>
<CENTER>
<H2><FONT color=#ffffff>Official Web Site</FONT> </H2><BR><BR><FONT color=#ff0000>
<DIV id=premium_link>http://www.EXAMPLELINK.EXAMPLE </DIV></FONT></CENTER><BR>
<DIV id=container>
<DIV style="TEXT-ALIGN: center">
<FORM id=a name=a method=post><B><FONT color=#ffffff>Link: (<FONT color=#ffffff>Insert Image Path</FONT>)</B><BR></FONT><BR><INPUT size=50 value=" " name=link> <INPUT type=submit value=Go!> </FORM></DIV></DIV></CENTER></DIV></BODY></HTML>

I would like to copy the string inside the premium_link DIV (http://www.EXAMPLELINK.EXAMPLE) into another .txt file. How can I do that?
Authenticity Posted March 24, 2009

"(?i)<div[^>]*+>([^\r\n<]*+)"

This is not a task for StringRegExp alone. You first need to enumerate the document's <DIV> tags and check each one with _IEPropertyGet($oDiv, 'innerText') or 'outerText', if I'm not wrong, and then use a regular expression to match the link pattern.
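The DOM-based approach described above could look roughly like the sketch below. It is only an illustration under several assumptions: the page has been saved with an .htm extension so that Internet Explorer renders it as HTML (the path sourcecode.htm is hypothetical), Internet Explorer automation is available on the machine, and filtering on an id that contains "link" is what is wanted.

```autoit
#include <IE.au3>

; Hypothetical paths: a copy of the saved page with an .htm extension,
; and the text file that receives the result.
Local $sPage = @DesktopDir & "\sourcecode.htm"
Local $sTargetFile = @ScriptDir & "\myfile.txt"

; Open the page in a hidden Internet Explorer window (no attach, not visible).
Local $oIE = _IECreate($sPage, 0, 0)

; Enumerate every <DIV> element in the document.
Local $oDivs = _IETagNameGetCollection($oIE, "div")
For $oDiv In $oDivs
    ; Keep only the DIVs whose id contains "link" (e.g. premium_link).
    If StringInStr($oDiv.id, "link") Then
        ; "innertext" is the element's text without any markup.
        Local $sText = StringStripWS(_IEPropertyGet($oDiv, "innertext"), 3)
        FileWriteLine($sTargetFile, $sText)
    EndIf
Next

_IEQuit($oIE)
```

Because the browser parses the markup, this variant does not depend on quoting, letter case, or line breaks inside the tag, at the cost of starting a browser instance.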
GEOSoft Posted March 24, 2009 (edited)

Change the regexp I gave you to:

$aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)

I should have added that you change the MsgBox() to:

FileWriteLine("myfile.txt", $aStr[$i])

George
AuToItItAlIaNlOv3R Posted March 24, 2009 (Author)

@GEOSoft, this code doesn't work:

$sStr = FileRead(@DesktopDir & "\sourcecode.txt")
$aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)
For $i = 0 To UBound($aStr) - 1
    FileWriteLine("myfile.txt", $aStr[$i])
Next

myfile.txt is empty! How can I make this script work?
GEOSoft Posted March 25, 2009

I tested it using the HTML that you posted and it worked fine. Replace the FileWriteLine() line with this, just to see what it returns:

MsgBox(0, "TEST RESULTS", "Error = " & @Error & @CRLF & $aStr[$i])

George
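One way to narrow down an empty output file, sketched here as a suggestion and not as something anyone in the thread ran: check @error right after FileRead and right after StringRegExp, and write to a full path so there is no doubt about where myfile.txt ends up. Choosing @ScriptDir for the output file is an assumption; the input path and the pattern are the ones posted above.

```autoit
; Input path and pattern are taken from the posts above;
; writing the output next to the script is an assumption.
Local $sIn = @DesktopDir & "\sourcecode.txt"
Local $sOut = @ScriptDir & "\myfile.txt"

Local $sStr = FileRead($sIn)
Local $iErr = @error ; save @error before any other function call resets it
If $iErr Then
    MsgBox(16, "TEST RESULTS", "FileRead failed (@error = " & $iErr & ") for " & $sIn)
    Exit
EndIf

Local $aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)
$iErr = @error
If $iErr Then
    ; With flag 3, @error = 1 means no match and @error = 2 means a bad pattern.
    MsgBox(16, "TEST RESULTS", "StringRegExp failed, @error = " & $iErr)
    Exit
EndIf

; Write every captured string to the output file, one per line.
For $i = 0 To UBound($aStr) - 1
    FileWriteLine($sOut, $aStr[$i])
Next
MsgBox(0, "TEST RESULTS", UBound($aStr) & " match(es) written to " & $sOut)
```

A relative name such as "myfile.txt" is resolved against the script's working directory, so the file being checked may not be the file that was written.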