Posted

Hi, I have this problem. I would like to capture a line from a text. For example, I've copied the source code of a web page into a .txt file, and I would like to copy one part of that source. This part is enclosed in a tag. For example, I have this source:

<html>

<title>Test</title>

<font color="ff0000"><div id="link">

THIS IS A TEST

</div></font></center>

</body>

</html>

I would like to copy the sentence "THIS IS A TEST" into another .txt file.

How can I do this?

Hello :D

Posted (edited)

Please wait 24 hours before bumping a post.

Using this as the source

<html>

<title>Test</title>

<font color="ff0000"><div id="link">

THIS IS A TEST

</div></font></center>

<font color="ff0000"><div id="link2">

THIS IS ANOTHER TEST

</div></font></center>

</body>

</html>

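; Read the whole saved page into a string.
$sStr = FileRead("Somefile.htm")
; Flag 3 returns an array holding every captured group (global match).
$aStr = StringRegExp($sStr, "(?i)(?s)<div id=\x22?link.*?\x22?>\v?(.*?)\v?</?", 3)
For $i = 0 To UBound($aStr) - 1
    MsgBox(0, "RESULTS", $aStr[$i])
Next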
NOTE: As you may notice, it will only work if the id starts with the word "link". If that's not going to work, then we will need to see some actual HTML code.

Edited by GEOSoft

George

Question about decompiling code? Read the decompiling FAQ and don't bother posting the question in the forums.

Be sure to read and follow the forum rules. -AKA the AutoIt Reading and Comprehension Skills test.***

The PCRE (Regular Expression) ToolKit for AutoIT - (Updated Oct 20, 2011 ver:3.0.1.13) - Please update your current version before filing any bug reports. The installer now includes both 32 and 64 bit versions. No change in version number.

Visit my Blog .. currently not active but it will soon be resplendent with news and views. Also please remove any links you may have to my website. it is soon to be closed and replaced with something else.

"Old age and treachery will always overcome youth and skill!"

Posted (edited)

I have this web page source code:

<HTML><HEAD><TITLE>Insert image</TITLE><LINK href="www.imageshack.us" rel="SHORTCUT ICON">

<script language=javascript src="Footer.js" type=text/javascript></SCRIPT>

<LINK title="Default Theme" media=all href="style.css" type=text/css rel=stylesheet></HEAD>

<BODY onload=addFooter();>

<CENTER>

<DIV id=container><IMG style="WIDTH: 342px; HEIGHT: 38px" alt="" src="http://img136.imageshack.us/img/img.png"> </DIV>

<DIV id=container>

<META content=reeky name=copyright>

<STYLE type=text/css>

/*<![CDATA[*/

body,input{font: small Verdana, Geneva, Arial, Helvetica, sans-serif;}

a{color:#00C;}

h2{margin-bottom:0;}

/*]]>*/

</STYLE>

<CENTER>

<H2><FONT color=#ffffff>Official Web Site</FONT> </H2><BR><BR><FONT color=#ff0000>

<DIV id=premium_link>http://www.EXAMPLELINK.EXAMPLE

</DIV></FONT></CENTER><BR>

<DIV id=container>

<DIV style="TEXT-ALIGN: center">

<FORM id=a name=a method=post><B><FONT color=#ffffff>Link: (<FONT color=#ffffff>Insert Image Path</FONT>)</B><BR></FONT><BR><INPUT size=50 value=" " name=link> <INPUT type=submit value=Go!> </FORM></DIV></DIV></CENTER></DIV></BODY></HTML>

I would like to copy the string in bold (http://www.EXAMPLELINK.EXAMPLE) into another .txt file.

How can I do this?

Edited by AuToItItAlIaNlOv3R

Posted

"(?i)<div[^>]*+>([^\r\n<]*+)"

This is not a task for StringRegExp alone. You first need to enumerate the document's <DIV> tags and check each one's text with _IEPropertyGet($oDiv, 'innerText') (or 'outerText', if I'm not wrong), then use StringRegExp to match the link pattern.
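
The DOM-based approach described above could look roughly like the following sketch. It assumes the IE.au3 UDF that ships with AutoIt and an installed Internet Explorer. The file paths and the link pattern are illustrative assumptions rather than anything given in the thread, and the sketch has not been run against the real page.

```autoit
#include <IE.au3>

; Assumption: the saved page source lives at this hypothetical path.
Local $sHtml = FileRead(@DesktopDir & "\sourcecode.txt")

; Load the HTML into a hidden IE instance so its DOM can be queried.
Local $oIE = _IECreate("about:blank", 0, 0)
_IEDocWriteHTML($oIE, $sHtml)

; Walk every <DIV> and test its text against a simple link pattern.
Local $oDivs = _IETagNameGetCollection($oIE, "div")
Local $sText = ""
For $oDiv In $oDivs
    ; Flag 3 strips leading and trailing whitespace.
    $sText = StringStripWS(_IEPropertyGet($oDiv, "innertext"), 3)
    ; Keep only DIVs whose entire text is a single URL (assumed pattern).
    If StringRegExp($sText, "(?i)^https?://\S+$") Then
        FileWriteLine(@DesktopDir & "\myfile.txt", $sText)
    EndIf
Next

_IEQuit($oIE)
```

If the id of the target element is known in advance, _IEGetObjById($oIE, "premium_link") would probably be a shorter route than walking the whole collection.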

Posted (edited)

Change the regexp I gave you to

$aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)

I should have added that you need to change the MsgBox() to FileWriteLine("myfile.txt", $aStr[$i])

Edited by GEOSoft

George

Posted

@GEOSoft, this code doesn't work:

$sStr = FileRead(@DesktopDir & "\sourcecode.txt")
$aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)
For $i = 0 To UBound($aStr) - 1
    FileWriteLine("myfile.txt", $aStr[$i])
Next

The myfile.txt file is empty!

How can I make this script work?

Posted

I tested it using the HTML that you posted and it worked fine.

Replace the FileWriteLine line with this, just to see what it returns.

MsgBox(0, "TEST RESULTS", "Error = " & @Error & @CRLF & $aStr[$i])
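
One caveat with a check placed inside the loop: if StringRegExp finds nothing, it does not return an array, so the loop body never runs and no message appears. A minimal diagnostic sketch that tests @error immediately after each call might look like this. The input path is copied from the script posted earlier in the thread; the variable names $sFile and $iErr are introduced here only for illustration.

```autoit
; Assumption: same input file as in the script posted above.
Local $sFile = @DesktopDir & "\sourcecode.txt"

Local $sStr = FileRead($sFile)
If @error Then
    MsgBox(0, "TEST RESULTS", "Could not read " & $sFile)
    Exit
EndIf

Local $aStr = StringRegExp($sStr, "(?i)(?s)<div\sid=.*?link.*?>\v?(.*?)\v?</.+>", 3)
; Save @error right away, because the next function call resets it.
Local $iErr = @error
If $iErr Then
    ; With flag 3, @error = 1 means no match and 2 means a bad pattern.
    MsgBox(0, "TEST RESULTS", "StringRegExp failed, @error = " & $iErr)
    Exit
EndIf

; Only reached when at least one match was captured.
For $i = 0 To UBound($aStr) - 1
    MsgBox(0, "TEST RESULTS", "Match " & $i & ":" & @CRLF & $aStr[$i])
Next
```

This separates the two likely failure points: an unreadable or empty input file, and a pattern that does not match the file's contents.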

George
