zackrspv Posted June 16, 2009

Hi! Let's say I have the following HTML:

    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN">
    <HTML>
    <HEAD>
    <META HTTP_EQUIV="Generator" CONTENT="RTF 2 HTML Converter 0.2, (C) Sergey A. Galin, 2001-2002">
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
    <STYLE TYPE="text/css"><!--
    p{ margin: 0; }
    table{ border-collapse: collapse; border: none; }
    td{ border: solid 1px black; }
    .cl1 { color: #FFFFFF; }
    .lft{ text-align: left; }
    .rgt{ text-align: right; }
    .cen{ text-align: center; }
    .jus{ text-align: justify; }
    .wn{ font-weight: normal; }
    .sn{ font-style: normal; }
    .u{ text-decoration: underline; }
    .lst{ font-family: Wingdings, fantasy; }
    .strike{ text-decoration: line-through; }
    .sz8{ font-size: 8pt; }
    .f0{ font-family: MS Shell Dlg; }
    body{ font-size: 12pt; color: black; background-color: white; @f0; }
    --></STYLE>
    </HEAD>
    <BODY>
    <SPAN CLASS="cl1 f0 sz8">1 2 3 4 5 6 7 8 9 10<BR><B>1 2 3 4 5 6 7 8 9 10<BR><SPAN CLASS="u"><B>1 2 3 4 5 6 7 8 9 10<SPAN CLASS="nd"><BR><I>1 2 3 4 5 6 7 8 9 10<I><BR></I></I></SPAN></B></SPAN></B></SPAN>
    </BODY>
    </HTML>

This code is generated by my _R2H() function, which converts RTF to HTML. The issue is that the HTML above is not usable for me in its current form. Instead, I need to send the body and its styling together in one body string, with no head elements. For example, the above would be rewritten to:

    <font color=#FFFFFF face="MS Shell Dlg" size=8>1 2 3 4 5 6 7 8 9 10<br><b>1 2 3 4 5 6 7 8 9 10</b><br><u>1 2 3 4 5 6 7 8 9 10</u><br><i>1 2 3 4 5 6 7 8 9 10</i><br></font>

I have tried to do this with a regex, but the problem is that I have no idea what will be passed to this function at any one time. A person could send 1 color, 8 colors, 5 font sizes, whatever they want, so the initial span:

    <SPAN CLASS="cl1 f0 sz8">

could be much more complex:

    <SPAN CLASS="cl1 f0 sz8 ls1 u wn cen strike">

So I'd need to actually parse each of those class names and translate it from its CSS rule in the header into standard HTML font markup. I could use a regex to pull the names between the quotes after SPAN CLASS=, look each one up, and StringReplace it, but that would take forever to code for every possible element. I don't think a hand-written replacement per class is the best way, because the CSS is already included in the <head></head> element. All formats, colors, sizes, tables and so on are predefined in the passed HTML. So there should be a way to capture that and translate the <body><span></span></body> markup into something self-contained that is easy to send across the stream.

The reason I cannot send the HTML as it is: I'm going from RTF entry boxes to HTML output for my chat application. The user enters text and formatting into an RTF box (which I created) using a rebar/toolbar control (to handle the formatting codes). Once they press 'send', the application saves the RTF code to a variable, saves that variable to a file, converts that file to HTML, and sends the HTML across the server to the chat room. At the receiving end, the chat application takes the HTML and displays it on screen.

The issue with passing the complete <head> and <body> that _R2H() returns is that the styles in each message can override the text styling of anything anyone else sends, which isn't desirable. This is why I need to condense the <head> and <body> elements into markup that is valid inside a single <body>.

There has to be a way? Even if I have to replace the _R2H() function with some other method of converting the RTF code. Please help.

RTF code:

    {\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Shell Dlg;}}
    {\colortbl;\red255\green255\blue255;}
    {\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\cf1\f0\fs17 1 2 3 4 5 6 7 8 9 10\par
    \b 1 2 3 4 5 6 7 8 9 10\par
    \ul\b0 1 2 3 4 5 6 7 8 9 10\ulnone\par
    \i 1 2 3 4 5 6 7 8 9 10\i0\par
    }

Please see my example in the Example Scripts forum for how _R2H() works.

Zedna Posted June 16, 2009

I think StringRegExp() and StringRegExpReplace() will help you to do this task.
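
As a rough illustration of that suggestion, the sketch below uses StringRegExp() to pull the body and the class lists out of converter output, and StringRegExpReplace() to normalise the closing span tags. The sample string and variable names are made up for the example (shortened from the HTML in the first post); it is only a starting point under those assumptions, not a complete converter.

```autoit
#include <Array.au3>

; Hypothetical sample input, shortened from the HTML in the first post.
Local $sHtml = '<HTML><BODY><SPAN CLASS="cl1 f0 sz8">1 2 3<BR><SPAN CLASS="u">4 5 6</SPAN></SPAN></BODY></HTML>'

; Flag 3 returns an array of all captured groups; here the single group is the body.
Local $aBody = StringRegExp($sHtml, '(?is)<body[^>]*>(.*?)</body>', 3)
If @error Then Exit 1 ; no <body> element found

; Collect every class list used by a SPAN in the body ("cl1 f0 sz8" and "u" here).
Local $aClasses = StringRegExp($aBody[0], '(?i)<span\s+class="([^"]*)"', 3)
If Not @error Then ConsoleWrite(_ArrayToString($aClasses, " | ") & @CRLF)

; StringRegExpReplace() rewrites every match in one call; here it lower-cases </SPAN>.
Local $sOut = StringRegExpReplace($aBody[0], '(?i)</span>', '</span>')
ConsoleWrite($sOut & @CRLF)
```

Each class name found this way still has to be looked up in the style block before it can be replaced with inline markup.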

zackrspv (Author) Posted June 17, 2009 (edited)

I was able to do it with a very dirty method. Note that the HTML has to be in the format shown in my first post. It works nicely, though, and converts the array into a clean string.

It first splits the converted HTML into an array holding the style rules from the head and an array holding the pieces of the body, and then passes each span's class IDs to another routine that translates them into inline styles. It then recombines the array and returns the output in a variable for use. Thankfully, it worked.

Of course, the issue I had with r2h.dll was that it wasn't flattening the HTML tags like it was supposed to. For example:

    This is bold
    And this is not

would produce:

    <b>This is bold <b> And this is not </span></b></span></b>

Because of that, I had to switch to a different method to convert RTF to HTML.
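
For reference, the class-to-inline-style step described above can be sketched roughly as follows. This is a simplified variant that uses a Scripting.Dictionary as the lookup table rather than the arrays in my own code; $sCss and $sBody are hypothetical sample inputs, not output captured from _R2H().

```autoit
; Sketch only: map ".class{ declarations }" rules to inline style attributes.
Local $sCss = '.cl1 { color: #FFFFFF; } .u{ text-decoration: underline; } .sz8{ font-size: 8pt; }'
Local $sBody = '<SPAN CLASS="cl1 sz8">1 2 3<BR><SPAN CLASS="u">4 5 6</SPAN></SPAN>'

; Build a class-name -> declarations lookup from the style rules.
Local $oRules = ObjCreate("Scripting.Dictionary")
Local $aRules = StringRegExp($sCss, '\.([\w-]+)\s*\{\s*([^}]*?)\s*\}', 3)
If Not @error Then
    ; Flag 3 returns the groups flattened: name, declarations, name, declarations, ...
    For $i = 0 To UBound($aRules) - 2 Step 2
        If Not $oRules.Exists($aRules[$i]) Then $oRules.Add($aRules[$i], $aRules[$i + 1])
    Next
EndIf

; Replace each class list with the concatenated declarations of its classes.
Local $aSpans = StringRegExp($sBody, '(?i)<span\s+class="([^"]*)">', 3)
If Not @error Then
    For $i = 0 To UBound($aSpans) - 1
        Local $sStyle = ""
        Local $aNames = StringSplit($aSpans[$i], " ", 2) ; flag 2: no count element
        For $j = 0 To UBound($aNames) - 1
            ; Classes with no rule in the style block (such as "nd") are skipped.
            If $oRules.Exists($aNames[$j]) Then $sStyle &= $oRules.Item($aNames[$j]) & " "
        Next
        ; Occurrence 1: only the first matching attribute is rewritten per pass.
        $sBody = StringReplace($sBody, 'CLASS="' & $aSpans[$i] & '"', 'style="' & StringStripWS($sStyle, 3) & '"', 1)
    Next
EndIf

ConsoleWrite($sBody & @CRLF)
```

Because the lookup is keyed on the exact class name, a short name such as "u" cannot accidentally match a longer one such as "jus".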

There was a nice bit of code compiled by a third party a while ago that works for this purpose, so I now just pass the RTF code to that, send the output to an HTML file, and then parse out the body. The cool thing is that this new exe already places the CSS inline, so I don't have to convert it anymore. The problem, though, is that it does it in <p> blocks, so I had to replace all the <p> blocks with <font> blocks and add manual <br/> tags after the </font> tags. It sounds complex, but it turns out wonderfully. I've attached the new exe for converting RTF to HTML to this post.

I'd still love to do this in AutoIt only, but I honestly have no idea how to tackle the RTF to HTML conversion in pure AutoIt. I'm sure it's similar to how I flattened the code below, but having to code for each possible RTF element can be daunting, I'd assume.

    #include-once
    #include <Array.au3>

    ; $test holds the style lookup table and is shared with replaceCSS2HTML()
    Global $test

    ; Entry point: takes the full HTML document, returns the body with inline styles
    Func processC2H($msg)
        $test = getStyle($msg)
        Return convertCSS2Inline($test, $msg)
    EndFunc   ;==>processC2H

    Func convertCSS2Inline($tArray, $msg)
        If Not IsArray($tArray) Then Return -1
        If $msg = "" Then Return -2
        Local $body = StringRegExp($msg, "(?is)<body>(.*?)</body>", 3)
        If Not IsArray($body) Then Return -3
        ; Split on "<" so that every element after the first starts with a tag
        Local $outHTML = StringSplit($body[0], "<", 2)
        If Not IsArray($outHTML) Then Return -4
        Local $tmp
        For $i = 1 To UBound($outHTML) - 1
            If StringInStr($outHTML[$i], 'SPAN CLASS="') Then
                $tmp = replaceCSS2HTML($outHTML[$i])
                If Not IsArray($tmp) Then Return $tmp
                $outHTML[$i] = StringReplace($outHTML[$i], 'SPAN CLASS="' & $tmp[1] & '">', $tmp[0])
            ElseIf StringInStr($outHTML[$i], "/SPAN>") Then
                ; Replace only the tag, so any text that follows it is kept
                $outHTML[$i] = StringReplace($outHTML[$i], "/SPAN>", "</span>")
            EndIf
            ; Put back the "<" that StringSplit() removed
            If StringLeft($outHTML[$i], 1) <> "<" Then $outHTML[$i] = "<" & $outHTML[$i]
        Next
        Return _ArrayToString($outHTML, " ")
    EndFunc   ;==>convertCSS2Inline

    ; Translates one 'SPAN CLASS="..."' fragment.
    ; Returns [inline span tag, original class list], or a negative number on failure.
    Func replaceCSS2HTML($eID)
        If $eID = "" Then Return -5
        Local $vArray = StringRegExp($eID, '(?i)SPAN CLASS="(.*?)"', 3)
        If Not IsArray($vArray) Then Return -6
        Local $pcode = '<span style="'
        Local $vars = StringSplit($vArray[0], " ", 2)
        Local $tmp
        For $i = 0 To UBound($vars) - 1
            ; Exact match on the selector, so "u" cannot match ".jus"
            $tmp = _ArraySearch($test, "." & $vars[$i])
            If $tmp = -1 Then
                ; "nd" has no CSS rule: close the open spans and start a new one
                If $vars[$i] = "nd" Then $pcode = '</span></span><span style="'
            Else
                $pcode &= $test[$tmp][1]
            EndIf
        Next
        $pcode &= '">'
        Local $tcode[2] = [$pcode, $vArray[0]]
        Return $tcode
    EndFunc   ;==>replaceCSS2HTML

    ; Returns a 2D array: [n][0] = selector (e.g. ".cl1"), [n][1] = its declarations.
    ; Returns -1 (not an array) when no style block or no rules are found.
    Func getStyle($msg)
        Local $sArray = StringRegExp($msg, "(?is)<style.*?><!--(.*?)--></style>", 3)
        If Not IsArray($sArray) Then Return -1
        ; Flag 3 returns the captures flattened: selector, declarations, selector, ...
        Local $bArray = StringRegExp($sArray[0], "(?s)(.*?)\{(.*?)\}", 3)
        If Not IsArray($bArray) Then Return -1
        Local $cssArray[Int(UBound($bArray) / 2)][2]
        For $i = 0 To UBound($cssArray) - 1
            $cssArray[$i][0] = StringStripWS($bArray[$i * 2], 3)
            $cssArray[$i][1] = StringStripWS($bArray[$i * 2 + 1], 3)
        Next
        Return $cssArray
    EndFunc   ;==>getStyle

Note: if you use the attached exe to go from RTF to HTML, the HTML already comes out with inline CSS (except for the table and margin rules, but you can leave those out). If you just pull the text from within the <body>...</body> tags using a regex, you get the inline-CSS result you want.

You can use it as:

- r2h.exe $RTF $HTML (where $RTF is the RTF file location and $HTML is the desired .html output file location)
- r2h.exe $RTF (which defaults to out.html; this is slower and may miss calls when several are made)
- r2h.exe (which launches a process that waits for code to be entered and then for the Ctrl+Z command (^Z) to be sent, after which it converts the input and places the converted HTML in the STDOUT buffer for you to read and process. This is a nice way to do the input without having to save an RTF file, and you don't need to save the HTML file either, as the data is written to the screen.)

But, of course, you still do need to relaunch r2h.exe every time you need to convert. In the case of a chat app, it's best to use the first option, where you specify both the RTF and HTML files, as it's much faster.

Enjoy.

Attachment: r2h.zip

Edited June 17, 2009 by zackrspv
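
The third mode (no arguments, input on STDIN, output on STDOUT) could be driven from a script along these lines. This is a minimal sketch, assuming r2h.exe sits in the script directory and that closing the input stream has the same effect as sending Ctrl+Z at the console; the function name _RtfToHtmlViaStdio is made up for the example.

```autoit
#include <Constants.au3>

; Hypothetical helper: feeds RTF text to r2h.exe over STDIN and returns what it prints.
; Assumes r2h.exe is in @ScriptDir and treats a closed input stream as end of input.
Func _RtfToHtmlViaStdio($sRtf)
    Local $iPid = Run('"' & @ScriptDir & '\r2h.exe"', @ScriptDir, @SW_HIDE, $STDIN_CHILD + $STDOUT_CHILD)
    If @error Then Return SetError(1, 0, "")

    StdinWrite($iPid, $sRtf)
    StdinWrite($iPid) ; calling it without data closes the stream

    Local $sHtml = ""
    While 1
        $sHtml &= StdoutRead($iPid)
        If @error Then ExitLoop ; @error is set once the output stream is closed
        Sleep(10)
    WEnd
    Return $sHtml
EndFunc   ;==>_RtfToHtmlViaStdio
```

It would be called as $sHtml = _RtfToHtmlViaStdio($sRtf), after which the body can be pulled out with a regex as described above. A new process is still started for every conversion, so the speed caveat about relaunching r2h.exe applies here as well.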