Complex Conversion HELP!


Hi!

Let's say I have the following HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Strict//EN">
<HTML>
<HEAD>
<META HTTP_EQUIV="Generator" CONTENT="RTF 2 HTML Converter 0.2, (C) Sergey A. Galin, 2001-2002">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<STYLE TYPE="text/css"><!--
p{ margin: 0; }
table{ border-collapse: collapse; border: none; }
td{ border: solid 1px black; }
.cl1 { color: #FFFFFF; }
.lft{ text-align: left; }
.rgt{ text-align: right; }
.cen{ text-align: center; }
.jus{ text-align: justify; }
.wn{ font-weight: normal; }
.sn{ font-style: normal; }
.u{ text-decoration: underline; }
.lst{ font-family: Wingdings, fantasy; }
.strike{ text-decoration: line-through; }
.sz8{ font-size: 8pt; }
.f0{ font-family: MS Shell Dlg; }
body{ font-size: 12pt; color: black; background-color: white; @f0; }
--></STYLE>
</HEAD>
<BODY>
<SPAN CLASS="cl1 f0 sz8">1 2 3 4 5 6 7 8 9 10<BR><B>1 2 3 4 5 6 7 8 9 10<BR><SPAN CLASS="u"><B>1 2 3 4 5 6 7 8 9 10<SPAN CLASS="nd"><BR><I>1 2 3 4 5 6 7 8 9 10<I><BR></I></I></SPAN></B></SPAN></B></SPAN>

</BODY>
</HTML>

This code is generated by my _R2H() function, which converts RTF to HTML. The issue, however, is that the HTML above is not usable for me in its current form. Instead, I need to send the body AND its styling together in one body string, with no head elements.

For example, the above would be rewritten to:

<font color="#FFFFFF" face="MS Shell Dlg" size=8>1 2 3 4 5 6 7 8 9 10<br><b>1 2 3 4 5 6 7 8 9 10</b><br><u>1 2 3 4 5 6 7 8 9 10</u><br><i>1 2 3 4 5 6 7 8 9 10</i><br></font>

Now, I have tried to do this with a regex, but the problem is that I have no idea what will be passed to this function at any one time. A person could send 1 color, 8 colors, 5 font sizes, whatever they want, so the initial span:

<SPAN CLASS="cl1 f0 sz8">

could be much more complex:

<SPAN CLASS="cl1 f0 sz8 ls1 u wn cen strike">

So I'd need to actually PARSE each of those classes and translate it from its CSS definition in the header into standard HTML font code. I could use a regex to pull the items between the quotes after the SPAN CLASS= attribute, look each one up, and StringReplace it, but my god, that would take forever to code for EVERYTHING.
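A minimal sketch of the "pull the items between the quotes" step, assuming every tag looks like <SPAN CLASS="..."> as in the converter output above. The helper name _SpanClasses is hypothetical and not part of _R2H():

```autoit
; Hypothetical helper: returns the class names of one <SPAN CLASS="..."> tag
; as a 0-based array, or sets @error if the tag has no CLASS attribute.
Func _SpanClasses($sTag)
    ; Flag 1 returns an array holding the first match's capture group.
    Local $aMatch = StringRegExp($sTag, '(?i)<SPAN\s+CLASS="([^"]*)"', 1)
    If @error Then Return SetError(1, 0, 0)
    ; Flag 7 trims both ends and collapses double spaces; flag 2 omits the count element.
    Return StringSplit(StringStripWS($aMatch[0], 7), " ", 2)
EndFunc   ;==>_SpanClasses

; Usage: list the classes of the more complex span.
Local $aClasses = _SpanClasses('<SPAN CLASS="cl1 f0 sz8 ls1 u wn cen strike">')
For $i = 0 To UBound($aClasses) - 1
    ConsoleWrite($aClasses[$i] & @CRLF)
Next
```

This only splits the class list; each name still has to be looked up in the style block.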

I don't think that's the best way, because the CSS is already included in the <head></head> element. All formats, colors, sizes, tables, etc. are predefined in the HTML that gets passed in. So there should be a way to capture that and translate the <body><span></span></body> code into something that makes sense and is easy to send across the stream.
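One way to capture the predefined CSS could look like the sketch below. It assumes every class rule has the simple form ".name{ declarations }" seen in the generated style block, and it uses the Windows Scripting.Dictionary COM object as the lookup table. The helper names _CssClassMap and _InlineStyle are hypothetical:

```autoit
; Hypothetical helper: builds a lookup of class name -> CSS declarations
; from rules of the form ".name{ declarations }".
Func _CssClassMap($sHtml)
    Local $oMap = ObjCreate("Scripting.Dictionary")
    ; Flag 3 returns a flat list: name, declarations, name, declarations, ...
    Local $aRules = StringRegExp($sHtml, '\.([\w-]+)\s*\{([^}]*)\}', 3)
    If @error Then Return $oMap
    For $i = 0 To UBound($aRules) - 2 Step 2
        $oMap.Item($aRules[$i]) = StringStripWS($aRules[$i + 1], 3)
    Next
    Return $oMap
EndFunc   ;==>_CssClassMap

; Hypothetical helper: turns a class list such as "cl1 f0 sz8" into the
; value of a style attribute. Unknown classes are skipped.
Func _InlineStyle($oMap, $sClasses)
    Local $sStyle = ""
    For $sClass In StringSplit(StringStripWS($sClasses, 7), " ", 2)
        If $oMap.Exists($sClass) Then $sStyle &= $oMap.Item($sClass) & " "
    Next
    Return StringStripWS($sStyle, 3)
EndFunc   ;==>_InlineStyle

; Usage with three of the rules from the style block above.
Local $sCss = ".cl1 { color: #FFFFFF; }" & @CRLF & ".sz8{ font-size: 8pt; }" & @CRLF & ".f0{ font-family: MS Shell Dlg; }"
Local $oCss = _CssClassMap($sCss)
ConsoleWrite('<span style="' & _InlineStyle($oCss, "cl1 f0 sz8") & '">' & @CRLF)
```

Because the dictionary is keyed on the exact class name, a short name such as "u" cannot be confused with a longer one such as "jus".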

The reason I cannot send the HTML as it is, is that I'm going from RTF entry boxes to HTML output for my chat application. The user enters text and formatting into an RTF box (which I created) using a rebar/toolbar control (to handle the coding). Once they press 'send', the application saves the RTF code to a variable, saves that variable to a file, converts that file to HTML, and sends the HTML across the server to the chat room, where the receiving chat application takes the HTML and displays it on screen. The issue with passing the complete <head> and <body> code that the _R2H() function above returns is that each message's HTML could override the styling of text that anyone else sends, which isn't desirable.

This is why I need to condense the <head> and <body> elements into a single <body>-compliant string.

There has to be a way? Even if I have to replace the _R2H() function with some other method of converting the RTF code. Please help :D

RTF code

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Shell Dlg;}}
{\colortbl;\red255\green255\blue255;}
{\*\generator Msftedit 5.41.21.2508;}\viewkind4\uc1\pard\cf1\f0\fs17 1 2 3 4 5 6 7 8 9 10\par
\b 1 2 3 4 5 6 7 8 9 10\par
\ul\b0 1 2 3 4 5 6 7 8 9 10\ulnone\par
\i 1 2 3 4 5 6 7 8 9 10\i0\par
}

Please see my example in the Example Scripts forum for how _R2H() works :D

-_-------__--_-_-____---_-_--_-__-__-_ ^^€ñ†®øÞÿ ë×阮§ wï†høµ† ƒë@®, wï†høµ† †ïmë, @ñd wï†høµ† @ †ïmïdï†ÿ ƒø® !ïƒë. €×阮 ñø†, bµ† ïñ§†ë@d wï†hïñ, ñ@ÿ, †h®øµghøµ† †hë 맧ëñ§ë øƒ !ïƒë.


I was able to do it with a very dirty method; note that the input has to be in the format above. But it works nicely, as it converts the array into a nice string :D

It first splits the converted HTML into an array of the head's style rules and an array of the body's tags, then sends each span's class IDs to another routine that turns them into inline style code. It then recombines the array and returns the output in a variable for use.

Thankfully, it worked :D

Of course, the issue I had with r2h.dll was that it wasn't flattening the HTML tags like it was supposed to. For example:

This is bold And this is not

Would show: <b>This is bold <b> And this is not </span></b></span></b>

Because of that, I had to switch to a different method to convert RTF to HTML. There is a nice bit of code compiled by a third party a while ago that works for this purpose, so I now just pass the RTF code to that, write the result to an HTML file, and then parse out the body. The cool thing is that this new exe already places the CSS inline ;) I don't have to convert it anymore :P The problem, though, is that it does so in <p> blocks, so I had to replace all the <p>'s with <font> blocks and add manual <br/>s after the </font> tags lol. It sounds complex, but it turns out wonderfully.
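A rough sketch of that <p>-to-<font> replacement, assuming each paragraph carries its inline CSS as attributes on the opening <p> tag and that paragraphs are not nested (the exact markup the exe emits is not shown here, so treat the pattern as an assumption). The helper name _PToFont is hypothetical:

```autoit
; Hypothetical helper: rewrites <p ...>text</p> as <font ...>text</font><br/>,
; carrying the attributes of the <p> tag over to the <font> tag unchanged.
Func _PToFont($sBody)
    ; (?i) ignores case, (?s) lets "." span line breaks, \b stops "<pre>" from matching.
    Return StringRegExpReplace($sBody, '(?is)<p\b([^>]*)>(.*?)</p>', '<font$1>$2</font><br/>')
EndFunc   ;==>_PToFont

; Usage
ConsoleWrite(_PToFont('<P style="color: #FF0000;">Hello</P><P>World</P>') & @CRLF)
```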

I've attached the new exe for converting RTF to HTML to this post. I'd still LOVE to do this in AutoIt only, but I honestly have no idea how to tackle the RTF-to-HTML conversion in pure AutoIt lol. I'm sure it's similar to how I flattened the code below, but eesh, having to code for each possible RTF element can be daunting, I'd assume.

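#include-once
#include <Array.au3>
; Shared working variables. $test holds the parsed style table and is read by replaceCSS2HTML().
Global $test, $outHTML, $body, $tmp, $testhtml, $pcode, $vArray, $vars, $sArray, $bArray, $c, $oResult, $total

; Entry point: takes the full converter HTML and returns the body with inline styles.
Func processC2H($msg)
    $test = getStyle($msg)
    $outHTML = convertCSS2Inline($test, $msg)
    Return $outHTML
EndFunc   ;==>processC2H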

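; Replaces every <SPAN CLASS="..."> in the body with an inline-styled <span>.
; Returns the rewritten body, or a negative code: -1 no style table, -2 empty input,
; -3 no <body> found, -4 body could not be split.
Func convertCSS2Inline($tArray, $msg)
    If Not IsArray($tArray) Then Return -1
    If $msg = "" Then Return -2
    $body = StringRegExp($msg, "(?i)(?s)(?:.*?<body>(.*?)</body>)", 3, 1)
    If Not IsArray($body) Then Return -3
    ; Split on "<" so each element is one tag plus the text that follows it.
    $outHTML = StringSplit($body[0], "<", 2)
    If Not IsArray($outHTML) Then Return -4
    For $i = 0 To UBound($outHTML) - 1
        If StringInStr($outHTML[$i], 'SPAN CLASS="') Then
            $tmp = replaceCSS2HTML($outHTML[$i])
            ; Leave the tag untouched if its class list could not be parsed.
            If IsArray($tmp) Then $outHTML[$i] = StringReplace($outHTML[$i], 'SPAN CLASS="' & $tmp[1] & '">', $tmp[0])
        ElseIf StringInStr($outHTML[$i], "/SPAN>") Then
            ; Replace only the tag so any text after </SPAN> is kept.
            $outHTML[$i] = StringReplace($outHTML[$i], "/SPAN>", "</span>")
        EndIf
    Next
    ; Restore the "<" that StringSplit removed from every element after the first.
    For $i = 1 To UBound($outHTML) - 1
        If StringLeft($outHTML[$i], 1) <> "<" Then $outHTML[$i] = "<" & $outHTML[$i]
    Next
    ; Join with no separator so no stray spaces end up inside the text.
    $testhtml = _ArrayToString($outHTML, "")
    Return $testhtml
EndFunc   ;==>convertCSS2Inline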


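; Builds the inline replacement for one 'SPAN CLASS="..."' fragment.
; Returns a two-element array: [0] = new opening tag, [1] = original class list.
; Returns -5 for an empty fragment and -6 if no class list is found.
Func replaceCSS2HTML($eID)
    $pcode = '<span style="'
    If $eID = "" Then Return -5
    $vArray = StringRegExp($eID, '(?i)(?s)(?:SPAN CLASS="(.*?)")', 3, 1)
    If Not IsArray($vArray) Then Return -6
    $vars = StringSplit($vArray[0], " ", 2)
    For $i = 0 To UBound($vars) - 1
        ; Partial-match lookup of the class name in the style table built by getStyle().
        $tmp = _ArraySearch($test, $vars[$i], 0, 0, 0, 1)
        If $tmp = -1 Then
            ; "nd" has no CSS rule; it closes the enclosing spans and starts a fresh one.
            If $vars[$i] = "nd" Then $pcode = '</span></span><span style="'
        Else
            $pcode = $pcode & $test[$tmp][1]
        EndIf
    Next
    $pcode = $pcode & '">'
    Local $tcode[2] = [$pcode, $vArray[0]]
    Return $tcode
EndFunc   ;==>replaceCSS2HTML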

; Parses the <STYLE> block into a 2D array: [n][0] = selector, [n][1] = declarations.
; Returns -1 if no style block or no rules are found.
Func getStyle($msg)
    $sArray = StringRegExp($msg, "(?i)(?s)(?:<STYLE.*?>\<\!--(.*?)--></style>)", 3, 1)
    If Not IsArray($sArray) Then Return -1
    ; Flag 3 returns a flat list: selector, declarations, selector, declarations, ...
    $bArray = StringRegExp($sArray[0], "(?i)(?s)(?:(.*?){(.*?)})", 3, 1)
    If Not IsArray($bArray) Then Return -1
    ; Pair each selector (even index) with the declarations that follow it.
    Local $pairs = Int(UBound($bArray) / 2)
    If $pairs < 1 Then Return -1
    Local $cssArray[$pairs][2]
    For $i = 0 To $pairs - 1
        $cssArray[$i][0] = $bArray[2 * $i]
        $cssArray[$i][1] = $bArray[2 * $i + 1]
    Next
    ; Keep the descending sort: replaceCSS2HTML() takes the first partial match,
    ; so the row order affects which rule is found.
    _ArraySort($cssArray, 1)
    Return $cssArray
EndFunc   ;==>getStyle

Note: you use the attached exe to go from RTF to HTML, and the HTML it produces already uses inline CSS (except for the table and margin rules, but you can leave those out). If you just pull the text from within the <body>...</body> tags using a regex, you get what you need with inline CSS ;)

You can use it as:

- r2h.exe $RTF $HTML (where $RTF is the RTF file location, and $HTML is the desired .html output file location)

- r2h.exe $RTF (which defaults to out.html; this is slower, and can miss calls when it is invoked multiple times)

- r2h.exe (which launches a process that waits for code to be entered and then for the Ctrl+Z command (^Z) to be sent, after which it converts the input and places the HTML in the STDOUT buffer for you to read and process). This is a nice fast way to do the input without having to save an RTF file, and you don't need to save the HTML file either, as the data is written to the screen. But, of course, you still DO need to relaunch r2h.exe every time you need to convert. In the case of a chat app, it's best to use the first option, where you specify both the RTF and HTML files, as it's much faster.
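The first and third call styles could be wrapped roughly as below. This is a sketch under assumptions: the helper names and the $sExe path parameter are hypothetical, the stream constants are taken from AutoItConstants.au3 (older AutoIt releases keep them in Constants.au3), and closing the STDIN stream is assumed to have the same effect on r2h.exe as typing ^Z.

```autoit
#include <AutoItConstants.au3>

; File-based call: r2h.exe <input.rtf> <output.html>, then read the result back.
Func _R2H_File($sExe, $sRtfFile, $sHtmlFile)
    RunWait('"' & $sExe & '" "' & $sRtfFile & '" "' & $sHtmlFile & '"', "", @SW_HIDE)
    Return FileRead($sHtmlFile)
EndFunc   ;==>_R2H_File

; Stream-based call: write the RTF to STDIN, close the stream, collect STDOUT.
Func _R2H_Stream($sExe, $sRtf)
    Local $iPid = Run('"' & $sExe & '"', "", @SW_HIDE, $STDIN_CHILD + $STDOUT_CHILD)
    If @error Then Return SetError(1, 0, "")
    StdinWrite($iPid, $sRtf)
    StdinWrite($iPid) ; no data argument closes STDIN (assumed equivalent to ^Z here)
    Local $sOut = ""
    While 1
        $sOut &= StdoutRead($iPid)
        If @error Then ExitLoop ; @error is set once the stream has been closed
        Sleep(10)
    WEnd
    Return $sOut
EndFunc   ;==>_R2H_Stream
```

Either function returns the full HTML document, so the <body> contents still have to be extracted afterwards.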

Enjoy :D

r2h.zip

Edited by zackrspv
