# Method for extracting text from RTF file

I've worked with RTF files and rich edit controls for years. But recently I needed to simply extract the text from an RTF file (i.e., without any of the formatting).

I found a few suggested methods, but none were as simple as what I would like.

A short investigation revealed that although all the necessary pieces are in the standard Au3 function set, there is no single, ready-made Extract Text function.

What I'm providing below is a working tool you can use to ferret out and tune any subtle aspects of whatever processing you need.

THE IMPORTANT THING TO KNOW IS THIS: the file's ENCODING is key to everything.

If you're certain of the file's encoding, then just specify it in your _GUICtrlRichEdit_StreamFromFile() call.

If you don't know it, use FileGetEncoding() and use the result in your StreamFrom call.

BUT HERE'S THE CAVEAT: determining a file's encoding is tricky. There's a wide range of programs writing RTFs ... and the specification(s) for a file's encoding can be rather loosely implemented. As a result, there's a note for the Au3 function saying that it will return Binary (code = 16) if the encoding isn't clear. But never specify Binary in your StreamFrom call, or you will get gibberish.

A fairly reliable "rule" is that if you don't get a clear encoding indication (like UTF-8 or UTF-16), then it's pretty safe to assume ANSI for an RTF file on a Windows PC ... so replace any code=16 with code=512.
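To make that rule concrete, here is a minimal sketch of it as a helper. The function name `_ResolveRtfEncoding` is made up for illustration (it is not part of the standard UDFs), and it assumes the named constants `$FO_BINARY` (16) and `$FO_ANSI` (512) from FileConstants.au3 in place of the raw numbers.

```autoit
#include <FileConstants.au3>

; Hypothetical helper (not a standard UDF): returns an encoding value
; intended to be passed on to _GUICtrlRichEdit_StreamFromFile().
Func _ResolveRtfEncoding($sPath)
    Local $iEncoding = FileGetEncoding($sPath)
    ; FileGetEncoding() returns -1 on failure (e.g. the file can't be opened).
    If $iEncoding = -1 Then Return SetError(1, 0, -1)
    ; Assumption taken from the rule above: "Binary" means the encoding
    ; was unclear, so fall back to ANSI.
    If $iEncoding = $FO_BINARY Then $iEncoding = $FO_ANSI
    Return $iEncoding
EndFunc   ;==>_ResolveRtfEncoding
```

You would call it as `$encoding = _ResolveRtfEncoding($watch)` and check `@error` before streaming the file in.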

Feel free to suggest alternatives ... or ways to make the process more robust.

For anyone who's interested, I found this related discussion on StackExchange: link

#include <GUIConstantsEx.au3>
#include <GuiRichEdit.au3>
#include <WindowsConstants.au3>
#include <WinAPISysWin.au3>
#include <String.au3>
#include <Array.au3>
#include <File.au3>

Global $watch = "C:\path\to\file.rtf" ; path to RTF file

$hGui = GUICreate("Extract text from RTF", 660, 320, -1, -1)
$lblMask = GUICtrlCreateLabel("", 10, 10, 300, 220)
GUICtrlSetBkColor($lblMask, $GUI_BKCOLOR_TRANSPARENT)
$hRichEdit = _GUICtrlRichEdit_Create($hGui, "This is a test.", 10, 20, 300, 220, BitOR($ES_MULTILINE, $WS_VSCROLL))
$normal = GUICtrlCreateEdit("initial text", 330, 20, 320, 240)
$cButton = GUICtrlCreateButton("Process the file", 80, 270, 180, 30)
$eButton = GUICtrlCreateButton("Examine first 500", 400, 270, 180, 30)
GUICtrlSetState($cButton, $GUI_FOCUS)
GUISetState(@SW_SHOW)

While True
    $iMsg = GUIGetMsg()
    Select
        Case $iMsg = $GUI_EVENT_CLOSE
            _GUICtrlRichEdit_Destroy($hRichEdit) ; needed unless script crashes
            Exit
        Case $iMsg = $cButton
            $encoding = FileGetEncoding($watch)
            If $encoding = 16 Then $encoding = 512 ; Binary (unclear) -> assume ANSI
            ; MsgBox(0, "Encoding is ", $encoding)
            _GUICtrlRichEdit_StreamFromFile($hRichEdit, $watch, $encoding)
            GUICtrlSetData($normal, _GUICtrlRichEdit_GetText($hRichEdit, True))
            ConsoleWrite("Processed" & @CRLF)
        Case $iMsg = $eButton
            $readText = StringLeft(GUICtrlRead($normal), 500)
            MsgBox(0, "2: ", $readText & @CRLF & _StringRepeat("-", 80) & @CRLF & _StringToHex($readText))
    EndSelect
WEnd
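
If you'd rather have the single Extract Text call that seems to be missing, the same steps can be wrapped in one function. This is only a sketch: the name `_RtfFileToText` and its error codes are my own invention, and it assumes a rich edit control works fine inside a parent window that is never shown (I haven't tried it on every Windows version).

```autoit
#include <EditConstants.au3>
#include <FileConstants.au3>
#include <GuiRichEdit.au3>
#include <WindowsConstants.au3>

; Hypothetical wrapper (not a standard UDF): returns the plain text of an
; RTF file, or "" with @error set (1 = encoding check failed, 2 = streaming failed).
Func _RtfFileToText($sPath)
    Local $iEncoding = FileGetEncoding($sPath)
    If $iEncoding = -1 Then Return SetError(1, 0, "")
    If $iEncoding = $FO_BINARY Then $iEncoding = $FO_ANSI ; unclear -> assume ANSI

    ; The rich edit control needs a parent window; this one is never shown.
    Local $hParent = GUICreate("", 100, 100)
    Local $hRichEdit = _GUICtrlRichEdit_Create($hParent, "", 0, 0, 100, 100, BitOR($ES_MULTILINE, $WS_VSCROLL))

    _GUICtrlRichEdit_StreamFromFile($hRichEdit, $sPath, $iEncoding)
    Local $iErr = @error ; save before later calls overwrite it
    Local $sText = _GUICtrlRichEdit_GetText($hRichEdit, True) ; True = convert CR to CRLF

    ; Clean up the control and its hidden parent window.
    _GUICtrlRichEdit_Destroy($hRichEdit)
    GUIDelete($hParent)

    If $iErr Then Return SetError(2, $iErr, "")
    Return $sText
EndFunc   ;==>_RtfFileToText
```

Usage would be along the lines of `ConsoleWrite(_RtfFileToText("C:\path\to\file.rtf") & @CRLF)`, checking `@error` first.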
