alanstone

Word SaveAs question

8 posts in this topic

#1 ·  Posted (edited)

Hi,

I want to save Word documents as UTF-8 text files. From what I understand from the Word 2003 VBA help (or http://msdn.microsoft.com/en-us/library/aa220734(office.11).aspx), this corresponds to oDoc.SaveAs with

.FileFormat = wdFormatUnicodeText
.Encoding = msoEncodingUTF8

So I've got...

#include <Word.au3>

Local $doc = @ScriptDir & "\" & "doc2txt_test.doc"
Local $txt = @ScriptDir & "\" & "doc2txt_test.txt"
Local $oWord, $oWordDocs, $oWordDoc

$oWord = ObjCreate("Word.Application")
$oWord.Visible = True
$oWordDocs = $oWord.Documents
$oWordDoc = $oWordDocs.Open($doc)
$oWordDoc.SaveAs($txt, ???) ; <--- wd*/mso* constants don't seem to work, what do you put there ?

Is the file then saved as UTF-8 with or without a BOM? (I need it without a BOM.)

- AutoIt 3.3.4.0
- Windows XP Home SP3
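
Whether Word writes a BOM can be checked on the output file itself. Below is a minimal sketch in JScript-compatible JavaScript that tests a byte array for the UTF-8 BOM (the bytes EF BB BF) and strips it. The function names hasUtf8Bom and stripUtf8Bom are made up for illustration, and reading the file into a byte array is left out because it depends on the scripting host.

```javascript
// UTF-8 byte order mark: EF BB BF
var UTF8_BOM = [0xEF, 0xBB, 0xBF];

// Returns true if the byte array starts with the UTF-8 BOM.
function hasUtf8Bom(bytes) {
    if (bytes.length < UTF8_BOM.length) {
        return false;
    }
    for (var i = 0; i < UTF8_BOM.length; i++) {
        if (bytes[i] !== UTF8_BOM[i]) {
            return false;
        }
    }
    return true;
}

// Returns a copy of the bytes without a leading BOM
// (a plain copy if there is no BOM).
function stripUtf8Bom(bytes) {
    return hasUtf8Bom(bytes) ? bytes.slice(3) : bytes.slice(0);
}
```

If the check reports a BOM, the stripped bytes can be written back to the .txt file as a post-processing step after Word has closed it.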

Edited by alanstone


#2 ·  Posted (edited)

alanstone wrote:

> $oWordDoc.SaveAs($txt, ???) ; <--- wd*/mso* constants don't seem to work, what do you put there ?

You might want to use the actual values of the constants instead of the constant names. The values can vary between Word versions, so you'd have to add logic that checks the $oWord.Version property.

You can look up the values in the Object Browser of Word's Visual Basic Editor: search for wdFormatUnicodeText (or any other constant), select it, and the bottom pane shows its value.

For Word 2003, and likely Word XP (2002), wdFormatUnicodeText = 7, so substitute that integer for the constant.

For older Word versions, check the Object Browser in those versions of Word.
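
As a rough sketch of that version check (JScript-compatible JavaScript; the table and the function name unicodeTextFormatFor are hypothetical), a lookup keyed on the major version could look like this. Only the Word 2003 entry (version "11.0") is confirmed above. The Word XP entry (version "10.0") is the "likely" value and should be verified, and any other version returns null instead of a guess.

```javascript
// Major version -> integer value of wdFormatUnicodeText.
// 11 = Word 2003 (confirmed above), 10 = Word XP/2002 (only "likely").
var UNICODE_TEXT_FORMAT_BY_MAJOR = { 11: 7, 10: 7 };

// versionString is what Application.Version returns, e.g. "11.0".
// Returns the integer for wdFormatUnicodeText, or null when unknown.
function unicodeTextFormatFor(versionString) {
    var major = parseInt(versionString, 10);
    if (UNICODE_TEXT_FORMAT_BY_MAJOR.hasOwnProperty(major)) {
        return UNICODE_TEXT_FORMAT_BY_MAJOR[major];
    }
    return null;
}
```

Returning null for unknown versions forces the caller to look the value up in that version's Object Browser rather than silently saving in the wrong format.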

Edited by daluu


Thanks for this precious hint.

So now I have...

#include <File.au3>
#include <Word.au3>

Local $doc = @ScriptDir & "\" & "doc2txt_test.doc"
Local $txt = @ScriptDir & "\" & "doc2txt_test.txt"
Local $oWord, $oWordDocs, $oWordDoc

; Word 2003 constant value,
; found through Word's Visual Basic Script Editor Object Browser
Local $def = 0 ; default value
Local $fileformat = 7 ; wdFormatUnicodeText constant value
Local $encoding = 65001 ; (&HFDE9) msoEncodingUTF8 constant value

$oWord = ObjCreate("Word.Application")
$oWord.Visible = True
$oWordDocs = $oWord.Documents
$oWordDoc = $oWordDocs.Open($doc)

With $oWordDoc
    .FileName = @ScriptDir & "\" & "doc2txt_test.txt" ; <-- generates an error (*)
    ; .FileName = $txt ; <-- generates a similar error
    ; .FileName = '"' & $txt & '"' ; <-- generates a similar error
    .FileFormat = $fileformat
    .Encoding = $encoding
    .Close
EndWith

$oWord.Quit

(*) J:\$AutoIt\doc2txt_utf8.au3 (32) : ==> The requested action with this object has failed.:
.FileName = @ScriptDir & "\" & "doc2txt_test.txt"
.FileName = @ScriptDir & "\" & "doc2txt_test.txt"^ ERROR


#4 ·  Posted (edited)

alanstone wrote:

> Thanks for this precious hint.
>
> With $oWordDoc
>     .FileName = @ScriptDir & "\" & "doc2txt_test.txt" ; <-- generates an error (*)
>     ; .FileName = $txt ; <-- generates a similar error
>     ; .FileName = '"' & $txt & '"' ; <-- generates a similar error
>     .FileFormat = $fileformat
>     .Encoding = $encoding
>     .Close
> EndWith
>
> $oWord.Quit
>
> (*) J:\$AutoIt\doc2txt_utf8.au3 (32) : ==> The requested action with this object has failed.:
> .FileName = @ScriptDir & "\" & "doc2txt_test.txt"
> .FileName = @ScriptDir & "\" & "doc2txt_test.txt"^ ERROR

Hmm... sorry, I can't really help you there. I've never used that save feature, nor AutoIt for Word automation. I use JScript (a version of JavaScript) + COM because it offers error handling for the Word COM object.

One thing you could try: do the save to UTF-8 plain text inside Word with the macro recorder, then clean up the recorded VBA code and port it to AutoIt. If it works inside Word, the recorded code should work, and if ported correctly, the ported code should work as well.

But sometimes the Word COM object misbehaves, which is why I use JScript to fail "gracefully" (or you could ignore the error instead). It may be that SaveAs doesn't always succeed and you have to try several times. I don't recall specifically, as I worked on that a few years back, and I wasn't saving to Unicode text but to HTML. I ran into similar errors until I switched from VBScript to JScript to manipulate the Word object.

With a JScript try/catch block you can catch the error and fail gracefully (e.g. close the Word document, release the Word object) or keep retrying in a loop until the save succeeds. With VBScript, the COM error isn't detected and handled well, because VBScript primarily handles errors on the VBScript side. The error I had, and likely yours, may be on the Word side. I'm not sure whether AutoIt can handle COM errors well, or whether the error message details are useful enough to find a real fix or workaround.
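
The retry-then-fail-gracefully idea can be sketched as a small helper (JScript-compatible JavaScript). The name retry and its parameters are hypothetical; this is not code from my original script, just an illustration of the pattern under the assumption that a failed save throws an exception.

```javascript
// Calls action() up to maxAttempts times and returns its result on the
// first success. If every attempt throws, cleanup() (if given) runs once
// and the last error is rethrown to the caller.
function retry(action, maxAttempts, cleanup) {
    if (maxAttempts < 1) {
        throw new Error("maxAttempts must be at least 1");
    }
    var lastError = null;
    for (var attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            return action();
        } catch (err) {
            lastError = err; // remember the failure and try again
        }
    }
    if (cleanup) {
        cleanup();
    }
    throw lastError;
}
```

With a Word object, the action would wrap the SaveAs call and the cleanup would close the document and quit Word, for example retry(function () { doc.SaveAs(path, format); }, 3, closeWord), where doc, path, format and closeWord are placeholders.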

Edited by daluu


#5 ·  Posted (edited)

This makes more sense...

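Local $def = 0 ; default value

With $oWordDoc
    ; SaveAs is a method: the file name and format are arguments 1 and 2,
    ; the encoding is argument 12, and the ones in between get placeholders.
    .SaveAs($txt, $fileformat, $def, "", $def, "", $def, $def, $def, $def, $def, $encoding)
    .Close
EndWith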

Edited by alanstone


alanstone wrote:

> This makes more sense...
>
> Local $def = 0 ; default value
>
> With $oWordDoc
>     .SaveAs($txt,$fileformat,$def,"",$def,"",$def,$def,$def,$def,$def,$encoding)
>     .Close
> EndWith

Yes, that does make more sense.

This method of supplying the extra blank/zero valued parameters is also required when using Word automation via .NET. And to make matters worse, the exact parameter count of required blank/zero valued parameters varies by version of Word. And with .NET at least, you have to compile/link against a certain version of Word installed on your development machine, so you can't support multiple versions. Because of that, I swapped over to VBScript/JScript, as I originally started in .NET.

Interestingly, with VBScript and JScript over the Word COM interface, you don't have to specify all those blank/zero value parameters. You just follow the API as presented in the Word VBScript editor / object browser. Which makes things a lot easier.
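
To keep track of which placeholder sits in which slot, the positional list can be built from names. The sketch below (JScript-compatible JavaScript) assumes the Word 2003 parameter order for Document.SaveAs up to Encoding, written from memory, so verify it in the Object Browser for your version. The names SAVEAS_PARAMS and buildSaveAsArgs are made up for illustration.

```javascript
// Assumed parameter order of Document.SaveAs in Word 2003, up to Encoding.
var SAVEAS_PARAMS = [
    "FileName", "FileFormat", "LockComments", "Password",
    "AddToRecentFiles", "WritePassword", "ReadOnlyRecommended",
    "EmbedTrueTypeFonts", "SaveNativePictureFormat", "SaveFormsData",
    "SaveAsAOCELetter", "Encoding"
];

// Builds the positional argument list from a { name: value } object.
// Password slots default to "", every other slot to 0, matching the
// placeholders used in the AutoIt call above.
function buildSaveAsArgs(named) {
    var args = [];
    for (var i = 0; i < SAVEAS_PARAMS.length; i++) {
        var name = SAVEAS_PARAMS[i];
        if (named.hasOwnProperty(name)) {
            args.push(named[name]);
        } else if (name === "Password" || name === "WritePassword") {
            args.push("");
        } else {
            args.push(0);
        }
    }
    return args;
}
```

Calling buildSaveAsArgs({ FileName: "doc2txt_test.txt", FileFormat: 7, Encoding: 65001 }) yields twelve values in the same order as the AutoIt call, with the encoding in the last slot.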

Share this post


Link to post
Share on other sites

daluu wrote:

> I use JScript (a version of Javascript) + COM

Do you mean this http://msdn.microsoft.com/en-us/library/hbxc2t98(VS.85).aspx ?

Yes, that is correct.

Here's a snippet of code for my SaveAs to HTML function, for your reference.

var WordObj = new ActiveXObject("Word.Application");
var HTMLFrmt;
//Version is a string such as "11.0", so convert it before comparing
if(parseFloat(WordObj.Version) > 9.0){ //Office 2003 & XP
    HTMLFrmt = 8;
}else{
    //should check values for Office 97 & Office 2000, etc.
    HTMLFrmt = 18;
}
var webFilename = "someName.htm";
//SAVEACTIVEDOCASHTML
//description: saves the active document as HTML next to the given file
//input: path of a file in the target folder
//output: none
function SaveActiveDocAsHTML(pFilePath){
    var ParentPath, webFile, filesys;
    filesys = new ActiveXObject("Scripting.FileSystemObject");
    ParentPath = filesys.GetParentFolderName(pFilePath);
    webFile = ParentPath + "\\" + webFilename;
    try{
        WordObj.ActiveDocument.SaveAs(webFile,HTMLFrmt);
        //WordObj.WdSaveFormat.wdFormatHTML
    }catch(err){
        WordErrHandler("Could not save the file as HTML/webpage. Operation aborted.");
    }
    filesys = null;
}
//WORDERRHANDLER
//description: outputs supplied error message, close Word, disposes word object, exits script
//input: error message
//output: none
function WordErrHandler(msg){
    //WScript.Echo("MS Word threw an exception");
    WScript.Echo(msg);
    CloseWord();
    WScript.Quit(1); //non-zero exit code signals failure
}
//CLOSEWORD
//description: quits Word & clears the Word Application Object
//input: none
//output: none
function CloseWord(){
    WordObj.Quit(false); //false = do not save changes
    WordObj = null;
}
