alanstone

_WordDocSaveAs Unicode text: BOM'ed or not?

alanstone

From the AutoIt help: $i_Format 7 = Unicode text format or Encoded text format

Is that Unicode (UTF-8?) with or without a BOM?

If it is saved with a BOM, how can I save without one?

- AutoIt v3.3.4.0

jchd

Why not try it and report the answers to us?

I did that for you (you lazy boy!) and here is the result:

Saving a Word doc with $i_Format = 7 produces ANSI. That implies that every Unicode character whose codepoint is > 0xFF is converted to '?', which is not very useful.
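To make the '?' substitution concrete, here is a minimal sketch in Python (not AutoIt, and not what Word does internally). It assumes Latin-1 as a stand-in for a single-byte "ANSI" code page; the real result depends on the code page Windows is configured with. The helper name `to_single_byte` is made up for this illustration.

```python
def to_single_byte(text: str, codepage: str = "latin-1") -> str:
    """Round-trip text through a single-byte code page.

    Characters the code page cannot represent are replaced with '?',
    which mimics the lossy result described above.
    Assumption: Latin-1 stands in for the system's ANSI code page.
    """
    return text.encode(codepage, errors="replace").decode(codepage)


samples = ["Sant Julià de Lòria", "Skrýchov u Opařan", "МЫТИЩИ-ДТИ"]
for s in samples:
    print(to_single_byte(s))
```

The Catalan sample survives because all its characters are below 0x100, while the Cyrillic sample comes back as nothing but question marks and a hyphen.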

If you open a blank document manually, insert plain Unicode like:

This is some text to insert.

Sant Julià de Lòria

Skrýchov u Opařan

Žíšov

БОЛЬШОЕ ГРИДИНО

МЫТИЩИ-ДТИ

歴史的仮名遣

変体仮名

فرنسيّ عربيّ

सभी मनुष्यों को गौरव और अधिकारों के मामले में जन्मजात स्वतन्त्रता और समानता प्राप्त है। उन्हें बुद्धि और अन्तरात्मा की देन है और परस्पर उन्हें भाईचारे के भाव से बर्ताव करना चाहिये।

เขาจะได้ไปเที่ยวเมืองลาว

and save it as "Raw text (*.txt)" ("Texte brut" in the French version of Word), then a dialog pops up with options to save under a number of encodings: ANSI (various codepages) and Unicode UTF-16BE, UTF-16LE and UTF-8, with or without BOM.

This works well, but might not be easy to manage from the COM object.

Moral: either ask a COM guru to come up with a method to enter options in the raw text dialog, or use Send or ControlSend to automate Word for that task.
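Whichever route produces the text file, it is easy to check afterwards whether it starts with a BOM. The following is a small Python sketch (again not AutoIt) that looks only for the three byte order marks of the encodings mentioned above. The function name `detect_bom` is an assumption made for this example.

```python
import codecs

# Byte order marks for the Unicode encodings mentioned above.
BOMS = (
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16BE"),  # FE FF
)


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None
```

Reading the first few bytes of a saved file in binary mode and passing them to `detect_bom` is enough. A result of `None` only says that no BOM is present; it cannot tell ANSI apart from UTF-8 without BOM.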


This wonderful site allows debugging and testing regular expressions (many flavors available). An absolute must have in your bookmarks.
Another excellent RegExp tutorial. Don't forget downloading your copy of up-to-date pcretest.exe and pcregrep.exe here
RegExp tutorial: enough to get started
PCRE v8.33 regexp documentation latest available release and currently implemented in AutoIt beta.

SQLitespeed is another feature-rich premier SQLite manager (includes import/export). Well worth a try.
SQLite Expert (freeware Personal Edition or payware Pro version) is a very useful SQLite database manager.
An excellent eBook covering almost every aspect of SQLite3: a must-read for anyone doing serious work.
SQL tutorial (covers "generic" SQL, but most of it applies to SQLite as well)
A work-in-progress SQLite3 tutorial. Don't miss other LxyzTHW pages!
SQLite official website with full documentation (may be newer than the SQLite library that comes standard with AutoIt)

alanstone

I wasn't lazy :D

I tried it too, got the same result, and was wondering whether I had done something wrong or there was something off with my editor.

So either the help file is wrong, or _WordDocSaveAs is bugged.

I have a collection of *.doc files (of third-party origin) which must be cleaned up and converted to *.txt in UTF-8 without BOM.
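For the last step of such a conversion, once Word has exported a text file in some Unicode encoding, the BOM can be dropped in a separate pass. Below is a hedged Python sketch of that post-processing (it does not touch Word or the *.doc files themselves). The function name `to_utf8_no_bom` and the `cp1252` fallback are assumptions chosen for illustration; the fallback should match whatever ANSI code page the files really use.

```python
import codecs


def to_utf8_no_bom(data: bytes, fallback: str = "cp1252") -> bytes:
    """Decode bytes according to their BOM and re-encode as BOM-less UTF-8.

    Assumption: input without a BOM is either valid UTF-8 already or
    text in the single-byte `fallback` code page.
    """
    if data.startswith(codecs.BOM_UTF8):
        text = data[len(codecs.BOM_UTF8):].decode("utf-8")
    elif data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM and discards it.
        text = data.decode("utf-16")
    else:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            text = data.decode(fallback)
    # The plain "utf-8" codec never writes a BOM.
    return text.encode("utf-8")
```

Applied to a file, this amounts to reading it in binary mode, passing the bytes through `to_utf8_no_bom`, and writing the result back in binary mode.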

jchd

A bug? That's not obvious. How is the COM interface described? It may use the same phrasing.

I guess that something needs to be done with the object so that further options are passed to the COM layer. The question is whether that is possible at all.

Try asking in the COM forum.

If this is a routine job, then the best approach will be the pedestrian ControlSend way. It should work just as well.



alanstone

I'll explore your suggestions.

Thanks for your friendly help and for enlightening me about those encoding thingies.
