supergg02

convert *.doc to *.txt files

11 posts in this topic

#1 ·  Posted (edited)

Hi!

Is there a way to convert .doc files to .txt files?

The solution must not use MS Word, because Word uses a lot of RAM and CPU when there are many files to convert.

Thanks for your help.

My goal is to compute statistics on punctuation, words, and paragraphs in a given book.

Edited by supergg02

#2 · Posted



Searching on Google came up with Antiword, amongst many others.


Get Beta versions Here Get latest SciTE editor Here AutoIt 1-2-3 by Valuater - A great starting point.

Time you enjoyed wasting is not wasted time ......T.S. Elliot
Suspense is worse than disappointment................Robert Burns
God help the man who won't help himself, because no-one else will...........My Grandmother

#3 · Posted

As I understand it, if you are in Word and just want to save as a text file, Word will warn you that you will lose formatting.

Thus your goal is defeated.

8)



#4 · Posted

No, the .doc files are generated automatically by separate OCR software, and I am looking for a way to convert them without opening them in Word or any other application.

#5 · Posted

The solution I gave you can be run from the command line and can therefore easily be scripted.
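To make the scripting idea concrete, here is a minimal sketch in Python (not AutoIt). It assumes the converter is Antiword, that it is on the PATH as `antiword`, and that it prints the extracted text to standard output when called as `antiword file.doc`. The helper names `plan_jobs` and `convert_all` are made up for illustration.

```python
import subprocess
from pathlib import Path


def plan_jobs(folder):
    """Pair every .doc file in `folder` with the .txt path it should become."""
    docs = sorted(p for p in Path(folder).iterdir()
                  if p.is_file() and p.suffix.lower() == ".doc")
    return [(doc, doc.with_suffix(".txt")) for doc in docs]


def convert_all(folder, converter="antiword"):
    """Run the converter once per .doc file and save its stdout as the .txt file.

    Assumption: `converter` writes the document's plain text to stdout.
    """
    for doc, txt in plan_jobs(folder):
        with open(txt, "wb") as out:
            # check=True raises CalledProcessError if the converter fails.
            subprocess.run([converter, str(doc)], stdout=out, check=True)
```

Calling `convert_all` with the folder that holds the OCR output should then leave a .txt file next to each .doc file, without Word ever being started.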



#6 · Posted

I think you're missing the point... .doc files include formatting.

I tried

FileCopy("C:\Questions.doc", "C:\Questions.txt")

and the .txt file is gibberish... because of the .doc formatting.

8)



#7 · Posted

Thanks a lot! ;) I will try it now...

#8 · Posted

I found an old mail about this subject:

>wvWare might help you out. It's a library (the one used in Abiword)
>and a set of command-line tools for reading and converting MS Word
>documents. The URL is http://wvware.sourceforge.net/ . Good luck.

HTH



#9 · Posted

If the doc is run through a converter, the formatting is stripped and only the text remains. What you tried just copied the file under a new name.
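To see why copying or renaming cannot work, compare the first bytes of the file before and after. This is a small sketch in Python, assuming the files are classic binary Word documents, which are OLE compound files and begin with the 8-byte signature shown below. The names `is_ole_file` and `copy_as_txt` are made up for illustration.

```python
import shutil

# First 8 bytes of every OLE compound file, the container used by binary .doc.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_file(path):
    """Return True if the file starts with the OLE compound-file signature."""
    with open(path, "rb") as f:
        return f.read(8) == OLE_SIGNATURE


def copy_as_txt(doc_path, txt_path):
    """Copy the file under a .txt name, like the FileCopy call in the thread."""
    shutil.copyfile(doc_path, txt_path)
    # The bytes are untouched, so the copy is still a Word container, not text.
    return is_ole_file(txt_path)
```

If `copy_as_txt` returns True, the ".txt" file is still the same binary container, which is why a text editor shows gibberish.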




#10 ·  Posted (edited)

BigDod, I downloaded Antiword for DOS and tested it on 3 .doc files. It works perfectly! Good call. (P.S. They "don't do Windows", but they do have a precompiled version for Windows if you want to spend a lot more time on it...)

This is a sample of the output. It would let you count words, punctuation, paragraphs, etc.:

expires and you still need to use Outlook Web Access, refresh your browser
and log on again.

Supported browsers and operating systems

You can use Outlook Web Access with Microsoft Internet Explorer or Netscape
Navigator Web browsers from many UNIX, Apple Macintosh, or Microsoft
Windows-based computers. To use the complete set of features available with
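For the statistics themselves, plain text like the sample above is easy to process. Here is a rough sketch in Python, assuming paragraphs are separated by blank lines, a word is a run of word characters (with inner apostrophes or hyphens kept together), and punctuation means the ASCII marks in Python's `string.punctuation`. The function name `text_stats` is my own.

```python
import re
import string
from collections import Counter


def text_stats(text):
    """Count paragraphs, words, and punctuation marks in plain text."""
    # Paragraphs: blocks of text separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Words: runs of word characters; "it's" and "Windows-based" count once.
    words = re.findall(r"\w+(?:['-]\w+)*", text)
    # Punctuation: tally every ASCII punctuation character, including the
    # apostrophes and hyphens that sit inside words.
    punctuation = Counter(ch for ch in text if ch in string.punctuation)
    return {
        "paragraphs": len(paragraphs),
        "words": len(words),
        "punctuation": punctuation,
    }
```

Pass it the contents of one converted .txt file. Since the punctuation tally is a `Counter`, `stats["punctuation"].most_common(5)` lists the five most frequent marks.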

Edited by jefhal

...by the way, it's pronounced: "JIF"... Bob Berry --- inventor of the GIF format


#11 ·  Posted (edited)

> If the doc is run through a converter, the formatting is stripped and only the text remains. What you tried just copied the file under a new name.

Thanks... I understand that.

8)

Edited by Valuater
