#1
02-11-05, 07:07
swarin
Registered User
Join Date: Feb 2005
Posts: 1
Batch utility to convert Word docs to Unicode or UTF8 text?

Hi,

Does anyone know of a utility that can batch-convert Word documents to Unicode or UTF-8 text files? It needs to recurse through sub-directories and be able to name the output text files with a specified file extension other than ".txt".

For instance... if I had directories and files as below:

rootdir\
........subdir1\
................file1.doc
................file2.doc
........subdir2\
................file1.doc
................file2.doc

... I could point it at directory rootdir and it would produce (for example) Unicode text files with file extension ".uni" as below:

rootdir\
........subdir1\
................file1.doc
................file1.uni
................file2.doc
................file2.uni
........subdir2\
................file1.doc
................file1.uni
................file2.doc
................file2.uni

So far I have tried the following with no success:
1/ An old command-line utility called DOC2TXT.EXE. It creates files with the correct names in the correct places, but its option to output Unicode text does not work: the files it creates are Western European encoded, not Unicode. Has anyone got this to work? If so, please tell me how.
2/ "Convert Doc" by Softinterface.inc. This creates good Unicode files in the right places, but insists on naming them with a ".txt" extension. My emails to Softinterface asking about this have gone unanswered.
3/ "WordConverterEXE", also from Softinterface. It appears to be a GUI front end wrapped around the old DOC2TXT.EXE mentioned above, so it fails in the same way.
4/ Something called EZ-doc2txt (I think), which also did not create Unicode output.

Any suggestions anybody?

Thanks in advance.
#2
02-11-05, 08:13
Pat Phelan
Resident Curmudgeon
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
A Word (actually VBA) macro?

-PatP
#3
02-22-05, 09:32
sco08y
Registered User
Join Date: Oct 2002
Location: Baghdad, Iraq
Posts: 697
Try Antiword. Renaming the individual files will require some kind of batch file. I haven't done Windows batch files in ages, so you'll have to figure that out on your own.
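
If a batch file is awkward, the walk-and-rename part could also be done in a short script. What follows is only a rough sketch in Python, not something tested against real Word files. It assumes antiword is installed and on the PATH, and that its "-m UTF-8.txt" mapping option gives UTF-8 output on your install. The names antiword_to_utf8 and convert_tree are made up for this example.

```python
import subprocess
from pathlib import Path


def antiword_to_utf8(doc_path):
    """Return the text of one .doc file via antiword (assumed to be on PATH)."""
    # Assumption: "-m UTF-8.txt" selects antiword's UTF-8 character mapping file.
    result = subprocess.run(
        ["antiword", "-m", "UTF-8.txt", str(doc_path)],
        capture_output=True,
        check=True,  # raise if antiword reports an error for this file
    )
    return result.stdout.decode("utf-8")


def convert_tree(root, ext=".uni", extract=antiword_to_utf8):
    """Write a UTF-8 text file with extension `ext` next to every .doc under root."""
    written = []
    # rglob descends into all sub-directories of root.
    for doc_path in sorted(Path(root).rglob("*.doc")):
        out_path = doc_path.with_suffix(ext)  # file1.doc -> file1.uni
        out_path.write_text(extract(doc_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Calling convert_tree("rootdir") should leave file1.uni beside each file1.doc, as in the layout in the first post. The extract argument is there so a different converter can be swapped in if antiword's output is not what you want.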