Chinese / Japanese Native Encoding to Unicode Converter FAQ

This FAQ describes the contents of the archive "cj2utf8.zip", which can be downloaded from the CHGIS working papers downloads page.

The ZIP archive contains three parts:

1) converter.bat
2) cocochino folder
3) readme.txt

This application, which converts TEXT FILES (.TXT) from native Chinese or Japanese encodings to Unicode UTF-8, is written in Java and can be run on Windows, Unix, or Linux. Thanks to Miho Nakanishi for writing this!

REQUIREMENTS: You must first install the Java Runtime Environment (JRE) for your operating system. You can download the latest version of the JRE from this website: http://java.sun.com/j2se/1.3/jre/

Once you have installed the JRE, determine which directory it is located in. On Windows machines, this will normally be someplace like:

C:\Program Files\JavaSoft\JRE\1.3.1

Copy the file "converter.bat" AND the folder "cocochino" to that directory (similar to the one shown above). In other words, you should have the file and folder located as follows:
C:\Program Files\JavaSoft\JRE\1.3.1\converter.bat
C:\Program Files\JavaSoft\JRE\1.3.1\cocochino

HOW TO RUN THE PROGRAM:
Step Description
1 Open the COMMAND PROMPT
2 In the COMMAND PROMPT window, change directories to the folder where "converter.bat" is located
3 Type "set java_home=." then hit ENTER.
(Don't type the quotes " ". The dot . sets java_home to the current directory.)
4 Type "set classpath=." then hit ENTER.
5 Type "converter" then hit ENTER.
(This should launch the converter application. If it doesn't, check over the above steps.)
6 Once the converter is running, place a COPY of the source file, which MUST BE in TEXT (.txt) format, into the SAME FOLDER as "converter.bat". You should NEVER USE YOUR ONLY COPY OF THE SOURCE FILE HERE! Make a copy and put the copy in this folder.
7 Select the appropriate NATIVE ENCODING type from the drop-down list. You must know the original encoding for this to work.
8 Type the name of your INPUT FILE into the top box of the form.
9 Type a DIFFERENT NAME for your new OUTPUT FILE into the bottom box of the form. If you TYPE THE SAME FILE NAME IN THE BOTTOM BOX, IT WILL OVERWRITE YOUR INPUT FILE AND YOU WILL NEVER BE ABLE TO RECOVER IT! Use a DIFFERENT NAME for the Output File.
10 Hit the GO button.
11 Look in the same folder as the converter.bat file and you will find the new OUTPUT FILE, which will now be in UTF-8. (Note: both the Tan Atlas Index and CHGIS 1820 Placenames search engines were converted to UTF-8 using this application.)
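
For readers who want to see what a conversion like the one in the steps above involves, here is a minimal sketch in Java. It is NOT the code shipped in cj2utf8.zip: the class name EncodingConverterSketch and its convert method are hypothetical, and the sketch assumes Java 7 or later (for java.nio.file), which is newer than the JRE 1.3 mentioned above. The encoding name is passed as a string such as "Shift_JIS", "EUC-JP", "GB2312" or "Big5"; these are examples of names that typical Java runtimes recognize, not a list of what the converter's drop-down offers. The sketch also refuses to run when the output name equals the input name, mirroring the warning in step 9.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not the converter from the archive.
class EncodingConverterSketch {

    // Re-encode a text file from a named native encoding to UTF-8.
    static void convert(Path input, Path output, String nativeEncoding) throws IOException {
        // Guard against overwriting the input file (see step 9).
        if (input.toAbsolutePath().normalize().equals(output.toAbsolutePath().normalize())) {
            throw new IllegalArgumentException("Output file must differ from input file");
        }
        // Fail early if this Java runtime does not know the encoding name.
        if (!Charset.isSupported(nativeEncoding)) {
            throw new IllegalArgumentException("Unsupported encoding: " + nativeEncoding);
        }
        Charset source = Charset.forName(nativeEncoding);

        // Decode bytes with the native charset, then encode the characters as UTF-8.
        // Files.newBufferedReader throws an exception on malformed input, so choosing
        // the wrong source encoding tends to fail loudly instead of writing garbage.
        try (BufferedReader reader = Files.newBufferedReader(input, source);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            char[] buffer = new char[8192];
            int count;
            while ((count = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, count);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java EncodingConverterSketch <input> <output> <encoding>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]), args[2]);
    }
}
```

Assuming the class has been compiled, it could be run from the command prompt as "java EncodingConverterSketch input.txt output.txt Shift_JIS" (the file names here are placeholders). As with the converter itself, work on a copy of the source file.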
