pTeX and Japanese typesetting

Overview

pTeX (formerly known as ASCII Nihongo TeX) is a 16-bit extension of TeX developed by ASCII Corporation, a major Japanese publisher. It is designed for high-quality Japanese book publishing; the p in pTeX stands for publishing.

pTeX doesn't pass the TRIP test for TeX compatibility, but for practical purposes it is upward-compatible with TeX 3.x, adding the ability to typeset Japanese (JIS X 0208) characters encoded in JIS (ISO-2022-JP), Shift-JIS, or EUC-JP. With the help of the OTF package developed by Shuzaburo Saito, it can typeset all CJK characters in OpenType format.

Japanese characters

Most Japanese characters are designed on a square canvas. The width of a square canvas of the current font is denoted by 1zw (for zenkaku width -- zenkaku stands for full-width). zw is a new length unit introduced by pTeX. There is another length unit -- zh (for zenkaku height) -- but its use is deprecated.
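
As a small illustration of the zw unit, here is a plain pTeX sketch that sets a few dimensions in multiples of the full width. The particular values (40zw, 1zw, 10zw) are arbitrary choices for this example, not pTeX defaults.

```tex
% Plain pTeX sketch; zw is resolved against the Japanese font that is
% current at the time of each assignment.
\hsize=40zw       % line length: 40 full-width characters
\parindent=1zw    % indent paragraphs by one full-width character
\noindent\hbox to 10zw{見出し\hfil}% a box exactly 10zw wide
\bye
```

Because zw follows the current Japanese font, the same source yields proportionally larger dimensions if these assignments are made after switching to a larger font.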

In horizontal (left-to-right) typesetting mode, the baseline of a Japanese character usually divides the square canvas in the ratio 0.12 (below) : 0.88 (above).

Some characters, such as punctuation marks and parentheses, are designed on a half-width canvas, whose width is 0.5zw. For ease of implementation, however, the actual glyphs may be drawn on square canvases. The virtual font mechanism can then map the logical shape to the actual implementation.

JFM

pTeX supports both traditional TFM (TeX Font Metric) and new JFM (Japanese Font Metric) files. JFM (filename extension is still .tfm) can handle thousands of characters in groups.

Traditionally, pTeX used min*.tfm for mincho (serif) kanji fonts and goth*.tfm for gothic (sans serif) kanji fonts, where `*' stands for 5, 6, 7, 8, 9, or 10. However, these JFMs turned out to be poorly suited to many Japanese fonts now in use.

I recommend jis.tfm, developed by Hajime Kobayashi of Tokyo Shoseki Printing, based on Japanese Industrial Standard JIS X 4051:1995, ``Line composition rules for Japanese documents'' -- 日本語文書の行組版方法 (you can order the standard from the Japanese Standards Association -- 日本規格協会). A somewhat modified version of jis.tfm is distributed by Shuzaburo Saito with his UTF and OTF packages. Since Japanese font metrics do not depend on serifs, jis.tfm and a copy of it, say jisg.tfm, may be used for mincho and gothic fonts, respectively.
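
A minimal sketch of loading these metrics in plain pTeX with the \jfont primitive follows. The control sequence names \jismin and \jisgoth are made up for this example, and it assumes that jis.tfm and jisg.tfm are installed where pTeX can find them and that the DVI driver maps them to actual mincho and gothic fonts.

```tex
% Assumption: jis.tfm and its copy jisg.tfm are installed and mapped
% to real fonts by the DVI driver.
\jfont\jismin=jis at 10pt     % mincho, horizontal writing
\jfont\jisgoth=jisg at 10pt   % gothic, same metrics under another name
\jismin 明朝体の本文と{\jisgoth ゴシック体の見出し}。
\bye
```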

jis.tfm

jis.tfm is based on JIS X 4051, but somewhat simplified for use with pTeX.

It divides Japanese characters into six classes:

Class 1
Left parens: ‘“(〔[{〈《「『【
They are half width. They may be designed on square canvases flush right. In that case we ignore the left half and pretend they are half-width, e.g. \hbox to 0.5zw{\hss 「}. If a class-1 character is followed by a class-3 character, then a \hskip 0.25zw minus 0.25zw is inserted in between.
Class 2
Right parens and commas: 、,’”)〕]}〉》」』】
Half width, may be designed flush left on square canvases. If a class-2 character is followed by a class-0, -1, or -5 character, then a \hskip 0.5zw minus 0.5zw is inserted in between. If a class-2 character is followed by a class-3 character, then a \hskip 0.25zw minus 0.25zw is inserted in between.
Class 3
Centered points: ・:;
Half width, may be designed centered on square canvases. If a class-3 character is followed by a class-0, -1, -2, -4, or -5 character, then a \hskip 0.25zw minus 0.25zw is inserted in between. If a class-3 character is followed by a class-3 character, then a \hskip 0.5zw minus 0.25zw is inserted in between.
Class 4
Periods: 。.
Half width, may be designed flush left on square canvases. If a class-4 character is followed by a class-0, -1, or -5 character, then a \hskip 0.5zw is inserted in between. If a class-4 character is followed by a class-3 character, then a \hskip 0.75zw minus 0.25zw is inserted in between.
Class 5
Leaders: ―…‥
Full width. If a class-5 character is followed by a class-1 character, then a \hskip 0.5zw minus 0.5zw is inserted in between. If a class-5 character is followed by a class-3 character, then a \hskip 0.25zw minus 0.25zw is inserted in between. If a class-5 character is followed by a class-5 character, then a \kern 0zw is inserted in between.
Class 0
Everything else.
Full width. If a class-0 character is followed by a class-1 character, then a \hskip 0.5zw minus 0.5zw is inserted in between. If a class-0 character is followed by a class-3 character, then a \hskip 0.25zw minus 0.25zw is inserted in between.

A variant of jis.tfm that comes with the UTF and OTF packages classifies the full-width ? and ! as class 6. If a class-6 character is followed by a class-0 or -1 character, then a \hskip 0.5zw minus 0.5zw is inserted in between. If a class-6 character is followed by a class-3 character, then a \hskip 0.25zw minus 0.25zw is inserted in between. If a class-6 character is followed by a class-0 character in vertical writing, then a \hskip 1zw minus 0.5zw is inserted in between.
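
As a worked example of the class rules, take あ」「い: あ and い are class 0 (1zw each), 」 is class 2 and 「 is class 1 (0.5zw each), and only the 」「 pair receives JFM glue, \hskip 0.5zw minus 0.5zw. The natural width is therefore 1 + 0.5 + 0.5 + 0.5 + 1 = 3.5zw, shrinkable to 3zw. The sketch below measures this in plain pTeX; it assumes that jis.tfm is installed, and \jisfont is a name chosen for the example.

```tex
% Assumption: jis.tfm is installed; \jisfont is a hypothetical name.
\jfont\jisfont=jis at 10pt \jisfont
% Set the four characters at their natural width.
\setbox0=\hbox{あ」「い}
% Width predicted from the class table above.
\dimen0=3.5zw
\message{measured \the\wd0, predicted \the\dimen0}
\bye
```

If the rules apply as described, the two values written to the terminal should agree up to rounding; a noticeable difference would suggest that a different JFM is in use.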

Automatically inserted glue

Where the JFM doesn't put a glue/kern, or if such a glue/kern is inhibited by a pTeX primitive \inhibitglue, then pTeX automatically inserts \hskip\kanjiskip between Japanese characters. pTeX sets \kanjiskip to 0pt plus .4pt minus .5pt, but I tend to make it more stretchable than shrinkable, say 0zw plus .1zw minus .01zw.
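
The following plain pTeX sketch sets \kanjiskip to the more stretchable value mentioned above and uses \inhibitglue at a single position. The sample text is made up for the example.

```tex
% A more stretchable than shrinkable setting; a matter of taste.
\kanjiskip=0zw plus .1zw minus .01zw
% \inhibitglue suppresses the JFM glue at this one position, here the
% 0.5zw minus 0.5zw between 」 (class 2) and 「 (class 1).
「引用」\inhibitglue「引用」
\bye
```

Note that zw in the \kanjiskip assignment is resolved when the assignment is made, so it is best placed after the Japanese body font has been selected.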

Likewise, pTeX inserts \hskip\xkanjiskip between Japanese and Latin printable characters, except after Latin left parens and quotes, and before Latin right parens, quotes, and punctuation marks, where line breaks cannot occur. These \xkanjiskip insertion rules can be controlled by the pTeX primitives \inhibitxspcode and \xspcode. Traditional typesetting inserted a quarter of a full width (0.25zw), but recent practice favors around 15% of a full width, or even zero (especially for magazines). pTeX sets \xkanjiskip to .25zw plus 1pt minus 1pt. Another possible value, which I often employ, is 0.25em plus 0.15em minus 0.06em, equivalent to a Latin space character; this makes ``第1章'' and ``第 1 章'' equivalent.

\xspcode`-=0  % Don't insert \xkanjiskip before/after hyphen
\xspcode`(=1  % Insert \xkanjiskip before latin left paren
\xspcode`)=2  % Insert \xkanjiskip after latin right paren
% default \xspcode is 3: insert \xkanjiskip on both sides
% For Japanese, use \inhibitxspcode.
\inhibitxspcode`〒=2  % Don't insert \xkanjiskip after 〒
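
The value of \xkanjiskip itself is set like any other glue parameter. The sketch below uses the em-based value quoted above and also shows the pTeX switches \noautoxspacing and \autoxspacing; the sample sentences are made up for the example.

```tex
% em is resolved against the Latin font current at the time of the
% assignment, so make it after selecting the body font.
\xkanjiskip=0.25em plus 0.15em minus 0.06em
pTeXは日本語とLatin文字の間に空きを入れる。

% Automatic insertion switched off for one whole paragraph; the group
% contains the paragraph end so that the setting covers all of it.
{\noautoxspacing pTeXは空きを入れない。\par}
\bye
```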

Line-breaking penalties

Line breaks cannot occur after ¥, £, and class-1 characters, and before ?, !, °, ′, ″, %, ‰, and class-2, -3, -4 characters. Line breaks may be discouraged or prohibited before ヽ, ヾ, ゝ, ゞ, 々, ー, and small letters such as ぁ, ぃ, ぅ.

These line-breaking penalties can be controlled by the pTeX primitives \prebreakpenalty and \postbreakpenalty:

\postbreakpenalty`(=10000  % Never break after Japanese left paren
\prebreakpenalty`.=10000    % Never break before latin period
\prebreakpenalty`っ=150     % Discourage break before small っ

Another penalty, \jcharwidowpenalty, is inserted just before the last normal Japanese letter of the paragraph (i.e. one that is not a punctuation mark, paren, or quote) to discourage a one-letter line.
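
A short sketch of how these parameters might be set together in a preamble follows. The numeric values are illustrative only and are not claimed to be pTeX defaults.

```tex
% Illustrative values only; tune them to the house style.
\prebreakpenalty`ゃ=150    % discourage a break before small ゃ
\prebreakpenalty`ー=10000  % never break before the long-vowel mark
\jcharwidowpenalty=500     % discourage a one-letter last line
```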

See the Line Breaking Parameters page for more detail.

End-of-line rule

Latin spaces (0x20) can be inserted between Japanese letters to make extra spaces. But the usual TeX rule that identifies an end-of-line (e.g. 0x0a) with a space is not suitable for Japanese, because spurious spaces occur at such points.

Outside of verbatim environments, pTeX ignores an end-of-line after a Japanese letter. In fact, pTeX ignores trailing latin spaces (and tabs) after a Japanese letter, too. For example, [J][SPACE][SPACE][EOL][SPACE][SPACE][anything] is equivalent to [J][anything], where [J] is a Japanese letter. (Spaces at the beginning of a physical line are ignored by TeX; e.g. foo%[EOL][SPACE][SPACE]bar is equivalent to foobar, except in verbatim environments.) Note that [J][SPACE][anything] is not equivalent to [J][anything].
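
The rule can be tried out with a small plain pTeX file such as the sketch below. The sample sentences are made up, and the file is assumed to be saved in an encoding that the local pTeX accepts. Comments are kept on lines of their own, because spaces typed between a Japanese letter and a % in mid-line would count as a real space.

```tex
% The line end after は is ignored, so は and 途 end up adjacent.
日本語の文章は
途中で改行しても空白は入りません。

% Latin text: the line end after "line" becomes a space, as in TeX.
This line
continues here.

% A space typed in mid-line after a Japanese letter is kept.
日本語 と English
\bye
```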


Haruhiko Okumura

Last modified: 2004-01-04 10:23:33