\documentclass[fleqn]{article}

\newcommand{\OMEGA}{$\Omega$}
\newcommand{\mymathtt}[1]{\mbox{\texttt{#1}}}
\newcommand{\mymathit}[1]{\mbox{\emph{#1}}}

\begin{document}
\title{Draft documentation for the \OMEGA\ system}
\author{John Plaice\thanks{D\'epartement d'informatique,
Universit\'e Laval, Ste-Foy (Qu\'ebec) Canada~G1K~7P4. 
\texttt{John.Plaice@ift.ulaval.ca}}
\and Yannis Haralambous\thanks{187,~rue Nationale, F-59000~Lille, France.
\texttt{Yannis.Haralambous@univ-lille1.fr}}}
\date{26~February~1995}
\maketitle

\section{Introduction}

This document is version~0.0 of the documentation for the
\OMEGA~typesetting system, designed and developed by
the authors.  This draft document accompanies the 1.1~release
of~\OMEGA, which is available at
\begin{verbatim}
     ftp://ftp.ift.ulaval.ca/cours/omega
\end{verbatim}
or at
\begin{verbatim}
     ftp://ftp.ens.fr/pub/tex/yannis/omega
\end{verbatim}
This documentation should be considered cursory, the bare minimum for
those who wish to do $\alpha$- and $\beta$-testing.

\section{Sixteen-bit fonts, registers, etc.}

One of the fundamental limitations of \TeX3 is that most quantities
can only range between 0~and~255.  Fonts are limited to 256~characters
each, only 256~fonts are allowed simultaneously, only 256~of any given
kind of register can be used simultaneously, etc.  \OMEGA\ loosens these 
restrictions, allowing up to 65~536~of each of these entities to be used.  

\subsection{Changes to \TeX\ to produce \OMEGA}

\paragraph{Characters.}
Each font can allow up to 65~536 characters, ranging between 0~and 65~535.
Unless other means are provided, using \OMEGA\ Translation Processes
(see section~\ref{otp}), the input and output mechanisms for
characters between 256 (hex~\texttt{100}) and 65~535~(hex~\texttt{ffff})
use four circumflexes.  For example, \verb|^^^^cab0| means hex 
value~\verb|cab0| and \verb|^^^^0020| is the space character.

\paragraph{Fonts.}
The number of possible fonts can be changed through the
compile-time constant \texttt{NUMBERofFONTS}, which must lie
between 256~and~65536.  One can access fonts numbered between
0~and~$\mathtt{NUMBERofFONTS}-1$.

\paragraph{Registers.}
The number of possible registers of each kind can be changed through
the compile-time constant \texttt{NUMBERofREGISTERS}, which must also 
range between 256~and~65536.  For each kind of register, one can 
access registers numbered between 0~and~$\mathtt{NUMBERofREGISTERS}-1$.

\paragraph{Font metric files.}

The \verb|.tfm| files used by \TeX3 only allow 256~characters each.
Like \TeX, \OMEGA\ uses \verb|.tfm| files, but it also uses
\emph{extended font metric} (\verb|.xfm|) files, which are
generalizations of \verb|.tfm| files for fonts of up to
65~536~characters each.

The description below focuses on the differences between \verb|.tfm|
files and \verb|.xfm| files.  The standard definition of \verb|.tfm|
files is in the second volume of Knuth's \emph{Computers and
Typesetting} series.

The first 52 bytes (13 words) of an \verb|.xfm| file contain thirteen
32-bit integers that give the lengths of the various subsequent
portions of the file.  These thirteen integers are, in order:

\begin{tabular}{ll}
$0$ &empty word to designate \verb|.xfm| file;\\
\emph{lf}&length of the entire file, in words;\\
\emph{lh}&length of the header data, in words;\\
\emph{bc}&smallest character code in the font;\\
\emph{ec}&largest character code in the font;\\
\emph{nw}&number of words in the width table;\\
\emph{nh}&number of words in the height table;\\
\emph{nd}&number of words in the depth table;\\
\emph{ni}&number of words in the italic correction table;\\
\emph{nl}&number of words in the lig/kern table;\\
\emph{nk}&number of words in the kern table;\\
\emph{ne}&number of words in the extensible character table;\\
\emph{np}&number of font parameter words.\\
\end{tabular}

The first word is~0 (future versions of
\verb|.xfm| files could have different values;  what is important is that
the first two bytes be~0 to differentiate \verb|.tfm| and \verb|.xfm| files).
The next twelve integers are as above, all non-negative and less
than~$2^{31}$.  The inequality $\mathit{bc}-1\leq\mathit{ec}\leq65535$
must hold, as must the equality
\[\mathit{lf}=13+
\mathit{lh}+
2(\mathit{ec}\!-\!\mathit{bc}\!+\!1)+
\mathit{nw}+
\mathit{nh}+
\mathit{nd}+
\mathit{ni}+
\mathit{nl}+
\mathit{nk}+
\mathit{ne}+
\mathit{np}.\]
Note that an \verb|.xfm| font may contain as many as 65~536 characters 
(if $\mathit{bc}=0$ and $\mathit{ec}=65535$), and as few as 0~characters 
(if $\mathit{bc}=\mathit{ec}+1$).
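
As a worked illustration of the equality above, consider a purely
hypothetical font (the values are assumptions chosen for the
arithmetic, not taken from any real \verb|.xfm| file) with
$\mathit{bc}=0$, $\mathit{ec}=255$, $\mathit{lh}=18$,
$\mathit{nw}=100$, $\mathit{nh}=\mathit{nd}=16$, $\mathit{ni}=64$,
$\mathit{nl}=\mathit{nk}=\mathit{ne}=0$ and $\mathit{np}=7$.  Then
\[\mathit{lf}=13+18+2(255-0+1)+100+16+16+64+0+0+0+7=746,\]
so such a file would be 746~words, or $4\times746=2984$~octets, long.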

The rest of the \verb|.xfm| file is, as in \verb|.tfm| files, a
sequence of ten data arrays.  Three of the arrays are different:
\emph{char\_info}, \emph{lig\_kern} and \emph{exten}.

The \emph{char\_info} array contains one \emph{char\_info\_word} entry
per character.  Each \emph{char\_info\_word} in an \verb|.xfm| file
takes 2~words (8~octets), packed as follows:

\begin{description}
\item[octets 0--1:] \emph{width\_index} (16~bits);
\item[octet 2:]  \emph{height\_index} (8~bits);
\item[octet 3:] \emph{depth\_index} (8~bits);
\item[octets 4--5:]
\emph{italic\_index} (14 bits) times 4, plus \emph{tag} (2~bits);
\item[octets 6--7:] \emph{remainder} (16 bits).
\end{description}

Therefore the \verb|.xfm| format imposes a limit of 256~different heights,
256~different depths, and 16~384~different italic corrections.
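
The packing can be illustrated with a hypothetical entry.  The field
values below are invented for the example, and the octet order within
each 16-bit field is assumed to be most significant octet first, as
in \verb|.tfm| files.  With $\mathit{width\_index}=300$,
$\mathit{height\_index}=5$, $\mathit{depth\_index}=2$,
$\mathit{italic\_index}=3$, $\mathit{tag}=1$ and
$\mathit{remainder}=291$, octets 4--5 hold $4\times3+1=13$, and the
eight octets, written in hexadecimal, would be
\begin{verbatim}
     01 2c   05   02   00 0d   01 23
\end{verbatim}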

The \emph{lig\_kern} array consists of a sequence of
\emph{lig\_kern\_command} entries.  Each \emph{lig\_kern\_command}
in an \verb|.xfm| file takes 2~words (8~octets), packed as follows:

\begin{description}
\item[octets 0--1:] \emph{skip\_byte}, indicates that this is the final
program step if the byte is 128 or more, otherwise the next step is obtained
by skipping this number of intervening steps.
\item[octets 2--3:] \emph{next\_char}, ``if \emph{next\_char}
follows the current character, then perform the operation and stop,
otherwise continue.''
\item[octets 4--5:] \emph{op\_byte}, indicates a ligature step if less
than~128, a kern step otherwise.
\item[octets 6--7:] \emph{remainder}.
\end{description}

For \verb|.tfm| files, if the very first instruction of a
character's \emph{lig\_kern} program has $\mathit{skip\_byte}>128$, 
the program actually begins in location
$256*\mathit{op\_byte}+\mathit{remainder}$.
This feature allows access to large \emph{lig\_kern} arrays,
because the first instruction must otherwise appear in a location $\leq255$.
For \verb|.xfm| files, the latter value is $\leq65535$.

Extensible characters are specified by an \emph{extensible\_recipe},
which consists of four 2-octet words called \emph{top}, \emph{mid},
\emph{bot}, and \emph{rep} (in this order). These values are the
character codes of individual pieces used to build up a large symbol.
If \emph{top}, \emph{mid}, or \emph{bot} are zero, they are not
present in the built-up result. For example, an extensible vertical
line is like an extensible bracket, except that the top and bottom
pieces are missing.

\paragraph{Font offsets.}

When switching from one alphabet to another in Unicode, one passes
from one Unicode page to another.  However, the corresponding fonts will
normally all be numbered from~0.  To deal with this situation, a 
new keyword, \texttt{offset}, is introduced.  In the \verb|\font|
command, $\mathtt{offset}\;n$ states that character~$c$ in the 
font is referred to in \OMEGA\ by $n+c$.  For example, 
\begin{verbatim}
     \font\ARfont=oar10 scaled 1728 offset 256 %% an X-font
\end{verbatim}
states that the font \texttt{oar10} is to be loaded, using a scaling
factor of~1728, and that character~$c$ in the font will be referred to
in \OMEGA\ as $c+256$ or, equivalently, that character~$C$ in 
\OMEGA\ refers to character $C-256$ in the font.
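
As a further sketch, and assuming that the four-circumflex notation
described earlier applies to fonts loaded with an offset, the
character in position~65 of \texttt{oar10} would be referred to as
$65+256=321$, that is, hex~\texttt{0141}:
\begin{verbatim}
     \font\ARfont=oar10 scaled 1728 offset 256
     {\ARfont ^^^^0141}   %% position 321 - 256 = 65 in oar10
\end{verbatim}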

\paragraph{Implementation.}

The implementation of the changes presented in this section can
be found in file \texttt{om16bit.ch}.  The only known problem is 
that the current implementation creates very large formats.  This 
can be alleviated by reducing the \texttt{NUMBERofFONTS} and
\texttt{NUMBERofREGISTERS} compile-time constants.

\subsection{Changes to \texttt{vptovf} to produce \texttt{xvptoxvf}}

For the moment, \textsc{metafont} continues to produce 8-bit fonts.
Given that most \texttt{.dvi} drivers can only handle 8-bit fonts,
we took the soft approach of providing the means to develop 16-bit
\emph{virtual} fonts that use 8-bit \emph{real} fonts.  To do this,
two new file formats, in addition to the \texttt{.xfm} files, had to
be introduced: extended virtual property (\texttt{.xvp}) files and
extended virtual font (\texttt{.xvf}) files.

\paragraph{Extended virtual property files.}

The \texttt{.xvp} files are the same as \texttt{.vpl} files, except that 
character codes may occupy up to 16~bits rather than~8.

\paragraph{Extended virtual font files.}

The \texttt{.vf} file format already supports fonts with large
numbers of characters.  However, not all drivers that read
\texttt{.vf} files properly support large fonts.  Therefore, the
files generated from \texttt{.xvp} files are labeled \texttt{.xvf}
rather than~\texttt{.vf}.

\paragraph{The \texttt{xvptoxvf} program.}

The \texttt{vptovf} program reads in a virtual property
(\texttt{.vpl}) file and generates a font metric (\texttt{.tfm})
file and a virtual font (\texttt{.vf}) file.  In so doing, it reads
in the \texttt{.tfm} files of all the fonts that it uses.

The \texttt{xvptoxvf} program is the extended version of
\texttt{vptovf}.  It reads in an extended virtual property
(\texttt{.xvp}) file and generates an extended font metric (\texttt{.xfm})
file and an extended virtual font (\texttt{.xvf}) file.  In so doing, it reads
in the \texttt{.tfm} and \texttt{.xfm} files of all the fonts that it uses.

\paragraph{Implementation.}

The changes to the \texttt{vptovf} program are all in \texttt{xvpvf.ed}
in directory \texttt{fontutil}.  There are currently no known
problems.

\subsection{Changes to \texttt{dvicopy} to produce \texttt{xdvicopy}}

The \texttt{dvicopy} program is used to \emph{de-virtualize} a
\texttt{.dvi} file: it reads in a \texttt{.dvi} file and replaces all
references to virtual fonts with references to the appropriate real fonts.

The \texttt{xdvicopy} program does the same as \texttt{dvicopy}, except
that it is also capable of reading \texttt{.xvf} and \texttt{.xfm} files.

\paragraph{Implementation.}

The changes to the \texttt{dvicopy} program are all in \texttt{xdvicp.ed}
in directory \texttt{dviutil}.  The current implementation is not
optimal, in that it requires access to all \texttt{.tfm} files referred
to in a virtual file, even if no characters in a \texttt{.tfm} file
are needed to print a document.  A demand-driven mechanism would work
better and save users' disk space.

\section{Bi-directional typesetting}

\OMEGA\ currently includes Peter Breitenlohner's TeX--XeT, which is a
modified form of Knuth and Mackay's TeX-XeT.  There are two
primitives (\verb|\beginR| and \verb|\endR|) to bracket right-to-left
text in left-to-right text and two primitives (\verb|\beginL| and
\verb|\endL|) to bracket left-to-right text in right-to-left text.
See Knuth and Mackay's paper for more details.

These primitives were essentially created for inserting bits of
right-to-left texts into left-to-right documents.  They are not
really suitable for real mixed-direction typesetting.  This topic
is still under research, as is mixing horizontal and vertical 
typesetting.

\paragraph{Implementation.}

The implementation of the changes presented in this section can
be found in file \texttt{tex--xet1415.ch}.  There is one known
problem, which also shows up if \TeX\ has been modified with the change
file.  The following file will cause the system to hang:
\begin{verbatim}
     \documentclass{article}
     \begin{document}
     \tableofcontents
     \section{Ceci: ``\beginR titre\endR'' est \`a l'envers}
     \end{document}
\end{verbatim}
The problem seems to be the interaction between the \verb|``| ligature
and the \verb|\beginR| primitive.

\section{Character dimensions}

To simplify the acrobatics necessary for diacritic placement for
certain alphabets, four new primitives (\verb|\charwd|, \verb|\chardp|,
\verb|\charht|, and \verb|\charit|) are provided.  When followed by
an integer designating a character, they respectively provide the
width, the depth, the height and the italic correction of the
character.  For example,
\begin{verbatim}
     \charwd120
\end{verbatim}
can be considered to be an abbreviation of
\begin{verbatim}
     \setbox250=\hbox{x}\wd250
\end{verbatim}
but without the side effect of creating a box and putting something inside it.
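
As a hedged sketch of how such a primitive might be used, and
assuming that \verb|\charwd| can appear wherever \verb|\wd250| could,
backing up by the width of character~120 in the current font might be
written
\begin{verbatim}
     \kern-\charwd120
\end{verbatim}
rather than
\begin{verbatim}
     \setbox250=\hbox{\char120}\kern-\wd250
\end{verbatim}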

\paragraph{Implementation.}

These changes are implemented in the \texttt{omchar.ch} file.  There are
currently no known problems.

\section{New infinity}

To allow for inter-letter stretching in calligraphic scripts, such as
Arabic, without having to rewrite macro packages, a new infinity
level, \texttt{fi}, has been added.  It is smaller than \texttt{fil}
but bigger than any finite quantity.  There is therefore a new
keyword, \texttt{fi}, and there are two new primitives, \verb|\hfi| and
\verb|\vfi|.
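
For instance, assuming that the keyword \texttt{fi} is accepted
wherever \texttt{fil} is, stretchable glue of the new order could be
requested as in the sketch below.  By the ordering just described,
the \texttt{fi} glue in the second box should not stretch at all,
since that box also contains \texttt{fil} glue:
\begin{verbatim}
     \hbox to 5cm{a\hskip 0pt plus 1fi b}
     \hbox to 5cm{a\hskip 0pt plus 1fi b\hfil c}
\end{verbatim}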

\paragraph{Implementation.}

These changes are implemented in the \texttt{omfi.ch} file.  There are
currently no known problems.

\section{\OMEGA\ Translation Processes}
\label{otp}

The changes described above are very useful, and allow the resolution
of several problems.  However, they do not radically alter the
structure of \TeX.  This is not the case for the \OMEGA\ Translation
Processes, which allow text to be passed through any number of finite
state automata, in order to impose the required effects.  

These processes are necessary for translating one character set to
another.  They are also used to choose the various forms of letters in
Arabic, or to create consonantal clusters in Khmer, or to rearrange
letter order in Indic scripts.  They could also offer alternative
means of changing texts to upper or lower case or to hyphenate texts.

Each translation process is placed in a file with the suffix \verb|.otp|.
Its syntax is similar but not identical to a \texttt{lex} or
\texttt{flex} file on Unix.  Examples of translation processes can 
be found in the \texttt{otpexs} directory of the \OMEGA\ distribution.

An \verb|.otp| file defines a finite state automaton that transforms
an input character stream into an output character stream.  
It consists of six parts:

\begin{tabular}{l}
\emph{Input}\\
\emph{Output}\\
\emph{Tables}\\
\emph{States}\\
\emph{Aliases}\\
\emph{Expressions}\\
\end{tabular}

\noindent
where the \emph{Expressions} actually state what translations take
place and in what situation.

In what follows, $n$ refers to an integer between 0~and
65~535. It can be given in decimal form, octal form (preceded by
\texttt{@'}) or hexadecimal form (preceded by \texttt{@"}).
Hexadecimal numbers can use both minuscule and majuscule letters
to express the digits~\emph{a--f}.  Numbers can also be given in
character form:  a printable \textsc{ascii} character, when placed
inside a pair of quotes, generates the \textsc{ascii} code for that
character. For example, \verb|`a'| is equivalent to~\verb|@"61|.
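
For example, under these conventions the following four forms should
all denote the same number, 97:
\begin{verbatim}
     97     @'141     @"61     `a'
\end{verbatim}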

The \emph{Input} part states how many octets are in each input character.
If the section is empty, then the default value is~2, since we hope 
that Unicode will become the standard means of communication in the future.
If the section is not empty, it must be of the form
\[ \mymathtt{input:}\;\mymathit{in}\mymathtt{;} \]
where \emph{in} states how many octets are in each input character.

The \emph{Output} part states how many octets are in each output character.
If the section is empty, then the default value is~2, since we hope 
that Unicode will become the standard means of communication in the future.
If the section is not empty, it must be of the form
\[ \mymathtt{output:}\;\mymathit{out}\mymathtt{;} \]
where \emph{out} states how many octets are in each output character.

The \emph{Tables} part is used for defining tables that will be
referred to later in the expressions.  Often, translations from one
character set to another are most efficiently presented through table
lookup.  This section can be empty, in which case no tables have been
defined.  If it is not empty, it is of the form 
\[ \mymathtt{tables:}\; \mymathit{table}^+ \]
where each \emph{table} is of the form
\[ \mymathit{id}\mymathtt{[}n\mymathtt{]=\{}n^+\mymathtt{\};} \]
where the numbers in $n^+$ are comma-separated.

The \emph{States} part is used to separate out the expressions.  Not
all expressions will necessarily be applicable in all situations.
To do this, the user can name states and identify expressions with
state names, in order to express what expressions apply when.
This section can be empty, in which case there is only one state.  If
it is not empty, it is of the form
\[ \mymathtt{states:}\; \mymathit{id}^+\mymathtt{;} \]
where the identifiers in $\mymathit{id}^+$ are comma-separated.

The \emph{Aliases} part is used to simplify the definition of the
left-hand sides of the expressions.  Each expression consists of a 
left-hand side, in the form of a simplified regular expression, and of a
right-hand side, which states what should be done with a recognized
string.  To simplify the definitions of the left-hand sides,
aliases can be used. This section can be empty, in which case there
are no aliases.  If it is not empty, it is of the form
\[ \mymathtt{aliases:}\; \mymathit{alias}^+ \]
where each \emph{alias} is of the form
\[ \mymathit{id}\;\mymathtt{=}\;\mymathit{left}\mymathtt{;}\]
and \emph{left} is defined below.

The \emph{Expressions} part is the very reason for an \verb|.otp|
file.  It states what translations must take place, and when.  It
cannot be empty, and its syntax is
\[ \mymathtt{expressions:}\; \mymathit{expr}^+ \]
Each \emph{expr} is of the form
\[
   \mymathit{leftState}\; \mymathit{totalLeft}\; 
   \mymathit{right} \; \mymathit{pushBack} \; \mymathit{rightState} \mymathtt{;}
\]
where \emph{leftState} defines the state for which this expression is
applicable, \emph{totalLeft} defines the left-hand-side regular
expression, \emph{right} defines the characters to be output,
\emph{pushBack} states what characters must be added to the input
stream and \emph{rightState} gives the new state.

Intuitively, if the automaton is in macro-state \emph{leftState} and
the regular expression \emph{totalLeft} corresponds to a prefix of the current
input stream, then (1)~the input stream is advanced to the end of the recognized
prefix, (2)~the characters generated by the \emph{right}
expression are put onto the output stream, (3)~the characters
generated by the \emph{pushBack} stream are placed at the beginning
of the input stream and (4)~the system changes to the macro-state
defined by \emph{rightState}.

The \emph{leftState} field can be empty.  If it is not, its syntax is
\[ \mymathtt{<} \mymathit{id} \mymathtt{>} \]

The syntax for \emph{totalLeft} is
\[ \mymathtt{beg:}? \; \mymathit{left}^+ \; \mymathtt{end:}? \]
The \texttt{beg:}, if present, will only match the string if it is
at the beginning of the input.  The \texttt{end:}, if present, will
only match the string if it is at the end of the input.

The syntax for \emph{left} is given by
\begin{eqnarray*}
\mymathit{left} & ::= & n\\
& \mid & n\mymathtt{-}n\\
& \mid & \mymathtt{.}\\
& \mid & \mymathtt{(}\mymathit{left}^+\mymathtt{)}\\
& \mid & \mymathtt{\char94(}\mymathit{left}^+\mymathtt{)}\\
& \mid & \{\mymathit{id}\}\\
& \mid & \mymathit{left}\mymathtt{<}n\mymathtt{,}n?\mymathtt{>}\\
\end{eqnarray*}
where the $\mymathit{left}^+$ means a series of \emph{left} separated
by vertical bars.  Therefore, $n$ means a single number, $n\mymathtt{-}n$ is a
range, $\mymathtt{.}$~is a wildcard character, 
$\mymathtt{(}\mymathit{left}^+\mymathtt{)}$ is a choice,
$\mymathtt{\char94(}\mymathit{left}^+\mymathtt{)}$ is the negation of a choice,
$\mymathtt{\{}\mymathit{id}\mymathtt{\}}$ is the use of an alias and
$\mymathit{left}\mymathtt{<}n\mymathtt{,}n?\mymathtt{>}$
means between $n$~and $n'$~occurrences of \emph{left}.  Should there
be no~$n'$, then the expression means at least $n$~occurrences.

The syntax for \emph{right} is
\[ \mymathtt{=>}\; \mymathit{stringExpr}^+ \]
while that for \emph{pushBack}, if it is not empty, is
\[ \mymathtt{<=}\; \mymathit{stringExpr}^+ \]
The \emph{right} expression corresponds to the characters that are to
be output.  The \emph{pushBack} expression corresponds to the
characters that are put back onto the input stream.

A \emph{stringExpr} defines a string of characters, using the
characters in the recognized input stream as arguments.  It is of the form

\begin{tabular}{ll}
& $s$\\
$\mid$ & $n$\\
$\mid$ & \verb|\|$n$\\
$\mid$ & \verb|\$|\\
$\mid$ & \verb|\($-|$n$\verb|)|\\
$\mid$ & \verb|\*|\\
$\mid$ & \verb|\(*-|$n$\verb|)|\\
$\mid$ & \verb|\(*+|$n$\verb|)|\\
$\mid$ & \verb|\(*+|$n$\verb|-|$n'$\verb|)|\\
$\mid$ & \verb|#|\emph{arithExpr}\\
\end{tabular}

\noindent
where $s$~is an \textsc{ascii} character string enclosed in double
quotation marks.
The \verb|\|$n$ means the $n$-th character (starting from 1)
in the recognized prefix; the \verb|\$| means the last character in the 
prefix; \verb|\($-|$n$\verb|)| the $n$-th, counting from the end.
The \verb|\*| means the entire recognized prefix;
\verb|\(*-|$n$\verb|)| the prefix without the last $n$~characters;
\verb|\(*+|$n$\verb|)| without the first $n$~characters;
\verb|\(*+|$n$\verb|-|$n'$\verb|)| removes the first~$n$ and last~$n'$
characters.

For example, Indic scripts are encoded with vowels at the end of a
syllable, but the vowel is actually printed first on the page.  Up
to six consonants can precede a vowel, so the reordering can be
written as the following expression:
\begin{verbatim}
     {consonant}<1,6> {vowel}  =>  \$ \(*-1);
\end{verbatim}
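
To see how this expression rearranges its input, suppose, purely for
illustration, that the recognized prefix is the four-character
string $c_1c_2c_3v$, where $c_1$, $c_2$ and~$c_3$ match
\verb|{consonant}| and $v$~matches \verb|{vowel}|.  Following the
definitions above, the string expressions would yield:

\begin{tabular}{ll}
\verb|\1| & $c_1$\\
\verb|\$| & $v$\\
\verb|\*| & $c_1c_2c_3v$\\
\verb|\(*-1)| & $c_1c_2c_3$\\
\verb|\(*+1)| & $c_2c_3v$\\
\verb|\(*+1-1)| & $c_2c_3$\\
\end{tabular}

\noindent
so that the output \verb|\$ \(*-1)| is $v\,c_1c_2c_3$: the vowel,
followed by the consonants in their original order.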

The \emph{arithExpr} entry allows for calculations to actually be
effected on the characters in the prefix. Their syntax is as follows:

\begin{tabular}{ll}
 & $n$\\
$\mid$ & \verb|\|$n$\\
$\mid$ & \verb|\$|\\
$\mid$ & \verb|\($-|$n$\verb|)|\\
$\mid$ & \emph{arithExpr}\verb| + |\emph{arithExpr}\\
$\mid$ & \emph{arithExpr}\verb| - |\emph{arithExpr}\\
$\mid$ & \emph{arithExpr}\verb| * |\emph{arithExpr}\\
$\mid$ & \emph{arithExpr}\verb| div: |\emph{arithExpr}\\
$\mid$ & \emph{arithExpr}\verb| mod: |\emph{arithExpr}\\
$\mid$ & \emph{id}\verb|[|\emph{arithExpr}\verb|]|\\
$\mid$ & \verb|(|\emph{arithExpr}\verb|)|\\
\end{tabular}

\noindent
where \emph{id}\verb|[|\emph{arithExpr}\verb|]| means a table lookup:
the \emph{id} must be a table defined in the \emph{Tables} section.
The other operations should be clear.
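
As a minimal sketch of an arithmetic expression (this file is an
illustration written for this document, not one of the files in the
\OMEGA\ distribution), the following process would map the
\textsc{ascii} capital letters to the corresponding small letters by
adding~32 to their codes, and copy every other character unchanged.
It assumes, as the ordering of the expressions in the Big~5 example
below suggests, that earlier expressions take precedence over later
ones.
\begin{verbatim}
     input:  1;
     output: 1;

     expressions:

     `A'-`Z'   => #(\1 + 32);
     .         => \1;
\end{verbatim}
For instance, \verb|`A'| is 65, and $65+32=97$ is the code
for~\verb|`a'|.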

The following example shows the use of tables.
\label{gb:unicode}
\begin{verbatim}
% File inbig5.otp
% Conversion to Unicode from Chinese Big 5 (HKU)
% Copyright (c) 1995 John Plaice and Yannis Haralambous
% This file is part of the Omega project.
%
% This file was derived from data in the tcs program
% ftp://plan9.att.com/plan9/unixsrc/tcs.shar.Z, 16 November 1994
%

input:  1;
output: 2;

tables:

in_big5_a1[@"9d] = {
@"20,   @"2c,   @"2ce,  @"2e,   @"2219, @"2219, @"3b,   @"3a,
...
@"2199, @"2198, @"2225, @"2223, @"2215
};

in_big5[@"3695] = {
@"3000, @"ff0c, @"3001, @"3002, @"ff0e, @"30fb, @"ff1b, @"ff1a,
...
@"fffd, @"fffd, @"fffd, @"fffd, @"fffd
};

expressions:

@"1a                    => @"0a;
@"00-@"a0               => \1;
@"a1(@"40-@"7e)         => #(in_big5_a1[\2-@"40]);
@"a1(@"a1-@"fe)         => #(in_big5_a1[\2-@"62]);
(@"a2-@"fe)(@"40-@"7e)  => #(in_big5[(\1-@"a2)*@"9d + \2-@"40]);
(@"a2-@"fe)(@"a1-@"fe)  => #(in_big5[(\1-@"a2)*@"9d + \2-@"62]);
. .                     => @"fffd;
\end{verbatim}
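
The constants in these expressions can be checked by hand.  Assuming
that each row of a table covers the second-octet ranges
\verb|@"40|--\verb|@"7e| and \verb|@"a1|--\verb|@"fe|, in that order,
a row contains
\[ (126-64+1)+(254-161+1)=63+94=157 \]
entries, and 157 is hex~\texttt{9d}, the multiplier applied to the
first octet.  A second octet~$c$ in the upper range must skip the
63~entries of the lower range, so its position within the row is
\[ c-161+63=c-98, \]
and 98 is hex~\texttt{62}, the constant subtracted from \verb|\2| in
the expressions for the upper range.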

In the future, more operations may well be added.  Research is still
under way for such things as providing means for defining functions, 
local variables, error handling and other functionality.

The \emph{pushBack} part, which serves to put characters back onto the
input stream, uses the same syntax as the \emph{right} part.  When
characters are placed back onto the input stream, they will be looked
at upon the next iteration of the automaton.

Finally, the \emph{rightState} can be empty or one of the following
three forms:

\begin{tabular}{ll}
& \verb|<|\emph{id}\verb|>|\\
$\mid$ & \verb|<push: |\emph{id}\verb|>|\\
$\mid$ & \verb|<pop:>|\\
\end{tabular}

\noindent
If it is empty, the automaton stays in the same state.
If it is of the form \verb|<|\emph{id}\verb|>|, then the automaton
changes to state~\emph{id}. The \verb|<push: |\emph{id}\verb|>|
means change to state~\emph{id}, but remembering the current state.
The \verb|<pop:>| means return to the previously saved state.
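
The following sketch, written for this document and not taken from
the \OMEGA\ distribution, illustrates these forms.  It is meant to
turn alternate occurrences of the \textsc{ascii} double quote
(\verb|@"22|) into opening and closing quotation marks (Unicode
\verb|@"201c| and \verb|@"201d|).  The state name \texttt{inquote} is
an arbitrary choice.  The sketch assumes that there is an initial
state distinct from \texttt{inquote} and that earlier expressions
take precedence; the \texttt{inquote} expressions are listed first so
that the result does not depend on whether an expression with an
empty \emph{leftState} applies in every state or only in the initial
one.
\begin{verbatim}
     input:  1;
     output: 2;

     states: inquote;

     expressions:

     <inquote> @"22   => @"201d  <pop:>;
     <inquote> .      => \1;
     @"22             => @"201c  <push: inquote>;
     .                => \1;
\end{verbatim}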

There are a number of example \texttt{.otp} files in the
\texttt{otpexs} directory in the \OMEGA~distribution.
Most of them serve to convert national character sets to Unicode
and back.

\section{Compiled Translation Processes}

\OMEGA\ does not know anything about \OMEGA\ Translation Processes.
It actually reads a compiled form of these filters, known as
Compiled Translation Processes (file suffix \texttt{.ctp}).
Essentially, the CTPs can be considered to be portable assembler
programs, and \OMEGA\ includes an interpreter for the generated
instructions.

The command for reading in a CTP file is similar to a font
declaration.  The example
\begin{verbatim}
     \ctp\TeXUni=TeXArabicToUnicode
\end{verbatim}
means that the file \verb|TeXArabicToUnicode.ctp| is read 
in by~\OMEGA\ and that internally the translation process is
referred to as \verb|\TeXUni|.

The CTPs consist of a sequence of 4-octet words.  The first seven
words have the following form:

\begin{tabular}{ll}
\emph{lf}&length of the entire file, in words;\\
\emph{in}&number of octets in an input character;\\
\emph{ot}&number of octets in an output character;\\
\emph{nt}&number of tables;\\
\emph{lt}&number of words allocated for tables;\\
\emph{ns}&number of states;\\
\emph{ls}&number of words allocated for states;\\
\end{tabular}

\noindent
The header words are followed by four arrays:
\begin{eqnarray*}
\mathit{table\_length} & : & 
   \mathbf{array} \; [0..\mathit{nt}-1] \; \mathbf{of} \; \mathit{word}\\
\mathit{tables} & : & 
   \mathbf{array} \; [0..\mathit{lt}-1] \; \mathbf{of} \; \mathit{word}\\
\mathit{state\_length} & : & 
   \mathbf{array} \; [0..\mathit{ns}-1] \; \mathbf{of} \; \mathit{word}\\
\mathit{states} & : & 
   \mathbf{array} \; [0..\mathit{ls}-1] \; \mathbf{of} \; \mathit{word}
\end{eqnarray*}

The \emph{table\_length} array states how many words are used for each
of the tables in the~CTP.  For the Big~5~$\rightarrow$~Unicode example on
page~\pageref{gb:unicode}, the \emph{table\_length} would have two
entries: hex values \texttt{9d} and~\texttt{3695}.

The \emph{tables} array is simply the concatenation of the tables in
the OTP file.
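
For the example on page~\pageref{gb:unicode}, and assuming that each
table entry occupies one word, as the \emph{table\_length} values
suggest, the \emph{tables} array would contain
\[ \mathit{lt}=157+13973=14130 \]
words (in hex, $\mathtt{9d}+\mathtt{3695}=\mathtt{3732}$), with
$\mathit{nt}=2$.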

The \emph{state\_length} array states how many words are used for each
of the states in the~CTP.  For the Big~5~$\rightarrow$~Unicode example on
page~\pageref{gb:unicode}, the \emph{state\_length} would have one
entry.

The \emph{states} array is simply the concatenation of the sequence of
instructions for each state in the OTP file.  Each instruction takes
one or two 4-octet words.  Zero- and one-argument instructions use
one word; two-argument instructions use two.  If the instruction consists of one word,
then the actual instruction is in the first two octets and the
argument is in the last two octets.  If the instruction consists of
two words, then the actual instruction is in the first two octets,
the first argument is in the next two octets and the last argument is
in the last two octets.  The instructions are as follows:

\begin{tabbing}
\makebox[1cm][r]{99} \= \quad \verb|OTP_GOTO_NO_ADVANCE| \= \quad 2 arguments\kill
\makebox[1cm][r]{1} \> \quad \verb|OTP_RIGHT_OUTPUT| \> \quad 0 arguments\\
\makebox[1cm][r]{2} \> \quad \verb|OTP_RIGHT_NUM| \> \quad 1 argument\\
\makebox[1cm][r]{3} \> \quad \verb|OTP_RIGHT_CHAR| \> \quad 1 argument\\
\makebox[1cm][r]{4} \> \quad \verb|OTP_RIGHT_LCHAR| \> \quad 1 argument\\
\makebox[1cm][r]{5} \> \quad \verb|OTP_RIGHT_SOME| \> \quad 2 arguments\\
\\
\makebox[1cm][r]{6} \> \quad \verb|OTP_PBACK_OUTPUT| \> \quad 0 arguments\\
\makebox[1cm][r]{7} \> \quad \verb|OTP_PBACK_NUM| \> \quad 1 argument\\
\makebox[1cm][r]{8} \> \quad \verb|OTP_PBACK_CHAR| \> \quad 1 argument\\
\makebox[1cm][r]{9} \> \quad \verb|OTP_PBACK_LCHAR| \> \quad 1 argument\\
\makebox[1cm][r]{10} \> \quad \verb|OTP_PBACK_SOME| \> \quad 2 arguments\\
\\
\makebox[1cm][r]{11} \> \quad \verb|OTP_ADD| \> \quad 0 arguments\\
\makebox[1cm][r]{12} \> \quad \verb|OTP_SUB| \> \quad 0 arguments\\
\makebox[1cm][r]{13} \> \quad \verb|OTP_MULT| \> \quad 0 arguments\\
\makebox[1cm][r]{14} \> \quad \verb|OTP_DIV| \> \quad 0 arguments\\
\makebox[1cm][r]{15} \> \quad \verb|OTP_MOD| \> \quad 0 arguments\\
\makebox[1cm][r]{16} \> \quad \verb|OTP_LOOKUP| \> \quad 0 arguments\\
\makebox[1cm][r]{17} \> \quad \verb|OTP_PUSH_NUM| \> \quad 1 argument\\
\makebox[1cm][r]{18} \> \quad \verb|OTP_PUSH_CHAR| \> \quad 1 argument\\
\makebox[1cm][r]{19} \> \quad \verb|OTP_PUSH_LCHAR| \> \quad 1 argument\\
\\
\makebox[1cm][r]{20} \> \quad \verb|OTP_STATE_CHANGE| \> \quad 1 argument\\
\makebox[1cm][r]{21} \> \quad \verb|OTP_STATE_PUSH| \> \quad 1 argument\\
\makebox[1cm][r]{22} \> \quad \verb|OTP_STATE_POP| \> \quad 1 argument\\
\\
\makebox[1cm][r]{23} \> \quad \verb|OTP_LEFT_START| \> \quad 0 arguments\\
\makebox[1cm][r]{24} \> \quad \verb|OTP_LEFT_RETURN| \> \quad 0 arguments\\
\makebox[1cm][r]{25} \> \quad \verb|OTP_LEFT_BACKUP| \> \quad 0 arguments\\
\\
\makebox[1cm][r]{26} \> \quad \verb|OTP_GOTO| \> \quad 1 argument\\
\makebox[1cm][r]{27} \> \quad \verb|OTP_GOTO_NE| \> \quad 2 arguments\\
\makebox[1cm][r]{28} \> \quad \verb|OTP_GOTO_EQ| \> \quad 2 arguments\\
\makebox[1cm][r]{29} \> \quad \verb|OTP_GOTO_LT| \> \quad 2 arguments\\
\makebox[1cm][r]{30} \> \quad \verb|OTP_GOTO_LE| \> \quad 2 arguments\\
\makebox[1cm][r]{31} \> \quad \verb|OTP_GOTO_GT| \> \quad 2 arguments\\
\makebox[1cm][r]{32} \> \quad \verb|OTP_GOTO_GE| \> \quad 2 arguments\\
\makebox[1cm][r]{33} \> \quad \verb|OTP_GOTO_NO_ADVANCE| \> \quad 1 argument\\
\makebox[1cm][r]{34} \> \quad \verb|OTP_GOTO_BEG| \> \quad 1 argument\\
\makebox[1cm][r]{35} \> \quad \verb|OTP_GOTO_END| \> \quad 1 argument\\
\makebox[1cm][r]{36} \> \quad \verb|OTP_STOP| \> \quad 0 arguments\\
\end{tabbing}
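
As an illustration of the one-word layout described above, and
assuming that octets are stored most significant first (an
assumption; the octet order is not stated here), the instruction
\verb|OTP_RIGHT_NUM| (number~2) with argument \verb|@"0a| would occupy
the four octets
\begin{verbatim}
     00 02   00 0a
\end{verbatim}
while the argumentless \verb|OTP_STOP| (number~36, hex~\texttt{24})
would have \texttt{00 24} in its first two octets.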

The \verb|OTP_LEFT|, \verb|OTP_GOTO| and \verb|OTP_STOP| instructions
are used for recognizing prefixes in an input stream.  The \verb|OTP_RIGHT|
instructions place characters on the output stream, while the
\verb|OTP_PBACK| instructions place characters back onto the input
stream.  The instructions \verb|OTP_ADD| through to
\verb|OTP_PUSH_LCHAR| are used for internal computations in preparation
for \verb|OTP_RIGHT| or \verb|OTP_PBACK| instructions.  Finally, the
\verb|OTP_STATE| instructions are for changing macro-states.

The system that reads from the input stream uses two pointers, which
we will call \emph{first} and \emph{last}. The \emph{first} value
points to the beginning of the input prefix that is currently being
identified.  The \emph{last} value points to the end of the input
prefix that has been read.  When a prefix has been recognized, then
\emph{first} points to~\verb|\1| and \emph{last} points to~\verb|\$|.

The \verb|OTP_LEFT_START| instruction, called at the beginning of
the parsing of a prefix, advances \emph{first} to $\mathit{last}+1$;  
\verb|OTP_LEFT_RETURN| resets the \emph{last} value to
$\mathit{first}-1$ (it is called when a particular \emph{left} pattern
does not correspond to the prefix); \verb|OTP_LEFT_BACKUP| backs up
the \emph{last} pointer by~1.

Internally, a CTP program uses a program counter (PC), which is simply an
index into the appropriate state array.  As in all assembler
programs, this counter is normally incremented by 1 or~2, depending on
the size of the instruction, but it can be abruptly changed through
an \verb|OTP_GOTO| instruction.

The argument in single-argument \verb|OTP_GOTO| instructions is the
new~PC.  For the two-argument instructions, the first is the comparand
and the second is the new~PC should the test succeed. The
\verb|OTP_GOTO| instruction itself is an unconditional branch;
\verb|OTP_GOTO_NO_ADVANCE| advances \emph{last} by~1, and branches if
\emph{last} has reached the end of input; \verb|OTP_GOTO_BEG| branches at the
beginning of input and \verb|OTP_GOTO_END| branches at the end of
input.  As for \verb|OTP_GOTO_|\emph{cond}, it succeeds if the
character pointed to by \emph{last} (we'll call it
\verb|*|\emph{last}) satisfies the test
\emph{cond}(\verb|*|\emph{last}, \emph{firstArg}).
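
For instance, suppose (purely for illustration) that a state contains
the instruction \verb|OTP_GOTO_GT|$(57,p)$, where 57~is the
\textsc{ascii} code of the digit~\texttt{9} and $p$~is some
hypothetical target address.  Reading \emph{GT} as ``\verb|*|\emph{last}
is greater than \emph{firstArg}'', the test would behave as follows for
two sample characters:
\begin{eqnarray*}
\mymathtt{*}\mathit{last} = 65 & : &
65 > 57 \mbox{ holds, so } \mathit{PC} := p \\
\mymathtt{*}\mathit{last} = 53 & : &
53 > 57 \mbox{ fails, so the PC advances normally}
\end{eqnarray*}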

The \verb|OTP_STOP| instruction stops processing of the currently
recognized prefix.  Normally the automaton will be restarted with an
\verb|OTP_LEFT_START| instruction.

When computations are undertaken for the \verb|OTP_RIGHT| and
\verb|OTP_PBACK| instructions, a computation stack is used.
This stack is accessed through instructions \verb|OTP_ADD| through
to \verb|OTP_PUSH_LCHAR|, as well as through the instructions
\verb|OTP_RIGHT_OUTPUT| and \verb|OTP_PBACK_OUTPUT|.

Since the \verb|OTP_RIGHT| and \verb|OTP_PBACK| instructions are
analogous, only the former are described.
The \verb|OTP_RIGHT_OUTPUT| instruction pops a value off the top of the
stack and outputs it; \verb|OTP_RIGHT_NUM|$(n)$ simply places $n$
on the output stream; \verb|OTP_RIGHT_CHAR|$(n)$ places the $n$-th input 
character on the output stream; \verb|OTP_RIGHT_LCHAR| does the same,
but from the back;  finally, \verb|OTP_RIGHT_SOME| places a substring
onto the output stream.

Three instructions are used for placing values on the stack:
\verb|OTP_PUSH_NUM|$(n)$ pushes $n$ onto the stack,
\verb|OTP_PUSH_CHAR|$(n)$ pushes the $n$-th character and
\verb|OTP_PUSH_LCHAR|$(n)$ does the same from the end.

The arithmetic operations of the form \verb|OTP_|\emph{op} apply the
operation 
\begin{eqnarray*}
\mathit{stack}[\mathit{top}-1] & := & 
\mathit{stack}[\mathit{top}-1] \; \mathit{op} \;
\mathit{stack}[\mathit{top}]
\end{eqnarray*}
where \emph{top} is the stack pointer, and then decrement the stack
pointer.  Finally, the \verb|OTP_LOOKUP| instruction applies the
operation
\begin{eqnarray*}
\mathit{stack}[\mathit{top}-1] & := & 
\mathit{stack}[\mathit{top}-1][\mathit{stack}[\mathit{top}]]
\end{eqnarray*}
and then decrements the pointer.
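
As a sketch of how these instructions fit together, the following
informal listing converts an upper-case \textsc{ascii} letter to lower
case by adding~32 to the first character of the recognized prefix.
The one-instruction-per-line notation is only illustrative and is not
the CTP file format; the example assumes that the prefix is the single
character~\texttt{A} (code~65).
\begin{verbatim}
     instruction           stack afterwards   output
     OTP_PUSH_CHAR(1)      65
     OTP_PUSH_NUM(32)      65 32
     OTP_ADD               97
     OTP_RIGHT_OUTPUT      (empty)            97
\end{verbatim}
The value~97 is the code of~\texttt{a}.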

Last, but not least, are the \verb|OTP_STATE| instructions, which
manipulate a stack of macro-states.  The initial state is always~0.
The \verb|OTP_STATE_CHANGE|$(n)$ instruction changes the current state 
to state~$n$; \verb|OTP_STATE_PUSH|$(n)$ pushes the current state onto
the state stack before changing the current state;
\verb|OTP_STATE_POP| pops the state at the top of the state stack into
the current state.
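
The following trace sketches the effect of the three instructions,
starting from the initial state.  The state numbers 2 and~3 are
arbitrary values chosen for the example.
\begin{verbatim}
     instruction            current state   state stack
     (initially)                  0          (empty)
     OTP_STATE_PUSH(2)            2          0
     OTP_STATE_CHANGE(3)          3          0
     OTP_STATE_POP                0          (empty)
\end{verbatim}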

\section{Translation process lists}

Translation processes can be used for a number of different purposes.
Since not all uses can be foreseen, we have decided to offer a means
to dynamically reconfigure the set of translation processes that are
passing over the input text.  This is done using stacks of translation
process lists.

For any single purpose, for example to process a given language,
several CTPs might be required.  If one makes a context switch,
such as processing a different language, then one would want to be able
to quickly replace \emph{all} of the CTPs that are currently being
used.  This is done using CTP lists.

A CTP list is actually a list of pairs.  Each pair consists of a
positive scaled value and a doubly ended queue of CTPs.  For
example,
\begin{verbatim}
     \ctplist\ArabicCTP=[(1.0 : \TexUni,\UniUniTwo,\UniTwoFont)]
\end{verbatim}
the output from \OMEGA\ once the CTP list \verb|\ArabicCTP| has
been typed, shows that that list has one element, namely the pair
with the scaled value~1.0 and the doubly ended queue with three
CTPs, \verb|\TexUni|, \verb|\UniUniTwo| and \verb|\UniTwoFont|.

CTP lists are built up using the five operators \verb|\nullctplist|,
\verb|\addbefore|\-\verb|ctp|\-\verb|list|, \verb|\addafterctplist|,
\verb|\removebeforectplist| and \verb|\removeafter|\-\verb|ctp|\-\verb|list|.
For example, the above output was generated by the following
sequence of \OMEGA\ statements:
\begin{verbatim}
     \ctp\TexUni=TeXArabicToUnicode
     \ctp\UniUniTwo=UnicodeToContUnicode
     \ctp\UniTwoFont=ContUnicodeToTeXArabicOut

     \ctplist\ArabicCTP=
     \addbeforectplist 1 \TexUni
     \addbeforectplist 1 \UniUniTwo
     \addbeforectplist 1 \UniTwoFont
     \nullctplist
\end{verbatim}

The \verb|\ctplist| command is similar to the \verb|\ctp| command:\\
\verb|\ctplist|~\emph{listName}~\verb|=|~\emph{ctpListExpr}.

All \emph{ctpListExpr} are built up from either the empty CTP list,
\verb|\nullctplist|, or from an already existing CTP list.  In the
latter case, the list is completely copied, to ensure that the named
list is not itself modified.  Given a list~$l$, the instruction
\verb|\addbeforectplist|~$n$~\emph{ctp}~$l$ states that the CTP
\emph{ctp} is added at the head of the doubly ended queue for
value~$n$ in list~$l$.  If that queue does not exist, it is created
and inserted in the list so that the scaled values are all in
increasing order.  The instruction
\verb|\addafterctplist|~$n$~\emph{ctp}~$l$ does the same, except the
addition takes place at the tail of the doubly ended queue.  The
instruction
\verb|\removebeforectplist|~$n$~$l$ removes the CTP at the head of the
doubly ended queue numbered~$n$.  The instruction
\verb|\removeafterctplist|~$n$~$l$ does the same at the tail of the
doubly ended queue.  See the next section for more examples.
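
As a small sketch, the removal operators can be applied to the
\verb|\ArabicCTP| list defined above.  The name
\verb|\ShortArabicCTP| is hypothetical, and the result shown in the
comment is what the description above implies, not output captured
from~\OMEGA.
\begin{verbatim}
     \ctplist\ShortArabicCTP=
     \removeafterctplist 1
     \ArabicCTP
     %% expected: [(1.0 : \TexUni,\UniUniTwo)]
\end{verbatim}
Since the named list is copied, \verb|\ArabicCTP| itself should still
hold all three CTPs.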

\section{Input Filters}

Here we come to the crucial parts of \OMEGA.  What happens to the
input stream as it passes through translation processes?  What is
the interaction between \TeX's macro-expansion and \OMEGA's translation 
processes?

When \OMEGA\ is in horizontal mode and encounters a
\emph{letter}, \emph{other\_char}, \emph{char\_given} or
\emph{char\_num}, that character and all the successive 
characters in those categories are read into a buffer.
The currently active CTP is applied to the buffer, and 
the result is placed back onto the input, to be reread
by the standard \TeX\ input routines, including macro
expansion.

The currently active CTP is designated by a pair $(v,i)$, 
where $v$~is a scaled value and $i$~is an integer.  If all the
enabled CTPs are in a CTP list, then the~$v$ designates the index into
the CTP list and the~$i$ designates which element in the $v$-queue is
currently active.

Once a CTP has been used, the~$i$ is incremented; if it points to the
end of the current queue, then $v$~is set to the next queue, and
$i$~is reset to~1.

When the last enabled CTP has been used, then the standard techniques
for treating letters and other characters are used, namely generating
paragraphs, etc.
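
To make the traversal concrete, assume that the effective enabled CTP
list is \verb|[(1.0:\ctpG), (2.0:\ctpH,\ctpI), (3.0:\ctpJ)]|, with
hypothetical CTP names.  Following the rules above, the pair $(v,i)$
would take these values in turn:
\begin{verbatim}
     (v,i)      CTP applied to the buffer
     (1.0,1)    \ctpG
     (2.0,1)    \ctpH
     (2.0,2)    \ctpI
     (3.0,1)    \ctpJ
     --         none: standard TeX processing of the characters
\end{verbatim}
Between steps the result is reread by the \TeX\ input routines, so any
macros produced by one CTP are expanded before the next CTP is applied.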

What this means is that it is now possible to apply a filter on the 
\emph{text} of a file without macro-expansion, generate a new text,
possibly with macros to be expanded, macro-expand, re-apply filters,
etc.  All this without active characters, and without breaking macro
packages.

How are CTP lists enabled?  CTP lists are placed on a stack, each
numbered queue in a given list masking the queues with the same number
for the lists below that one on the stack.

There are three commands, which all respect the grouping mechanism.
The \verb|\clearctplists| command disables all CTP lists.
The \verb|\pushctplist|~\emph{CTPlist} command pushes \emph{CTPlist}
onto the stack. The \verb|\popctplist| command pops the last list 
from the stack.

For example, consider the following purely hypothetical situations:
\begin{verbatim}
     \ctplist\FrenchCTP = \addbeforectplist 1 \ctpA
                          \addbeforectplist 2 \ctpB
                          \addbeforectplist 3 \ctpC
                          \nullctplist
\end{verbatim}

\begin{verbatim}
     \ctplist\GermanCTP = \addbeforectplist 1 \ctpD
                          \addbeforectplist 2 \ctpE
                          \addbeforectplist 3 \ctpF
                          \nullctplist
\end{verbatim}

\begin{verbatim}
     \ctplist\ArabicCTP = \addbeforectplist 1 \ctpG
                          \addbeforectplist 2 \ctpH
                          \addbeforectplist 2 \ctpI
                          \addbeforectplist 3 \ctpJ
                          \nullctplist
\end{verbatim}

\begin{verbatim}
     \ctplist\SpecialArabicCTP =
                          \addafterctplist 3 \ctpK
                          \ArabicCTP
\end{verbatim}

\begin{verbatim}
     \ctplist\UpperCaseCTP =
                          \addbeforectplist 2.5 \ctpL
                          \nullctplist
\end{verbatim}
There are now 5 CTP lists \emph{defined}, but none of them are
\emph{enabled}.  The defined lists are:
\begin{verbatim}
     \ctplist\FrenchCTP =
         [(1.0:\ctpA), (2.0:\ctpB), (3.0:\ctpC)]
     \ctplist\GermanCTP =
         [(1.0:\ctpD), (2.0:\ctpE), (3.0:\ctpF)]
     \ctplist\ArabicCTP =
         [(1.0:\ctpG), (2.0:\ctpH,\ctpI), (3.0:\ctpJ)]
     \ctplist\SpecialArabicCTP =
         [(1.0:\ctpG), (2.0:\ctpH,\ctpI), (3.0:\ctpJ,\ctpK)]
     \ctplist\UpperCaseCTP =
         [(2.5:\ctpL)]
\end{verbatim}
Consider now the sequence of instructions
\begin{verbatim}
     \clearctplists
     \pushctplist\FrenchCTP
     \pushctplist\UpperCaseCTP
     \pushctplist\GermanCTP
     \popctplist
     \popctplist
     \pushctplist\ArabicCTP
     \pushctplist\SpecialArabicCTP
     \pushctplist\GermanCTP
\end{verbatim}
The effective enabled CTP list is, in turn:
\begin{verbatim}
     []
     [(1.0:\ctpA), (2.0:\ctpB), (3.0:\ctpC)]
     [(1.0:\ctpA), (2.0:\ctpB), (2.5:\ctpL), (3.0:\ctpC)]
     [(1.0:\ctpD), (2.0:\ctpE), (2.5:\ctpL), (3.0:\ctpF)]
     [(1.0:\ctpA), (2.0:\ctpB), (2.5:\ctpL), (3.0:\ctpC)]
     [(1.0:\ctpA), (2.0:\ctpB), (3.0:\ctpC)]
     [(1.0:\ctpG), (2.0:\ctpH,\ctpI), (3.0:\ctpJ)]
     [(1.0:\ctpG), (2.0:\ctpH,\ctpI), (3.0:\ctpJ,\ctpK)]
     [(1.0:\ctpD), (2.0:\ctpE), (3.0:\ctpF)]
\end{verbatim}
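
Since the three commands respect grouping, the same effect as an
explicit \verb|\popctplist| can presumably be obtained by closing a
group.  The following sketch reuses the hypothetical lists above; the
comments give the effective list implied by the description, not
output captured from~\OMEGA.
\begin{verbatim}
     \clearctplists
     \pushctplist\FrenchCTP
     \begingroup
       \pushctplist\UpperCaseCTP
       %% [(1.0:\ctpA), (2.0:\ctpB), (2.5:\ctpL), (3.0:\ctpC)]
     \endgroup
     %% [(1.0:\ctpA), (2.0:\ctpB), (3.0:\ctpC)]
\end{verbatim}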
     
The first test of the CTP lists was for Arabic.  The text was typed
in \textsc{ascii}, using a Latin transliteration.  This text was first
transformed into Unicode, the official 16-bit encoding for the 
world's character sets.  These letters were then translated into
their appropriate visual forms (isolated, initial, medial or final)
and then the text was translated into the font encoding.  During the
second translation, inter-letter black spacing is inserted, since Arabic
typesetting calls for word expansion to fill out a line.  Here is the
input:
\begin{verbatim}
     \font\ARfont=oar10 scaled 1728 offset 256 %% an X-font
     \def\keshideh{%
     \begingroup\penalty10000%
     \clearctplists\xleaders\hbox{\char'767}\hskip0ptplus1fi%
     \endgroup}
     \ctp\TexUni=TeXArabicToUnicode
     \ctp\UniUniTwo=UnicodeToContUnicode
     \ctp\UniTwoFont=ContUnicodeToTeXArabicOut
     \ctplist\ArabicCTP=%
     \addbeforectplist 1 \TexUni
     \addbeforectplist 1 \UniUniTwo
     \addbeforectplist 1 \UniTwoFont
     \nullctplist
     \def\AR#1{\begingroup\noindent\pushctplist \ArabicCTP%
     \ARfont\language=255\beginR\quad #1\hfill\endR\endgroup}
\end{verbatim}
Notice that the \verb|\keshideh|, which is dynamically inserted
between letters by the \verb|\UniUniTwo| CTP, uses the \verb|fi|
infinity.  It also disables all of the CTPs, within a group.

\section{Automatic detection of character sets}

Most character sets belong to one of three groups:
\begin{enumerate}
\item 8-bit character sets (including shift character sets) that
include \textsc{ascii}; 
\item 8-bit character sets (including shift character sets) that include
\textsc{ebcdic}; and
\item 16-bit character sets that include \textsc{ascii} as the first
128~characters, such as Unicode.
\end{enumerate}

In a multilingual, heterogeneous environment, it is inevitable that
different files will be written using different character sets.  It
is even possible that the same file might have different parts that
use different character sets.  How is it possible to tag
these files internally so that \OMEGA\ can apply the right translations?

\OMEGA\ has two basic modes of input: the old \TeX\ style, or the
automatic \OMEGA\ style.  The old \TeX\ style is turned on when
the \verb|\noInputMode| command is read.  The default mechanism is
to use the automatic \OMEGA\ style.

If the \OMEGA\ style is being used, there are three modes,
\texttt{ascii}, \texttt{ebcdic} and \texttt{unicode}, which correspond
to the three situations above.  Upon opening a file, \OMEGA\ reads the
first two characters.  If the first character is hex~\texttt{25}
(\texttt{ascii}~\verb|%|), \OMEGA\ assumes that the input character
set is \texttt{ascii}.  If the first character is
hex~\texttt{6c} (\texttt{ebcdic}~\verb|%|), \OMEGA\ assumes that the
input character set is \texttt{ebcdic}.  Finally, if the
first two characters form hex~\texttt{0025} (Unicode~\verb|%|),
\OMEGA\ assumes that the input character set is Unicode.  If none of
these three situations occurs, then the default input mode is assumed.

Here are the instructions for specifying modes.  All of these
instructions apply only after the carriage return terminating the
current input line.
The \verb|\inputMode|~\emph{mode} command, where \emph{mode} is
one of \texttt{ascii}, \texttt{ebcdic} or \texttt{unicode}, states
that after the carriage return, the input mode is~\emph{mode}. 
The \verb|\noInput|\-\verb|Mode| command states that the old \TeX\ style
should be used.
The \verb|\default|\-\verb|Input|\-\verb|Mode|~\emph{mode} instruction
states that the default mode --- when there is no comment character at
the beginning of a file --- should be \emph{mode}.  As for
\verb|\noDefaultInputMode|,
it states that there is no default mode, and that whatever settings 
existed when opening the file should remain.
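
A possible use of these commands is sketched below.  The file name
\texttt{legacy.tex} is hypothetical, and the exact argument syntax of
the mode commands (a bare mode name) is an assumption based on the
descriptions above.
\begin{verbatim}
     %% Files with no leading comment character default to ebcdic.
     \defaultInputMode ebcdic
     \input legacy.tex
     %% From the next line on, read this file as Unicode.
     \inputMode unicode
\end{verbatim}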

The default mode when the system begins is \OMEGA\ style,
assuming \texttt{ascii}.  This is sufficient for all the
\texttt{iso-8859} character sets, many national character sets,
and most mixed-length character sets used in East Asia.

Once the basic family of character sets has been determined, 
\OMEGA\ can read the files, and actually interpret control sequences.
It is then possible to be more specific and to specify exactly what
translation process must be applied to the entire file to convert
the input to Unicode.

For the moment, input translations are simply single CTPs, which
differ from input filters in that they apply to \emph{all} characters
in a file, not simply the letters and other characters in horizontal
mode.  For each kind of mode, there can be a default input
translation.  

As for the mode instructions, each instruction only applies after the 
carriage return terminating the current line.
The \verb|\inputTranslation|~\emph{ctp} command states that after this
line, all input will be passed through translation process~\emph{ctp}.
The \verb|\noInputTranslation| states that no input will be
translated.
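
For example, assuming that a compiled translation process for
ISO~8859-1 input exists under the hypothetical file name
\texttt{inlatin1}, it could be selected as follows (the name
\verb|\InputLatinOne| is likewise hypothetical):
\begin{verbatim}
     \ctp\InputLatinOne=inlatin1
     \inputTranslation\InputLatinOne
     %% from the next line on, all input passes through \InputLatinOne
\end{verbatim}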

The \verb|\defaultAsciiInputTranslation|~\emph{ctp},
\verb|\defaultEbcdicInputTrans|\-\verb|lation|~\emph{ctp} and
\verb|\defaultUnicodeInputTranslation|~\emph{ctp}
commands state what the default translations will be for each
of the modes.  Finally, the
\verb|\noDefault|\-\verb|Ascii|\-\verb|Input|\-\verb|Translation|,
\verb|\noDefault|\-\verb|Ebcdic|\-\verb|Input|\-\verb|Translation| and
\verb|\noDef|\-\verb|ault|\-\verb|Unicode|\-\verb|Input|\-\verb|Translation| commands
remove default translations.

Upon startup, there is no default translation for
\texttt{ascii} or \texttt{unicode} modes, but there is one for
\texttt{ebcdic}, namely 
\begin{verbatim}
     \ctp\InputEBCDIC=inebcdic
     \defaultEbcdicInputTranslation\InputEBCDIC
\end{verbatim}
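
These defaults can be changed in the same way.  In the sketch below,
the CTP name \verb|\InputLatinOne| and the file name
\texttt{inlatin1} are hypothetical; the commands themselves are those
listed above.
\begin{verbatim}
     \ctp\InputLatinOne=inlatin1
     \defaultAsciiInputTranslation\InputLatinOne
     \noDefaultEbcdicInputTranslation
\end{verbatim}
After these lines, files detected as \texttt{ascii} would be passed
through \verb|\InputLatinOne|, and files detected as \texttt{ebcdic}
would no longer be translated by default.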

\section{Further work}

Translations should be applied to output and to \verb|\special|
sequences as well.  This has not yet been implemented, but will
be soon.  Furthermore, the standard \verb|^^| and \verb|^^^^|
forms used by \OMEGA\ will soon be implemented as CTPs.  To do that,
however, requires that input translations be CTP lists rather than
CTPs.  This requires more thought for implementation.

We hope that this cursory documentation suffices to experiment 
with~\OMEGA.  More detailed documentation will follow.

\end{document}