% This program is copyright (C) 1982 by D. E. Knuth; all rights are reserved. % Copying of this file is authorized only if (1) you are D. E. Knuth, or if % (2) you make absolutely no changes to your copy. (The WEB system provides % for alterations via an auxiliary file; the master file should stay intact.) % See Appendix H of the WEB manual for hints on how to install this program. % And see Appendix A of the TRIP manual for details about how to validate it. % TeX is a trademark of the American Mathematical Society. % METAFONT is a trademark of Addison-Wesley Publishing Company. % Version 0 was released in September 1982 after it passed a variety of tests. % Version 1 was released in November 1983 after thorough testing. % Version 1.1 fixed ``disappearing font identifiers'' et alia (July 1984). % Version 1.2 allowed `0' in response to an error, et alia (October 1984). % Version 1.3 made memory allocation more flexible and local (November 1984). % Version 1.4 fixed accents right after line breaks, et alia (April 1985). % Version 1.5 fixed \the\toks after other expansion in \edefs (August 1985). % Version 2.0 (almost identical to 1.5) corresponds to "Volume B" (April 1986). % Version 2.1 corrected anomalies in discretionary breaks (January 1987). % Version 2.2 corrected "(Please type...)" with null \endlinechar (April 1987). % Version 2.3 avoided incomplete page in premature termination (August 1987). % Version 2.4 fixed \noaligned rules in indented displays (August 1987). % Version 2.5 saved cur_order when expanding tokens (September 1987). % Version 2.6 added 10sp slop when shipping leaders (November 1987). % Version 2.7 improved rounding of negative-width characters (November 1987). % Version 2.8 fixed weird bug if no \patterns are used (December 1987). % Version 2.9 made \csname\endcsname's "relax" local (December 1987). % Version 2.91 fixed \outer\def\a0{}\a\a bug (April 1988). % Version 2.92 fixed \patterns, also file names with complex macros (May 1988). % Version 2.93 fixed negative halving in allocator when mem_min<0 (June 1988). % Version 2.94 kept open_log_file from calling fatal_error (November 1988). % Version 2.95 solved that problem a better way (December 1988). % Version 2.96 corrected bug in "Infinite shrinkage" recovery (January 1989). % Version 2.97 corrected blunder in creating 2.95 (February 1989). % Version 2.98 omitted save_for_after at outer level (March 1989). % Version 2.99 caught $$\begingroup\halign..$$ (June 1989). % Version 2.991 caught .5\ifdim.6... (June 1989). % Version 2.992 introduced major changes for 8-bit extensions (September 1989). % Version 2.993 fixed a save_stack synchronization bug et alia (December 1989). % Version 3.0 fixed unusual displays; was more \output robust (March 1990). % Version 3.1 fixed nullfont, disabled \write{\the\prevgraf} (September 1990). % A reward of $327.68 will be paid to the first finder of any remaining bug, % not counting changes introduced after August 1989. % Although considerable effort has been expended to make the TeX program % correct and reliable, no warranty is implied; the author disclaims any % obligation or liability for damages, including but not limited to % special, indirect, or consequential damages arising out of or in % connection with the use or performance of this software. This work has % been a ``labor of love'' and the author hopes that users enjoy it. % Here is TeX material that gets inserted after \input webmac \def\hang{\hangindent 3em\noindent\ignorespaces} \def\textindent#1{\hangindent2.5em\noindent\hbox to2.5em{\hss#1 }\ignorespaces} \font\ninerm=cmr9 \let\mc=\ninerm % medium caps for names like SAIL \def\PASCAL{Pascal} \def\ph{\hbox{Pascal-H}} \def\pct!{{\char`\%}} % percent sign in ordinary text \font\logo=logo10 % font used for the METAFONT logo \def\MF{{\logo META}\-{\logo FONT}} \def\<#1>{$\langle#1\rangle$} \def\section{\mathhexbox278} \def\(#1){} % this is used to make section names sort themselves better \def\9#1{} % this is used for sort keys in the index via @@:sort key}{entry@@> \outer\def\N#1. \[#2]#3.{\MN#1.\vfil\eject % begin starred section \def\rhead{PART #2:\uppercase{#3}} % define running headline \message{*\modno} % progress report \edef\next{\write\cont{\Z{\?#2]#3}{\modno}{\the\pageno}}}\next \ifon\startsection{\bf\ignorespaces#3.\quad}\ignorespaces} \let\?=\relax % we want to be able to \write a \? \def\title{\TeX82} \def\topofcontents{\hsize 5.5in \vglue 0pt plus 1fil minus 1.5in \def\?##1]{\hbox to 1in{\hfil##1.\ }} } \def\botofcontents{\vskip 0pt plus 1fil minus 1.5in} \pageno=3 \def\glob{13} % this should be the section number of "" \def\gglob{20, 26} % this should be the next two sections of "" @* \[1] Introduction. This is \TeX, a document compiler intended to produce typesetting of high quality. The \PASCAL\ program that follows is the definition of \TeX82, a standard @:PASCAL}{\PASCAL@> @!@:TeX82}{\TeX82@> version of \TeX\ that is designed to be highly portable so that identical output will be obtainable on a great variety of computers. The main purpose of the following program is to explain the algorithms of \TeX\ as clearly as possible. As a result, the program will not necessarily be very efficient when a particular \PASCAL\ compiler has translated it into a particular machine language. However, the program has been written so that it can be tuned to run efficiently in a wide variety of operating environments by making comparatively few changes. Such flexibility is possible because the documentation that follows is written in the \.{WEB} language, which is at a higher level than \PASCAL; the preprocessing step that converts \.{WEB} to \PASCAL\ is able to introduce most of the necessary refinements. Semi-automatic translation to other languages is also feasible, because the program below does not make extensive use of features that are peculiar to \PASCAL. A large piece of software like \TeX\ has inherent complexity that cannot be reduced below a certain level of difficulty, although each individual part is fairly simple by itself. The \.{WEB} language is intended to make the algorithms as readable as possible, by reflecting the way the individual program pieces fit together and by providing the cross-references that connect different parts. Detailed comments about what is going on, and about why things were done in certain ways, have been liberally sprinkled throughout the program. These comments explain features of the implementation, but they rarely attempt to explain the \TeX\ language itself, since the reader is supposed to be familiar with {\sl The \TeX book}. @.WEB@> @:TeXbook}{\sl The \TeX book@> @ The present implementation has a long ancestry, beginning in the summer of~1977, when Michael~F. Plass and Frank~M. Liang designed and coded a prototype @^Plass, Michael Frederick@> @^Liang, Franklin Mark@> @^Knuth, Donald Ervin@> based on some specifications that the author had made in May of that year. This original proto\TeX\ included macro definitions and elementary manipulations on boxes and glue, but it did not have line-breaking, page-breaking, mathematical formulas, alignment routines, error recovery, or the present semantic nest; furthermore, it used character lists instead of token lists, so that a control sequence like \.{\\halign} was represented by a list of seven characters. A complete version of \TeX\ was designed and coded by the author in late 1977 and early 1978; that program, like its prototype, was written in the {\mc SAIL} language, for which an excellent debugging system was available. Preliminary plans to convert the {\mc SAIL} code into a form somewhat like the present ``web'' were developed by Luis Trabb~Pardo and the author at the beginning of 1979, and a complete implementation was created by Ignacio~A. Zabala in 1979 and 1980. The \TeX82 program, which @^Zabala Salelles, Ignacio Andres@> was written by the author during the latter part of 1981 and the early part of 1982, also incorporates ideas from the 1979 implementation of @^Guibas, Leonidas Ioannis@> @^Sedgewick, Robert@> @^Wyatt, Douglas Kirk@> \TeX\ in {\mc MESA} that was written by Leonidas Guibas, Robert Sedgewick, and Douglas Wyatt at the Xerox Palo Alto Research Center. Several hundred refinements were introduced into \TeX82 based on the experiences gained with the original implementations, so that essentially every part of the system has been substantially improved. After the appearance of ``Version 0'' in September 1982, this program benefited greatly from the comments of many other people, notably David~R. Fuchs and Howard~W. Trickey. A final revision in September 1989 extended the input character set to eight-bit codes and introduced the ability to hyphenate words from different languages, based on some ideas of Michael~J. Ferguson. @^Fuchs, David Raymond@> @^Trickey, Howard Wellington@> @^Ferguson, Michael John@> No doubt there still is plenty of room for improvement, but the author is firmly committed to keeping \TeX82 ``frozen'' from now on; stability and reliability are to be its main virtues. On the other hand, the \.{WEB} description can be extended without changing the core of \TeX82 itself, and the program has been designed so that such extensions are not extremely difficult to make. The |banner| string defined here should be changed whenever \TeX\ undergoes any modifications, so that it will be clear which version of \TeX\ might be the guilty party when a problem arises. @^extensions to \TeX@> @^system dependencies@> If this program is changed, the resulting system should not be called `\TeX'; the official name `\TeX' by itself is reserved for software systems that are fully compatible with each other. A special test suite called the ``\.{TRIP} test'' is available for helping to determine whether a particular implementation deserves to be known as `\TeX' [cf.~Stanford Computer Science report CS1027, November 1984]. @d banner=='This is TeX, Version 3.1' {printed when \TeX\ starts} @ Different \PASCAL s have slightly different conventions, and the present @!@:PASCAL H}{\ph@> program expresses \TeX\ in terms of the \PASCAL\ that was available to the author in 1982. Constructions that apply to this particular compiler, which we shall call \ph, should help the reader see how to make an appropriate interface for other systems if necessary. (\ph\ is Charles Hedrick's modification of a compiler @^Hedrick, Charles Locke@> for the DECsystem-10 that was originally developed at the University of Hamburg; cf.\ {\sl SOFTWARE---Practice \AM\ Experience \bf6} (1976), 29--42. The \TeX\ program below is intended to be adaptable, without extensive changes, to most other versions of \PASCAL, so it does not fully use the admirable features of \ph. Indeed, a conscious effort has been made here to avoid using several idiosyncratic features of standard \PASCAL\ itself, so that most of the code can be translated mechanically into other high-level languages. For example, the `\&{with}' and `\\{new}' features are not used, nor are pointer types, set types, or enumerated scalar types; there are no `\&{var}' parameters, except in the case of files; there are no tag fields on variant records; there are no assignments |real:=integer|; no procedures are declared local to other procedures.) The portions of this program that involve system-dependent code, where changes might be necessary because of differences between \PASCAL\ compilers and/or differences between operating systems, can be identified by looking at the sections whose numbers are listed under `system dependencies' in the index. Furthermore, the index entries for `dirty \PASCAL' list all places where the restrictions of \PASCAL\ have not been followed perfectly, for one reason or another. @!@^system dependencies@> @!@^dirty \PASCAL@> @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}. @.WEB@> For example, the portion of the program called `\X\glob:Global variables\X' below will be replaced by a sequence of variable declarations that starts in $\section\glob$ of this documentation. In this way, we are able to define each individual global variable when we are prepared to understand what it means; we do not have to define all of the globals at once. Cross references in $\section\glob$, where it says ``See also sections \gglob, \dots,'' also make it possible to look at the set of all global variables, if desired. Similar remarks apply to the other portions of the program heading. Actually the heading shown here is not quite normal: The |program| line does not mention any |output| file, because \ph\ would ask the \TeX\ user to specify a file name if |output| were specified here. @^system dependencies@> @d mtype==t@&y@&p@&e {this is a \.{WEB} coding trick:} @f mtype==type {`\&{mtype}' will be equivalent to `\&{type}'} @f type==true {but `|type|' will not be treated as a reserved word} @p @t\4@>@@/ program TEX; {all file names are defined dynamically} label @@/ const @@/ mtype @@/ var @@/ @# procedure initialize; {this procedure gets things started properly} var @@/ begin @@; end;@# @t\4@>@@/ @t\4@>@@/ @ The overall \TeX\ program begins with the heading just shown, after which comes a bunch of procedure declarations and function declarations. Finally we will get to the main program, which begins with the comment `|start_here|'. If you want to skip down to the main program now, you can look up `|start_here|' in the index. But the author suggests that the best way to understand this program is to follow pretty much the order of \TeX's components as they appear in the \.{WEB} description you are now reading, since the present ordering is intended to combine the advantages of the ``bottom up'' and ``top down'' approaches to the problem of understanding a somewhat complicated system. @ Three labels must be declared in the main program, so we give them symbolic names. @d start_of_TEX=1 {go here when \TeX's variables are initialized} @d end_of_TEX=9998 {go here to close files and terminate gracefully} @d final_end=9999 {this label marks the ending of the program} @= start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end; {key control points} @ Some of the code below is intended to be used only when diagnosing the strange behavior that sometimes occurs when \TeX\ is being installed or when system wizards are fooling around with \TeX\ without quite knowing what they are doing. Such code will not normally be compiled; it is delimited by the codewords `$|debug|\ldots|gubed|$', with apologies to people who wish to preserve the purity of English. Similarly, there is some conditional code delimited by `$|stat|\ldots|tats|$' that is intended for use when statistics are to be kept about \TeX's memory usage. The |stat| $\ldots$ |tats| code also implements diagnostic information for \.{\\tracingparagraphs} and \.{\\tracingpages}. @^debugging@> @d debug==@{ {change this to `$\\{debug}\equiv\null$' when debugging} @d gubed==@t@>@} {change this to `$\\{gubed}\equiv\null$' when debugging} @f debug==begin @f gubed==end @# @d stat==@{ {change this to `$\\{stat}\equiv\null$' when gathering usage statistics} @d tats==@t@>@} {change this to `$\\{tats}\equiv\null$' when gathering usage statistics} @f stat==begin @f tats==end @ This program has two important variations: (1) There is a long and slow version called \.{INITEX}, which does the extra calculations needed to @.INITEX@> initialize \TeX's internal tables; and (2)~there is a shorter and faster production version, which cuts the initialization to a bare minimum. Parts of the program that are needed in (1) but not in (2) are delimited by the codewords `$|init|\ldots|tini|$'. @d init== {change this to `$\\{init}\equiv\.{@@\{}$' in the production version} @d tini== {change this to `$\\{tini}\equiv\.{@@\}}$' in the production version} @f init==begin @f tini==end @= @@/ @!init @@;@+tini @ If the first character of a \PASCAL\ comment is a dollar sign, \ph\ treats the comment as a list of ``compiler directives'' that will affect the translation of this program into machine language. The directives shown below specify full checking and inclusion of the \PASCAL\ debugger when \TeX\ is being debugged, but they cause range checking and other redundant code to be eliminated when the production system is being generated. Arithmetic overflow will be detected in all cases. @^system dependencies@> @^Overflow in arithmetic@> @= @{@&$C-,A+,D-@} {no range check, catch arithmetic overflow, no debug overhead} @!debug @{@&$C+,D+@}@+ gubed {but turn everything on when debugging} @ This \TeX\ implementation conforms to the rules of the {\sl Pascal User @:PASCAL}{\PASCAL@> @^system dependencies@> Manual} published by Jensen and Wirth in 1975, except where system-dependent @^Wirth, Niklaus@> @^Jensen, Kathleen@> code is necessary to make a useful system program, and except in another respect where such conformity would unnecessarily obscure the meaning and clutter up the code: We assume that |case| statements may include a default case that applies if no matching label is found. Thus, we shall use constructions like $$\vbox{\halign{\ignorespaces#\hfil\cr |case x of|\cr 1: $\langle\,$code for $x=1\,\rangle$;\cr 3: $\langle\,$code for $x=3\,\rangle$;\cr |othercases| $\langle\,$code for |x<>1| and |x<>3|$\,\rangle$\cr |endcases|\cr}}$$ since most \PASCAL\ compilers have plugged this hole in the language by incorporating some sort of default mechanism. For example, the \ph\ compiler allows `|others|:' as a default label, and other \PASCAL s allow syntaxes like `\&{else}' or `\&{otherwise}' or `\\{otherwise}:', etc. The definitions of |othercases| and |endcases| should be changed to agree with local conventions. Note that no semicolon appears before |endcases| in this program, so the definition of |endcases| should include a semicolon if the compiler wants one. (Of course, if no default mechanism is available, the |case| statements of \TeX\ will have to be laboriously extended by listing all remaining cases. People who are stuck with such \PASCAL s have, in fact, done this, successfully but not happily!) @d othercases == others: {default for cases not listed explicitly} @d endcases == @+end {follows the default case in an extended |case| statement} @f othercases == else @f endcases == end @ The following parameters can be changed at compile time to extend or reduce \TeX's capacity. They may have different values in \.{INITEX} and in production versions of \TeX. @.INITEX@> @^system dependencies@> @= @!mem_max=30000; {greatest index in \TeX's internal |mem| array; must be strictly less than |max_halfword|; must be equal to |mem_top| in \.{INITEX}, otherwise |>=mem_top|} @!mem_min=0; {smallest index in \TeX's internal |mem| array; must be |min_halfword| or more; must be equal to |mem_bot| in \.{INITEX}, otherwise |<=mem_bot|} @!buf_size=500; {maximum number of characters simultaneously present in current lines of open files and in control sequences between \.{\\csname} and \.{\\endcsname}; must not exceed |max_halfword|} @!error_line=72; {width of context lines on terminal error messages} @!half_error_line=42; {width of first lines of contexts in terminal error messages; should be between 30 and |error_line-15|} @!max_print_line=79; {width of longest text lines output; should be at least 60} @!stack_size=200; {maximum number of simultaneous input sources} @!max_in_open=6; {maximum number of input files and error insertions that can be going on simultaneously} @!font_max=75; {maximum internal font number; must not exceed |max_quarterword| and must be at most |font_base+256|} @!font_mem_size=20000; {number of words of |font_info| for all fonts} @!param_size=60; {maximum number of simultaneous macro parameters} @!nest_size=40; {maximum number of semantic levels simultaneously active} @!max_strings=3000; {maximum number of strings; must not exceed |max_halfword|} @!string_vacancies=8000; {the minimum number of characters that should be available for the user's control sequences and font names, after \TeX's own error messages are stored} @!pool_size=32000; {maximum number of characters in strings, including all error messages and help texts, and the names of all fonts and control sequences; must exceed |string_vacancies| by the total length of \TeX's own strings, which is currently about 23000} @!save_size=600; {space for saving values outside of current group; must be at most |max_halfword|} @!trie_size=8000; {space for hyphenation patterns; should be larger for \.{INITEX} than it is in production versions of \TeX} @!trie_op_size=500; {space for ``opcodes'' in the hyphenation patterns} @!dvi_buf_size=800; {size of the output buffer; must be a multiple of 8} @!file_name_size=40; {file names shouldn't be longer than this} @!pool_name='TeXformats:TEX.POOL '; {string of length |file_name_size|; tells where the string pool appears} @.TeXformats@> @ Like the preceding parameters, the following quantities can be changed at compile time to extend or reduce \TeX's capacity. But if they are changed, it is necessary to rerun the initialization program \.{INITEX} @.INITEX@> to generate new tables for the production \TeX\ program. One can't simply make helter-skelter changes to the following constants, since certain rather complex initialization numbers are computed from them. They are defined here using \.{WEB} macros, instead of being put into \PASCAL's |const| list, in order to emphasize this distinction. @d mem_bot=0 {smallest index in the |mem| array dumped by \.{INITEX}; must not be less than |mem_min|} @d mem_top==30000 {largest index in the |mem| array dumped by \.{INITEX}; must be substantially larger than |mem_bot| and not greater than |mem_max|} @d font_base=0 {smallest internal font number; must not be less than |min_quarterword|} @d hash_size=2100 {maximum number of control sequences; it should be at most about |(mem_max-mem_min)/10|} @d hash_prime=1777 {a prime number equal to about 85\pct! of |hash_size|} @d hyph_size=307 {another prime; the number of \.{\\hyphenation} exceptions} @^system dependencies@> @ In case somebody has inadvertently made bad settings of the ``constants,'' \TeX\ checks them using a global variable called |bad|. This is the first of many sections of \TeX\ where global variables are defined. @= @!bad:integer; {is some ``constant'' wrong?} @ Later on we will say `\ignorespaces|if mem_max>=max_halfword then bad:=14|', or something similar. (We can't do that until |max_halfword| has been defined.) @= bad:=0; if (half_error_line<30)or(half_error_line>error_line-15) then bad:=1; if max_print_line<60 then bad:=2; if dvi_buf_size mod 8<>0 then bad:=3; if mem_bot+1100>mem_top then bad:=4; if hash_prime>hash_size then bad:=5; if max_in_open>=128 then bad:=6; if mem_top<256+11 then bad:=7; {we will want |null_list>255|} @ Labels are given symbolic names by the following definitions, so that occasional |goto| statements will be meaningful. We insert the label `|exit|:' just before the `\ignorespaces|end|\unskip' of a procedure in which we have used the `|return|' statement defined below; the label `|restart|' is occasionally used at the very beginning of a procedure; and the label `|reswitch|' is occasionally used just prior to a |case| statement in which some cases change the conditions and we wish to branch to the newly applicable case. Loops that are set up with the |loop| construction defined below are commonly exited by going to `|done|' or to `|found|' or to `|not_found|', and they are sometimes repeated by going to `|continue|'. If two or more parts of a subroutine start differently but end up the same, the shared code may be gathered together at `|common_ending|'. Incidentally, this program never declares a label that isn't actually used, because some fussy \PASCAL\ compilers will complain about redundant labels. @d exit=10 {go here to leave a procedure} @d restart=20 {go here to start a procedure again} @d reswitch=21 {go here to start a case statement again} @d continue=22 {go here to resume a loop} @d done=30 {go here to exit a loop} @d done1=31 {like |done|, when there is more than one loop} @d done2=32 {for exiting the second loop in a long block} @d done3=33 {for exiting the third loop in a very long block} @d done4=34 {for exiting the fourth loop in an extremely long block} @d done5=35 {for exiting the fifth loop in an immense block} @d done6=36 {for exiting the sixth loop in a block} @d found=40 {go here when you've found it} @d found1=41 {like |found|, when there's more than one per routine} @d found2=42 {like |found|, when there's more than two per routine} @d not_found=45 {go here when you've found nothing} @d common_ending=50 {go here when you want to merge with another branch} @ Here are some macros for common programming idioms. @d incr(#) == #:=#+1 {increase a variable by unity} @d decr(#) == #:=#-1 {decrease a variable by unity} @d negate(#) == #:=-# {change the sign of a variable} @d loop == @+ while true do@+ {repeat over and over until a |goto| happens} @f loop == xclause {\.{WEB}'s |xclause| acts like `\ignorespaces|while true do|\unskip'} @d do_nothing == {empty statement} @d return == goto exit {terminate a procedure call} @f return == nil @d empty=0 {symbolic name for a null constant} @* \[2] The character set. In order to make \TeX\ readily portable to a wide variety of computers, all of its input text is converted to an internal eight-bit code that includes standard ASCII, the ``American Standard Code for Information Interchange.'' This conversion is done immediately when each character is read in. Conversely, characters are converted from ASCII to the user's external representation just before they are output to a text file. Such an internal code is relevant to users of \TeX\ primarily because it governs the positions of characters in the fonts. For example, the character `\.A' has ASCII code $65=@'101$, and when \TeX\ typesets this letter it specifies character number 65 in the current font. If that font actually has `\.A' in a different position, \TeX\ doesn't know what the real position is; the program that does the actual printing from \TeX's device-independent files is responsible for converting from ASCII to a particular font encoding. @^ASCII code@> \TeX's internal code also defines the value of constants that begin with a reverse apostrophe; and it provides an index to the \.{\\catcode}, \.{\\mathcode}, \.{\\uccode}, \.{\\lccode}, and \.{\\delcode} tables. @ Characters of text that have been converted to \TeX's internal form are said to be of type |ASCII_code|, which is a subrange of the integers. @= @!ASCII_code=0..255; {eight-bit numbers} @ The original \PASCAL\ compiler was designed in the late 60s, when six-bit character sets were common, so it did not make provision for lowercase letters. Nowadays, of course, we need to deal with both capital and small letters in a convenient way, especially in a program for typesetting; so the present specification of \TeX\ has been written under the assumption that the \PASCAL\ compiler and run-time system permit the use of text files with more than 64 distinguishable characters. More precisely, we assume that the character set contains at least the letters and symbols associated with ASCII codes @'40 through @'176; all of these characters are now available on most computer terminals. Since we are dealing with more characters than were present in the first \PASCAL\ compilers, we have to decide what to call the associated data type. Some \PASCAL s use the original name |char| for the characters in text files, even though there now are more than 64 such characters, while other \PASCAL s consider |char| to be a 64-element subrange of a larger data type that has some other name. In order to accommodate this difference, we shall use the name |text_char| to stand for the data type of the characters that are converted to and from |ASCII_code| when they are input and output. We shall also assume that |text_char| consists of the elements |chr(first_text_char)| through |chr(last_text_char)|, inclusive. The following definitions should be adjusted if necessary. @^system dependencies@> @d text_char == char {the data type of characters in text files} @d first_text_char=0 {ordinal number of the smallest element of |text_char|} @d last_text_char=255 {ordinal number of the largest element of |text_char|} @= @!i:integer; @ The \TeX\ processor converts between ASCII code and the user's external character set by means of arrays |xord| and |xchr| that are analogous to \PASCAL's |ord| and |chr| functions. @= @!xord: array [text_char] of ASCII_code; {specifies conversion of input characters} @!xchr: array [ASCII_code] of text_char; {specifies conversion of output characters} @ Since we are assuming that our \PASCAL\ system is able to read and write the visible characters of standard ASCII (although not necessarily using the ASCII codes to represent them), the following assignment statements initialize the standard part of the |xchr| array properly, without needing any system-dependent changes. On the other hand, it is possible to implement \TeX\ with less complete character sets, and in such cases it will be necessary to change something here. @^system dependencies@> @= xchr[@'40]:=' '; xchr[@'41]:='!'; xchr[@'42]:='"'; xchr[@'43]:='#'; xchr[@'44]:='$'; xchr[@'45]:='%'; xchr[@'46]:='&'; xchr[@'47]:='''';@/ xchr[@'50]:='('; xchr[@'51]:=')'; xchr[@'52]:='*'; xchr[@'53]:='+'; xchr[@'54]:=','; xchr[@'55]:='-'; xchr[@'56]:='.'; xchr[@'57]:='/';@/ xchr[@'60]:='0'; xchr[@'61]:='1'; xchr[@'62]:='2'; xchr[@'63]:='3'; xchr[@'64]:='4'; xchr[@'65]:='5'; xchr[@'66]:='6'; xchr[@'67]:='7';@/ xchr[@'70]:='8'; xchr[@'71]:='9'; xchr[@'72]:=':'; xchr[@'73]:=';'; xchr[@'74]:='<'; xchr[@'75]:='='; xchr[@'76]:='>'; xchr[@'77]:='?';@/ xchr[@'100]:='@@'; xchr[@'101]:='A'; xchr[@'102]:='B'; xchr[@'103]:='C'; xchr[@'104]:='D'; xchr[@'105]:='E'; xchr[@'106]:='F'; xchr[@'107]:='G';@/ xchr[@'110]:='H'; xchr[@'111]:='I'; xchr[@'112]:='J'; xchr[@'113]:='K'; xchr[@'114]:='L'; xchr[@'115]:='M'; xchr[@'116]:='N'; xchr[@'117]:='O';@/ xchr[@'120]:='P'; xchr[@'121]:='Q'; xchr[@'122]:='R'; xchr[@'123]:='S'; xchr[@'124]:='T'; xchr[@'125]:='U'; xchr[@'126]:='V'; xchr[@'127]:='W';@/ xchr[@'130]:='X'; xchr[@'131]:='Y'; xchr[@'132]:='Z'; xchr[@'133]:='['; xchr[@'134]:='\'; xchr[@'135]:=']'; xchr[@'136]:='^'; xchr[@'137]:='_';@/ xchr[@'140]:='`'; xchr[@'141]:='a'; xchr[@'142]:='b'; xchr[@'143]:='c'; xchr[@'144]:='d'; xchr[@'145]:='e'; xchr[@'146]:='f'; xchr[@'147]:='g';@/ xchr[@'150]:='h'; xchr[@'151]:='i'; xchr[@'152]:='j'; xchr[@'153]:='k'; xchr[@'154]:='l'; xchr[@'155]:='m'; xchr[@'156]:='n'; xchr[@'157]:='o';@/ xchr[@'160]:='p'; xchr[@'161]:='q'; xchr[@'162]:='r'; xchr[@'163]:='s'; xchr[@'164]:='t'; xchr[@'165]:='u'; xchr[@'166]:='v'; xchr[@'167]:='w';@/ xchr[@'170]:='x'; xchr[@'171]:='y'; xchr[@'172]:='z'; xchr[@'173]:='{'; xchr[@'174]:='|'; xchr[@'175]:='}'; xchr[@'176]:='~';@/ @ Some of the ASCII codes without visible characters have been given symbolic names in this program because they are used with a special meaning. @d null_code=@'0 {ASCII code that might disappear} @d carriage_return=@'15 {ASCII code used at end of line} @d invalid_code=@'177 {ASCII code that many systems prohibit in text files} @ The ASCII code is ``standard'' only to a certain extent, since many computer installations have found it advantageous to have ready access to more than 94 printing characters. Appendix~C of {\sl The \TeX book\/} gives a complete specification of the intended correspondence between characters and \TeX's internal representation. @:TeXbook}{\sl The \TeX book@> If \TeX\ is being used on a garden-variety \PASCAL\ for which only standard ASCII codes will appear in the input and output files, it doesn't really matter what codes are specified in |xchr[0..@'37]|, but the safest policy is to blank everything out by using the code shown below. However, other settings of |xchr| will make \TeX\ more friendly on computers that have an extended character set, so that users can type things like `\.^^Z' instead of `\.{\\ne}'. People with extended character sets can assign codes arbitrarily, giving an |xchr| equivalent to whatever characters the users of \TeX\ are allowed to have in their input files. It is best to make the codes correspond to the intended interpretations as shown in Appendix~C whenever possible; but this is not necessary. For example, in countries with an alphabet of more than 26 letters, it is usually best to map the additional letters into codes less than~@'40. To get the most ``permissive'' character set, change |' '| on the right of these assignment statements to |chr(i)|. @^character set dependencies@> @^system dependencies@> @= for i:=0 to @'37 do xchr[i]:=' '; for i:=@'177 to @'377 do xchr[i]:=' '; @ The following system-independent code makes the |xord| array contain a suitable inverse to the information in |xchr|. Note that if |xchr[i]=xchr[j]| where |i= for i:=first_text_char to last_text_char do xord[chr(i)]:=invalid_code; for i:=@'200 to @'377 do xord[xchr[i]]:=i; for i:=0 to @'176 do xord[xchr[i]]:=i; @* \[3] Input and output. The bane of portability is the fact that different operating systems treat input and output quite differently, perhaps because computer scientists have not given sufficient attention to this problem. People have felt somehow that input and output are not part of ``real'' programming. Well, it is true that some kinds of programming are more fun than others. With existing input/output conventions being so diverse and so messy, the only sources of joy in such parts of the code are the rare occasions when one can find a way to make the program a little less bad than it might have been. We have two choices, either to attack I/O now and get it over with, or to postpone I/O until near the end. Neither prospect is very attractive, so let's get it over with. The basic operations we need to do are (1)~inputting and outputting of text, to or from a file or the user's terminal; (2)~inputting and outputting of eight-bit bytes, to or from a file; (3)~instructing the operating system to initiate (``open'') or to terminate (``close'') input or output from a specified file; (4)~testing whether the end of an input file has been reached. \TeX\ needs to deal with two kinds of files. We shall use the term |alpha_file| for a file that contains textual data, and the term |byte_file| for a file that contains eight-bit binary information. These two types turn out to be the same on many computers, but sometimes there is a significant distinction, so we shall be careful to distinguish between them. Standard protocols for transferring such files from computer to computer, via high-speed networks, are now becoming available to more and more communities of users. The program actually makes use also of a third kind of file, called a |word_file|, when dumping and reloading base information for its own initialization. We shall define a word file later; but it will be possible for us to specify simple operations on word files before they are defined. @= @!eight_bits=0..255; {unsigned one-byte quantity} @!alpha_file=packed file of text_char; {files that contain textual data} @!byte_file=packed file of eight_bits; {files that contain binary data} @ Most of what we need to do with respect to input and output can be handled by the I/O facilities that are standard in \PASCAL, i.e., the routines called |get|, |put|, |eof|, and so on. But standard \PASCAL\ does not allow file variables to be associated with file names that are determined at run time, so it cannot be used to implement \TeX; some sort of extension to \PASCAL's ordinary |reset| and |rewrite| is crucial for our purposes. We shall assume that |name_of_file| is a variable of an appropriate type such that the \PASCAL\ run-time system being used to implement \TeX\ can open a file whose external name is specified by |name_of_file|. @^system dependencies@> @= @!name_of_file:packed array[1..file_name_size] of char;@;@/ {on some systems this may be a \&{record} variable} @!name_length:0..file_name_size;@/{this many characters are actually relevant in |name_of_file| (the rest are blank)} @ The \ph\ compiler with which the present version of \TeX\ was prepared has extended the rules of \PASCAL\ in a very convenient way. To open file~|f|, we can write $$\vbox{\halign{#\hfil\qquad&#\hfil\cr |reset(f,@t\\{name}@>,'/O')|&for input;\cr |rewrite(f,@t\\{name}@>,'/O')|&for output.\cr}}$$ The `\\{name}' parameter, which is of type `{\bf packed array $[\langle\\{any}\rangle]$ of \\{char}}', stands for the name of the external file that is being opened for input or output. Blank spaces that might appear in \\{name} are ignored. The `\.{/O}' parameter tells the operating system not to issue its own error messages if something goes wrong. If a file of the specified name cannot be found, or if such a file cannot be opened for some other reason (e.g., someone may already be trying to write the same file), we will have |@!erstat(f)<>0| after an unsuccessful |reset| or |rewrite|. This allows \TeX\ to undertake appropriate corrective action. @:PASCAL H}{\ph@> @^system dependencies@> \TeX's file-opening procedures return |false| if no file identified by |name_of_file| could be opened. @d reset_OK(#)==erstat(#)=0 @d rewrite_OK(#)==erstat(#)=0 @p function a_open_in(var f:alpha_file):boolean; {open a text file for input} begin reset(f,name_of_file,'/O'); a_open_in:=reset_OK(f); end; @# function a_open_out(var f:alpha_file):boolean; {open a text file for output} begin rewrite(f,name_of_file,'/O'); a_open_out:=rewrite_OK(f); end; @# function b_open_in(var f:byte_file):boolean; {open a binary file for input} begin reset(f,name_of_file,'/O'); b_open_in:=reset_OK(f); end; @# function b_open_out(var f:byte_file):boolean; {open a binary file for output} begin rewrite(f,name_of_file,'/O'); b_open_out:=rewrite_OK(f); end; @# function w_open_in(var f:word_file):boolean; {open a word file for input} begin reset(f,name_of_file,'/O'); w_open_in:=reset_OK(f); end; @# function w_open_out(var f:word_file):boolean; {open a word file for output} begin rewrite(f,name_of_file,'/O'); w_open_out:=rewrite_OK(f); end; @ Files can be closed with the \ph\ routine `|close(f)|', which @^system dependencies@> should be used when all input or output with respect to |f| has been completed. This makes |f| available to be opened again, if desired; and if |f| was used for output, the |close| operation makes the corresponding external file appear on the user's area, ready to be read. These procedures should not generate error messages if a file is being closed before it has been successfully opened. @p procedure a_close(var f:alpha_file); {close a text file} begin close(f); end; @# procedure b_close(var f:byte_file); {close a binary file} begin close(f); end; @# procedure w_close(var f:word_file); {close a word file} begin close(f); end; @ Binary input and output are done with \PASCAL's ordinary |get| and |put| procedures, so we don't have to make any other special arrangements for binary~I/O. Text output is also easy to do with standard \PASCAL\ routines. The treatment of text input is more difficult, however, because of the necessary translation to |ASCII_code| values. \TeX's conventions should be efficient, and they should blend nicely with the user's operating environment. @ Input from text files is read one line at a time, using a routine called |input_ln|. This function is defined in terms of global variables called |buffer|, |first|, and |last| that will be described in detail later; for now, it suffices for us to know that |buffer| is an array of |ASCII_code| values, and that |first| and |last| are indices into this array representing the beginning and ending of a line of text. @= @!buffer:array[0..buf_size] of ASCII_code; {lines of characters being read} @!first:0..buf_size; {the first unused position in |buffer|} @!last:0..buf_size; {end of the line just input to |buffer|} @!max_buf_stack:0..buf_size; {largest index used in |buffer|} @ The |input_ln| function brings the next line of input from the specified file into available positions of the buffer array and returns the value |true|, unless the file has already been entirely read, in which case it returns |false| and sets |last:=first|. In general, the |ASCII_code| numbers that represent the next line of the file are input into |buffer[first]|, |buffer[first+1]|, \dots, |buffer[last-1]|; and the global variable |last| is set equal to |first| plus the length of the line. Trailing blanks are removed from the line; thus, either |last=first| (in which case the line was entirely blank) or |buffer[last-1]<>" "|. An overflow error is given, however, if the normal actions of |input_ln| would make |last>=buf_size|; this is done so that other parts of \TeX\ can safely look at the contents of |buffer[last+1]| without overstepping the bounds of the |buffer| array. Upon entry to |input_ln|, the condition |first @p function input_ln(var f:alpha_file;@!bypass_eoln:boolean):boolean; {inputs the next line or returns |false|} var last_nonblank:0..buf_size; {|last| with trailing blanks removed} begin if bypass_eoln then if not eof(f) then get(f); {input the first character of the line into |f^|} last:=first; {cf.\ Matthew 19\thinspace:\thinspace30} if eof(f) then input_ln:=false else begin last_nonblank:=first; while not eoln(f) do begin if last>=max_buf_stack then begin max_buf_stack:=last+1; if max_buf_stack=buf_size then @; end; buffer[last]:=xord[f^]; get(f); incr(last); if buffer[last-1]<>" " then last_nonblank:=last; end; last:=last_nonblank; input_ln:=true; end; end; @ The user's terminal acts essentially like other files of text, except that it is used both for input and for output. When the terminal is considered an input file, the file variable is called |term_in|, and when it is considered an output file the file variable is |term_out|. @^system dependencies@> @= @!term_in:alpha_file; {the terminal as an input file} @!term_out:alpha_file; {the terminal as an output file} @ Here is how to open the terminal files in \ph. The `\.{/I}' switch suppresses the first |get|. @^system dependencies@> @d t_open_in==reset(term_in,'TTY:','/O/I') {open the terminal for text input} @d t_open_out==rewrite(term_out,'TTY:','/O') {open the terminal for text output} @ Sometimes it is necessary to synchronize the input/output mixture that happens on the user's terminal, and three system-dependent procedures are used for this purpose. The first of these, |update_terminal|, is called when we want to make sure that everything we have output to the terminal so far has actually left the computer's internal buffers and been sent. The second, |clear_terminal|, is called when we wish to cancel any input that the user may have typed ahead (since we are about to issue an unexpected error message). The third, |wake_up_terminal|, is supposed to revive the terminal if the user has disabled it by some instruction to the operating system. The following macros show how these operations can be specified in \ph: @^system dependencies@> @d update_terminal == break(term_out) {empty the terminal output buffer} @d clear_terminal == break_in(term_in,true) {clear the terminal input buffer} @d wake_up_terminal == do_nothing {cancel the user's cancellation of output} @ We need a special routine to read the first line of \TeX\ input from the user's terminal. This line is different because it is read before we have opened the transcript file; there is sort of a ``chicken and egg'' problem here. If the user types `\.{\\input paper}' on the first line, or if some macro invoked by that line does such an \.{\\input}, the transcript file will be named `\.{paper.log}'; but if no \.{\\input} commands are performed during the first line of terminal input, the transcript file will acquire its default name `\.{texput.log}'. (The transcript file will not contain error messages generated by the first line before the first \.{\\input} command.) @.texput@> The first line is even more special if we are lucky enough to have an operating system that treats \TeX\ differently from a run-of-the-mill \PASCAL\ object program. It's nice to let the user start running a \TeX\ job by typing a command line like `\.{tex paper}'; in such a case, \TeX\ will operate as if the first line of input were `\.{paper}', i.e., the first line will consist of the remainder of the command line, after the part that invoked \TeX. The first line is special also because it may be read before \TeX\ has input a format file. In such cases, normal error messages cannot yet be given. The following code uses concepts that will be explained later. (If the \PASCAL\ compiler does not support non-local |@!goto|, the @^system dependencies@> statement `|goto final_end|' should be replaced by something that quietly terminates the program.) @= if format_ident=0 then begin write_ln(term_out,'Buffer size exceeded!'); goto final_end; @.Buffer size exceeded@> end else begin cur_input.loc_field:=first; cur_input.limit_field:=last-1; overflow("buffer size",buf_size); @:TeX capacity exceeded buffer size}{\quad buffer size@> end @ Different systems have different ways to get started. But regardless of what conventions are adopted, the routine that initializes the terminal should satisfy the following specifications: \yskip\textindent{1)}It should open file |term_in| for input from the terminal. (The file |term_out| will already be open for output to the terminal.) \textindent{2)}If the user has given a command line, this line should be considered the first line of terminal input. Otherwise the user should be prompted with `\.{**}', and the first line of input should be whatever is typed in response. \textindent{3)}The first line of input, which might or might not be a command line, should appear in locations |first| to |last-1| of the |buffer| array. \textindent{4)}The global variable |loc| should be set so that the character to be read next by \TeX\ is in |buffer[loc]|. This character should not be blank, and we should have |loc @p function init_terminal:boolean; {gets the terminal input started} label exit; begin t_open_in; loop@+begin wake_up_terminal; write(term_out,'**'); update_terminal; @.**@> if not input_ln(term_in,true) then {this shouldn't happen} begin write_ln(term_out); write(term_out,'! End of file on the terminal... why?'); @.End of file on the terminal@> init_terminal:=false; return; end; loc:=first; while (loc which converts single-character strings into the ASCII code number of the single character involved, while it converts other strings into integers and builds a string pool file. Thus, when the string constant \.{"."} appears in the program below, \.{WEB} converts it into the integer 46, which is the ASCII code for a period, while \.{WEB} will convert a string like \.{"hello"} into some integer greater than~255. String number 46 will presumably be the single character `\..'; but some ASCII codes have no standard visible representation, and \TeX\ sometimes needs to be able to print an arbitrary ASCII character, so the first 256 strings are used to specify exactly what should be printed for each of the 256 possibilities. Elements of the |str_pool| array must be ASCII codes that can actually be printed; i.e., they must have an |xchr| equivalent in the local character set. (However, the names of control sequences need not meet this restriction, when they appear in |str_pool|.) Some \PASCAL\ compilers won't pack integers into a single byte unless the integers lie in the range |-128..127|. To accommodate such systems we access the string pool only via macros that can easily be redefined. @d si(#) == # {convert from |ASCII_code| to |packed_ASCII_code|} @d so(#) == # {convert from |packed_ASCII_code| to |ASCII_code|} @= @!pool_pointer = 0..pool_size; {for variables that point into |str_pool|} @!str_number = 0..max_strings; {for variables that point into |str_start|} @!packed_ASCII_code = 0..255; {elements of |str_pool| array} @ @= @!str_pool:packed array[pool_pointer] of packed_ASCII_code; {the characters} @!str_start : array[str_number] of pool_pointer; {the starting pointers} @!pool_ptr : pool_pointer; {first unused position in |str_pool|} @!str_ptr : str_number; {number of the current string being created} @!init_pool_ptr : pool_pointer; {the starting value of |pool_ptr|} @!init_str_ptr : str_number; {the starting value of |str_ptr|} @ Several of the elementary string operations are performed using \.{WEB} macros instead of \PASCAL\ procedures, because many of the operations are done quite frequently and we want to avoid the overhead of procedure calls. For example, here is a simple macro that computes the length of a string. @.WEB@> @d length(#)==(str_start[#+1]-str_start[#]) {the number of characters in string number \#} @ The length of the current string is called |cur_length|: @d cur_length == (pool_ptr - str_start[str_ptr]) @ Strings are created by appending character codes to |str_pool|. The |append_char| macro, defined here, does not check to see if the value of |pool_ptr| has gotten too high; this test is supposed to be made before |append_char| is used. There is also a |flush_char| macro, which erases the last character appended. To test if there is room to append |l| more characters to |str_pool|, we shall write |str_room(l)|, which aborts \TeX\ and gives an apologetic error message if there isn't enough room. @d append_char(#) == {put |ASCII_code| \# at the end of |str_pool|} begin str_pool[pool_ptr]:=si(#); incr(pool_ptr); end @d flush_char == decr(pool_ptr) {forget the last character in the pool} @d str_room(#) == {make sure that the pool hasn't overflowed} begin if pool_ptr+# > pool_size then overflow("pool size",pool_size-init_pool_ptr); @:TeX capacity exceeded pool size}{\quad pool size@> end @ Once a sequence of characters has been appended to |str_pool|, it officially becomes a string when the function |make_string| is called. This function returns the identification number of the new string as its value. @p function make_string : str_number; {current string enters the pool} begin if str_ptr=max_strings then overflow("number of strings",max_strings-init_str_ptr); @:TeX capacity exceeded number of strings}{\quad number of strings@> incr(str_ptr); str_start[str_ptr]:=pool_ptr; make_string:=str_ptr-1; end; @ To destroy the most recently made string, we say |flush_string|. @d flush_string==begin decr(str_ptr); pool_ptr:=str_start[str_ptr]; end @ The following subroutine compares string |s| with another string of the same length that appears in |buffer| starting at position |k|; the result is |true| if and only if the strings are equal. Empirical tests indicate that |str_eq_buf| is used in such a way that it tends to return |true| about 80 percent of the time. @p function str_eq_buf(@!s:str_number;@!k:integer):boolean; {test equality of strings} label not_found; {loop exit} var j: pool_pointer; {running index} @!result: boolean; {result of comparison} begin j:=str_start[s]; while jbuffer[k] then begin result:=false; goto not_found; end; incr(j); incr(k); end; result:=true; not_found: str_eq_buf:=result; end; @ Here is a similar routine, but it compares two strings in the string pool, and it does not assume that they have the same length. @p function str_eq_str(@!s,@!t:str_number):boolean; {test equality of strings} label not_found; {loop exit} var j,@!k: pool_pointer; {running indices} @!result: boolean; {result of comparison} begin result:=false; if length(s)<>length(t) then goto not_found; j:=str_start[s]; k:=str_start[t]; while jstr_pool[k] then goto not_found; incr(j); incr(k); end; result:=true; not_found: str_eq_str:=result; end; @ The initial values of |str_pool|, |str_start|, |pool_ptr|, and |str_ptr| are computed by the \.{INITEX} program, based in part on the information that \.{WEB} has output while processing \TeX. @.INITEX@> @^string pool@> @p @!init function get_strings_started:boolean; {initializes the string pool, but returns |false| if something goes wrong} label done,exit; var k,@!l:0..255; {small indices or counters} @!m,@!n:text_char; {characters input from |pool_file|} @!g:str_number; {garbage} @!a:integer; {accumulator for check sum} @!c:boolean; {check sum has been checked} begin pool_ptr:=0; str_ptr:=0; str_start[0]:=0; @; @; exit:end; tini @ @d app_lc_hex(#)==l:=#; if l<10 then append_char(l+"0")@+else append_char(l-10+"a") @= for k:=0 to 255 do begin if (@) then begin append_char("^"); append_char("^"); if k<@'100 then append_char(k+@'100) else if k<@'200 then append_char(k-@'100) else begin app_lc_hex(k div 16); app_lc_hex(k mod 16); end; end else append_char(k); g:=make_string; end @ The first 128 strings will contain 95 standard ASCII characters, and the other 33 characters will be printed in three-symbol form like `\.{\^\^A}' unless a system-dependent change is made here. Installations that have an extended character set, where for example |xchr[@'32]=@t\.{\'^^Z\'}@>|, would like string @'32 to be the single character @'32 instead of the three characters @'136, @'136, @'132 (\.{\^\^Z}). On the other hand, even people with an extended character set will want to represent string @'15 by \.{\^\^M}, since @'15 is |carriage_return|; the idea is to produce visible strings instead of tabs or line-feeds or carriage-returns or bell-rings or characters that are treated anomalously in text files. Unprintable characters of codes 128--255 are, similarly, rendered \.{\^\^80}--\.{\^\^ff}. The boolean expression defined here should be |true| unless \TeX\ internal code number~|k| corresponds to a non-troublesome visible symbol in the local character set. An appropriate formula for the extended character set recommended in {\sl The \TeX book\/} would, for example, be `|k in [0,@'10..@'12,@'14,@'15,@'33,@'177..@'377]|'. If character |k| cannot be printed, and |k<@'200|, then character |k+@'100| or |k-@'100| must be printable; moreover, ASCII codes |[@'41..@'46, @'60..@'71, @'141..@'146, @'160..@'171]| must be printable. Thus, at least 80 printable characters are needed. @:TeXbook}{\sl The \TeX book@> @^character set dependencies@> @^system dependencies@> @= (k<" ")or(k>"~") @ When the \.{WEB} system program called \.{TANGLE} processes the \.{TEX.WEB} description that you are now reading, it outputs the \PASCAL\ program \.{TEX.PAS} and also a string pool file called \.{TEX.POOL}. The \.{INITEX} @.WEB@>@.INITEX@> program reads the latter file, where each string appears as a two-digit decimal length followed by the string itself, and the information is recorded in \TeX's string memory. @= @!init @!pool_file:alpha_file; {the string-pool file output by \.{TANGLE}} tini @ @d bad_pool(#)==begin wake_up_terminal; write_ln(term_out,#); a_close(pool_file); get_strings_started:=false; return; end @= name_of_file:=pool_name; {we needn't set |name_length|} if a_open_in(pool_file) then begin c:=false; repeat @; until c; a_close(pool_file); get_strings_started:=true; end else bad_pool('! I can''t read TEX.POOL.') @.I can't read TEX.POOL@> @ @= begin if eof(pool_file) then bad_pool('! TEX.POOL has no check sum.'); @.TEX.POOL has no check sum@> read(pool_file,m,n); {read two digits of string length} if m='*' then @ else begin if (xord[m]<"0")or(xord[m]>"9")or@| (xord[n]<"0")or(xord[n]>"9") then bad_pool('! TEX.POOL line doesn''t begin with two digits.'); @.TEX.POOL line doesn't...@> l:=xord[m]*10+xord[n]-"0"*11; {compute the length} if pool_ptr+l+string_vacancies>pool_size then bad_pool('! You have to increase POOLSIZE.'); @.You have to increase POOLSIZE@> for k:=1 to l do begin if eoln(pool_file) then m:=' '@+else read(pool_file,m); append_char(xord[m]); end; read_ln(pool_file); g:=make_string; end; end @ The \.{WEB} operation \.{@@\$} denotes the value that should be at the end of this \.{TEX.POOL} file; any other value means that the wrong pool file has been loaded. @^check sum@> @= begin a:=0; k:=1; loop@+ begin if (xord[n]<"0")or(xord[n]>"9") then bad_pool('! TEX.POOL check sum doesn''t have nine digits.'); @.TEX.POOL check sum...@> a:=10*a+xord[n]-"0"; if k=9 then goto done; incr(k); read(pool_file,n); end; done: if a<>@$ then bad_pool('! TEX.POOL doesn''t match; TANGLE me again.'); @.TEX.POOL doesn't match@> c:=true; end @* \[5] On-line and off-line printing. Messages that are sent to a user's terminal and to the transcript-log file are produced by several `|print|' procedures. These procedures will direct their output to a variety of places, based on the setting of the global variable |selector|, which has the following possible values: \yskip \hang |term_and_log|, the normal setting, prints on the terminal and on the transcript file. \hang |log_only|, prints only on the transcript file. \hang |term_only|, prints only on the terminal. \hang |no_print|, doesn't print at all. This is used only in rare cases before the transcript file is open. \hang |pseudo|, puts output into a cyclic buffer that is used by the |show_context| routine; when we get to that routine we shall discuss the reasoning behind this curious mode. \hang |new_string|, appends the output to the current string in the string pool. \hang 0 to 15, prints on one of the sixteen files for \.{\\write} output. \yskip \noindent The symbolic names `|term_and_log|', etc., have been assigned numeric codes that satisfy the convenient relations |no_print+1=term_only|, |no_print+2=log_only|, |term_only+2=log_only+1=term_and_log|. Three additional global variables, |tally| and |term_offset| and |file_offset|, record the number of characters that have been printed since they were most recently cleared to zero. We use |tally| to record the length of (possibly very long) stretches of printing; |term_offset| and |file_offset|, on the other hand, keep track of how many characters have appeared so far on the current line that has been output to the terminal or to the transcript file, respectively. @d no_print=16 {|selector| setting that makes data disappear} @d term_only=17 {printing is destined for the terminal only} @d log_only=18 {printing is destined for the transcript file only} @d term_and_log=19 {normal |selector| setting} @d pseudo=20 {special |selector| setting for |show_context|} @d new_string=21 {printing is deflected to the string pool} @d max_selector=21 {highest selector setting} @= @!log_file : alpha_file; {transcript of \TeX\ session} @!selector : 0..max_selector; {where to print a message} @!dig : array[0..22] of 0..15; {digits in a number being output} @!tally : integer; {the number of characters recently printed} @!term_offset : 0..max_print_line; {the number of characters on the current terminal line} @!file_offset : 0..max_print_line; {the number of characters on the current file line} @!trick_buf:array[0..error_line] of ASCII_code; {circular buffer for pseudoprinting} @!trick_count: integer; {threshold for pseudoprinting, explained later} @!first_count: integer; {another variable for pseudoprinting} @ @= selector:=term_only; tally:=0; term_offset:=0; file_offset:=0; @ Macro abbreviations for output to the terminal and to the log file are defined here for convenience. Some systems need special conventions for terminal output, and it is possible to adhere to those conventions by changing |wterm|, |wterm_ln|, and |wterm_cr| in this section. @^system dependencies@> @d wterm(#)==write(term_out,#) @d wterm_ln(#)==write_ln(term_out,#) @d wterm_cr==write_ln(term_out) @d wlog(#)==write(log_file,#) @d wlog_ln(#)==write_ln(log_file,#) @d wlog_cr==write_ln(log_file) @ To end a line of text output, we call |print_ln|. @= procedure print_ln; {prints an end-of-line} begin case selector of term_and_log: begin wterm_cr; wlog_cr; term_offset:=0; file_offset:=0; end; log_only: begin wlog_cr; file_offset:=0; end; term_only: begin wterm_cr; term_offset:=0; end; no_print,pseudo,new_string: do_nothing; othercases write_ln(write_file[selector]) endcases;@/ end; {|tally| is not affected} @ The |print_char| procedure sends one character to the desired destination, using the |xchr| array to map it into an external character compatible with |input_ln|. All printing comes through |print_ln| or |print_char|. @= procedure print_char(@!s:ASCII_code); {prints a single character} label exit; begin if @ then if selector @= procedure print(@!s:integer); {prints string |s|} label exit; var j:pool_pointer; {current character code position} begin if s>=str_ptr then s:="???" {this can't happen} @.???@> else if s<256 then if s<0 then s:="???" {can't happen} else if (@) then if selector= procedure slow_print(@!s:integer); {prints string |s|} label exit; var j:pool_pointer; {current character code position} begin if s>=str_ptr then s:="???" {this can't happen} @.???@> else if s<256 then if s<0 then s:="???" {can't happen} else if (@) then if selector= wterm(banner); if format_ident=0 then wterm_ln(' (no format preloaded)') else begin print(format_ident); print_ln; end; update_terminal; @ The procedure |print_nl| is like |print|, but it makes sure that the string appears at the beginning of a new line. @= procedure print_nl(@!s:str_number); {prints string |s| at beginning of line} begin if ((term_offset>0)and(odd(selector)))or@| ((file_offset>0)and(selector>=log_only)) then print_ln; print(s); end; @ The procedure |print_esc| prints a string that is preceded by the user's escape character (which is usually a backslash). @= procedure print_esc(@!s:str_number); {prints escape character, then |s|} var c:integer; {the escape character code} begin @; if c>=0 then if c<256 then print(c); print(s); end; @ An array of digits in the range |0..15| is printed by |print_the_digs|. @= procedure print_the_digs(@!k:eight_bits); {prints |dig[k-1]|$\,\ldots\,$|dig[0]|} begin while k>0 do begin decr(k); if dig[k]<10 then print_char("0"+dig[k]) else print_char("A"-10+dig[k]); end; end; @ The following procedure, which prints out the decimal representation of a given integer |n|, has been written carefully so that it works properly if |n=0| or if |(-n)| would cause overflow. It does not apply |mod| or |div| to negative arguments, since such operations are not implemented consistently by all \PASCAL\ compilers. @= procedure print_int(@!n:integer); {prints an integer in decimal form} var k:0..23; {index to current digit; we assume that $|n|<10^{23}$} @!m:integer; {used to negate |n| in possibly dangerous cases} begin k:=0; if n<0 then begin print_char("-"); if n>-100000000 then negate(n) else begin m:=-1-n; n:=m div 10; m:=(m mod 10)+1; k:=1; if m<10 then dig[0]:=m else begin dig[0]:=0; incr(n); end; end; end; repeat dig[k]:=n mod 10; n:=n div 10; incr(k); until n=0; print_the_digs(k); end; @ Here is a trivial procedure to print two digits; it is usually called with a parameter in the range |0<=n<=99|. @p procedure print_two(@!n:integer); {prints two least significant digits} begin n:=abs(n) mod 100; print_char("0"+(n div 10)); print_char("0"+(n mod 10)); end; @ Hexadecimal printing of nonnegative integers is accomplished by |print_hex|. @p procedure print_hex(@!n:integer); {prints a positive integer in hexadecimal form} var k:0..22; {index to current digit; we assume that $0\L n<16^{22}$} begin k:=0; print_char(""""); repeat dig[k]:=n mod 16; n:=n div 16; incr(k); until n=0; print_the_digs(k); end; @ Old versions of \TeX\ needed a procedure called |print_ASCII| whose function is now subsumed by |print|. We retain the old name here as a possible aid to future software arch\ae ologists. @d print_ASCII == print @ Roman numerals are produced by the |print_roman_int| routine. Readers who like puzzles might enjoy trying to figure out how this tricky code works; therefore no explanation will be given. Notice that 1990 yields \.{mcmxc}, not \.{mxm}. @p procedure print_roman_int(@!n:integer); label exit; var j,@!k: pool_pointer; {mysterious indices into |str_pool|} @!u,@!v: nonnegative_integer; {mysterious numbers} begin j:=str_start["m2d5c2l5x2v5i"]; v:=1000; loop@+ begin while n>=v do begin print_char(so(str_pool[j])); n:=n-v; end; if n<=0 then return; {nonpositive input produces no output} k:=j+2; u:=v div (so(str_pool[k-1])-"0"); if str_pool[k-1]=si("2") then begin k:=k+2; u:=u div (so(str_pool[k-1])-"0"); end; if n+u>=v then begin print_char(so(str_pool[k])); n:=n+u; end else begin j:=j+2; v:=v div (so(str_pool[j-1])-"0"); end; end; exit:end; @ The |print| subroutine will not print a string that is still being created. The following procedure will. @p procedure print_current_string; {prints a yet-unmade string} var j:pool_pointer; {points to current character code} begin j:=str_start[str_ptr]; while j term_offset:=0; {the user's line ended with \<\rm return>} decr(selector); {prepare to echo the input} if last<>first then for k:=first to last-1 do print(buffer[k]); print_ln; incr(selector); {restore previous status} end; @* \[6] Reporting errors. When something anomalous is detected, \TeX\ typically does something like this: $$\vbox{\halign{#\hfil\cr |print_err("Something anomalous has been detected");|\cr |help3("This is the first line of my offer to help.")|\cr |("This is the second line. I'm trying to")|\cr |("explain the best way for you to proceed.");|\cr |error;|\cr}}$$ A two-line help message would be given using |help2|, etc.; these informal helps should use simple vocabulary that complements the words used in the official error message that was printed. (Outside the U.S.A., the help messages should preferably be translated into the local vernacular. Each line of help is at most 60 characters long, in the present implementation, so that |max_print_line| will not be exceeded.) The |print_err| procedure supplies a `\.!' before the official message, and makes sure that the terminal is awake if a stop is going to occur. The |error| procedure supplies a `\..' after the official message, then it shows the location of the error; and if |interaction=error_stop_mode|, it also enters into a dialog with the user, during which time the help message may be printed. @^system dependencies@> @ The global variable |interaction| has four settings, representing increasing amounts of user interaction: @d batch_mode=0 {omits all stops and omits terminal output} @d nonstop_mode=1 {omits all stops} @d scroll_mode=2 {omits error stops} @d error_stop_mode=3 {stops at every opportunity to interact} @d print_err(#)==begin if interaction=error_stop_mode then wake_up_terminal; print_nl("! "); print(#); end @= @!interaction:batch_mode..error_stop_mode; {current level of interaction} @ @=interaction:=error_stop_mode; @ \TeX\ is careful not to call |error| when the print |selector| setting might be unusual. The only possible values of |selector| at the time of error messages are \yskip\hang|no_print| (when |interaction=batch_mode| and |log_file| not yet open); \hang|term_only| (when |interaction>batch_mode| and |log_file| not yet open); \hang|log_only| (when |interaction=batch_mode| and |log_file| is open); \hang|term_and_log| (when |interaction>batch_mode| and |log_file| is open). @= if interaction=batch_mode then selector:=no_print@+else selector:=term_only @ A global variable |deletions_allowed| is set |false| if the |get_next| routine is active when |error| is called; this ensures that |get_next| and related routines like |get_token| will never be called recursively. @^recursion@> The global variable |history| records the worst level of error that has been detected. It has four possible values: |spotless|, |warning_issued|, |error_message_issued|, and |fatal_error_stop|. Another global variable, |error_count|, is increased by one when an |error| occurs without an interactive dialog, and it is reset to zero at the end of every paragraph. If |error_count| reaches 100, \TeX\ decides that there is no point in continuing further. @d spotless=0 {|history| value when nothing has been amiss yet} @d warning_issued=1 {|history| value when |begin_diagnostic| has been called} @d error_message_issued=2 {|history| value when |error| has been called} @d fatal_error_stop=3 {|history| value when termination was premature} @= @!deletions_allowed:boolean; {is it safe for |error| to call |get_token|?} @!history:spotless..fatal_error_stop; {has the source input been clean so far?} @!error_count:-1..100; {the number of scrolled errors since the last paragraph ended} @ The value of |history| is initially |fatal_error_stop|, but it will be changed to |spotless| if \TeX\ survives the initialization process. @= deletions_allowed:=true; error_count:=0; {|history| is initialized elsewhere} @ Since errors can be detected almost anywhere in \TeX, we want to declare the error procedures near the beginning of the program. But the error procedures in turn use some other procedures, which need to be declared |forward| before we get to |error| itself. It is possible for |error| to be called recursively if some error arises when |get_token| is being used to delete a token, and/or if some fatal error occurs while \TeX\ is trying to fix a non-fatal one. But such recursion @^recursion@> is never more than two levels deep. @= procedure@?normalize_selector; forward;@t\2@>@/ procedure@?get_token; forward;@t\2@>@/ procedure@?term_input; forward;@t\2@>@/ procedure@?show_context; forward;@t\2@>@/ procedure@?begin_file_reading; forward;@t\2@>@/ procedure@?open_log_file; forward;@t\2@>@/ procedure@?close_files_and_terminate; forward;@t\2@>@/ procedure@?clear_for_error_prompt; forward;@t\2@>@/ procedure@?give_err_help; forward;@t\2@>@/ @t\4\hskip-\fontdimen2\font@>@;@+@!debug@+procedure@?debug_help; forward;@;@+gubed @ Individual lines of help are recorded in the array |help_line|, which contains entries in positions |0..(help_ptr-1)|. They should be printed in reverse order, i.e., with |help_line[0]| appearing last. @d hlp1(#)==help_line[0]:=#;@+end @d hlp2(#)==help_line[1]:=#; hlp1 @d hlp3(#)==help_line[2]:=#; hlp2 @d hlp4(#)==help_line[3]:=#; hlp3 @d hlp5(#)==help_line[4]:=#; hlp4 @d hlp6(#)==help_line[5]:=#; hlp5 @d help0==help_ptr:=0 {sometimes there might be no help} @d help1==@+begin help_ptr:=1; hlp1 {use this with one help line} @d help2==@+begin help_ptr:=2; hlp2 {use this with two help lines} @d help3==@+begin help_ptr:=3; hlp3 {use this with three help lines} @d help4==@+begin help_ptr:=4; hlp4 {use this with four help lines} @d help5==@+begin help_ptr:=5; hlp5 {use this with five help lines} @d help6==@+begin help_ptr:=6; hlp6 {use this with six help lines} @= @!help_line:array[0..5] of str_number; {helps for the next |error|} @!help_ptr:0..6; {the number of help lines present} @!use_err_help:boolean; {should the |err_help| list be shown?} @ @= help_ptr:=0; use_err_help:=false; @ The |jump_out| procedure just cuts across all active procedure levels and goes to |end_of_TEX|. This is the only nontrivial |@!goto| statement in the whole program. It is used when there is no recovery from a particular error. Some \PASCAL\ compilers do not implement non-local |goto| statements. @^system dependencies@> In such cases the body of |jump_out| should simply be `|close_files_and_terminate|;\thinspace' followed by a call on some system procedure that quietly terminates the program. @= procedure jump_out; begin goto end_of_TEX; end; @ Here now is the general |error| routine. @= procedure error; {completes the job of error reporting} label continue,exit; var c:ASCII_code; {what the user types} @!s1,@!s2,@!s3,@!s4:integer; {used to save global variables when deleting tokens} begin if history; incr(error_count); if error_count=100 then begin print_nl("(That makes 100 errors; please try again.)"); @.That makes 100 errors...@> history:=fatal_error_stop; jump_out; end; @; exit:end; @ @= loop@+begin continue: clear_for_error_prompt; prompt_input("? "); @.?\relax@> if last=first then return; c:=buffer[first]; if c>="a" then c:=c+"A"-"a"; {convert to uppercase} @; end @ It is desirable to provide an `\.E' option here that gives the user an easy way to return from \TeX\ to the system editor, with the offending line ready to be edited. But such an extension requires some system wizardry, so the present implementation simply types out the name of the file that should be edited and the relevant line number. @^system dependencies@> There is a secret `\.D' option available when the debugging routines haven't been commented~out. @^debugging@> @= case c of "0","1","2","3","4","5","6","7","8","9": if deletions_allowed then @; @t\4\4@>@;@+@!debug "D": begin debug_help; goto continue;@+end;@+gubed@/ "E": if base_ptr>0 then begin print_nl("You want to edit file "); @.You want to edit file x@> print(input_stack[base_ptr].name_field); print(" at line "); print_int(line); interaction:=scroll_mode; jump_out; end; "H": @; "I":@; "Q","R","S":@; "X":begin interaction:=scroll_mode; jump_out; end; othercases do_nothing endcases;@/ @ @ @= begin print("Type to proceed, S to scroll future error messages,");@/ @.Type to proceed...@> print_nl("R to run without stopping, Q to run quietly,");@/ print_nl("I to insert something, "); if base_ptr>0 then print("E to edit your file,"); if deletions_allowed then print_nl("1 or ... or 9 to ignore the next 1 to 9 tokens of input,"); print_nl("H for help, X to quit."); end @ Here the author of \TeX\ apologizes for making use of the numerical relation between |"Q"|, |"R"|, |"S"|, and the desired interaction settings |batch_mode|, |nonstop_mode|, |scroll_mode|. @^Knuth, Donald Ervin@> @= begin error_count:=0; interaction:=batch_mode+c-"Q"; print("OK, entering "); case c of "Q":begin print_esc("batchmode"); decr(selector); end; "R":print_esc("nonstopmode"); "S":print_esc("scrollmode"); end; {there are no other cases} print("..."); print_ln; update_terminal; return; end @ When the following code is executed, |buffer[(first+1)..(last-1)]| may contain the material inserted by the user; otherwise another prompt will be given. In order to understand this part of the program fully, you need to be familiar with \TeX's input stacks. @= begin begin_file_reading; {enter a new syntactic level for terminal input} {now |state=mid_line|, so an initial blank space will count as a blank} if last>first+1 then begin loc:=first+1; buffer[first]:=" "; end else begin prompt_input("insert>"); loc:=first; @.insert>@> end; first:=last; cur_input.limit_field:=last-1; {no |end_line_char| ends this line} return; end @ We allow deletion of up to 99 tokens at a time. @= begin s1:=cur_tok; s2:=cur_cmd; s3:=cur_chr; s4:=align_state; align_state:=1000000; OK_to_interrupt:=false; if (last>first+1) and (buffer[first+1]>="0")and(buffer[first+1]<="9") then c:=c*10+buffer[first+1]-"0"*11 else c:=c-"0"; while c>0 do begin get_token; {one-level recursive call of |error| is possible} decr(c); end; cur_tok:=s1; cur_cmd:=s2; cur_chr:=s3; align_state:=s4; OK_to_interrupt:=true; help2("I have just deleted some text, as you asked.")@/ ("You can now delete more, or insert, or whatever."); show_context; goto continue; end @ @= begin if use_err_help then begin give_err_help; use_err_help:=false; end else begin if help_ptr=0 then help2("Sorry, I don't know how to help in this situation.")@/ @t\kern1em@>("Maybe you should try asking a human?"); repeat decr(help_ptr); print(help_line[help_ptr]); print_ln; until help_ptr=0; end; help4("Sorry, I already gave what help I could...")@/ ("Maybe you should try asking a human?")@/ ("An error might have occurred before I noticed any problems.")@/ ("``If all else fails, read the instructions.''");@/ goto continue; end @ @= if interaction>batch_mode then decr(selector); {avoid terminal output} if use_err_help then begin print_ln; give_err_help; end else while help_ptr>0 do begin decr(help_ptr); print_nl(help_line[help_ptr]); end; print_ln; if interaction>batch_mode then incr(selector); {re-enable terminal output} print_ln @ A dozen or so error messages end with a parenthesized integer, so we save a teeny bit of program space by declaring the following procedure: @p procedure int_error(@!n:integer); begin print(" ("); print_int(n); print_char(")"); error; end; @ In anomalous cases, the print selector might be in an unknown state; the following subroutine is called to fix things just enough to keep running a bit longer. @p procedure normalize_selector; begin if log_opened then selector:=term_and_log else selector:=term_only; if job_name=0 then open_log_file; if interaction=batch_mode then decr(selector); end; @ The following procedure prints \TeX's last words before dying. @d succumb==begin if interaction=error_stop_mode then interaction:=scroll_mode; {no more interaction} if log_opened then error; @!debug if interaction>batch_mode then debug_help;@+gubed@;@/ history:=fatal_error_stop; jump_out; {irrecoverable error} end @= procedure fatal_error(@!s:str_number); {prints |s|, and that's it} begin normalize_selector;@/ print_err("Emergency stop"); help1(s); succumb; @.Emergency stop@> end; @ Here is the most dreaded error message. @= procedure overflow(@!s:str_number;@!n:integer); {stop due to finiteness} begin normalize_selector; print_err("TeX capacity exceeded, sorry ["); @.TeX capacity exceeded ...@> print(s); print_char("="); print_int(n); print_char("]"); help2("If you really absolutely need more capacity,")@/ ("you can ask a wizard to enlarge me."); succumb; end; @ The program might sometime run completely amok, at which point there is no choice but to stop. If no previous error has been detected, that's bad news; a message is printed that is really intended for the \TeX\ maintenance person instead of the user (unless the user has been particularly diabolical). The index entries for `this can't happen' may help to pinpoint the problem. @^dry rot@> @= procedure confusion(@!s:str_number); {consistency check violated; |s| tells where} begin normalize_selector; if history help1("I'm broken. Please show this to someone who can fix can fix"); end else begin print_err("I can't go on meeting you like this"); @.I can't go on...@> help2("One of your faux pas seems to have wounded me deeply...")@/ ("in fact, I'm barely conscious. Please fix it and try again."); end; succumb; end; @ Users occasionally want to interrupt \TeX\ while it's running. If the \PASCAL\ runtime system allows this, one can implement a routine that sets the global variable |interrupt| to some nonzero value when such an interrupt is signalled. Otherwise there is probably at least a way to make |interrupt| nonzero using the \PASCAL\ debugger. @^system dependencies@> @^debugging@> @d check_interrupt==begin if interrupt<>0 then pause_for_instructions; end @= @!interrupt:integer; {should \TeX\ pause for instructions?} @!OK_to_interrupt:boolean; {should interrupts be observed?} @ @= interrupt:=0; OK_to_interrupt:=true; @ When an interrupt has been detected, the program goes into its highest interaction level and lets the user have nearly the full flexibility of the |error| routine. \TeX\ checks for interrupts only at times when it is safe to do this. @p procedure pause_for_instructions; begin if OK_to_interrupt then begin interaction:=error_stop_mode; if (selector=log_only)or(selector=no_print) then incr(selector); print_err("Interruption"); @.Interruption@> help3("You rang?")@/ ("Try to insert some instructions for me (e.g.,`I\showlists'),")@/ ("unless you just want to quit by typing `X'."); deletions_allowed:=false; error; deletions_allowed:=true; interrupt:=0; end; end; @* \[7] Arithmetic with scaled dimensions. The principal computations performed by \TeX\ are done entirely in terms of integers less than $2^{31}$ in magnitude; and divisions are done only when both dividend and divisor are nonnegative. Thus, the arithmetic specified in this program can be carried out in exactly the same way on a wide variety of computers, including some small ones. Why? Because the arithmetic calculations need to be spelled out precisely in order to guarantee that \TeX\ will produce identical output on different machines. If some quantities were rounded differently in different implementations, we would find that line breaks and even page breaks might occur in different places. Hence the arithmetic of \TeX\ has been designed with care, and systems that claim to be implementations of \TeX82 should follow precisely the @:TeX82}{\TeX82@> calculations as they appear in the present program. (Actually there are three places where \TeX\ uses |div| with a possibly negative numerator. These are harmless; see |div| in the index. Also if the user sets the \.{\\time} or the \.{\\year} to a negative value, some diagnostic information will involve negative-numerator division. The same remarks apply for |mod| as well as for |div|.) @ Here is a routine that calculates half of an integer, using an unambiguous convention with respect to signed odd numbers. @p function half(@!x:integer):integer; begin if odd(x) then half:=(x+1) div 2 else half:=x @!div 2; end; @ Fixed-point arithmetic is done on {\sl scaled integers\/} that are multiples of $2^{-16}$. In other words, a binary point is assumed to be sixteen bit positions from the right end of a binary computer word. @d unity == @'200000 {$2^{16}$, represents 1.00000} @d two == @'400000 {$2^{17}$, represents 2.00000} @= @!scaled = integer; {this type is used for scaled integers} @!nonnegative_integer=0..@'17777777777; {$0\L x<2^{31}$} @!small_number=0..63; {this type is self-explanatory} @ The following function is used to create a scaled integer from a given decimal fraction $(.d_0d_1\ldots d_{k-1})$, where |0<=k<=17|. The digit $d_i$ is given in |dig[i]|, and the calculation produces a correctly rounded result. @p function round_decimals(@!k:small_number) : scaled; {converts a decimal fraction} var a:integer; {the accumulator} begin a:=0; while k>0 do begin decr(k); a:=(a+dig[k]*two) div 10; end; round_decimals:=(a+1) div 2; end; @ Conversely, here is a procedure analogous to |print_int|. If the output of this procedure is subsequently read by \TeX\ and converted by the |round_decimals| routine above, it turns out that the original value will be reproduced exactly; the ``simplest'' such decimal number is output, but there is always at least one digit following the decimal point. The invariant relation in the \&{repeat} loop is that a sequence of decimal digits yet to be printed will yield the original number if and only if they form a fraction~$f$ in the range $s-\delta\L10\cdot2^{16}funity then s:=s+@'100000-50000; {round the last digit} print_char("0"+(s div unity)); s:=10*(s mod unity); delta:=delta*10; until s<=delta; end; @ Physical sizes that a \TeX\ user specifies for portions of documents are represented internally as scaled points. Thus, if we define an `sp' (scaled @^sp@> point) as a unit equal to $2^{-16}$ printer's points, every dimension inside of \TeX\ is an integer number of sp. There are exactly 4,736,286.72 sp per inch. Users are not allowed to specify dimensions larger than $2^{30}-1$ sp, which is a distance of about 18.892 feet (5.7583 meters); two such quantities can be added without overflow on a 32-bit computer. The present implementation of \TeX\ does not check for overflow when @^Overflow in arithmetic@> dimensions are added or subtracted. This could be done by inserting a few dozen tests of the form `\ignorespaces|if x>=@'10000000000 then @t\\{report\_overflow}@>|', but the chance of overflow is so remote that such tests do not seem worthwhile. \TeX\ needs to do only a few arithmetic operations on scaled quantities, other than addition and subtraction, and the following subroutines do most of the work. A single computation might use several subroutine calls, and it is desirable to avoid producing multiple error messages in case of arithmetic overflow; so the routines set the global variable |arith_error| to |true| instead of reporting errors directly to the user. Another global variable, |remainder|, holds the remainder after a division. @= @!arith_error:boolean; {has arithmetic overflow occurred recently?} @!remainder:scaled; {amount subtracted to get an exact division} @ The first arithmetical subroutine we need computes $nx+y$, where |x| and~|y| are |scaled| and |n| is an integer. We will also use it to multiply integers. @d nx_plus_y(#)==mult_and_add(#,@'7777777777) @d mult_integers(#)==mult_and_add(#,0,@'17777777777) @p function mult_and_add(@!n:integer;@!x,@!y,@!max_answer:scaled):scaled; begin if n<0 then begin negate(x); negate(n); end; if n=0 then mult_and_add:=y else if ((x<=(max_answer-y) div n)and(-x<=(max_answer+y) div n)) then mult_and_add:=n*x+y else begin arith_error:=true; mult_and_add:=0; end; end; @ We also need to divide scaled dimensions by integers. @p function x_over_n(@!x:scaled;@!n:integer):scaled; var negative:boolean; {should |remainder| be negated?} begin negative:=false; if n=0 then begin arith_error:=true; x_over_n:=0; remainder:=x; end else begin if n<0 then begin negate(x); negate(n); negative:=true; end; if x>=0 then begin x_over_n:=x div n; remainder:=x mod n; end else begin x_over_n:=-((-x) div n); remainder:=-((-x) mod n); end; end; if negative then negate(remainder); end; @ Then comes the multiplication of a scaled number by a fraction |n/d|, where |n| and |d| are nonnegative integers |<=@t$2^{16}$@>| and |d| is positive. It would be too dangerous to multiply by~|n| and then divide by~|d|, in separate operations, since overflow might well occur; and it would be too inaccurate to divide by |d| and then multiply by |n|. Hence this subroutine simulates 1.5-precision arithmetic. @p function xn_over_d(@!x:scaled; @!n,@!d:integer):scaled; var positive:boolean; {was |x>=0|?} @!t,@!u,@!v:nonnegative_integer; {intermediate quantities} begin if x>=0 then positive:=true else begin negate(x); positive:=false; end; t:=(x mod @'100000)*n; u:=(x div @'100000)*n+(t div @'100000); v:=(u mod d)*@'100000 + (t mod @'100000); if u div d>=@'100000 then arith_error:=true else u:=@'100000*(u div d) + (v div d); if positive then begin xn_over_d:=u; remainder:=v mod d; end else begin xn_over_d:=-u; remainder:=-(v mod d); end; end; @ The next subroutine is used to compute the ``badness'' of glue, when a total~|t| is supposed to be made from amounts that sum to~|s|. According to {\sl The \TeX book}, the badness of this situation is $100(t/s)^3$; however, badness is simply a heuristic, so we need not squeeze out the last drop of accuracy when computing it. All we really want is an approximation that has similar properties. @:TeXbook}{\sl The \TeX book@> The actual method used to compute the badness is easier to read from the program than to describe in words. It produces an integer value that is a reasonably close approximation to $100(t/s)^3$, and all implementations of \TeX\ should use precisely this method. Any badness of $2^{13}$ or more is treated as infinitely bad, and represented by 10000. It is not difficult to prove that $$\hbox{|badness(t+1,s)>=badness(t,s) >=badness(t,s+1)|}.$$ The badness function defined here is capable of computing at most 1095 distinct values, but that is plenty. @d inf_bad = 10000 {infinitely bad value} @p function badness(@!t,@!s:scaled):halfword; {compute badness, given |t>=0|} var r:integer; {approximation to $\alpha t/s$, where $\alpha^3\approx 100\cdot2^{18}$} begin if t=0 then badness:=0 else if s<=0 then badness:=inf_bad else begin if t<=7230584 then r:=(t*297) div s {$297^3=99.94\times2^{18}$} else if s>=1663497 then r:=t div (s div 297) else r:=t; if r>1290 then badness:=inf_bad {$1290^3<2^{31}<1291^3$} else badness:=(r*r*r+@'400000) div @'1000000; end; {that was $r^3/2^{18}$, rounded to the nearest integer} end; @ When \TeX\ ``packages'' a list into a box, it needs to calculate the proportionality ratio by which the glue inside the box should stretch or shrink. This calculation does not affect \TeX's decision making, so the precise details of rounding, etc., in the glue calculation are not of critical importance for the consistency of results on different computers. We shall use the type |glue_ratio| for such proportionality ratios. A glue ratio should take the same amount of memory as an |integer| (usually 32 bits) if it is to blend smoothly with \TeX's other data structures. Thus |glue_ratio| should be equivalent to |short_real| in some implementations of \PASCAL. Alternatively, it is possible to deal with glue ratios using nothing but fixed-point arithmetic; see {\sl TUGboat \bf3},1 (February 1982), 10--27. (But the routines cited there must be modified to allow negative glue ratios.) @^system dependencies@> @d set_glue_ratio_zero(#) == #:=0.0 {store the representation of zero ratio} @d set_glue_ratio_one(#) == #:=1.0 {store the representation of unit ratio} @d float(#) == # {convert from |glue_ratio| to type |real|} @d unfloat(#) == # {convert from |real| to type |glue_ratio|} @d float_constant(#) == #.0 {convert |integer| constant to |real|} @= @!glue_ratio=real; {one-word representation of a glue expansion factor} @* \[8] Packed data. In order to make efficient use of storage space, \TeX\ bases its major data structures on a |memory_word|, which contains either a (signed) integer, possibly scaled, or an (unsigned) |glue_ratio|, or a small number of fields that are one half or one quarter of the size used for storing integers. If |x| is a variable of type |memory_word|, it contains up to four fields that can be referred to as follows: $$\vbox{\halign{\hfil#&#\hfil&#\hfil\cr |x|&.|int|&(an |integer|)\cr |x|&.|sc|\qquad&(a |scaled| integer)\cr |x|&.|gr|&(a |glue_ratio|)\cr |x.hh.lh|, |x.hh|&.|rh|&(two halfword fields)\cr |x.hh.b0|, |x.hh.b1|, |x.hh|&.|rh|&(two quarterword fields, one halfword field)\cr |x.qqqq.b0|, |x.qqqq.b1|, |x.qqqq|&.|b2|, |x.qqqq.b3|\hskip-100pt &\qquad\qquad\qquad(four quarterword fields)\cr}}$$ This is somewhat cumbersome to write, and not very readable either, but macros will be used to make the notation shorter and more transparent. The \PASCAL\ code below gives a formal definition of |memory_word| and its subsidiary types, using packed variant records. \TeX\ makes no assumptions about the relative positions of the fields within a word. Since we are assuming 32-bit integers, a halfword must contain at least 16 bits, and a quarterword must contain at least 8 bits. @^system dependencies@> But it doesn't hurt to have more bits; for example, with enough 36-bit words you might be able to have |mem_max| as large as 262142, which is eight times as much memory as anybody had during the first four years of \TeX's existence. N.B.: Valuable memory space will be dreadfully wasted unless \TeX\ is compiled by a \PASCAL\ that packs all of the |memory_word| variants into the space of a single integer. This means, for example, that |glue_ratio| words should be |short_real| instead of |real| on some computers. Some \PASCAL\ compilers will pack an integer whose subrange is `|0..255|' into an eight-bit field, but others insist on allocating space for an additional sign bit; on such systems you can get 256 values into a quarterword only if the subrange is `|-128..127|'. The present implementation tries to accommodate as many variations as possible, so it makes few assumptions. If integers having the subrange `|min_quarterword..max_quarterword|' can be packed into a quarterword, and if integers having the subrange `|min_halfword..max_halfword|' can be packed into a halfword, everything should work satisfactorily. It is usually most efficient to have |min_quarterword=min_halfword=0|, so one should try to achieve this unless it causes a severe problem. The values defined here are recommended for most 32-bit computers. @d min_quarterword=0 {smallest allowable value in a |quarterword|} @d max_quarterword=255 {largest allowable value in a |quarterword|} @d min_halfword==0 {smallest allowable value in a |halfword|} @d max_halfword==65535 {largest allowable value in a |halfword|} @ Here are the inequalities that the quarterword and halfword values must satisfy (or rather, the inequalities that they mustn't satisfy): @= init if (mem_min<>mem_bot)or(mem_max<>mem_top) then bad:=10;@+tini@;@/ if (mem_min>mem_bot)or(mem_max0)or(max_quarterword<127) then bad:=11; if (min_halfword>0)or(max_halfword<32767) then bad:=12; if (min_quarterwordmax_halfword) then bad:=13; if (mem_min=max_halfword)or@| (mem_bot-mem_min>max_halfword+1) then bad:=14; if (font_basemax_quarterword) then bad:=15; if font_max>font_base+256 then bad:=16; if (save_size>max_halfword)or(max_strings>max_halfword) then bad:=17; if buf_size>max_halfword then bad:=18; if max_quarterword-min_quarterword<255 then bad:=19; @ The operation of adding or subtracting |min_quarterword| occurs quite frequently in \TeX, so it is convenient to abbreviate this operation by using the macros |qi| and |qo| for input and output to and from quarterword format. The inner loop of \TeX\ will run faster with respect to compilers that don't optimize expressions like `|x+0|' and `|x-0|', if these macros are simplified in the obvious way when |min_quarterword=0|. @^inner loop@>@^system dependencies@> @d qi(#)==#+min_quarterword {to put an |eight_bits| item into a quarterword} @d qo(#)==#-min_quarterword {to take an |eight_bits| item out of a quarterword} @d hi(#)==#+min_halfword {to put a sixteen-bit item into a halfword} @d ho(#)==#-min_halfword {to take a sixteen-bit item from a halfword} @ The reader should study the following definitions closely: @^system dependencies@> @d sc==int {|scaled| data is equivalent to |integer|} @= @!quarterword = min_quarterword..max_quarterword; {1/4 of a word} @!halfword=min_halfword..max_halfword; {1/2 of a word} @!two_choices = 1..2; {used when there are two variants in a record} @!four_choices = 1..4; {used when there are four variants in a record} @!two_halves = packed record@;@/ @!rh:halfword; case two_choices of 1: (@!lh:halfword); 2: (@!b0:quarterword; @!b1:quarterword); end; @!four_quarters = packed record@;@/ @!b0:quarterword; @!b1:quarterword; @!b2:quarterword; @!b3:quarterword; end; @!memory_word = record@;@/ case four_choices of 1: (@!int:integer); 2: (@!gr:glue_ratio); 3: (@!hh:two_halves); 4: (@!qqqq:four_quarters); end; @!word_file = file of memory_word; @ When debugging, we may want to print a |memory_word| without knowing what type it is; so we print it in all modes. @^dirty \PASCAL@>@^debugging@> @p @!debug procedure print_word(@!w:memory_word); {prints |w| in all ways} begin print_int(w.int); print_char(" ");@/ print_scaled(w.sc); print_char(" ");@/ print_scaled(round(unity*float(w.gr))); print_ln;@/ @^real multiplication@> print_int(w.hh.lh); print_char("="); print_int(w.hh.b0); print_char(":"); print_int(w.hh.b1); print_char(";"); print_int(w.hh.rh); print_char(" ");@/ print_int(w.qqqq.b0); print_char(":"); print_int(w.qqqq.b1); print_char(":"); print_int(w.qqqq.b2); print_char(":"); print_int(w.qqqq.b3); end; gubed @* \[9] Dynamic memory allocation. The \TeX\ system does nearly all of its own memory allocation, so that it can readily be transported into environments that do not have automatic facilities for strings, garbage collection, etc., and so that it can be in control of what error messages the user receives. The dynamic storage requirements of \TeX\ are handled by providing a large array |mem| in which consecutive blocks of words are used as nodes by the \TeX\ routines. Pointer variables are indices into this array, or into another array called |eqtb| that will be explained later. A pointer variable might also be a special flag that lies outside the bounds of |mem|, so we allow pointers to assume any |halfword| value. The minimum halfword value represents a null pointer. \TeX\ does not assume that |mem[null]| exists. @d pointer==halfword {a flag or a location in |mem| or |eqtb|} @d null==min_halfword {the null pointer} @= @!temp_ptr:pointer; {a pointer variable for occasional emergency use} @ The |mem| array is divided into two regions that are allocated separately, but the dividing line between these two regions is not fixed; they grow together until finding their ``natural'' size in a particular job. Locations less than or equal to |lo_mem_max| are used for storing variable-length records consisting of two or more words each. This region is maintained using an algorithm similar to the one described in exercise 2.5--19 of {\sl The Art of Computer Programming}. However, no size field appears in the allocated nodes; the program is responsible for knowing the relevant size when a node is freed. Locations greater than or equal to |hi_mem_min| are used for storing one-word records; a conventional \.{AVAIL} stack is used for allocation in this region. Locations of |mem| between |mem_bot| and |mem_top| may be dumped as part of preloaded format files, by the \.{INITEX} preprocessor. @.INITEX@> Production versions of \TeX\ may extend the memory at both ends in order to provide more space; locations between |mem_min| and |mem_bot| are always used for variable-size nodes, and locations between |mem_top| and |mem_max| are always used for single-word nodes. The key pointers that govern |mem| allocation have a prescribed order: $$\advance\thickmuskip-2mu \hbox{|null<=mem_min<=mem_bot= @!mem : array[mem_min..mem_max] of memory_word; {the big dynamic storage area} @!lo_mem_max : pointer; {the largest location of variable-size memory in use} @!hi_mem_min : pointer; {the smallest location of one-word memory in use} @ In order to study the memory requirements of particular applications, it is possible to prepare a version of \TeX\ that keeps track of current and maximum memory usage. When code between the delimiters |@!stat| $\ldots$ |tats| is not ``commented out,'' \TeX\ will run a bit slower but it will report these statistics when |tracing_stats| is sufficiently large. @= @!var_used, @!dyn_used : integer; {how much memory is in use} @ Let's consider the one-word memory region first, since it's the simplest. The pointer variable |mem_end| holds the highest-numbered location of |mem| that has ever been used. The free locations of |mem| that occur between |hi_mem_min| and |mem_end|, inclusive, are of type |two_halves|, and we write |info(p)| and |link(p)| for the |lh| and |rh| fields of |mem[p]| when it is of this type. The single-word free locations form a linked list $$|avail|,\;\hbox{|link(avail)|},\;\hbox{|link(link(avail))|},\;\ldots$$ terminated by |null|. @d link(#) == mem[#].hh.rh {the |link| field of a memory word} @d info(#) == mem[#].hh.lh {the |info| field of a memory word} @= @!avail : pointer; {head of the list of available one-word nodes} @!mem_end : pointer; {the last one-word node used in |mem|} @ If memory is exhausted, it might mean that the user has forgotten a right brace. We will define some procedures later that try to help pinpoint the trouble. @p @@/ @ @ The function |get_avail| returns a pointer to a new one-word node whose |link| field is null. However, \TeX\ will halt if there is no more room left. @^inner loop@> If the available-space list is empty, i.e., if |avail=null|, we try first to increase |mem_end|. If that cannot be done, i.e., if |mem_end=mem_max|, we try to decrease |hi_mem_min|. If that cannot be done, i.e., if |hi_mem_min=lo_mem_max+1|, we have to quit. @p function get_avail : pointer; {single-word node allocation} var p:pointer; {the new node being got} begin p:=avail; {get top location in the |avail| stack} if p<>null then avail:=link(avail) {and pop it off} else if mem_end end; end; link(p):=null; {provide an oft-desired initialization of the new node} @!stat incr(dyn_used);@+tats@;{maintain statistics} get_avail:=p; end; @ Conversely, a one-word node is recycled by calling |free_avail|. This routine is part of \TeX's ``inner loop,'' so we want it to be fast. @^inner loop@> @d free_avail(#)== {single-word node liberation} begin link(#):=avail; avail:=#; @!stat decr(dyn_used);@+tats@/ end @ There's also a |fast_get_avail| routine, which saves the procedure-call overhead at the expense of extra programming. This routine is used in the places that would otherwise account for the most calls of |get_avail|. @^inner loop@> @d fast_get_avail(#)==@t@>@;@/ begin #:=avail; {avoid |get_avail| if possible, to save time} if #=null then #:=get_avail else begin avail:=link(#); link(#):=null; @!stat incr(dyn_used);@+tats@/ end; end @ The procedure |flush_list(p)| frees an entire linked list of one-word nodes that starts at position |p|. @^inner loop@> @p procedure flush_list(@!p:pointer); {makes list of single-word nodes available} var @!q,@!r:pointer; {list traversers} begin if p<>null then begin r:=p; repeat q:=r; r:=link(r); @!stat decr(dyn_used);@+tats@/ until r=null; {now |q| is the last node on the list} link(q):=avail; avail:=p; end; end; @ The available-space list that keeps track of the variable-size portion of |mem| is a nonempty, doubly-linked circular list of empty nodes, pointed to by the roving pointer |rover|. Each empty node has size 2 or more; the first word contains the special value |max_halfword| in its |link| field and the size in its |info| field; the second word contains the two pointers for double linking. Each nonempty node also has size 2 or more. Its first word is of type |two_halves|\kern-1pt, and its |link| field is never equal to |max_halfword|. Otherwise there is complete flexibility with respect to the contents of its other fields and its other words. (We require |mem_max= @!rover : pointer; {points to some node in the list of empties} @ A call to |get_node| with argument |s| returns a pointer to a new node of size~|s|, which must be 2~or more. The |link| field of the first word of this new node is set to null. An overflow stop occurs if no suitable space exists. If |get_node| is called with $s=2^{30}$, it simply merges adjacent free areas and returns the value |max_halfword|. @p function get_node(@!s:integer):pointer; {variable-size node allocation} label found,exit,restart; var p:pointer; {the node currently under inspection} @!q:pointer; {the node physically after node |p|} @!r:integer; {the newly allocated node, or a candidate for this honor} @!t:integer; {temporary register} begin restart: p:=rover; {start at some free node in the ring} repeat @; @^inner loop@> p:=rlink(p); {move to the next node in the ring} until p=rover; {repeat until the whole list has been traversed} if s=@'10000000000 then begin get_node:=max_halfword; return; end; if lo_mem_max+2; overflow("main memory size",mem_max+1-mem_min); {sorry, nothing satisfactory is left} @:TeX capacity exceeded main memory size}{\quad main memory size@> found: link(r):=null; {this node is now nonempty} @!stat var_used:=var_used+s; {maintain usage statistics} tats@;@/ get_node:=r; exit:end; @ The lower part of |mem| grows by 1000 words at a time, unless we are very close to going under. When it grows, we simply link a new node into the available-space list. This method of controlled growth helps to keep the |mem| usage consecutive when \TeX\ is implemented on ``virtual memory'' systems. @^virtual memory@> @= begin if hi_mem_min-lo_mem_max>=1998 then t:=lo_mem_max+1000 else t:=lo_mem_max+1+(hi_mem_min-lo_mem_max) div 2; {|lo_mem_max+2<=tmem_bot+max_halfword then t:=mem_bot+max_halfword; rlink(q):=rover; llink(q):=p; link(q):=empty_flag; node_size(q):=t-lo_mem_max;@/ lo_mem_max:=t; link(lo_mem_max):=null; info(lo_mem_max):=null; rover:=q; goto restart; end @ Empirical tests show that the routine in this section performs a node-merging operation about 0.75 times per allocation, on the average, after which it finds that |r>p+1| about 95\pct! of the time. @= q:=p+node_size(p); {find the physical successor} @^inner loop@> while is_empty(q) do {merge node |p| with node |q|} begin t:=rlink(q); if q=rover then rover:=t; llink(t):=llink(q); rlink(llink(q)):=t;@/ q:=q+node_size(q); end; r:=q-s; if r>p+1 then @; if r=p then if rlink(p)<>p then @; node_size(p):=q-p {reset the size in case it grew} @ @= begin node_size(p):=r-p; {store the remaining size} @^inner loop@> rover:=p; {start searching here next time} goto found; end @ Here we delete node |p| from the ring, and let |rover| rove around. @= begin rover:=rlink(p); t:=llink(p); llink(rover):=t; rlink(t):=rover; goto found; end @ Conversely, when some variable-size node |p| of size |s| is no longer needed, the operation |free_node(p,s)| will make its words available, by inserting |p| as a new empty node just before where |rover| now points. @^inner loop@> @p procedure free_node(@!p:pointer; @!s:halfword); {variable-size node liberation} var q:pointer; {|llink(rover)|} begin node_size(p):=s; link(p):=empty_flag; q:=llink(rover); llink(p):=q; rlink(p):=rover; {set both links} llink(rover):=p; rlink(q):=p; {insert |p| into the ring} @!stat var_used:=var_used-s;@+tats@;{maintain statistics} end; @ Just before \.{INITEX} writes out the memory, it sorts the doubly linked available space list. The list is probably very short at such times, so a simple insertion sort is used. The smallest available location will be pointed to by |rover|, the next-smallest by |rlink(rover)|, etc. @p @!init procedure sort_avail; {sorts the available variable-size nodes by location} var p,@!q,@!r: pointer; {indices into |mem|} @!old_rover:pointer; {initial |rover| setting} begin p:=get_node(@'10000000000); {merge adjacent free areas} p:=rlink(rover); rlink(rover):=max_halfword; old_rover:=rover; while p<>old_rover do @; p:=rover; while rlink(p)<>max_halfword do begin llink(rlink(p)):=p; p:=rlink(p); end; rlink(p):=rover; llink(rover):=p; end; tini @ The following |while| loop is guaranteed to terminate, since the list that starts at |rover| ends with |max_halfword| during the sorting procedure. @= if p@^Chinese characters@>@^Japanese characters@> and styles of type. It is suggested that Chinese and Japanese fonts be handled by representing such characters in two consecutive |char_node| entries: The first of these has |font=font_base|, and its |link| points to the second; the second identifies the font and the character dimensions. The saving feature about oriental characters is that most of them have the same box dimensions. The |character| field of the first |char_node| is a ``\\{charext}'' that distinguishes between graphic symbols whose dimensions are identical for typesetting purposes. (See the \MF\ manual.) Such an extension of \TeX\ would not be difficult; further details are left to the reader. In order to make sure that the |character| code fits in a quarterword, \TeX\ adds the quantity |min_quarterword| to the actual code. Character nodes appear only in horizontal lists, never in vertical lists. @d is_char_node(#) == (#>=hi_mem_min) {does the argument point to a |char_node|?} @d font == type {the font code in a |char_node|} @d character == subtype {the character code in a |char_node|} @ An |hlist_node| stands for a box that was made from a horizontal list. Each |hlist_node| is seven words long, and contains the following fields (in addition to the mandatory |type| and |link|, which we shall not mention explicitly when discussing the other node types): The |height| and |width| and |depth| are scaled integers denoting the dimensions of the box. There is also a |shift_amount| field, a scaled integer indicating how much this box should be lowered (if it appears in a horizontal list), or how much it should be moved to the right (if it appears in a vertical list). There is a |list_ptr| field, which points to the beginning of the list from which this box was fabricated; if |list_ptr| is |null|, the box is empty. Finally, there are three fields that represent the setting of the glue: |glue_set(p)| is a word of type |glue_ratio| that represents the proportionality constant for glue setting; |glue_sign(p)| is |stretching| or |shrinking| or |normal| depending on whether or not the glue should stretch or shrink or remain rigid; and |glue_order(p)| specifies the order of infinity to which glue setting applies (|normal|, |fil|, |fill|, or |filll|). The |subtype| field is not used. @d hlist_node=0 {|type| of hlist nodes} @d box_node_size=7 {number of words to allocate for a box node} @d width_offset=1 {position of |width| field in a box node} @d depth_offset=2 {position of |depth| field in a box node} @d height_offset=3 {position of |height| field in a box node} @d width(#) == mem[#+width_offset].sc {width of the box, in sp} @d depth(#) == mem[#+depth_offset].sc {depth of the box, in sp} @d height(#) == mem[#+height_offset].sc {height of the box, in sp} @d shift_amount(#) == mem[#+4].sc {repositioning distance, in sp} @d list_offset=5 {position of |list_ptr| field in a box node} @d list_ptr(#) == link(#+list_offset) {beginning of the list inside the box} @d glue_order(#) == subtype(#+list_offset) {applicable order of infinity} @d glue_sign(#) == type(#+list_offset) {stretching or shrinking} @d normal=0 {the most common case when several cases are named} @d stretching = 1 {glue setting applies to the stretch components} @d shrinking = 2 {glue setting applies to the shrink components} @d glue_offset = 6 {position of |glue_set| in a box node} @d glue_set(#) == mem[#+glue_offset].gr {a word of type |glue_ratio| for glue setting} @ The |new_null_box| function returns a pointer to an |hlist_node| in which all subfields have the values corresponding to `\.{\\hbox\{\}}'. The |subtype| field is set to |min_quarterword|, since that's the desired |span_count| value if this |hlist_node| is changed to an |unset_node|. @p function new_null_box:pointer; {creates a new box node} var p:pointer; {the new node} begin p:=get_node(box_node_size); type(p):=hlist_node; subtype(p):=min_quarterword; width(p):=0; depth(p):=0; height(p):=0; shift_amount(p):=0; list_ptr(p):=null; glue_sign(p):=normal; glue_order(p):=normal; set_glue_ratio_zero(glue_set(p)); new_null_box:=p; end; @ A |vlist_node| is like an |hlist_node| in all respects except that it contains a vertical list. @d vlist_node=1 {|type| of vlist nodes} @ A |rule_node| stands for a solid black rectangle; it has |width|, |depth|, and |height| fields just as in an |hlist_node|. However, if any of these dimensions is $-2^{30}$, the actual value will be determined by running the rule up to the boundary of the innermost enclosing box. This is called a ``running dimension.'' The |width| is never running in an hlist; the |height| and |depth| are never running in a~vlist. @d rule_node=2 {|type| of rule nodes} @d rule_node_size=4 {number of words to allocate for a rule node} @d null_flag==-@'10000000000 {$-2^{30}$, signifies a missing item} @d is_running(#) == (#=null_flag) {tests for a running dimension} @ A new rule node is delivered by the |new_rule| function. It makes all the dimensions ``running,'' so you have to change the ones that are not allowed to run. @p function new_rule:pointer; var p:pointer; {the new node} begin p:=get_node(rule_node_size); type(p):=rule_node; subtype(p):=0; {the |subtype| is not used} width(p):=null_flag; depth(p):=null_flag; height(p):=null_flag; new_rule:=p; end; @ Insertions are represented by |ins_node| records, where the |subtype| indicates the corresponding box number. For example, `\.{\\insert 250}' leads to an |ins_node| whose |subtype| is |250+min_quarterword|. The |height| field of an |ins_node| is slightly misnamed; it actually holds the natural height plus depth of the vertical list being inserted. The |depth| field holds the |split_max_depth| to be used in case this insertion is split, and the |split_top_ptr| points to the corresponding |split_top_skip|. The |float_cost| field holds the |floating_penalty| that will be used if this insertion floats to a subsequent page after a split insertion of the same class. There is one more field, the |ins_ptr|, which points to the beginning of the vlist for the insertion. @d ins_node=3 {|type| of insertion nodes} @d ins_node_size=5 {number of words to allocate for an insertion} @d float_cost(#)==mem[#+1].int {the |floating_penalty| to be used} @d ins_ptr(#)==info(#+4) {the vertical list to be inserted} @d split_top_ptr(#)==link(#+4) {the |split_top_skip| to be used} @ A |mark_node| has a |mark_ptr| field that points to the reference count of a token list that contains the user's \.{\\mark} text. This field occupies a full word instead of a halfword, because there's nothing to put in the other halfword; it is easier in \PASCAL\ to use the full word than to risk leaving garbage in the unused half. @d mark_node=4 {|type| of a mark node} @d small_node_size=2 {number of words to allocate for most node types} @d mark_ptr(#)==mem[#+1].int {head of the token list for a mark} @ An |adjust_node|, which occurs only in horizontal lists, specifies material that will be moved out into the surrounding vertical list; i.e., it is used to implement \TeX's `\.{\\vadjust}' operation. The |adjust_ptr| field points to the vlist containing this material. @d adjust_node=5 {|type| of an adjust node} @d adjust_ptr==mark_ptr {vertical list to be moved out of horizontal list} @ A |ligature_node|, which occurs only in horizontal lists, specifies a character that was fabricated from the interaction of two or more actual characters. The second word of the node, which is called the |lig_char| word, contains |font| and |character| fields just as in a |char_node|. The characters that generated the ligature have not been forgotten, since they are needed for diagnostic messages and for hyphenation; the |lig_ptr| field points to a linked list of character nodes for all original characters that have been deleted. (This list might be empty if the characters that generated the ligature were retained in other nodes.) The |subtype| field is 0, plus 2 and/or 1 if the original source of the ligature included implicit left and/or right boundaries. @d ligature_node=6 {|type| of a ligature node} @d lig_char(#)==#+1 {the word where the ligature is to be found} @d lig_ptr(#)==link(lig_char(#)) {the list of characters} @ The |new_ligature| function creates a ligature node having given contents of the |font|, |character|, and |lig_ptr| fields. We also have a |new_lig_item| function, which returns a two-word node having a given |character| field. Such nodes are used for temporary processing as ligatures are being created. @p function new_ligature(@!f,@!c:quarterword; @!q:pointer):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=ligature_node; font(lig_char(p)):=f; character(lig_char(p)):=c; lig_ptr(p):=q; subtype(p):=0; new_ligature:=p; end; @# function new_lig_item(@!c:quarterword):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); character(p):=c; lig_ptr(p):=null; new_lig_item:=p; end; @ A |disc_node|, which occurs only in horizontal lists, specifies a ``dis\-cretion\-ary'' line break. If such a break occurs at node |p|, the text that starts at |pre_break(p)| will precede the break, the text that starts at |post_break(p)| will follow the break, and text that appears in the next |replace_count(p)| nodes will be ignored. For example, an ordinary discretionary hyphen, indicated by `\.{\\-}', yields a |disc_node| with |pre_break| pointing to a |char_node| containing a hyphen, |post_break=null|, and |replace_count=0|. All three of the discretionary texts must be lists that consist entirely of character, kern, box, rule, and ligature nodes. If |pre_break(p)=null|, the |ex_hyphen_penalty| will be charged for this break. Otherwise the |hyphen_penalty| will be charged. The texts will actually be substituted into the list by the line-breaking algorithm if it decides to make the break, and the discretionary node will disappear at that time; thus, the output routine sees only discretionaries that were not chosen. @d disc_node=7 {|type| of a discretionary node} @d replace_count==subtype {how many subsequent nodes to replace} @d pre_break==llink {text that precedes a discretionary break} @d post_break==rlink {text that follows a discretionary break} @p function new_disc:pointer; {creates an empty |disc_node|} var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=disc_node; replace_count(p):=0; pre_break(p):=null; post_break(p):=null; new_disc:=p; end; @ A |whatsit_node| is a wild card reserved for extensions to \TeX. The |subtype| field in its first word says what `\\{whatsit}' it is, and implicitly determines the node size (which must be 2 or more) and the format of the remaining words. When a |whatsit_node| is encountered in a list, special actions are invoked; knowledgeable people who are careful not to mess up the rest of \TeX\ are able to make \TeX\ do new things by adding code at the end of the program. For example, there might be a `\TeX nicolor' extension to specify different colors of ink, @^extensions to \TeX@> and the whatsit node might contain the desired parameters. The present implementation of \TeX\ treats the features associated with `\.{\\write}' and `\.{\\special}' as if they were extensions, in order to illustrate how such routines might be coded. We shall defer further discussion of extensions until the end of this program. @d whatsit_node=8 {|type| of special extension nodes} @ A |math_node|, which occurs only in horizontal lists, appears before and after mathematical formulas. The |subtype| field is |before| before the formula and |after| after it. There is a |width| field, which represents the amount of surrounding space inserted by \.{\\mathsurround}. @d math_node=9 {|type| of a math node} @d before=0 {|subtype| for math node that introduces a formula} @d after=1 {|subtype| for math node that winds up a formula} @p function new_math(@!w:scaled;@!s:small_number):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=math_node; subtype(p):=s; width(p):=w; new_math:=p; end; @ \TeX\ makes use of the fact that |hlist_node|, |vlist_node|, |rule_node|, |ins_node|, |mark_node|, |adjust_node|, |ligature_node|, |disc_node|, |whatsit_node|, and |math_node| are at the low end of the type codes, by permitting a break at glue in a list if and only if the |type| of the previous node is less than |math_node|. Furthermore, a node is discarded after a break if its type is |math_node| or~more. @d precedes_break(#)==(type(#) representing |null| plus the number of glue nodes that point to it (less one). Note that the reference count appears in the same position as the |link| field in list nodes; this is the field that is initialized to |null| when a node is allocated, and it is also the field that is flagged by |empty_flag| in empty nodes. Glue specifications also contain three |scaled| fields, for the |width|, |stretch|, and |shrink| dimensions. Finally, there are two one-byte fields called |stretch_order| and |shrink_order|; these contain the orders of infinity (|normal|, |fil|, |fill|, or |filll|) corresponding to the stretch and shrink values. @d glue_spec_size=4 {number of words to allocate for a glue specification} @d glue_ref_count(#) == link(#) {reference count of a glue specification} @d stretch(#) == mem[#+2].sc {the stretchability of this glob of glue} @d shrink(#) == mem[#+3].sc {the shrinkability of this glob of glue} @d stretch_order == type {order of infinity for stretching} @d shrink_order == subtype {order of infinity for shrinking} @d fil=1 {first-order infinity} @d fill=2 {second-order infinity} @d filll=3 {third-order infinity} @= @!glue_ord=normal..filll; {infinity to the 0, 1, 2, or 3 power} @ Here is a function that returns a pointer to a copy of a glue spec. The reference count in the copy is |null|, because there is assumed to be exactly one reference to the new specification. @p function new_spec(@!p:pointer):pointer; {duplicates a glue specification} var q:pointer; {the new spec} begin q:=get_node(glue_spec_size);@/ mem[q]:=mem[p]; glue_ref_count(q):=null;@/ width(q):=width(p); stretch(q):=stretch(p); shrink(q):=shrink(p); new_spec:=q; end; @ And here's a function that creates a glue node for a given parameter identified by its code number; for example, |new_param_glue(line_skip_code)| returns a pointer to a glue node for the current \.{\\lineskip}. @p function new_param_glue(@!n:small_number):pointer; var p:pointer; {the new node} @!q:pointer; {the glue specification} begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=n+1; leader_ptr(p):=null;@/ q:=@@t@>; glue_ptr(p):=q; incr(glue_ref_count(q)); new_param_glue:=p; end; @ Glue nodes that are more or less anonymous are created by |new_glue|, whose argument points to a glue specification. @p function new_glue(@!q:pointer):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=glue_node; subtype(p):=normal; leader_ptr(p):=null; glue_ptr(p):=q; incr(glue_ref_count(q)); new_glue:=p; end; @ Still another subroutine is needed: this one is sort of a combination of |new_param_glue| and |new_glue|. It creates a glue node for one of the current glue parameters, but it makes a fresh copy of the glue specification, since that specification will probably be subject to change, while the parameter will stay put. The global variable |temp_ptr| is set to the address of the new spec. @p function new_skip_param(@!n:small_number):pointer; var p:pointer; {the new node} begin temp_ptr:=new_spec(@); p:=new_glue(temp_ptr); glue_ref_count(temp_ptr):=null; subtype(p):=n+1; new_skip_param:=p; end; @ A |kern_node| has a |width| field to specify a (normally negative) amount of spacing. This spacing correction appears in horizontal lists between letters like A and V when the font designer said that it looks better to move them closer together or further apart. A kern node can also appear in a vertical list, when its `|width|' denotes additional spacing in the vertical direction. The |subtype| is either |normal| (for kerns inserted from font information or math mode calculations) or |explicit| (for kerns inserted from \.{\\kern} and \.{\\/} commands) or |acc_kern| (for kerns inserted from non-math accents) or |mu_glue| (for kerns inserted from \.{\\mkern} specifications in math formulas). @d kern_node=11 {|type| of a kern node} @d explicit=1 {|subtype| of kern nodes from \.{\\kern} and \.{\\/}} @d acc_kern=2 {|subtype| of kern nodes from accents} @ The |new_kern| function creates a kern node having a given width. @p function new_kern(@!w:scaled):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=kern_node; subtype(p):=normal; width(p):=w; new_kern:=p; end; @ A |penalty_node| specifies the penalty associated with line or page breaking, in its |penalty| field. This field is a fullword integer, but the full range of integer values is not used: Any penalty |>=10000| is treated as infinity, and no break will be allowed for such high values. Similarly, any penalty |<=-10000| is treated as negative infinity, and a break will be forced. @d penalty_node=12 {|type| of a penalty node} @d inf_penalty=inf_bad {``infinite'' penalty value} @d eject_penalty=-inf_penalty {``negatively infinite'' penalty value} @d penalty(#) == mem[#+1].int {the added cost of breaking a list here} @ Anyone who has been reading the last few sections of the program will be able to guess what comes next. @p function new_penalty(@!m:integer):pointer; var p:pointer; {the new node} begin p:=get_node(small_node_size); type(p):=penalty_node; subtype(p):=0; {the |subtype| is not used} penalty(p):=m; new_penalty:=p; end; @ You might think that we have introduced enough node types by now. Well, almost, but there is one more: An |unset_node| has nearly the same format as an |hlist_node| or |vlist_node|; it is used for entries in \.{\\halign} or \.{\\valign} that are not yet in their final form, since the box dimensions are their ``natural'' sizes before any glue adjustment has been made. The |glue_set| word is not present; instead, we have a |glue_stretch| field, which contains the total stretch of order |glue_order| that is present in the hlist or vlist being boxed. Similarly, the |shift_amount| field is replaced by a |glue_shrink| field, containing the total shrink of order |glue_sign| that is present. The |subtype| field is called |span_count|; an unset box typically contains the data for |qo(span_count)+1| columns. Unset nodes will be changed to box nodes when alignment is completed. @d unset_node=13 {|type| for an unset node} @d glue_stretch(#)==mem[#+glue_offset].sc {total stretch in an unset node} @d glue_shrink==shift_amount {total shrink in an unset node} @d span_count==subtype {indicates the number of spanned columns} @ In fact, there are still more types coming. When we get to math formula processing we will see that a |style_node| has |type=14|; and a number of larger type codes will also be defined, for use in math mode only. @ Warning: If any changes are made to these data structure layouts, such as changing any of the node sizes or even reordering the words of nodes, the |copy_node_list| procedure and the memory initialization code below may have to be changed. Such potentially dangerous parts of the program are listed in the index under `data structure assumptions'. @!@^data structure assumptions@> However, other references to the nodes are made symbolically in terms of the \.{WEB} macro definitions above, so that format changes will leave \TeX's other algorithms intact. @^system dependencies@> @* \[11] Memory layout. Some areas of |mem| are dedicated to fixed usage, since static allocation is more efficient than dynamic allocation when we can get away with it. For example, locations |mem_bot| to |mem_bot+3| are always used to store the specification for glue that is `\.{0pt plus 0pt minus 0pt}'. The following macro definitions accomplish the static allocation by giving symbolic names to the fixed positions. Static variable-size nodes appear in locations |mem_bot| through |lo_mem_stat_max|, and static single-word nodes appear in locations |hi_mem_stat_min| through |mem_top|, inclusive. It is harmless to let |lig_trick| and |garbage| share the same location of |mem|. @d zero_glue==mem_bot {specification for \.{0pt plus 0pt minus 0pt}} @d fil_glue==zero_glue+glue_spec_size {\.{0pt plus 1fil minus 0pt}} @d fill_glue==fil_glue+glue_spec_size {\.{0pt plus 1fill minus 0pt}} @d ss_glue==fill_glue+glue_spec_size {\.{0pt plus 1fil minus 1fil}} @d fil_neg_glue==ss_glue+glue_spec_size {\.{0pt plus -1fil minus 0pt}} @d lo_mem_stat_max==fil_neg_glue+glue_spec_size-1 {largest statically allocated word in the variable-size |mem|} @# @d page_ins_head==mem_top {list of insertion data for current page} @d contrib_head==mem_top-1 {vlist of items not yet on current page} @d page_head==mem_top-2 {vlist for current page} @d temp_head==mem_top-3 {head of a temporary list of some kind} @d hold_head==mem_top-4 {head of a temporary list of another kind} @d adjust_head==mem_top-5 {head of adjustment list returned by |hpack|} @d active==mem_top-7 {head of active list in |line_break|, needs two words} @d align_head==mem_top-8 {head of preamble list for alignments} @d end_span==mem_top-9 {tail of spanned-width lists} @d omit_template==mem_top-10 {a constant token list} @d null_list==mem_top-11 {permanently empty list} @d lig_trick==mem_top-12 {a ligature masquerading as a |char_node|} @d garbage==mem_top-12 {used for scrap information} @d backup_head==mem_top-13 {head of token list built by |scan_keyword|} @d hi_mem_stat_min==mem_top-13 {smallest statically allocated word in the one-word |mem|} @d hi_mem_stat_usage=14 {the number of one-word nodes always present} @ The following code gets |mem| off to a good start, when \TeX\ is initializing itself the slow~way. @= @!k:integer; {index into |mem|, |eqtb|, etc.} @ @= for k:=mem_bot+1 to lo_mem_stat_max do mem[k].sc:=0; {all glue dimensions are zeroed} @^data structure assumptions@> k:=mem_bot;@+while k<=lo_mem_stat_max do {set first words of glue specifications} begin glue_ref_count(k):=null+1; stretch_order(k):=normal; shrink_order(k):=normal; k:=k+glue_spec_size; end; stretch(fil_glue):=unity; stretch_order(fil_glue):=fil;@/ stretch(fill_glue):=unity; stretch_order(fill_glue):=fill;@/ stretch(ss_glue):=unity; stretch_order(ss_glue):=fil;@/ shrink(ss_glue):=unity; shrink_order(ss_glue):=fil;@/ stretch(fil_neg_glue):=-unity; stretch_order(fil_neg_glue):=fil;@/ rover:=lo_mem_stat_max+1; link(rover):=empty_flag; {now initialize the dynamic memory} node_size(rover):=1000; {which is a 1000-word available node} llink(rover):=rover; rlink(rover):=rover;@/ lo_mem_max:=rover+1000; link(lo_mem_max):=null; info(lo_mem_max):=null;@/ for k:=hi_mem_stat_min to mem_top do mem[k]:=mem[lo_mem_max]; {clear list heads} @; avail:=null; mem_end:=mem_top; hi_mem_min:=hi_mem_stat_min; {initialize the one-word memory} var_used:=lo_mem_stat_max+1-mem_bot; dyn_used:=hi_mem_stat_usage; {initialize statistics} @ If \TeX\ is extended improperly, the |mem| array might get screwed up. For example, some pointers might be wrong, or some ``dead'' nodes might not have been freed when the last reference to them disappeared. Procedures |check_mem| and |search_mem| are available to help diagnose such problems. These procedures make use of two arrays called |free| and |was_free| that are present only if \TeX's debugging routines have been included. (You may want to decrease the size of |mem| while you @^debugging@> are debugging.) @= @!debug @!free: packed array [mem_min..mem_max] of boolean; {free cells} @t\hskip10pt@>@!was_free: packed array [mem_min..mem_max] of boolean; {previously free cells} @t\hskip10pt@>@!was_mem_end,@!was_lo_max,@!was_hi_min: pointer; {previous |mem_end|, |lo_mem_max|, and |hi_mem_min|} @t\hskip10pt@>@!panicking:boolean; {do we want to check memory constantly?} gubed @ @= @!debug was_mem_end:=mem_min; {indicate that everything was previously free} was_lo_max:=mem_min; was_hi_min:=mem_max; panicking:=false; gubed @ Procedure |check_mem| makes sure that the available space lists of |mem| are well formed, and it optionally prints out all locations that are reserved now but were free the last time this procedure was called. @p @!debug procedure check_mem(@!print_locs : boolean); label done1,done2; {loop exits} var p,@!q:pointer; {current locations of interest in |mem|} @!clobbered:boolean; {is something amiss?} begin for p:=mem_min to lo_mem_max do free[p]:=false; {you can probably do this faster} for p:=hi_mem_min to mem_end do free[p]:=false; {ditto} @; @; @; if print_locs then @; for p:=mem_min to lo_mem_max do was_free[p]:=free[p]; for p:=hi_mem_min to mem_end do was_free[p]:=free[p]; {|was_free:=free| might be faster} was_mem_end:=mem_end; was_lo_max:=lo_mem_max; was_hi_min:=hi_mem_min; end; gubed @ @= p:=avail; q:=null; clobbered:=false; while p<>null do begin if (p>mem_end)or(p print_int(q); goto done1; end; free[p]:=true; q:=p; p:=link(q); end; done1: @ @= p:=rover; q:=null; clobbered:=false; repeat if (p>=lo_mem_max)or(p=lo_mem_max)or(rlink(p)lo_mem_max)or@| (llink(rlink(p))<>p) then clobbered:=true; if clobbered then begin print_nl("Double-AVAIL list clobbered at "); print_int(q); goto done2; end; for q:=p to p+node_size(p)-1 do {mark all locations free} begin if free[q] then begin print_nl("Doubly free location at "); @.Doubly free location...@> print_int(q); goto done2; end; free[q]:=true; end; q:=p; p:=rlink(p); until p=rover; done2: @ @= p:=mem_min; while p<=lo_mem_max do {node |p| should not be empty} begin if is_empty(p) then begin print_nl("Bad flag at "); print_int(p); @.Bad flag...@> end; while (p<=lo_mem_max) and not free[p] do incr(p); while (p<=lo_mem_max) and free[p] do incr(p); end @ @= begin print_nl("New busy locs:"); for p:=mem_min to lo_mem_max do if not free[p] and ((p>was_lo_max) or was_free[p]) then begin print_char(" "); print_int(p); end; for p:=hi_mem_min to mem_end do if not free[p] and ((pwas_mem_end) or was_free[p]) then begin print_char(" "); print_int(p); end; end @ The |search_mem| procedure attempts to answer the question ``Who points to node~|p|?'' In doing so, it fetches |link| and |info| fields of |mem| that might not be of type |two_halves|. Strictly speaking, this is @^dirty \PASCAL@> undefined in \PASCAL, and it can lead to ``false drops'' (words that seem to point to |p| purely by coincidence). But for debugging purposes, we want to rule out the places that do {\sl not\/} point to |p|, so a few false drops are tolerable. @p @!debug procedure search_mem(@!p:pointer); {look for pointers to |p|} var q:integer; {current position being searched} begin for q:=mem_min to lo_mem_max do begin if link(q)=p then begin print_nl("LINK("); print_int(q); print_char(")"); end; if info(q)=p then begin print_nl("INFO("); print_int(q); print_char(")"); end; end; for q:=hi_mem_min to mem_end do begin if link(q)=p then begin print_nl("LINK("); print_int(q); print_char(")"); end; if info(q)=p then begin print_nl("INFO("); print_int(q); print_char(")"); end; end; @; @; @; end; gubed @* \[12] Displaying boxes. We can reinforce our knowledge of the data structures just introduced by considering two procedures that display a list in symbolic form. The first of these, called |short_display|, is used in ``overfull box'' messages to give the top-level description of a list. The other one, called |show_node_list|, prints a detailed description of exactly what is in the data structure. The philosophy of |short_display| is to ignore the fine points about exactly what is inside boxes, except that ligatures and discretionary breaks are expanded. As a result, |short_display| is a recursive procedure, but the recursion is never more than one level deep. @^recursion@> A global variable |font_in_short_display| keeps track of the font code that is assumed to be present when |short_display| begins; deviations from this font will be printed. @= @!font_in_short_display:integer; {an internal font number} @ Boxes, rules, inserts, whatsits, marks, and things in general that are sort of ``complicated'' are indicated only by printing `\.{[]}'. @p procedure short_display(@!p:integer); {prints highlights of list |p|} var n:integer; {for replacement counts} begin while p>mem_min do begin if is_char_node(p) then begin if p<=mem_end then begin if font(p)<>font_in_short_display then begin if (font(p)font_max) then print_char("*") @.*\relax@> else @; print_char(" "); font_in_short_display:=font(p); end; print_ASCII(qo(character(p))); end; end else @; p:=link(p); end; end; @ @= case type(p) of hlist_node,vlist_node,ins_node,whatsit_node,mark_node,adjust_node, unset_node: print("[]"); rule_node: print_char("|"); glue_node: if glue_ptr(p)<>zero_glue then print_char(" "); math_node: print_char("$"); ligature_node: short_display(lig_ptr(p)); disc_node: begin short_display(pre_break(p)); short_display(post_break(p));@/ n:=replace_count(p); while n>0 do begin if link(p)<>null then p:=link(p); decr(n); end; end; othercases do_nothing endcases @ The |show_node_list| routine requires some auxiliary subroutines: one to print a font-and-character combination, one to print a token list without its reference count, and one to print a rule dimension. @p procedure print_font_and_char(@!p:integer); {prints |char_node| data} begin if p>mem_end then print_esc("CLOBBERED.") else begin if (font(p)font_max) then print_char("*") @.*\relax@> else @; print_char(" "); print_ASCII(qo(character(p))); end; end; @# procedure print_mark(@!p:integer); {prints token list data in braces} begin print_char("{"); if (pmem_end) then print_esc("CLOBBERED.") else show_token_list(link(p),null,max_print_line-10); print_char("}"); end; @# procedure print_rule_dimen(@!d:scaled); {prints dimension in rule node} begin if is_running(d) then print_char("*") else print_scaled(d); @.*\relax@> end; @ Then there is a subroutine that prints glue stretch and shrink, possibly followed by the name of finite units: @p procedure print_glue(@!d:scaled;@!order:integer;@!s:str_number); {prints a glue component} begin print_scaled(d); if (orderfilll) then print("foul") else if order>normal then begin print("fil"); while order>fil do begin print_char("l"); decr(order); end; end else if s<>0 then print(s); end; @ The next subroutine prints a whole glue specification. @p procedure print_spec(@!p:integer;@!s:str_number); {prints a glue specification} begin if (p=lo_mem_max) then print_char("*") @.*\relax@> else begin print_scaled(width(p)); if s<>0 then print(s); if stretch(p)<>0 then begin print(" plus "); print_glue(stretch(p),stretch_order(p),s); end; if shrink(p)<>0 then begin print(" minus "); print_glue(shrink(p),shrink_order(p),s); end; end; end; @ We also need to declare some procedures that appear later in this documentation. @p @@; @ @ Since boxes can be inside of boxes, |show_node_list| is inherently recursive, @^recursion@> up to a given maximum number of levels. The history of nesting is indicated by the current string, which will be printed at the beginning of each line; the length of this string, namely |cur_length|, is the depth of nesting. Recursive calls on |show_node_list| therefore use the following pattern: @d node_list_display(#)== begin append_char("."); show_node_list(#); flush_char; end {|str_room| need not be checked; see |show_box| below} @ A global variable called |depth_threshold| is used to record the maximum depth of nesting for which |show_node_list| will show information. If we have |depth_threshold=0|, for example, only the top level information will be given and no sublists will be traversed. Another global variable, called |breadth_max|, tells the maximum number of items to show at each level; |breadth_max| had better be positive, or you won't see anything. @= @!depth_threshold : integer; {maximum nesting depth in box displays} @!breadth_max : integer; {maximum number of items shown at the same list level} @ Now we are ready for |show_node_list| itself. This procedure has been written to be ``extra robust'' in the sense that it should not crash or get into a loop even if the data structures have been messed up by bugs in the rest of the program. You can safely call its parent routine |show_box(p)| for arbitrary values of |p| when you are debugging \TeX. However, in the presence of bad data, the procedure may @^dirty \PASCAL@>@^debugging@> fetch a |memory_word| whose variant is different from the way it was stored; for example, it might try to read |mem[p].hh| when |mem[p]| contains a scaled integer, if |p| is a pointer that has been clobbered or chosen at random. @p procedure show_node_list(@!p:integer); {prints a node list symbolically} label exit; var n:integer; {the number of items already printed at this level} @!g:real; {a glue ratio, as a floating point number} begin if cur_length>depth_threshold then begin if p>null then print(" []"); {indicate that there's been some truncation} return; end; n:=0; while p>mem_min do begin print_ln; print_current_string; {display the nesting history} if p>mem_end then {pointer out of range} begin print("Bad link, display aborted."); return; @.Bad link...@> end; incr(n); if n>breadth_max then {time to stop} begin print("etc."); return; @.etc@> end; @; p:=link(p); end; exit: end; @ @= if is_char_node(p) then print_font_and_char(p) else case type(p) of hlist_node,vlist_node,unset_node: @; rule_node: @; ins_node: @; whatsit_node: @; glue_node: @; kern_node: @; math_node: @; ligature_node: @; penalty_node: @; disc_node: @; mark_node: @; adjust_node: @; @t\4@>@@; othercases print("Unknown node type!") endcases @ @= begin if type(p)=hlist_node then print_esc("h") else if type(p)=vlist_node then print_esc("v") else print_esc("unset"); print("box("); print_scaled(height(p)); print_char("+"); print_scaled(depth(p)); print(")x"); print_scaled(width(p)); if type(p)=unset_node then @ else begin @; if shift_amount(p)<>0 then begin print(", shifted "); print_scaled(shift_amount(p)); end; end; node_list_display(list_ptr(p)); {recursive call} end @ @= begin if span_count(p)<>min_quarterword then begin print(" ("); print_int(qo(span_count(p))+1); print(" columns)"); end; if glue_stretch(p)<>0 then begin print(", stretch "); print_glue(glue_stretch(p),glue_order(p),0); end; if glue_shrink(p)<>0 then begin print(", shrink "); print_glue(glue_shrink(p),glue_sign(p),0); end; end @ The code will have to change in this place if |glue_ratio| is a structured type instead of an ordinary |real|. Note that this routine should avoid arithmetic errors even if the |glue_set| field holds an arbitrary random value. The following code assumes that a properly formed nonzero |real| number has absolute value $2^{20}$ or more when it is regarded as an integer; this precaution was adequate to prevent floating point underflow on the author's computer. @^system dependencies@> @^dirty \PASCAL@> @= g:=float(glue_set(p)); if (g<>float_constant(0))and(glue_sign(p)<>normal) then begin print(", glue set "); if glue_sign(p)=shrinking then print("- "); if abs(mem[p+glue_offset].int)<@'4000000 then print("?.?") else if abs(g)>float_constant(20000) then begin if g>float_constant(0) then print_char(">") else print("< -"); print_glue(20000*unity,glue_order(p),0); end else print_glue(round(unity*g),glue_order(p),0); @^real multiplication@> end @ @= begin print_esc("rule("); print_rule_dimen(height(p)); print_char("+"); print_rule_dimen(depth(p)); print(")x"); print_rule_dimen(width(p)); end @ @= begin print_esc("insert"); print_int(qo(subtype(p))); print(", natural size "); print_scaled(height(p)); print("; split("); print_spec(split_top_ptr(p),0); print_char(","); print_scaled(depth(p)); print("); float cost "); print_int(float_cost(p)); node_list_display(ins_ptr(p)); {recursive call} end @ @= if subtype(p)>=a_leaders then @ else begin print_esc("glue"); if subtype(p)<>normal then begin print_char("("); if subtype(p)cond_math_glue then begin print_char(" "); if subtype(p)= begin print_esc(""); if subtype(p)=c_leaders then print_char("c") else if subtype(p)=x_leaders then print_char("x"); print("leaders "); print_spec(glue_ptr(p),0); node_list_display(leader_ptr(p)); {recursive call} end @ An ``explicit'' kern value is indicated implicitly by an explicit space. @= if subtype(p)<>mu_glue then begin print_esc("kern"); if subtype(p)<>normal then print_char(" "); print_scaled(width(p)); if subtype(p)=acc_kern then print(" (for accent)"); @.for accent@> end else begin print_esc("mkern"); print_scaled(width(p)); print("mu"); end @ @= begin print_esc("math"); if subtype(p)=before then print("on") else print("off"); if width(p)<>0 then begin print(", surrounded "); print_scaled(width(p)); end; end @ @= begin print_font_and_char(lig_char(p)); print(" (ligature "); if subtype(p)>1 then print_char("|"); font_in_short_display:=font(lig_char(p)); short_display(lig_ptr(p)); if odd(subtype(p)) then print_char("|"); print_char(")"); end @ @= begin print_esc("penalty "); print_int(penalty(p)); end @ The |post_break| list of a discretionary node is indicated by a prefixed `\.{\char'174}' instead of the `\..' before the |pre_break| list. @= begin print_esc("discretionary"); if replace_count(p)>0 then begin print(" replacing "); print_int(replace_count(p)); end; node_list_display(pre_break(p)); {recursive call} append_char("|"); show_node_list(post_break(p)); flush_char; {recursive call} end @ @= begin print_esc("mark"); print_mark(mark_ptr(p)); end @ @= begin print_esc("vadjust"); node_list_display(adjust_ptr(p)); {recursive call} end @ The recursive machinery is started by calling |show_box|. @^recursion@> @p procedure show_box(@!p:pointer); begin @; if breadth_max<=0 then breadth_max:=5; if pool_ptr+depth_threshold>=pool_size then depth_threshold:=pool_size-pool_ptr-1; {now there's enough room for prefix string} show_node_list(p); {the show starts at |p|} print_ln; end; @* \[13] Destroying boxes. When we are done with a node list, we are obliged to return it to free storage, including all of its sublists. The recursive procedure |flush_node_list| does this for us. @ First, however, we shall consider two non-recursive procedures that do simpler tasks. The first of these, |delete_token_ref|, is called when a pointer to a token list's reference count is being removed. This means that the token list should disappear if the reference count was |null|, otherwise the count should be decreased by one. @^reference counts@> @d token_ref_count(#) == info(#) {reference count preceding a token list} @p procedure delete_token_ref(@!p:pointer); {|p| points to the reference count of a token list that is losing one reference} begin if token_ref_count(p)=null then flush_list(p) else decr(token_ref_count(p)); end; @ Similarly, |delete_glue_ref| is called when a pointer to a glue specification is being withdrawn. @^reference counts@> @d fast_delete_glue_ref(#)==@t@>@;@/ begin if glue_ref_count(#)=null then free_node(#,glue_spec_size) else decr(glue_ref_count(#)); end @p procedure delete_glue_ref(@!p:pointer); {|p| points to a glue specification} fast_delete_glue_ref(p); @ Now we are ready to delete any node list, recursively. In practice, the nodes deleted are usually charnodes (about 2/3 of the time), and they are glue nodes in about half of the remaining cases. @^recursion@> @p procedure flush_node_list(@!p:pointer); {erase list of nodes starting at |p|} label done; {go here when node |p| has been freed} var q:pointer; {successor to node |p|} begin while p<>null do @^inner loop@> begin q:=link(p); if is_char_node(p) then free_avail(p) else begin case type(p) of hlist_node,vlist_node,unset_node: begin flush_node_list(list_ptr(p)); free_node(p,box_node_size); goto done; end; rule_node: begin free_node(p,rule_node_size); goto done; end; ins_node: begin flush_node_list(ins_ptr(p)); delete_glue_ref(split_top_ptr(p)); free_node(p,ins_node_size); goto done; end; whatsit_node: @; glue_node: begin fast_delete_glue_ref(glue_ptr(p)); if leader_ptr(p)<>null then flush_node_list(leader_ptr(p)); end; kern_node,math_node,penalty_node: do_nothing; ligature_node: flush_node_list(lig_ptr(p)); mark_node: delete_token_ref(mark_ptr(p)); disc_node: begin flush_node_list(pre_break(p)); flush_node_list(post_break(p)); end; adjust_node: flush_node_list(adjust_ptr(p)); @t\4@>@@; othercases confusion("flushing") @:this can't happen flushing}{\quad flushing@> endcases;@/ free_node(p,small_node_size); done:end; p:=q; end; end; @* \[14] Copying boxes. Another recursive operation that acts on boxes is sometimes needed: The procedure |copy_node_list| returns a pointer to another node list that has the same structure and meaning as the original. Note that since glue specifications and token lists have reference counts, we need not make copies of them. Reference counts can never get too large to fit in a halfword, since each pointer to a node is in a different memory address, and the total number of memory addresses fits in a halfword. @^recursion@> @^reference counts@> (Well, there actually are also references from outside |mem|; if the |save_stack| is made arbitrarily large, it would theoretically be possible to break \TeX\ by overflowing a reference count. But who would want to do that?) @d add_token_ref(#)==incr(token_ref_count(#)) {new reference to a token list} @d add_glue_ref(#)==incr(glue_ref_count(#)) {new reference to a glue spec} @ The copying procedure copies words en masse without bothering to look at their individual fields. If the node format changes---for example, if the size is altered, or if some link field is moved to another relative position---then this code may need to be changed too. @^data structure assumptions@> @p function copy_node_list(@!p:pointer):pointer; {makes a duplicate of the node list that starts at |p| and returns a pointer to the new list} var h:pointer; {temporary head of copied list} @!q:pointer; {previous position in new list} @!r:pointer; {current node being fabricated for new list} @!words:0..5; {number of words remaining to be copied} begin h:=get_avail; q:=h; while p<>null do begin @; link(q):=r; q:=r; p:=link(p); end; link(q):=null; q:=link(h); free_avail(h); copy_node_list:=q; end; @ @= words:=1; {this setting occurs in more branches than any other} if is_char_node(p) then r:=get_avail else @; while words>0 do begin decr(words); mem[r+words]:=mem[p+words]; end @ @= case type(p) of hlist_node,vlist_node,unset_node: begin r:=get_node(box_node_size); mem[r+6]:=mem[p+6]; mem[r+5]:=mem[p+5]; {copy the last two words} list_ptr(r):=copy_node_list(list_ptr(p)); {this affects |mem[r+5]|} words:=5; end; rule_node: begin r:=get_node(rule_node_size); words:=rule_node_size; end; ins_node: begin r:=get_node(ins_node_size); mem[r+4]:=mem[p+4]; add_glue_ref(split_top_ptr(p)); ins_ptr(r):=copy_node_list(ins_ptr(p)); {this affects |mem[r+4]|} words:=ins_node_size-1; end; whatsit_node:@; glue_node: begin r:=get_node(small_node_size); add_glue_ref(glue_ptr(p)); glue_ptr(r):=glue_ptr(p); leader_ptr(r):=copy_node_list(leader_ptr(p)); end; kern_node,math_node,penalty_node: begin r:=get_node(small_node_size); words:=small_node_size; end; ligature_node: begin r:=get_node(small_node_size); mem[lig_char(r)]:=mem[lig_char(p)]; {copy |font| and |character|} lig_ptr(r):=copy_node_list(lig_ptr(p)); end; disc_node: begin r:=get_node(small_node_size); pre_break(r):=copy_node_list(pre_break(p)); post_break(r):=copy_node_list(post_break(p)); end; mark_node: begin r:=get_node(small_node_size); add_token_ref(mark_ptr(p)); words:=small_node_size; end; adjust_node: begin r:=get_node(small_node_size); adjust_ptr(r):=copy_node_list(adjust_ptr(p)); end; {|words=1=small_node_size-1|} othercases confusion("copying") @:this can't happen copying}{\quad copying@> endcases @* \[15] The command codes. Before we can go any further, we need to define symbolic names for the internal code numbers that represent the various commands obeyed by \TeX. These codes are somewhat arbitrary, but not completely so. For example, the command codes for character types are fixed by the language, since a user says, e.g., `\.{\\catcode \`\\\${} = 3}' to make \.{\char'44} a math delimiter, and the command code |math_shift| is equal to~3. Some other codes have been made adjacent so that |case| statements in the program need not consider cases that are widely spaced, or so that |case| statements can be replaced by |if| statements. At any rate, here is the list, for future reference. First come the ``catcode'' commands, several of which share their numeric codes with ordinary commands when the catcode cannot emerge from \TeX's scanning routine. @d escape=0 {escape delimiter (called \.\\ in {\sl The \TeX book\/})} @:TeXbook}{\sl The \TeX book@> @d relax=0 {do nothing ( \.{\\relax} )} @d left_brace=1 {beginning of a group ( \.\{ )} @d right_brace=2 {ending of a group ( \.\} )} @d math_shift=3 {mathematics shift character ( \.\$ )} @d tab_mark=4 {alignment delimiter ( \.\&, \.{\\span} )} @d car_ret=5 {end of line ( |carriage_return|, \.{\\cr}, \.{\\crcr} )} @d out_param=5 {output a macro parameter} @d mac_param=6 {macro parameter symbol ( \.\# )} @d sup_mark=7 {superscript ( \.{\char'136} )} @d sub_mark=8 {subscript ( \.{\char'137} )} @d ignore=9 {characters to ignore ( \.{\^\^J} )} @d endv=9 {end of \ list in alignment template} @d spacer=10 {characters equivalent to blank space ( \.{\ } )} @d letter=11 {characters regarded as letters ( \.{A..Z}, \.{a..z} )} @d other_char=12 {none of the special character types} @d active_char=13 {characters that invoke macros ( \.{\^\^[} )} @d par_end=13 {end of paragraph ( \.{\\par} )} @d match=13 {match a macro parameter} @d comment=14 {characters that introduce comments ( \.\% )} @d end_match=14 {end of parameters to macro} @d stop=14 {end of job ( \.{\\end}, \.{\\dump} )} @d invalid_char=15 {characters that shouldn't appear ( \.{\^\^?} )} @d delim_num=15 {specify delimiter numerically ( \.{\\delimiter} )} @d max_char_code=15 {largest catcode for individual characters} @ Next are the ordinary run-of-the-mill command codes. Codes that are |min_internal| or more represent internal quantities that might be expanded by `\.{\\the}'. @d char_num=16 {character specified numerically ( \.{\\char} )} @d math_char_num=17 {explicit math code ( \.{\\mathchar} )} @d mark=18 {mark definition ( \.{\\mark} )} @d xray=19 {peek inside of \TeX\ ( \.{\\show}, \.{\\showbox}, etc.~)} @d make_box=20 {make a box ( \.{\\box}, \.{\\copy}, \.{\\hbox}, etc.~)} @d hmove=21 {horizontal motion ( \.{\\moveleft}, \.{\\moveright} )} @d vmove=22 {vertical motion ( \.{\\raise}, \.{\\lower} )} @d un_hbox=23 {unglue a box ( \.{\\unhbox}, \.{\\unhcopy} )} @d un_vbox=24 {unglue a box ( \.{\\unvbox}, \.{\\unvcopy} )} @d remove_item=25 {nullify last item ( \.{\\unpenalty}, \.{\\unkern}, \.{\\unskip} )} @d hskip=26 {horizontal glue ( \.{\\hskip}, \.{\\hfil}, etc.~)} @d vskip=27 {vertical glue ( \.{\\vskip}, \.{\\vfil}, etc.~)} @d mskip=28 {math glue ( \.{\\mskip} )} @d kern=29 {fixed space ( \.{\\kern})} @d mkern=30 {math kern ( \.{\\mkern} )} @d leader_ship=31 {use a box ( \.{\\shipout}, \.{\\leaders}, etc.~)} @d halign=32 {horizontal table alignment ( \.{\\halign} )} @d valign=33 {vertical table alignment ( \.{\\valign} )} @d no_align=34 {temporary escape from alignment ( \.{\\noalign} )} @d vrule=35 {vertical rule ( \.{\\vrule} )} @d hrule=36 {horizontal rule ( \.{\\hrule} )} @d insert=37 {vlist inserted in box ( \.{\\insert} )} @d vadjust=38 {vlist inserted in enclosing paragraph ( \.{\\vadjust} )} @d ignore_spaces=39 {gobble |spacer| tokens ( \.{\\ignorespaces} )} @d after_assignment=40 {save till assignment is done ( \.{\\afterassignment} )} @d after_group=41 {save till group is done ( \.{\\aftergroup} )} @d break_penalty=42 {additional badness ( \.{\\penalty} )} @d start_par=43 {begin paragraph ( \.{\\indent}, \.{\\noindent} )} @d ital_corr=44 {italic correction ( \.{\\/} )} @d accent=45 {attach accent in text ( \.{\\accent} )} @d math_accent=46 {attach accent in math ( \.{\\mathaccent} )} @d discretionary=47 {discretionary texts ( \.{\\-}, \.{\\discretionary} )} @d eq_no=48 {equation number ( \.{\\eqno}, \.{\\leqno} )} @d left_right=49 {variable delimiter ( \.{\\left}, \.{\\right} )} @d math_comp=50 {component of formula ( \.{\\mathbin}, etc.~)} @d limit_switch=51 {diddle limit conventions ( \.{\\displaylimits}, etc.~)} @d above=52 {generalized fraction ( \.{\\above}, \.{\\atop}, etc.~)} @d math_style=53 {style specification ( \.{\\displaystyle}, etc.~)} @d math_choice=54 {choice specification ( \.{\\mathchoice} )} @d non_script=55 {conditional math glue ( \.{\\nonscript} )} @d vcenter=56 {vertically center a vbox ( \.{\\vcenter} )} @d case_shift=57 {force specific case ( \.{\\lowercase}, \.{\\uppercase}~)} @d message=58 {send to user ( \.{\\message}, \.{\\errmessage} )} @d extension=59 {extensions to \TeX\ ( \.{\\write}, \.{\\special}, etc.~)} @d in_stream=60 {files for reading ( \.{\\openin}, \.{\\closein} )} @d begin_group=61 {begin local grouping ( \.{\\begingroup} )} @d end_group=62 {end local grouping ( \.{\\endgroup} )} @d omit=63 {omit alignment template ( \.{\\omit} )} @d ex_space=64 {explicit space ( \.{\\\ } )} @d no_boundary=65 {suppress boundary ligatures ( \.{\\noboundary} )} @d radical=66 {square root and similar signs ( \.{\\radical} )} @d end_cs_name=67 {end control sequence ( \.{\\endcsname} )} @d min_internal=68 {the smallest code that can follow \.{\\the}} @d char_given=68 {character code defined by \.{\\chardef}} @d math_given=69 {math code defined by \.{\\mathchardef}} @d last_item=70 {most recent item ( \.{\\lastpenalty}, \.{\\lastkern}, \.{\\lastskip} )} @d max_non_prefixed_command=70 {largest command code that can't be \.{\\global}} @ The next codes are special; they all relate to mode-independent assignment of values to \TeX's internal registers or tables. Codes that are |max_internal| or less represent internal quantities that might be expanded by `\.{\\the}'. @d toks_register=71 {token list register ( \.{\\toks} )} @d assign_toks=72 {special token list ( \.{\\output}, \.{\\everypar}, etc.~)} @d assign_int=73 {user-defined integer ( \.{\\tolerance}, \.{\\day}, etc.~)} @d assign_dimen=74 {user-defined length ( \.{\\hsize}, etc.~)} @d assign_glue=75 {user-defined glue ( \.{\\baselineskip}, etc.~)} @d assign_mu_glue=76 {user-defined muglue ( \.{\\thinmuskip}, etc.~)} @d assign_font_dimen=77 {user-defined font dimension ( \.{\\fontdimen} )} @d assign_font_int=78 {user-defined font integer ( \.{\\hyphenchar}, \.{\\skewchar} )} @d set_aux=79 {specify state info ( \.{\\spacefactor}, \.{\\prevdepth} )} @d set_prev_graf=80 {specify state info ( \.{\\prevgraf} )} @d set_page_dimen=81 {specify state info ( \.{\\pagegoal}, etc.~)} @d set_page_int=82 {specify state info ( \.{\\deadcycles}, \.{\\insertpenalties} )} @d set_box_dimen=83 {change dimension of box ( \.{\\wd}, \.{\\ht}, \.{\\dp} )} @d set_shape=84 {specify fancy paragraph shape ( \.{\\parshape} )} @d def_code=85 {define a character code ( \.{\\catcode}, etc.~)} @d def_family=86 {declare math fonts ( \.{\\textfont}, etc.~)} @d set_font=87 {set current font ( font identifiers )} @d def_font=88 {define a font file ( \.{\\font} )} @d register=89 {internal register ( \.{\\count}, \.{\\dimen}, etc.~)} @d max_internal=89 {the largest code that can follow \.{\\the}} @d advance=90 {advance a register or parameter ( \.{\\advance} )} @d multiply=91 {multiply a register or parameter ( \.{\\multiply} )} @d divide=92 {divide a register or parameter ( \.{\\divide} )} @d prefix=93 {qualify a definition ( \.{\\global}, \.{\\long}, \.{\\outer} )} @d let=94 {assign a command code ( \.{\\let}, \.{\\futurelet} )} @d shorthand_def=95 {code definition ( \.{\\chardef}, \.{\\countdef}, etc.~)} @d read_to_cs=96 {read into a control sequence ( \.{\\read} )} @d def=97 {macro definition ( \.{\\def}, \.{\\gdef}, \.{\\xdef}, \.{\\edef} )} @d set_box=98 {set a box ( \.{\\setbox} )} @d hyph_data=99 {hyphenation data ( \.{\\hyphenation}, \.{\\patterns} )} @d set_interaction=100 {define level of interaction ( \.{\\batchmode}, etc.~)} @d max_command=100 {the largest command code seen at |big_switch|} @ The remaining command codes are extra special, since they cannot get through \TeX's scanner to the main control routine. They have been given values higher than |max_command| so that their special nature is easily discernible. The ``expandable'' commands come first. @d undefined_cs=max_command+1 {initial state of most |eq_type| fields} @d expand_after=max_command+2 {special expansion ( \.{\\expandafter} )} @d no_expand=max_command+3 {special nonexpansion ( \.{\\noexpand} )} @d input=max_command+4 {input a source file ( \.{\\input}, \.{\\endinput} )} @d if_test=max_command+5 {conditional text ( \.{\\if}, \.{\\ifcase}, etc.~)} @d fi_or_else=max_command+6 {delimiters for conditionals ( \.{\\else}, etc.~)} @d cs_name=max_command+7 {make a control sequence from tokens ( \.{\\csname} )} @d convert=max_command+8 {convert to text ( \.{\\number}, \.{\\string}, etc.~)} @d the=max_command+9 {expand an internal quantity ( \.{\\the} )} @d top_bot_mark=max_command+10 {inserted mark ( \.{\\topmark}, etc.~)} @d call=max_command+11 {non-long, non-outer control sequence} @d long_call=max_command+12 {long, non-outer control sequence} @d outer_call=max_command+13 {non-long, outer control sequence} @d long_outer_call=max_command+14 {long, outer control sequence} @d end_template=max_command+15 {end of an alignment template} @d dont_expand=max_command+16 {the following token was marked by \.{\\noexpand}} @d glue_ref=max_command+17 {the equivalent points to a glue specification} @d shape_ref=max_command+18 {the equivalent points to a parshape specification} @d box_ref=max_command+19 {the equivalent points to a box node, or is |null|} @d data=max_command+20 {the equivalent is simply a halfword number} @* \[16] The semantic nest. \TeX\ is typically in the midst of building many lists at once. For example, when a math formula is being processed, \TeX\ is in math mode and working on an mlist; this formula has temporarily interrupted \TeX\ from being in horizontal mode and building the hlist of a paragraph; and this paragraph has temporarily interrupted \TeX\ from being in vertical mode and building the vlist for the next page of a document. Similarly, when a \.{\\vbox} occurs inside of an \.{\\hbox}, \TeX\ is temporarily interrupted from working in restricted horizontal mode, and it enters internal vertical mode. The ``semantic nest'' is a stack that keeps track of what lists and modes are currently suspended. At each level of processing we are in one of six modes: \yskip\hang|vmode| stands for vertical mode (the page builder); \hang|hmode| stands for horizontal mode (the paragraph builder); \hang|mmode| stands for displayed formula mode; \hang|-vmode| stands for internal vertical mode (e.g., in a \.{\\vbox}); \hang|-hmode| stands for restricted horizontal mode (e.g., in an \.{\\hbox}); \hang|-mmode| stands for math formula mode (not displayed). \yskip\noindent The mode is temporarily set to zero while processing \.{\\write} texts in the |ship_out| routine. Numeric values are assigned to |vmode|, |hmode|, and |mmode| so that \TeX's ``big semantic switch'' can select the appropriate thing to do by computing the value |abs(mode)+cur_cmd|, where |mode| is the current mode and |cur_cmd| is the current command code. @d vmode=1 {vertical mode} @d hmode=vmode+max_command+1 {horizontal mode} @d mmode=hmode+max_command+1 {math mode} @p procedure print_mode(@!m:integer); {prints the mode represented by |m|} begin if m>0 then case m div (max_command+1) of 0:print("vertical"); 1:print("horizontal"); 2:print("display math"); end else if m=0 then print("no") else case (-m) div (max_command+1) of 0:print("internal vertical"); 1:print("restricted horizontal"); 2:print("math"); end; print(" mode"); end; @ The state of affairs at any semantic level can be represented by five values: \yskip\hang|mode| is the number representing the semantic mode, as just explained. \yskip\hang|head| is a |pointer| to a list head for the list being built; |link(head)| therefore points to the first element of the list, or to |null| if the list is empty. \yskip\hang|tail| is a |pointer| to the final node of the list being built; thus, |tail=head| if and only if the list is empty. \yskip\hang|prev_graf| is the number of lines of the current paragraph that have already been put into the present vertical list. \yskip\hang|aux| is an auxiliary |memory_word| that gives further information that is needed to characterize the situation. \yskip\noindent In vertical mode, |aux| is also known as |prev_depth|; it is the scaled value representing the depth of the previous box, for use in baseline calculations, or it is |<=-1000|pt if the next box on the vertical list is to be exempt from baseline calculations. In horizontal mode, |aux| is also known as |space_factor| and |clang|; it holds the current space factor used in spacing calculations, and the current language used for hyphenation. (The value of |clang| is undefined in restricted horizontal mode.) In math mode, |aux| is also known as |incompleat_noad|; if not |null|, it points to a record that represents the numerator of a generalized fraction for which the denominator is currently being formed in the current list. There is also a sixth quantity, |mode_line|, which correlates the semantic nest with the user's input; |mode_line| contains the source line number at which the current level of nesting was entered. The negative of this line number is the |mode_line| at the level of the user's output routine. Seventh and eighth quantities, |lhmin| and |rhmin|, affect hyphenation. The semantic nest is an array called |nest| that holds the |mode|, |head|, |tail|, |prev_graf|, |aux|, and |mode_line| values for all semantic levels below the currently active one. Information about the currently active level is kept in the global quantities |mode|, |head|, |tail|, |prev_graf|, |aux|, and |mode_line|, which live in a \PASCAL\ record that is ready to be pushed onto |nest| if necessary. @d ignore_depth==-65536000 {|prev_depth| value that is ignored} @= @!list_state_record=record@!mode_field:-mmode..mmode;@+ @!head_field,@!tail_field: pointer; @!pg_field,@!ml_field: integer;@+ @!aux_field: memory_word;@+ @!lhm_field,@!rhm_field: quarterword; end; @ @d mode==cur_list.mode_field {current mode} @d head==cur_list.head_field {header node of current list} @d tail==cur_list.tail_field {final node on current list} @d prev_graf==cur_list.pg_field {number of paragraph lines accumulated} @d aux==cur_list.aux_field {auxiliary data about the current list} @d prev_depth==aux.sc {the name of |aux| in vertical mode} @d space_factor==aux.hh.lh {part of |aux| in horizontal mode} @d clang==aux.hh.rh {the other part of |aux| in horizontal mode} @d incompleat_noad==aux.int {the name of |aux| in math mode} @d mode_line==cur_list.ml_field {source file line number at beginning of list} @d lhmin==cur_list.lhm_field {\.{\\lefthyphenmin} at start of paragraph} @d rhmin==cur_list.rhm_field {\.{\\righthyphenmin} at start of paragraph} @= @!nest:array[0..nest_size] of list_state_record; @!nest_ptr:0..nest_size; {first unused location of |nest|} @!max_nest_stack:0..nest_size; {maximum of |nest_ptr| when pushing} @!cur_list:list_state_record; {the ``top'' semantic state} @!shown_mode:-mmode..mmode; {most recent mode shown by \.{\\tracingcommands}} @ Here is a common way to make the current list grow: @d tail_append(#)==begin link(tail):=#; tail:=link(tail); end @ We will see later that the vertical list at the bottom semantic level is split into two parts; the ``current page'' runs from |page_head| to |page_tail|, and the ``contribution list'' runs from |contrib_head| to |tail| of semantic level zero. The idea is that contributions are first formed in vertical mode, then ``contributed'' to the current page (during which time the page-breaking decisions are made). For now, we don't need to know any more details about the page-building process. @= nest_ptr:=0; max_nest_stack:=0; mode:=vmode; head:=contrib_head; tail:=contrib_head; prev_depth:=ignore_depth; mode_line:=0; lhmin:=0; rhmin:=0; prev_graf:=0; shown_mode:=0; @; @ When \TeX's work on one level is interrupted, the state is saved by calling |push_nest|. This routine changes |head| and |tail| so that a new (empty) list is begun; it does not change |mode| or |aux|. @p procedure push_nest; {enter a new semantic level, save the old} begin if nest_ptr>max_nest_stack then begin max_nest_stack:=nest_ptr; if nest_ptr=nest_size then overflow("semantic nest size",nest_size); @:TeX capacity exceeded semantic nest size}{\quad semantic nest size@> end; nest[nest_ptr]:=cur_list; {stack the record} incr(nest_ptr); head:=get_avail; tail:=head; prev_graf:=0; mode_line:=line; end; @ Conversely, when \TeX\ is finished on the current level, the former state is restored by calling |pop_nest|. This routine will never be called at the lowest semantic level, nor will it be called unless |head| is a node that should be returned to free memory. @p procedure pop_nest; {leave a semantic level, re-enter the old} begin free_avail(head); decr(nest_ptr); cur_list:=nest[nest_ptr]; end; @ Here is a procedure that displays what \TeX\ is working on, at all levels. @p procedure@?print_totals; forward;@t\2@> procedure show_activities; var p:0..nest_size; {index into |nest|} @!m:-mmode..mmode; {mode} @!a:memory_word; {auxiliary} @!q,@!r:pointer; {for showing the current page} @!t:integer; {ditto} begin nest[nest_ptr]:=cur_list; {put the top level into the array} print_nl(""); print_ln; for p:=nest_ptr downto 0 do begin m:=nest[p].mode_field; a:=nest[p].aux_field; print_nl("### "); print_mode(m); print(" entered at line "); print_int(abs(nest[p].ml_field)); if m=hmode then@+if(nest[p].lhm_field<>2)or(nest[p].rhm_field<>3)then begin print(" (hyphenmin "); print_int(nest[p].lhm_field); print_char(","); print_int(nest[p].rhm_field); print_char(")"); end; if nest[p].ml_field<0 then print(" (\output routine)"); if p=0 then begin @; if link(contrib_head)<>null then print_nl("### recent contributions:"); end; show_box(link(nest[p].head_field)); @; end; end; @ @= case abs(m) div (max_command+1) of 0: begin print_nl("prevdepth "); if a.sc<=ignore_depth then print("ignored") else print_scaled(a.sc); if nest[p].pg_field<>0 then begin print(", prevgraf "); print_int(nest[p].pg_field); print(" line"); if nest[p].pg_field<>1 then print_char("s"); end; end; 1: begin print_nl("spacefactor "); print_int(a.hh.lh); if m>0 then@+ if a.hh.rh>0 then begin print(", current language "); print_int(a.hh.rh);@+ end; end; 2: if a.int<>null then begin print("this will be denominator of:"); show_box(a.int);@+ end; end {there are no other cases} @* \[17] The table of equivalents. Now that we have studied the data structures for \TeX's semantic routines, we ought to consider the data structures used by its syntactic routines. In other words, our next concern will be the tables that \TeX\ looks at when it is scanning what the user has written. The biggest and most important such table is called |eqtb|. It holds the current ``equivalents'' of things; i.e., it explains what things mean or what their current values are, for all quantities that are subject to the nesting structure provided by \TeX's grouping mechanism. There are six parts to |eqtb|: \yskip\hang 1) |eqtb[active_base..(hash_base-1)]| holds the current equivalents of single-character control sequences. \yskip\hang 2) |eqtb[hash_base..(glue_base-1)]| holds the current equivalents of multiletter control sequences. \yskip\hang 3) |eqtb[glue_base..(local_base-1)]| holds the current equivalents of glue parameters like the current baselineskip. \yskip\hang 4) |eqtb[local_base..(int_base-1)]| holds the current equivalents of local halfword quantities like the current box registers, the current ``catcodes,'' the current font, and a pointer to the current paragraph shape. \yskip\hang 5) |eqtb[int_base..(dimen_base-1)]| holds the current equivalents of fullword integer parameters like the current hyphenation penalty. \yskip\hang 6) |eqtb[dimen_base..eqtb_size]| holds the current equivalents of fullword dimension parameters like the current hsize or amount of hanging indentation. \yskip\noindent Note that, for example, the current amount of baselineskip glue is determined by the setting of a particular location in region~3 of |eqtb|, while the current meaning of the control sequence `\.{\\baselineskip}' (which might have been changed by \.{\\def} or \.{\\let}) appears in region~2. @ Each entry in |eqtb| is a |memory_word|. Most of these words are of type |two_halves|, and subdivided into three fields: \yskip\hang 1) The |eq_level| (a quarterword) is the level of grouping at which this equivalent was defined. If the level is |level_zero|, the equivalent has never been defined; |level_one| refers to the outer level (outside of all groups), and this level is also used for global definitions that never go away. Higher levels are for equivalents that will disappear at the end of their group. @^global definitions@> \yskip\hang 2) The |eq_type| (another quarterword) specifies what kind of entry this is. There are many types, since each \TeX\ primitive like \.{\\hbox}, \.{\\def}, etc., has its own special code. The list of command codes above includes all possible settings of the |eq_type| field. \yskip\hang 3) The |equiv| (a halfword) is the current equivalent value. This may be a font number, a pointer into |mem|, or a variety of other things. @d eq_level_field(#)==#.hh.b1 @d eq_type_field(#)==#.hh.b0 @d equiv_field(#)==#.hh.rh @d eq_level(#)==eq_level_field(eqtb[#]) {level of definition} @d eq_type(#)==eq_type_field(eqtb[#]) {command code for equivalent} @d equiv(#)==equiv_field(eqtb[#]) {equivalent value} @d level_zero=min_quarterword {level for undefined quantities} @d level_one=level_zero+1 {outermost level for defined quantities} @ Many locations in |eqtb| have symbolic names. The purpose of the next paragraphs is to define these names, and to set up the initial values of the equivalents. In the first region we have 256 equivalents for ``active characters'' that act as control sequences, followed by 256 equivalents for single-character control sequences. Then comes region~2, which corresponds to the hash table that we will define later. The maximum address in this region is used for a dummy control sequence that is perpetually undefined. There also are several locations for control sequences that are perpetually defined (since they are used in error recovery). @d active_base=1 {beginning of region 1, for active character equivalents} @d single_base=active_base+256 {equivalents of one-character control sequences} @d null_cs=single_base+256 {equivalent of \.{\\csname\\endcsname}} @d hash_base=null_cs+1 {beginning of region 2, for the hash table} @d frozen_control_sequence=hash_base+hash_size {for error recovery} @d frozen_protection=frozen_control_sequence {inaccessible but definable} @d frozen_cr=frozen_control_sequence+1 {permanent `\.{\\cr}'} @d frozen_end_group=frozen_control_sequence+2 {permanent `\.{\\endgroup}'} @d frozen_right=frozen_control_sequence+3 {permanent `\.{\\right}'} @d frozen_fi=frozen_control_sequence+4 {permanent `\.{\\fi}'} @d frozen_end_template=frozen_control_sequence+5 {permanent `\.{\\endtemplate}'} @d frozen_endv=frozen_control_sequence+6 {second permanent `\.{\\endtemplate}'} @d frozen_relax=frozen_control_sequence+7 {permanent `\.{\\relax}'} @d end_write=frozen_control_sequence+8 {permanent `\.{\\endwrite}'} @d frozen_dont_expand=frozen_control_sequence+9 {permanent `\.{\\notexpanded:}'} @d frozen_null_font=frozen_control_sequence+10 {permanent `\.{\\nullfont}'} @d font_id_base=frozen_null_font-font_base {begins table of 257 permanent font identifiers} @d undefined_control_sequence=frozen_null_font+257 {dummy location} @d glue_base=undefined_control_sequence+1 {beginning of region 3} @= eq_type(undefined_control_sequence):=undefined_cs; equiv(undefined_control_sequence):=null; eq_level(undefined_control_sequence):=level_zero; for k:=active_base to undefined_control_sequence-1 do eqtb[k]:=eqtb[undefined_control_sequence]; @ Here is a routine that displays the current meaning of an |eqtb| entry in region 1 or~2. (Similar routines for the other regions will appear below.) @= begin sprint_cs(n); print_char("="); print_cmd_chr(eq_type(n),equiv(n)); if eq_type(n)>=call then begin print_char(":"); show_token_list(link(equiv(n)),null,32); end; end @ Region 3 of |eqtb| contains the 256 \.{\\skip} registers, as well as the glue parameters defined here. It is important that the ``muskip'' parameters have larger numbers than the others. @d line_skip_code=0 {interline glue if |baseline_skip| is infeasible} @d baseline_skip_code=1 {desired glue between baselines} @d par_skip_code=2 {extra glue just above a paragraph} @d above_display_skip_code=3 {extra glue just above displayed math} @d below_display_skip_code=4 {extra glue just below displayed math} @d above_display_short_skip_code=5 {glue above displayed math following short lines} @d below_display_short_skip_code=6 {glue below displayed math following short lines} @d left_skip_code=7 {glue at left of justified lines} @d right_skip_code=8 {glue at right of justified lines} @d top_skip_code=9 {glue at top of main pages} @d split_top_skip_code=10 {glue at top of split pages} @d tab_skip_code=11 {glue between aligned entries} @d space_skip_code=12 {glue between words (if not |zero_glue|)} @d xspace_skip_code=13 {glue after sentences (if not |zero_glue|)} @d par_fill_skip_code=14 {glue on last line of paragraph} @d thin_mu_skip_code=15 {thin space in math formula} @d med_mu_skip_code=16 {medium space in math formula} @d thick_mu_skip_code=17 {thick space in math formula} @d glue_pars=18 {total number of glue parameters} @d skip_base=glue_base+glue_pars {table of 256 ``skip'' registers} @d mu_skip_base=skip_base+256 {table of 256 ``muskip'' registers} @d local_base=mu_skip_base+256 {beginning of region 4} @# @d skip(#)==equiv(skip_base+#) {|mem| location of glue specification} @d mu_skip(#)==equiv(mu_skip_base+#) {|mem| location of math glue spec} @d glue_par(#)==equiv(glue_base+#) {|mem| location of glue specification} @d line_skip==glue_par(line_skip_code) @d baseline_skip==glue_par(baseline_skip_code) @d par_skip==glue_par(par_skip_code) @d above_display_skip==glue_par(above_display_skip_code) @d below_display_skip==glue_par(below_display_skip_code) @d above_display_short_skip==glue_par(above_display_short_skip_code) @d below_display_short_skip==glue_par(below_display_short_skip_code) @d left_skip==glue_par(left_skip_code) @d right_skip==glue_par(right_skip_code) @d top_skip==glue_par(top_skip_code) @d split_top_skip==glue_par(split_top_skip_code) @d tab_skip==glue_par(tab_skip_code) @d space_skip==glue_par(space_skip_code) @d xspace_skip==glue_par(xspace_skip_code) @d par_fill_skip==glue_par(par_fill_skip_code) @d thin_mu_skip==glue_par(thin_mu_skip_code) @d med_mu_skip==glue_par(med_mu_skip_code) @d thick_mu_skip==glue_par(thick_mu_skip_code) @=glue_par(n) @ Sometimes we need to convert \TeX's internal code numbers into symbolic form. The |print_skip_param| routine gives the symbolic name of a glue parameter. @= procedure print_skip_param(@!n:integer); begin case n of line_skip_code: print_esc("lineskip"); baseline_skip_code: print_esc("baselineskip"); par_skip_code: print_esc("parskip"); above_display_skip_code: print_esc("abovedisplayskip"); below_display_skip_code: print_esc("belowdisplayskip"); above_display_short_skip_code: print_esc("abovedisplayshortskip"); below_display_short_skip_code: print_esc("belowdisplayshortskip"); left_skip_code: print_esc("leftskip"); right_skip_code: print_esc("rightskip"); top_skip_code: print_esc("topskip"); split_top_skip_code: print_esc("splittopskip"); tab_skip_code: print_esc("tabskip"); space_skip_code: print_esc("spaceskip"); xspace_skip_code: print_esc("xspaceskip"); par_fill_skip_code: print_esc("parfillskip"); thin_mu_skip_code: print_esc("thinmuskip"); med_mu_skip_code: print_esc("medmuskip"); thick_mu_skip_code: print_esc("thickmuskip"); othercases print("[unknown glue parameter!]") endcases; end; @ The symbolic names for glue parameters are put into \TeX's hash table by using the routine called |primitive|, defined below. Let us enter them now, so that we don't have to list all those parameter names anywhere else. @= primitive("lineskip",assign_glue,glue_base+line_skip_code);@/ @!@:line_skip_}{\.{\\lineskip} primitive@> primitive("baselineskip",assign_glue,glue_base+baseline_skip_code);@/ @!@:baseline_skip_}{\.{\\baselineskip} primitive@> primitive("parskip",assign_glue,glue_base+par_skip_code);@/ @!@:par_skip_}{\.{\\parskip} primitive@> primitive("abovedisplayskip",assign_glue,glue_base+above_display_skip_code);@/ @!@:above_display_skip_}{\.{\\abovedisplayskip} primitive@> primitive("belowdisplayskip",assign_glue,glue_base+below_display_skip_code);@/ @!@:below_display_skip_}{\.{\\belowdisplayskip} primitive@> primitive("abovedisplayshortskip", assign_glue,glue_base+above_display_short_skip_code);@/ @!@:above_display_short_skip_}{\.{\\abovedisplayshortskip} primitive@> primitive("belowdisplayshortskip", assign_glue,glue_base+below_display_short_skip_code);@/ @!@:below_display_short_skip_}{\.{\\belowdisplayshortskip} primitive@> primitive("leftskip",assign_glue,glue_base+left_skip_code);@/ @!@:left_skip_}{\.{\\leftskip} primitive@> primitive("rightskip",assign_glue,glue_base+right_skip_code);@/ @!@:right_skip_}{\.{\\rightskip} primitive@> primitive("topskip",assign_glue,glue_base+top_skip_code);@/ @!@:top_skip_}{\.{\\topskip} primitive@> primitive("splittopskip",assign_glue,glue_base+split_top_skip_code);@/ @!@:split_top_skip_}{\.{\\splittopskip} primitive@> primitive("tabskip",assign_glue,glue_base+tab_skip_code);@/ @!@:tab_skip_}{\.{\\tabskip} primitive@> primitive("spaceskip",assign_glue,glue_base+space_skip_code);@/ @!@:space_skip_}{\.{\\spaceskip} primitive@> primitive("xspaceskip",assign_glue,glue_base+xspace_skip_code);@/ @!@:xspace_skip_}{\.{\\xspaceskip} primitive@> primitive("parfillskip",assign_glue,glue_base+par_fill_skip_code);@/ @!@:par_fill_skip_}{\.{\\parfillskip} primitive@> primitive("thinmuskip",assign_mu_glue,glue_base+thin_mu_skip_code);@/ @!@:thin_mu_skip_}{\.{\\thinmuskip} primitive@> primitive("medmuskip",assign_mu_glue,glue_base+med_mu_skip_code);@/ @!@:med_mu_skip_}{\.{\\medmuskip} primitive@> primitive("thickmuskip",assign_mu_glue,glue_base+thick_mu_skip_code);@/ @!@:thick_mu_skip_}{\.{\\thickmuskip} primitive@> @ @= assign_glue,assign_mu_glue: if chr_code= equiv(glue_base):=zero_glue; eq_level(glue_base):=level_one; eq_type(glue_base):=glue_ref; for k:=glue_base+1 to local_base-1 do eqtb[k]:=eqtb[glue_base]; glue_ref_count(zero_glue):=glue_ref_count(zero_glue)+local_base-glue_base; @ @= if n= primitive("output",assign_toks,output_routine_loc); @!@:output_}{\.{\\output} primitive@> primitive("everypar",assign_toks,every_par_loc); @!@:every_par_}{\.{\\everypar} primitive@> primitive("everymath",assign_toks,every_math_loc); @!@:every_math_}{\.{\\everymath} primitive@> primitive("everydisplay",assign_toks,every_display_loc); @!@:every_display_}{\.{\\everydisplay} primitive@> primitive("everyhbox",assign_toks,every_hbox_loc); @!@:every_hbox_}{\.{\\everyhbox} primitive@> primitive("everyvbox",assign_toks,every_vbox_loc); @!@:every_vbox_}{\.{\\everyvbox} primitive@> primitive("everyjob",assign_toks,every_job_loc); @!@:every_job_}{\.{\\everyjob} primitive@> primitive("everycr",assign_toks,every_cr_loc); @!@:every_cr_}{\.{\\everycr} primitive@> primitive("errhelp",assign_toks,err_help_loc); @!@:err_help_}{\.{\\errhelp} primitive@> @ @= assign_toks: if chr_code>=toks_base then begin print_esc("toks"); print_int(chr_code-toks_base); end else case chr_code of output_routine_loc: print_esc("output"); every_par_loc: print_esc("everypar"); every_math_loc: print_esc("everymath"); every_display_loc: print_esc("everydisplay"); every_hbox_loc: print_esc("everyhbox"); every_vbox_loc: print_esc("everyvbox"); every_job_loc: print_esc("everyjob"); every_cr_loc: print_esc("everycr"); othercases print_esc("errhelp") endcases; @ We initialize most things to null or undefined values. An undefined font is represented by the internal code |font_base|. However, the character code tables are given initial values based on the conventional interpretation of ASCII code. These initial values should not be changed when \TeX\ is adapted for use with non-English languages; all changes to the initialization conventions should be made in format packages, not in \TeX\ itself, so that global interchange of formats is possible. @d null_font==font_base @d var_code==@'70000 {math code meaning ``use the current family''} @= par_shape_ptr:=null; eq_type(par_shape_loc):=shape_ref; eq_level(par_shape_loc):=level_one;@/ for k:=output_routine_loc to toks_base+255 do eqtb[k]:=eqtb[undefined_control_sequence]; box(0):=null; eq_type(box_base):=box_ref; eq_level(box_base):=level_one; for k:=box_base+1 to box_base+255 do eqtb[k]:=eqtb[box_base]; cur_font:=null_font; eq_type(cur_font_loc):=data; eq_level(cur_font_loc):=level_one;@/ for k:=math_font_base to math_font_base+47 do eqtb[k]:=eqtb[cur_font_loc]; equiv(cat_code_base):=0; eq_type(cat_code_base):=data; eq_level(cat_code_base):=level_one;@/ for k:=cat_code_base+1 to int_base-1 do eqtb[k]:=eqtb[cat_code_base]; for k:=0 to 255 do begin cat_code(k):=other_char; math_code(k):=hi(k); sf_code(k):=1000; end; cat_code(carriage_return):=car_ret; cat_code(" "):=spacer; cat_code("\"):=escape; cat_code("%"):=comment; cat_code(invalid_code):=invalid_char; cat_code(null_code):=ignore; for k:="0" to "9" do math_code(k):=hi(k+var_code); for k:="A" to "Z" do begin cat_code(k):=letter; cat_code(k+"a"-"A"):=letter;@/ math_code(k):=hi(k+var_code+@"100); math_code(k+"a"-"A"):=hi(k+"a"-"A"+var_code+@"100);@/ lc_code(k):=k+"a"-"A"; lc_code(k+"a"-"A"):=k+"a"-"A";@/ uc_code(k):=k; uc_code(k+"a"-"A"):=k;@/ sf_code(k):=999; end; @ @= if n=par_shape_loc then begin print_esc("parshape"); print_char("="); if par_shape_ptr=null then print_char("0") else print_int(info(par_shape_ptr)); end else if n else @ @ @= begin if n=cur_font_loc then print("current font") else if n= primitive("pretolerance",assign_int,int_base+pretolerance_code);@/ @!@:pretolerance_}{\.{\\pretolerance} primitive@> primitive("tolerance",assign_int,int_base+tolerance_code);@/ @!@:tolerance_}{\.{\\tolerance} primitive@> primitive("linepenalty",assign_int,int_base+line_penalty_code);@/ @!@:line_penalty_}{\.{\\linepenalty} primitive@> primitive("hyphenpenalty",assign_int,int_base+hyphen_penalty_code);@/ @!@:hyphen_penalty_}{\.{\\hyphenpenalty} primitive@> primitive("exhyphenpenalty",assign_int,int_base+ex_hyphen_penalty_code);@/ @!@:ex_hyphen_penalty_}{\.{\\exhyphenpenalty} primitive@> primitive("clubpenalty",assign_int,int_base+club_penalty_code);@/ @!@:club_penalty_}{\.{\\clubpenalty} primitive@> primitive("widowpenalty",assign_int,int_base+widow_penalty_code);@/ @!@:widow_penalty_}{\.{\\widowpenalty} primitive@> primitive("displaywidowpenalty", assign_int,int_base+display_widow_penalty_code);@/ @!@:display_widow_penalty_}{\.{\\displaywidowpenalty} primitive@> primitive("brokenpenalty",assign_int,int_base+broken_penalty_code);@/ @!@:broken_penalty_}{\.{\\brokenpenalty} primitive@> primitive("binoppenalty",assign_int,int_base+bin_op_penalty_code);@/ @!@:bin_op_penalty_}{\.{\\binoppenalty} primitive@> primitive("relpenalty",assign_int,int_base+rel_penalty_code);@/ @!@:rel_penalty_}{\.{\\relpenalty} primitive@> primitive("predisplaypenalty",assign_int,int_base+pre_display_penalty_code);@/ @!@:pre_display_penalty_}{\.{\\predisplaypenalty} primitive@> primitive("postdisplaypenalty",assign_int,int_base+post_display_penalty_code);@/ @!@:post_display_penalty_}{\.{\\postdisplaypenalty} primitive@> primitive("interlinepenalty",assign_int,int_base+inter_line_penalty_code);@/ @!@:inter_line_penalty_}{\.{\\interlinepenalty} primitive@> primitive("doublehyphendemerits", assign_int,int_base+double_hyphen_demerits_code);@/ @!@:double_hyphen_demerits_}{\.{\\doublehyphendemerits} primitive@> primitive("finalhyphendemerits", assign_int,int_base+final_hyphen_demerits_code);@/ @!@:final_hyphen_demerits_}{\.{\\finalhyphendemerits} primitive@> primitive("adjdemerits",assign_int,int_base+adj_demerits_code);@/ @!@:adj_demerits_}{\.{\\adjdemerits} primitive@> primitive("mag",assign_int,int_base+mag_code);@/ @!@:mag_}{\.{\\mag} primitive@> primitive("delimiterfactor",assign_int,int_base+delimiter_factor_code);@/ @!@:delimiter_factor_}{\.{\\delimiterfactor} primitive@> primitive("looseness",assign_int,int_base+looseness_code);@/ @!@:looseness_}{\.{\\looseness} primitive@> primitive("time",assign_int,int_base+time_code);@/ @!@:time_}{\.{\\time} primitive@> primitive("day",assign_int,int_base+day_code);@/ @!@:day_}{\.{\\day} primitive@> primitive("month",assign_int,int_base+month_code);@/ @!@:month_}{\.{\\month} primitive@> primitive("year",assign_int,int_base+year_code);@/ @!@:year_}{\.{\\year} primitive@> primitive("showboxbreadth",assign_int,int_base+show_box_breadth_code);@/ @!@:show_box_breadth_}{\.{\\showboxbreadth} primitive@> primitive("showboxdepth",assign_int,int_base+show_box_depth_code);@/ @!@:show_box_depth_}{\.{\\showboxdepth} primitive@> primitive("hbadness",assign_int,int_base+hbadness_code);@/ @!@:hbadness_}{\.{\\hbadness} primitive@> primitive("vbadness",assign_int,int_base+vbadness_code);@/ @!@:vbadness_}{\.{\\vbadness} primitive@> primitive("pausing",assign_int,int_base+pausing_code);@/ @!@:pausing_}{\.{\\pausing} primitive@> primitive("tracingonline",assign_int,int_base+tracing_online_code);@/ @!@:tracing_online_}{\.{\\tracingonline} primitive@> primitive("tracingmacros",assign_int,int_base+tracing_macros_code);@/ @!@:tracing_macros_}{\.{\\tracingmacros} primitive@> primitive("tracingstats",assign_int,int_base+tracing_stats_code);@/ @!@:tracing_stats_}{\.{\\tracingstats} primitive@> primitive("tracingparagraphs",assign_int,int_base+tracing_paragraphs_code);@/ @!@:tracing_paragraphs_}{\.{\\tracingparagraphs} primitive@> primitive("tracingpages",assign_int,int_base+tracing_pages_code);@/ @!@:tracing_pages_}{\.{\\tracingpages} primitive@> primitive("tracingoutput",assign_int,int_base+tracing_output_code);@/ @!@:tracing_output_}{\.{\\tracingoutput} primitive@> primitive("tracinglostchars",assign_int,int_base+tracing_lost_chars_code);@/ @!@:tracing_lost_chars_}{\.{\\tracinglostchars} primitive@> primitive("tracingcommands",assign_int,int_base+tracing_commands_code);@/ @!@:tracing_commands_}{\.{\\tracingcommands} primitive@> primitive("tracingrestores",assign_int,int_base+tracing_restores_code);@/ @!@:tracing_restores_}{\.{\\tracingrestores} primitive@> primitive("uchyph",assign_int,int_base+uc_hyph_code);@/ @!@:uc_hyph_}{\.{\\uchyph} primitive@> primitive("outputpenalty",assign_int,int_base+output_penalty_code);@/ @!@:output_penalty_}{\.{\\outputpenalty} primitive@> primitive("maxdeadcycles",assign_int,int_base+max_dead_cycles_code);@/ @!@:max_dead_cycles_}{\.{\\maxdeadcycles} primitive@> primitive("hangafter",assign_int,int_base+hang_after_code);@/ @!@:hang_after_}{\.{\\hangafter} primitive@> primitive("floatingpenalty",assign_int,int_base+floating_penalty_code);@/ @!@:floating_penalty_}{\.{\\floatingpenalty} primitive@> primitive("globaldefs",assign_int,int_base+global_defs_code);@/ @!@:global_defs_}{\.{\\globaldefs} primitive@> primitive("fam",assign_int,int_base+cur_fam_code);@/ @!@:fam_}{\.{\\fam} primitive@> primitive("escapechar",assign_int,int_base+escape_char_code);@/ @!@:escape_char_}{\.{\\escapechar} primitive@> primitive("defaulthyphenchar",assign_int,int_base+default_hyphen_char_code);@/ @!@:default_hyphen_char_}{\.{\\defaulthyphenchar} primitive@> primitive("defaultskewchar",assign_int,int_base+default_skew_char_code);@/ @!@:default_skew_char_}{\.{\\defaultskewchar} primitive@> primitive("endlinechar",assign_int,int_base+end_line_char_code);@/ @!@:end_line_char_}{\.{\\endlinechar} primitive@> primitive("newlinechar",assign_int,int_base+new_line_char_code);@/ @!@:new_line_char_}{\.{\\newlinechar} primitive@> primitive("language",assign_int,int_base+language_code);@/ @!@:language_}{\.{\\language} primitive@> primitive("lefthyphenmin",assign_int,int_base+left_hyphen_min_code);@/ @!@:left_hyphen_min_}{\.{\\lefthyphenmin} primitive@> primitive("righthyphenmin",assign_int,int_base+right_hyphen_min_code);@/ @!@:right_hyphen_min_}{\.{\\righthyphenmin} primitive@> primitive("holdinginserts",assign_int,int_base+holding_inserts_code);@/ @!@:holding_inserts_}{\.{\\holdinginserts} primitive@> primitive("errorcontextlines",assign_int,int_base+error_context_lines_code);@/ @!@:error_context_lines_}{\.{\\errorcontextlines} primitive@> @ @= assign_int: if chr_code @= for k:=int_base to del_code_base-1 do eqtb[k].int:=0; mag:=1000; tolerance:=10000; hang_after:=1; max_dead_cycles:=25; escape_char:="\"; end_line_char:=carriage_return; for k:=0 to 255 do del_code(k):=-1; del_code("."):=0; {this null delimiter is used in error recovery} @ The following procedure, which is called just before \TeX\ initializes its input and output, establishes the initial values of the date and time. @^system dependencies@> Since standard \PASCAL\ cannot provide such information, something special is needed. The program here simply specifies July 4, 1776, at noon; but users probably want a better approximation to the truth. @p procedure fix_date_and_time; begin time:=12*60; {minutes since midnight} day:=4; {fourth day of the month} month:=7; {seventh month of the year} year:=1776; {Anno Domini} end; @ @= begin if n= @!old_setting:0..max_selector; @ The final region of |eqtb| contains the dimension parameters defined here, and the 256 \.{\\dimen} registers. @d par_indent_code=0 {indentation of paragraphs} @d math_surround_code=1 {space around math in text} @d line_skip_limit_code=2 {threshold for |line_skip| instead of |baseline_skip|} @d hsize_code=3 {line width in horizontal mode} @d vsize_code=4 {page height in vertical mode} @d max_depth_code=5 {maximum depth of boxes on main pages} @d split_max_depth_code=6 {maximum depth of boxes on split pages} @d box_max_depth_code=7 {maximum depth of explicit vboxes} @d hfuzz_code=8 {tolerance for overfull hbox messages} @d vfuzz_code=9 {tolerance for overfull vbox messages} @d delimiter_shortfall_code=10 {maximum amount uncovered by variable delimiters} @d null_delimiter_space_code=11 {blank space in null delimiters} @d script_space_code=12 {extra space after subscript or superscript} @d pre_display_size_code=13 {length of text preceding a display} @d display_width_code=14 {length of line for displayed equation} @d display_indent_code=15 {indentation of line for displayed equation} @d overfull_rule_code=16 {width of rule that identifies overfull hboxes} @d hang_indent_code=17 {amount of hanging indentation} @d h_offset_code=18 {amount of horizontal offset when shipping pages out} @d v_offset_code=19 {amount of vertical offset when shipping pages out} @d emergency_stretch_code=20 {reduces badnesses on final pass of line-breaking} @d dimen_pars=21 {total number of dimension parameters} @d scaled_base=dimen_base+dimen_pars {table of 256 user-defined \.{\\dimen} registers} @d eqtb_size=scaled_base+255 {largest subscript of |eqtb|} @# @d dimen(#)==eqtb[scaled_base+#].sc @d dimen_par(#)==eqtb[dimen_base+#].sc {a scaled quantity} @d par_indent==dimen_par(par_indent_code) @d math_surround==dimen_par(math_surround_code) @d line_skip_limit==dimen_par(line_skip_limit_code) @d hsize==dimen_par(hsize_code) @d vsize==dimen_par(vsize_code) @d max_depth==dimen_par(max_depth_code) @d split_max_depth==dimen_par(split_max_depth_code) @d box_max_depth==dimen_par(box_max_depth_code) @d hfuzz==dimen_par(hfuzz_code) @d vfuzz==dimen_par(vfuzz_code) @d delimiter_shortfall==dimen_par(delimiter_shortfall_code) @d null_delimiter_space==dimen_par(null_delimiter_space_code) @d script_space==dimen_par(script_space_code) @d pre_display_size==dimen_par(pre_display_size_code) @d display_width==dimen_par(display_width_code) @d display_indent==dimen_par(display_indent_code) @d overfull_rule==dimen_par(overfull_rule_code) @d hang_indent==dimen_par(hang_indent_code) @d h_offset==dimen_par(h_offset_code) @d v_offset==dimen_par(v_offset_code) @d emergency_stretch==dimen_par(emergency_stretch_code) @p procedure print_length_param(@!n:integer); begin case n of par_indent_code:print_esc("parindent"); math_surround_code:print_esc("mathsurround"); line_skip_limit_code:print_esc("lineskiplimit"); hsize_code:print_esc("hsize"); vsize_code:print_esc("vsize"); max_depth_code:print_esc("maxdepth"); split_max_depth_code:print_esc("splitmaxdepth"); box_max_depth_code:print_esc("boxmaxdepth"); hfuzz_code:print_esc("hfuzz"); vfuzz_code:print_esc("vfuzz"); delimiter_shortfall_code:print_esc("delimitershortfall"); null_delimiter_space_code:print_esc("nulldelimiterspace"); script_space_code:print_esc("scriptspace"); pre_display_size_code:print_esc("predisplaysize"); display_width_code:print_esc("displaywidth"); display_indent_code:print_esc("displayindent"); overfull_rule_code:print_esc("overfullrule"); hang_indent_code:print_esc("hangindent"); h_offset_code:print_esc("hoffset"); v_offset_code:print_esc("voffset"); emergency_stretch_code:print_esc("emergencystretch"); othercases print("[unknown dimen parameter!]") endcases; end; @ @= primitive("parindent",assign_dimen,dimen_base+par_indent_code);@/ @!@:par_indent_}{\.{\\parindent} primitive@> primitive("mathsurround",assign_dimen,dimen_base+math_surround_code);@/ @!@:math_surround_}{\.{\\mathsurround} primitive@> primitive("lineskiplimit",assign_dimen,dimen_base+line_skip_limit_code);@/ @!@:line_skip_limit_}{\.{\\lineskiplimit} primitive@> primitive("hsize",assign_dimen,dimen_base+hsize_code);@/ @!@:hsize_}{\.{\\hsize} primitive@> primitive("vsize",assign_dimen,dimen_base+vsize_code);@/ @!@:vsize_}{\.{\\vsize} primitive@> primitive("maxdepth",assign_dimen,dimen_base+max_depth_code);@/ @!@:max_depth_}{\.{\\maxdepth} primitive@> primitive("splitmaxdepth",assign_dimen,dimen_base+split_max_depth_code);@/ @!@:split_max_depth_}{\.{\\splitmaxdepth} primitive@> primitive("boxmaxdepth",assign_dimen,dimen_base+box_max_depth_code);@/ @!@:box_max_depth_}{\.{\\boxmaxdepth} primitive@> primitive("hfuzz",assign_dimen,dimen_base+hfuzz_code);@/ @!@:hfuzz_}{\.{\\hfuzz} primitive@> primitive("vfuzz",assign_dimen,dimen_base+vfuzz_code);@/ @!@:vfuzz_}{\.{\\vfuzz} primitive@> primitive("delimitershortfall", assign_dimen,dimen_base+delimiter_shortfall_code);@/ @!@:delimiter_shortfall_}{\.{\\delimitershortfall} primitive@> primitive("nulldelimiterspace", assign_dimen,dimen_base+null_delimiter_space_code);@/ @!@:null_delimiter_space_}{\.{\\nulldelimiterspace} primitive@> primitive("scriptspace",assign_dimen,dimen_base+script_space_code);@/ @!@:script_space_}{\.{\\scriptspace} primitive@> primitive("predisplaysize",assign_dimen,dimen_base+pre_display_size_code);@/ @!@:pre_display_size_}{\.{\\predisplaysize} primitive@> primitive("displaywidth",assign_dimen,dimen_base+display_width_code);@/ @!@:display_width_}{\.{\\displaywidth} primitive@> primitive("displayindent",assign_dimen,dimen_base+display_indent_code);@/ @!@:display_indent_}{\.{\\displayindent} primitive@> primitive("overfullrule",assign_dimen,dimen_base+overfull_rule_code);@/ @!@:overfull_rule_}{\.{\\overfullrule} primitive@> primitive("hangindent",assign_dimen,dimen_base+hang_indent_code);@/ @!@:hang_indent_}{\.{\\hangindent} primitive@> primitive("hoffset",assign_dimen,dimen_base+h_offset_code);@/ @!@:h_offset_}{\.{\\hoffset} primitive@> primitive("voffset",assign_dimen,dimen_base+v_offset_code);@/ @!@:v_offset_}{\.{\\voffset} primitive@> primitive("emergencystretch",assign_dimen,dimen_base+emergency_stretch_code);@/ @!@:emergency_stretch_}{\.{\\emergencystretch} primitive@> @ @= assign_dimen: if chr_code= for k:=dimen_base to eqtb_size do eqtb[k].sc:=0; @ @= begin if n@@;@/ @!stat procedure show_eqtb(@!n:pointer); begin if n else if n else if n else if n else if n<=eqtb_size then @ else print_char("?"); {this can't happen either} end; tats @ The last two regions of |eqtb| have fullword values instead of the three fields |eq_level|, |eq_type|, and |equiv|. An |eq_type| is unnecessary, but \TeX\ needs to store the |eq_level| information in another array called |xeq_level|. @= @!eqtb:array[active_base..eqtb_size] of memory_word; @!xeq_level:array[int_base..eqtb_size] of quarterword; @ @= for k:=int_base to eqtb_size do xeq_level[k]:=level_one; @ When the debugging routine |search_mem| is looking for pointers having a given value, it is interested only in regions 1 to~3 of~|eqtb|, and in the first part of region~4. @= for q:=active_base to box_base+255 do begin if equiv(q)=p then begin print_nl("EQUIV("); print_int(q); print_char(")"); end; end @* \[18] The hash table. Control sequences are stored and retrieved by means of a fairly standard hash table algorithm called the method of ``coalescing lists'' (cf.\ Algorithm 6.4C in {\sl The Art of Computer Programming\/}). Once a control sequence enters the table, it is never removed, because there are complicated situations involving \.{\\gdef} where the removal of a control sequence at the end of a group would be a mistake preventable only by the introduction of a complicated reference-count mechanism. The actual sequence of letters forming a control sequence identifier is stored in the |str_pool| array together with all the other strings. An auxiliary array |hash| consists of items with two halfword fields per word. The first of these, called |next(p)|, points to the next identifier belonging to the same coalesced list as the identifier corresponding to~|p|; and the other, called |text(p)|, points to the |str_start| entry for |p|'s identifier. If position~|p| of the hash table is empty, we have |text(p)=0|; if position |p| is either empty or the end of a coalesced hash list, we have |next(p)=0|. An auxiliary pointer variable called |hash_used| is maintained in such a way that all locations |p>=hash_used| are nonempty. The global variable |cs_count| tells how many multiletter control sequences have been defined, if statistics are being kept. A global boolean variable called |no_new_control_sequence| is set to |true| during the time that new hash table entries are forbidden. @d next(#) == hash[#].lh {link for coalesced lists} @d text(#) == hash[#].rh {string number for control sequence name} @d hash_is_full == (hash_used=hash_base) {test if all positions are occupied} @d font_id_text(#) == text(font_id_base+#) {a frozen font identifier's name} @= @!hash: array[hash_base..undefined_control_sequence-1] of two_halves; {the hash table} @!hash_used:pointer; {allocation pointer for |hash|} @!no_new_control_sequence:boolean; {are new identifiers legal?} @!cs_count:integer; {total number of known identifiers} @ @= no_new_control_sequence:=true; {new identifiers are usually forbidden} next(hash_base):=0; text(hash_base):=0; for k:=hash_base+1 to undefined_control_sequence-1 do hash[k]:=hash[hash_base]; @ @= hash_used:=frozen_control_sequence; {nothing is used} cs_count:=0; eq_type(frozen_dont_expand):=dont_expand; text(frozen_dont_expand):="notexpanded:"; @.notexpanded:@> @ Here is the subroutine that searches the hash table for an identifier that matches a given string of length |l>1| appearing in |buffer[j.. (j+l-1)]|. If the identifier is found, the corresponding hash table address is returned. Otherwise, if the global variable |no_new_control_sequence| is |true|, the dummy address |undefined_control_sequence| is returned. Otherwise the identifier is inserted into the hash table and its location is returned. @p function id_lookup(@!j,@!l:integer):pointer; {search the hash table} label found; {go here if you found it} var h:integer; {hash code} @!d:integer; {number of characters in incomplete current string} @!p:pointer; {index in |hash| array} @!k:pointer; {index in |buffer| array} begin @; p:=h+hash_base; {we start searching here; note that |0<=h0 then if length(text(p))=l then if str_eq_buf(text(p),j) then goto found; if next(p)=0 then begin if no_new_control_sequence then p:=undefined_control_sequence else @; goto found; end; p:=next(p); end; found: id_lookup:=p; end; @ @= begin if text(p)>0 then begin repeat if hash_is_full then overflow("hash size",hash_size); @:TeX capacity exceeded hash size}{\quad hash size@> decr(hash_used); until text(hash_used)=0; {search for an empty location in |hash|} next(p):=hash_used; p:=hash_used; end; str_room(l); d:=cur_length; while pool_ptr>str_start[str_ptr] do begin decr(pool_ptr); str_pool[pool_ptr+l]:=str_pool[pool_ptr]; end; {move current string up to make room for another} for k:=j to j+l-1 do append_char(buffer[k]); text(p):=make_string; pool_ptr:=pool_ptr+d; @!stat incr(cs_count);@+tats@;@/ end @ The value of |hash_prime| should be roughly 85\pct! of |hash_size|, and it should be a prime number. The theory of hashing tells us to expect fewer than two table probes, on the average, when the search is successful. [See J.~S. Vitter, {\sl Journal of the ACM\/ \bf30} (1983), 231--258.] @^Vitter, Jeffrey Scott@> @= h:=buffer[j]; for k:=j+1 to j+l-1 do begin h:=h+h+buffer[k]; while h>=hash_prime do h:=h-hash_prime; end @ Single-character control sequences do not need to be looked up in a hash table, since we can use the character code itself as a direct address. The procedure |print_cs| prints the name of a control sequence, given a pointer to its address in |eqtb|. A space is printed after the name unless it is a single nonletter or an active character. This procedure might be invoked with invalid data, so it is ``extra robust.'' The individual characters must be printed one at a time using |print|, since they may be unprintable. @= procedure print_cs(@!p:integer); {prints a purported control sequence} begin if p=single_base then if p=null_cs then begin print_esc("csname"); print_esc("endcsname"); end else begin print_esc(p-single_base); if cat_code(p-single_base)=letter then print_char(" "); end else if p else print(p-active_base) else if p>=undefined_control_sequence then print_esc("IMPOSSIBLE.") else if (text(p)<0)or(text(p)>=str_ptr) then print_esc("NONEXISTENT.") @.NONEXISTENT@> else begin print_esc(""); slow_print(text(p)); print_char(" "); end; end; @ Here is a similar procedure; it avoids the error checks, and it never prints a space after the control sequence. @= procedure sprint_cs(@!p:pointer); {prints a control sequence} begin if p= primitive(" ",ex_space,0);@/ @!@:Single-character primitives /}{\quad\.{\\\ }@> primitive("/",ital_corr,0);@/ @!@:Single-character primitives /}{\quad\.{\\/}@> primitive("accent",accent,0);@/ @!@:accent_}{\.{\\accent} primitive@> primitive("advance",advance,0);@/ @!@:advance_}{\.{\\advance} primitive@> primitive("afterassignment",after_assignment,0);@/ @!@:after_assignment_}{\.{\\afterassignment} primitive@> primitive("aftergroup",after_group,0);@/ @!@:after_group_}{\.{\\aftergroup} primitive@> primitive("begingroup",begin_group,0);@/ @!@:begin_group_}{\.{\\begingroup} primitive@> primitive("char",char_num,0);@/ @!@:char_}{\.{\\char} primitive@> primitive("csname",cs_name,0);@/ @!@:cs_name_}{\.{\\csname} primitive@> primitive("delimiter",delim_num,0);@/ @!@:delimiter_}{\.{\\delimiter} primitive@> primitive("divide",divide,0);@/ @!@:divide_}{\.{\\divide} primitive@> primitive("endcsname",end_cs_name,0);@/ @!@:end_cs_name_}{\.{\\endcsname} primitive@> primitive("endgroup",end_group,0); @!@:end_group_}{\.{\\endgroup} primitive@> text(frozen_end_group):="endgroup"; eqtb[frozen_end_group]:=eqtb[cur_val];@/ primitive("expandafter",expand_after,0);@/ @!@:expand_after_}{\.{\\expandafter} primitive@> primitive("font",def_font,0);@/ @!@:font_}{\.{\\font} primitive@> primitive("fontdimen",assign_font_dimen,0);@/ @!@:font_dimen_}{\.{\\fontdimen} primitive@> primitive("halign",halign,0);@/ @!@:halign_}{\.{\\halign} primitive@> primitive("hrule",hrule,0);@/ @!@:hrule_}{\.{\\hrule} primitive@> primitive("ignorespaces",ignore_spaces,0);@/ @!@:ignore_spaces_}{\.{\\ignorespaces} primitive@> primitive("insert",insert,0);@/ @!@:insert_}{\.{\\insert} primitive@> primitive("mark",mark,0);@/ @!@:mark_}{\.{\\mark} primitive@> primitive("mathaccent",math_accent,0);@/ @!@:math_accent_}{\.{\\mathaccent} primitive@> primitive("mathchar",math_char_num,0);@/ @!@:math_char_}{\.{\\mathchar} primitive@> primitive("mathchoice",math_choice,0);@/ @!@:math_choice_}{\.{\\mathchoice} primitive@> primitive("multiply",multiply,0);@/ @!@:multiply_}{\.{\\multiply} primitive@> primitive("noalign",no_align,0);@/ @!@:no_align_}{\.{\\noalign} primitive@> primitive("noboundary",no_boundary,0);@/ @!@:no_boundary_}{\.{\\noboundary} primitive@> primitive("noexpand",no_expand,0);@/ @!@:no_expand_}{\.{\\noexpand} primitive@> primitive("nonscript",non_script,0);@/ @!@:non_script_}{\.{\\nonscript} primitive@> primitive("omit",omit,0);@/ @!@:omit_}{\.{\\omit} primitive@> primitive("parshape",set_shape,0);@/ @!@:par_shape_}{\.{\\parshape} primitive@> primitive("penalty",break_penalty,0);@/ @!@:penalty_}{\.{\\penalty} primitive@> primitive("prevgraf",set_prev_graf,0);@/ @!@:prev_graf_}{\.{\\prevgraf} primitive@> primitive("radical",radical,0);@/ @!@:radical_}{\.{\\radical} primitive@> primitive("read",read_to_cs,0);@/ @!@:read_}{\.{\\read} primitive@> primitive("relax",relax,256); {cf.\ |scan_file_name|} @!@:relax_}{\.{\\relax} primitive@> text(frozen_relax):="relax"; eqtb[frozen_relax]:=eqtb[cur_val];@/ primitive("setbox",set_box,0);@/ @!@:set_box_}{\.{\\setbox} primitive@> primitive("the",the,0);@/ @!@:the_}{\.{\\the} primitive@> primitive("toks",toks_register,0);@/ @!@:toks_}{\.{\\toks} primitive@> primitive("vadjust",vadjust,0);@/ @!@:vadjust_}{\.{\\vadjust} primitive@> primitive("valign",valign,0);@/ @!@:valign_}{\.{\\valign} primitive@> primitive("vcenter",vcenter,0);@/ @!@:vcenter_}{\.{\\vcenter} primitive@> primitive("vrule",vrule,0);@/ @!@:vrule_}{\.{\\vrule} primitive@> @ Each primitive has a corresponding inverse, so that it is possible to display the cryptic numeric contents of |eqtb| in symbolic form. Every call of |primitive| in this program is therefore accompanied by some straightforward code that forms part of the |print_cmd_chr| routine below. @= accent: print_esc("accent"); advance: print_esc("advance"); after_assignment: print_esc("afterassignment"); after_group: print_esc("aftergroup"); assign_font_dimen: print_esc("fontdimen"); begin_group: print_esc("begingroup"); break_penalty: print_esc("penalty"); char_num: print_esc("char"); cs_name: print_esc("csname"); def_font: print_esc("font"); delim_num: print_esc("delimiter"); divide: print_esc("divide"); end_cs_name: print_esc("endcsname"); end_group: print_esc("endgroup"); ex_space: print_esc(" "); expand_after: print_esc("expandafter"); halign: print_esc("halign"); hrule: print_esc("hrule"); ignore_spaces: print_esc("ignorespaces"); insert: print_esc("insert"); ital_corr: print_esc("/"); mark: print_esc("mark"); math_accent: print_esc("mathaccent"); math_char_num: print_esc("mathchar"); math_choice: print_esc("mathchoice"); multiply: print_esc("multiply"); no_align: print_esc("noalign"); no_boundary:print_esc("noboundary"); no_expand: print_esc("noexpand"); non_script: print_esc("nonscript"); omit: print_esc("omit"); radical: print_esc("radical"); read_to_cs: print_esc("read"); relax: print_esc("relax"); set_box: print_esc("setbox"); set_prev_graf: print_esc("prevgraf"); set_shape: print_esc("parshape"); the: print_esc("the"); toks_register: print_esc("toks"); vadjust: print_esc("vadjust"); valign: print_esc("valign"); vcenter: print_esc("vcenter"); vrule: print_esc("vrule"); @ We will deal with the other primitives later, at some point in the program where their |eq_type| and |equiv| values are more meaningful. For example, the primitives for math mode will be loaded when we consider the routines that deal with formulas. It is easy to find where each particular primitive was treated by looking in the index at the end; for example, the section where |"radical"| entered |eqtb| is listed under `\.{\\radical} primitive'. (Primitives consisting of a single nonalphabetic character, @!like `\.{\\/}', are listed under `Single-character primitives'.) @!@^Single-character primitives@> Meanwhile, this is a convenient place to catch up on something we were unable to do before the hash table was defined: @= print_esc(font_id_text(font(p))) @* \[19] Saving and restoring equivalents. The nested structure provided by `$\.{\char'173}\ldots\.{\char'175}$' groups in \TeX\ means that |eqtb| entries valid in outer groups should be saved and restored later if they are overridden inside the braces. When a new |eqtb| value is being assigned, the program therefore checks to see if the previous entry belongs to an outer level. In such a case, the old value is placed on the |save_stack| just before the new value enters |eqtb|. At the end of a grouping level, i.e., when the right brace is sensed, the |save_stack| is used to restore the outer values, and the inner ones are destroyed. Entries on the |save_stack| are of type |memory_word|. The top item on this stack is |save_stack[p]|, where |p=save_ptr-1|; it contains three fields called |save_type|, |save_level|, and |save_index|, and it is interpreted in one of four ways: \yskip\hang 1) If |save_type(p)=restore_old_value|, then |save_index(p)| is a location in |eqtb| whose current value should be destroyed at the end of the current group and replaced by |save_stack[p-1]|. Furthermore if |save_index(p)>=int_base|, then |save_level(p)| should replace the corresponding entry in |xeq_level|. \yskip\hang 2) If |save_type(p)=restore_zero|, then |save_index(p)| is a location in |eqtb| whose current value should be destroyed at the end of the current group, when it should be replaced by the current value of |eqtb[undefined_control_sequence]|. \yskip\hang 3) If |save_type(p)=insert_token|, then |save_index(p)| is a token that should be inserted into \TeX's input when the current group ends. \yskip\hang 4) If |save_type(p)=level_boundary|, then |save_level(p)| is a code explaining what kind of group we were previously in, and |save_index(p)| points to the level boundary word at the bottom of the entries for that group. @d save_type(#)==save_stack[#].hh.b0 {classifies a |save_stack| entry} @d save_level(#)==save_stack[#].hh.b1 {saved level for regions 5 and 6, or group code} @d save_index(#)==save_stack[#].hh.rh {|eqtb| location or |save_stack| location} @d restore_old_value=0 {|save_type| when a value should be restored later} @d restore_zero=1 {|save_type| when an undefined entry should be restored} @d insert_token=2 {|save_type| when a token is being saved for later use} @d level_boundary=3 {|save_type| corresponding to beginning of group} @ Here are the group codes that are used to discriminate between different kinds of groups. They allow \TeX\ to decide what special actions, if any, should be performed when a group ends. \def\grp{\.{\char'173...\char'175}} Some groups are not supposed to be ended by right braces. For example, the `\.\$' that begins a math formula causes a |math_shift_group| to be started, and this should be terminated by a matching `\.\$'. Similarly, a group that starts with \.{\\left} should end with \.{\\right}, and one that starts with \.{\\begingroup} should end with \.{\\endgroup}. @d bottom_level=0 {group code for the outside world} @d simple_group=1 {group code for local structure only} @d hbox_group=2 {code for `\.{\\hbox}\grp'} @d adjusted_hbox_group=3 {code for `\.{\\hbox}\grp' in vertical mode} @d vbox_group=4 {code for `\.{\\vbox}\grp'} @d vtop_group=5 {code for `\.{\\vtop}\grp'} @d align_group=6 {code for `\.{\\halign}\grp', `\.{\\valign}\grp'} @d no_align_group=7 {code for `\.{\\noalign}\grp'} @d output_group=8 {code for output routine} @d math_group=9 {code for, e.g, `\.{\char'136}\grp'} @d disc_group=10 {code for `\.{\\discretionary}\grp\grp\grp'} @d insert_group=11 {code for `\.{\\insert}\grp', `\.{\\vadjust}\grp'} @d vcenter_group=12 {code for `\.{\\vcenter}\grp'} @d math_choice_group=13 {code for `\.{\\mathchoice}\grp\grp\grp\grp'} @d semi_simple_group=14 {code for `\.{\\begingroup...\\endgroup}'} @d math_shift_group=15 {code for `\.{\$...\$}'} @d math_left_group=16 {code for `\.{\\left...\\right}'} @d max_group_code=16 @= @!group_code=0..max_group_code; {|save_level| for a level boundary} @ The global variable |cur_group| keeps track of what sort of group we are currently in. Another global variable, |cur_boundary|, points to the topmost |level_boundary| word. And |cur_level| is the current depth of nesting. The routines are designed to preserve the condition that no entry in the |save_stack| or in |eqtb| ever has a level greater than |cur_level|. @ @= @!save_stack : array[0..save_size] of memory_word; @!save_ptr : 0..save_size; {first unused entry on |save_stack|} @!max_save_stack:0..save_size; {maximum usage of save stack} @!cur_level: quarterword; {current nesting level for groups} @!cur_group: group_code; {current group type} @!cur_boundary: 0..save_size; {where the current level begins} @ At this time it might be a good idea for the reader to review the introduction to |eqtb| that was given above just before the long lists of parameter names. Recall that the ``outer level'' of the program is |level_one|, since undefined control sequences are assumed to be ``defined'' at |level_zero|. @= save_ptr:=0; cur_level:=level_one; cur_group:=bottom_level; cur_boundary:=0; max_save_stack:=0; @ The following macro is used to test if there is room for up to six more entries on |save_stack|. By making a conservative test like this, we can get by with testing for overflow in only a few places. @d check_full_save_stack==if save_ptr>max_save_stack then begin max_save_stack:=save_ptr; if max_save_stack>save_size-6 then overflow("save size",save_size); @:TeX capacity exceeded save size}{\quad save size@> end @ Procedure |new_save_level| is called when a group begins. The argument is a group identification code like `|hbox_group|'. After calling this routine, it is safe to put five more entries on |save_stack|. In some cases integer-valued items are placed onto the |save_stack| just below a |level_boundary| word, because this is a convenient place to keep information that is supposed to ``pop up'' just when the group has finished. For example, when `\.{\\hbox to 100pt}\grp' is being treated, the 100pt dimension is stored on |save_stack| just before |new_save_level| is called. We use the notation |saved(k)| to stand for an integer item that appears in location |save_ptr+k| of the save stack. @d saved(#)==save_stack[save_ptr+#].int @p procedure new_save_level(@!c:group_code); {begin a new level of grouping} begin check_full_save_stack; save_type(save_ptr):=level_boundary; save_level(save_ptr):=cur_group; save_index(save_ptr):=cur_boundary; if cur_level=max_quarterword then overflow("grouping levels", @:TeX capacity exceeded grouping levels}{\quad grouping levels@> max_quarterword-min_quarterword); {quit if |(cur_level+1)| is too big to be stored in |eqtb|} cur_boundary:=save_ptr; incr(cur_level); incr(save_ptr); cur_group:=c; end; @ Just before an entry of |eqtb| is changed, the following procedure should be called to update the other data structures properly. It is important to keep in mind that reference counts in |mem| include references from within |save_stack|, so these counts must be handled carefully. @^reference counts@> @p procedure eq_destroy(@!w:memory_word); {gets ready to forget |w|} var q:pointer; {|equiv| field of |w|} begin case eq_type_field(w) of call,long_call,outer_call,long_outer_call: delete_token_ref(equiv_field(w)); glue_ref: delete_glue_ref(equiv_field(w)); shape_ref: begin q:=equiv_field(w); {we need to free a \.{\\parshape} block} if q<>null then free_node(q,info(q)+info(q)+1); end; {such a block is |2n+1| words long, where |n=info(q)|} box_ref: flush_node_list(equiv_field(w)); othercases do_nothing endcases; end; @ To save a value of |eqtb[p]| that was established at level |l|, we can use the following subroutine. @p procedure eq_save(@!p:pointer;@!l:quarterword); {saves |eqtb[p]|} begin check_full_save_stack; if l=level_zero then save_type(save_ptr):=restore_zero else begin save_stack[save_ptr]:=eqtb[p]; incr(save_ptr); save_type(save_ptr):=restore_old_value; end; save_level(save_ptr):=l; save_index(save_ptr):=p; incr(save_ptr); end; @ The procedure |eq_define| defines an |eqtb| entry having specified |eq_type| and |equiv| fields, and saves the former value if appropriate. This procedure is used only for entries in the first four regions of |eqtb|, i.e., only for entries that have |eq_type| and |equiv| fields. After calling this routine, it is safe to put four more entries on |save_stack|, provided that there was room for four more entries before the call, since |eq_save| makes the necessary test. @p procedure eq_define(@!p:pointer;@!t:quarterword;@!e:halfword); {new data for |eqtb|} begin if eq_level(p)=cur_level then eq_destroy(eqtb[p]) else if cur_level>level_one then eq_save(p,eq_level(p)); eq_level(p):=cur_level; eq_type(p):=t; equiv(p):=e; end; @ The counterpart of |eq_define| for the remaining (fullword) positions in |eqtb| is called |eq_word_define|. Since |xeq_level[p]>=level_one| for all |p|, a `|restore_zero|' will never be used in this case. @p procedure eq_word_define(@!p:pointer;@!w:integer); begin if xeq_level[p]<>cur_level then begin eq_save(p,xeq_level[p]); xeq_level[p]:=cur_level; end; eqtb[p].int:=w; end; @ The |eq_define| and |eq_word_define| routines take care of local definitions. @^global definitions@> Global definitions are done in almost the same way, but there is no need to save old values, and the new value is associated with |level_one|. @p procedure geq_define(@!p:pointer;@!t:quarterword;@!e:halfword); {global |eq_define|} begin eq_destroy(eqtb[p]); eq_level(p):=level_one; eq_type(p):=t; equiv(p):=e; end; @# procedure geq_word_define(@!p:pointer;@!w:integer); {global |eq_word_define|} begin eqtb[p].int:=w; xeq_level[p]:=level_one; end; @ Subroutine |save_for_after| puts a token on the stack for save-keeping. @p procedure save_for_after(@!t:halfword); begin if cur_level>level_one then begin check_full_save_stack; save_type(save_ptr):=insert_token; save_level(save_ptr):=level_zero; save_index(save_ptr):=t; incr(save_ptr); end; end; @ The |unsave| routine goes the other way, taking items off of |save_stack|. This routine takes care of restoration when a level ends; everything belonging to the topmost group is cleared off of the save stack. @p@t\4@>@@;@/ procedure@?back_input; forward; @t\2@> procedure unsave; {pops the top level off the save stack} label done; var p:pointer; {position to be restored} @!l:quarterword; {saved level, if in fullword regions of |eqtb|} @!t:halfword; {saved value of |cur_tok|} begin if cur_level>level_one then begin decr(cur_level); @; end else confusion("curlevel"); {|unsave| is not used when |cur_group=bottom_level|} @:this can't happen curlevel}{\quad curlevel@> end; @ @= loop@+begin decr(save_ptr); if save_type(save_ptr)=level_boundary then goto done; p:=save_index(save_ptr); if save_type(save_ptr)=insert_token then @ else begin if save_type(save_ptr)=restore_old_value then begin l:=save_level(save_ptr); decr(save_ptr); end else save_stack[save_ptr]:=eqtb[undefined_control_sequence]; @; end; end; done: cur_group:=save_level(save_ptr); cur_boundary:=save_index(save_ptr) @ A global definition, which sets the level to |level_one|, @^global definitions@> will not be undone by |unsave|. If at least one global definition of |eqtb[p]| has been carried out within the group that just ended, the last such definition will therefore survive. @= if p0 then restore_trace(p,"retaining");@+tats@;@/ end else begin eq_destroy(eqtb[p]); {destroy the current value} eqtb[p]:=save_stack[save_ptr]; {restore the saved value} @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ end else if xeq_level[p]<>level_one then begin eqtb[p]:=save_stack[save_ptr]; xeq_level[p]:=l; @!stat if tracing_restores>0 then restore_trace(p,"restoring");@+tats@;@/ end else begin @!stat if tracing_restores>0 then restore_trace(p,"retaining");@+tats@;@/ end @ @= @!stat procedure restore_trace(@!p:pointer;@!s:str_number); {|eqtb[p]| has just been restored or retained} begin begin_diagnostic; print_char("{"); print(s); print_char(" "); show_eqtb(p); print_char("}"); end_diagnostic(false); end; tats @ When looking for possible pointers to a memory location, it is helpful to look for references from |eqtb| that might be waiting on the save stack. Of course, we might find spurious pointers too; but this routine is merely an aid when debugging, and at such times we are grateful for any scraps of information, even if they prove to be irrelevant. @^dirty \PASCAL@> @= if save_ptr>0 then for q:=0 to save_ptr-1 do begin if equiv_field(save_stack[q])=p then begin print_nl("SAVE("); print_int(q); print_char(")"); end; end @ Most of the parameters kept in |eqtb| can be changed freely, but there's an exception: The magnification should not be used with two different values during any \TeX\ job, since a single magnification is applied to an entire run. The global variable |mag_set| is set to the current magnification whenever it becomes necessary to ``freeze'' it at a particular value. @= @!mag_set:integer; {if nonzero, this magnification should be used henceforth} @ @= mag_set:=0; @ The |prepare_mag| subroutine is called whenever \TeX\ wants to use |mag| for magnification. @p procedure prepare_mag; begin if (mag_set>0)and(mag<>mag_set) then begin print_err("Incompatible magnification ("); print_int(mag); @.Incompatible magnification@> print(");"); print_nl(" the previous value will be retained"); help2("I can handle only one magnification ratio per job. So I've")@/ ("reverted to the magnification you used earlier on this run.");@/ int_error(mag_set); geq_word_define(int_base+mag_code,mag_set); {|mag:=mag_set|} end; if (mag<=0)or(mag>32768) then begin print_err("Illegal magnification has been changed to 1000");@/ @.Illegal magnification...@> help1("The magnification ratio must be between 1 and 32768."); int_error(mag); geq_word_define(int_base+mag_code,1000); end; mag_set:=mag; end; @* \[20] Token lists. A \TeX\ token is either a character or a control sequence, and it is @^token@> represented internally in one of two ways: (1)~A character whose ASCII code number is |c| and whose command code is |m| is represented as the number $2^8m+c$; the command code is in the range |1<=m<=14|. (2)~A control sequence whose |eqtb| address is |p| is represented as the number |cs_token_flag+p|. Here |cs_token_flag=@t$2^{12}-1$@>| is larger than $2^8m+c$, yet it is small enough that |cs_token_flag+p< max_halfword|; thus, a token fits comfortably in a halfword. A token |t| represents a |left_brace| command if and only if |t= if cs_token_flag+undefined_control_sequence>max_halfword then bad:=21; @ A token list is a singly linked list of one-word nodes in |mem|, where each word contains a token and a link. Macro definitions, output-routine definitions, marks, \.{\\write} texts, and a few other things are remembered by \TeX\ in the form of token lists, usually preceded by a node with a reference count in its |token_ref_count| field. The token stored in location |p| is called |info(p)|. Three special commands appear in the token lists of macro definitions. When |m=match|, it means that \TeX\ should scan a parameter for the current macro; when |m=end_match|, it means that parameter matching should end and \TeX\ should start reading the macro text; and when |m=out_param|, it means that \TeX\ should insert parameter number |c| into the text at this point. The enclosing \.{\char'173} and \.{\char'175} characters of a macro definition are omitted, but the final right brace of an output routine is included at the end of its token list. Here is an example macro definition that illustrates these conventions. After \TeX\ processes the text $$\.{\\def\\mac a\#1\#2 \\b \{\#1\\-a \#\#1\#2 \#2\}}$$ the definition of \.{\\mac} is represented as a token list containing $$\def\,{\hskip2pt} \vbox{\halign{\hfil#\hfil\cr (reference count), |letter|\,\.a, |match|\,\#, |match|\,\#, |spacer|\,\.\ , \.{\\b}, |end_match|,\cr |out_param|\,1, \.{\\-}, |letter|\,\.a, |spacer|\,\.\ , |mac_param|\,\#, |other_char|\,\.1,\cr |out_param|\,2, |spacer|\,\.\ , |out_param|\,2.\cr}}$$ The procedure |scan_toks| builds such token lists, and |macro_call| does the parameter matching. @^reference counts@> Examples such as $$\.{\\def\\m\{\\def\\m\{a\}\ b\}}$$ explain why reference counts would be needed even if \TeX\ had no \.{\\let} operation: When the token list for \.{\\m} is being read, the redefinition of \.{\\m} changes the |eqtb| entry before the token list has been fully consumed, so we dare not simply destroy a token list when its control sequence is being redefined. If the parameter-matching part of a definition ends with `\.{\#\{}', the corresponding token list will have `\.\{' just before the `|end_match|' and also at the very end. The first `\.\{' is used to delimit the parameter; the second one keeps the first from disappearing. @ The procedure |show_token_list|, which prints a symbolic form of the token list that starts at a given node |p|, illustrates these conventions. The token list being displayed should not begin with a reference count. However, the procedure is intended to be robust, so that if the memory links are awry or if |p| is not really a pointer to a token list, nothing catastrophic will happen. An additional parameter |q| is also given; this parameter is either null or it points to a node in the token list where a certain magic computation takes place that will be explained later. (Basically, |q| is non-null when we are printing the two-line context information at the time of an error message; |q| marks the place corresponding to where the second line should begin.) For example, if |p| points to the node containing the first \.a in the token list above, then |show_token_list| will print the string $$\hbox{`\.{a\#1\#2\ \\b\ ->\#1\\-a\ \#\#1\#2\ \#2}';}$$ and if |q| points to the node containing the second \.a, the magic computation will be performed just before the second \.a is printed. The generation will stop, and `\.{\\ETC.}' will be printed, if the length of printing exceeds a given limit~|l|. Anomalous entries are printed in the form of control sequences that are not followed by a blank space, e.g., `\.{\\BAD.}'; this cannot be confused with actual control sequences because a real control sequence named \.{BAD} would come out `\.{\\BAD\ }'. @= procedure show_token_list(@!p,@!q:integer;@!l:integer); label exit; var m,@!c:integer; {pieces of a token} @!match_chr:ASCII_code; {character used in a `|match|'} @!n:ASCII_code; {the highest parameter number, as an ASCII digit} begin match_chr:="#"; n:="0"; tally:=0; while (p<>null) and (tally; @; p:=link(p); end; if p<>null then print_esc("ETC."); @.ETC@> exit: end; @ @= if (pmem_end) then begin print_esc("CLOBBERED."); return; @.CLOBBERED@> end; if info(p)>=cs_token_flag then print_cs(info(p)-cs_token_flag) else begin m:=info(p) div @'400; c:=info(p) mod @'400; if info(p)<0 then print_esc("BAD.") @.BAD@> else @; end @ The procedure usually ``learns'' the character code used for macro parameters by seeing one in a |match| command before it runs into any |out_param| commands. @= case m of left_brace,right_brace,math_shift,tab_mark,sup_mark,sub_mark,spacer, letter,other_char: print(c); mac_param: begin print(c); print(c); end; out_param: begin print(match_chr); if c<=9 then print_char(c+"0") else begin print_char("!"); return; end; end; match: begin match_chr:=c; print(c); incr(n); print_char(n); if n>"9" then return; end; end_match: print("->"); @.->@> othercases print_esc("BAD.") @.BAD@> endcases @ Here's the way we sometimes want to display a token list, given a pointer to its reference count; the pointer may be null. @p procedure token_show(@!p:pointer); begin if p<>null then show_token_list(link(p),null,10000000); end; @ The |print_meaning| subroutine displays |cur_cmd| and |cur_chr| in symbolic form, including the expansion of a macro or mark. @p procedure print_meaning; begin print_cmd_chr(cur_cmd,cur_chr); if cur_cmd>=call then begin print_char(":"); print_ln; token_show(cur_chr); end else if cur_cmd=top_bot_mark then begin print_char(":"); print_ln; token_show(cur_mark[cur_chr]); end; end; @* \[21] Introduction to the syntactic routines. Let's pause a moment now and try to look at the Big Picture. The \TeX\ program consists of three main parts: syntactic routines, semantic routines, and output routines. The chief purpose of the syntactic routines is to deliver the user's input to the semantic routines, one token at a time. The semantic routines act as an interpreter responding to these tokens, which may be regarded as commands. And the output routines are periodically called on to convert box-and-glue lists into a compact set of instructions that will be sent to a typesetter. We have discussed the basic data structures and utility routines of \TeX, so we are good and ready to plunge into the real activity by considering the syntactic routines. Our current goal is to come to grips with the |get_next| procedure, which is the keystone of \TeX's input mechanism. Each call of |get_next| sets the value of three variables |cur_cmd|, |cur_chr|, and |cur_cs|, representing the next input token. $$\vbox{\halign{#\hfil\cr \hbox{|cur_cmd| denotes a command code from the long list of codes given above;}\cr \hbox{|cur_chr| denotes a character code or other modifier of the command code;}\cr \hbox{|cur_cs| is the |eqtb| location of the current control sequence,}\cr \hbox{\qquad if the current token was a control sequence, otherwise it's zero.}\cr}}$$ Underlying this external behavior of |get_next| is all the machinery necessary to convert from character files to tokens. At a given time we may be only partially finished with the reading of several files (for which \.{\\input} was specified), and partially finished with the expansion of some user-defined macros and/or some macro parameters, and partially finished with the generation of some text in a template for \.{\\halign}, and so on. When reading a character file, special characters must be classified as math delimiters, etc.; comments and extra blank spaces must be removed, paragraphs must be recognized, and control sequences must be found in the hash table. Furthermore there are occasions in which the scanning routines have looked ahead for a word like `\.{plus}' but only part of that word was found, hence a few characters must be put back into the input and scanned again. To handle these situations, which might all be present simultaneously, \TeX\ uses various stacks that hold information about the incomplete activities, and there is a finite state control for each level of the input mechanism. These stacks record the current state of an implicitly recursive process, but the |get_next| procedure is not recursive. Therefore it will not be difficult to translate these algorithms into low-level languages that do not support recursion. @= @!cur_cmd: eight_bits; {current command set by |get_next|} @!cur_chr: halfword; {operand of current command} @!cur_cs: pointer; {control sequence found here, zero if none found} @!cur_tok: halfword; {packed representative of |cur_cmd| and |cur_chr|} @ The |print_cmd_chr| routine prints a symbolic interpretation of a command code and its modifier. This is used in certain `\.{You can\'t}' error messages, and in the implementation of diagnostic routines like \.{\\show}. The body of |print_cmd_chr| is a rather tedious listing of print commands, and most of it is essentially an inverse to the |primitive| routine that enters a \TeX\ primitive into |eqtb|. Therefore much of this procedure appears elsewhere in the program, together with the corresponding |primitive| calls. @d chr_cmd(#)==begin print(#); print_ASCII(chr_code); end @= procedure print_cmd_chr(@!cmd:quarterword;@!chr_code:halfword); begin case cmd of left_brace: chr_cmd("begin-group character "); right_brace: chr_cmd("end-group character "); math_shift: chr_cmd("math shift character "); mac_param: chr_cmd("macro parameter character "); sup_mark: chr_cmd("superscript character "); sub_mark: chr_cmd("subscript character "); endv: print("end of alignment template"); spacer: chr_cmd("blank space "); letter: chr_cmd("the letter "); other_char: chr_cmd("the character "); @t\4@>@@/ othercases print("[unknown command code!]") endcases; end; @ Here is a procedure that displays the current command. @p procedure show_cur_cmd_chr; begin begin_diagnostic; print_nl("{"); if mode<>shown_mode then begin print_mode(mode); print(": "); shown_mode:=mode; end; print_cmd_chr(cur_cmd,cur_chr); print_char("}"); end_diagnostic(false); end; @* \[22] Input stacks and states. This implementation of \TeX\ uses two different conventions for representing sequential stacks. @^stack conventions@>@^conventions for representing stacks@> \yskip\hang 1) If there is frequent access to the top entry, and if the stack is essentially never empty, then the top entry is kept in a global variable (even better would be a machine register), and the other entries appear in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the semantic stack described above is handled this way, and so is the input stack that we are about to study. \yskip\hang 2) If there is infrequent top access, the entire stack contents are in the array $\\{stack}[0\to(\\{ptr}-1)]$. For example, the |save_stack| is treated this way, as we have seen. \yskip\noindent The state of \TeX's input mechanism appears in the input stack, whose entries are records with six fields, called |state|, |index|, |start|, |loc|, |limit|, and |name|. This stack is maintained with convention~(1), so it is declared in the following way: @= @!in_state_record = record @!state_field, @!index_field: quarterword; @!start_field,@!loc_field, @!limit_field, @!name_field: halfword; end; @ @= @!input_stack : array[0..stack_size] of in_state_record; @!input_ptr : 0..stack_size; {first unused location of |input_stack|} @!max_in_stack: 0..stack_size; {largest value of |input_ptr| when pushing} @!cur_input : in_state_record; {the ``top'' input state, according to convention (1)} @ We've already defined the special variable |loc==cur_input.loc_field| in our discussion of basic input-output routines. The other components of |cur_input| are defined in the same way: @d state==cur_input.state_field {current scanner state} @d index==cur_input.index_field {reference for buffer information} @d start==cur_input.start_field {starting position in |buffer|} @d limit==cur_input.limit_field {end of current line in |buffer|} @d name==cur_input.name_field {name of the current file} @ Let's look more closely now at the control variables (|state|,~|index|,~|start|,~|loc|,~|limit|,~|name|), assuming that \TeX\ is reading a line of characters that have been input from some file or from the user's terminal. There is an array called |buffer| that acts as a stack of all lines of characters that are currently being read from files, including all lines on subsidiary levels of the input stack that are not yet completed. \TeX\ will return to the other lines when it is finished with the present input file. (Incidentally, on a machine with byte-oriented addressing, it might be appropriate to combine |buffer| with the |str_pool| array, letting the buffer entries grow downward from the top of the string pool and checking that these two tables don't bump into each other.) The line we are currently working on begins in position |start| of the buffer; the next character we are about to read is |buffer[loc]|; and |limit| is the location of the last character present. If |loc>limit|, the line has been completely read. Usually |buffer[limit]| is the |end_line_char|, denoting the end of a line, but this is not true if the current line is an insertion that was entered on the user's terminal in response to an error message. The |name| variable is a string number that designates the name of the current file, if we are reading a text file. It is zero if we are reading from the terminal; it is |n+1| if we are reading from input stream |n|, where |0<=n<=16|. (Input stream 16 stands for an invalid stream number; in such cases the input is actually from the terminal, under control of the procedure |read_toks|.) The |state| variable has one of three values, when we are scanning such files: $$\baselineskip 15pt\vbox{\halign{#\hfil\cr 1) |state=mid_line| is the normal state.\cr 2) |state=skip_blanks| is like |mid_line|, but blanks are ignored.\cr 3) |state=new_line| is the state at the beginning of a line.\cr}}$$ These state values are assigned numeric codes so that if we add the state code to the next character's command code, we get distinct values. For example, `|mid_line+spacer|' stands for the case that a blank space character occurs in the middle of a line when it is not being ignored; after this case is processed, the next value of |state| will be |skip_blanks|. @d mid_line=1 {|state| code when scanning a line of characters} @d skip_blanks=2+max_char_code {|state| code when ignoring blanks} @d new_line=3+max_char_code+max_char_code {|state| code at start of line} @ Additional information about the current line is available via the |index| variable, which counts how many lines of characters are present in the buffer below the current level. We have |index=0| when reading from the terminal and prompting the user for each line; then if the user types, e.g., `\.{\\input paper}', we will have |index=1| while reading the file \.{paper.tex}. However, it does not follow that |index| is the same as the input stack pointer, since many of the levels on the input stack may come from token lists. For example, the instruction `\.{\\input paper}' might occur in a token list. The global variable |in_open| is equal to the |index| value of the highest non-token-list level. Thus, the number of partially read lines in the buffer is |in_open+1|, and we have |in_open=index| when we are not reading a token list. If we are not currently reading from the terminal, or from an input stream, we are reading from the file variable |input_file[index]|. We use the notation |terminal_input| as a convenient abbreviation for |name=0|, and |cur_file| as an abbreviation for |input_file[index]|. The global variable |line| contains the line number in the topmost open file, for use in error messages. If we are not reading from the terminal, |line_stack[index]| holds the line number for the enclosing level, so that |line| can be restored when the current file has been read. Line numbers should never be negative, since the negative of the current line number is used to identify the user's output routine in the |mode_line| field of the semantic nest entries. If more information about the input state is needed, it can be included in small arrays like those shown here. For example, the current page or segment number in the input file might be put into a variable |@!page|, maintained for enclosing levels in `\ignorespaces|@!page_stack:array[1..max_in_open] of integer|\unskip' by analogy with |line_stack|. @^system dependencies@> @d terminal_input==(name=0) {are we reading from the terminal?} @d cur_file==input_file[index] {the current |alpha_file| variable} @= @!in_open : 0..max_in_open; {the number of lines in the buffer, less one} @!open_parens : 0..max_in_open; {the number of open text files} @!input_file : array[1..max_in_open] of alpha_file; @!line : integer; {current line number in the current source file} @!line_stack : array[1..max_in_open] of integer; @ Users of \TeX\ sometimes forget to balance left and right braces properly, and one of the ways \TeX\ tries to spot such errors is by considering an input file as broken into subfiles by control sequences that are declared to be \.{\\outer}. A variable called |scanner_status| tells \TeX\ whether or not to complain when a subfile ends. This variable has six possible values: \yskip\hang|normal|, means that a subfile can safely end here without incident. \yskip\hang|skipping|, means that a subfile can safely end here, but not a file, because we're reading past some conditional text that was not selected. \yskip\hang|defining|, means that a subfile shouldn't end now because a macro is being defined. \yskip\hang|matching|, means that a subfile shouldn't end now because a macro is being used and we are searching for the end of its arguments. \yskip\hang|aligning|, means that a subfile shouldn't end now because we are not finished with the preamble of an \.{\\halign} or \.{\\valign}. \yskip\hang|absorbing|, means that a subfile shouldn't end now because we are reading a balanced token list for \.{\\message}, \.{\\write}, etc. \yskip\noindent If the |scanner_status| is not |normal|, the variable |warning_index| points to the |eqtb| location for the relevant control sequence name to print in an error message. @d skipping=1 {|scanner_status| when passing conditional text} @d defining=2 {|scanner_status| when reading a macro definition} @d matching=3 {|scanner_status| when reading macro arguments} @d aligning=4 {|scanner_status| when reading an alignment preamble} @d absorbing=5 {|scanner_status| when reading a balanced text} @= @!scanner_status : normal..absorbing; {can a subfile end now?} @!warning_index : pointer; {identifier relevant to non-|normal| scanner status} @!def_ref : pointer; {reference count of token list being defined} @ Here is a procedure that uses |scanner_status| to print a warning message when a subfile has ended, and at certain other crucial times: @= procedure runaway; var p:pointer; {head of runaway list} begin if scanner_status>skipping then begin print_nl("Runaway "); @.Runaway...@> case scanner_status of defining: begin print("definition"); p:=def_ref; end; matching: begin print("argument"); p:=temp_head; end; aligning: begin print("preamble"); p:=hold_head; end; absorbing: begin print("text"); p:=def_ref; end; end; {there are no other cases} print_char("?");print_ln; show_token_list(link(p),null,error_line-10); end; end; @ However, all this discussion about input state really applies only to the case that we are inputting from a file. There is another important case, namely when we are currently getting input from a token list. In this case |state=token_list|, and the conventions about the other state variables are different: \yskip\hang|loc| is a pointer to the current node in the token list, i.e., the node that will be read next. If |loc=null|, the token list has been fully read. \yskip\hang|start| points to the first node of the token list; this node may or may not contain a reference count, depending on the type of token list involved. \yskip\hang|token_type|, which takes the place of |index| in the discussion above, is a code number that explains what kind of token list is being scanned. \yskip\hang|name| points to the |eqtb| address of the control sequence being expanded, if the current token list is a macro. \yskip\hang|param_start|, which takes the place of |limit|, tells where the parameters of the current macro begin in the |param_stack|, if the current token list is a macro. \yskip\noindent The |token_type| can take several values, depending on where the current token list came from: \yskip\hang|parameter|, if a parameter is being scanned; \hang|u_template|, if the \ part of an alignment template is being scanned; \hang|v_template|, if the \ part of an alignment template is being scanned; \hang|backed_up|, if the token list being scanned has been inserted as `to be read again'. \hang|inserted|, if the token list being scanned has been inserted as the text expansion of a \.{\\count} or similar variable; \hang|macro|, if a user-defined control sequence is being scanned; \hang|output_text|, if an \.{\\output} routine is being scanned; \hang|every_par_text|, if the text of \.{\\everypar} is being scanned; \hang|every_math_text|, if the text of \.{\\everymath} is being scanned; \hang|every_display_text|, if the text of \.{\\everydisplay} is being scanned; \hang|every_hbox_text|, if the text of \.{\\everyhbox} is being scanned; \hang|every_vbox_text|, if the text of \.{\\everyvbox} is being scanned; \hang|every_job_text|, if the text of \.{\\everyjob} is being scanned; \hang|every_cr_text|, if the text of \.{\\everycr} is being scanned; \hang|mark_text|, if the text of a \.{\\mark} is being scanned; \hang|write_text|, if the text of a \.{\\write} is being scanned. \yskip\noindent The codes for |output_text|, |every_par_text|, etc., are equal to a constant plus the corresponding codes for token list parameters |output_routine_loc|, |every_par_loc|, etc. The token list begins with a reference count if and only if |token_type>=macro|. @^reference counts@> @d token_list=0 {|state| code when scanning a token list} @d token_type==index {type of current token list} @d param_start==limit {base of macro parameters in |param_stack|} @d parameter=0 {|token_type| code for parameter} @d u_template=1 {|token_type| code for \ template} @d v_template=2 {|token_type| code for \ template} @d backed_up=3 {|token_type| code for text to be reread} @d inserted=4 {|token_type| code for inserted texts} @d macro=5 {|token_type| code for defined control sequences} @d output_text=6 {|token_type| code for output routines} @d every_par_text=7 {|token_type| code for \.{\\everypar}} @d every_math_text=8 {|token_type| code for \.{\\everymath}} @d every_display_text=9 {|token_type| code for \.{\\everydisplay}} @d every_hbox_text=10 {|token_type| code for \.{\\everyhbox}} @d every_vbox_text=11 {|token_type| code for \.{\\everyvbox}} @d every_job_text=12 {|token_type| code for \.{\\everyjob}} @d every_cr_text=13 {|token_type| code for \.{\\everycr}} @d mark_text=14 {|token_type| code for \.{\\topmark}, etc.} @d write_text=15 {|token_type| code for \.{\\write}} @ The |param_stack| is an auxiliary array used to hold pointers to the token lists for parameters at the current level and subsidiary levels of input. This stack is maintained with convention (2), and it grows at a different rate from the others. @= @!param_stack:array [0..param_size] of pointer; {token list pointers for parameters} @!param_ptr:0..param_size; {first unused entry in |param_stack|} @!max_param_stack:integer; {largest value of |param_ptr|, will be |<=param_size+9|} @ The input routines must also interact with the processing of \.{\\halign} and \.{\\valign}, since the appearance of tab marks and \.{\\cr} in certain places is supposed to trigger the beginning of special \ template text in the scanner. This magic is accomplished by an |align_state| variable that is increased by~1 when a `\.{\char'173}' is scanned and decreased by~1 when a `\.{\char'175}' is scanned. The |align_state| is nonzero during the \ template, after which it is set to zero; the \ template begins when a tab mark or \.{\\cr} occurs at a time that |align_state=0|. @= @!align_state:integer; {group level with respect to current alignment} @ Thus, the ``current input state'' can be very complicated indeed; there can be many levels and each level can arise in a variety of ways. The |show_context| procedure, which is used by \TeX's error-reporting routine to print out the current input state on all levels down to the most recent line of characters from an input file, illustrates most of these conventions. The global variable |base_ptr| contains the lowest level that was displayed by this procedure. @= @!base_ptr:0..stack_size; {shallowest level shown by |show_context|} @ The status at each level is indicated by printing two lines, where the first line indicates what was read so far and the second line shows what remains to be read. The context is cropped, if necessary, so that the first line contains at most |half_error_line| characters, and the second contains at most |error_line|. Non-current input levels whose |token_type| is `|backed_up|' are shown only if they have not been fully read. @p procedure show_context; {prints where the scanner is} label done; var old_setting:0..max_selector; {saved |selector| setting} @!nn:integer; {number of contexts shown so far, less one} @!bottom_line:boolean; {have we reached the final context to be shown?} @@/ begin base_ptr:=input_ptr; input_stack[base_ptr]:=cur_input; {store current state} nn:=-1; bottom_line:=false; loop@+begin cur_input:=input_stack[base_ptr]; {enter into the context} if (state<>token_list) then if (name>17) or (base_ptr=0) then bottom_line:=true; if (base_ptr=input_ptr)or bottom_line or(nn else if nn=error_context_lines then begin print_nl("..."); incr(nn); {omitted if |error_context_lines<0|} end; if bottom_line then goto done; decr(base_ptr); end; done: cur_input:=input_stack[input_ptr]; {restore original state} end; @ @= begin if (base_ptr=input_ptr) or (state<>token_list) or (token_type<>backed_up) or (loc<>null) then {we omit backed-up token lists that have already been read} begin tally:=0; {get ready to count characters} old_setting:=selector; if state<>token_list then begin @; @; end else begin @; @; end; selector:=old_setting; {stop pseudoprinting} @; incr(nn); end; end @ This routine should be changed, if necessary, to give the best possible indication of where the current line resides in the input file. For example, on some systems it is best to print both a page and line number. @^system dependencies@> @= if name<=17 then if terminal_input then if base_ptr=0 then print_nl("<*>") else print_nl(" ") else begin print_nl(" print_char(">"); end else begin print_nl("l."); print_int(line); end; print_char(" ") @ @= case token_type of parameter: print_nl(" "); u_template,v_template: print_nl("