Proceedings of the 18th Colloquium on the Application of
Electronic Data Processing in the Humanities
at the University of Tübingen, 30 June 1979


Hans Walter Gabler (University of Munich)

Computer-aided critical edition of Ulysses


Work is in progress at the University of Munich (Germany) on a critical edition of Ulysses (1922) by James Joyce for which TUSTEP, the standard system of computer programs for text processing written by Wilhelm Ott and his group in Tübingen, is being used. Given a complex and richly documented pre-publication history, computer processing ensures the highest possible degree of completeness and accuracy in the recording, collation, and presentation of the extant textual materials. Among the available programs are a text editor, and collation, correction, copying, index-sorting, and type-setting programs. The methods of their use for this project have been developed in a pilot project for the editing and book production of a single chapter. The editorial result comprises a clear reading text and a synopsis of the pre- and post-publication evolution of the author's text on facing pages. Historical Collation and Emendations lists are given in an appendix.

Work is in progress at the University of Munich (Germany) on a critical edition of Ulysses (1922). As a technical and procedural resource, the TUSTEP system of computer programs for text processing, developed at the computing centre of the University of Tübingen, is being employed. It operates on the Konstanz TR440 computer. The Ulysses edition is a first step towards a complete critical edition of the works of James Joyce.

Principles of present-day editorial theory and practice in Anglo-American and German textual scholarship form the rationale for the edition, which in itself does not depend on the computer for its realisation. Yet, in practical terms, computer processing ensures the highest possible degree of completeness and accuracy in the recording, collating, and presentation of the vast volume of its textual material. While in no way impairing the critical quality of the editorial activity, the computer processing organizes the editorial procedures by adapting them to the handling facilities of the TUSTEP system. We wish to describe how, in a given textual situation, this well co-ordinated standard system of text processing and printing programs may be successfully employed in the service of scholarly textual editing.

I. The Textual Situation

Following a preparatory period of several years from which no textual record survives, Ulysses began to be written, chapter by chapter, in late 1917. The final chapter in the order of composition (the penultimate one of the novel's eighteen chapters) was concluded in manuscript in October 1921. From drafts, of which only a few survive, each chapter reached a state of consolidation in a final working draft. Thereafter, the chapters were to remain essentially unaffected in structure by even the most copious subsequent verbal revision and expansion. The final working drafts are lost for all but three chapters. From each of them as it was ready, though not from the three surviving ones, a holograph fair copy was prepared. Fifteen chapters in fair copy and three in final working draft state form what is today known as the Rosenbach Manuscript (R) of Ulysses.

When a chapter had reached the R state, a typescript (T) was soon made. For a total of nine chapters this was copied from R, while for the other nine it was typed from the final working draft (W) at a stage of revision beyond that at which it had been authorially copied into R. Thus, R stands outside the direct line of transmission for half of the novel's chapters. Yet by this very circumstance it becomes possible to reconstruct lost W for these chapters, and to distinguish its basic text and post-R revisions, from the combined evidence of its radiating derivatives R and T.

Though it happens to be lost for the first five chapters, T is the essential link in the text's transmission from manuscript to print. It served throughout (though in different exemplars) as printer's copy: first, in an exemplar that does not survive, for the 'work in progress' serialization of fourteen chapters in the New York/Chicago Little Review (L) (March 1918 to September/December 1920), with parallel excerpts in the London Egoist (E); and then for the complete Paris book publication of 1922.

Joyce revised and expanded the novel's text, in multiple holograph overlay to T and the series of up to nine successive proof stages of the book publication, by altogether about a third over the state of R, or reconstructed W. The cumulation of holograph notation from W/R to the last additions to each final proof may be defined as a continuous manuscript of the novel. This multi-layered continuous manuscript is documented in, or, as in the case of the several brief and clearly defined lacunae in the transmission, it is recoverable from, the range of the extant textual witnesses, R, T, L/E, and the successive book publication proofs.

The critical edition's reading text is established on the basis of the edited continuous manuscript. The appended Historical Collation and Emendations lists record where the first and subsequent book publications deviate from the edition text, and where the edition text itself results from critical emendation of the continuous manuscript text. In addition, the continuous manuscript accompanies the reading text as a synoptic display apparatus of the textual development.

A trial edition of the novel's eighth chapter has already been completed and is shown in Figures 1 and 2. The reading text appears in Figure 2. In Figure 1, it is accompanied by the synoptic display of the growth of the text from final authorial manuscript draft (in this case, the earliest genetic stage recovered as no preliminary drafts of the chapter survive) up to the first book publication of 1922, with a system of superimposed diacritics to indicate the text's predominantly accretive development. Pairs of carets enclose revision within a document. The opening carets rotate clockwise (and the closing ones anti-clockwise correspondingly) where there is more than one such level of revision. Alphanumeric indices in opening and closing halfbrackets delimit the extension of manuscript additions and revisions to typescript and successive sets of proofs. Pointed brackets enclose deletions within a given stage of textual development: they indicate immediate or early deletion of textual elements as operations occurring in one document. Square brackets invalidate, at the genetic stage indexed, text that was valid up to the preceding stage: they indicate where text of one document, often transmitted through several witnesses, is removed, and frequently replaced, in a later document.

II. The Input

The OCR-A input, totalling about a million words, consists of transcripts in full of R, of the basic typed text of T (or, where this is missing, the initial typesetting of the book publication proofs (P)), and further of L, of E, of the 1922 first edition (22) and the 1926 second edition (26); lastly, of all holograph overlay only from T and the successive book publication proofs.

The multiple input serves initially to obtain error-free computer records of the textual data. The variation revealed in a preparatory series of machine collations is checked back against the documents transcribed, and all input error is eliminated by file corrections carried out by hand at the computer terminal, or automatically with the program TXTKORRIGIERE. The data checks operate and interact as follows.

R is computer-filed with full caret coding of its intra-linear, interlinear, and marginal revisions. To avoid compounding input error with misreadings of the manuscript, the input is first controlled by triple independent eye collation of the OCR transcript before computer collation with T. The input of T in its turn requires throughout a check independent of its collation run against R, and especially so where it derives not from R, but radiates with R from W and thus possesses a higher degree of primary authority as a textual witness. Hence, L (and E) serve as double input for T, and are machine-collated with it. As edition material subsequently to be utilized, the early authorial revisions made immediately on completion of each chapter typescript, i.e. before its serialization, can be isolated on the basis of the evidence of L and E from the total holograph overlay of the extant T exemplar (which was the printer's copy for 22). In addition, occasional further authorial revisions are recoverable from L. These appear inadvertently to have been entered only on the lost exemplar of T which was used as its printer's copy. For the text of lost sections of T, the input of its parallel derivations L and P serves as double evidence. Finally, a strictly verbatim and literatim record of 22 is ascertained by using the full transcript of the fresh typesetting of 26 as double input. Besides revealing the 22 and 26 input errors, this yields the true 22:26 variation, from which a small number of post-publication authorial revisions can be culled.
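The double-input principle described above can be illustrated with a minimal Python sketch. It is not TUSTEP code: the standard-library `difflib` module stands in for the collation program, the function name `input_discrepancies` is hypothetical, and the sample strings are invented placeholders rather than readings from any witness. Every discrepancy it reports is either an input error in one of the transcripts or a genuine variant, and has to be checked back against the documents.

```python
from difflib import SequenceMatcher


def input_discrepancies(first, second):
    """List every place where two independent transcripts disagree.

    Returns tuples (word position in first, words in first, words in second).
    """
    a, b = first.split(), second.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (i1, a[i1:i2], b[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Invented placeholder transcripts of the same passage.
transcript_a = "the quick brown fox"
transcript_b = "the quick browm fox"
discrepancies = input_discrepancies(transcript_a, transcript_b)
```

Here `discrepancies` holds a single entry pointing at the third word, which an editor would then resolve by consulting the source document.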

III. The Editing

1. Establishing an Early-Version Sub-Edition

The author's commissioning of T, and the serialized publication of fourteen chapters of the novel from an only slightly revised exemplar of it, marks a recognisable point of incision in the continuous textual development. This justifies, and the shifting stemmatic relationship of R and T positively requires, an early-version text as an initial auxiliary sub-edition in the editing process. Established from the combined evidence of R, T, L/E [and P], this provides an ideal text corresponding to the stage of textual development reached with the revision of T in preparation for the serialization (or, in the case of the last four chapters, for the initial book typesetting), but free from the authorial fair-copying mistakes of R, the typist's mistypings of T, and the compositorial errors of L/E and P. Recorded with all necessary coding to indicate the stages of textual growth it covers, as well as all emendation required to eliminate actual corruption from the document representation of the continuous manuscript text, this sub-edition F (for German 'Frühtext') is established with R as its copy-text.

The automatized editing of F begins with a series of parallel machine collations, all based on R and keyed to it as the collation base text. The collations are R:T, R:L, where possible R:E, and where necessary (in the absence of T), R:P. The program TXTVERGLEICH files each batch of variants separately and tags them all for sorting. By one of several available parameters, it formats the collation result as '[base text reading] variant' for each entry.
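As a rough illustration of the collation output, the following Python sketch produces entries of the form '[base text reading] variant', each keyed to a word position in the base text and carrying a sorting tag. It is a sketch under assumptions, not a reproduction of TXTVERGLEICH: `difflib` stands in for the collation, the function name `collate` is hypothetical, and the reading of the tags as `=` for replacement, `+` for addition, and `-` for omission is an assumption. The sample sentences are invented.

```python
from difflib import SequenceMatcher

# Assumed meaning of the tags; the source names the tags but not their semantics.
TAGS = {"replace": "=", "insert": "+", "delete": "-"}


def collate(base, witness):
    """Collate a witness against the base text, word by word.

    Returns tuples (base word position, tag, '[base reading] variant').
    """
    a, b = base.split(), witness.split()
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    entries = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        base_reading = " ".join(a[i1:i2])
        variant = " ".join(b[j1:j2])
        entries.append((i1, TAGS[op], "[{}] {}".format(base_reading, variant)))
    return entries


# Invented sample: base text R against one witness.
entries_rt = collate("he walked on slowly", "he walked on quickly home")
```

Running one such collation per witness (R:T, R:L, R:E, R:P) and keeping each result in its own list mirrors the separate filing of each batch of variants.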

The sorting reference provided by TXTVERGLEICH permits the use of the program TXTVDRUCKE to print out a synoptic display in parallel lines of the entire variation among the witnesses collated as a first aid to the critical editing as shown in Figure 3. A second aid is provided by printouts of the collation files for pre-marking the editorial decisions as facilitated by the synoptic display (Figure 4). Each variant is labelled as either a genetic variant (an authorial revision or addition), an authorial correction of an earlier error in the autograph notation, a transmissional variant to be admitted into the critically established text (e.g. a full stop at the end of a paragraph in T where Joyce in his speedy inscription of R omitted it), or a transmissional variant to be rejected or filed away in preparation for the Historical Collation list (e.g. mistypings in T, many of which survive into L, or 22).

The markings are transferred to the actual collation files at the terminal. The group of genetic variants from each file, already formatted as '[base text reading] variant', are first extracted to be provided automatically with the opening and closing symbols for the diacritics to indicate the genetic stage of the textual change, and are then all collected in one new file. To this are added the emendation variants, for which the '[base text reading]' is automatically reformatted into an emendation note appended to the emended reading. Within bracketing symbols, this gives the document code for the source of the emendation, and the rejected reading with its source code. Thus




The string of characters between the brackets &* and *&, by which the critically established reading appears extended, is carried through all subsequent computing operations. In the end, supplied with its current reference, it is bracketed off into the Emendations list. Purely editorial emendations unsupported by a source document are hand-supplied in the same format. All material so collected is sorted in rising sequence of the base text reference. The subsequent program TXTKORRIGIERE interprets the entry references by their tags of =, +, or - as instructions to update the base text R: the combined file thus constitutes a correction file to convert a copy of R into F, the ideal early version text critically constituted and diacritically stratified to represent the textual growth from W to (ideal) L as shown in Figure 5. A convenient check on the resulting edited text is possible by generating a clear text FC from F, employing the particularly versatile program TXTKOPIERE to exclude all bracketed material, i.e. text genetically superseded, to eliminate all diacritics, and to close up (Figure 6), then running a new series of collations from FC as collation base text with, in turn, R, T, L, E, and P. The synoptic display in parallel lines quickly reveals infelicitous or inconsistent editorial decisions. If a separate presentation, e.g. publication, of Ulysses in its ideal early version text is desired, a full variants list of the documents from which it was constituted can be automatically established from the same collation material.
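The two steps just described, updating the base text from a sorted correction file and then deriving a clear text, might be sketched in Python as follows. Everything apart from the `&*` and `*&` delimiters is an assumption made for illustration: the function names, the reading of the tags `=`, `+`, and `-` as replace, insert after, and delete, the ASCII stand-ins for the diacritics (`^` for carets, `{1:` and `:1}` for indexed half-brackets), and the invented sample text and emendation note.

```python
import re


def apply_corrections(base_words, corrections):
    """Update a copy of the base text from (position, tag, words) entries."""
    words = list(base_words)
    # Work from the end so that earlier word positions stay valid.
    for pos, tag, new in sorted(corrections, key=lambda c: c[0], reverse=True):
        if tag == "=":
            words[pos:pos + 1] = new          # replace the word at pos
        elif tag == "+":
            words[pos + 1:pos + 1] = new      # insert after the word at pos
        elif tag == "-":
            del words[pos]                    # delete the word at pos
        else:
            raise ValueError("unknown tag: {!r}".format(tag))
    return words


def clear_text(stratified):
    """Derive a clear text: drop superseded material and all diacritics."""
    # Emendation notes carried between &* and *&.
    text = re.sub(r"&\*.*?\*&", "", stratified)
    # Superseded text in square brackets, deletions in pointed brackets.
    text = re.sub(r"\[[^\]]*\]|<[^>]*>", "", text)
    # Carets and the hypothetical stage markers {n: ... :n}.
    text = re.sub(r"\^|\{\w+:|:\w+\}", "", text)
    # Close up the gaps left behind.
    return re.sub(r"\s+", " ", text).strip()


base = "He walked home from the hous".split()
corrected = apply_corrections(
    base, [(1, "+", ["slowly"]), (2, "-", []), (5, "=", ["house"])]
)

sample_f = "He ^slowly^ walked [home] {1:away :1}from <th> the house&*T; R: hous*&."
sample_fc = clear_text(sample_f)
```

Applying the entries from the end of the text backwards is a design choice of the sketch: it keeps every reference valid against the original base text, which matches the idea of a correction file keyed to R.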

2. Integrating the Continuous Manuscript Text

From F, which represents the initial phases of the novel's textual development, the text attains its full extension by expanding over the post-early-version authorial revisions and additions into a completeness corresponding to the 1922 book edition. Yet the computer-assisted editing cannot easily build up the continuous manuscript in the sequential manner of its growth. Since with each level of revision and accretion the text moves one step further away from F, F cannot act as the basic frame of reference necessary for co-ordinated computer operations. 22 on the other hand, though in its actual state it abounds in the corruptions which the critical edition endeavours to eradicate, constitutes the fullest extant version of the novel's text and can therefore provide the reference framework needed.

Fundamentally, the editing may be seen to proceed as a grafting of the critical text onto the physical frame of the actual 1922 first edition text in its computer-file form. In the course of the grafting, every element in the verbatim and literatim record of 22 is eliminated or exchanged, if it does not conform to the continuous manuscript text as critically established with a full record of the textual growth. To obtain the several sections of the critical edition, the computer file of 22 is doubled into the graft copy (22G), and a second copy (22U) where the text is left untouched. In 22U, every element which has been exchanged, expanded, or eliminated in 22G is automatically marked to indicate the place and time of entry into the transmission of the deviation from the continuous manuscript it represents. Transmissional variants editorially accepted from 22 are similarly labelled. At the conclusion of the editing process, this will permit a simple computer collation of the critically established reading text against 22U to generate the Historical Collation list automatically.
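The relation between the two copies of 22 can be pictured with a small Python sketch. It assumes, purely for illustration, that corrections replace single words so that both copies stay aligned word for word, that a label is attached with a hypothetical `#` symbol followed by the code of the document where the deviation entered the transmission, and that the names `graft_and_label` and `historical_collation` are invented. The sample words are placeholders.

```python
def graft_and_label(words_22, corrections):
    """Apply (position, critical reading, origin code) entries.

    Returns the graft copy (22G) and the untouched copy (22U), in which the
    original reading is kept but labelled with the origin of the deviation.
    """
    graft = list(words_22)
    untouched = list(words_22)
    for pos, reading, origin in corrections:
        graft[pos] = reading
        untouched[pos] = "{}#{}".format(untouched[pos], origin)
    return graft, untouched


def historical_collation(reading_text, untouched):
    """List every position where the reading text departs from labelled 22U."""
    return [
        (pos, critical, labelled)
        for pos, (critical, labelled) in enumerate(zip(reading_text, untouched))
        if critical != labelled
    ]


words_22 = ["the", "hous", "stood"]
copy_22g, copy_22u = graft_and_label(words_22, [(1, "house", "T")])
collation_list = historical_collation(copy_22g, copy_22u)
```

Because each label in 22U differs by construction from the critical reading, a plain comparison suffices to surface every recorded deviation together with its origin.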

To integrate the continuous manuscript text in the graft copy, the post-early-version levels of the novel's revision and accretion are pieced together from the authorial overlay only of the typescript and the successive book publication proofs. The holograph material is transcribed and assembled into lists by source document. For the purposes of the subsequent processing, each entry in the assembly of each list is defined as an addition to the last preceding invariant word by the page/line/word number of that word in 22. Every instance of authorial change is registered, except for simple corrections of transmissional errors that fully restore readings preserved in the autograph inscription of a preceding witness. To guard against oversights and misreadings of Joyce's handwriting, the assembled material is twice eye-collated back against its source before input via OCR. In the computer, the entries of each list as a whole are automatically supplied with the symbols for the opening and closing diacritics which are to delimit them in the integrated synoptic text. Thereafter, utilizing the hand-supplied common system of reference, the individual lists are fused and sequentially sorted into one cumulative list (List I) (Figure 7).
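The assembly of List I can be sketched as follows. The Python below is an illustration under assumptions: the `Entry` type, the function names, the document codes, and the diacritic symbols `{1:` and `:1}` are hypothetical, and the sample entries are invented. The page/line/word reference is held as three integers so that sorting is numeric (line 2 before line 12), and the sort is stable, so entries attached to the same word keep the order of their source documents.

```python
from itertools import chain
from typing import NamedTuple


class Entry(NamedTuple):
    page: int      # page in 22 of the last preceding invariant word
    line: int      # line on that page
    word: int      # word in that line
    source: str    # code of the source document (hypothetical codes below)
    text: str      # transcribed holograph addition or revision


def mark(entries, opening, closing):
    """Wrap every entry of one list in its opening and closing diacritics."""
    return [e._replace(text=opening + e.text + closing) for e in entries]


def fuse(*lists):
    """Merge the per-document lists into one list sorted by 22 reference."""
    return sorted(chain.from_iterable(lists), key=lambda e: (e.page, e.line, e.word))


# Invented sample lists, given in the order of the documents' succession.
typescript_overlay = mark([Entry(3, 12, 4, "T", "very")], "{1:", ":1}")
first_proof_overlay = mark(
    [Entry(3, 2, 7, "P1", "and again"), Entry(3, 12, 4, "P1", "indeed")],
    "{2:", ":2}",
)
list_i = fuse(typescript_overlay, first_proof_overlay)
```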

As a parallel operation, FC, the early version clear text, is collated against basic 22, producing as List II the actual FC:22 difference in bulk (Figure 8). An additional parameter of the collation program TXTVERGLEICH automatically generates the 22 p/l/w reference of each instance of variation. Ideally, List II should be textually identical with the cumulative holograph overlay assembled in List I. In fact, List I may contain authorial revisions overlooked by the book compositors. More importantly, List II exceeds List I in all instances of transmissional variation affecting the early-version portion of 22. These are traced to their source (a majority are usually already accounted for among the group of transmissional typescript variants rejected in establishing F) and are subfiled, with a view to correcting 22G and, in complementary form, to labelling the basic readings in question in 22U.

Each entry in List II as reduced by the instances of transmissional error so treated now corresponds to an entry, or a group of successive entries, in List I. By means of the 22 references present in both lists (hand-supplied in List I, and generated automatically in List II) the lists may be combined and the entries sequentially sorted into List III (Figure 9). In III, the textually corresponding entries from I and II should appear grouped together. Where they do not so appear, an error in the hand-supplied reference of List I needs correction, or else a compositorial misplacing of an authorial revision in the published 22 text is revealed. The grouped double entries in III are formatted as hand-supplied reference (rh) with text transcribed from holograph (th) from List I, and automatically generated reference (ra) and text (ta) from List II. Processing with TXTKOPIERE combines ra with th, and eliminates rh and ta. The resulting correction file, expanded by the sub-file of transmissional variant corrections and emendations already prepared, converts a copy of 22 into a first version of 22G, the graft copy (Figure 10).
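The combination of the two lists might be sketched in Python as below. This is an illustration, not the TXTKOPIERE procedure itself: the function name `build_list_iii`, the tuple layout, and the sample entries are assumptions, and a correct hand-supplied reference is assumed to be identical to the generated one. Entries that share a reference are grouped, the holograph text (th) is kept under the shared reference, which stands for ra, and references found in only one list are returned separately for editorial inspection.

```python
from collections import defaultdict


def build_list_iii(list_i, list_ii):
    """Group entries of List I and List II by their page/line/word reference.

    list_i:  (hand-supplied reference, text transcribed from holograph)
    list_ii: (generated reference, text from the FC:22 collation)
    Returns (correction entries, references found in only one list).
    """
    grouped = defaultdict(lambda: {"I": [], "II": []})
    for ref, text in list_i:
        grouped[ref]["I"].append(text)
    for ref, text in list_ii:
        grouped[ref]["II"].append(text)

    corrections, unmatched = [], []
    for ref in sorted(grouped):
        group = grouped[ref]
        if group["I"] and group["II"]:
            # Keep the holograph text; successive List I entries are joined.
            corrections.append((ref, " ".join(group["I"])))
        else:
            unmatched.append(ref)
    return corrections, unmatched


# Invented sample data with hypothetical diacritic symbols.
sample_i = [((3, 2, 7), "{1:and again:1}"), ((3, 12, 4), "{2:indeed:2}"),
            ((4, 1, 1), "{1:stray:1}")]
sample_ii = [((3, 2, 7), "and agaln"), ((3, 12, 4), "indeed"),
             ((5, 9, 2), "misprint")]
corrections_iii, unmatched_iii = build_list_iii(sample_i, sample_ii)
```

The unmatched references correspond to the two cases named above: a faulty hand-supplied reference in List I, or a revision misplaced by the compositors in the published text.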

In 22G, all post-early-version text is now contained not in the form of the 1922 publication, but in that of the editorial transcription of the holograph overlay in the pre-publication documents, emended and correctly stratified genetically by means of the automatically supplied diacritic code symbols of List I. 22G is also correct as a clear text in its early-version portion. But the full early-version genetic material and stratification symbols remain to be introduced, and the hand-compiled text from List I needs a machine collation to eliminate errors of the OCR transcription, and to locate transmissional variation in the post-early-version sections of published 22. To automate both operations, 22G is re-lined so that lines or units of lines of post-early-version text alternate with lines or units of lines of early-version text (Figure 11). Thus re-formatted, 22G as a whole is collated against base text 22; or, since textually this amounts to the same thing, against 22U with the transmissional variants in its early version portion already labelled. The collations for the early version sections are discarded, since they simply represent once more the transmissional variants already rejected. Following a collation against 22U, this can be done automatically by instructing TXTKOPIERE to eliminate all textual units containing a 22U labelling symbol. From the collation result as it refers to the post-early-version sections of 22G, the input errors (OCR mistypings as well as possible misreadings of Joyce's holograph revisions) are extracted into the first part of a correction file to update re-lined 22G. The transmissional variants revealed in the published 22 text are traced to their respective documents of origin and are formatted into a supplementary correction file to mark up 22U. To introduce the genetic stratification of F into 22G, the early-version sections only of 22G are extracted, preserving their lining and line numbering. A collation of this extract against F produces the second part of the correction file to update re-lined 22G.

The first and the second part of the correction file combined convert 22G into the fully edited and genetically stratified text, which may now be re-paginated and re-lined as desired. Thereafter, the occasional introduction of the few post-publication authorial revisions revealed in collations of the 1926, 1932, and 1936 editions of Ulysses completes the critically established synopsis of the novel's textual growth. From it, the Emendations list is separated and the edition's clear reading text is automatically generated. A collation of the clear text against 22U provides the Historical Collation list (Figure 12). The edition's several sections are processed through the typesetting programs of TUSTEP into book publication format.

Figure 1: Synoptic Text

Figure 2: Clear Reading Text

Figure 3: Parallel Line Printout of R, T, and L

Figure 4: Collation [R] T and Collation [R] L

Figure 5: F

Figure 6: FC

Figure 7: List I

Figure 8: List II

Figure 9: List III

Figure 10: Correction File to Obtain 22G, First Version

Figure 11: 22G Lined to Separate Units of Post-Early-Version Text and Early-Version Text

Figure 12: Historical Collation
