Textdata Processing with TUSTEP

Introduction

The "TUebingen System of TExt processing Programs" (TUSTEP), developed at the University of Tübingen Computing Center since the late 1960s, is a professional toolbox for the scholarly processing of textual data. Its main design principles are long-term availability, platform independence of procedures and data, coverage of all steps of a typical humanities research project, and flexibility based on a consistently modular design.

The work began in 1966, when we first designed a series of functions and subroutines for character and string handling in FORTRAN (compatible, in their first version, with those previously developed by the Deutsches Rechenzentrum in Darmstadt) and implemented them on the mainframe of the Computing Center of the University of Tübingen. This made programming easier for projects such as the Metrical Analysis of Latin Hexameter Poetry, the Concordance to the Vulgate, and the edition of and indexes to the works of Heinrich Kaufringer.

Building on the experience gained from those projects, the next step was to stop relying on programming in FORTRAN or other "high-level" languages and instead to provide a toolbox of program modules, each covering one "basic operation" required for processing textual data. The function of each program is controlled by user-supplied parameters, and the programs can be combined in a variety of ways, which allows the user to accomplish a wide range of tasks. These programs were given the name TUSTEP in 1978.
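
The modular idea described above can be illustrated outside TUSTEP. The following Python sketch is not TUSTEP syntax and does not reproduce any TUSTEP program; the names `make_select`, `make_replace` and `run_pipeline` are hypothetical. It only shows, under these assumptions, how small steps configured by user-supplied parameters can be chained into one job.

```python
from typing import Callable, Iterable, List

# A "step" takes a list of text lines and returns a new list of lines.
Step = Callable[[List[str]], List[str]]


def make_select(keyword: str) -> Step:
    """Return a step that keeps only the lines containing `keyword`."""
    def step(lines: List[str]) -> List[str]:
        return [line for line in lines if keyword in line]
    return step


def make_replace(old: str, new: str) -> Step:
    """Return a step that replaces every occurrence of `old` with `new`."""
    def step(lines: List[str]) -> List[str]:
        return [line.replace(old, new) for line in lines]
    return step


def run_pipeline(lines: Iterable[str], steps: Iterable[Step]) -> List[str]:
    """Apply the steps one after another, feeding each result to the next."""
    result = list(lines)
    for step in steps:
        result = step(result)
    return result


if __name__ == "__main__":
    text = [
        "arma virumque cano",
        "Troiae qui primus ab oris",
        "Italiam fato profugus",
    ]
    # The parameters ("qu", "QU") play the role of user-supplied settings.
    steps = [make_select("qu"), make_replace("qu", "QU")]
    for line in run_pipeline(text, steps):
        print(line)
```

Each step knows nothing about the others, so the same steps can be reordered or reused with different parameters for a different task.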

We have chosen the term "textdata processing" to distinguish TUSTEP's prime field of application from what is commonly understood by the terms text processing or word processing. Naturally, TUSTEP also provides the functions needed for preparing documents (such as input, editing, formatting and printing of texts, including those in non-Latin alphabets); these functions are required for documentation and for the preparation of publications in all fields of scholarly work, in both the humanities and the sciences. However, TUSTEP has been developed in particular to serve those academic fields where the texts themselves are the object of scholarly research: philology, literary studies, linguistics, historical sciences and librarianship. In these fields, not only are new texts produced and published as the result of scholarly work, but existing texts (including literary texts and historical sources) are preserved for the future in the form of new critical editions, analyzed in terms of language, style and content, or catalogued in bibliographical form.

The basic operations required by those tasks include: automatic collation of different versions of a text; text correction not only by using an editor, but also in batch mode by means of correction instructions prepared beforehand (by manual transcription, or by program); decomposing texts into elements (e.g. word forms) according to rules provided by the user; building logical entities (e.g. bibliographic records) consisting of more than one line of text; sorting such elements or entities (according to non-Latin alphabetical rules and other sorting criteria as well); preparing indexes by building entries from the sorted elements; processing textual data by selecting records or elements, by replacing strings or text parts, and by rearranging, completing, compressing and comparing text parts on the basis of rules and conditions provided by the user; retrieving numerical values which are already given in the text (such as calendar dates) or which can be derived from it (such as the number of words in a paragraph); and transforming textual data from TUSTEP files into file formats used by other systems (e.g. for statistical analysis or for electronic publication).
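
Three of the operations listed above (decomposing a text into word forms, sorting by a user-defined alphabet, and building index entries from the sorted elements) can be sketched in Python. This is not TUSTEP code and makes no claim about how TUSTEP implements these steps; the alphabet order, the word-splitting rule and the function names `word_forms`, `sort_key` and `build_index` are assumptions chosen for illustration.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed user-defined alphabet: umlauts sort directly after their base letter.
ALPHABET = "aäbcdefghijklmnoöpqrsßtuüvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def word_forms(line: str) -> List[str]:
    """Decompose a line into lower-cased word forms (runs of letters)."""
    return re.findall(r"[^\W\d_]+", line.lower())


def sort_key(word: str) -> Tuple[int, ...]:
    """Map a word to its ranks in ALPHABET; unknown characters sort last."""
    return tuple(RANK.get(ch, len(ALPHABET)) for ch in word)


def build_index(lines: List[str]) -> List[Tuple[str, List[int]]]:
    """Return (word form, line numbers) entries sorted by the custom alphabet."""
    refs: Dict[str, List[int]] = defaultdict(list)
    for number, line in enumerate(lines, start=1):
        for word in word_forms(line):
            # Record each line only once per word form.
            if not refs[word] or refs[word][-1] != number:
                refs[word].append(number)
    return sorted(refs.items(), key=lambda entry: sort_key(entry[0]))


if __name__ == "__main__":
    sample = ["Der Apfel fällt", "Äpfel und Zebra", "der Ofen"]
    for word, numbers in build_index(sample):
        print(word, ", ".join(str(n) for n in numbers))
```

With the assumed alphabet, "äpfel" is placed directly after "apfel", whereas a plain code-point sort would put it after "zebra". Supplying the sort order as data rather than hard-coding it is what makes such a step reusable for other alphabets.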

The tasks which can be accomplished with the help of TUSTEP range from composing a brief seminar paper to preparing extensive bibliographies, lexica, indexes, concordances, dictionaries, critical editions and, of course, monographs. The final output can be formatted for photocomposition in a quality one is accustomed to in letterpress printing, or can be prepared in a form (e.g. XML, HTML) and encoding (e.g. Unicode) required for electronic publishing.

In addition to programs for the aforementioned textdata processing operations, TUSTEP provides all necessary organizational functions, such as file handling and the definition of new commands, which are normally covered by the job control language (JCL) of the respective operating system (OS). The user interface is therefore identical regardless of the computer and its OS. This not only saves users from having to relearn when they switch to a computer with a different operating system, but also allows them to reuse existing TUSTEP command sequences unchanged.

TUSTEP is constantly being improved and expanded in order to facilitate solutions for new problems in the field of scholarly textdata processing and to take advantage of new developments in hardware and operating systems. Recently added features include a CGI interface and improved support for SGML / TEI / XML markup.

Many projects from almost all humanities disciplines and from many places have contributed to the development of TUSTEP. The TUSTEP development team gratefully acknowledges these contributions and welcomes suggestions for further improvement.

The following list contains a selection of TUSTEP programs for the basic operations of textdata processing and of organizational commands. The names in square brackets are the names of the commands.

1. Basic Operations for Textdata Processing in TUSTEP

2. File handling and job control in TUSTEP

3. Learning TUSTEP

TUSTEP is bilingual: it accepts commands in German and in English and responds in the language used for the latest command.
 
There is also a (preliminary) English translation of the (German) TUSTEP user's manual. Both versions are also available online.
 
The user's manual is not meant to be a teach-yourself text; it is a reference guide for those acquainted with the basic TUSTEP functions. For beginners, there are introductory texts (in German) available from booksellers: "Lernbuch TUSTEP", edited by Winfried Bader (Tübingen: Niemeyer 1995, XII+384 pages, ISBN 3-484-73019-6), and "Tustep für Einsteiger" by Peter Stahl (Würzburg: Königshausen und Neumann 1996, 308 pages, ISBN 3-8260-1254-2).
 
We advise beginners to take one of our courses (in German). At the University of Tübingen, we offer short (half-day) introductions once a month. During the semester breaks, there are two courses. The first is a one-week introductory course (5 hours a day, plus exercises) given in March and September, which concentrates on the use of the TUSTEP editor and the other TUSTEP programs required for entering, correcting, searching, formatting and printing texts, plus file handling and other control commands. In September, this introduction is followed by the two-week main course (5 hours a day, plus exercises; introductory course or equivalent knowledge required), which covers the full range of TUSTEP commands and teaches the user how to solve complex problems with the help of TUSTEP.
 
On the first Thursday of every month, we offer 2 hours of continuing training in which typical applications, new features and special problems are presented and discussed.
 
The International TUSTEP User Group (ITUG) was founded in October 1993 in Würzburg, Germany, as a forum of information and communication for TUSTEP users. At http://www.itug.de it offers information on new TUSTEP features and on courses and other meetings, and gives access to sample solutions and useful procedures; a mailing list (TUSTEP discussion group) can also be subscribed to there. Postal address: ITUG, c/o Universität Würzburg, Deutsches Seminar, Am Hubland, D-97074 Würzburg, Fax +49-931-888-4616; e-mail: itug@germanistik.uni-wuerzburg.de
 
To facilitate the exchange of information between scholars who use (or plan to use) computers in the humanities, a Colloquium on the Use of Electronic Data Processing in the Humanities has been held three times a year since 1973 at the University of Tübingen Computing Center. The reports of these Colloquia are published in the journal Literary and Linguistic Computing (prior to 1985: ALLC Bulletin); recent reports are also available online (see http://www.tustep.uni-tuebingen.de/kolloq.html).
Revision: July 2001
TUSTEP has been developed at the
Universität Tübingen
Zentrum für Datenverarbeitung
Wächterstrasse 76
D-72074 Tübingen
E-mail: tustep@zdv.uni-tuebingen.de, Tel. +49-7071-2970347, Fax +49-7071-295912

tustep@zdv.uni-tuebingen.de - last revised: 10. July 2003