An introduction to the Text Encoding Initiative for the 2015 Digital Antiquarian Workshop at the American Antiquarian Society.
Digital Scholarly Editing with TEI
Dawn Childress @kirschbombe
Digital Antiquarian Workshop #daw2015
XML is an “eXtensible Markup Language”; or,
a metalanguage for defining a set of tags. XML
creates smart documents that contain tags
describing their own structure and content.
What is XML / TEI ?
There are many varieties of XML tag sets:
EAD, XHTML, MODS, MARC XML, & TEI...
Schema: defines the rules for encoding an XML document.
So, the TEI Guidelines define a set of XML tags
for encoding text documents, and provide extensive
documentation on how to use these tags...
What is XML / TEI ?
...as well as allow us to describe various features
of a text, like...
Physical features (paper, condition, binding...)
Metadata (authorship, provenance, encoding rationale...)
Structure (title, chapter, paragraph, line...)
Presentation (italics, underlined, centered...)
Context (named entities, dates, references...)
Editorial (additions, deletions, marginalia, corrections...)
Linguistic (parts of speech)
TEI embeds information about textual features within
the text itself and records this in an explicit, standard,
and machine-readable way, which enables us to
analyze, share, and preserve texts.
TEI is
Scholarly
It creates a strategic representation of a text
It is not simple, neutral or objective
It involves analysis and decision-making
It requires specialized knowledge of the discipline, texts, and research methods
It is intrinsic to the scholarly research that will be performed on the text
Why use TEI ?
XML Components
XML declaration and processing instructions
Elements (tags)
Attributes
Values
Comments
<?xml version="1.0" encoding="UTF-8"?>
<body>
<!-- text goes here -->
<head rend="bold">Heading of an XML example.</head>
<p n="1">Paragraph of this example.</p>
</body>
XML Syntax
Basic XML element and attribute syntax:
<element attribute="value"> </element>
Every start tag has a matching end tag (empty elements such as <pb/> open and close in one tag):
<element> </element>
Tags must nest cleanly:
<publicationStmt><p>Not for distribution.</p></publicationStmt>
Tags are case sensitive:
<titlePage> ≠ </Titlepage>
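A minimal sketch that pulls these syntax rules together. The element names come from the slides; the text content is invented for illustration, and the broken cases are kept inside a comment so the fragment itself still parses.

```xml
<!-- Well-formed: every element is closed, nesting is clean, case matches -->
<body>
  <head rend="bold">A heading</head>
  <p n="1">A paragraph with a <hi rend="italic">highlighted</hi> phrase.</p>
  <pb n="2"/> <!-- an empty element opens and closes in a single tag -->
</body>

<!-- Not well-formed (shown as comment text only):
     <p>Overlapping <hi>tags</p></hi>      p is closed before hi
     <titlePage> ... </Titlepage>          case mismatch between tags
     <p n=1>Unquoted value</p>             attribute values must be quoted
-->
```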
TEI Document Structure
Every TEI document consists of a
TEI Header <teiHeader> and Text <text> section,
all enclosed within the <TEI> element.
There are other required elements within the <teiHeader>
and <text> elements which we'll discuss in the upcoming sections.
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<!-- -->
</teiHeader>
<text>
<!-- -->
</text>
</TEI>
Categories of TEI Markup
Contextual
identification of names, titles, places, languages, emphasis, etc.
Editorial/analytic
annotation, explication, correction, normalization, etc.
Structural
divisions, chapters, lists, paragraphs, tables, line groups, lines, etc.
Presentational
typographic features like bold, italics, small caps, indentations, etc.
Three main sections within <text>
Within the <text> element, the <body> element is required.
We end up with something like this:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
<teiHeader>
<!-- Header goes here -->
</teiHeader>
<text>
<front><!-- Front matter --></front>
<body><!-- Main body of text --></body>
<back><!-- Back matter --></back>
</text>
</TEI>
Mark up the Poe print handout with the major text divisions: front, body, and back. What other structural elements do we have?
TEI has predefined tags for some structural elements. For other structural elements, we use the <div> element, with or without the type="" attribute...
<p> paragraph
<lg> line group
<l> line
<head> heading
<fw> forme work
<item>
<table>
<list>
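A short sketch of how <div> with @type can carry structure that has no predefined tag, here a book divided into chapters. The headings and text are invented, and the use of @n for numbering is an assumption about local practice rather than a requirement.

```xml
<body>
  <div type="book" n="1">
    <head>Book the First</head>
    <div type="chapter" n="1">
      <head>Chapter I</head>
      <p>First paragraph of the chapter.</p>
    </div>
    <div type="chapter" n="2">
      <head>Chapter II</head>
      <p>First paragraph of the next chapter.</p>
    </div>
  </div>
</body>
```

Divisions nest inside one another, so the same pattern extends to acts and scenes, or volumes and letters.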
Exercise 1
Front
"preface"
"ack"
"dedication"
"abstract"
"contents"
"frontispiece"
Body
"volume"
"book"
"chapter"
"section"
"part"
"letter"
"act"
"scene"
"poem"
"canto"
Back
"appendix"
"glossary"
"notes"
"bibliogr"
"index"
"colophon"
Exercise 1
Open the poe.txt.xml document in oXygen and encode the text structures you defined in your printout. (See the TEI Exercises.pdf handout)
<front>
<titlePage>
<titlePart type="main">THE INCOMPLETE WORKS OF EDGAR ALLAN POE:</titlePart>
<titlePart type="sub">A VERY BRIEF ANTHOLOGY</titlePart>
<docImprint>
<publisher>Association of College and Research Libraries</publisher>
<docDate>2012</docDate>
</docImprint>
</titlePage>
<pb/>
<div type="contents">
<head>TABLE OF CONTENTS.</head>
<list>
<item>The Raven ................................ 1</item>
<item>The Angel of the Odd ..................... 2</item>
<item>Scenes from “Politian”.................... 7</item>
<item>Notes .................................... 10</item>
</list>
</div>
</front>
<pb n="1"/>
<body>
<div type="poem">
<head>THE RAVEN.</head>
<lg type="stanza">
<l>Once upon a midnight dreary, while I pondered, weak and weary,</l>
<l>Over many a quaint and curious volume of forgotten lore,</l>
<l>While I nodded, nearly napping, suddenly there came a tapping,</l>
<l>As of some one gently rapping, rapping at my chamber door.</l>
<l>“'Tis some visiter,” I muttered, “tapping at my chamber door— </l>
<l>Only this, and nothing more.”</l>
</lg>
<!-- remaining stanzas and texts go here -->
</div>
</body>
The <teiHeader> is mandatory and contains descriptive metadata about the document.
<fileDesc> is required and must contain:
<titleStmt>: information about the title and the parties responsible for its content
<publicationStmt>: publication details about the digital text
<sourceDesc>: bibliographic details about the physical source of the digital text
Create a new TEI Lite document, complete the teiHeader, and add our tagged text. (See the TEI Exercises.pdf handout)
<teiHeader>
<fileDesc>
<titleStmt>
<title>The Incomplete Works of Edgar Allan Poe, Digital Edition</title>
<respStmt>
<resp>Encoded with basic TEILite tags</resp>
<name>Dawn Childress</name>
</respStmt>
</titleStmt>
<publicationStmt>
<p>Produced for 'Introduction to TEI' at the Digital Antiquarian Workshop 2015.</p>
</publicationStmt>
<sourceDesc>
<p>Excerpted from electronic texts at the University of Virginia Library.</p>
</sourceDesc>
</fileDesc>
</teiHeader>
Milestones help solve the problem of "overlapping hierarchies".
Milestones are used to mark up physical and presentational boundaries such as pages, gatherings, and columns that may not coincide with the structure of the text.
<pb /> page break
<cb /> column break
<lb /> line break
<gb /> gathering beginning
<milestone /> generic
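A minimal sketch of the overlap problem, with invented text: the paragraph runs across a page boundary, so the page cannot be a container element. An empty <pb/> marks the point where the new page starts instead.

```xml
<p>The sentence begins at the foot of one page
  <pb n="2"/>
  and continues at the top of the next,<lb/>
  then runs on to a new line.</p>
```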
The <hi rend=" "> element is a generic tag for marking a word or phrase as typographically distinct.
@rend (attribute), when used with any element, allows us to describe how text is rendered in the original document.
<head rend="center">
<emph rend="italic">
<p rend="indent">
<del rend="strikethrough">
<hi rend="blackletter">
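A sketch of @rend in running text. The sentence is invented, and the @rend values are project-defined labels rather than a fixed list, so whatever vocabulary you choose should be documented and used consistently.

```xml
<head rend="center">THE RAVEN.</head>
<p rend="indent">One phrase appears in <hi rend="italic">italic type</hi>,
  and one word is set in <hi rend="blackletter">blackletter</hi>.</p>
```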
Codes for the Representation of Names of Languages:
www.loc.gov/standards/iso639-2/php/code_list.php
@xml:lang (attribute), when used with a standard language code (BCP 47, which draws on the ISO 639 codes listed above and uses the two-letter code where one exists), notes or defines the language of the text (or of other languages present in the text).
"en" : : English
"ang" : : English, Old
"de" : : German
"cs" : : Czech
"ko" : : Korean
<emph xml:lang="de">Schmetterling</emph>
Add some milestones, @rend values, and tag a language. (See the TEI Exercises.pdf handout)
The W3C standard date value is a machine-readable date (can include time) and must be constructed with accurate punctuation. For more info, visit:
http://www.w3.org/TR/NOTE-datetime
TEI uses the <date> element with an attribute and W3C standard date values (yyyy-mm-dd) to encode dates.
@when
@notBefore
@notAfter
@from
<date when="1792-02-28">Feb. 28, 1792</date>
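A few more hedged examples of the dating attributes, using invented dates. The values may be given to the day, month, or year only. @to is not listed on the slide; it is included here as the usual partner of @from.

```xml
<!-- a precise day, then a month with no day -->
<date when="1792-02-28">Feb. 28, 1792</date>
<date when="1792-02">February 1792</date>

<!-- an uncertain date that falls somewhere inside a range -->
<date notBefore="1790" notAfter="1795">early in the 1790s</date>

<!-- a period with a start and an end -->
<date from="1812" to="1815">from 1812 to 1815</date>
```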
The generic elements <rs> (referring string) and <name> can be used with @type to distinguish the type of entity being named...
<name type="person">Isaiah Thomas</name>
...but there are also specialized tags for many named entities...
<persName> personal name
<placeName> named place
<geogName> river, mountain, etc.
<orgName> name of organization
<roleName> social roles or titles
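A sketch contrasting the specialized and generic name elements in running prose. The sentences were written for illustration and are not taken from the workshop texts.

```xml
<p><persName>Isaiah Thomas</persName> moved his press from
  <placeName>Boston</placeName> to <placeName>Worcester</placeName>
  and later founded the <orgName>American Antiquarian Society</orgName>.</p>

<!-- the same person tagged with the generic elements -->
<p><name type="person">Isaiah Thomas</name> is also called
  <rs type="person">the printer</rs> in this passage.</p>
```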
"Ographies" can be kept in your encoded file, usually in a <div> in the <back>, or in an external file that is referenced in your encoded file.
"Ographies" are structured lists that provide a place to define these named entities. Think of these as local authority files or lists that are created to give context to parts or all of the text. These can be as simple or complex as you need.
personography
placeography
orgography
bibliography
shipography
<div type="editorial">
<listPlace>
<place type="state" xml:id="l_rhode_island">
<placeName>The State of Rhode Island and Providence Plantations</placeName>
<country>United States of America</country>
<region>New England</region>
</place>
</listPlace>
<listOrg>
<org xml:id="o_federal_reserve">
<orgName>The Federal Reserve</orgName>
<desc>Bank through which the US currency ... </desc>
</org>
</listOrg>
<listBibl>
<bibl xml:id="b_lee_1964">
<author>Harper Lee</author>
<title>To Kill a Mockingbird</title>
<date>1964</date>
</bibl>
</listBibl>
<list type="filmography">
<item xml:id="raiders">
<name>Raiders of the Lost Ark</name>
<desc>First in a series of action-adventure movies
detailing the career of Indiana Jones ... </desc>
</item>
</list>
</div>
The people, places, and other named entities defined in our "ographies" can be referenced in the text using the @ref and @xml:id attributes.
In the text: <persName ref="#RLP">Richard L. Parker</persName>
In the "ography": <person xml:id="RLP"><!-- --></person>
<text>
<persName ref="#P1234"> Elder Edmond Lougee </persName>
</text>
<!-- .... -->
<back>
<div type="editorial">
<listPerson>
<person xml:id="P1234">
<p>Edmund or Edmond Lougee was born in Exeter Newmarket, Rockingham,
New Hampshire, USA in 1731 to John Lougee and Anne Gilman. He married
Hannah Lord and had 7 children. He passed away on 3 Jun 1807 in Loudon,
New Hampshire, USA.</p>
</person>
</listPerson>
</div>
</back>
Tag a few people, places, or other named entities, then define and reference them using the provided "ographies" . (See the TEI Exercises.pdf handout)
TEI allows us to record deletions, additions, corrections, and other evidence of the writing process, whether by the author of a literary text or by a scribe copying out a manuscript.
<add> addition to the text
<del> deleted text
<subst> substitution
<sic> an error in the text
<corr> a corrected error
<supplied> use with @resp to show something was added by the editor
<subst> surrounds <add> & <del> tags when the added text is a substitution for the deleted text.
@resp : who's responsible? usually with defined xml:id
@place : supralinear, margin, above, below, etc.
@rend : strikethrough, etc.
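A sketch combining these elements in a single invented sentence. The pointer "#ed1" is a hypothetical xml:id for an editor that would be defined elsewhere in the file, and <choice> is used here as the wrapper that pairs an error with its correction.

```xml
<p>The ship sailed on the
  <subst>
    <del rend="strikethrough">third</del>
    <add place="above">fourth</add>
  </subst>
  of June, carrying
  <choice>
    <sic>recieved</sic>
    <corr resp="#ed1">received</corr>
  </choice>
  goods and a <supplied resp="#ed1">full</supplied> crew.</p>
```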
In some cases, there may be more than one transcription or encoding possibility to choose from, or the editor may want to normalize a part of the text; in these cases, TEI provides the <reg> and <orig> elements with the <choice> wrapper element.
<l>My
<choice>
<reg>Mistress'</reg><orig>Mistres</orig>
</choice> eyes are nothing like the
<choice>
<reg>sun</reg><orig>Sunne</orig>
</choice>,
</l>
<l>
<choice>
<reg>Coral</reg><orig>Curral</orig>
</choice> is far more red
<choice>
<reg>than</reg><orig>then</orig>
</choice> her lips red,
</l>
Later, the interface can be designed to display either the normalized or the diplomatic transcription, or both.
A note is any additional comment found in a text, marked in some way as being out of the main textual stream.
Notes can be marked using the <note> tag
Notes may be in a different location, hand or typeface, may be authorial or editorial, and may have been added later
It is important to distinguish between authorial and editorial notes — use @type for this
For headings or descriptive labels attached to a passage, use <label> instead of <note>
Use attributes to specify these and other characteristics of notes (@place, @resp, @type)
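A sketch of how these attributes might separate an authorial note from an editorial one. The text is invented, the @type values are project-defined, and "#ed1" is a hypothetical xml:id for the editor.

```xml
<p>The press was moved inland that spring.
  <note type="authorial" place="margin">Printed in the outer margin of the source.</note>
  It did not return.
  <note type="editorial" resp="#ed1">The move is not dated in the source.</note></p>
```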
Notes can be encoded directly at the point of attachment...
<p>Why does the language-maven in the street
(or the senior common-room, or the bar at the Groucho
Club <note>An establishment patronized by media folk
in London (provided the club will have them as
members).</note>) have such a low opinion of
linguists? Because...</p>
...or can be added to a <div type="notes"> in the back matter, using a <ref> or <ptr> tag at the point of attachment to point to the note.
<div>
<head>Beyond "anything goes"</head>
<p> Why does the language-maven in the street
(or the senior common-room, or the bar at the Groucho
Club <ptr target="#note6"/>) have such a low opinion of
linguists? Because...</p>
</div>
<back>
<div type="notes">
<head>Notes</head>
<!-- other notes here -->
<note xml:id="note6">An establishment patronized
by media folk in London (provided the club will have
them as members).
</note> <!-- and here -->
</div>
</back>
Tag some additions, deletions, corrections, notes, etc. (See the TEI Exercises.pdf handout)
Get some help...
OxGarage
The TAPAS Project
JuxtaCommons / JuxtaEditions
TEI Boilerplate
VersioningMachine
Maps & Texts (in development)
EpiDoc
... or roll your own
XSLT
JavaScript
CSS
TEI P5 Guidelines
TEI by Example
Women Writers Project (resources)
JuxtaCommons / JuxtaEditions