An introduction to the Text Encoding Initiative for the 2015 Digital Antiquarian Workshop at the American Antiquarian Society.
Dawn Childress @kirschbombe
Digital Antiquarian Workshop #daw2015
Digital Scholarly Editing with TEI
XML, the “eXtensible Markup Language,” is a
metalanguage for defining sets of tags. XML
documents contain tags that describe their
own structure and content.
What is XML / TEI?
There are many varieties of XML tag sets:
EAD, XHTML, MODS, MARC XML, & TEI...
So, the TEI Guidelines define a set of XML tags
for encoding text documents, and provide extensive
documentation on how to use these tags...
...as well as allow us to describe various features
of a text, like...
TEI embeds information about textual features within
the text itself and records this in an explicit, standard,
and machine-readable way, which enables us to
analyze, share, and preserve texts.
Why use TEI?
Basic XML element and attribute syntax:
<element attribute="value"> </element>
Every start tag has a closing tag:
Tags must nest cleanly:
<publicationStmt><p>Not for distribution.</p></publicationStmt>
Tags are case sensitive:
<titlePage> ≠ </Titlepage>
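As a minimal sketch of the three rules above (the sample title is invented for illustration), the first fragment is well-formed, and the final comment lists the kinds of mistakes an XML parser will reject:

```xml
<!-- Well-formed: every start tag is closed, tags nest cleanly, case matches -->
<titlePage>
  <titlePart type="main">A Sample Title</titlePart>
</titlePage>

<!-- An empty element opens and closes in a single tag -->
<pb n="1"/>

<!-- NOT well-formed (kept inside a comment so this fragment still parses):
     overlapping tags:     <p><hi>text</p></hi>
     mismatched case:      <titlePage> ... </Titlepage>
     missing closing tag:  <p>text
-->
```

An XML editor or parser should flag each of the three commented-out cases if they are pasted into a document.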
TEI Document Structure
Every TEI document consists of a
TEI Header <teiHeader> and Text <text> section,
all enclosed within the <TEI> element.
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader> <!-- --> </teiHeader>
  <text> <!-- --> </text>
</TEI>
Categories of TEI Markup
Three main sections within <text>
We end up with something like this:
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <!-- Header goes here -->
  </teiHeader>
  <text>
    <front><!-- Front matter --></front>
    <body><!-- Main body of text --></body>
    <back><!-- Back matter --></back>
  </text>
</TEI>
<front>
  <titlePage>
    <titlePart type="main">THE INCOMPLETE WORKS OF EDGAR ALLAN POE:</titlePart>
    <titlePart type="sub">A VERY BRIEF ANTHOLOGY</titlePart>
    <docImprint>
      <publisher>Association of College and Research Libraries</publisher>
      <docDate>2012</docDate>
    </docImprint>
  </titlePage>
  <pb/>
  <div type="contents">
    <head>TABLE OF CONTENTS.</head>
    <list>
      <item>The Raven ................................ 1</item>
      <item>The Angel of the Odd ..................... 2</item>
      <item>Scenes from “Politian”.................... 7</item>
      <item>Notes .................................... 10</item>
    </list>
  </div>
</front>
<pb n="1"/>
<body>
  <div type="poem">
    <head>THE RAVEN.</head>
    <lg type="stanza">
      <l>Once upon a midnight dreary, while I pondered, weak and weary,</l>
      <l>Over many a quaint and curious volume of forgotten lore,</l>
      <l>While I nodded, nearly napping, suddenly there came a tapping,</l>
      <l>As of some one gently rapping, rapping at my chamber door.</l>
      <l>“'Tis some visiter,” I muttered, “tapping at my chamber door—</l>
      <l>Only this, and nothing more.”</l>
    </lg>
    <!-- excerpt ends here -->
  </div>
</body>
<teiHeader>
  <fileDesc>
    <titleStmt>
      <title>The Incomplete Works of Edgar Allan Poe, Digital Edition</title>
      <respStmt>
        <resp>Encoded with basic TEI Lite tags</resp>
        <name>Dawn Childress</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <p>Produced for 'Introduction to TEI' at the Digital Antiquarian Workshop 2015.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Excerpted from electronic texts at the University of Virginia Library.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
Milestones are used to mark up physical and presentational boundaries, such as pages, gatherings, and columns, that may not coincide with the structure of the text.
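A short sketch of milestones in use, assuming the full TEI P5 element set: <pb/> marks a page beginning and <cb/> a column beginning (<gb/> does the same for gatherings). The sentence and the n values are invented for illustration.

```xml
<!-- Illustrative only: the text and the page/column labels are made up -->
<p>The first paragraph begins on one page
  <pb n="2"/>
  and carries on, mid-sentence, onto the next.
  <cb n="a"/>
  A column break can be recorded in the same way.</p>
```

Because milestones are empty elements, they can fall in the middle of a paragraph without breaking the nesting of the structural tags around them.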
The @rend attribute can be used on any element to describe how the text is rendered in the original document.
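For instance, a sketch using the <hi> (highlighted) element. The values "italic" and "smallcaps" are assumed here as a local convention, since TEI does not fix the vocabulary of @rend, and the sentence itself is invented.

```xml
<!-- The rend values are a project-level convention, not prescribed by TEI -->
<p>The poem appeared as <hi rend="italic">The Raven</hi> under the heading
  <hi rend="smallcaps">Poetry</hi>.</p>
```

Whatever values a project chooses, documenting them and using them consistently makes the encoding easier to process later.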
The @xml:lang attribute, used with a standard language tag as recommended by the W3C (for example, "en" or "fr"), identifies the language of the text, or of a passage in another language within it.
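A minimal sketch, with an invented sentence: @xml:lang on the paragraph sets its language, and the <foreign> element marks a phrase in another language inside it.

```xml
<!-- "en" and "la" are standard language tags for English and Latin -->
<p xml:lang="en">The motto <foreign xml:lang="la">E pluribus unum</foreign>
  appears on the seal.</p>
```

The value set on an outer element applies to everything inside it unless an inner element overrides it, so the attribute is usually only needed where the language changes.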
TEI uses the <date> element with the @when attribute and a W3C-standard date value (yyyy-mm-dd) to encode dates.
<date when="1792-02-28">Feb. 28, 1792</date>
The generic elements <rs> (referring string) and <name> can be used with @type to distinguish the type of entity being named...
<name type="person">Isaiah Thomas</name>
...but there are also specialized tags for many named entities...
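A brief sketch of three of the specialized elements, <persName>, <orgName>, and <placeName>, in a sentence written for this example:

```xml
<p><persName>Isaiah Thomas</persName> founded the
  <orgName>American Antiquarian Society</orgName> in
  <placeName>Worcester, Massachusetts</placeName>.</p>
```

Compared with <name type="person">, the specialized tags put the entity type in the element name itself, which tends to be easier to search and validate.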
"Ographies" are structured lists that provide a place to define these named entities. Think of these as local authority files or lists that are created to give context to parts or all of the text. These can be as simple or complex as you need.
<div type="editorial">
  <listPlace>
    <place type="state" xml:id="l_rhode_island">
      <placeName>The State of Rhode Island and Providence Plantations</placeName>
      <country>United States of America</country>
      <region>New England</region>
    </place>
  </listPlace>
  <listOrg>
    <org xml:id="o_federal_reserve">
      <orgName>The Federal Reserve</orgName>
      <desc>Bank through which the US currency ... </desc>
    </org>
  </listOrg>
  <listBibl>
    <bibl xml:id="b_lee_1964">
      <author>Harper Lee</author>
      <title>To Kill a Mockingbird</title>
      <date>1964</date>
    </bibl>
  </listBibl>
  <list type="filmography">
    <item xml:id="raiders">
      <name>Raiders of the Lost Ark</name>
      <desc>First in a series of action-adventure movies detailing the career of Indiana Jones ... </desc>
    </item>
  </list>
</div>
The people, places, and other named entities defined in our "ographies" can be referenced in the text using the @ref and @xml:id attributes.
In the text: <persName ref="#RLP">Richard L. Parker</persName>
In the "ography": <person xml:id="RLP"><!-- --></person>
<text>
  <body>
    <!-- .... -->
    <p><persName ref="#P1234">Elder Edmond Lougee</persName></p>
    <!-- .... -->
  </body>
  <back>
    <listPerson>
      <person xml:id="P1234">
        <p>Edmund or Edmond Lougee was born in Exeter Newmarket, Rockingham, New Hampshire, USA in 1731 to John Lougee and Anne Gilman. He married Hannah Lord and had 7 children. He passed away on 3 Jun 1807 in Loudon, New Hampshire, USA.</p>
      </person>
    </listPerson>
  </back>
</text>
TEI allows us to record deletions, additions, corrections, and other evidence of the writing process, whether by the author of a literary text or by a scribe copying out a manuscript.
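A minimal sketch of such markup, using invented sentences: <del> and <add> record a deletion and an addition, and <choice> pairs an apparent error (<sic>) with the editor's correction (<corr>). The attribute values shown are common TEI practice but are assumed here for illustration.

```xml
<!-- A word struck out by the writer, with its replacement written above the line -->
<p>The meeting was held on <del rend="strikethrough">Tuesday</del>
  <add place="above">Wednesday</add>.</p>

<!-- An apparent error in the source, paired with the editor's correction -->
<p>It was <choice><sic>teh</sic><corr>the</corr></choice> only copy.</p>
```

Keeping both readings in the file lets an edition display either the original state of the text or the corrected one from the same encoding.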
In some cases, there may be more than one transcription or encoding possibility to choose from, or the editor may want to normalize a part of the text; in these cases, TEI provides the <reg> and <orig> elements with the <choice> wrapper element.
<l>My <choice><reg>Mistress'</reg><orig>Mistres</orig></choice> eyes are nothing like the <choice><reg>sun</reg><orig>Sunne</orig></choice>,</l>
<l><choice><reg>Coral</reg><orig>Curral</orig></choice> is far more red <choice><reg>than</reg><orig>then</orig></choice> her lips red,</l>
Notes can be encoded directly at the point of attachment...
<p>Why does the language-maven in the street (or the senior common-room, or the bar at the Groucho Club <note>An establishment patronized by media folk in London (provided the club will have them as members).</note>) have such a low opinion of linguists? Because...</p>
...or can be collected in a <div type="notes"> in the back matter, with a <ref> or <ptr> element at the point of attachment pointing to the note.
<div>
  <head>Beyond "anything goes"</head>
  <p>Why does the language-maven in the street (or the senior common-room, or the bar at the Groucho Club <ptr target="#note6"/>) have such a low opinion of linguists? Because...</p>
</div>
<back>
  <div type="notes">
    <head>Notes</head>
    <!-- other notes here -->
    <note xml:id="note6">An establishment patronized by media folk in London (provided the club will have them as members).</note>
    <!-- and here -->
  </div>
</back>
Get some help...
... or roll your own