University Library, University of Illinois at Urbana-Champaign

An Introduction to XML and TEI

This guide provides an introduction to XML and the Text Encoding Initiative.

Elements and Tags

Like HTML, XML marks the beginning and end of sections of text using tags:

<sentence>This is a sentence.</sentence>

In this example, "sentence" is what XML calls an element. Elements can be considered the nouns of XML. The element begins with an opening tag, <sentence>, and ends with a closing tag, </sentence>, which repeats the element name after a slash.

XML is based on a nested structure, which means that elements cannot overlap. That is, an element that opens inside another element must also close inside it. For example:

<sentence>XML is <emphasis>fun.</emphasis></sentence>

Here, the <emphasis> element opens and closes inside the <sentence> element.

If we were to do this incorrectly, it might look like this:

<sentence>XML is <emphasis>fun.</sentence></emphasis>

Here, the <emphasis> element opens inside the <sentence> element but closes outside it. This is called overlap, and it is not allowed in XML.
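
The difference between the two examples above can be checked with any XML parser. The following is a minimal sketch, assuming Python and its standard-library xml.etree.ElementTree module (neither is part of this guide); the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

nested = "<sentence>XML is <emphasis>fun.</emphasis></sentence>"
overlapping = "<sentence>XML is <emphasis>fun.</sentence></emphasis>"

# The nested version parses into a tree: <emphasis> is a child of <sentence>.
root = ET.fromstring(nested)
child_tags = [child.tag for child in root]
print(root.tag, child_tags)  # sentence ['emphasis']

# The overlapping version is rejected by the parser.
try:
    ET.fromstring(overlapping)
    overlap_rejected = False
except ET.ParseError as err:
    overlap_rejected = True
    print("not well-formed:", err)
```

Because the parser builds a tree, there is no way for it to represent an element that belongs partly to one parent and partly to another, which is why overlap is reported as an error rather than repaired.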

Attributes and Values

Elements in XML can have attributes, which in turn have values. Attributes are properties or characteristics of an element. An attribute is written inside the opening tag as a name, an equals sign, and a quoted value. For example:

<name type="person">Abraham Lincoln</name>

<name type="place">Champaign-Urbana</name>

Here, we are using the type attribute with two different values, person and place, to distinguish between two different kinds of names.
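
To see how software can use an attribute to tell the two kinds of names apart, here is a small sketch, again assuming Python's standard-library xml.etree.ElementTree. The <names> wrapper element is hypothetical; it is added only so that the sample has a single root element.

```python
import xml.etree.ElementTree as ET

# The <names> wrapper is an assumption made for this sketch.
document = """<names>
  <name type="person">Abraham Lincoln</name>
  <name type="place">Champaign-Urbana</name>
</names>"""

root = ET.fromstring(document)

# Map each value of the type attribute to the content of its element.
names_by_type = {name.get("type"): name.text for name in root.findall("name")}
print(names_by_type)
# {'person': 'Abraham Lincoln', 'place': 'Champaign-Urbana'}
```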

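To keep these all straight, here is an example with each part identified. In both lines below, the element is sentence, the attribute is type, the value is declarative or interrogative, and the content is the text between the opening and closing tags.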

<sentence type="declarative">This is a sentence.</sentence>

<sentence type="interrogative">Is this a sentence?</sentence>


Well-formedness and Validity

For XML to be well-formed, it must:

  • Have everything properly delimited, that is, every element has a matching opening and closing tag
  • Have a single root element inside which all other elements are nested
  • Have no overlap
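
The three rules above can be tried out with a short script. This is a rough sketch, assuming Python's standard-library xml.etree.ElementTree; is_well_formed is a hypothetical helper name, and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "properly nested": "<sentence>XML is <emphasis>fun.</emphasis></sentence>",
    "unclosed tag": "<sentence>XML is fun.",
    "two root elements": "<sentence>One.</sentence><sentence>Two.</sentence>",
    "overlap": "<sentence>XML is <emphasis>fun.</sentence></emphasis>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'not well-formed'}")
```

Only the first sample passes; each of the other three breaks one of the rules in the list.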

Well-formed XML can also be valid, but validity must be measured against something called a schema (see the next section on validation in oXygen). Because XML itself does not define a predetermined set of elements, a schema will specify:

  • What elements are allowed
  • What elements can nest inside of others
  • The order in which elements must occur
  • Whether or not elements are repeatable
  • What attributes each element may have
  • What values the attributes must or can have
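
To make the idea of validity concrete, the sketch below checks a document against a toy set of rules written as a Python dictionary. This is an assumption-heavy illustration, not a real schema language and not how oXygen validates: TOY_SCHEMA, validation_errors, and the <text> root element are all hypothetical, and the sketch covers only allowed elements, nesting, attributes, and attribute values (not order or repeatability).

```python
import xml.etree.ElementTree as ET

# Toy rules, NOT a real schema language: for each element, the child
# elements it may contain and the values each attribute may take.
TOY_SCHEMA = {
    "text": {"children": {"sentence"}, "attributes": {}},
    "sentence": {
        "children": {"emphasis"},
        "attributes": {"type": {"declarative", "interrogative"}},
    },
    "emphasis": {"children": set(), "attributes": {}},
}

def validation_errors(element, schema=TOY_SCHEMA):
    """Return a list of rule violations found at or below `element`."""
    rules = schema.get(element.tag)
    if rules is None:
        return [f"<{element.tag}> is not an allowed element"]
    errors = []
    for name, value in element.attrib.items():
        allowed_values = rules["attributes"].get(name)
        if allowed_values is None:
            errors.append(f"<{element.tag}> may not have attribute {name!r}")
        elif value not in allowed_values:
            errors.append(f"{name}={value!r} is not an allowed value on <{element.tag}>")
    for child in element:
        if child.tag in schema and child.tag not in rules["children"]:
            errors.append(f"<{child.tag}> may not nest inside <{element.tag}>")
        errors.extend(validation_errors(child, schema))
    return errors

valid = ET.fromstring(
    '<text><sentence type="declarative">XML is <emphasis>fun.</emphasis></sentence></text>'
)
invalid = ET.fromstring(
    '<text><sentence type="shouted"><title>Hi</title></sentence></text>'
)

valid_errors = validation_errors(valid)
invalid_errors = validation_errors(invalid)
print(valid_errors)
print(invalid_errors)
```

Both sample documents are well-formed, so the parser accepts them. Only the first is valid under the toy rules, which shows that well-formedness and validity are separate checks.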

Adapted from: