University Library

An Introduction to XML and TEI

This guide provides an introduction to XML and the Text Encoding Initiative.

What is XML?

XML (Extensible Markup Language) is a markup language, which means that it encodes certain features of a text so that computers can process that text. In many ways it resembles other markup languages you may have heard of or used, such as (X)HTML, EAD, or TEI. Some of the things that make XML both important and useful are:

  • It is based on an international standard and is non-proprietary. This means everyone can use it and there are rules to ensure it is used in the same way.
  • It is expressed in plain text, so that humans can read it directly, without special software.
  • It is hardware and software independent, so that it can be used in many different current and future computing contexts.
  • It is the basis for TEI, so you must understand its basic rules and structure to be able to work with TEI.

Why encode text?

Why do we want to mark up texts? Although XML is human-readable, its purpose is to make text processable by computers. We encode texts because plain text alone does not make explicit the structure and meaning that many projects and research questions depend on.

For example, say you have a large collection of letters. You want to analyze them to find out how many came from a certain person, or how many used a particular salutation such as "Dear." Unless this information is made explicit by telling the computer "this bit of text is the sender, and that bit is the salutation," a computer has difficulty performing the task. This is a simplistic example, but many kinds of complicated texts in the humanities benefit from this type of explicit structure.
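
The letter example above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the element names (letters, letter, sender, salutation, body) and the sample letters are invented for this sketch and are not TEI elements. It uses Python's standard xml.etree.ElementTree module to show how, once the sender and salutation are marked up explicitly, counting them becomes a simple task for a computer.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical markup for illustration only: these element names and
# sample letters are invented for this sketch and are not TEI.
LETTERS_XML = """\
<letters>
  <letter>
    <sender>Jane Smith</sender>
    <salutation>Dear</salutation>
    <body>Thank you for your letter of last week.</body>
  </letter>
  <letter>
    <sender>John Doe</sender>
    <salutation>My dearest</salutation>
    <body>I hope this finds you well.</body>
  </letter>
  <letter>
    <sender>Jane Smith</sender>
    <salutation>Dear</salutation>
    <body>The books arrived safely.</body>
  </letter>
</letters>
"""


def count_field(xml_text, tag):
    """Count how often each value of <tag> occurs across all <letter> elements."""
    root = ET.fromstring(xml_text)
    return Counter(
        letter.findtext(tag, default="").strip()
        for letter in root.iter("letter")
    )


sender_counts = count_field(LETTERS_XML, "sender")
salutation_counts = count_field(LETTERS_XML, "salutation")

print(sender_counts["Jane Smith"])   # 2
print(salutation_counts["Dear"])     # 2
```

Without the sender and salutation tags, the program would have to guess which words in each letter are a name or a greeting. With them, the question "how many letters came from Jane Smith?" is answered by a direct lookup.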
