
University Library, University of Illinois at Urbana-Champaign

Metadata Services

Acquisitions & Cataloging Services

Metadata/XML Terms



Attribute


A class of XML markup object used to associate a named value with an element; the value elaborates, augments, or functions as the content of the element. Syntactically, an attribute is a name paired with a quoted value, appearing in an element start-tag or in an empty-element-tag.
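
As a minimal sketch of the name and quoted-value syntax, the following Python fragment reads an attribute with the standard library's ElementTree module. The record fragment and the attribute name "lang" are illustrative assumptions, not part of any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment: "lang" is an attribute on the title element.
record = '<title lang="en">Moby Dick</title>'

title = ET.fromstring(record)
attr_value = title.get("lang")   # value of the attribute named "lang"
all_attrs = dict(title.attrib)   # every attribute on the element, as a dict

print(attr_value, all_attrs, title.text)
```

The attribute value is kept separate from the element's text content, which is why it can elaborate on the content without being part of it.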



CDATA


XML applications parse XML metadata records in order to distinguish content from markup. CDATA (Character Data) is content that is not meant to be parsed. Segments of a metadata record designated as CDATA can include characters such as “<” and “&” without causing confusion between markup and content.
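
A short illustration, assuming a hypothetical note element: the CDATA section below contains “<” and “&”, yet Python's ElementTree parser returns them as plain text and creates no child elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment with a CDATA section.
record = "<note><![CDATA[Use <title> & <creator> for books]]></note>"

note = ET.fromstring(record)
cdata_text = note.text     # the characters inside the CDATA section, unparsed
child_count = len(note)    # number of child elements found by the parser

print(cdata_text, child_count)
```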



Comment


Markup used to insert a comment into an XML metadata record. Syntactically, XML Comments begin with “<!--” and end with “-->”. The text of the comment is contained between these character sequences.

Content object


In an XML metadata record each discrete unit of information can be modeled as a content object. For example, the title of a book is a common content object in a bibliographic record. XML is a way to organize a metadata record into an Ordered Hierarchy of Content Objects.

Document Object Model (DOM)


The World Wide Web Consortium (W3C) has formally defined a generic Document Object Model (DOM) application programming interface for accessing and manipulating the content and markup of XML and HTML documents. The DOM approach models such documents as trees of nodes. The DOM is similar to but not identical with the ways that schemas, DTDs, XPath, and XSLT model XML documents. The term “DOM” is often used imprecisely. The W3C DOM is primarily of interest to programmers.
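
The tree-of-nodes model can be sketched with xml.dom.minidom, a lightweight DOM implementation in the Python standard library. The record below is a made-up example, and minidom covers only part of the W3C DOM.

```python
from xml.dom import minidom

# Hypothetical record; no whitespace between tags, so no extra text nodes appear.
doc = minidom.parseString('<record><title lang="en">Moby Dick</title></record>')

root = doc.documentElement      # element node for <record>
title = root.firstChild         # element node for <title>
text_node = title.firstChild    # text node holding the title string

# Each node reports its kind through nodeType (1 = element, 3 = text).
kinds = [root.nodeType, title.nodeType, text_node.nodeType]
lang = title.getAttribute("lang")

print(root.nodeName, title.nodeName, repr(text_node.data), lang)
```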



DTD


A Document Type Definition is used to define the names of and relationships between markup objects (elements, attributes, and entities) allowed in XML metadata records conforming to a specific metadata standard. XML metadata records can be automatically validated for conformity to a DTD as a way to ensure record correctness.



Element


A core class of XML markup object used in delineating the structure and hierarchy of an XML metadata record. Syntactically, elements are delimited by a start-tag and paired end-tag or by an empty-element-tag. An element may contain a content object value or other elements (children) or be empty.



Empty element


In XML, empty elements are sometimes used as structural tokens (e.g., to indicate a chapter break) or to show the absence of information (e.g., an empty title element for an untitled work). Empty elements can be represented by a start-tag immediately followed by an end-tag but more typically are represented by a single empty-element-tag, such as <title />.
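
The two representations can be compared with a small sketch using Python's ElementTree. The short_empty_elements option shown here belongs to that library's serializer and is not part of XML itself.

```python
import xml.etree.ElementTree as ET

# An element with no text and no children.
title = ET.Element("title")

# Default serialization uses a single empty-element-tag.
short_form = ET.tostring(title, encoding="unicode")

# The same element written as a start-tag immediately followed by an end-tag.
long_form = ET.tostring(title, encoding="unicode", short_empty_elements=False)

print(short_form)
print(long_form)
```

Both forms parse back to the same empty element, so the choice between them is one of style.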



End-tag


Markup denoting the end of an element. An end-tag is always paired with a start-tag and always begins with “</”. The name immediately following the slash must match the element name of the corresponding start-tag, as in the </title> of <title></title>.

General entities and character references


Classes of XML markup objects used as placeholders in content or attribute values. Syntactically, these begin with an ampersand character (“&”) and end with a semicolon (“;”), for example, “&lt;” for “<”. They stand in for special characters (e.g., the copyright symbol, “©”), for sequences of characters and/or markup used in multiple places in a record (e.g., a date), or for nontextual content.
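
As a rough illustration, the hypothetical rights statement below uses the numeric character reference “&#169;” and the predefined entities “&amp;” and “&lt;”. Python's ElementTree parser replaces each with the character it stands for, and the serializer escapes “&” and “<” again on output.

```python
import xml.etree.ElementTree as ET

# Hypothetical record fragment with one character reference and two entity references.
record = "<rights>&#169; 2013 Smith &amp; Jones, pages &lt; 300</rights>"

rights = ET.fromstring(record)
resolved = rights.text   # references replaced by the characters they stand for

# Writing the element back out escapes "&" and "<" once more.
reserialized = ET.tostring(rights, encoding="unicode")

print(resolved)
print(reserialized)
```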



Markup


Everything in an XML metadata record that is not part of the metadata itself (i.e., the content) is markup. Most XML markup is delimited by left- and right-hand angle brackets (“< >”); entity and character references, which begin with “&” and end with “;”, are the exception. Markup exposes the structure and content objects of an XML metadata record, facilitating record use and reuse.



Namespace


XML Namespaces provide a means to associate element and attribute names used in XML metadata records with a particular metadata grammar (standard) and a specific DTD or Schema (for validation). Namespaces are globally identified by Uniform Resource Identifiers.
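
A minimal sketch of a namespace in use, assuming a made-up record wrapper around a Dublin Core title element. The prefix “dc” is only a local shorthand; the URI is what identifies the namespace, which is why ElementTree stores it as part of the element name.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the URI is the Dublin Core element set namespace.
record = (
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Moby Dick</dc:title>"
    "</record>"
)

root = ET.fromstring(record)

# Map a prefix of our choosing to the namespace URI for use in searches.
ns = {"dc": "http://purl.org/dc/elements/1.1/"}
title = root.find("dc:title", ns)

print(title.tag)    # the element name qualified by its namespace URI
print(title.text)
```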



Node


XML metadata records are often represented as tree structures. In tree data model views of XML, each part of the tree (each element, attribute, entity, segment of text, and so on) is called a node.



PCDATA


Unlike CDATA, Parsed Character Data are meant to be parsed and interpreted by XML parsers. For example, the entity reference “&gt;” in PCDATA will be recognized as a stand-in for “>”, and the string <title> will be recognized as a start-tag.

Processing instructions


Markup used to insert an application-specific instruction into an XML metadata record. XML parsers are allowed to ignore these instructions, but certain processing instructions are widely recognized, such as the one for associating an XSLT style sheet with an XML document. Syntactically, XML processing instructions begin with “<?” (but not “<?xml”) and end with “?>”.
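
The style sheet association mentioned above can be inspected with Python's xml.dom.minidom, as in this sketch. The file name record.xsl is a placeholder. Note that the XML Declaration on the first line is not reported as a processing instruction.

```python
from xml.dom import minidom

# Hypothetical record; "record.xsl" is a placeholder style sheet name.
record = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="record.xsl"?>'
    "<record/>"
)

doc = minidom.parseString(record)

# Collect the processing instruction nodes that sit beside the root element.
pis = [n for n in doc.childNodes
       if n.nodeType == n.PROCESSING_INSTRUCTION_NODE]

for pi in pis:
    print(pi.target, "->", pi.data)
```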



Schema


An XML schema (like a DTD) is used to define the names of and relationships between markup objects (elements, attributes, and entities) allowed in a class of XML metadata records. Records can be automatically validated for conformity to a schema (or schemas). As compared to DTDs, schemas can be used in combination and better specify constraints on the type of data an element may contain.



SGML


XML was derived from Standard Generalized Markup Language, an international standard meta-markup language that predates the World Wide Web. XML is largely a restricted subset of SGML that is better optimized for use on the Web.



Start-tag


Markup denoting the beginning of an element. A start-tag is always paired with an end-tag and always begins with “<” followed immediately by the name of the element, as in the <creator> of <creator></creator>. Start-tags may also include attributes.

Style sheet


Used to process the contents of an XML metadata record for presentation. XSL style sheets also can be used to analyze, manipulate, and transform XML metadata records in order to reuse or repurpose their information.

Valid XML


A valid XML metadata record meets all the syntactic requirements of the XML specifications, such as for element naming, attribute and general entity syntax, and so on. In addition, a valid XML metadata record conforms to the constraints on semantics specified in a DTD or schema.

Well-formed XML


A well-formed XML metadata record meets all the syntactic requirements of XML, such as for element naming and attribute and general entity syntax, but has not necessarily been validated against a DTD or schema. A well-formed XML metadata record is also valid XML if it conforms to the DTD or schema against which it is validated.
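
A well-formedness check can be sketched with Python's ElementTree, whose parser is non-validating: it reports syntax errors but never consults a DTD or schema. The helper name is_well_formed and the sample records are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the text parses as XML; this does not check validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True

# Properly nested start-tags and end-tags.
good = is_well_formed("<record><title>Moby Dick</title></record>")

# The title element is never closed, so the record is not well formed.
bad = is_well_formed("<record><title>Moby Dick</record>")

print(good, bad)
```

Checking validity requires a validating parser and the relevant DTD or schema, which the Python standard library does not provide.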



XHTML


Extensible HTML (XHTML) is a version of HTML that conforms to the syntactic requirements of XML (which are generally stricter and cleaner than the syntactic requirements of HTML).



XInclude


XML Inclusion provides syntax and a processing model for merging together segments of separate XML documents.



XLink


XML Linking Language is used to define links between XML metadata records and between XML metadata records and other kinds of content. XLink links are analogous to HTML anchor elements but more expressive.



XML


Extensible Markup Language (XML) is an open standard that is used to serialize (i.e., encode and describe) structured data and to facilitate the maintenance, organization, sharing, and reuse of these data by computer applications. (See additional definitions in Chapter 1.)

XML Declaration


Every XML version 1.0 metadata record should start—and every XML version 1.1 metadata record must start—with an XML Declaration. Syntactically, an XML Declaration starts with “<?xml” and ends with “?>”. An XML Declaration specifies the version of XML to which the record conforms syntactically and may declare character encoding and other similar information about the record.
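
As a brief sketch, Python's ElementTree can write an XML Declaration when serializing a record. The xml_declaration argument used here assumes Python 3.8 or later, and the element name is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical empty record element.
root = ET.Element("record")

# Ask the serializer to emit an XML Declaration naming the version and encoding.
serialized = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(serialized.decode("utf-8"))
```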



XPath


XML Path Language (XPath) is used to navigate, analyze, and retrieve specific information from XML metadata records. XPath is an expression-based language that includes a set of built-in core operators and functions to give the language additional intrinsic programming power. It is meant to be embedded within a host language such as XSLT.
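
Python can also act as a host language, as in this sketch. ElementTree supports only a limited subset of XPath (simple location paths and attribute predicates, not the core function library), and the records are made-up examples.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of two records.
records = (
    "<records>"
    '<record><title lang="en">Moby Dick</title><creator>Melville</creator></record>'
    '<record><title lang="fr">Candide</title><creator>Voltaire</creator></record>'
    "</records>"
)

root = ET.fromstring(records)

# Location path: every title that is a child of a record child of the root.
all_titles = [t.text for t in root.findall("./record/title")]

# Predicate: titles anywhere in the tree whose lang attribute equals "en".
english_titles = [t.text for t in root.findall(".//title[@lang='en']")]

print(all_titles)
print(english_titles)
```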



XPointer


XML Pointer allows direct citing of individual components of an XML document (based on the component’s XPath location path expression).



XQuery


XML Query Language is a language for searching an XML document directly. It relies on XPath and shares the same underlying data model.



XSD


Schemas conforming to the W3C XML Schema Definition Language are commonly given the file extension “.xsd” and referred to as “XSDs.”



XSL


The Extensible Stylesheet Language (XSL) family of specifications defines the semantics that enable XML metadata record transformation and presentation. The main components of XSL are XPath, XSLT, and XSL-FO.



XSL-FO


The Extensible Stylesheet Language for Formatting Objects is an expression-based language for formatting XML. XSL-FO borrows concepts from the Cascading Style Sheets (CSS) language, extending this model to support more complex formatting, such as the creation of a PDF from XML.



XSLT


XSL Transformations is a declarative, functional programming language for manipulating XML. XSLT can be used to selectively transform, reuse, and repurpose components of XML metadata records, typically saving generated results in other XML formats, such as XHTML, or other metadata standards. XSLT depends on XPath.

Glossary terms taken in part from Cole, Timothy W. and Myung-Ja Han. 2013. XML for Catalogers and Metadata Librarians. Third Millennium Cataloging, Susan Lazinger and Sheila Intner (series eds.). Westport, CT: Libraries Unlimited.