LibGuides: Metadata Services: Linked Data

What is Linked Data?

Linked Data is about “using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” Linked Data sits at the top of a five-star tiered deployment scheme, proposed by Berners-Lee, that describes how open and accessible data is, as shown below.

★	Available on the web (whatever format) but with an open licence, to be Open Data
★★	Available as machine-readable structured data (e.g. excel instead of image scan of a table)
★★★	as (2) plus non-proprietary format (e.g. CSV instead of excel)
★★★★	All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★	All the above, plus: Link your data to other people’s data to provide context

Berners-Lee, T. (2006). Linked Data. Design Issues. [revised 2009]. Retrieved from:
https://www.w3.org/DesignIssues/LinkedData.html
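
The five tiers are cumulative: each star assumes all of the stars before it. A minimal Python sketch of that rule follows. The `Dataset` class, its field names, and the `star_rating` function are hypothetical names introduced only for illustration; they are not part of Berners-Lee's scheme.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical checklist for one dataset, one flag per tier."""
    open_licence: bool         # on the web under an open licence
    machine_readable: bool     # structured data, not an image scan
    non_proprietary: bool      # e.g. CSV rather than Excel
    uses_w3c_standards: bool   # RDF and SPARQL used to identify things
    links_to_other_data: bool  # links out to other people's data


def star_rating(d: Dataset) -> int:
    """Count the tiers satisfied in order, stopping at the first gap."""
    tiers = [
        d.open_licence,
        d.machine_readable,
        d.non_proprietary,
        d.uses_w3c_standards,
        d.links_to_other_data,
    ]
    stars = 0
    for satisfied in tiers:
        if not satisfied:
            break
        stars += 1
    return stars
```

Under this sketch, an openly licensed CSV file that does not use RDF would rate three stars, even if it happened to link to other data.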

Linked Data relies on the W3C’s Resource Description Framework (RDF) standard for its essential structure, which provides the basis for the fourth tier of Berners-Lee’s five-star deployment scheme. Designed for conveying descriptions of Web resources, the RDF data model is built on simple three-part statements called triples, each consisting of a Subject, a Predicate, and an Object. Each triple asserts a relationship (the Predicate) between the Subject and the Object. RDF depends on identifiers (URIs) to identify accurately and unambiguously the subjects and objects that are resources. RDF also depends on well-defined vocabularies and ontologies that enumerate unambiguous, meaningful predicates.
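
To make the triple structure concrete, the sketch below models a triple in plain Python and writes it out in the N-Triples syntax, one of several RDF serializations. The `Triple` class and `to_ntriples` function are hypothetical helpers, and the `example.org` URIs are placeholders rather than real identifiers; the two predicates are taken from the Dublin Core Terms vocabulary. Only basic literal escaping is handled, so this is an illustration rather than a complete serializer.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the relationship
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False  # True when obj is a plain text value


def to_ntriples(t: Triple) -> str:
    """Render one triple as a line of N-Triples."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Placeholder book and author URIs, for illustration only.
book = "https://example.org/book/1"
triples = [
    Triple(book, "http://purl.org/dc/terms/title", "An Example Title", True),
    Triple(book, "http://purl.org/dc/terms/creator", "https://example.org/person/1"),
]

for t in triples:
    print(to_ntriples(t))
```

The second triple shows why URIs matter: because its Object is itself a URI, it can serve as the Subject of further triples, possibly published by someone else.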

Linked Data in Libraries

Although libraries have long collected and provided access to rich and unique resources, those resources are often inaccessible on the web. Linked Data and semantic web technologies could help libraries:

Make the library data available on the web
Provide additional web resources (information) to library users
Improve discovery services by creating knowledge cards (knowledge graphs)

The library community has been experimenting with Linked Data in two different ways: Schema.org, led by the web community and OCLC, and BibFrame, led by the Library of Congress. The two approaches share some features but differ in emphasis: BibFrame focuses on data creation and management, while Schema.org focuses on data discovery in a web environment.

Schema.org

Developed in collaboration with, and under the sponsorship of, major web search engine providers (i.e., Google, Yahoo!, and Microsoft), Schema.org has gained good traction within the broader web community. Within a year of its introduction in June 2011, one researcher reported that 7%-10% of pages indexed by major search engines contained Schema.org markup (Wallis 2012). For this reason, OCLC published all of its catalog records as linked data using Schema.org in 2012. In addition to its core vocabulary, Schema.org allows communities to create extensions, with their own subclasses and vocabularies, to meet each community’s specific needs. The library community maintains the Bibliographic Extensions, which add library-specific terms such as Audiobook, Thesis, ComicStory, and workTranslation.
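
As an illustration of what Schema.org markup can look like, the sketch below builds a small JSON-LD description of a book and wraps it in the script element commonly used to embed JSON-LD in a web page. The function names, the title and author values, and the `example.org` URI are all hypothetical; `Book`, `Person`, `name`, and `author` are terms from the Schema.org core vocabulary. This is not a reproduction of OCLC's markup.

```python
import json


def book_jsonld(title: str, author_name: str, author_uri: str) -> str:
    """Describe a book with Schema.org terms, serialized as JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {
            "@type": "Person",
            "@id": author_uri,  # a URI lets other data point at this person
            "name": author_name,
        },
    }
    return json.dumps(doc, indent=2)


def as_script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in a script element for embedding in an HTML page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'


# Placeholder values, for illustration only.
markup = book_jsonld("An Example Title", "Example Author", "https://example.org/person/1")
print(as_script_tag(markup))
```

Embedding the description in the page itself is what lets a search engine crawler read it during ordinary indexing, which fits Schema.org's focus on discovery.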

BibFrame

Initiated by the Library of Congress, BibFrame was developed with the goal of “retaining as much as possible the robust and beneficial aspects of the historic format” used in the library domain (Library of Congress 2012). Like most ontologies that comply with Linked Open Data (LOD) principles, BibFrame provides both a data model against which information retrieval systems can be designed and a vocabulary that defines the objects and relationships of interest in complex metadata descriptions such as MARC records. BibFrame also applies many of the principles that emerged from IFLA’s Functional Requirements for Bibliographic Records (FRBR) framework (IFLA 1998), which under ideal circumstances provide a set of powerful abstractions that allow various editions and versions of books and similar information objects to be linked at several levels of intellectual distinction.
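
The idea of linking editions at different levels can be sketched with plain triples. The example below assumes the later BIBFRAME 2.0 vocabulary, in which a `Work` (the abstract content) is connected to each `Instance` (a particular edition or publication) by the `instanceOf` property; the 2012 model cited above used different terms. The function names and `example.org` URIs are hypothetical.

```python
# BIBFRAME 2.0 namespace (an assumption; the 2012 draft used another one).
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_work(work_uri, instance_uris):
    """Return triples linking each edition (Instance) to its Work."""
    triples = [(work_uri, RDF_TYPE, BF + "Work")]
    for inst in instance_uris:
        triples.append((inst, RDF_TYPE, BF + "Instance"))
        triples.append((inst, BF + "instanceOf", work_uri))
    return triples


def instances_of(triples, work_uri):
    """Find every Instance that points at the given Work."""
    return [
        s for (s, p, o) in triples
        if p == BF + "instanceOf" and o == work_uri
    ]


# Placeholder URIs, for illustration only.
work = "https://example.org/work/1"
editions = ["https://example.org/instance/1", "https://example.org/instance/2"]
graph = describe_work(work, editions)
print(instances_of(graph, work))
```

Because both editions point at the same Work URI, a discovery system can gather them under one heading without comparing titles or authors as text.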

Linked Data in the UIUC Library

Research outcomes

Cole, Timothy W., Myung-Ja K. Han, William Fletcher Weathers, and Eric Joyner. 2013. Library MARC Records into Linked Open Data: Challenges and Opportunities. Journal of Library Metadata, V. 13/Issue 2-3: pp. 163-196.
- We experimented with approximately 30,000 MARC records for digitized books to see which entities could be enhanced with URLs and what additional services could be provided using linked data sources. The work focused on publishing library data to the web using Schema.org semantics.
Han, Myung-Ja K., Timothy W. Cole, Patricia Lampron, and Maria Janina Sarol. 2015. Exposing Library Holdings Metadata in RDF Using Schema.org Semantics. Proceedings of the International Conference on Dublin Core and Metadata Applications 2015: pp. 41-49.
- The focus shifted from bibliographic data to holdings and item data, which describe item-specific information. The paper examined the current holdings and item data structure and recommended ways to encode the same data in Schema.org semantics to improve its visibility in the semantic web environment.
Lampron, Patricia, Jeff Mixter, and Myung-Ja K. Han. 2016. Challenges of Mapping Digital Collections Metadata to Schema.org: Working with CONTENTdm. Proceedings of the Metadata and Semantics Research Conference 2016. Springer Communications in Computer and Information Science, Vol. 37 Issue 6/7: pp. 308-316.
- Digital special collections pose unique challenges for linked data. We examined metadata for digital special collections housed in CONTENTdm to identify the challenges of converting that metadata to linked data.

Ongoing research project

Exploring the Benefits for Users of Linked Open Data for Digitized Special Collections
- The University of Illinois at Urbana-Champaign has been awarded a new research grant by The Andrew W. Mellon Foundation to explore the benefits for users of linked open data (LOD) for digitized library special collections. This project will examine the use of linked data in three special collections housed at the University of Illinois at Urbana-Champaign: the Motley Collection of Theatre and Costume Design; the Kolb-Proust Archive for Research; and Portraits of Actors, 1720-1920.
