Linked Data “[uses] the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.” Linked Data sits at the top of a five-star deployment scheme proposed by Berners-Lee, which describes how accessible data is, as shown below.
★ | Available on the web (whatever format) but with an open licence, to be Open Data
★★ | Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ | As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ | All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ | All the above, plus: link your data to other people’s data to provide context
Source: Berners-Lee, T. (2006). Linked Data. Design Issues. [revised 2009]. Retrieved from: https://www.w3.org/DesignIssues/LinkedData.html
Linked Data relies on the W3C’s Resource Description Framework (RDF) standard for its essential structure, and RDF provides the basis for the fourth tier of Berners-Lee’s five-star deployment scheme. Designed for conveying descriptions of Web resources, the RDF data model is built from simple three-part statements called triples, each consisting of a Subject, a Predicate, and an Object. Each triple asserts a relationship (the Predicate) between the Subject and the Object. RDF depends on identifiers (URIs) to identify, accurately and unambiguously, the subjects and objects that are resources. It also depends on well-defined vocabularies and ontologies that enumerate unambiguous, meaningful predicates.
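To make the triple structure concrete, the following is a minimal sketch in plain Python, not a full RDF implementation. The `Triple` class, the `example.org` URIs, and the book title are hypothetical and chosen only for illustration; the two predicates are taken from the Dublin Core Terms vocabulary. The output follows the general shape of the N-Triples serialization, with only basic escaping of literals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One RDF-style statement: subject, predicate, object."""
    subject: str             # URI of the resource being described
    predicate: str           # URI of a property taken from a vocabulary
    obj: str                 # URI of another resource, or a literal value
    obj_is_uri: bool = True  # False when the object is a plain literal

    def to_ntriples(self) -> str:
        """Render the triple as a single N-Triples-style line."""
        if self.obj_is_uri:
            obj = f"<{self.obj}>"
        else:
            # Simplified escaping: backslashes and double quotes only.
            escaped = self.obj.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        return f"<{self.subject}> <{self.predicate}> {obj} ."


# Hypothetical identifiers, for illustration only.
BOOK = "http://example.org/book/1"
AUTHOR = "http://example.org/person/1"
DCT = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace

triples = [
    Triple(BOOK, DCT + "title", "An Example Book", obj_is_uri=False),
    Triple(BOOK, DCT + "creator", AUTHOR),
]

for t in triples:
    print(t.to_ntriples())
```

The second triple has a URI as its object, which is what lets one description link to another. In practice a dedicated RDF library would handle parsing, escaping, and serialization.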
While libraries have long collected and provided access to rich and unique resources, those resources are often inaccessible on the web. Linked Data and Semantic Web technologies could help address this.
The library community has been experimenting with Linked Data in two different ways: Schema.org, led by the web community and OCLC, and BibFrame, led by the Library of Congress. The two approaches share some ground but differ in emphasis: BibFrame focuses on data creation and management, while Schema.org focuses on data discovery in a web environment.
Schema.org
Developed in collaboration with and under the sponsorship of major web search engine providers (i.e., Google, Yahoo!, and the Microsoft Corporation), the semantics of Schema.org have gained good traction within the broader web community. Within a year of its introduction in June 2011, one researcher reported that 7%-10% of pages being indexed by major search engines contained Schema.org markup (Wallis 2012). For this reason, OCLC published all of its catalog records as linked data using Schema.org in 2012. Beyond its core vocabulary, Schema.org lets communities create extensions, with additional subclasses and vocabularies, to meet their own specific needs. The library community has the Bibliographic Extensions to support library-specific vocabularies, including Audiobook, Thesis, ComicStory, and workTranslation.
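As an illustration of what Schema.org markup can look like, the sketch below builds a JSON-LD description of a book and wraps it in the `<script>` element commonly used to embed such data in a web page. It is not OCLC's actual output. The helper `book_jsonld`, the title, the author name, and the placeholder ISBN are all hypothetical; `Book`, `Person`, `name`, `author`, `datePublished`, and `isbn` are terms from the Schema.org core vocabulary.

```python
import json


def book_jsonld(title: str, author_name: str, year: int, isbn: str) -> dict:
    """Build a Schema.org description of a book as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": str(year),
        "isbn": isbn,
    }


# Hypothetical record, for illustration only.
record = book_jsonld("An Example Book", "Jane Doe", 2012, "0000000000")

# Wrap the JSON-LD in the script element used to embed it in an HTML page.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(record, indent=2)
    + "\n</script>"
)
print(markup)
```

A search engine crawler that understands Schema.org can read this block and recognize the page as describing a book, which is the discovery role described above.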
BibFrame
Initiated by the Library of Congress, BibFrame was developed with the aim of “retaining as much as possible the robust and beneficial aspects of the historic format” used in the library domain (Library of Congress 2012). Like most ontologies compliant with Linked Open Data (LOD), BibFrame provides both a data model against which information retrieval systems can be designed and a vocabulary that defines the objects and relationships of interest in complex metadata descriptions such as MARC records. BibFrame also applies many of the principles that emerged from IFLA’s Functional Requirements for Bibliographic Records (FRBR) framework (IFLA 1998), which under ideal circumstances provide a set of powerful abstractions that allow various editions and versions of books and similar information objects to be linked at several levels of intellectual distinction.
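The idea of linking editions and versions at different levels can be sketched with a few triples. The example below assumes the BIBFRAME 2.0 vocabulary, whose `Work` and `Instance` classes and `instanceOf` property are not named in the text above. The `example.org` URIs and the helper `instances_by_work` are hypothetical. It is a simplified illustration in plain Python, not a description of how any particular library system stores its data.

```python
from collections import defaultdict

# Namespaces: BIBFRAME 2.0 ontology (assumed here) and the RDF type property.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical identifiers: one abstract work and two published forms of it.
WORK = "http://example.org/work/1"
PRINT_EDITION = "http://example.org/instance/print-1"
EBOOK_EDITION = "http://example.org/instance/ebook-1"

# Each tuple is a (subject, predicate, object) triple.
triples = [
    (WORK, RDF_TYPE, BF + "Work"),
    (PRINT_EDITION, RDF_TYPE, BF + "Instance"),
    (PRINT_EDITION, BF + "instanceOf", WORK),
    (EBOOK_EDITION, RDF_TYPE, BF + "Instance"),
    (EBOOK_EDITION, BF + "instanceOf", WORK),
]


def instances_by_work(statements):
    """Group instance URIs under the work they are declared to embody."""
    grouped = defaultdict(list)
    for subject, predicate, obj in statements:
        if predicate == BF + "instanceOf":
            grouped[obj].append(subject)
    return dict(grouped)


for work, instances in instances_by_work(triples).items():
    print(work)
    for instance in instances:
        print("  ", instance)
```

Because both editions point to the same work URI, a system can present them together as versions of one intellectual creation, which is the kind of linking the FRBR-derived abstractions are meant to support.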
Linked Data Research in the Library Community
MJ Han (2017-02-21)