It is recommended that all projects dealing with digital content attempt to follow the guidelines established in the most recent version of the PREMIS (PREservation Metadata Implementation Strategies) Data Dictionary for Preservation Metadata.
The PREMIS guidelines describe the ideal preservation metadata in great detail. These best practices recommend only that all projects attempt to meet the few minimum requirements detailed below.
The PREMIS Data Model consists of five primary entities: Intellectual Entities, Objects, Events, Agents, and Rights.
The PREMIS Data Dictionary details the preservation metadata recommended for capture for the last four entities (Objects, Events, Agents, and Rights). The Intellectual Entity is not covered by PREMIS, as it would be described using the best practices for descriptive metadata.
It's also worth noting that PREMIS deals with three types of Objects: Files, Representations, and Bitstreams.
The PREMIS Data Model is described more completely in the Introduction of the PREMIS Data Dictionary for Preservation Metadata.
The PREMIS list of recommended preservation metadata is extensive. The following is a list of the minimal metadata which should be captured for each entity (Object, Event, Agent, Rights).
Please note that although these best practices recommend the minimal preservation metadata that should be gathered, PREMIS does not specify a metadata schema for implementation. We recommend storing this metadata in an appropriate metadata schema, chosen according to the packaging format in use. For example, if METS is used for packaging, there is an existing PREMIS metadata schema for use as administrative metadata within METS: http://www.loc.gov/standards/premis/schemas.html
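As a rough illustration of what storing this metadata in an XML schema might look like, the sketch below serializes a single object identifier as namespaced XML using Python's standard library. The namespace URI is an assumption (it is the one commonly associated with PREMIS version 3) and should be confirmed against the schema page linked above. The function name `object_identifier_xml` is hypothetical, and the fragment is not validated against the PREMIS or METS schemas.

```python
import xml.etree.ElementTree as ET

# Assumption: PREMIS version 3 namespace URI. Confirm against the schema page.
PREMIS_NS = "http://www.loc.gov/premis/v3"
ET.register_namespace("premis", PREMIS_NS)


def object_identifier_xml(id_type: str, id_value: str) -> str:
    """Serialize one objectIdentifier container as a namespaced XML string.

    Hypothetical helper for illustration only; the result is a fragment,
    not a complete or schema-valid PREMIS document.
    """
    container = ET.Element(f"{{{PREMIS_NS}}}objectIdentifier")
    type_el = ET.SubElement(container, f"{{{PREMIS_NS}}}objectIdentifierType")
    type_el.text = id_type
    value_el = ET.SubElement(container, f"{{{PREMIS_NS}}}objectIdentifierValue")
    value_el.text = id_value
    return ET.tostring(container, encoding="unicode")


# Example values taken from the Objects table in this document.
xml_fragment = object_identifier_xml("hdl", "2142/8796")
print(xml_fragment)
```

In a METS package, a fragment like this would typically be wrapped in the administrative metadata section; how it is wrapped depends on the local METS profile.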
Objects
Minimally, the preservation metadata listed in the table below should be captured about an Object.
Within the PREMIS data dictionary, this information is expressed as follows. Please note that object type abbreviations refer to: File (F), Representation (R), and Bitstream (B).
Semantic Unit/Component | Object Type | Note | Examples |
---|---|---|---|
objectIdentifierType | R, F, B | The type of identifier used to locate the object within the preservation system in which it is stored. | hdl (Handle) |
objectIdentifierValue | R, F, B | The value of the object's identifier | 2142/8796 |
objectCategory | R, F, B | The type of object being described. Controlled vocabulary: representation, file, or bitstream | file |
preservationLevelValue | R, F | Level of preservation support attempted for this object. (We need to establish our own controlled vocabulary for these values.) | Categories? 1, 2, 3, or 4? full or bit-level? |
preservationLevelDateAssigned | R, F | The date this preservation level was assigned | 2008-03-29 |
fixity | F, B | The information necessary to perform occasional fixity checks | |
messageDigestAlgorithm | F, B | Algorithm used to generate the message digest | MD5 |
messageDigest | F, B | Value of the message digest | (a checksum value) |
size | F, B | The size (in bytes) of the file | 1024 |
format | F, B | | |
formatDesignation | F, B | | |
formatName | F, B | The MIME type of the file format | application/pdf |
originalName | R, F | The original filename | 123456.pdf |
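The sketch below shows one way the minimal Object metadata for a single file might be gathered with Python's standard library. It is an illustration under stated assumptions, not a prescribed implementation: the function name `describe_file` and the flat dictionary layout are hypothetical, the preservation level value `bit-level` is a placeholder (no controlled vocabulary has been established yet), and the MIME type is guessed from the filename extension rather than by inspecting the file's contents.

```python
import hashlib
import mimetypes
import os
import tempfile
from datetime import date


def describe_file(path, identifier_type, identifier_value, preservation_level):
    """Collect the minimal Object metadata for one file as a flat dict.

    Hypothetical helper; keys mirror the semantic unit names in the table.
    """
    # Read in chunks so large files do not have to fit in memory.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)

    # Assumption: the extension is a good enough hint for the MIME type.
    mime_type, _ = mimetypes.guess_type(path)

    return {
        "objectIdentifierType": identifier_type,
        "objectIdentifierValue": identifier_value,
        "objectCategory": "file",
        "preservationLevelValue": preservation_level,
        "preservationLevelDateAssigned": date.today().isoformat(),
        "messageDigestAlgorithm": "MD5",
        "messageDigest": digest.hexdigest(),
        "size": os.path.getsize(path),
        "formatName": mime_type or "application/octet-stream",
        "originalName": os.path.basename(path),
    }


# Usage: describe a throwaway file named like the example in the table.
sample_bytes = b"%PDF-1.4 sample bytes"
with tempfile.TemporaryDirectory() as workdir:
    sample_path = os.path.join(workdir, "123456.pdf")
    with open(sample_path, "wb") as handle:
        handle.write(sample_bytes)
    record = describe_file(sample_path, "hdl", "2142/8796", "bit-level")

for key, value in record.items():
    print(f"{key}: {value}")
```

Recomputing the digest later and comparing it with the stored `messageDigest` is the basis of the occasional fixity checks mentioned in the table.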
Events
Although it is often difficult to track and record every event on an object, it is recommended that we attempt to record the following types of events (whenever possible):
Minimally, the preservation metadata listed in the table below should be captured about an Event on an Object.
Within the PREMIS data dictionary, this information is expressed as follows:
Semantic Unit/Component | Note | Examples |
---|---|---|
eventIdentifierType | A controlled vocabulary representing the institution or company that performed the event. This would usually be something like "UIUC Library". | UIUC Library |
eventIdentifierValue | An identifier which can be used to reference this event. This should likely be based on the date/time the event occurred, to ensure its uniqueness. | scan-2008-03-23 |
eventType | The type of event described. We need to establish our own controlled vocabulary of event types. PREMIS documents some suggested terms. | ingestion |
eventDateTime | The date/time when the event occurred. ISO 8601 format is recommended. | 2006-07-16T19:20:30 |
eventDetail | Detailed, human-readable notes on the event that occurred | (Description of the event: who, what, why, what software was used, etc.) |
linkingAgentIdentifier | Provides information about which agent performed the event | |
linkingAgentIdentifierType | References the agentIdentifierType of the Agent(s) performing the Event (see the Agents section below) | UIUC Library |
linkingAgentIdentifierValue | References the agentIdentifierValue of the Agent(s) performing the Event (see the Agents section below) | |
linkingObjectIdentifier | Provides information about which object(s) were affected by the event | |
linkingObjectIdentifierType | References the objectIdentifierType of the Object(s) affected by the Event (see the Objects section above) | hdl |
linkingObjectIdentifierValue | References the objectIdentifierValue of the Object(s) affected by the Event (see the Objects section above) | 2142/8796 |
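As a minimal sketch of the Event record described above, the following Python function builds one event as a dictionary, with an ISO 8601 timestamp and an identifier derived from the event type and date/time. The function name `build_event`, the dictionary layout, and the identifier pattern are assumptions made for illustration. The keys for the linking units use the full spelling found in the PREMIS Data Dictionary (for example `linkingAgentIdentifierType`).

```python
from datetime import datetime, timezone


def build_event(event_type, detail, agent_id, object_id,
                namespace="UIUC Library", when=None):
    """Build a minimal Event record as a dict.

    agent_id and object_id are (type, value) pairs that must match the
    identifiers recorded for the Agent and the Object. Hypothetical helper.
    """
    # Default to the current time in UTC; drop microseconds for readability.
    when = when or datetime.now(timezone.utc)
    timestamp = when.replace(microsecond=0).isoformat()

    return {
        "eventIdentifierType": namespace,
        # Assumption: type plus timestamp is unique enough within one namespace.
        "eventIdentifierValue": f"{event_type}-{timestamp}",
        "eventType": event_type,
        "eventDateTime": timestamp,
        "eventDetail": detail,
        "linkingAgentIdentifierType": agent_id[0],
        "linkingAgentIdentifierValue": agent_id[1],
        "linkingObjectIdentifierType": object_id[0],
        "linkingObjectIdentifierValue": object_id[1],
    }


# Usage with the example values from the tables in this document.
event = build_event(
    "ingestion",
    "File ingested into the repository.",
    agent_id=("UIUC NetID", "tdonohue"),
    object_id=("hdl", "2142/8796"),
    when=datetime(2006, 7, 16, 19, 20, 30),
)
print(event["eventIdentifierValue"])
print(event["eventDateTime"])
```

If two events of the same type can occur within the same second, a counter or random suffix would be needed to keep identifiers unique.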
Agents
Only Agents which perform actual Events on Objects need to be tracked. Agents may be organizations, software programs, systems or individual people.
Minimally, the preservation metadata listed in the table below should be captured about an Agent which performs an Event.
Within the PREMIS data dictionary, this information is expressed as follows:
Semantic Unit/Component | Note | Examples |
---|---|---|
agentIdentifierType | A controlled vocabulary representing the type of an agent identifier. For a person, this may be represented as "UIUC NetID". | UIUC Library |
agentIdentifierValue | An identifier which can be used to reference this agent. | tdonohue |
agentType | The type of agent described. We need to establish our own controlled vocabulary of agent types. PREMIS documents some suggested terms. | person |
agentName | A human-readable name for the agent | Tim Donohue |
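Because Events point at Agents through an identifier type and value pair, it can be useful to check that every Event resolves to a recorded Agent. The sketch below is one possible check, assuming Agents and Events are held as plain dictionaries keyed by semantic unit name. The function name `unresolved_agent_links`, the sample records, and the `UIUC NetID` identifier type are illustrative assumptions.

```python
def unresolved_agent_links(events, agents):
    """Return the identifiers of events whose linking agent is not recorded.

    Hypothetical helper: an event resolves when its linking type and value
    pair matches the agentIdentifierType and agentIdentifierValue of an agent.
    """
    known = {
        (agent["agentIdentifierType"], agent["agentIdentifierValue"])
        for agent in agents
    }
    return [
        event["eventIdentifierValue"]
        for event in events
        if (event["linkingAgentIdentifierType"],
            event["linkingAgentIdentifierValue"]) not in known
    ]


# Sample records for illustration only.
agents = [
    {
        "agentIdentifierType": "UIUC NetID",
        "agentIdentifierValue": "tdonohue",
        "agentType": "person",
        "agentName": "Tim Donohue",
    },
]
events = [
    {
        "eventIdentifierValue": "scan-2008-03-23",
        "linkingAgentIdentifierType": "UIUC NetID",
        "linkingAgentIdentifierValue": "tdonohue",
    },
    {
        "eventIdentifierValue": "ingestion-2008-03-24",
        "linkingAgentIdentifierType": "UIUC NetID",
        "linkingAgentIdentifierValue": "not-recorded",
    },
]

missing = unresolved_agent_links(events, agents)
print(missing)
```

Matching on the type and value together, rather than the value alone, avoids false matches when two identifier schemes happen to share a value.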
Rights
For the purpose of tracking the simple provenance of digital files, Rights Statements are unnecessary. In PREMIS, Rights Statements tend to document the permissions a repository holds on the objects within it.
No preservation metadata is minimally required for Rights Statements. However, if known copyright information about individual objects is available or easily captured, it is recommended to record it in the following PREMIS data dictionary units.
copyrightInformation
Again, copyright information is not necessary to record, unless it is already known.