XML Bulk Loading Guide
D3A2 Metadata Bulk Loading Data Format Guide
(Draft 2)
Overview
- The D3A2 Resource Exchange will provide a shared pool of academic content, aligned to the Ohio Academic Content Standards, for Ohio’s K-12 teachers. To populate this repository, Ohio’s public content providers, starting with the work of OPIPSAA and later under the banner of D3A2, have created a Dublin Core-based metadata profile for describing electronic resources.
- The first phase of implementing the D3A2 Resource Exchange will involve bulk uploads of selected resources from the participating content providers. These uploads will use the XML form of the D3A2 Metadata Profile v1.0 described in this document. Content providers will need to extract selected resources from their internal content repositories and convert their native metadata to the D3A2 Metadata Profile. The content provider will then upload this data file to the Resource Exchange (a website), where it will be checked for formatting errors; if none are found, the resources will be added to the collection.
- Future phases of the D3A2 Resource Exchange project will provide more automated content entry processes, but these will likely require programming changes or extensions to the content providers’ collection management systems. The D3A2 community will work to establish a set of standards for this level of interoperability.
Levels of Detail
- The D3A2 Metadata Profile, version 1, specifies three levels of metadata. The most basic set, the “required” metadata, is shown in Example 1 below. The specification also describes a superset of “desirable” metadata, as well as a “complete” set of metadata containing all of the elements described in the Profile.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dataload xmlns:d3a2="/d3a2.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Each record is one d3a2:resource element; a bulk file can hold many. -->
  <d3a2:resource>
    <dc:identifier>CP:001</dc:identifier>
    <d3a2:provider>Content Provider Name</d3a2:provider>
    <d3a2:providerWebsite>http://Provider.com</d3a2:providerWebsite>
    <dc:title xml:lang='en'>The Title</dc:title>
    <dc:abstract xml:lang='en'>Some kind of abstract here.</dc:abstract>
    <dc:uri>http://location_of_resource.org</dc:uri>
    <d3a2:osic>Y2003.CMA.S04.GPK-03.B02.L01.I02</d3a2:osic>
    <d3a2:osic>Y2003.CSC.S02.G02-04.B01.L03.I01</d3a2:osic>
    <dc:type>Collection</dc:type>
    <d3a2:function schema='D3A2-function_group'>instructional component</d3a2:function>
    <d3a2:function schema='D3A2-instructional_component'>lesson plan</d3a2:function>
    <d3a2:format>HTML</d3a2:format>
    <d3a2:medium schema='D3A2:carrier'>electronic document</d3a2:medium>
  </d3a2:resource>
</dataload>
Example 1: Sample D3A2 resource file containing only the required elements for one record.
Note: Assessment item resources used by the D3A2 Item Analysis tool will contain an additional element, the Ohio Assessment Item URN (d3a2:ohioAssessmentItemURN). This element contains a unique, OSIC-like code that is used to look up a specific assessment item.
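For assessment items, the extra element sits alongside the required elements inside the same d3a2:resource record. The sketch below assumes the spelling d3a2:ohioAssessmentItemURN used in the element table; the identifier CP:002 is hypothetical, and because this guide does not define the exact syntax of the URN, its value is left as a bracketed placeholder.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dataload xmlns:d3a2="/d3a2.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <d3a2:resource>
    <!-- Hypothetical identifier for an assessment item record. -->
    <dc:identifier>CP:002</dc:identifier>
    <!-- The remaining required elements go here, as in Example 1. -->
    <!-- Hypothetical placeholder: replace the bracketed text with the
         item's actual OSIC-like assessment item code. -->
    <d3a2:ohioAssessmentItemURN>[Ohio Assessment Item URN]</d3a2:ohioAssessmentItemURN>
  </d3a2:resource>
</dataload>
```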
Formatting Resource Records into an XML File for Upload
Most of the effort for content providers to share resources will involve selecting and extracting resources from their internal collection management systems and formatting them as XML resource records in a text/XML file. The XML record format is fairly simple. The table below describes each element, its controlled vocabulary (if any), and any attributes that need to be specified. Note that all text must be UTF-8 encoded.
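Beyond the character encoding, the file must also follow the usual XML escaping rules: a literal ampersand in element content is written `&amp;` and a literal less-than sign is written `&lt;`, while other non-ASCII characters (such as é) can be written directly, provided the file is actually saved as UTF-8 to match its declaration. A minimal sketch, using a hypothetical record whose identifier, title and abstract are illustrative only:

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dataload xmlns:d3a2="/d3a2.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <d3a2:resource>
    <!-- Hypothetical identifier. -->
    <dc:identifier>CP:003</dc:identifier>
    <!-- A literal "&" must be escaped as &amp; in element content. -->
    <dc:title xml:lang='en'>Fractions &amp; Decimals</dc:title>
    <!-- "<" is escaped as &lt;; non-ASCII text such as "café" is stored directly as UTF-8. -->
    <dc:abstract xml:lang='en'>Students compare values &lt; 1 using prices from a café menu.</dc:abstract>
    <!-- The remaining required elements go here, as in Example 1. -->
  </d3a2:resource>
</dataload>
```

Metadata exported from a system that uses a different encoding (for example, Windows-1252) would need to be converted to UTF-8 before upload.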
| Element | Common Name | Description | List of Values/Attributes |
|---|---|---|---|
| dc:identifier | Resource Identifier | This element contains a unique identifier for the resource, in the format ContentProviderID:UniqueID.<br>ContentProviderID is a short form or initialism of the content provider's name; if several content providers share the same initialism, we will work with them to give each a unique ID. UniqueID is a string that distinguishes this resource from the provider's other resources. These values need not carry any meaning beyond being unique; they are stored as strings, and there are practically no limits on their values. Example: CP:001 (as in Example 1).<br>This element's value is critical to the proper functioning of relationships defined in the optional dc:relation element. It is also used to identify which record to update if the resource is loaded again, so a resource must keep the same identifier every time it is uploaded. | |
| d3a2:dateEntered | N/A | This element is generated by the system. It contains the date and time the resource was added to the system. | |
| d3a2:provider | Provider | This element contains a plain text string giving the name of the content provider. Be consistent: pick an “official” name and use it on every record. This name must match the name shown on the Resource Exchange. | |
| d3a2:providerWebsite | Provider Website | This element provides the URL of your organization’s website. Although it may seem redundant to include it on every record, doing so keeps the metadata complete and self-contained, regardless of the system it lives in. | |
| dc:title | Title | This element should contain the complete title of the resource. | |
| dc:abstract | Abstract | The abstract element should contain a short (ideally under 50 words), well-written, descriptive summary of the resource. Its contents are displayed to people using the system when they search or browse for resources. Along with the standards alignment information, this element is likely to have the biggest impact on whether a teacher chooses the resource. | |
| dc:uri | URI | The uri element contains the direct link to the resource on your (the content provider’s) website.<br>For phase one, all resources should be publicly available, with minimal access controls. If the resource must sit behind an authentication/authorization checkpoint, the URI provided in this element should persist through the challenge/response process, so that your system does not lose track of the target resource. | |
| d3a2:osic | OSIC Codes | The OSIC element contains a single reference to a statement in the Ohio Academic Content Standards. The content of the element is an OSIC URN (see the OSIC page). A resource can be associated with multiple OSICs by including multiple OSIC elements. Only the most specific OSICs need to be included: e.g., if you have <d3a2:osic>Y2003.CMA.S04.GPK-03.B02.L01.I02</d3a2:osic>, it is not necessary to also include <d3a2:osic>Y2003.CMA.S04.GPK-03.B02.L01</d3a2:osic> or <d3a2:osic>Y2003.CMA.S04.GPK-03.B02</d3a2:osic>. | |
| dc:type | Type | This is the standard Dublin Core type element. It describes, in a general sense, what type of resource this is. | Controlled Vocab: |
| d3a2:function | Function | The d3a2:function element describes how the resource should be used in an educational context. It is qualified by the required schema attribute: depending on the value of that attribute, the element contains a value from one of six controlled vocabularies, which are defined in Table 2 below.<br>Multiple d3a2:function elements can be provided for a single resource. For example, if a resource has a d3a2:function element whose schema attribute is “D3A2-function_group” and whose value is “instructional component”, the content provider would be expected to include a qualifying d3a2:function element as well. If the resource is a lesson plan, adding a d3a2:function element whose schema attribute is “D3A2-instructional_component” and whose value is “lesson plan” gives the end user much more information for deciding whether to use the resource. | Attribute Name: schema<br>Attribute Values: D3A2-function_group, D3A2-instructional_component, D3A2-item_source, D3A2-item_type, D3A2-content, D3A2-prof_dev<br>See Table 2 below for the element values corresponding to each attribute value. |
| d3a2:format | Format | The d3a2:format element describes the physical or electronic nature of the resource. It is qualified by two additional elements: d3a2:medium (required) and d3a2:extent (recommended). | Attribute Name:<br>Attribute Values: |
| d3a2:medium | Medium | d3a2:medium provides a qualifying value for the d3a2:format value by describing the more specific form of the resource. For example, if a resource (e.g., a movie) is described as a “video” in the d3a2:format element, the d3a2:medium element lets you describe its physical form (DVD, VHS, laser disc, etc.). | Attribute Name: schema<br>Attribute Values: |
| d3a2:extent (optional) | Extent | Extent provides a qualifying value for the d3a2:format value. It represents the size or duration of the resource. | |
| dc:tableOfContents (optional) | Table of Contents | This is a list describing the contents of the resource. | |
| dc:subject (optional) | Subject | This is the topic of the resource. It is much more specific than the main content area as identified in the OSIC code. | |
| d3a2:thinkingLevel (optional) | Thinking Level | The Thinking Level of the content. Values for Thinking Level have yet to be defined by D3A2. | |
| dc:audience (optional) | Audience | A class of entity for whom the resource is intended or useful. | |
| dc:language (optional) | Language | A language of the intellectual content of the resource. | |
| dc:creator (optional) | Creator | An entity primarily responsible for making/developing the content of the resource. | |
| dc:publisher (optional) | Publisher | An entity responsible for making the resource available. | |
| dc:source (optional) | Source | A reference to a resource from which the present resource is derived. | |
| d3a2:cost (optional) | Cost | Any costs associated with the use of the resource, e.g., purchase of content, license to use, and shipping. | |
| d3a2:approver (optional) | Approver | E-mail address of the person or organization approving the item for inclusion in the D3A2 system. | |
| dc:coverage (optional) | Coverage | The extent or scope of the content of the resource. | |
| dc:educationLevel (optional) | Education Level | Refines Audience. A general statement describing the education or training context. Alternatively, a more specific statement of the location of the audience in terms of its progression through an education or training context. | |
| dc:instructionalMethod (optional) | Instructional Method | A process, used to engender knowledge, attitudes and skills, that the resource is designed to support. | |
| d3a2:version (optional) | Version | Version of the D3A2 Metadata Standard applied. | |
| dc:rights (optional) | Rights | Information about rights held in and over the resource. | |
| dc:rightsHolder (optional) | Rights Holder | A person or organization owning or managing rights over the resource. | |
| d3a2:ohioAssessmentItemURN (optional) | Ohio Assessment Item URN | A unique identifier for assessment items. | |
| d3a2:technicalRequirement (optional) | Technical Requirement | Technical prerequisites for using this resource: a free-form, human-readable text field in which the content provider can detail any special requirements for making use of the resource. | |
| dc:contributor (optional) | Contributor | An entity responsible for making contributions to the content of the resource. | |
| d3a2:review (optional) | Review | Third-party commentary or formal review of the resource. | |
| d3a2:reviewer (optional) | Reviewer | Name of the person and/or organization or authority affiliated with the review. | |
| dc:relation (optional) | Relation | The relationship of this resource to other resources. This element provides a means of expressing formal relationships among resources that nonetheless exist as discrete resources themselves, for example, images in a document, chapters in a book, or items in a collection.<br>The following is a suggested implementation of the element and its refinements, taken from the W3C's RDF syntax specification (http://www.w3.org/TR/1998/WD-rdf-syntax-19980819/). There are newer versions of that document, but the structures they implement seem overly complex for our purposes; we should discuss this. <dc:relation dc:RelationType="isPartOf" value="[dc:identifierValue]"/> The value attribute contains the identifier of the resource with which the relationship is being expressed. Relation example: <dc:relation dc:RelationType="isPartOf" value="WNEO:Vid_Amazon_Collection"/><br>This example states that the current resource is part of WNEO's video collection on the Amazon. To make this work, you (as a content provider) should include a resource record that describes the collection as a whole, as well as records for the individual resources that make up the collection. Ideally, if you provide dc:relation elements at both the collection and resource levels, we will be able to show the user a clickable list of related resources in the result set. | Attribute Name: dc:RelationType, value<br>Attribute Values: dc:RelationType takes the type of relationship (e.g., isPartOf); value takes the dc:identifier of the related resource. |

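The collection/member pattern described for dc:relation could look like the following minimal sketch of a two-record upload file. It assumes the lowercase element name dc:relation from the element table, uses hypothetical identifiers (CP:Vid_Amazon_Collection, CP:Vid_Amazon_01) and titles, and assumes the Dublin Core relation type hasPart for the collection-level link; this guide itself only shows isPartOf. Required elements other than those shown are elided with comments.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<dataload xmlns:d3a2="/d3a2.xsd" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- Record 1 (hypothetical): describes the collection as a whole. -->
  <d3a2:resource>
    <dc:identifier>CP:Vid_Amazon_Collection</dc:identifier>
    <dc:title xml:lang='en'>Amazon Video Collection</dc:title>
    <dc:type>Collection</dc:type>
    <!-- The remaining required elements go here, as in Example 1. -->
    <!-- Assumption: hasPart as the inverse of isPartOf; this guide only shows isPartOf. -->
    <dc:relation dc:RelationType="hasPart" value="CP:Vid_Amazon_01"/>
  </d3a2:resource>
  <!-- Record 2 (hypothetical): one item that belongs to the collection. -->
  <d3a2:resource>
    <dc:identifier>CP:Vid_Amazon_01</dc:identifier>
    <dc:title xml:lang='en'>Amazon River Ecosystems</dc:title>
    <!-- The remaining required elements go here, as in Example 1. -->
    <!-- Points back to the collection record by its dc:identifier. -->
    <dc:relation dc:RelationType="isPartOf" value="CP:Vid_Amazon_Collection"/>
  </d3a2:resource>
</dataload>
```

Each value attribute holds the dc:identifier of the other record, which is why identifiers need to stay stable across uploads.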
| Schema Value | Possible Element Values |
|---|---|
| D3A2-function_group | |
| D3A2-instructional_component | |
| D3A2-item_source | |
| D3A2-item_type | |
| D3A2-content | |
| D3A2-prof_dev | |
| D3A2-format | |
| D3A2-medium | |
Table 2: Authoritative values for element schema
