Sunday, July 20, 2008

Summary reaction to: Promoting shareability: Metadata activities of the DLF Aquifer initiative

Citation:

Riley, Jenn, John Chapman, Sarah Shreeves, Laura Akerman, and Bill Landis. "Promoting shareability: Metadata activities of the DLF Aquifer initiative." Draft article under review.

Over the past year I have taken several opportunities to learn about metadata and to improve my understanding of the issues surrounding new metadata schemas, particularly those expressed in XML. Interoperability is often cited as the key reason why non-MARC metadata is important. Despite the potential for shareability that XML offers, few metadata schemas are functionally interoperable. I was very surprised to learn that most schemas cannot be easily converted to other formats or shared. Much of the blame lies in the absence of established best practices and the lack of coordination between units that wish to share data. Initiatives like DLF Aquifer are needed to realize the full potential of shareable metadata.

The first step to shareable metadata is agreeing on a set of guidelines and following the rules during metadata creation. The inclusion of a “not recommended” classification will be especially helpful. I have observed that some catalogers will supply as much information as possible, even if the added details provide little or no benefit for the user. Frankly stating that providing certain data is “not recommended” will help catalogers provide consistent, appropriate data. Additionally, the guidelines provide useful information about content vs. carrier and help to disambiguate elements which have historically been problematic in MODS.

Creating a set of guidelines is the first step, but exposing metadata records in a live portal environment is the best way to test principles and best practices. The ASHO collection should provide valuable information and aid in determining best practices for sharing metadata, especially MODS. I agree with the article that exchanging data in simple Dublin Core won’t meet the needs of our user populations, and I would add that the library profession requires better solutions for providing access and sharing records through harvesting. I am interested to see how harvesting MODS using XQuery technology will enhance our ability to share robust metadata records.

Summary reaction to: Reengineering a national resource discovery service: MODS down under

Citation:

Missingham, Roxanne. (2004). Reengineering a national resource discovery service: MODS down under. D-Lib Magazine, 10(9)

This article demonstrates the flexibility and utility of MODS as an intermediary conversion schema. The National Library of Australia is taking full advantage of XML and new technology to convert records and to contribute and share information that provides access to library materials. The author describes a creative use of MODS to convert AGLS, a DC-like homegrown metadata schema, into a final MARC record that could be integrated into a national database. MODS served as the middle ground between the two schemas. The conversion worked well because the AGLS records could be mapped to MODS, then enhanced, and finally converted from MODS to MARC.
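To make the three-step idea concrete for myself, here is a minimal Python sketch of a source-to-MODS-to-MARC pipeline. It is only an illustration: the element names, the mapping tables, and the "enhancement" step (simple whitespace cleanup) are my own simplified assumptions, not the National Library of Australia's actual conversion profiles.

```python
# Illustrative two-stage crosswalk: DC-like source -> MODS-like hub -> MARC-like target.
# Every mapping below is a simplified assumption made for this sketch.

# Stage 1: source element -> MODS element path (hypothetical, simplified)
SOURCE_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
    "identifier": "location/url",
}

# Stage 2: MODS element path -> (MARC tag, subfield code), also simplified
MODS_TO_MARC = {
    "titleInfo/title": ("245", "a"),
    "name/namePart": ("720", "a"),
    "subject/topic": ("653", "a"),
    "abstract": ("520", "a"),
    "location/url": ("856", "u"),
}


def to_mods(source_record):
    """Map source elements onto MODS paths, dropping unmapped elements."""
    mods = {}
    for element, values in source_record.items():
        path = SOURCE_TO_MODS.get(element)
        if path is not None:
            mods.setdefault(path, []).extend(values)
    return mods


def enhance(mods_record):
    """Stand-in for the enhancement step: trim whitespace, drop empty values."""
    enhanced = {}
    for path, values in mods_record.items():
        cleaned = [value.strip() for value in values if value.strip()]
        if cleaned:
            enhanced[path] = cleaned
    return enhanced


def to_marc(mods_record):
    """Flatten the MODS-like record into sorted (tag, subfield, value) triples."""
    fields = []
    for path, values in mods_record.items():
        target = MODS_TO_MARC.get(path)
        if target is not None:
            tag, code = target
            fields.extend((tag, code, value) for value in values)
    return sorted(fields)


def convert(source_record):
    """Run the full pipeline: source -> MODS -> enhanced MODS -> MARC."""
    return to_marc(enhance(to_mods(source_record)))


# Hypothetical source record; "rights" has no mapping and is dropped.
sample = {
    "title": ["  Bicycle safety guide "],
    "creator": ["Department of Transport"],
    "rights": ["Crown copyright"],
}
print(convert(sample))
```

The appeal of the hub design is that each new source schema only needs one mapping into MODS, and the MODS-to-MARC stage is reused unchanged.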

Setting up the conversion profiles appears to be very labor intensive. The author mentions that several batches of test records were needed for various formats. Despite the initial outlay of time and money, a sustainable workflow that can automate much of the record conversion is a worthwhile endeavor, especially if it enhances access to materials for the national library and its patrons.

Sunday, July 13, 2008

Describe your hypothetical collection

The Hope library has received a large donation of bicycling-related materials from the personal collection of Dr. Shimano, a retired faculty member of the sports science division. Not only has she donated materials including videos, books, manuals, and assorted cycling-related memorabilia, she is also donating her time as a subject specialist and 10,000 dollars for cataloging & curation of the collection. Dr. Shimano has always enjoyed the excellent services provided by the library during her career and would like to give back to the library as she enters retirement. Additionally, she looks forward to working with the library 20 hours a week in order to keep her mind sharp. She is very enthusiastic, and the library feels she will be able to contribute immensely to the project. We consider this project a priority as a way to show our gratitude for her involvement.

The collection consists of the following:
Books: 2,000
Videos: 500 (Mostly amateur video of cycling races & training sessions)
Manuals: 350
Other Memorabilia: 300

The bulk of the collection is monographs and will be described using MARC and AACR2. The remainder of the collection will be described with MODS. We have chosen MODS because of the high number of items that are not well suited to traditional cataloging. We intend to create a user interface for Dr. Shimano to aid in the cataloging of the collection. We feel that MODS is best suited to someone with subject expertise and little cataloging experience. A trained cataloger will review her work and supply any additional information.

Dr. Shimano's collection consists of nearly 2,000 books, of which around 95% have acceptable copy in the OCLC database. Since the donor of the collection does not wish or require the library to maintain a separate physical collection for the monographs, all books will be integrated into the main collection. Bookplates will be placed in the materials and donor notes will be added to the MARC records. The other five percent will need original cataloging, and the library intends to perform NACO work as needed on the name headings.

Most of the collection is in excellent condition and very few items will require preservation treatment. We felt that the inclusion of some items should be reconsidered, such as the owner's manuals for racing bicycles. Dr. Shimano noted that although the library may not consider the owner's manuals important, she finds them essential for research. She contends that the manuals provide geometry specifications for bicycle frames, which offer rich data for studying the ergonomic effects of bicycle frame design on competitive performance. I intend to create a sample MODS record for a representative manual to see how well the schema can be applied to the object.
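As a first pass at what such a record might contain, here is a minimal Python sketch that builds a MODS record for one manual. The title, genre term, subject term, and notes are hypothetical values I made up for illustration, and the "ownership" note type is my assumption about how the donor note could be carried; only the MODS namespace and the element names come from the schema itself.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def el(parent, name, text=None, **attrs):
    """Append a namespaced MODS child element with optional text and attributes."""
    child = ET.SubElement(parent, f"{{{MODS_NS}}}{name}", attrs)
    if text is not None:
        child.text = text
    return child


def build_manual_record():
    """Build a sample record; all values are hypothetical placeholders."""
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = el(mods, "titleInfo")
    el(title_info, "title", "Owner's manual: example racing bicycle")
    el(mods, "typeOfResource", "text")
    el(mods, "genre", "owner's manual")  # uncontrolled term, assumption
    subject = el(mods, "subject")
    el(subject, "topic", "Bicycle frame geometry")  # uncontrolled term, assumption
    el(mods, "note", "Includes frame geometry specifications.")
    el(mods, "note", "Gift of Dr. Shimano.", type="ownership")
    return mods


record = build_manual_record()
print(ET.tostring(record, encoding="unicode"))
```

Even this small record shows where decisions will be needed: whether genre and subject terms should come from a controlled vocabulary, and whether the geometry data deserves more structure than a free-text note.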


Summary reaction to: Sound Footings: Building a National Digital Library of Australian Music.

Citation:
Ayres, Marie-Louise, Toby Burrows, and Robyn Holmes. (2004). "Sound Footings: Building a National Digital Library of Australian Music." Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science 3232. Berlin/Heidelberg: Springer, 281-291.

Developing a delivery system for discovery of music resources is a very complex task due to the varied formats of materials in typical music collections. Digital music collections have the potential to encompass a variety of audio formats, digital score formats, and text formats. Additionally these objects require a rich metadata schema in order to provide access to the collections. This article discussed some of the key points for consideration during the development and implementation of a digital library. I've listed below some of the key elements of the design plan that I considered vital to the success of the project.
  • MusicAustralia was designed to provide access to Australian music resources. This project was modeled after a similar successful project called PictureAustralia. The developers of MusicAustralia utilized the work of the previous national level project to improve upon and provide access to a more complicated corpus of resources.
  • The project sought cost-effective methods: it allowed contribution and distribution of MARC records through an established national database, and of non-MARC metadata through harvesters.
  • It incorporated existing records and automated processes.
MusicAustralia is in the position to be the framework for any similar project within the music subject area. The work accomplished on this project will benefit other initiatives like the PASH project mentioned in the article. The greatest benefit of this type of work will be seen when tested models can be implemented across subject areas.

Prior to the implementation of this project two separate institutions held similar music materials, but with different cataloging practices and access. A web-portal that brings together resources held by different institutions will be of great use.

Many institutions have collaborated to form our current cataloging practices and resource discovery. Although there are obvious limitations and recurring problems with the current system, there is a consistency in the way libraries present their resources. A MARC cataloging department does not need to research which metadata schema to use or how to display the information online. Digital projects, by contrast, must decide on the type of metadata, the type of online delivery, and the entire workflow. Developing a workflow for each project must consume a great deal of resources. Endeavors like MusicAustralia are needed to develop standards for the field.

Summary reaction to: Finding a Catalog: Generating Analytical Catalog Records from Well Structured Digital Texts.

Citation:

David Mimno, Alison Jones, and Gregory Crane. (2005) "Finding a Catalog: Generating Analytical Catalog Records from Well Structured Digital Texts." Proceedings of the 2005 Joint Conference on Digital Libraries, Denver, CO, June 7-11, 2005.

Providing analytical description is usually a luxury that few libraries can afford. Using well-structured XML full-text documents to provide analytics would solve the cost issues and provide greater access. Even with the prospect of providing so much more access at little or no extra cost, I'm not sure that this is the best method for every item in a collection.

The collection described in this article totaled 55 million words and was cataloged with 60,000 records; on average there is one record for roughly every 900 words. Providing this level of access to a particular collection can be very helpful, but I question the balance if digitized collections were described in this way and then integrated into the larger catalog. I wonder if there will be a negative impact on searching for items with few access points. This level of access, added retrospectively to catalogs, would increase the number of records dramatically; will such growth negatively affect our current OPACs?

I think there is great potential in XML to help the library community generate cataloging records more efficiently. I am struggling, however, with the idea that extracting data from structured XML files and creating metadata records is cataloging. The article describes a process in which someone provides TEI tags for OCR text and then a fairly standard server processes the job. This process is not cataloging, and neither is the automated generation of subject headings. The article stated that 40,000 records had between 0 and 5 subject headings; I wonder how many had 0. Also, 8,000 records had over 30 subject headings. I am curious to know whether this process resulted in a system that would provide better results than simple keyword access.
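For my own reference, the figures quoted above work out as follows. This is just back-of-the-envelope arithmetic in Python on the numbers I took from the article; the middle band (6 to 30 headings) is simply my inference from what is left over.

```python
# Figures as quoted from the article
total_words = 55_000_000
total_records = 60_000
records_0_to_5_headings = 40_000
records_over_30_headings = 8_000

# Average amount of text behind each analytical record
words_per_record = total_words / total_records

# Share of records in each subject-heading band
share_few = records_0_to_5_headings / total_records
share_many = records_over_30_headings / total_records
# Inferred remainder: records with 6 to 30 headings
records_in_between = total_records - records_0_to_5_headings - records_over_30_headings
share_between = records_in_between / total_records

print(f"{words_per_record:.0f} words per record on average")
print(f"{share_few:.0%} of records have 0-5 subject headings")
print(f"{share_between:.0%} of records have 6-30 subject headings")
print(f"{share_many:.0%} of records have more than 30 subject headings")
```

So two thirds of the records sit in the sparsest band, which is why the question of how many have no headings at all matters.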

Summary reaction to: Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA.

Citation:
Guenther, Rebecca and Leslie Myrick. (2006) "Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA." Journal of Archival Organization 4, No. 1/2: 145 - 170.

Creating subject specific web archives could prove useful for future researchers. I doubt that the library and archive community has the means to archive everything of scholarly interest on the web, but we should try to select major topics for which archives should be established. The LC decision to archive 2000 and 2002 election websites will provide a unique insight into how the web has changed politics. Archiving websites for selected topics will raise all of the concerns regarding selection, including balanced coverage, bias, and censorship. Web archiving will be another form of collection development.

Deriving most of the metadata from the existing website will help keep web archiving projects viable, and the selective intervention of catalogers will provide enhanced access and add value to the collections. The use of METS to document the structure of the website will serve as a key component in the preservation of the web resources. Although archiving these materials may serve researchers well into the future, I wonder how we will archive all of the digital material we currently purchase. I'm glad to see work addressing the issue of archiving freely available materials, but I worry about continued access to current purchased electronic materials. Some of the same methods could be applied to subscription materials if libraries and vendors could work out archival agreements.

Sunday, July 6, 2008

Appropriateness of MODS for 3 collections

In your journal, write an analysis of each of the following collections, describing the strengths and weaknesses of MODS for describing items in that collection.

Charles W. Cushman Photograph Collection, Indiana University

This collection consists of digitized slides and notes from the slides and a notebook maintained by the photographer. The collection is presented as a partnership between the digital library and the archives at Indiana University. The archival community typically focuses on collection-level access, while the digital library community focuses on item-level description. Each image is digitized and presented as a single item within a collection; therefore, EAD would not be the best choice for describing the items in this collection.

Due to the nature of the item-level description, a standard like DC or MODS would be more appropriate. Each of the records contains a small number of elements and the collection is browsable by a limited number of indexes. MODS provides a great deal of specificity, more than is needed for the data provided.

MODS is well suited for providing access to corporate names and personal names, but the rest of the search index listed on the site would be covered adequately by Dublin Core, or a slightly extended version of Dublin Core.

Overall MODS would not be used to the fullest advantage in this collection.


After the Day of Infamy: 'Man on the Street' Interviews Following the Attack on Pearl Harbor, Library of Congress

This collection consists of audio files and transcripts of interviews after the attack on Pearl Harbor. The browse features rely on fields that are accommodated well in both MODS and DC. Once a record is viewed, it becomes clear that more information is needed than can be provided in simple Dublin Core. This project requires the ability to use role terms for the interviewer, interviewee, and the collector. The site appears not to provide searching on these elements, but they are important to the description of the materials in the collection.

MODS will provide granular access to role terms and permit greater search capabilities. However, this collection is probably browsed more often than searched. Providing several access points with a record as detailed as a MODS record may become too time consuming. There is a trade-off to be made between the number of records that need to be produced and the level of detail. The uses of the collection must be considered, and basic functional requirements must be determined and met through an appropriate metadata description practice.
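The role terms discussed above could be recorded along these lines. This is a minimal Python sketch under my own assumptions: the names are hypothetical placeholders, and I am assuming the MARC relator list as the authority (codes ivr, ive, and col for interviewer, interviewee, and collector). It is not taken from the Library of Congress records themselves.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)
NS = {"m": MODS_NS}

# MARC relator codes for the three roles this collection needs
RELATOR_CODES = {"Interviewer": "ivr", "Interviewee": "ive", "Collector": "col"}


def q(name):
    """Return the namespace-qualified tag for a MODS element name."""
    return f"{{{MODS_NS}}}{name}"


def add_name(mods, name, role):
    """Add a personal name with both a text and a coded role term."""
    name_el = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name_el, q("namePart")).text = name
    role_el = ET.SubElement(name_el, q("role"))
    for term_type, value in (("text", role), ("code", RELATOR_CODES[role])):
        term = ET.SubElement(
            role_el, q("roleTerm"), {"type": term_type, "authority": "marcrelator"}
        )
        term.text = value
    return name_el


def names_with_role(mods, role):
    """Return the names whose role terms include the given text or code."""
    found = []
    for name_el in mods.findall("m:name", NS):
        terms = [t.text for t in name_el.findall("m:role/m:roleTerm", NS)]
        if role in terms:
            found.append(name_el.find("m:namePart", NS).text)
    return found


# Hypothetical people, for illustration only
record = ET.Element(q("mods"))
add_name(record, "Person A (hypothetical)", "Interviewer")
add_name(record, "Person B (hypothetical)", "Interviewee")
add_name(record, "Person C (hypothetical)", "Collector")

print(names_with_role(record, "Interviewee"))
```

Simple Dublin Core would have to fold all three people into creator or contributor, so a search limited to interviewees would not be possible without this kind of role markup.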


Jerome Hill Papers, Minnesota Historical Society

This site is an elaborate finding aid for the papers of Jerome Hill. The website does not appear to be searchable through a site search engine. The user is guided to browse features based on chronology of his life and his role as a composer. While browsing through the various sections a link listing records related to the specific topic was found. Brief item level descriptions are included as well as the occasional link to a digitized image.

MODS would be too robust to apply to all of the items in this collection. EAD was likely used to describe the various elements of this collection. EAD provides the archivist with the framework to describe the collection down to the lowest level of hierarchy. MODS cannot handle the description of various levels of hierarchy in one record. There are conventions to identify related items, but this is insufficient for use in this application.