Sunday, July 20, 2008

Summary reaction to: Promoting shareability: Metadata activities of the DLF Aquifer initiative

Citation:

Riley, Jenn, John Chapman, Sarah Shreeves, Laura Akerman, and Bill Landis. "Promoting shareability: Metadata activities of the DLF Aquifer initiative." Draft article under review.

Over the past year I have taken several opportunities to learn about metadata and to improve my understanding of the issues regarding new metadata schemas, particularly those expressed in XML. Interoperability is often said to be the key reason why non-MARC metadata is important. Despite the potential for shareability that XML offers, few metadata schemas are functionally interoperable. I was very surprised to learn that most schemas cannot be easily converted to other formats or shared. Much of the blame lies in unestablished best practices and a lack of coordination between units that wish to share data. Initiatives like DLF Aquifer are needed to realize the full potential of shareable metadata.

The first step to shareable metadata is agreeing on a set of guidelines and following the rules during metadata creation. The inclusion of a “not recommended” classification will be especially helpful. I have observed that some catalogers will supply absolutely as much information as possible, even if the added details provide little or no benefit for the user. Frankly stating that providing certain data is “not recommended” will help catalogers provide consistent, appropriate data. Additionally, the guidelines provide useful information about content vs. carrier and help to disambiguate elements which have historically been problematic in MODS.

Creating a set of guidelines is the first step, but exposing metadata records in a live portal environment is the best way to test principles and best practices. The ASHO collection should provide valuable information and aid in determining best practices for sharing metadata, especially MODS. I agree with the article that exchanging data in simple Dublin Core won’t meet the needs of our user populations, and I would like to add that the library profession requires better solutions for providing access and sharing records through harvesting. I am interested to see how harvesting MODS using XQuery technology will enhance our ability to share robust metadata records.

Summary reaction to: Reengineering a national resource discovery service: MODS down under

Citation:

Missingham, Roxanne. (2004). Reengineering a national resource discovery service: MODS down under. D-Lib Magazine, 10(9)

This article demonstrates the flexibility and utility of MODS as an intermediary conversion schema. The National Library of Australia is using the full benefits of XML and new technology to convert records and to contribute and share information in order to provide access to library materials. The author describes a creative way that MODS was used to convert AGLS, a DC-like homegrown metadata schema, to the final product of a MARC record that could be integrated into a national database. MODS served as the middle ground between the two schemas. The conversion worked well because the AGLS records could be mapped to MODS, then enhanced, and finally converted from MODS to MARC.
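To make the three-stage idea concrete for myself, here is a minimal sketch of a source-to-MODS-to-MARC conversion in Python. Everything in it is an assumption for illustration: the crosswalk tables, the slash-separated MODS-style paths, the choice of MARC tags (245, 720, 653, 520), and the sample values are mine and are not taken from the National Library of Australia's actual conversion profiles.

```python
# Illustrative crosswalk tables. The element names, MODS-style paths, and MARC
# tags are assumptions for this sketch, not the National Library of Australia's
# actual conversion profile.
AGLS_TO_MODS = {
    "title": "titleInfo/title",
    "creator": "name/namePart",
    "subject": "subject/topic",
    "description": "abstract",
}

MODS_TO_MARC = {
    "titleInfo/title": ("245", "a"),
    "name/namePart": ("720", "a"),   # uncontrolled name
    "subject/topic": ("653", "a"),   # uncontrolled index term
    "abstract": ("520", "a"),
}


def agls_to_mods(agls_record):
    """Map DC-like source elements onto MODS-style paths, dropping unmapped ones."""
    mods = {}
    for element, values in agls_record.items():
        path = AGLS_TO_MODS.get(element)
        if path is not None:
            mods.setdefault(path, []).extend(values)
    return mods


def enhance(mods, additions):
    """Return a copy of the intermediate record with extra values merged in."""
    enhanced = {path: list(values) for path, values in mods.items()}
    for path, values in additions.items():
        enhanced.setdefault(path, []).extend(values)
    return enhanced


def mods_to_marc(mods):
    """Flatten the intermediate record into sorted (tag, subfield, value) triples."""
    fields = []
    for path, values in mods.items():
        target = MODS_TO_MARC.get(path)
        if target is None:
            continue  # no MARC mapping defined in this sketch
        tag, code = target
        fields.extend((tag, code, value) for value in values)
    return sorted(fields)


# Placeholder source record; the values are invented for the example.
source = {
    "title": ["Sample agency report"],
    "creator": ["Sample Agency"],
    "audience": ["general public"],  # unmapped here, so it is dropped
}
marc_fields = mods_to_marc(
    enhance(agls_to_mods(source), {"subject/topic": ["Sample topic"]})
)
for field in marc_fields:
    print(field)
```

The point of the middle step is that enhancement happens once, on the intermediate record, so a new source schema only needs its own table into the intermediate form rather than a separate path to MARC.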

The human effort to set up conversion profiles seems to be very labor intensive. The author mentions that several batches of test records were needed for various formats. Despite the initial outlay of time and money, a sustainable workflow that can automate much of the record conversion is a worthwhile endeavor, especially if access to materials is enhanced for the national library and its patrons.

Sunday, July 13, 2008

Describe your hypothetical collection

The Hope library has received a large donation of bicycling-related materials from the personal collection of Dr. Shimano, a retired faculty member of the sports science division. Not only has she donated materials including videos, books, manuals, and assorted cycling-related memorabilia, she is also donating her time as a subject specialist and 10,000 dollars for cataloging and curation of the collection. Dr. Shimano has always enjoyed the excellent services provided by the library during her career and would like to give back to the library as she enters retirement. Additionally, she looks forward to working with the library 20 hours a week in order to keep her mind sharp. She is very enthusiastic, and the library feels she will be able to contribute immensely to the project. We consider this project a priority as a way to show our gratitude for her involvement. The collection consists of the following:
Books: 2,000
Videos: 500 (Mostly amateur video of cycling races & training sessions)
Manuals: 350
Other Memorabilia: 300

The bulk of the collection is monographs and will be described using MARC and AACR2. The remainder of the collection will be described with MODS. We have chosen MODS because of the high number of items which are not well suited for traditional cataloging. We intend to create a user interface for Dr. Shimano to aid in the cataloging of the collection. We feel that MODS is best suited for someone with subject expertise and little cataloging experience. A trained cataloger will review her work and supply any additional information.

Dr. Shimano's collection consists of nearly 2,000 books, of which around 95% have acceptable copy in the OCLC database. Since the donor of the collection does not wish or require the library to maintain a separate physical collection for the monographs, all books will be integrated into the main collection. Bookplates will be placed in the materials and donor notes will be added to the MARC records. The other five percent will need original cataloging, and the library intends to perform NACO work as needed on the name headings.

Most of the collection is in excellent condition and very few items will require preservation treatment. We felt that the inclusion of some items, such as the owner's manuals for racing bicycles, should be reconsidered. Dr. Shimano noted that although the library may not consider the owner's manuals important, she finds them essential for research. She contends that the manuals provide geometry specifications for bicycle frames, which offer rich data for studying the ergonomic effects of bicycle frame design on competitive performance. I intend to create a sample MODS record for a representative manual to see how well the schema can be applied to the object.
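As a first pass at what such a sample record might look like, here is a rough sketch that builds a very small MODS record with Python's standard xml.etree.ElementTree module. The element names (titleInfo, title, typeOfResource, subject, topic, note) and the MODS version 3 namespace come from the schema; the helper names, the title, the uncontrolled topic strings, and the note text are placeholders I made up, and a real record would carry much more (names, dates, physical description, controlled headings).

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def q(tag):
    """Qualify a MODS element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, tag)


def build_manual_record(title, topics, note):
    """Build a minimal MODS record for a printed manual (hypothetical helper)."""
    mods = ET.Element(q("mods"))
    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = title
    ET.SubElement(mods, q("typeOfResource")).text = "text"
    for topic in topics:
        # One subject wrapper per topic keeps each access point separate.
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q("topic")).text = topic
    ET.SubElement(mods, q("note")).text = note
    return mods


# Placeholder values; none of these describe a real item in the collection.
manual = build_manual_record(
    title="Owner's manual for a racing bicycle (placeholder title)",
    topics=["Bicycle frames", "Bicycle racing"],
    note="Includes frame geometry specifications.",
)
manual_xml = ET.tostring(manual, encoding="unicode")
print(manual_xml)
```

The note element is where I would expect the frame geometry detail to live at first; whether it deserves a more structured home is exactly what the sample record should help decide.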


Summary reaction to: Sound Footings: Building a National Digital Library of Australian Music.

Citation:
Ayres, Marie-Louise, Toby Burrows, and Robyn Holmes. (2004). "Sound Footings: Building a National Digital Library of Australian Music." Research and Advanced Technology for Digital Libraries. Lecture Notes in Computer Science 3232. Berlin/Heidelberg: Springer, 281-291.

Developing a delivery system for discovery of music resources is a very complex task due to the varied formats of materials in typical music collections. Digital music collections have the potential to encompass a variety of audio formats, digital score formats, and text formats. Additionally these objects require a rich metadata schema in order to provide access to the collections. This article discussed some of the key points for consideration during the development and implementation of a digital library. I've listed below some of the key elements of the design plan that I considered vital to the success of the project.
  • MusicAustralia was designed to provide access to Australian music resources. This project was modeled after a similar successful project called PictureAustralia. The developers of MusicAustralia utilized the work of the previous national level project to improve upon and provide access to a more complicated corpus of resources.
  • The project sought cost-effective methods. It allowed contribution and distribution of MARC records through an established national database and of non-MARC metadata through harvesters.
  • The project incorporated existing records and automated processes.
MusicAustralia is in the position to be the framework for any similar project within the music subject area. The work accomplished on this project will benefit other initiatives like the PASH project mentioned in the article. The greatest benefit of this type of work will be seen when tested models can be implemented across subject areas.

Prior to the implementation of this project two separate institutions held similar music materials, but with different cataloging practices and access. A web-portal that brings together resources held by different institutions will be of great use.

Many institutions have collaborated to form our current cataloging practices and resource discovery. Although there are obvious limitations and recurring problems with the current system, there is a consistency in the way libraries present their resources. A MARC cataloging department does not need to research which metadata schema to use or how to display the information online. Digital projects must consider the type of metadata, the type of online delivery, and the entire workflow. Developing a workflow for each project must consume a great deal of resources. Endeavors like MusicAustralia are needed to develop standards for the field.

Summary reaction to: Finding a Catalog: Generating Analytical Catalog Records from Well Structured Digital Texts.

Citation:

David Mimno, Alison Jones, and Gregory Crane. (2005) "Finding a Catalog: Generating Analytical Catalog Records from Well Structured Digital Texts." Proceedings of the 2005 Joint Conference on Digital Libraries, Denver, CO, June 7-11, 2005.

Providing analytical description is a luxury that few libraries can afford. Using well structured XML full text documents to provide analytics would solve the cost issues and provide greater access. Even with the prospect of providing so much more access at little or no extra cost, I'm not sure that this is the best method for every item in a collection.

The collection described in this article totaled 55 million words and was cataloged with 60,000 records; on average there is one record for every 900 words. Providing this level of access to a particular collection can be very helpful, but I question the balance if digitized collections were described in this way and then integrated into the larger catalog. I wonder if there will be a negative impact on searching for items with few access points. This level of access retrospectively added to catalogs would increase the number of records exponentially; will such growth negatively affect our current OPACs?

I think there is great potential in XML to help the library community generate cataloging records in a more efficient manner. I am struggling with the idea that extracting data from structured XML files and creating metadata records is cataloging. The article describes a process in which someone provides TEI tags for OCR text and then a fairly standard server processes the job. This process is not cataloging, and neither is the automated generation of subject headings. The article stated that 40,000 records had between 0 and 5 subject headings; I wonder how many had 0. Also, 8,000 records had over 30 subject headings. I am curious to know whether this process resulted in a system that would provide better results than simple keyword access.

Summary reaction to: Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA.

Citation:
Guenther, Rebecca and Leslie Myrick. (2006) "Archiving Web Sites for Preservation and Access: MODS, METS and MINERVA." Journal of Archival Organization 4, No. 1/2: 145 - 170.

Creating subject specific web archives could prove useful for future researchers. I doubt that the library and archive community has the means to archive everything of scholarly interest on the web, but we should try to select major topics for which archives should be established. The LC decision to archive 2000 and 2002 election websites will provide a unique insight into how the web has changed politics. Archiving websites for selected topics will raise all of the concerns regarding selection, including balanced coverage, bias, and censorship. Web archiving will be another form of collection development.

Deriving most of the metadata from the existing website will help keep web archiving projects viable, and the selective intervention of catalogers will provide enhanced access and add value to the collections. The use of METS to document the structure of the website will serve as a key component in the preservation of the web resources. Although archiving these materials may serve researchers well into the future, I wonder how we will archive all of the digital material we currently purchase. I'm glad to see work addressing the issue of archiving freely available materials, but I worry about continued access to current purchased electronic materials. Some of the same methods could be applied to subscription materials if libraries and vendors could work out archival agreements.

Sunday, July 6, 2008

Appropriateness of MODS for 3 collections

In your journal, write an analysis of each of the following collections, describing the strengths and weaknesses of MODS for describing items in that collection.

Charles W. Cushman Photograph Collection, Indiana University

This collection consists of digitized slides, notes from the slides, and a notebook maintained by the photographer. The collection is presented as a partnership between the digital library and the archives at Indiana University. The archival community typically focuses on collection level access while the digital library community focuses on item level description. Each image is digitized and presented as a single item within a collection; therefore, EAD would not be the best choice for describing the items in this collection.

Due to the nature of the item level description a standard like DC or MODS would be more appropriate. Each of the records contains a small number of elements and the collection is browsable by a limited number of indexes. MODS provides a great deal of specificity which is more than is needed for the data provided.

MODS is well suited for providing access to corporate names and personal names, but the rest of the search index listed on the site would be covered adequately by Dublin Core, or a slightly extended version of Dublin Core.

Overall MODS would not be used to the fullest advantage in this collection.


After the Day of Infamy: 'Man on the Street' Interviews Following the Attack on Pearl Harbor, Library of Congress

This collection consists of audio files and transcripts of interviews after the attack on Pearl Harbor. The browse features rely on fields that are accommodated well in MODS and DC. Once a record is viewed it becomes clear that more information is needed than can be provided in simple Dublin Core. This project requires the ability to use role terms for the interviewer, interviewee, and the collector. The site appears not to provide searching on these elements, but they are important to the description of the materials in the collection.

MODS will provide the granular access to role terms and permit greater search capabilities. However, this collection is probably browsed more often than searched. Providing several access points with a record as detailed as a MODS record may become too time consuming. There is a trade-off to be made between the number of records that need to be produced and the level of detail. The uses of the collection must be considered, and basic functional requirements must be determined and met through an appropriate metadata description practice.
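The role-term point is easier to see with a small example. The sketch below, again using Python's standard xml.etree.ElementTree, attaches a role and roleTerm to each MODS name and then filters names by role, which is the kind of search the site does not currently appear to offer. The name, namePart, role, and roleTerm elements are part of MODS; the helper functions and the placeholder names are my own inventions, and I use plain text role terms rather than coded values to keep the sketch simple.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"


def q(tag):
    """Qualify a MODS element name with the MODS namespace."""
    return "{%s}%s" % (MODS_NS, tag)


def add_name(mods, name, role):
    """Append a personal name with a text role term (hypothetical helper)."""
    name_el = ET.SubElement(mods, q("name"), {"type": "personal"})
    ET.SubElement(name_el, q("namePart")).text = name
    role_el = ET.SubElement(name_el, q("role"))
    ET.SubElement(role_el, q("roleTerm"), {"type": "text"}).text = role
    return name_el


def names_with_role(mods, role):
    """Return the namePart text of every name carrying the given role term."""
    matches = []
    for name_el in mods.findall(q("name")):
        terms = [term.text for term in name_el.iter(q("roleTerm"))]
        if role in terms:
            matches.append(name_el.find(q("namePart")).text)
    return matches


# Placeholder people; these are not names from the actual collection.
interview = ET.Element(q("mods"))
add_name(interview, "Placeholder, Interviewer", "interviewer")
add_name(interview, "Placeholder, Interviewee", "interviewee")
add_name(interview, "Placeholder, Collector", "collector")

print(names_with_role(interview, "interviewee"))
```

In simple Dublin Core all three people would end up as undifferentiated creator or contributor values, so this filter would have nothing to work with.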


Jerome Hill Papers, Minnesota Historical Society

This site is an elaborate finding aid for the papers of Jerome Hill. The website does not appear to be searchable through a site search engine. The user is guided to browse features based on the chronology of his life and his role as a composer. While browsing through the various sections, I found a link listing records related to the specific topic. Brief item level descriptions are included as well as the occasional link to a digitized image.

MODS would be too robust to apply to all of the items in this collection. EAD was likely used to describe the various elements of this collection. EAD provides the archivist with the framework to describe the collection down to the lowest level of hierarchy. MODS cannot handle the description of various levels of hierarchy in one record. There are conventions to identify related items, but this is insufficient for use in this application.


Standards to use in MODS records

Content Standard:

AACR2 will be the main content standard used in MARC and MODS cataloging. There will be exceptions made and noted in our best practices for formats that are insufficiently covered in AACR2. Cataloging of images and art objects will be a notable exception to the adherence to AACR2. In the case of images, for example, titles will be supplied without brackets for untitled images. Most of the deviations from our chosen content standard will be related to the use of punctuation and other stylistic conventions. ISBD punctuation will not be applied at the end of fields; we will rely on style sheets to provide most of the needed punctuation.

Our library cross-trains, but we would like to focus most of our MODS record creation in the traditional MARC cataloging units. Because of this choice we feel that using AACR2 as our content standard will reduce our training cost and lead to more consistent records. This consistency in records will allow our MODS and MARC records to integrate seamlessly into our system and workflows. Additionally, since we serve an academic community, all of our new records will be expected to provide the same level of access as traditional MARC cataloging.

Controlled Vocabulary:

Since there is a great deal of cross training in the Hope library, a number of options will be available for choice of controlled vocabulary terms. Our subject expertise situated across the units will be used to provide the most useful access points. Controlled vocabularies traditionally used in MARC cataloging will be our primary sources for controlled vocabulary. Specific situations will require the use of additional standards.

Names:
The Library of Congress Name Authority File (NAF) will be the primary source of controlled headings for all names. If a controlled version of the name is not found in the NAF, other sources should be consulted. In the case of artists' names, the Getty's Union List of Artist Names (ULAN) would be used.

Topical Terms:

Most topical term access will be provided using Library of Congress Subject Headings (LCSH). The Art and Architecture Thesaurus (AAT) may be used to provide access to images and art works. The Thesaurus for Graphic Materials I: Subject Terms (TGM I) will be used for visual resources.


Geographical Places:

Geographic terms will be taken from LCSH.

Genre headings:

The primary source will be the MARC Value List for Genre Terms maintained by the Network Development and MARC Standards Office, Library of Congress. The AAT may be used for art related descriptions. The Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms (TGM II) will be used for visual resources.
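The fallback rule for names stated above (NAF first, then another source such as ULAN) can be sketched as a small lookup routine. This is only an illustration of the decision order: the in-memory dictionaries stand in for real authority file searches, the headings are placeholders, and the short codes "naf" and "ulan" are the values I would expect to record in a MODS authority attribute, which should be confirmed against the Library of Congress source code lists before use.

```python
# Preference order for name headings: NAF first, then ULAN.
# The codes are assumed authority attribute values; verify before relying on them.
NAME_SOURCES = ["naf", "ulan"]


def choose_name_heading(name, lookups, sources=NAME_SOURCES):
    """Return the first controlled heading found, trying sources in order.

    `lookups` maps a source code to a dict of {name as found: controlled heading}.
    It is a stand-in for searching the real authority files.
    """
    for source in sources:
        heading = lookups.get(source, {}).get(name)
        if heading is not None:
            return {"heading": heading, "authority": source}
    # Nothing found: keep the name as transcribed and leave authority empty.
    return {"heading": name, "authority": None}


# Placeholder authority data, invented for the example.
lookups = {
    "naf": {"A. Author": "Author, A. (Placeholder)"},
    "ulan": {"B. Artist": "Artist, B. (Placeholder)"},
}

print(choose_name_heading("A. Author", lookups))
print(choose_name_heading("B. Artist", lookups))
print(choose_name_heading("C. Unknown", lookups))
```

Recording which source supplied the heading matters as much as the heading itself, since that is what lets another institution interpret the value after harvesting.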

Summary reaction to: The IN Harmony Project: Developing a Flexible Metadata Model for the Description and Discovery of Sheet Music

Citation: Riley, Jenn and Michelle Dalmau (2007). "The IN Harmony Project: Developing a Flexible Metadata Model for the Description and Discovery of Sheet Music." The Electronic Library 25(2): 132-147.

The primary focus for all metadata creation should be providing access to an end user. Sometimes this goal is obscured, especially in traditional MARC cataloging. The IN Harmony project consulted subject and domain specialists, drew on user studies, and evaluated the current research in the music field to inform the creation of a set of standards for implementing MODS. Seeking the opinions of people with varying degrees of music expertise demonstrated that this project was attentive to all users, not only a specific set of highly trained music scholars. This work has produced a rich and useful set of guidelines to describe individual titles that can be applied in other sheet music collections.

Although the primary goal of metadata should be to provide access, great care should be taken to create data that can be exchanged between institutions. The guidelines specified only two required elements and only limited a few repeatable elements. A simple, conservative approach to cataloging will enable numerous partners with various cataloging histories to create a standard uniform record. The IN Harmony project developed a user interface to guide the entry of information into a database.

Although a user interface is very desirable for data consistency, there is concern among MARC catalogers that this type of system will de-professionalize cataloging. Their concern is valid and must be addressed as the library profession moves toward incorporating other metadata processes.

This article demonstrates the importance of consulting various stakeholders while developing a metadata implementation for a digital library project. Most importantly, this article highlights the value of user input, which, when one considers the purpose of cataloging, should be a primary informant.

Summary reaction to: MODS meets Manakin: Innovations in the Texas Digital Library's Thesis and Dissertation Collection.

Citation:

Surratt, Brian E. (2006). "MODS meets Manakin: Innovations in the Texas Digital Library's Thesis and Dissertation Collection." 9th International Symposium on Electronic Theses and Dissertations, Quebec City, June 7-10, 2006.

This article addresses the shortcomings of MARC and DC for the description of electronic theses and dissertations (ETDs). Both standards are currently used in most ETD applications with varying degrees of success, but most notably neither is used consistently enough to provide ideal access.

The application profile for the Texas Digital Library (TDL) is outlined and the drawbacks are named. For instance, publisher and rights elements were present in ETD-MS, but not defined in MODS. The concept of publisher is not well accepted for electronic dissertations in libraries. The author also discusses other complications in the use of MODS for the ETD community. The inclusion of the comparison table for ETD-MS and TDL MODS is very effective in demonstrating the application profile for the TDL.

Summary reaction to: MODS: Metadata Object Description Schema

Citation: Gartner, Richard. (October 2003). "MODS: Metadata Object Description Schema." JISC Report TSW 03-06.

The main issue in metadata that needs to be resolved is the balance between rich, precise description and the need for interoperability. As more detailed description is accommodated, the ability to map and exchange data becomes compromised. After attending a recent pre-conference meeting at ALA, I have learned that although interoperability is a primary focus of emerging metadata trends, the realization of the idea is far from present in our current information environment.

MODS is presented by Gartner as an attempt to "reconcile the conflicting demands of breadth and specificity." MODS can be applied to a wide range of materials, accommodating breadth, and allows elements to be qualified, satisfying the need for specificity. The author provides instructive information about certain elements in MODS and their use. This discourse highlights the advantages of MODS and the ability to control certain elements. The ability to utilize commonly used tools in MARC cataloging makes MODS an attractive option to traditional catalogers.

I believe that it will be difficult for the larger library community to see the benefits of MODS until there are several institutions using the standard. The article mentions the need for a critical mass, and I agree that will be the turning point for the acceptance of MODS.

Summary reaction to: MODS: The Metadata Object Description Schema

Citation: Guenther, Rebecca. (January 2003). "MODS: The Metadata Object Description Schema." portal: Libraries and the Academy 3(1): 137-150. IU Libraries online subscription.

The commentary from the editor describes problems the library community has found with Dublin Core. The lack of the granularity desired in the library community is a major concern, but simplicity is a founding principle of DC. The developers wanted a metadata framework that would be easy to apply and would not require the skill of an information professional. Even though DC is simple to use, I do agree with the statement that, due to the lack of standards and documentation, its application in a controlled library setting could become very expensive and labor intensive during the implementation phase. Utilizing a well documented metadata standard does save the library time and money in terms of documentation and implementation.

Guenther describes the uses and advantages of MODS and places this metadata schema as more detailed and descriptive than Dublin Core, but cheaper and easier to apply than MARC. She refers to MODS as a simpler MARC, which I think simplifies the matter too much. Although MODS was derived from MARC, I consider it more than simple MARC because of the reorganization of elements and the addition of new elements absent from the MARC formats. The syntax is similar and MARC catalogers will find it familiar, but it’s still a different schema with unique advantages.

The element comparison chart was very useful in understanding how MODS and Dublin Core elements compare. The detailed discussion allows for comparisons between how bibliographic information is typically represented in MARC, DC, and MODS. Each discussion leads to the conclusion that mapping from MARC to MODS retains much more detail than mapping to DC. The ability to qualify elements in MODS provides a robust array of options for recording bibliographic information.

I do question the need for an XML schema for MARC holdings information. I have noticed while working with online journals that many providers include summary screens with complete information about the availability of a journal. As more and more information is online, I wonder if maintaining local holdings information in the traditional way will become obsolete.

Overall I think the library community will benefit from having a robust but easily applied XML-based metadata framework.