Metadata Quality Assurance in the DLESE Community Collection

Slides: 13
Provided by: bobm180
Transcript and Presenter's Notes

1
Metadata Quality Assurance in the DLESE Community
Collection
2
DLESE Community Collection
  • Initial DLESE collection, continues to grow
  • Approx 4200 items
  • Public cataloging tool but majority of items from
    known sources and funded catalogers

3
Data estimated as of June 2003, little change
over last year
4
Quality Assurance Measures
Four stages
  • Catalog system provides feedback on duplicate and
    similar entries
  • Every record is reviewed by a person for metadata
    completeness and quality
  • Additional technical checks for vocabulary and
    required metadata completeness
  • Regular, periodic checks for URL viability,
    syntax and duplication/mirrors

5
DLESE Catalog System
  • Disallows exact duplicate URLs
  • Provides list of similar URLs in all stages of
    submission for decision to catalog or not
  • Discourages overlapping records
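
The duplicate and similar-URL feedback above can be sketched as follows. This is a minimal illustration, not the DLESE catalog system's code: the function names, the normalization rules (case-folding the host, dropping a trailing slash) and the "same host" notion of similarity are all assumptions.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Assumed normalization: lower-case scheme/host, drop trailing slash and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def check_submission(url, cataloged):
    """Return ("duplicate", [match]) or ("ok", [similar URLs for the cataloger to review])."""
    target = normalize(url)
    by_normalized = {normalize(u): u for u in cataloged}
    if target in by_normalized:
        # Exact duplicates are refused outright.
        return "duplicate", [by_normalized[target]]
    host = urlsplit(target).netloc
    # "Similar" is assumed here to mean "already cataloged on the same host".
    similar = sorted(u for n, u in by_normalized.items() if urlsplit(n).netloc == host)
    return "ok", similar
```

A submission that comes back "ok" with a non-empty list leaves the decision to catalog or not with the person, as the slide describes.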

6
Human-mediated checks - 1
  • URL functional
  • Appropriate URL is cataloged (granularity and
    duplication)
  • Written description aligns with content at site
  • Complete sentences, spelling
  • Avoid repeating redundant information
    (technical info, creator)

7
Human-mediated checks - 2
  • Required metadata is present: review resource and
    add or amend to follow best practices
  • Controlled vocabularies properly assigned:
    resource type, technical
  • Suggested metadata reviewed for accuracy, if
    present:
      • Keywords
      • Relation
      • Coverage
      • Standards

8
Pre-accessioning technical checks
  • URL viability checked
  • Check for missing required metadata and proper
    vocabularies.
  • Coverage errors are flagged, though some require
    a move to a special directory for editing and
    subsequent accessioning (crossing the date line)
  • Upon accessioning, additional check for duplicate
    ID numbers and duplicate resource content
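
A rough sketch of the required-metadata, vocabulary and duplicate-ID checks listed above. The field names and vocabulary terms are hypothetical placeholders, not the actual ADN framework definitions, and records are assumed to be plain dictionaries.

```python
from collections import Counter

# Hypothetical field names and vocabulary terms, for illustration only.
REQUIRED_FIELDS = ("title", "url", "description", "resource_type")
VOCABULARIES = {"resource_type": {"lesson plan", "dataset", "map"}}


def is_blank(value):
    """Treat None, empty lists and whitespace-only strings as missing."""
    if isinstance(value, str):
        return not value.strip()
    return not value


def precheck(record):
    """Return a list of problems found in one metadata record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if is_blank(record.get(field)):
            problems.append(f"missing required field: {field}")
    for field, allowed in VOCABULARIES.items():
        values = record.get(field) or []
        if isinstance(values, str):
            values = [values]
        for term in values:
            if term not in allowed:
                problems.append(f"unknown {field} term: {term}")
    return problems


def duplicate_ids(records):
    """Return the IDs that appear on more than one record, sorted."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)
```

An empty list from `precheck` would mean the record passes these two checks; the coverage and duplicate-content checks are not modeled here.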

9
Post-accessioning, ongoing checks
  • Linkchecking 2x a day, reports issued twice a
    week or on demand
  • Provides report on resource and relation URLs,
    indicating error type
  • Vitality over time (too low is <50% available
    over 6 previous days)
  • Duplication of URL or content (catches mirrors),
    and alerts when mirror URL differs from primary URL
  • Email syntax
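
The vitality rule above can be illustrated with a small calculation. This is a sketch under stated assumptions: with two link checks a day, six days gives twelve results per URL, and availability is assumed to be the fraction of those checks that succeeded. The function names are hypothetical.

```python
def vitality(results):
    """Fraction of link checks in which the URL was reachable.

    `results` is a non-empty sequence of booleans, one per check.
    """
    if not results:
        raise ValueError("no link-check results to evaluate")
    return sum(1 for ok in results if ok) / len(results)


def vitality_too_low(results, threshold=0.5):
    """True when availability over the window is strictly below the threshold."""
    return vitality(results) < threshold
```

For example, a URL that answered 5 of its last 12 checks would be flagged, while one that answered exactly 6 of 12 would not, since the rule is "less than 50%".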

10
Actions taken
  • Email syntax and permanent redirects fixed
  • Duplications investigated
  • "Vitality too low" group receives further
    investigation to repair

11
Vitality too low: broken link
  • First try to sleuth out new URL and fix it
  • If unsuccessful, send email to creator/contact
    inquiring about status
  • If creator replies, fix as indicated
  • If no reply, remove from discovery but don't
    delete
  • <1% of DCC collection is 'broken' at any given
    time
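
The repair steps above amount to a small decision procedure, sketched below. The names are hypothetical, and the slides do not say how long to wait for a reply, so that judgment is left to the caller through the `reply_deadline_passed` flag.

```python
from enum import Enum


class Action(Enum):
    UPDATE_URL = "replace the broken URL with the one found"
    EMAIL_CREATOR = "email creator/contact about the resource's status"
    FIX_AS_INDICATED = "apply the fix the creator described"
    WAIT_FOR_REPLY = "keep waiting for the creator to answer"
    HIDE_FROM_DISCOVERY = "remove from discovery but keep the record"


def next_action(new_url_found, email_sent=False, creator_replied=False,
                reply_deadline_passed=False):
    """Pick the next step for a record in the 'vitality too low' group."""
    if new_url_found:
        return Action.UPDATE_URL
    if not email_sent:
        return Action.EMAIL_CREATOR
    if creator_replied:
        return Action.FIX_AS_INDICATED
    if reply_deadline_passed:
        # The record is hidden, never deleted.
        return Action.HIDE_FROM_DISCOVERY
    return Action.WAIT_FOR_REPLY
```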

12
Ongoing development
  • New DCS will support:
      • multiple frameworks (ADN, collection, anno)
      • more front-end quality controls: spell check,
        completeness notification during cataloging
  • Suggest-a-URL to replace full public cataloging
  • Ongoing cataloging training and discussion with
    regular catalogers