Metadata Acquisition with XML

Slides: 17
Provided by: stephanh1
Learn more at: https://www.erpanet.org

1
Metadata Acquisition with XML
  • Case studies from the
  • Swiss Federal Archives
  • 9 October 2002 / Stephan Heuscher

2
Overview
  • Problems acquiring metadata
  • Why XML?
  • Featured Projects
  • Lessons learned
  • Conclusions

3
Problems acquiring metadata
  • Documentation
  • Data format
  • Data consistency
  • System borders
  • Money
  • Communication with stakeholders

4
Why XML?
  • XML
    • is an open standard
    • is self-explanatory
    • is human-readable
    • can be validated automatically
    • has broad software support
  • Most products feature XML support
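
As a rough illustration of the "can be validated automatically" point, the sketch below checks only well-formedness, which is the first automatic check any XML parser performs. Validation against a DTD or XML Schema needs a validating parser and is not shown here. The helper name `is_well_formed` is an assumption made for this example; the sample element is the Person element from the AMDA slides.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Sample taken from the transformed AMDA import format.
good = '<Person id="9005" vorname="Peter" nachname="Hess" kanton="ZG" ort="Zug" />'
# Same element with the closing tag missing (hypothetical broken input).
bad = '<Person id="9005" vorname="Peter">'

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check like this can run unattended on every delivered file, so structural errors surface before any manual review.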

5
Featured Projects
  • SIARD
    • Archiving of relational databases
    • Manual generation of additional metadata
    • Metadata and content are stored in XML files
  • AMDA
    • Manages metadata for audio data from the Swiss
      Parliament
    • Does not manage audio data
    • Import of XML metadata
    • Must provide a variety of export formats

6
SIARD (System Independent Archiving of Relational Databases)
[Diagram: source databases (Oracle, MS-SQL, ???-DB); data and low-level metadata extraction; additional high-level descriptive metadata; Digital Archive (to be built); database regeneration]

7
XML use in SIARD
  • SQL-99 (ISO/IEC 9075)
    • Low-level data description
      • Structure
      • Datatypes
      • Constraints
  • XML
    • High-level metadata
    • Table content (thin wrapper)

8
Data Logic (SQL)
CREATE TABLE "FLUGLE"."CLASS" (
  "CLASS_ID" NATIONAL CHARACTER VARYING(20) NOT NULL,
  "SCHEDULE_ID" NATIONAL CHARACTER VARYING(20),
  "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25),
  "CLASS_ROOM" NATIONAL CHARACTER VARYING(25),
  "COURSE_ID" NATIONAL CHARACTER VARYING(5),
  "DEPARTMENT_ID" NATIONAL CHARACTER VARYING(20),
  "INSTRUCTOR_ID" NATIONAL CHARACTER VARYING(20),
  "SEMESTER" NATIONAL CHARACTER VARYING(6),
  "SCHOOL_YEAR" TIMESTAMP(0)
);
CREATE TABLE "FLUGLE"."CLASS_LOCATION" (
  "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25) NOT NULL,
  "CLASS_ROOM" NATIONAL CHARACTER VARYING(25) NOT NULL ...

9
SIARD Metadata XML
<?xml version="1.0" encoding="UTF-8"?>
<archive>
  <database product-name="Oracle"
            product-version="Personal Oracle9i Release 9.0.1.1.1 - Production. With the Partitioning option. JServer Release 9.0.1.1.1 - Production"
            table-number="22" view-number="4" archiv-size="175KB">
    <schemas>
      <schema tag-name="FLUGLE" table-number="22" view-number="4">
        <status sql3="true" integrity="true" archiv="true" reason="0"
                mandatory="true"/>
        <tables>
          <table tag-name="BACKUP_CLASS" column-number="9" row-number="10">
            <status sql3="true" integrity="false" archiv="true" reason="3"
                    mandatory="true"/>
            <columns>
              <column tag-name="CLASS_ID"
                      sql3type="NATIONAL CHARACTER VARYING"
                      sql3size="(20)" type="VARCHAR2" length="20"
                      precision="" scale="" nullable="false"
                      defaultvalue="">
                <status sql3="true" integrity="true" archiv="true"
                        reason="0" mandatory="true"/>
              </column>
              ...

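
The column metadata above carries enough information to rebuild a table definition, which is the idea behind "database regeneration". The following is a minimal sketch under stated assumptions, not SIARD's actual implementation: `create_table_sql` is a hypothetical helper, the `METADATA` fragment is a shortened, closed-off sample modeled on the slide's attribute names, and the `SCHOOL_YEAR` column with `sql3size="(0)"` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed fragment modeled on the slide's attributes.
METADATA = """
<table tag-name="BACKUP_CLASS" column-number="2" row-number="10">
  <columns>
    <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING"
            sql3size="(20)" nullable="false"/>
    <column tag-name="SCHOOL_YEAR" sql3type="TIMESTAMP"
            sql3size="(0)" nullable="true"/>
  </columns>
</table>
"""


def create_table_sql(schema, table):
    """Build a CREATE TABLE statement from a <table> metadata element."""
    column_defs = []
    for col in table.iter("column"):
        # Quoted name, SQL type, and optional size, e.g. "(20)".
        ddl = '"{}" {}{}'.format(
            col.get("tag-name"), col.get("sql3type"), col.get("sql3size", "")
        )
        if col.get("nullable") == "false":
            ddl += " NOT NULL"
        column_defs.append("  " + ddl)
    name = '"{}"."{}"'.format(schema, table.get("tag-name"))
    return "CREATE TABLE {} (\n{}\n);".format(name, ",\n".join(column_defs))


print(create_table_sql("FLUGLE", ET.fromstring(METADATA)))
```

Constraints, default values, and the status flags are ignored here; a real regeneration step would have to handle them as well.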
10
SIARD Data XML
<?xml version="1.0" encoding="UTF-16"?>
<dmp-file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="../dmp.xsd">
  <schema tag-name="FLUGLE"/>
  <table tag-name="CLASS"/>
  <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING"
          sql3size="(20)" defaultvalue="" nullable="false"
          constraints="PKPK_CLASS"/>
  <column tag-name="SCHEDULE_ID" sql3type="NATIONAL CHARACTER VARYING"
          sql3size="(20)" defaultvalue="" nullable="true"
          constraints="FKFLUGLE.SCHEDULE_TYPE.SCHEDULE_ID"/>
  ...
  <data>
    <row>6,1042004,S1809,POCO HALL3,1503,1985,PHILO4,E4916,SPRING19,1997-03-01 00:00:00</row>
    <row>6,1045003,T1511,NARROW HALL3,2003,1844,HIST4,D9446,SPRING19,1997-03-01 00:00:00</row>
    ...

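
The rows on this slide appear to use a length-prefixed encoding: each field is written as its character count, a comma, and the value, with fields packed back to back (for example `9,POCO HALL`). This reading is inferred from the sample rows, not from SIARD documentation, and it assumes the timestamp reads `1997-03-01 00:00:00` (19 characters). Under those assumptions, a row could be decoded as sketched below; `parse_row` is a hypothetical helper name.

```python
def parse_row(text):
    """Split a row of '<length>,<value>' fields packed back to back."""
    values = []
    pos = 0
    while pos < len(text):
        comma = text.index(",", pos)      # end of the length prefix
        length = int(text[pos:comma])     # number of characters in the value
        start = comma + 1
        end = start + length
        if end > len(text):
            raise ValueError("field length exceeds row length")
        values.append(text[start:end])
        pos = end                         # next field starts right after
    return values


row = ("6,1042004,S1809,POCO HALL3,1503,1985,PHILO"
       "4,E4916,SPRING19,1997-03-01 00:00:00")
print(parse_row(row))
```

Because each value is read by length, a value may itself contain commas without needing any escaping, which fits the "thin wrapper" description of the table content format.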
11
AMDA (Audio MetaData Acquisition)
[Diagram: Access DB; online parliament session metadata (XML); web interface; unified XML import; AMDA; metadata; Digital Archive (to be built)]

12
XML use in AMDA
  • Import
    • XSLT transformation to common format
      • Online metadata
      • Legacy data (Access database)
  • Export
    • Raw XML output transformed using XSLT

13
AMDA Import XML (raw)
<?xml version="1.0" encoding="iso-8859-1"?>
<root>
  <session oid="34695" session_id="session_4609"
           text_update_time="1002882007656">
    <meeting date="20010917" local_time="1430" location="N"
             oid="34696" publish_status="final">
      <subject oid="34697" publish_status="draft" subject_type="gesch">
        <gesch_list oid="34698" publish_status="draft"
                    transfer_gesch_list="01.9001">
          01.9001
          <gesch_info oid="000000000">
            <a99_gesch last_modified="2001/03/05 144342 GMT0100">
              <gesch_id raw_id="20019001">2001.9001</gesch_id>
              <title language="d">
                <line>Mitteilungen</line>
                <line>des Präsidenten</line>
              </title>
            </a99_gesch>
          </gesch_info>
        </gesch_list>
        <speech_text audio_channel="N" audio_end="1000729995203"
                     audio_start="1000729751250" speaker_id="9005"
                     turnus_nr="1000" turnus_oid="155989">
          <pd_text>
            <p>Der Beginn dieser Herbstsession ist schmerzlich getrübt von
            unseren Gedanken an das
            ...

14
AMDA Import XML (transformed)
<?xml version="1.0" encoding="iso8859-1"?>
<Session id="4609" start="20010917T14300200">
  <Geschaefte>
    <Geschaeft nummer="1998.0446"
               themaDeutsch="Parlamentarische Initiative&#xA;Hämmerle Andrea.&#xA;Post, SBB, Swisscom.&#xA;Arbeitsplätze&#xA;in der ganzen Schweiz"
               themaFranzoesisch="Initiative parlementaire&#xA;Hämmerle Andrea.&#xA;Poste, CFF, Swisscom.&#xA;Des emplois&#xA;dans toute la Suisse" />
    <Geschaeft nummer="2001.9001"
               themaDeutsch="Mitteilungen&#xA;des Präsidenten"
               themaFranzoesisch="Communications&#xA;du président" />
    ...
  </Geschaefte>
  <Verhandlungen>
    <Verhandlung geschaeftNummern="2001.9001" rat="V"
                 start="1000729751" dauer="244" bulletin=""
                 bulletinSeiten="825">
      <Votum start="1000729751" dauer="20" sprache="de">
        <Person id="9005" vorname="Peter" nachname="Hess" kanton="ZG"
                ort="Zug" />
        <VotumText>Der Beginn dieser Herbstsession ist schmerzlich getrübt
        von unseren Gedanken
        ...

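
The import step maps the raw parliament XML to the common format with XSLT. The sketch below performs a comparable mapping with Python's `xml.etree.ElementTree` instead of XSLT, purely to make the idea concrete. The mapping rules are inferred from the two samples and are assumptions: title `<line>` elements are joined with newlines into `themaDeutsch`, and the millisecond `audio_start`/`audio_end` values become `start` and `dauer` in seconds. `RAW` is a shortened, closed-off fragment of the raw sample, and `to_common` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed fragment of the raw import sample.
RAW = """
<subject>
  <gesch_list>
    <gesch_info>
      <a99_gesch>
        <gesch_id raw_id="20019001">2001.9001</gesch_id>
        <title language="d">
          <line>Mitteilungen</line>
          <line>des Präsidenten</line>
        </title>
      </a99_gesch>
    </gesch_info>
  </gesch_list>
  <speech_text audio_start="1000729751250" audio_end="1000729995203"
               speaker_id="9005"/>
</subject>
"""


def to_common(subject):
    """Map one raw <subject> element to a <Session> in the common format."""
    gesch = subject.find(".//a99_gesch")
    nummer = gesch.findtext("gesch_id")
    # Join the German title lines with newlines (assumed mapping rule).
    title = "\n".join(
        line.text for line in gesch.findall("title[@language='d']/line")
    )

    session = ET.Element("Session")
    geschaefte = ET.SubElement(session, "Geschaefte")
    ET.SubElement(geschaefte, "Geschaeft", nummer=nummer, themaDeutsch=title)

    # Convert millisecond timestamps to a start second and a duration.
    speech = subject.find("speech_text")
    start_ms = int(speech.get("audio_start"))
    end_ms = int(speech.get("audio_end"))
    verhandlungen = ET.SubElement(session, "Verhandlungen")
    ET.SubElement(
        verhandlungen,
        "Verhandlung",
        geschaeftNummern=nummer,
        start=str(start_ms // 1000),
        dauer=str(round((end_ms - start_ms) / 1000)),
    )
    return session


session = to_common(ET.fromstring(RAW))
print(ET.tostring(session, encoding="unicode"))
```

With the sample values this yields `start="1000729751"` and `dauer="244"`, matching the `Verhandlung` element in the transformed sample, which suggests (but does not prove) that the original stylesheet derived them the same way.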
15
Lessons learned
  • Transforming and reformatting XML data is easy
  • Documentation and data integrity are crucial
  • Agree on rules and standards for XML formats
    early
  • Stakeholders' uses of XML differ greatly

16
Conclusions
  • XML
    • is not a preservation strategy
    • is only a technology
    • is too new for a common understanding
  • XML provides tools and techniques for concise
    metadata management
  • Working solutions need both XML and non-XML
    experience
  • Most problems are still of human nature