Digital Asset Management and Publication with LadyBird

Learn more at: http://or2013.net

1
Digital Asset Management and Publication with LadyBird
  • Eric James
  • Programmer/analyst, Library IT
  • Yale University Library
  • eric.james_at_yale.edu
  • 12 July 2013

2
What is LadyBird?
  • Bebop song by Tadd Dameron
  • First Lady, Lyndon B. Johnson presidency
  • Old dog from King of the Hill
  • Digital asset management tool

3
LadyBird - Digital Asset Management Tool
  • From its origin, LadyBird has been a system that
    processes metadata and temporarily houses digital
    assets to be published. It provides a
    configurable system for migrating digital objects
    and collections, normalizing metadata, and
    preserving and publishing content.
  • It was initially written in Microsoft .NET and
    C#, hosted on Windows 2008 using Microsoft SQL
    Server 2008.
  • Some work on Java modules (for import)
  • Wish list: migrate to JRuby/Rails.

4
LadyBird components
  • Web interface
  • Job processing engine - imports
  • Export processing engine - exports
  • Bag creation
  • Heartbeat monitor
  • Application cleanup system
  • This presentation will focus on the workflow and
    concepts involved in publishing digital
    objects with metadata to Fedora

5
LadyBird concepts I
  • The core of the application is the object table
  • Collection: departments within the library and
    Yale (these come into play later when discussing
    the c tables)
  • Project: projects specific to a collection
  • An object belongs to a project and a project
    belongs to a collection
  • Currently 16 collections with 34 projects and
    1.53 million objects
  • We call objects oids. Technically, oid is the
    object id column of the object table, but we tend
    to use it to describe the whole ball of wax
  • User table: catalogers are registered, and role
    and permission settings are used throughout the
    app

6
LadyBird concepts II
  • Processing objects is all about the spreadsheet
  • Each row is an object
  • Each column represents either a function or
    a metadata field
  • Functions, e.g.: F1 is the object as identified
    by oid (primary key of the object table); if left
    blank, that is the signal to create a new oid
  • F4: parent oid (for complex objects)
  • F40 can have the value PUBLISH, telling LadyBird
    to auto-publish this object
  • Metadata, e.g.: FDID58 call number, FDID262
    Host, creator, etc.
  • The cataloger can take advantage of Excel
    functionality (like repeating fields) to quickly
    create a spreadsheet for batch import.
7
LadyBird concepts III
  • field_definition (fdid) table (230 metadata
    fields)
  • 51 Cataloger
  • 52 Record source
  • 53 Record date
  • 54 Record modified date
  • 55 Record ID
  • 56 Local record ID
  • 57 Local record ID, other
  • 58 Call number
  • 59 Accession number
  • 60 Box
  • The values are either strings or acid values
    (more on acids later)

8
LadyBird concepts IV
  • Import tables: all about the spreadsheets,
    though you can also import MARC or EAD records by
    bibid, barcode, or handle. In that case the
    records are deserialized into fdids, and any
    spreadsheet data overrides the records
  • im_job (1 master row for spreadsheet)
  • im_job_exHead (column headers from spreadsheet)
  • im_job_contents (values)
  • im_files (for files)
  • import_checksum (for files)
  • im_job_contents_history
  • Job tracking (overall tracking; associates an
    imported oid with a specific job)
  • trk_project
  • trk_job
  • trk_job_contents
  • trk_oid

9
LadyBird concepts V
  • The c tables: c for current, one set for each
    collection
  • The metadata home: data imported to the im
    tables is finally transferred here
  • There is a set of tables for each collection.
  • Ex.: 13 (collection Hydra, project Hydra
    Test)
  • c13: master list of oids
  • c13_strings
  • c13_longstrings
  • c13_acid
  • Each row basically contains an oid/fdid/value,
    so given an oid one can get all metadata
    fields for that object as rows from these tables.
    Each row also has a favid for additional values
    associated with the fdid.
  • There are also corresponding p tables (p for
    past) that keep an audit trail of any updates to
    specific oids.
  • The c tables are designed for high volume.
    Exploring better options, such as hashing
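
The oid/fdid/value layout can be sketched as follows. This is only an in-memory illustration: `CRow`, `C13_STRINGS` and `metadata_for` are hypothetical names, the sample values are made up, and the favid column is left out.

```ruby
# Hypothetical stand-in for rows of a c table such as c13_strings.
# Column names (oid, fdid, value) follow the slide; the data is invented.
CRow = Struct.new(:oid, :fdid, :value)

C13_STRINGS = [
  CRow.new(10684347, 58, "MS 123"),
  CRow.new(10684347, 60, "Box 4"),
  CRow.new(10684348, 58, "MS 124"),
].freeze

# Collect every metadata value for one oid, keyed by fdid.
# A field may repeat, so each fdid maps to an array of values.
def metadata_for(rows, oid)
  rows.select { |r| r.oid == oid }
      .group_by(&:fdid)
      .transform_values { |rs| rs.map(&:value) }
end

p metadata_for(C13_STRINGS, 10684347)
```

In the real system this would presumably be a query per c table filtered by oid; the grouping step is the same either way.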

10
LadyBird concepts VI
  • Acid: authority control, a system for using
    controlled vocabulary for metadata fields
  • fdid 62: Host, Creator

    acid    fdid  value
    126434  62    Luhan, Mabel Dodge, 1879-1962
    126626  62    Dobbs, Arthur, 1689-1765
    126628  62    Filson, John, ca. 1747-1788
    126630  62    Thomson, Charles, 1729-1824
    126632  62    Hutchins, Thomas, 1730-1789
    126635  62    Adair, James, ca. 1709-1783

  • So if, for an oid row in the spreadsheet, the fdid
    62 column is given the value 126635, that field
    resolves to Adair, James, ca. 1709-1783
  • Currently 155,415 values.
  • Potential for more sophisticated uses with linked
    data.
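
A small sketch of the acid lookup described above. The two acid rows are copied from the slide; the `resolve` method, and the rule that an unrecognized cell is passed through as a plain string, are assumptions made for illustration.

```ruby
# acid => [fdid, controlled value], rows copied from the slide.
ACIDS = {
  126434 => [62, "Luhan, Mabel Dodge, 1879-1962"],
  126635 => [62, "Adair, James, ca. 1709-1783"],
}.freeze

# Resolve a spreadsheet cell for a given fdid. If the cell holds an acid
# registered for that fdid, return the controlled value; otherwise return
# the cell unchanged (assumed behavior, not confirmed by the slides).
def resolve(fdid, cell)
  entry = ACIDS[Integer(cell, exception: false)]
  entry && entry[0] == fdid ? entry[1] : cell
end

puts resolve(62, "126635")
puts resolve(62, "free text")
```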

11
LadyBird sample workflow start
  • Workstation mounted with a job folder for both
    import and export
  • Windows: \\birdcage.library.yale.edu\project25\import\
  • Mac: SMB://birdcage.library.yale.edu/project25//import//
  • Windows: \\birdcage.library.yale.edu\project25\export\
  • Mac: SMB://birdcage.library.yale.edu/project25//export//
  • project25 corresponds to the project table
  • Create a folder in the import directory and drag
    files into folders or subfolders
  • LadyBird will then detect that folder and
    create a job for it under the Dashboard
    menu selection

12
LadyBird dashboard
13
add digital object to folder
14
Go to dashboard and process this folder
15
Receive email confirmation
  • Subject: LadyBird Import Complete job
    test_open_rep
  • Your import has been processed.
    test_open_rep
    Visit your dashboard in Ladybird for your most recent jobs.
    http://ladybird.library.yale.edu/user_jobs.aspx
    View job: http://ladybird.library.yale.edu/user_jobs.aspx?qaqueryqid12307
  • A jobcomplete.txt file containing the time is added
    to the import folder so the app knows that directory
    is complete

16
View job
17
View set
18
New object -> Metadata (form)
19
Or from View Set, Export as Job
20
Receive export email confirmation
  • Subject: LadyBird Export Ready
  • Your export is ready.
    \\birdcage\project25\export\ermadmix_46371_06262013_165116.xls

21
Spreadsheet: fill in and save as tab-delimited text file
22
Import
23
Import Email Confirmation
  • Subject: LadyBird Import Complete job
    ermadmix_import_062613_171134
  • Your import has been processed.
    ermadmix_import_062613_171134
    Visit your dashboard in Ladybird for your most recent jobs.
    http://ladybird.library.yale.edu/user_jobs.aspx
    View job: http://ladybird.library.yale.edu/user_jobs.aspx?qaqueryqid12313

24
Publish
  • Publishes automatically if F40=publish
  • Or can use interface to check file and metadata
    and explicitly click the publish button

25
Publish (behind the scenes)
  • The oid is added to the hydra table with date (when
    added) and date published (when processing
    complete) timestamps

    id     oid       date                     date published
    39176  10684347  2013-06-26 16:01:11.043  2013-06-26 17:14:05.900
    39177  10684348  2013-06-26 16:01:11.043  2013-06-26 17:14:07.457
    39178  10684349  2013-06-26 16:01:11.043  2013-06-26 17:14:09.017
    39179  10684350  2013-06-26 16:01:11.043  2013-06-26 17:14:10.577
    39180  10684351  2013-06-26 16:01:11.043  2013-06-26 17:14:12.137
    39181  10684352  2013-06-26 16:01:11.043  2013-06-26 17:14:13.697


26
oid added to hydra_publish table
  • Key fields
  • hpid: 23703
  • hcmid: 2
  • cid: 9
  • pid: 27
  • oid: 10681633
  • _oid: 0
  • zindex: 0
  • hydraID: null
  • dateReady: 2013-06-26 16:01:55.430
  • dateHydraStart: null

27
Rows for oid added to hydra_publish_path table
  • Key fields w/ example
  • hppid: 139004
  • hpid: 26340
  • type: jp2
  • pathHTTP: http://lbfiles.library.yale.edu/10684274.jp2
  • pathUNC: \\storage.yale.edu\home\ladybird-801001-yul\ladybird\project27\publish\dl\10684274\1758.02.00.00_page1.jp2
  • md5: 35433b00ca9de2cdaed275c455339090
  • controlGroup: M
  • mimeType: image/jp2
  • dsid: jp2
  • ingestMethod: filepath
  • oidPointer: null

28
hydra_publish_path typical files
  • xml rights (hydra rights)
  • xml metadata (MODS descMetadata)
  • xml access (home grown granular rights)
  • pdf (transcript YIPP)
  • pdf2 (annotated transcript YIPP)
  • jp2 (derivative)
  • jpg (derivatives)
  • tif (master)

29
descMetadata - creation
  • A service (C# class and methods), called upon
    hydra publish, iterates through all the fdids
    for an oid and uses the XML DOM to create a MODS
    file. This is basically a mapping of field
    definitions to the MODS schema.
  • There is the potential to map the fdids to any
    metadata format.
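
The idea of mapping field definitions to MODS through the XML DOM might look roughly like this in Ruby with REXML. The slides do not show the real mapping or the service code, so `FDID_TO_MODS`, `build_mods` and the chosen MODS element paths are assumptions for illustration only.

```ruby
require "rexml/document"

# Hypothetical mapping from fdid to a MODS element path; the actual
# LadyBird mapping is not shown in the slides.
FDID_TO_MODS = {
  58 => %w[location shelfLocator], # call number
  62 => %w[name namePart],         # host, creator
}.freeze

MODS_NS = "http://www.loc.gov/mods/v3"

# Build a MODS document from { fdid => [values] } for one oid.
def build_mods(metadata)
  doc = REXML::Document.new
  root = doc.add_element("mods", "xmlns" => MODS_NS)
  metadata.each do |fdid, values|
    path = FDID_TO_MODS[fdid]
    next unless path # skip fields that have no mapping

    values.each do |value|
      # Create one nested element chain per value, then set its text.
      node = path.inject(root) { |parent, name| parent.add_element(name) }
      node.text = value
    end
  end
  doc
end

doc = build_mods({ 58 => ["MS 123"], 62 => ["Adair, James, ca. 1709-1783"] })
puts doc.to_s
```

Because the mapping is a plain table, swapping in a different table would target another metadata format, which is the potential the slide mentions.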

30
accessMetadata
31
Rights metadata
32
Transition into fedora hydra world
  • select * from hydra_content_model

    id  date                     uid  contentModel
    1   2013-04-25 08:50:20.043  1    simple
    2   2013-04-25 08:50:26.350  1    complexParent
        2013-04-25 08:50:30.420  1    complexChild

  • contentModel maps to an ActiveFedora model

33
Transition into fedora hydra world II
  • select * from hydra_content_model_ds

    id  date                     uid  hcmid  dsid            ingMethod  required
    1   2013-04-25 08:56:11.670  1    1      accessMetadata  pullHTTP   y
    2   2013-04-25 08:56:11.670  1    1      descMetadata    pullHTTP   y
    3   2013-04-25 08:56:11.670  1    1      rightsMetadata  pullHTTP   y
    4   2013-04-25 08:56:11.670  1    1      tif             filepath   y
    5   2013-04-25 08:56:11.670  1    1      jp2             filepath   y
    6   2013-04-25 08:56:11.670  1    1      jpg             filepath   y
    7   2013-04-25 08:56:11.670  1    2      accessMetadata  pullHTTP   y
    8   2013-04-25 08:56:11.670  1    2      descMetadata    pullHTTP   y
    9   2013-04-25 08:56:11.670  1    2      rightsMetadata  pullHTTP   y
    10  2013-04-25 08:56:11.670  1    2      tif             filepath   n
    11  2013-04-25 08:56:11.673  1    2      jp2             filepath   n
    12  2013-04-25 08:56:11.673  1    2      jpg             filepath   n
    13  2013-04-25 08:56:11.673  1    3      accessMetadata  pullHTTP   y
    14  2013-04-25 08:56:11.673  1    3      descMetadata    pullHTTP   y
    15  2013-04-25 08:56:11.673  1    3      rightsMetadata  pullHTTP   y
    16  2013-04-25 08:56:11.673  1    3      tif             filepath   y
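
One way to read this table is as a checklist of datastreams per content model. The sketch below uses the rows for hcmid 1 and 2 from the slide (the rows for hcmid 3 are cut off, so they are left out); `missing_datastreams` is a hypothetical helper, not part of the ingest task.

```ruby
# Rows transcribed from the slide: [hcmid, dsid, required].
CONTENT_MODEL_DS = [
  [1, "accessMetadata", true], [1, "descMetadata", true],
  [1, "rightsMetadata", true], [1, "tif", true],
  [1, "jp2", true], [1, "jpg", true],
  [2, "accessMetadata", true], [2, "descMetadata", true],
  [2, "rightsMetadata", true], [2, "tif", false],
  [2, "jp2", false], [2, "jpg", false],
].freeze

# Return the required dsids of a content model that are absent from the
# datastreams actually staged for an object.
def missing_datastreams(hcmid, staged_dsids)
  required = CONTENT_MODEL_DS
             .select { |id, _dsid, req| id == hcmid && req }
             .map { |_id, dsid, _req| dsid }
  required - staged_dsids
end

p missing_datastreams(1, %w[accessMetadata descMetadata rightsMetadata tif jp2])
p missing_datastreams(2, %w[accessMetadata descMetadata rightsMetadata])
```

For hcmid 2 the image files are optional, so an object with only the three metadata datastreams passes the check.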

34
Example - simple content model
    require "active-fedora"

    # Content model for a single image with derivatives (no hierarchy).
    class Simple < ActiveFedora::Base
      belongs_to :collection, :property => :is_member_of

      # One datastream per metadata type
      has_metadata :name => 'descMetadata', :type => Hydra::Datastream::SimpleMods
      has_metadata :name => 'accessMetadata', :type => Hydra::Datastream::AccessConditions
      has_metadata :name => 'rightsMetadata', :type => Hydra::Datastream::Rights
      has_metadata :name => 'propertyMetadata', :type => Hydra::Datastream::Properties

      # LadyBird identifiers are stored in the properties datastream
      delegate :oid, :to => "propertyMetadata", :unique => true
      delegate :projid, :to => "propertyMetadata", :unique => true
      delegate :cid, :to => "propertyMetadata", :unique => true
      delegate :zindex, :to => "propertyMetadata", :unique => true
      delegate :parentoid, :to => "propertyMetadata", :unique => true
    end

35
Example Properties Datastream
    require 'active_fedora'

    module Hydra
      module Datastream
        class Properties < ActiveFedora::OmDatastream
          # ERJ note: ladybird pid = projid, ladybird _oid = parentoid
          set_terminology do |t|
            t.root(:path => "root")
            t.oid(:path => "oid")
            t.cid(:path => "cid")
            t.projid(:path => "projid")
            t.zindex(:path => "zindex")
            t.parentoid(:path => "parentoid")
            t.ztotal(:path => "ztotal")
            t.oidpointer(:path => "oidpointer")
          end
          # (the rest of the class is not shown on the slide)
        end
      end
    end

36
Workflow review
  1. Add folder with files to import folder
  2. Process folder. This will create the records in
    the database (oids, job tracking, c table instances,
    and file derivatives)
  3. Export spreadsheet. This will create a
    spreadsheet template for the folder of files in
    (1)
  4. Fill in metadata in spreadsheet: the main
    cataloging task.
  5. Import spreadsheet. This will ultimately
    populate the c tables with metadata from the oid
    rows of the spreadsheet.
  6. Publish to hydra. This will create the hydra
    table rows with serialized metadata files (MODS,
    access rights), and stage files in storage for
    ingest.

37
Ingest task
  • Set up within a hydra project
  • gem tiny_tds: connects to the LadyBird SQL Server
    database

38
app/models (objects)
  • collection.rb: maps to pid (project) in
    ladybird, parent to simple.rb and
    complex_parent.rb
  • simple.rb: 1 image w/derivatives, no hierarchy
  • complex_parent.rb: parent to a set of images
    (like a book or image set)
  • complex_child.rb: 1 image w/derivatives (like a
    page)
  • These relate to the hydra_content_model table

39
app/models (datastreams)
  • coll_properties.rb
  • properties.rb
  • rights.rb
  • access_conditions.rb
  • simple_mods.rb

40
simple_mods.rb - indexing
41
rake yulhy4ingest I
  • Properties
  • SQL server connection config
  • Mount of ladybird storage
  • Uses the hydra_publish table as a queue (driven
    by this query until done)
    select top 1 a.hpid, a.oid, a.cid, a.pid, b.contentModel, a._oid
    from dbo.hydra_publish a, dbo.hydra_content_model b
    where a.dateHydraStart is null
      and a.dateReady is not null
      and a._oid = 0
      and a.hcmid is not null
      and a.hcmid = b.hcmid
      and a.action = 'insert'
    order by a.dateReady
42
rake yulhy4ingest II
  • ActiveFedora ingest
  • Create new object based on content model
  • obj = Simple.new
  • obj = ComplexParent.new
  • obj = ComplexChild.new

43
Rake yulhy4ingest III
  • Iterate through all datastreams for the content
    model
    select hcmds.dsid as dsid, hcmds.ingestMethod as ingestMethod,
           hcmds.required as required
    from dbo.hydra_content_model hcm, dbo.hydra_content_model_ds hcmds
    where hcm.contentModel = '#{contentModel}'
      and hcm.hcmid = hcmds.hcmid

  • For each row of the above query, get the datastream
    info for the oid

    select type, pathHTTP, pathUNC, md5, controlGroup, mimeType, dsid, OIDpointer
    from dbo.hydra_publish_path
    where hpid = #{i["hpid"]} and dsid = '#{dsid}'

  • Verify checksums and use ActiveFedora to ingest
    datastreams
44
rake yulhy4ingest IV
  • Add ladybird specific info to properties
    datastream
  • oid
  • cid
  • pid
  • zindex
  • _oid
  • Add hierarchical info to RELS-EXT
  • simple and complex_parent: is_member_of a
    collection
  • complex_child: is_member_of a complex_parent
  • Some discussion about adding more linked data.

45
Rake yulhy4ingest V
46
Rake yulhy4ingest VI
47
Blacklight
48
review
49
future
  • hydra_publish: revise already ingested content
  • action=update
  • action=insert
  • Archivematica (by Artefactual)
  • Replace the ingest task with a custom workflow
  • GUI interface
  • Human decision points and manual processing
  • Technical metadata generation (FITS)
  • Provenance (JHOVE)
  • Issues: how to employ OAIS packages (SIP, AIP, DIP)
    for objects without a natural package structure?

50
Contributors
  • Eric James
  • Lakeisha Robinson
  • Kalee Sprague
  • Osman Din
  • Jay Terray
  • Rebekeh Irwin
  • Mike Friscia

51
Thank you