Globally Unique Identifiers and Life Science Identifiers

1
Globally Unique Identifiers and Life Science Identifiers
  • Dave Thau
  • thau_at_learningsite.com
  • University of Kansas
  • California Academy of Sciences
  • www.learningsite.com

2
Outline
  • Describe Globally Unique Identifiers (GUIDs)
  • Show how they're relevant
  • Describe one GUID system (LSIDs)
  • Outline some issues around using GUIDs for
    TDWG-related activities
  • Provide some resources
  • Open discussion

3
GUID Is Not An Ugly Word
"It's guid to be merry and wise,
It's guid to be honest and true,"
    Robert Burns, "Here's a Health to Them that's Awa"
Pteroptochos tarnii, AKA Guidguid
Image from animaldiversity.ummz.umich.edu
4
GUID: Globally Unique Identifier
  • A short name for a complex entity
  • Useful for locating information about the entity
  • Each name identifies only one entity
  • There is some sense of permanence

5
Some things which fit this description
  • GenBank accession numbers: AP006480.1
  • US patent numbers: 5443036 (laser-guided cat
    exercise)
  • Digital Object Identifiers: 10.121/3212

6
In Our Domain
SDD Document: representing some data set.
    <ClassName id="1">
      <Label>
        <Representation language="en">
          <Text>Cypselurus heterurus (Rafinesque, 1810)</Text>
        </Representation>
      </Label>
      <Link>
        <LSID>lsid.gbif.net:www.fishbase.org:1029</LSID>
      </Link>
      <Rank>sp</Rank>
    </ClassName>
Napier Schema Document: representing some taxon.
    <TaxonConcept id="urn:lsid:bioguid.org:seek:121212"
                  type="original">
      <Name type="scientific">
        <NameSimple>Canis lupus</NameSimple>
      </Name>
      <Relationships>
        <Relationship type="is child of">
          <ToTaxonConcept ref="urn:lsid:bioguid.org:seek:5743" />
        </Relationship>
      </Relationships>
    </TaxonConcept>
7
Features of a GUID system
  • Global uniqueness, scoped to the Internet
  • Should be easily resolvable by a computer or
    human
  • Should identify things down to whatever level of
    granularity necessary
  • Should not be limited to proprietary systems
  • Should serve up all sorts of data:
      • Database records
      • Text files
      • Images
  • It would be nice if the identifier had associated
    metadata

8
Life Science Identifiers
  • Official standard of the Object Management Group
    (OMG)
  • Support for metadata and authentication
  • Supports multiple protocols (e.g. HTTP, SOAP)
  • Can serve up data in any format
  • Decentralized: anyone can issue an LSID
  • LSID code available in Java and Perl.
  • A young standard, but increasingly used.

9
Organizations Using LSIDs
  • National Center for Biotechnology Information
    (NCBI)
      • PubMed
      • GenBank
  • European Bioinformatics Institute (EBI)
  • US Long Term Ecological Research Network (LTER)
  • BioMOBY: a biological database interoperability
    program (biomoby.org)
  • Open Bioinformatics Foundation (open-bio.org)
  • myGrid: a BioGRID project (mygrid.org.uk)

10
A Small Pause For More Squid Humor
11
LSID Format
urn:lsid:bioguid.org:seek:117866:v1
  • urn: indicates that this is a URN
  • lsid: indicates that it's an LSID-type URN
  • bioguid.org: the authority that issued the LSID
      • Doesn't have to be a domain name, but for now
        probably should be.
      • bioguid.org does not necessarily have the data
        or metadata.
      • There may not even be a machine called
        bioguid.org.
  • seek: a name space ID internal to that authority
      • The name space is meaningless to systems
        outside that authority.
  • 117866: the local identifier within that
    authority
      • Also internal to the authority
  • v1: an optional version number
      • If no version, no trailing colon either.
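
The field layout on this slide can be sketched as a small parser. This is a minimal illustration, not code from the LSID toolkits mentioned earlier: the names Lsid, parse_lsid and format_lsid are made up for this example, and the only validation is the field count and the "urn:lsid" prefix.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Fields of an LSID (class and field names are illustrative only)."""
    authority: str           # who issued the LSID, e.g. "bioguid.org"
    namespace: str           # name space ID, internal to the authority
    object_id: str           # local identifier within that authority
    version: Optional[str]   # optional version, None when absent


def parse_lsid(text: str) -> Lsid:
    """Split urn:lsid:authority:namespace:object[:version] into its fields."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated fields: {text!r}")
    # URN prefixes are case-insensitive, so compare in lower case.
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID URN: {text!r}")
    if any(part == "" for part in parts[2:]):
        # This also rejects a trailing colon with no version after it.
        raise ValueError(f"empty field in LSID: {text!r}")
    version = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], version)


def format_lsid(lsid: Lsid) -> str:
    """Rebuild the string form; no trailing colon when there is no version."""
    fields = ["urn", "lsid", lsid.authority, lsid.namespace, lsid.object_id]
    if lsid.version is not None:
        fields.append(lsid.version)
    return ":".join(fields)
```

Calling parse_lsid("urn:lsid:bioguid.org:seek:117866:v1") yields the authority "bioguid.org", the name space "seek", the local identifier "117866" and the version "v1". Only the authority field means anything to an outside system; the rest is opaque.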

12
Data and Metadata
  • An LSID has data
      • Examples:
          • The gene sequence in GenBank
          • The actual LTER data set, maybe in Excel, or
            in a text file
      • The data should never change
  • An LSID also has metadata
      • Example metadata:
          • The format of the data
          • A display title for clients displaying the
            LSID
          • Dublin Core metadata
          • Anything you want
      • The metadata can change
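
One way to picture the split between fixed data and changeable metadata is a record type whose data is read-only while its metadata stays editable. This is only a sketch under assumptions: LsidRecord is a hypothetical class, and the data bytes and metadata values are placeholders, not real content behind any LSID.

```python
from typing import Dict, Optional


class LsidRecord:
    """Hypothetical record: the data is fixed, the metadata may be edited."""

    def __init__(self, lsid: str, data: bytes,
                 metadata: Optional[Dict[str, str]] = None) -> None:
        self._lsid = lsid
        self._data = bytes(data)              # private immutable copy
        self.metadata = dict(metadata or {})  # free to change later

    @property
    def lsid(self) -> str:
        return self._lsid

    @property
    def data(self) -> bytes:
        # No setter is defined, so assigning to .data raises AttributeError.
        return self._data


record = LsidRecord(
    "urn:lsid:bioguid.org:seek:117866:v1",
    b"placeholder bytes",                              # made-up data
    {"title": "Placeholder title", "format": "text/plain"},
)
record.metadata["title"] = "A better display title"    # allowed
```

If the data really does need to change, the version field of the LSID gives the issuer a way to publish a new identifier rather than altering what the old one returns.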

13
Example LSIDs
  • An LTER fish abundance data set
      • urn:lsid:limnology.wisc.edu:dataset:ntlfi02
  • A PubMed reference
      • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:pubmed:12441808
  • A GenBank sequence
      • urn:lsid:ncbi.nlm.nih.gov.lsid.biopathways.org:genbank_gi:30350027

14
How LSIDs work
LSID Client: maybe Launchpad, maybe Haystack, maybe
BioFerret, maybe myGRID, maybe yours!
1. Find the authority for this LSID
   DNS: find the DNS record and resolve it to get the
   address of the authority. Returns the LSID authority
   server.
2. Query the authority for available services
   The LSID authority returns WSDL for this LSID.
3. Choose a service, get the goods
   Data store or metadata store, over HTTP, SOAP, FTP,
   others.
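
The three steps on this slide can be walked through with a toy, in-memory simulation. Nothing here touches the network or uses a real LSID library: the lookup tables, the server name lsid-server.example.org and the payloads are all hypothetical stand-ins for DNS, the authority's WSDL response, and the data and metadata stores.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-ins for real infrastructure.
TOY_DNS: Dict[str, str] = {"bioguid.org": "lsid-server.example.org"}
TOY_SERVICES: Dict[str, List[str]] = {"lsid-server.example.org": ["HTTP", "SOAP"]}
TOY_STORE: Dict[Tuple[str, str], bytes] = {
    ("urn:lsid:bioguid.org:seek:117866:v1", "data"): b"<placeholder data bytes>",
    ("urn:lsid:bioguid.org:seek:117866:v1", "metadata"): b"<placeholder metadata>",
}


def find_authority_server(lsid: str) -> str:
    """Step 1: take the authority field of the LSID and look up its server."""
    authority = lsid.split(":")[2]
    return TOY_DNS[authority]


def list_services(server: str) -> List[str]:
    """Step 2: ask the authority which services it offers (stands in for WSDL)."""
    return TOY_SERVICES[server]


def resolve(lsid: str, kind: str = "data",
            preferred: Sequence[str] = ("HTTP", "SOAP", "FTP")) -> Tuple[str, bytes]:
    """Step 3: choose a service and fetch either the data or the metadata."""
    server = find_authority_server(lsid)
    offered = list_services(server)
    protocol = next((p for p in preferred if p in offered), None)
    if protocol is None:
        raise LookupError(f"no usable service for {lsid!r}")
    return protocol, TOY_STORE[(lsid, kind)]
```

The point of the sketch is the indirection: the client only knows the LSID, and everything else (where the authority lives, which protocols it speaks, where the bytes are stored) is discovered at resolution time.
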
15
LSID Promises
  • I promise to never change the data behind an
    LSID.
  • I will make sure my LSIDs are being served, or
    give them to someone who can do it.
  • I will give my LSIDs metadata: at the very least,
    a title and a format.

16
Other GUID systems
  • URLs
      • Files move
      • The data change
      • Unstructured metadata
  • UUIDs: 128-bit value, unique for practical
    purposes
      • 58f202ac-22cf-11d1-b12d-002035b29092
      • No resolution
      • No metadata
  • Handle System / DOIs (10.12/2312)
      • Non-standard protocol
      • Centralized resolution
      • Unstructured metadata (for Handle System)
      • High costs (for DOI)
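
The UUID on this slide can be inspected with Python's standard uuid module. The snippet is only an illustration of the "no resolution, no metadata" point: the value parses cleanly and new ones can be minted locally, but nothing in it says where to find the thing it names.

```python
import uuid

# The example UUID from the slide.
example = uuid.UUID("58f202ac-22cf-11d1-b12d-002035b29092")

size_in_bits = len(example.bytes) * 8   # UUIDs are always 16 bytes
layout_version = example.version        # version field embedded in the value

# Anyone can generate a fresh random UUID without asking a central authority.
fresh = uuid.uuid4()

print(size_in_bits, layout_version, fresh)
```

Compare this with an LSID, where the authority field gives a client a starting point for finding both data and metadata.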

17
Issues For This Community
  • What gets a GUID?
  • For each of those things, what's the data, what's
    the metadata?
  • One GUID per item?
  • Centralization: who issues GUIDs?

18
What Gets a GUID?
  • These things probably should get GUIDs:
      • Taxonomic concepts
      • Specimens
      • Publications
      • People
  • These things might get GUIDs:
      • Taxonomic names
      • Journals
      • Data providers
      • Observations

19
Specimen Data? Metadata?
  • If specimens get a GUID, what does it identify?
      • The physical specimen?
      • A collection's database record of the specimen?
      • What about multiple labels?
  • Main question: what doesn't change about a
    specimen?
  • Other main question: how should the data be
    represented?
      • Darwin Core includes the current institution
        location. Not a good idea for the data of a
        GUID, since that may change.

20
One GUID Per Item?
  • No GUID system inherently enforces a 1:1 mapping
    between GUID and data.
  • Everyone should TRY to limit the number of GUIDs
    per item.
  • Should there be any centralization to help
    achieve this?

21
Degrees of Centralization
  • An index
      • List your GUID authority in an index so your
        GUIDs are easy to find.
  • A central authority
      • One authority could be responsible for issuing
        GUIDs to the community for specific types of
        information; you'd have to get one from here.
          • GBIF?
          • The IC_Ns? (ICZN, ICBN)
          • lsidauthority.org?
      • This would help enforce a 1:1 mapping of GUIDs
        and data items
      • It would also relieve data providers of the
        need to maintain their own authorities
      • It MAY also reduce the likelihood of GUIDs
        becoming unresolvable
      • It may also be infeasible technically, or
        socially.
  • A respected authority
      • With LSIDs, an authority can be set up to serve
        its own GUIDs and proxy other authorities.
      • This would help enforce a 1:1 mapping for those
        who use the authority
      • It may also be more feasible.
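
How a shared authority might help keep a 1:1 mapping can be sketched as a registry that returns the existing GUID when an item is registered a second time. This is a deliberately naive illustration: GuidIndex, the authority name authority.example.org, and the idea of matching items on a normalized text key are all assumptions made for the example, and real matching of specimens or taxon concepts would be much harder.

```python
from typing import Dict


class GuidIndex:
    """Hypothetical authority that hands out at most one LSID per item key."""

    def __init__(self, authority: str, namespace: str) -> None:
        self._authority = authority
        self._namespace = namespace
        self._by_item: Dict[str, str] = {}

    def register(self, item_key: str) -> str:
        # Normalize case and whitespace so trivial variants share one GUID.
        key = " ".join(item_key.split()).lower()
        if key not in self._by_item:
            local_id = len(self._by_item) + 1
            self._by_item[key] = (
                f"urn:lsid:{self._authority}:{self._namespace}:{local_id}"
            )
        return self._by_item[key]


index = GuidIndex("authority.example.org", "names")
first = index.register("Canis lupus")
again = index.register("canis   LUPUS")   # same item, same GUID
other = index.register("Cypselurus heterurus")
```

The mapping only holds for those who register through the same index, which is the limitation the slide notes for a respected (rather than mandatory) authority.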

22
LSID Resources
  • LSID articles and code from IBM
      • http://www-124.ibm.com/developerworks/oss/lsid/whatislsid
  • Current LSID specification
      • http://www.omg.org/cgi-bin/doc?dtc/04-05-01
  • Launchpad: an LSID resolver for Windows IE
      • available from first link
  • A website which resolves LSIDs
      • http://lsid.biopathways.org/resolver/
  • URN specification
      • http://www.ietf.org/rfc/rfc2141.txt

23
Acknowledgements
  • My work on GUIDs has been funded by the SEEK
    project (seek.ecoinformatics.org).
  • SEEK is funded by National Science Foundation
    award 0225676.
  • Thanks to Ben Szekely at IBM for his LSID
    articles, his LSID Java code, and for answering
    all my questions.

24
Questions for Discussion
  • Do we need GUIDs?
  • What gets a GUID?
  • One GUID per item?
  • Centralization?