Information Architecture of the Global Biodiversity Information Facility

Provided by: hannusa6

1
  • Information Architecture of the Global
    Biodiversity Information Facility
  • Open Forum for Metadata Registries
  • Santa Fe, New Mexico
  • 20-24 January 2003
  • Hannu Saarenmaa
  • hsaarenmaa_at_gbif.org
  • www.gbif.net

2
Outline of presentation
  • What is biodiversity informatics
  • What is GBIF
  • Architectural principles
  • Central registries
  • Data nodes
  • Participant nodes
  • Role of portals
  • Conclusion

3
Biodiversity informatics
  • The science of organisation, sharing,
    dissemination and use of data, information, and
    knowledge on biological diversity
  • Builds on distributed computing, data management,
    knowledge management, multimedia, GIS, e-business
    technologies, Grid...
  • Biodiversity information is extremely distributed
  • Biodiversity informatics is new and really first
    became feasible with the web

4
SCOPE OF BIODIVERSITY INFORMATICS WITHIN
BIOLOGICAL INFORMATICS
[Diagram; labels as extracted]
Fields: Environmental informatics, Biodiversity informatics, Bioinformatics
Other labels: Scientific Name, Publication, Population / Culture, Habitat,
Landscape, Genomics, Ecological Relationships, Species / Taxon, Ecosystem,
Individual / Specimen, Site, Molecular biology, Biosphere, Field Observation,
Spatial Data, Collection
5
BIODIVERSITY IS AN INFORMATION MANAGEMENT
CHALLENGE
  • Total number of species about 10 million
  • 1.7 million species have been described and named
  • Total number of specimens in museum collections
    1-3 billion
  • Also hiding a large number of not yet described
    species
  • 18 000 new species described each year
  • This rate has not improved during the past 40
    years
  • 1 000 to 10 000 species lost each year to
    extinction
  • This rate is 1000 times faster than the natural
    rate
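
A rough back-of-the-envelope reading of the figures above, offered only as a sketch. It assumes that the estimate of about 10 million species and the rate of 18 000 descriptions per year both stay fixed, and it ignores extinction and synonymy. The function and variable names are illustrative.

```python
# Figures are taken from the slide; the constant-rate assumption is ours.
TOTAL_SPECIES_ESTIMATE = 10_000_000
DESCRIBED_SPECIES = 1_700_000
NEW_DESCRIPTIONS_PER_YEAR = 18_000


def years_to_describe_remaining(total, described, per_year):
    """Years needed to describe the remaining species at a constant rate."""
    return (total - described) / per_year


undescribed = TOTAL_SPECIES_ESTIMATE - DESCRIBED_SPECIES
years = years_to_describe_remaining(
    TOTAL_SPECIES_ESTIMATE, DESCRIBED_SPECIES, NEW_DESCRIPTIONS_PER_YEAR
)
print(f"{undescribed:,} species undescribed; about {years:.0f} years at this rate")
```

Under these assumptions the backlog is on the order of several centuries of work, which is one way to read the claim that biodiversity is an information management challenge.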

6
WHAT IS GBIF?
  • GBIF is an international scientific co-operative
    project based on a multilateral agreement (MoU)
    between countries, economies and international
    organisations, dedicated to
  • establishing an interoperable, distributed
    network of databases containing scientific
    biodiversity information, in order to
  • make the world's scientific biodiversity data
    freely and universally available to all,
  • with initial focus on species- and specimen-level
    data,
  • with links to molecular, genetic and ecosystems
    levels

7
THE STORY OF GBIF
  • 1996: Planning of GBIF starts
  • January 1999: Working group of the MegaScience
    Forum of the OECD recommends establishing GBIF
  • March 2001: GBIF formally established
  • June 2001: Denmark chosen to host GBIF Secretariat
  • November 2001: Executive Secretary James L.
    Edwards moves to Copenhagen and initiates
    Secretariat
  • October 2002: First work programme approved
  • 2004: Three-year review and necessary
    reorientation
  • 2006: Initial 5 year commitment of participants
    over and future of GBIF will be reconsidered

8
GBIF WORK PROGRAMMES
  • Data Access and Database Interoperability
  • Electronic Catalogue of Names of Known Organisms
  • Digitisation of Natural History Collections
  • Outreach and Capacity Building
  • Species Bank
  • Digital Biodiversity Literature Resources

9
GBIF VOTING PARTICIPANTS (as of 1 January 2003)
  • 22 Voting Participants
  • Australia, Belgium, Canada, Costa Rica, Denmark,
    Finland, France, Germany, Iceland, Japan, Korea,
    Mexico, Netherlands, New Zealand, Nicaragua,
    Peru, Portugal, Slovenia, Spain, Sweden, UK, USA
  • Convention on Biological Diversity is also an ex
    officio (non-voting) member of Governing Board

10
30 ASSOCIATE PARTICIPANTS (as of 1 January 2003)
  • Argentina
  • Austria
  • Bulgaria
  • Czech Republic
  • European Commission
  • Ghana
  • Pakistan
  • Poland
  • Slovak Republic
  • South Africa
  • Switzerland
  • Taiwan
  • Tanzania
  • ALL Species Foundation
  • ASEANET
  • BioNET
  • BIOSIS
  • Commonwealth Agricultural Bureau (CABI)
  • EASIANET
  • Expert Centre for Taxonomic Identification
  • Inter-American Biodiversity Information Network
  • Integrated Taxonomic Information System
  • NatureServe
  • Ocean Biogeographic Information System
  • Species 2000
  • Taxonomic Databases Working Group
  • UNESCO--Man and the Biosphere Program
  • UNEP--World Conservation Monitoring Centre
  • The World Federation for Culture Collections

11
PARTICIPANTS AGREE TO...
  • Share biodiversity data
  • Set up a node or nodes for sharing the data
  • Formulate and implement GBIF work programme for
    their part
  • Voting Participants (countries and economies)
    make yearly contribution based on GDP
  • GBIF central budget is 3M
  • Associate Participants (countries, economies,
    international organisations) cannot vote, but
    otherwise participate fully in GBIF activities
    and decisions
  • Make additional investments in biodiversity
    information and the necessary infrastructure
  • 90% of investment in GBIF happens within
    Participants, only 10% centrally for providing
    the linking mechanism

12
GBIF node responsibilities
GBIF Metadata Registry and Portal
  • Network
  • Standards
  • Tools
  • Consolidated Data
  • Network
  • Standards
  • Tools
  • Identify (local) Data Nodes
  • Forward registration metadata from Data Nodes
  • National Language Interfaces

Data Node
Participant Node
  • Data
  • Metadata
  • Encourage participation
  • Manage registration of Data Nodes

13
GBIF IPR Principles
  • GBIF will seek to ensure that data in
    GBIF-affiliated databases is in public domain
  • In particular data enabling linking with other
    data
  • GBIF will seek to ensure that source of data is
    acknowledged by all users
  • Cf. Open Source licenses, commons
  • Maintenance and control of data remain in hands
    of database owners
  • There will be no central data banks (except
    caches)
  • Database owners can block access to sensitive
    data
  • Countries have sovereignty over their biological
    resources
  • It follows that GBIF services will mainly be
    integrative metadata services, and standards

14
GBIF information architecture and central
registries
15
Information model (under development)
Institutions
Rights Services
Source URL Protocol Format Data
exchange Description
Data sources
Subject Taxa Coverage Spatial Coverage
Temporal Description
Knowledge Bases
Datasets
Taxonomies in Global Species Databases
Units/ Records
Objects
Rights Format Encodings
Unstructured information
Checklists Redlists
Observation data
Specimen data
Species Knowledge
16
COMPONENTS IN THE GBIF INFORMATION ARCHITECTURE
Participant Services
Central Services
Internal Services
Participant Node Portal
GBIF portal
Group Collaboration Services
Multiple Data Nodes
Data Base
Data Repository
GBIFS Services
Registry
Taxonomic Name Service
Standard Validation Tool
Standards Repository
Digitization Tools
17
"You don't get very far with web services unless
you have a registry..." (Tom Gaskins, uddi.org)
The registry
  • One global marketplace of shared biodiversity
    data
  • A central services registry
  • Directory of Participants and Data providers
  • Datasources and datasets offered
  • Services of the providers
  • Services registry will then be used to generate a
    metadata registry of the available data
  • Registry retrieves metadata from the registered
    datasets
  • Indices over key elements in data sets (Dublin
    Core registry)
  • Subject: taxon; Coverage: spatial, temporal ...
  • Open interfaces for portals and specialised
    search engines
  • Anybody can write their portal/search tool that
    uses the registry and the index
  • Will be written by GBIFS using open source
    components
  • Examples available in Biomoby, EIONET Content
    Registry, NBN
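
The registry idea above can be sketched in a few lines. This is a minimal in-memory illustration, not GBIF's implementation: the class names `Registry`, `Provider` and `Dataset`, the field names, and the `service.example.org` URLs are all hypothetical. It shows providers registering their services and datasets, the registry indexing one key element (subject taxon), and a portal looking up where matching data lives while the data itself stays with the provider.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    dataset_id: str
    title: str
    subject_taxa: set      # key element indexed by the registry
    spatial_coverage: str  # free-text region name in this sketch


@dataclass
class Provider:
    name: str
    service_url: str       # hypothetical endpoint of the provider's data service
    datasets: list = field(default_factory=list)


class Registry:
    """Services registry plus a simple metadata index over registered datasets."""

    def __init__(self):
        self.providers = {}    # provider name -> Provider
        self.taxon_index = {}  # lower-cased taxon -> [(provider name, dataset id)]

    def register(self, provider):
        """Record the provider and index the subject taxa of each dataset."""
        self.providers[provider.name] = provider
        for ds in provider.datasets:
            for taxon in ds.subject_taxa:
                key = taxon.lower()
                self.taxon_index.setdefault(key, []).append(
                    (provider.name, ds.dataset_id)
                )

    def find_by_taxon(self, taxon):
        """Return (service URL, dataset id) pairs; no data is held centrally."""
        hits = self.taxon_index.get(taxon.lower(), [])
        return [(self.providers[name].service_url, ds_id) for name, ds_id in hits]


registry = Registry()
registry.register(Provider(
    "Museum A", "http://service.example.org/a",
    [Dataset("a-1", "Conifer specimens", {"Pinus", "Picea"}, "Region one")],
))
registry.register(Provider(
    "Herbarium B", "http://service.example.org/b",
    [Dataset("b-7", "Pine observations", {"Pinus"}, "Region two")],
))
pinus_hits = registry.find_by_taxon("pinus")
```

A portal or search engine would call something like `find_by_taxon` and then query each returned service directly, which is the sense in which anybody can build a tool on top of the registry and its index.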

18
The registry consists of a Services registry and
a Content registry
  • Communications Portal
  • Syndication
  • Collaboration
  • User directories

Species Bank
Specialised Portal B
Web Application A Search Engine A
  • Services registry
  • Providers
  • Datasources /sets
  • Services of above
  • Content registry
  • Names and concepts
  • Federated key data
  • Indexes of content

Data source
Institution
Data source
Institution
19
Taxonomic name service
  • Dynamic linking mechanism derived from contents
    of Catalogue of Life
  • Closely paired with the registry
  • Many possible approaches
  • Semantic web and RDF
  • Taxonomic object service: global namespace of
    URIs
  • Provide programmatic access to the current state
    of knowledge for taxonomy.
  • Provide a single name service that encapsulates
    existing services such as ITIS and SP2000
  • GBIF is working with other groups such as
    Consortium on the Catalogue of Life, TDWG and
    Octopus to define this service
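
As a sketch of what a single name service encapsulating several sources could look like, the class below resolves any known name or synonym to one accepted name and mints a URI for it. Everything here is an assumption for illustration: the `NameService` class, its methods, the `names.example.org` namespace, and the placeholder names are invented and do not reflect ITIS, Species 2000, or any agreed GBIF interface.

```python
class NameService:
    """Single lookup point over names loaded from one or more checklists."""

    def __init__(self):
        self._accepted = {}  # normalised name -> accepted name

    @staticmethod
    def _norm(name):
        # Case-insensitive, whitespace-insensitive matching key.
        return " ".join(name.lower().split())

    def load(self, accepted_name, synonyms=()):
        """Register an accepted name together with its synonyms."""
        self._accepted[self._norm(accepted_name)] = accepted_name
        for synonym in synonyms:
            self._accepted[self._norm(synonym)] = accepted_name

    def resolve(self, name):
        """Return the accepted name, or None if the name is unknown."""
        return self._accepted.get(self._norm(name))

    def uri_for(self, name, base="http://names.example.org/taxon/"):
        """Return a URI in a (hypothetical) global namespace for the taxon."""
        accepted = self.resolve(name)
        if accepted is None:
            return None
        return base + accepted.replace(" ", "_")


names = NameService()
# Placeholder names, not real taxa.
names.load("Examplea alba", synonyms=["Examplea candida"])

resolved = names.resolve("  EXAMPLEA   Candida ")
uri = names.uri_for("Examplea candida")
```

Because records filed under a synonym and records filed under the accepted name resolve to the same URI, the name service can act as the linking mechanism between otherwise unrelated datasets.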

20
Integration by name service (ECAT)
Portal
ECAT elements have been coloured orange. Name
Lists are lists of names for a specific purpose
(e.g. Red List, regional checklist).
XML Data Access
HTML Data Access
GBIF Data Access (Servlets)
Registry
Species 2000
Name Usage Index
Name Service Interface (ECAT)
Indexing of usage
Indexing of usage
21
Data exchange standards are key
  • XML Schema must be agreed for
  • Name
  • Taxon
  • Specimen
  • Collection
  • Person in various roles
  • Publication
  • Site
  • Observation
  • Standards process must be open and consistent
  • Discussion
  • Documentation
  • Support for format validation
  • Support for quality assurance
  • Leading standards
  • TDWG ABCD
  • BioCASE
  • Dublin Core
  • Darwin Core
  • DiGIR
  • SOAP
  • Grid OGSA

22
Data nodes
  • service.xx.gbif.net

23
Data node
WSDL Service Descriptions
Specimen Index Data (3-5 fields)
Specimen Detail (full data)
Specimen Summary Data (20-30 fields)
HTML
Data Repository
Node Data Services
Presentation Service
Metadata Services

Collection Database Adaptor
Collection Database Adaptor
Collection Database
Collection Database

24
Services of data nodes
  • Export the shared data into
  • Data warehouse in SQL
  • Data repository (locally owned) in document
    format
  • Advertise the provider, its services and
    available datasets towards central registry
  • WSDL, possibly UDDI/ebXML
  • Participant node to coordinate
  • Dublin Core description of published datasets
  • Enable the central metadata registry to index the
    datasets
  • SOAP/ DiGIR protocol for queries and responses
  • TDWG/ABCD standard for data encoding in XML
  • Respond to queries of data users
  • SOAP/ DiGIR protocol for queries and responses
  • TDWG/ABCD standard for data encoding in XML
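
A data node's Dublin Core description of a published dataset might be produced along these lines. This is a sketch under assumptions: the wrapping `metadata` element, the helper name, and all field values are made up for illustration. The namespace URI and the element names (title, creator, subject, coverage, format, rights, identifier) are those of the Dublin Core element set.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_description(fields):
    """Build a <metadata> element with one dc:* child per (term, value) pair."""
    root = ET.Element("metadata")
    for term, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{term}")
        child.text = value
    return root


# All values below are placeholders.
record = dublin_core_description([
    ("title", "Example conifer specimen collection"),
    ("creator", "Example Museum"),
    ("subject", "Pinus"),
    ("coverage", "Example region, 1900-2000"),
    ("format", "text/xml"),
    ("rights", "Source must be acknowledged"),
    ("identifier", "http://service.example.org/datasets/conifers"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A central metadata registry could harvest such descriptions and index the subject and coverage elements without ever touching the underlying specimen data.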

25
Some issues with online databases
  • Writing middleware (wrappers, resource
    agents) is a complicated task.
  • Toolkits, guidelines, and assistance (roaming
    wrapper writers) are needed.
  • High availability of databases when queries come
    must be arranged, or caching be used
  • Data warehousing is recommended
  • Original operational data on web is risky
  • Operational databases are optimised for input and
    storage, not always for querying
  • Quality assurance and possible approval easier to
    do on an exported dataset than an entire database
  • For these reasons international reporting systems
    traditionally have used document-based data
    exchange

26
Data repositories
  • Export the shared dataset into a locally owned
    repository
  • Exported dataset is held in document format
  • or data warehouse in a flat structure
  • Data is stored and served in XML format using the
    standards of GBIF/TDWG
  • I.e., Darwin Core /ABCD
  • Repository acts as a wrapper and allows the
    central metadata registry to index its datasets
  • SOAP/DiGIR, WSDL, Dublin Core, Darwin Core/ABCD
  • But... a repository does not respond to dynamic
    queries
  • It only returns the files as they were uploaded
  • Enables data warehousing elsewhere

27
Participant nodes
  • www.xx.gbif.net

28
Participant node
WSDL Service Descriptions
Specimen Data
Name Data
General Resource Data
HTML
Portal Services
Presentation Service
Registry Management
UDDI Service Registry
Specimen Data from Collection Data Nodes
Data Services from GBIF Portal
WSDL Service Descriptions
29
Node relationships for registration
GBIF Registry and Portal
Service Metadata
Service Metadata
Indexing Metadata
Participant Node
Participant Node
Service Metadata
Service Metadata
Collection Node
Collection Nodes
30
Possible services of the Participant nodes
  • Promoting and assisting the inclusion of new data
    providers and data sets
  • Quality assurance and compliance
  • Institutional level: coordinate the Participant's
    part of the network (who can play?)
  • Dataset level: compliance with national
    regulations and IPR
  • Data element level: scientific quality control of
    the correctness of advertised data
  • Technical level: is the format right?
  • Host data from the willing data nodes
  • National language support as needed
  • Use of the central registry to provide access to
    domain-relevant data
  • Thematic portal and search facilities to find
    special data of the Participant

31
What tools a Participant node needs
  • Necessities
  • Register institutions and data sources (nodes)
  • Local directory server or UDDI database, linking
    with central registry
  • Register the services and datasets of nodes
  • Local UDDI or other registry, linking with
    central registry
  • Good to have
  • Tools for quality assurance
  • Portal server for domain-specific website
  • PTK for communication, repository tool for
    hosting
  • Directory of people and communication tools

32
GBIF central portal
  • www.gbif.net

33
Role of portals
  • Communication/ coordination needs
  • Portals are integrative tools and gateways to
    information that go beyond single websites
  • Portals and related directory services can be
    used to coordinate network activities
  • Data access needs
  • Much of the content on the portals can be built
    automatically out of contents of the Registry
    using metadata
  • GBIF central portal is only one of many portals
    and search engines making use of the central
    metadata registry and related indices through
    their open interfaces

34
Services of the GBIF portal
  • Version 1 released at the end of 2002, and as
    toolkit for nodes, http://beta.gbif.org/
  • News syndication and electronic newspaper with
    discussion
  • Events, calendar of calendars, projects
  • Articles, documents, images, audio and video
    content
  • Search within the site, across the GBIF network
  • Download area
  • Getting started service and how to become a node
  • About GBIF
  • CIRCA-based group collaboration services
  • Directory services (CIRCA-based open LDAP)
  • Suggestions and feedback from users
  • Prototype data repository
  • Version 2 end of 2003, demonstration earlier in
    the year
  • XML standards repository and registry links
  • Links to Participant nodes and their content
  • Access to biological content derived from the
    registry

35
Test version of the central GBIF
communications portal
36
GBIF Portal Toolkit (PTK)
  • Model implementation of the functions needed for
    dissemination and interaction between portals and
    the registry
  • Interoperability and automatic content
    syndication between collaborative portals, e.g.
    CHM
  • How to use the registry to create a vertical
    portal and specialised search engine
  • Knowledge management based on GBIF data sources
  • User interface for data mining, knowledge
    discovery, knowledge contributions, ...
  • Packages the tools in one turn-key solution to
    reduce the time needed for a node to get online
    and include a new data source in gbif.net
  • Open source, based on Zope (www.zope.org)
  • Available now as beta

37
Conclusion
38

GBIF VISION (a technical update)
Content area responsibilities of GBIF
GenBank, et al.
Sequence Data (RNA, protein, etc.)
Specimen Observation Data
Registry of Shared Biodiversity Data
Geospatial Data
Climate Data
Electronic Catalog of Names
SpeciesBank, Search Engines Portals
Ecosystems Data
Existing responsibilities of other groups
Ecological Data
39
Species portals? When all is done, new kinds of
services can be created semiautomatically from
the contents of the registry, e.g. the
SpeciesBank. Scoping of the SpeciesBank needed.
  • http://ponderosa.pinus.plantae.bio

40
GRID AND GBIF
  • Grid is the emerging global distributed network
    for universally available high-performance
    computing and networking.
  • Potentially very relevant for GBIF.
  • Architectures
  • Web Services or Open Grid Service Architecture?
  • Possible areas of activity
  • Semantic Grid might fit the taxonomic name
    service
  • Production of global distribution map under
    multiple global change scenarios could require
    computational capacities from the Grid.
  • Advanced collaborative environment (ACE is a Grid
    Research Group) is needed for accelerating
    species discovery and distributed authoring of
    the Species Bank

41
HOW TO PLAY?
  • After all, 90% of investment in GBIF should be
    within Participants, not centrally
  • Share your data. Anyone can apply to become a
    data node; the Participant nodes will coordinate
  • Use the data. Provide value-added services for
    data archiving, mining, analysis that build on
    the upcoming wealth of data.
  • Vertical portals and specialised search engines.
  • Contribute new data and knowledge.
  • Calls for proposals for seed money for important
    digitisation and other projects
  • GBIF builds on open source
  • Lots of room to provide tools
  • Contribute to standards refinement

42
SUMMARY
  • GBIF network to be up and running by end of 2003
  • New generation of simple data exchange standards
  • Central registry and marketplace of distributed
    data
  • Anyone can build their vertical portals or
    specialised search engines on top of that
  • Participant nodes: major role in quality control,
    coordination and dissemination
  • Data nodes: register your datasets, provide
    online access to database or repository
  • Data remains under the control of providers