Title: Biodiversity Data Retrieval and Integration: Distributed species, data, computation and credit
1Biodiversity Data Retrieval and Integration
Distributed species, data, computation and credit
- James H. Beach
- Biodiversity Research Center
- University of Kansas
- beach_at_ku.edu
2Museums and their Data
- 3 B specimens and data documenting the
distribution of life on Earth
- 2 M species
- 300 years of biological exploration
- Data are held in dynamic, autonomous,
self-organizing and spatially-distributed
collections
3Paris Museum Mexican Birds
4British Museum Mexican Birds
5Field Museum Mexican Birds
6KU Museum Mexican Birds
7World Museum Mexican Birds
8The Species Analyst Network
- Direct access to live primary data
- Ownership and control maintained locally
- Z39.50, HTTP, XML data, XML Query
[Diagram: a broadcast query sent to distributed data resources]
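The broadcast pattern on this slide can be sketched roughly as follows. This is a minimal illustration only: the providers are in-memory stand-ins rather than real Z39.50 or HTTP endpoints, and the provider names, the placeholder species names ("Aus bus", "Cus dus"), the coordinates and the functions `query_provider` and `broadcast_query` are all hypothetical, not part of the Species Analyst software.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote museum databases.
# Real providers would be reached over Z39.50 or HTTP; every name,
# species and coordinate below is invented for illustration.
PROVIDERS = {
    "museum_a": [
        {"species": "Aus bus", "lat": 19.4, "lon": -99.1},
        {"species": "Cus dus", "lat": 18.9, "lon": -99.2},
    ],
    "museum_b": [
        {"species": "Aus bus", "lat": 17.1, "lon": -96.7},
    ],
    "museum_c": [],
}


def query_provider(name, species):
    """Ask one provider for matching records; the data stay with the provider."""
    matches = [r for r in PROVIDERS[name] if r["species"] == species]
    # Tag each record with its source so credit stays with the owning collection.
    return [dict(r, provider=name) for r in matches]


def broadcast_query(species):
    """Send the same query to every provider in parallel and merge the answers."""
    names = sorted(PROVIDERS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_provider = pool.map(lambda n: query_provider(n, species), names)
        return [record for records in per_provider for record in records]


if __name__ == "__main__":
    for rec in broadcast_query("Aus bus"):
        print(rec)
```

Because each record carries its `provider` tag, the merged result can still be attributed to the collection that owns it.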
9Species Analyst HTML Gateway
10Results of Species Analyst Query
11GARP: Genetic Algorithm for Rule-set Production
- Developed by David Stockwell, San Diego
Supercomputer Center
- Takes advantage of multiple algorithms (BIOCLIM,
logistic regression, etc.)
- Different decision rules may apply to different
sectors of species distributions
- Uses a genetic algorithm for choosing rules
- Implemented on WWW, and open for public use
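To give a feel for what "a genetic algorithm for choosing rules" means, here is a toy sketch. It is not GARP: it evolves a single envelope rule (a temperature range, loosely in the spirit of BIOCLIM) over invented presence/absence samples, whereas GARP evolves a set of rules of several types. All data, function names and parameters below are assumptions made for illustration.

```python
import random

# Invented training data: (temperature in deg C, species present?).
SAMPLES = [(4, False), (9, False), (14, True), (17, True),
           (21, True), (24, True), (29, False), (33, False)]


def predict(rule, temp):
    """An envelope rule predicts presence when temp lies inside [low, high]."""
    low, high = rule
    return low <= temp <= high


def fitness(rule):
    """Fraction of samples the rule classifies correctly."""
    hits = sum(predict(rule, t) == present for t, present in SAMPLES)
    return hits / len(SAMPLES)


def random_rule(rng):
    a, b = rng.uniform(0, 35), rng.uniform(0, 35)
    return (min(a, b), max(a, b))


def crossover(mum, dad):
    """Child takes the lower bound of one parent and the upper bound of the other."""
    low, high = mum[0], dad[1]
    return (min(low, high), max(low, high))


def mutate(rule, rng, step=2.0):
    """Jitter both bounds with Gaussian noise, keeping low <= high."""
    low = rule[0] + rng.gauss(0, step)
    high = rule[1] + rng.gauss(0, step)
    return (min(low, high), max(low, high))


def evolve(generations=40, size=20, seed=0):
    rng = random.Random(seed)
    population = [random_rule(rng) for _ in range(size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        survivors = population[: size // 2]  # selection; the best rule always survives
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors)), rng)
            for _ in range(size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print("best rule:", best, "accuracy:", fitness(best))
```

Keeping the top half of each generation unchanged means the best accuracy found so far can never get worse from one generation to the next.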
12Species Analyst GARP: A Powerful Tool
- Integrates distributed biodiversity data
- Provides current information on species ranges
- Models species ecological niches
- Predicts geographic distributions
- Integrates niche models with environmental change
scenarios, e.g. global climate change and
biodiversity, invasive species, emerging diseases
13Asian Longhorn Beetle (Anoplophora glabripennis)
14Longhorn Beetle - Modeled Asian Distribution
15Asian Longhorn Beetle Predicted U.S.
Distribution
16A Global Encyclopedia of Life or The World
According to GARP
- Research
- Biogeographic analysis on distributions
- Invasive species predictions
- Monitoring and conservation planning
- Global climate change impacts on Biota
- Outreach, Education and Training
- Backyard biodiversity, spatial data queries, GIS
functions
- Interactive data entry, observational data
- Data Analysis Services for Museums
- Uniqueness and value of collections holdings
- Data quality issues
- Summary statistics and analyses
17A Global Encyclopedia of Life or The World
According to GARP (2)
- Every documented species with georeferenced
localities in the Species Analyst Network
- North America, Western Hemisphere, World
- Resolution: 1 km grid for North America, 10 km elsewhere
- 1 M species in collections with data?
- Computational Requirements
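A back-of-envelope estimate of the scale implied by these resolutions can be worked as below. The land areas are rounded figures assumed here, not numbers from the presentation, and the calculation treats every species as modeled over the full global grid, so it is a deliberately crude upper bound. The function names are hypothetical.

```python
# Assumed, rounded land areas in square kilometres (not from the slides).
NORTH_AMERICA_KM2 = 24_700_000
WORLD_LAND_KM2 = 149_000_000


def grid_cells(area_km2, cell_side_km):
    """Number of square cells of the given side length needed to tile an area."""
    return area_km2 // (cell_side_km ** 2)


def cells_per_species():
    """1 km cells for North America plus 10 km cells for the remaining land."""
    north_america = grid_cells(NORTH_AMERICA_KM2, 1)
    elsewhere = grid_cells(WORLD_LAND_KM2 - NORTH_AMERICA_KM2, 10)
    return north_america + elsewhere


if __name__ == "__main__":
    per_species = cells_per_species()
    species = 1_000_000  # the "1 M species" figure from the slide
    print(f"cells per species map: {per_species:,}")
    print(f"cell predictions for all species: {per_species * species:,}")
```

Under these assumptions the 1 km North American grid dominates the count, which suggests that resolution, more than geographic extent, drives the computational cost.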
18Metacomputing Museum Data
- Global species distributions: parallel
computation
- SETI@home
- Collaborative computing
- 1 M simultaneous users
- Port GARP to Win32 to run in background or
foreground
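The SETI@home-style arrangement above might be organized along these lines. This is only a sketch under stated assumptions: the `Coordinator` class, `volunteer_loop` and `run_model` are hypothetical names, the "model" is a placeholder string rather than a real GARP run, and everything happens in one process instead of over the Internet.

```python
from collections import Counter, deque


class Coordinator:
    """Hands out one species per work unit and records who completed it."""

    def __init__(self, species_names):
        self.pending = deque(species_names)
        self.results = {}
        self.credit = Counter()

    def next_unit(self):
        """Return the next species to model, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def submit(self, volunteer, species, model):
        """Store a finished model and credit the volunteer who computed it."""
        self.results[species] = model
        self.credit[volunteer] += 1


def run_model(species):
    """Placeholder for a long-running niche model such as a GARP run."""
    return f"model for {species}"


def volunteer_loop(coordinator, volunteer, max_units):
    """What a desktop client might do while the machine is otherwise idle."""
    for _ in range(max_units):
        species = coordinator.next_unit()
        if species is None:
            break
        coordinator.submit(volunteer, species, run_model(species))


if __name__ == "__main__":
    hub = Coordinator([f"species_{i}" for i in range(5)])
    volunteer_loop(hub, "alice", max_units=3)
    volunteer_loop(hub, "bob", max_units=3)
    print(hub.credit)
```

Tracking a per-volunteer count alongside the results is one simple way to support acknowledgement of contributions.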
19Lifemapper
- Georeferenced Species Data
- Distributed Query Architecture
- Predictive Modeling
- Distributed Computation
- Spatial Map and Model Archive
- Open Access Web Portal
20Lifemapper Demonstration
21Lifemapper Future Directions
- Diversify modeling options, add interactivity, 3D
analysis and visualization
- Add new classes of data layers, remote sensing,
human impacts element, ecological models
- Add observational species data
- Embed dispersion models, temporal dimension
- Add internet services API, UDDI, SOAP
- Add more value-added services for data providers
- Embed LM data and analysis tools within a
semantic research and decision support network
- Integrate LM into informal and formal science
education
22Lifemapper Social Scaling
- Distributed authorship
- Desktop computing
- User preferences
- Value-added collections data analysis
- Acknowledgement and accreditation of
contributions, ranks and statistics
23Museums as Sensor Networks
- Data are dynamic, as are servers and connections
- Deborah Estrin -- Adaptive self-organization of
the network, unattended and untethered --
parallels to curators and collection managers
- Self-assembling, observational data
- Do not usually have the requirement of real time
- Changes are as important as the values themselves
- Source data (West Nile virus), model outputs
- Frank Vernon mentioned that in many cases it is
not the data values per se but the change that
is of importance
- People as part of the network
- Doug Goodin: people are part of the technological
system. Museums are sensors, they are
observatories, but the latency of bringing the
data into analysis engines is not measured in
milliseconds but in field seasons, or decades to
get formal publication of new scientific
concepts. Many specimens and data are centuries
old.
24Acknowledgements
- University of Kansas
- Dave Vieglais, Ricardo Pereira, Aimee Stewart,
Greg Vorontsov, Town Peterson, BRC
- SDSC
- David Stockwell, Environmental Computing
- University of Massachusetts-Boston
- Bob Morris, CS, Rob Stevenson, Biology
- UC Berkeley
- John Wieczorek, Museum of Vertebrate Zoology
- Dan Werthimer, Space Sciences Laboratory
- Agriculture Canada
- Derek Munro, ITIS Canada Office
- California Academy of Sciences
- Stan Blum, Informatics