An Ecology of CENS Data Talk by Christine L. Borgman

1
An Ecology of CENS Data
Talk by Christine L. Borgman and Jillian C. Wallis
Demos by Matthew S. Mayernik and Alberto Pepe
  • CENS Friday Seminar
  • May 11th, 2007

2
An Ecology of CENS Data: Overview
  • Introduction
  • Interview Study Methods
  • Interview Results
  • Data Life Cycle
  • Functional Requirements for Infrastructure
  • Data Sharing and Trust
  • Building Out the Data Ecology
  • Demos
      • CENS Deployment Center
      • CENS eScholarship Repository

3
Information Infrastructure Challenges
  • Make scientific data available to
      • scientific community
      • educational community
  • Manage scientific data from sensors in multiple
    ways
      • streaming data
      • archived data
      • in situ sensor kits for short-term data
        collection
  • Serve multiple disciplines, problems and projects
      • habitat biology
      • seismology
      • contaminant transport
      • marine microbiology
  • Serve diverse user communities
      • scientists, domestic and international
      • grades 6-12 students and teachers, domestic and
        international

4
Interview Study: Research Questions
  • Research problem: CENS is committed to sharing
    data from our research
  • Research questions
      • What are CENS data?
      • What data are available to share?
      • Under what conditions do CENS researchers wish
        to release data?
  • Implications
      • What is an appropriate architecture for managing
        CENS data?
      • What are appropriate policies for balancing
        rights and responsibilities in access to CENS
        data?

5
Interview Study: Sample
Interviews                        Pilot  Terrestrial  Contam  Aquatic  Total
Application Scientists  Faculty     -         3          1       2        6
Application Scientists  Staff       -         1          1       1        3
Application Scientists  Students    -         1          1       1        3
Technology Scientists   Faculty     -         3          1       1        5
Technology Scientists   Staff       -         3          -       1        4
Technology Scientists   Students    2         1          -       -        3
Total                               2        12          4       6       24
Interviews averaged 60 minutes; 23 hours of tape; 312 pages of
transcriptions (not including pilot)
6
What are CENS Data?
Sensor Collected Proprioceptive Data, Sensor Collected Performance Data,
and Sensor Collected Application Data:
Conductivity, PAR, Awake time, Flow, Wind speed, Wind duration, Heading,
Fault detection, Water potential, Wind direction, Leaf wetness,
Neighbor table, Roll/pitch/yaw, Sap flow, Humidity, Soil moisture,
Bird calls, Packets transmitted, Water temp, Rainfall, Motor speed,
LandSat images, Mosscam, Packets received, ORP, CDOM, Rudder angle, pH,
GPS/location, Time, Calcium, Battery voltage, Water depth, Temperature,
CO2, Chloride, Routing table, Ammonium, Chlorophyll, Nitrate, Ammonia,
Phosphate
Hand Collected Application Data:
Organism presence, Mercury, Nutrient presence, Organism concentration,
Methylmercury, Nutrient concentration
7
What Data Exist to Release?
  • What are the data?
      • Sensor collected application data
      • Hand collected application data
      • Sensor collected performance data
      • Sensor collected proprioceptive data
  • What are the states of the data?
      • Raw data
      • Processed data
      • Verified data
      • Certified data
      • Models
      • Software algorithms
  • Where are the data?
      • Refrigerators
      • Hard copies
      • Computers of individual students, staff, faculty
      • Lab servers
      • On CENSWEB, SensorBase

8
Who Can Release What Data?
  • Who is the owner of a dataset?
      • The funding or supporting institution
      • The principal investigator
      • Anyone with any intellectual contribution
      • Don't know/haven't considered
  • Who will take responsibility for the data?

9
Conditions for Releasing Data
  • No restrictions: will post all data immediately
    for use by anyone
  • Will release data only in specific states
      • Raw data
      • Data with a certain level of processing
      • Certified data
  • Will release upon request
      • To anyone, no restrictions
      • If commercial reuse, share and share alike
      • To anyone, provided source is acknowledged or
        cited
      • If co-authorship credit given for providing the
        data
      • If research questions do not compete with ours
  • Will release after an embargo period
      • After data are published
      • After we've finished mining the data
      • Time period, e.g., 3-5 yrs
  • It depends

10
Integrating Social Concerns with Data Architecture
  • Develop metadata to capture social aspects of
    sharing
      • Current state: raw, verified, processed, etc.
      • Releasable state: raw, verified, processed, etc.
      • Embargo period: none, published, time period, etc.
      • Allowable uses: all, education/public,
        not-for-profit, etc.
      • Request requirements: none, cite/ack,
        co-authorship, etc.
      • Public health certified: yes, no, not required
  • Develop policy to support data management and
    release
  • Encourage discussion of release restrictions
  • Encourage regular ethics discussion regarding the
    responsible use of data
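
The metadata fields listed on this slide could be carried alongside each dataset as a small record. The following is a minimal sketch, not the CENS schema: the class name `SharingMetadata`, its field names, and the `can_release` rule are all assumptions made for illustration, with example values taken from the slide.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class SharingMetadata:
    """Hypothetical record of the social aspects of sharing one dataset."""
    current_state: str                   # e.g. "raw", "verified", "processed"
    releasable_states: FrozenSet[str]    # states in which release is allowed
    embargo: str = "none"                # "none", "published", or "time period"
    embargo_ends: Optional[date] = None  # used only when embargo == "time period"
    allowable_uses: str = "all"          # e.g. "education/public", "not-for-profit"
    request_requirements: str = "none"   # e.g. "cite/ack", "co-authorship"
    public_health_certified: str = "not required"  # "yes", "no", "not required"

    def can_release(self, today: date, published: bool = False) -> bool:
        """Return True when both the state and the embargo allow release."""
        if self.current_state not in self.releasable_states:
            return False
        if self.embargo == "published":
            return published
        if self.embargo == "time period":
            return self.embargo_ends is not None and today >= self.embargo_ends
        return True


record = SharingMetadata(
    current_state="verified",
    releasable_states=frozenset({"verified", "processed"}),
    embargo="time period",
    embargo_ends=date(2010, 1, 1),
    request_requirements="cite/ack",
)
print(record.can_release(date(2009, 6, 1)))
print(record.can_release(date(2010, 1, 1)))
```

Keeping the release rule next to the metadata means a repository could answer "may this be shared today?" without consulting the data owner for every request.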

11
Potential User Communities for CENS Data
Data: Aquatic Microbial Data, Terrestrial Ecology Data, Contaminant
Transport Data
User communities: Avian Biology, Fresh H2O Biology, Climatology,
Architecture, Limnology, Entomology, Meta-Analysis, Pattern Recognition,
Fresh H2O Ecology, Government, Exobiology, CS, Public, Insurance,
Soil Chemistry, Robotics, Public Health, GIS, EE, Soil Ecology,
Regulatory Agencies, Mercury Research, Water Flow Modeling,
Arsenic Research
12
Data Life Cycle
13
Functional Requirements Overview
  1. Obtain and maintain data in the field
  2. Verify data in the field
  3. Document data sufficiently for interpretation
  4. Integrate data from multiple sources
  5. Analyze data
  6. Preserve the data

14
1. Obtain and Maintain Data in the Field
  • "I was just storing it locally on this laptop
    because I did not have network access... for two
    weeks, during the entire deployment."

15
2. Verify Data in the Field
  • "our pre-imposed calibration curves are pretty
    different from one another, so there will be some
    debate about whether we use the pre or the post
    or the average, or whether there's something we
    can do to measure how fast it's changing."
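
One way to act on "how fast it's changing" is to interpolate between the pre- and post-deployment calibrations in time. This is only a sketch under two assumptions that the talk does not state: that each calibration is a linear curve `value = slope * raw + offset`, and that the drift between the two calibrations is linear. The function names are hypothetical.

```python
def interpolate_calibration(pre, post, t_pre, t_post, t):
    """Return the (slope, offset) estimated for time t.

    pre and post are (slope, offset) pairs measured at times t_pre and
    t_post. Times outside that interval are clamped to the nearest end.
    """
    if t_post <= t_pre:
        raise ValueError("post-calibration must come after pre-calibration")
    weight = (t - t_pre) / (t_post - t_pre)
    weight = min(1.0, max(0.0, weight))
    return tuple(a + weight * (b - a) for a, b in zip(pre, post))


def drift_rate(pre, post, t_pre, t_post):
    """Return the change in (slope, offset) per unit of time."""
    elapsed = t_post - t_pre
    return tuple((b - a) / elapsed for a, b in zip(pre, post))


def apply_calibration(raw, calibration):
    """Convert a raw reading with a (slope, offset) calibration."""
    slope, offset = calibration
    return slope * raw + offset


# Hypothetical numbers: a 10-day deployment, reading taken on day 5.
pre, post = (1.0, 0.0), (1.2, 0.5)
calibration = interpolate_calibration(pre, post, 0.0, 10.0, 5.0)
print(calibration)
print(apply_calibration(100.0, calibration))
```

At the midpoint of the deployment the interpolated curve equals the average of the pre and post curves, so "use the average" is the special case of this approach for a single mid-deployment reading.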

16
3. Documenting Data for Interpretation
  • "Temperature is temperature."
  • "There are hundreds of ways to measure
    temperature. 'The temperature is 98' is low-value
    compared to, 'the temperature of the surface,
    measured by the infrared thermopile, model number
    XYZ, is 98.' That means it is measuring a proxy
    for a temperature, rather than being in contact
    with a probe, and it is measuring from a
    distance. The accuracy is plus or minus .05 of a
    degree. I also want to know that it was taken
    outside versus inside a controlled environment,
    how long it had been in place, and the last time
    it was calibrated, which might tell me whether it
    has drifted."
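
The context requested in the second quote can be captured as fields stored with every reading. The record below is a hedged illustration: the class name `DocumentedReading`, its field names, and the dates are invented for the example, and the quote gives no units, so none are recorded here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class DocumentedReading:
    """Hypothetical reading that carries the context the quote asks for."""
    value: float
    quantity: str          # what is reported, e.g. "surface temperature"
    instrument: str        # e.g. "infrared thermopile, model XYZ"
    contact: bool          # False when measured from a distance (a proxy)
    accuracy: float        # plus or minus, in the same units as value
    environment: str       # "outside" or "controlled"
    installed: date        # how long the sensor has been in place
    last_calibrated: date  # hints at whether the sensor has drifted

    def days_since_calibration(self, on: date) -> int:
        """Return the number of days between the last calibration and `on`."""
        return (on - self.last_calibrated).days

    def describe(self) -> str:
        """Return a one-line description that includes the context."""
        method = "contact" if self.contact else "remote (proxy)"
        return (f"{self.quantity} = {self.value:g} +/- {self.accuracy:g}, "
                f"measured by {self.instrument}, {method}, {self.environment}")


reading = DocumentedReading(
    value=98.0,
    quantity="surface temperature",
    instrument="infrared thermopile, model XYZ",
    contact=False,
    accuracy=0.05,
    environment="outside",
    installed=date(2007, 4, 1),        # hypothetical date
    last_calibrated=date(2007, 3, 1),  # hypothetical date
)
print(reading.describe())
print(reading.days_since_calibration(date(2007, 5, 11)))
```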

17
4. Integration of Data from Multiple Sources
  • "synching all of the sensors is a chore but it's
    still a concern... because I've received data sets
    that I'm sure are not synched properly."
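
A simple check for the synchronization concern is to pair each timestamp in one stream with the nearest timestamp in another and look at the leftover differences. This is a sketch, not a CENS tool: the function names and the tolerance are assumptions, and timestamps are taken to be plain numbers (for example, seconds) with the second stream already sorted.

```python
from bisect import bisect_left


def align_nearest(times_a, times_b, tolerance):
    """For each time in times_a, return the index of the nearest time in
    the sorted list times_b, or None when nothing lies within tolerance."""
    matches = []
    for t in times_a:
        i = bisect_left(times_b, t)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times_b)]
        best = min(candidates, key=lambda j: abs(times_b[j] - t), default=None)
        if best is not None and abs(times_b[best] - t) <= tolerance:
            matches.append(best)
        else:
            matches.append(None)
    return matches


def median_offset(times_a, times_b, matches):
    """Return the median of (b - a) over matched pairs, or None if there
    are no matches. A value far from zero hints at a clock offset."""
    diffs = sorted(times_b[j] - t for t, j in zip(times_a, matches) if j is not None)
    if not diffs:
        return None
    mid = len(diffs) // 2
    if len(diffs) % 2:
        return diffs[mid]
    return (diffs[mid - 1] + diffs[mid]) / 2


stream_a = [0, 10, 20, 30]
stream_b = [2, 12, 22, 50]
pairs = align_nearest(stream_a, stream_b, tolerance=5)
print(pairs)
print(median_offset(stream_a, stream_b, pairs))
```

Unmatched readings and a consistent nonzero offset are both worth recording with the dataset, since a later user cannot detect them from the values alone.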

18
5. Analysis of Data
  • New technologies -- new questions
      • "I started thinking about how this was different,
        and could let me ask different questions."
  • Statistical tools will not scale
      • "I also knew that we were going to get data in a
        magnitude that I just could not analyze with all
        the normal tools that I use."
  • Different granularity of use
      • "Some people want to see a whole week's worth of
        data averaged, on a daily basis, on a monthly
        basis. Some people want to see day-by-day,
        hour-by-hour, minute-by-minute. They want to see
        the pattern. It varies depending on the question
        that you are asking, and the data analysis might
        be vastly different."
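
The granularity point can be illustrated by averaging the same readings at several time resolutions. This is a minimal sketch using only the standard library; the function name `average_by`, the set of granularities, and the sample readings are made up for the example.

```python
from collections import defaultdict
from datetime import datetime

# Each granularity maps to the timestamp prefix used as the bucket key.
FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def average_by(readings, granularity):
    """Average (timestamp, value) pairs within each time bucket."""
    fmt = FORMATS[granularity]
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.strftime(fmt)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


readings = [
    (datetime(2007, 5, 11, 9, 0), 10.0),
    (datetime(2007, 5, 11, 9, 30), 20.0),
    (datetime(2007, 5, 11, 10, 0), 30.0),
    (datetime(2007, 5, 12, 9, 0), 40.0),
]
for level in ("hour", "day", "month"):
    print(level, average_by(readings, level))
```

Because every coarser view is computed from the same raw readings, keeping data at the finest resolution leaves all of these questions open to later users.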

19
6. Preservation of Data
  • "If the data has been quality controlled and
    error checked, it is more valuable and something
    that we would want to preserve in perpetuity as
    opposed to a goofy data set that we end up
    dumping."

20
Traditional Data Sharing Exchange
1. Publishing
2. Reading
3. Initial Request
4. Specification Negotiations
5. Reformat, Clean, Label
Data Provider
6. Sent To
Data Requestor
21
Transference of Trust with Data
Communication
Data Requestor
Data Provider
Devices and Methods
Publication
Data
22
What Is Needed for Trust in Reuse



Data Provider
Devices and Methods
Data
Publication
Trust Reuse
23
Fulfilling the Data Life Cycle
Deployment Plans
Publications
Datasets
24
Sensorbase.org
  • Borrows heavily from Web 2.0 applications like
    blogging that invite user-generated content
  • Each project has a standard page: maps location of
    sensors, provides access to project notes and
    lists of available data
  • Data "slogged" in CSV or XML (SensorML, EML on the
    way)
  • Underlying schema designed for CENS apps
  • Users access data through web interface or SOAP
    web services
  • Developing: Signal search, Syndication,
    Computational tools
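
To make the CSV and XML options concrete, the sketch below writes the same readings in both forms. It does not reproduce the SensorBase schema, SensorML, or EML: the column names, element names, and sample values are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical columns; the real SensorBase schema is not shown in the talk.
FIELDS = ["timestamp", "sensor_id", "quantity", "value"]


def to_csv(readings):
    """Serialize a list of reading dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(readings)
    return buffer.getvalue()


def to_xml(readings):
    """Serialize the same readings as a simple, ad hoc XML document."""
    root = ET.Element("readings")
    for reading in readings:
        element = ET.SubElement(root, "reading")
        element.set("timestamp", reading["timestamp"])
        element.set("sensor_id", reading["sensor_id"])
        element.set("quantity", reading["quantity"])
        element.text = str(reading["value"])
    return ET.tostring(root, encoding="unicode")


readings = [
    {"timestamp": "2007-05-11T09:00:00Z", "sensor_id": "node-1",
     "quantity": "temperature", "value": 21.5},
    {"timestamp": "2007-05-11T09:05:00Z", "sensor_id": "node-1",
     "quantity": "temperature", "value": 21.7},
]
print(to_csv(readings))
print(to_xml(readings))
```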

25
The CENS Deployment Center
  • A multi-purpose web-tool and service for
      • pre-deployment planning
      • post-deployment knowledge transfer
  • Centralized access to past, current, and future
    CENS deployments
  • Searchable by highly-structured metadata
  • http://schnauzer.ats.ucla.edu/censdc

26
Deployment Center Flow
Create New Plan
Make Plan Public
Leader
Deploy!
Fill Out Evaluations
Make Report Public
Participants
Leader
27
The CENS eScholarship repository
  • An OAI-compliant collection of pre-prints,
    papers, technical reports, presentations
    and links to datasets
  • Allows for the aggregation of other ENS
    publications through metadata
  • http://repositories.cdlib.org/cens/

28
eScholarship Repository Benefits
29
Future Work: More Complete Data Ecology
Sensors
Deployment Data
People
Publications
Sensorbase
eScholarship Repository
Sensor Registry
CENS Directory
Deployment Center
30
Object Reuse and Exchange (ORE) Initiative
  • Aims for ORE
      • Augment cross-repository interoperability reached
        by Open Archives Initiative (OAI)
      • Build an interoperable fabric for scholarly value
        chains
      • Create a repository-centric scholarly
        communication system
  • Who is involved
      • Led by the Los Alamos National Lab, supported by
        Mellon Foundation
      • In collaboration with Microsoft, CNI, JISC and
        the Digital Library Federation
  • Further reading
      • Herbert Van de Sompel et al. An Interoperable
        Fabric for Scholarly Value Chains. D-Lib
        Magazine, 12(10), 2006.

31
A Fabric for Interoperability
  • Need to make heterogeneous data available for
    sensor communities
      • Including datasets, publications, videos,
        software, ...
  • Scholarly communication as a value chain of
    digital objects in repositories
  • Achieve interoperability via
      • A shared data model for digital objects
      • A shared surrogate format to represent digital
        objects across the infrastructure
      • A shared protocol to obtain, harvest, put
        surrogates
32
A Fabric for Interoperability -- Model
Services
Deployment Center
Content
SensorBase
eScholarship Repository
33
Creating a Compound Data Object
Deployment Center
Sensorbase
eScholarship Repository
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example ORE compound object</title>
  <link rel="self" type="application/atomore" href="http://cens.ucla.edu/repository/ore1"/>
  <link rel="aboutDOI">DOI:some-resource</link>
  <updated>2003-12-13T18:30:02Z</updated>
  <generator uri="http://www.cens.ucla.edu/ore" version="1.0">CENSORE</generator>
  <author><name>Alberto Pepe</name></author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>First resource</title>
    <link rel="alternate" type="pdf" href="http://cens.ucla.edu/escholarship/1"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
  <entry>
    <title>Second resource</title>
    <link rel="alternate" type="text/html" href="http://sensorbase.org/1"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efsss</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>
Dep Plan
Pub
Data
Dep Plan
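
A compound object serialized as an Atom feed, as on this slide, can be read back with a standard XML parser to list its component resources. The sketch below uses Python's `xml.etree.ElementTree`; the shortened feed reuses the titles, types, and URLs from the slide, and the function name `list_resources` is an assumption.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree form

# Shortened version of the example feed, kept to the parts read below.
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example ORE compound object</title>
  <entry>
    <title>First resource</title>
    <link rel="alternate" type="pdf" href="http://cens.ucla.edu/escholarship/1"/>
  </entry>
  <entry>
    <title>Second resource</title>
    <link rel="alternate" type="text/html" href="http://sensorbase.org/1"/>
  </entry>
</feed>
"""


def list_resources(feed_xml):
    """Return (title, type, href) for every entry in an Atom feed."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    resources = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        link = entry.find(ATOM + "link")
        resources.append((title, link.get("type"), link.get("href")))
    return resources


for title, media_type, href in list_resources(FEED):
    print(title, media_type, href)
```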
34
Acknowledgements and Thanks
  • CENS is funded by National Science Foundation
    Cooperative Agreement CCR-0120778, Deborah L.
    Estrin, UCLA, Principal Investigator; Christine
    L. Borgman is a co-Principal Investigator.
    CENSEI, under which much of this research was
    conducted, is funded by National Science
    Foundation grant ESI-0352572, William A.
    Sandoval, Principal Investigator and Christine L.
    Borgman, co-Principal Investigator. Alberto
    Pepe's participation in this research is
    supported by a gift from the Microsoft Technical
    Computing Initiative. SensorBase research in CENS
    is led by Mark Hansen and Nathan Yau. Support for
    CENS bibliographic database development is
    provided by Christina Patterson and Margo Reveil
    of UCLA Academic Technology Services.