Title: An Ecology of CENS Data Talk by Christine L. Borgman
1. An Ecology of CENS Data
Talk by Christine L. Borgman and Jillian C. Wallis
Demos by Matthew S. Mayernik and Alberto Pepe
- CENS Friday Seminar
- May 11th, 2007
2. An Ecology of CENS Data: Overview
- Introduction
- Interview Study Methods
- Interview Results
- Data Life Cycle
- Functional Requirements for Infrastructure
- Data Sharing Trust
- Building Out the Data Ecology
- Demos
- CENS Deployment Center
- CENS eScholarship Repository
3. Information Infrastructure Challenges
- Make scientific data available to
  - scientific community
  - educational community
- Manage scientific data from sensors in multiple ways
  - streaming data
  - archived data
  - in situ sensor kits for short-term data collection
- Serve multiple disciplines, problems, and projects
  - habitat biology
  - seismology
  - contaminant transport
  - marine microbiology
- Serve diverse user communities
  - scientists, domestic and international
  - grades 6-12 students and teachers, domestic and international
4. Interview Study: Research Questions
- Research problem: CENS is committed to sharing data from our research
- Research questions
  - What are CENS data?
  - What data are available to share?
  - Under what conditions do CENS researchers wish to release data?
- Implications
  - What is an appropriate architecture for managing CENS data?
  - What are appropriate policies for balancing rights and responsibilities in access to CENS data?
5. Interview Study: Sample
Interviews (columns: Pilot, Terrestrial, Contam, Aquatic; Total)
Application Scientists, Faculty: 3, 1, 2; total 6
Application Scientists, Staff: 1, 1, 1; total 3
Application Scientists, Students: 1, 1, 1; total 3
Technology Scientists, Faculty: 3, 1, 1; total 5
Technology Scientists, Staff: 3, 1; total 4
Technology Scientists, Students: 2, 1; total 3
Total: 2, 12, 4, 6; total 24
Interviews averaged 60 minutes; 23 hours of tape; 312 pages of transcriptions (not including pilot).
6. What are CENS Data?
Sensor Collected Proprioceptive Data
Sensor Collected Performance Data
Conductivity
PAR
Awake time
Flow
Wind speed
Wind duration
Heading
Fault detection
Water potential
Wind direction
Leaf wetness
Neighbor table
Roll/pitch/yaw
Sap flow
Humidity
Soil moisture
Bird calls
Packets transmitted
Water temp
Rainfall
Motor speed
LandSat images
Mosscam
Packets received
ORP
CDOM
Rudder angle
pH
GPS/location
Time
Calcium
Battery voltage
Water depth
Temperature
CO2
Chloride
Routing table
Ammonium
Chlorophyll
Nitrate
Ammonia
Phosphate
Sensor Collected Application Data
Hand Collected Application Data
Organism presence
Mercury
Nutrient presence
Organism concentration
Methylmercury
Nutrient concentration
7. What Data Exist to Release?
- What are the data?
  - Sensor collected application data
  - Hand collected application data
  - Sensor collected performance data
  - Sensor collected proprioceptive data
- What are the states of the data?
  - Raw data
  - Processed data
  - Verified data
  - Certified data
  - Models
  - Software algorithms
- Where are the data?
  - Refrigerators
  - Hard copies
  - Computers of individual students, staff, faculty
  - Lab servers
  - On CENSWEB, SensorBase
8. Who Can Release What Data?
- Who is the owner of a dataset?
  - The funding or supporting institution
  - The principal investigator
  - Anyone with any intellectual contribution
  - Don't know/haven't considered
- Who will take responsibility for the data?
9. Conditions for Releasing Data
- No restrictions: will post all data immediately for use by anyone
- Will release data only in specific states
  - Raw data
  - Data with a certain level of processing
  - Certified data
- Will release upon request
  - To anyone, no restrictions
  - If commercial reuse, share and share alike
  - To anyone, provided source is acknowledged or cited
  - If co-authorship credit given for providing the data
  - If research questions do not compete with ours
- Will release after an embargo period
  - After data are published
  - After we've finished mining the data
  - Time period, e.g., 3-5 yrs
- It depends
10. Integrating Social Concerns with Data Architecture
- Develop metadata to capture social aspects of sharing
  - Current state: raw, verified, processed, etc.
  - Releasable state: raw, verified, processed, etc.
  - Embargo period: none, published, time period, etc.
  - Allowable uses: all, education/public, not-for-profit, etc.
  - Request requirements: none, cite/ack, co-authorship, etc.
  - Public health certified: yes, no, not required
- Develop policy to support data management and release
  - Encourage discussion of release restrictions
  - Encourage regular ethics discussion regarding the responsible use of data
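The sharing metadata listed on this slide could be carried alongside each dataset as a small record. The following is a minimal sketch only: the field names, the ordering of data states, and the `may_release` rule are assumptions made for illustration, not a CENS schema or policy.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed ordering of data states, taken from the "states of the data" list;
# the talk does not say the states form a strict sequence.
STATES = ["raw", "processed", "verified", "certified"]


@dataclass
class SharingMetadata:
    """Hypothetical record of the social aspects of sharing one dataset."""
    current_state: str              # e.g. "raw", "verified"
    releasable_state: str           # minimum state the owner will release
    embargo_until: Optional[date]   # None means no embargo
    allowable_uses: frozenset       # e.g. {"all"} or {"education/public"}
    request_requirement: str        # "none", "cite/ack", or "co-authorship"


def may_release(meta: SharingMetadata, use: str, today: date) -> bool:
    """Return True if the dataset can be released for `use` on `today`."""
    if STATES.index(meta.current_state) < STATES.index(meta.releasable_state):
        return False  # data have not reached the releasable state yet
    if meta.embargo_until is not None and today < meta.embargo_until:
        return False  # still under embargo
    return "all" in meta.allowable_uses or use in meta.allowable_uses


meta = SharingMetadata(
    current_state="verified",
    releasable_state="processed",
    embargo_until=date(2008, 1, 1),   # made-up embargo date
    allowable_uses=frozenset({"education/public"}),
    request_requirement="cite/ack",
)
print(may_release(meta, "education/public", date(2008, 6, 1)))
```

A record like this keeps the release conditions machine-readable, so a repository could filter requests automatically while the policy discussion continues.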
11. Potential User Communities for CENS Data
Avian Biology
Fresh H2O Biology
Aquatic Microbial Data
Terrestrial Ecology Data
Climatology
Architecture
Limnology
Entomology
Meta-Analysis
Pattern Recognition
Fresh H2O Ecology
Government
Exobiology
CS
Public
Insurance
Soil Chemistry
Robotics
Public Health
GIS
EE
Soil Ecology
Regulatory Agencies
Mercury Research
Contaminant Transport Data
Water Flow Modeling
Arsenic Research
12. Data Life Cycle
13. Functional Requirements: Overview
- Obtain and maintain data in the field
- Verify data in the field
- Document data sufficiently for interpretation
- Integrate data from multiple sources
- Analyze data
- Preserve the data
14. Requirement 1: Obtain and Maintain Data in the Field
- "I was just storing it locally on this laptop because I did not have network access... for two weeks, during the entire deployment..."
15. Requirement 2: Verify Data in the Field
- "our pre- and post-calibration curves are pretty different from one another, so there will be some debate about whether we use the pre or the post or the average, or whether there's something we can do to measure how fast it's changing."
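The choice described in this quote (pre, post, average, or something that tracks how fast the sensor is changing) can be written as one interpolation. This is a sketch under stated assumptions: a linear calibration (gain and offset) and drift that is linear in time. Neither assumption comes from the talk, and the names `Calibration`, `blend`, and `calibrate` are hypothetical.

```python
from typing import NamedTuple


class Calibration(NamedTuple):
    """Assumed linear calibration: calibrated = gain * raw + offset."""
    gain: float
    offset: float


def blend(pre: Calibration, post: Calibration, fraction: float) -> Calibration:
    """Interpolate between the pre- and post-deployment calibrations.

    fraction = 0.0 gives the pre curve, 1.0 the post curve,
    and 0.5 the plain average of the two.
    """
    return Calibration(
        pre.gain + fraction * (post.gain - pre.gain),
        pre.offset + fraction * (post.offset - pre.offset),
    )


def calibrate(raw, t, t_start, t_end, pre, post):
    """Calibrate a raw reading taken at time t, assuming linear drift
    between the start and end of the deployment."""
    fraction = (t - t_start) / (t_end - t_start)
    cal = blend(pre, post, fraction)
    return cal.gain * raw + cal.offset


# Made-up calibration values for illustration only.
pre = Calibration(gain=1.0, offset=0.0)
post = Calibration(gain=1.2, offset=1.0)
print(calibrate(10.0, t=50, t_start=0, t_end=100, pre=pre, post=post))
```

Whether linear drift is a fair model is exactly the debate the interviewee anticipates; the sketch only shows that the three options are points on the same line.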
16. Requirement 3: Documenting Data for Interpretation
- "Temperature is temperature."
- "There are hundreds of ways to measure temperature. 'The temperature is 98' is low-value compared to 'the temperature of the surface, measured by the infrared thermopile, model number XYZ, is 98.' That means it is measuring a proxy for a temperature, rather than being in contact with a probe, and it is measuring from a distance. The accuracy is plus or minus .05 of a degree. I also want to know that it was taken outside versus inside a controlled environment, how long it had been in place, and the last time it was calibrated, which might tell me whether it has drifted..."
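The context this interviewee asks for can be attached to each reading as a record. A minimal sketch follows; the field names are hypothetical, and the dates and unit label in the example are invented because the quote does not give them.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Measurement:
    """One reading plus the documentation needed to interpret it."""
    value: float
    quantity: str              # what is being measured
    unit: str
    instrument: str            # device type and model
    method: str                # "contact" or "remote" (proxy) measurement
    accuracy: float            # plus or minus, in the same unit
    environment: str           # "outside" or "controlled"
    deployed_at: datetime      # when the sensor was put in place
    last_calibrated: datetime
    taken_at: datetime

    def days_since_calibration(self) -> float:
        """Age of the calibration when the reading was taken, a rough
        indicator of how much the sensor may have drifted."""
        delta = self.taken_at - self.last_calibrated
        return delta.total_seconds() / 86400


reading = Measurement(
    value=98.0,
    quantity="surface temperature",
    unit="degrees (scale not stated in the quote)",
    instrument="infrared thermopile, model XYZ",
    method="remote",
    accuracy=0.05,
    environment="outside",
    deployed_at=datetime(2007, 4, 1),        # made-up date
    last_calibrated=datetime(2007, 4, 11),   # made-up date
    taken_at=datetime(2007, 5, 11),          # made-up date
)
print(reading.days_since_calibration())
```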
17. Requirement 4: Integration of Data from Multiple Sources
- "synching all of the sensors is a chore but it's still a concern... because I've received data sets that I'm sure are not synched properly."
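One common way to integrate two sensor streams is to correct a known clock offset and then pair readings whose timestamps fall within a tolerance. The sketch below illustrates that general technique; it is not a CENS tool, and the function name, the offset, and the tolerance are assumptions.

```python
from bisect import bisect_left


def align(reference, other, offset=0.0, tolerance=1.0):
    """Pair each reference reading with the nearest reading from `other`.

    reference, other: lists of (timestamp_seconds, value), sorted by time.
    offset: seconds added to `other` timestamps to correct its clock
            (assumed to be known, e.g. from a shared time source).
    Reference readings with no partner within `tolerance` seconds are dropped.
    """
    times = [t + offset for t, _ in other]
    pairs = []
    for t, ref_value in reference:
        i = bisect_left(times, t)
        # The nearest timestamp is either just before or just after position i.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda k: abs(times[k] - t))
        if abs(times[j] - t) <= tolerance:
            pairs.append((t, ref_value, other[j][1]))
    return pairs


# Made-up streams: the second sensor's clock runs 2.5 seconds ahead.
temperature = [(0, 20.1), (10, 20.4), (20, 20.9)]
humidity = [(2.5, 55), (12.4, 57), (40, 60)]
print(align(temperature, humidity, offset=-2.5, tolerance=1.0))
```

Readings that cannot be paired are dropped rather than guessed, which makes improperly synched data sets visible as gaps instead of silent errors.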
18. Requirement 5: Analysis of Data
- New technologies -- new questions
  - "I started thinking about how this was different, and could let me ask different questions."
- Statistical tools will not scale
  - "I also knew that we were going to get data in a magnitude that I just could not analyze with all the normal tools that I use."
- Different granularity of use
  - "Some people want to see a whole week's worth of data averaged, on a daily basis, on a monthly basis. Some people want to see day-by-day, hour-by-hour, minute-by-minute. They want to see the pattern. It varies depending on the question that you are asking, and the data analysis might be vastly different."
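The different granularities mentioned in the last quote amount to averaging the same readings over different time buckets. A minimal sketch using only the standard library follows; the function name and the sample readings are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Bucket keys are built by truncating the timestamp to the chosen granularity.
FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
}


def average_by(readings, granularity):
    """Average (timestamp, value) readings per minute, hour, day, or month."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.strftime(FORMATS[granularity])].append(value)
    return {key: mean(values) for key, values in sorted(buckets.items())}


# Made-up readings for illustration.
readings = [
    (datetime(2007, 5, 11, 9, 0), 10.0),
    (datetime(2007, 5, 11, 9, 30), 20.0),
    (datetime(2007, 5, 11, 10, 0), 30.0),
    (datetime(2007, 5, 12, 9, 0), 50.0),
]
print(average_by(readings, "hour"))
print(average_by(readings, "day"))
```

Keeping the raw readings and computing averages on demand lets each user pick the granularity that fits their question.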
19. Requirement 6: Preservation of Data
- "If the data has been quality controlled and error checked, it is more valuable and something that we would want to preserve in perpetuity as opposed to a goofy data set that we end up dumping."
20. Traditional Data Sharing Exchange
1. Publishing
2. Reading
3. Initial Request
4. Specification Negotiations
5. Reformat, Clean, Label
Data Provider
6. Sent To
Data Requestor
21. Transference of Trust with Data
Communication
Data Requestor
Data Provider
Devices and Methods
Publication
Data
22. What Is Needed for Trust in Reuse
Data Provider
Devices and Methods
Data
Publication
Trust Reuse
23. Fulfilling the Data Life Cycle
Deployment Plans
Publications
Datasets
24. Sensorbase.org
- Borrows heavily from Web 2.0 applications like blogging that invite user-generated content
- Each project has a standard page: maps location of sensors, provides access to project notes and lists of available data
- Data slogged in CSV or XML (SensorML, EML on the way)
- Underlying schema designed for CENS apps
- Users access data through web interface or SOAP web services
- Developing: signal search, syndication, computational tools
25. The CENS Deployment Center
- A multi-purpose web tool and service for
  - pre-deployment planning
  - post-deployment knowledge transfer
- Centralized access to past, current, and future CENS deployments
- Searchable by highly structured metadata
- http://schnauzer.ats.ucla.edu/censdc
26. Deployment Center Flow
Create New Plan
Make Plan Public
Leader
Deploy!
Fill Out Evaluations
Make Report Public
Participants
Leader
27. The CENS eScholarship Repository
- An OAI-compliant collection of pre-prints, papers, technical reports, presentations, and links to datasets
- Allows for the aggregation of other ENS publications through metadata
- http://repositories.cdlib.org/cens/
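OAI compliance means a harvester can request the repository's metadata over OAI-PMH. The sketch below builds a standard `ListRecords` request. The `verb`, `metadataPrefix`, and `set` parameters are part of the protocol, but the base URL and set name are placeholders: the talk does not give the repository's OAI endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url: the repository's OAI-PMH endpoint (placeholder in this sketch).
    metadata_prefix: "oai_dc" (Dublin Core) is the format every OAI-PMH
                     repository is required to support.
    set_spec: optional set name to restrict harvesting to one collection.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
print(list_records_url("http://example.org/oai", set_spec="cens"))
```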
28. eScholarship Repository Benefits
29. Future Work: More Complete Data Ecology
Sensors
Deployment Data
People
Publications
Sensorbase
eScholarship Repository
Sensor Registry
CENS Directory
Deployment Center
30. Object Reuse and Exchange (ORE) Initiative
- Aims for ORE
  - Augment cross-repository interoperability reached by Open Archives Initiative (OAI)
  - Build an interoperable fabric for scholarly value chains
  - Create a repository-centric scholarly communication system
- Who is involved
  - Led by the Los Alamos National Lab, supported by Mellon Foundation
  - In collaboration with Microsoft, CNI, JISC and the Digital Library Federation
- Further reading
  - Herbert Van de Sompel et al. An Interoperable Fabric for Scholarly Value Chains. D-Lib Magazine, 12(10), 2006.
31. A Fabric for Interoperability
- Need to make heterogeneous data available for sensor communities
  - Including datasets, publications, videos, software, ...
- Scholarly communication as a value chain of digital objects in repositories
- Achieve interoperability via
  - A shared data model for digital objects
  - A shared surrogate format to represent digital objects across the infrastructure
  - A shared protocol to obtain, harvest, put surrogates
32. A Fabric for Interoperability -- Model
Services
Deployment Center
Content
SensorBase
eScholarship Repository
33. Creating a Compound Data Object
Deployment Center
Sensorbase
eScholarship Repository
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example ORE compound object</title>
  <link rel="self" type="application/atomore" href="http://cens.ucla.edu/repository/ore1"/>
  <link rel="aboutDOI">DOI:some-resource</link>
  <updated>2003-12-13T18:30:02Z</updated>
  <generator uri="http://www.cens.ucla.edu/ore" version="1.0">CENSORE</generator>
  <author><name>Alberto Pepe</name></author>
  <id>urn:uuid:60a76c80-d399-11d9-b93C-0003939e0af6</id>
  <entry>
    <title>First resource</title>
    <link rel="alternate" type="pdf" href="http://cens.ucla.edu/escholarship/1"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
  <entry>
    <title>Second resource</title>
    <link rel="alternate" type="text/html" href="http://sensorbase.org/1"/>
    <id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efsss</id>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>
Dep Plan
Pub
Data
Dep Plan
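A client could read a compound object like the one on this slide with any Atom-aware XML parser. The following sketch uses Python's standard library on a shortened copy of the example feed. It only illustrates how the aggregated resources can be listed; the function name is hypothetical and this is not the ORE or CENS implementation.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree notation

# Shortened copy of the slide's example feed (ids and dates omitted).
FEED = """<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example ORE compound object</title>
  <entry>
    <title>First resource</title>
    <link rel="alternate" type="pdf" href="http://cens.ucla.edu/escholarship/1"/>
  </entry>
  <entry>
    <title>Second resource</title>
    <link rel="alternate" type="text/html" href="http://sensorbase.org/1"/>
  </entry>
</feed>"""


def list_resources(feed_xml):
    """Return (title, type, href) for every entry in the compound object."""
    root = ET.fromstring(feed_xml.encode("utf-8"))
    resources = []
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        link = entry.find(ATOM + "link")
        resources.append((title, link.get("type"), link.get("href")))
    return resources


for title, media_type, href in list_resources(FEED):
    print(title, media_type, href)
```

Each entry points into a different system (here the eScholarship Repository and SensorBase), which is what lets a single feed stand in for a publication together with its data.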
34. Acknowledgements and Thanks
- CENS is funded by National Science Foundation Cooperative Agreement CCR-0120778, Deborah L. Estrin, UCLA, Principal Investigator; Christine L. Borgman is a co-Principal Investigator. CENSEI, under which much of this research was conducted, is funded by National Science Foundation grant ESI-0352572, William A. Sandoval, Principal Investigator, and Christine L. Borgman, co-Principal Investigator. Alberto Pepe's participation in this research is supported by a gift from the Microsoft Technical Computing Initiative. SensorBase research in CENS is led by Mark Hansen and Nathan Yau. Support for CENS bibliographic database development is provided by Christina Patterson and Margo Reveil of UCLA Academic Technology Services.