HiFi: Network-centric Query Processing in the Physical World

1
HiFi: Network-centric Query Processing in the
Physical World
Mike Franklin, UC Berkeley
  • SAP Research Forum
  • February 2005

2
Introduction
  • Receptors everywhere!
  • Wireless sensor networks, RFID technologies,
    digital homes, network monitors, ...

Large-scale deployments will take the form of High Fan-In
Systems
3
High Fan-in Systems
The Bowtie
Large numbers of receptors mean large data volumes
Hierarchical, successive aggregation
4
High Fan-in Example (SCM)
Headquarters
Regional Centers
Warehouses, Stores
Dock doors, Shelves
Receptors
5
Properties
  • High Fan-In, globally-distributed architecture.
  • Large data volumes generated at edges.
  • Filtering and cleaning must be done there.
  • Successive aggregation as you move inwards.
  • Summaries/anomalies continually, details later.
  • Strong temporal focus.
  • Strong spatial/geographic focus.
  • Streaming data and stored data.
  • Integration within and across enterprises.

6
Design Space: Time
Archiving (provenance and schema evolution)
Filtering, Cleaning, Alerts
Monitoring, Time-series
Data mining (recent history)
On-the-fly processing
Disk-based processing
Stream/Disk Processing
7
Design Space: Geography
Archiving (provenance and schema evolution)
Filtering, Cleaning, Alerts
Monitoring, Time-series
Data mining (recent history)
8
Design Space: Resources
Archiving (provenance and schema evolution)
Filtering, Cleaning, Alerts
Monitoring, Time-series
Data mining (recent history)
9
Design Space: Data
Archiving (provenance and schema evolution)
Filtering, Cleaning, Alerts
Monitoring, Time-series
Data mining (recent history)
Duplicate elimination: history of hours
Interesting events: history of days
Trends/Archive: history of years
10
State of the Art
  • Current approaches: hand-coded, script-based
  • Expensive, one-off, brittle, hard to deploy and
    keep running
  • Piecemeal/stovepipe systems
  • Each type of receptor (RFID, sensors, etc.)
    handled separately
  • Standards efforts not addressing this
  • Protocol design bent
  • Different data models at each level
  • Reinventing query languages at each level
  • → No end-to-end, integrated middleware for
    managing distributed receptor data

11
HiFi
  • A data management infrastructure for high fan-in
    environments
  • Uniform Declarative Framework
  • Every node is a data stream processor that speaks
    SQL-ese
  • → stream-oriented queries at all levels
  • Hierarchical, stream-based views as an organizing
    principle

12
Why Declarative? (database dogma)
  • Independence: data, location, platform
  • Allows the system to adapt over time
  • Many optimization opportunities
  • In a complex system, automatic optimization is
    key.
  • Also, optimization across multiple applications.
  • Simplifies Programming
  • ???

13
Building HiFi
14
Integrating RFID and Sensors (the loudmouth query)
15
A Tale of Two Systems
  • TinyDB
  • Declarative query processing for wireless sensor
    networks
  • In-network aggregation
  • Released as part of TinyOS Open Source
    Distribution
  • TelegraphCQ
  • Data stream processor
  • Continuous, adaptive query processing with
    aggressive sharing
  • Built by modifying PostgreSQL
  • Open source beta release out now; new release
    soon

16
TinyDB
  • The Network is the Database
  • Basic idea: treat the sensor net as a virtual
    table.
  • System hides details/complexities of devices,
    changing topologies, failures, ...
  • System is responsible for efficient execution.
  • Developed on TinyOS/Motes

http://telegraph.cs.berkeley.edu/tinydb
17
TelegraphCQ Data Stream Monitoring
  • Streaming Data
  • Network monitors
  • Sensor Networks, RFID
  • News feeds, Stock tickers, ...
  • B2B and Enterprise apps
  • Trade Reconciliation, Order Processing etc.
  • (Quasi) real-time flow of events and data
  • Manage these flows to drive business processes.
  • Can mine flows to create and adjust business
    rules.
  • Can also tap into flows for on-line analysis.

http://telegraph.cs.berkeley.edu
18
Data Stream Processing
[Figure: Traditional Database vs. Data Stream Processor; labels: Queries, Data, Result Tuples]
  • Data streams are unending
  • Continuous, long running queries
  • Real-time processing

19
Windowed Queries
A typical streaming query:
  SELECT S.city, AVG(temp)
  FROM SOME_STREAM S [range by '5 seconds' slide by '5 seconds']
  WHERE S.state = 'California'
  GROUP BY S.city
Window Clause
  • range by: I want to look at 5 seconds worth of data
  • slide by: I want a result tuple every 5 seconds
[Figure: Window over the Data Stream, producing Result Tuple(s)]
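
As a rough illustration of what the window clause asks for, the Python sketch below computes the same per-city average over non-overlapping 5-second windows. The tuple layout `(timestamp, state, city, temp)` and the name `windowed_avg` are assumptions made for this example, not TelegraphCQ internals, and the sketch works on a finite batch rather than an unending stream.

```python
from collections import defaultdict

def windowed_avg(readings, range_s=5, state="California"):
    """Average temp per city over tumbling windows of `range_s` seconds.

    `readings` is an iterable of (timestamp, state, city, temp) tuples
    (hypothetical layout). Range equals slide, so windows do not overlap.
    """
    windows = defaultdict(lambda: defaultdict(list))
    for ts, st, city, temp in readings:
        if st != state:                      # WHERE S.state = 'California'
            continue
        window_start = int(ts // range_s) * range_s
        windows[window_start][city].append(temp)
    # One result per (window, city), mirroring GROUP BY S.city.
    return {
        start: {city: sum(v) / len(v) for city, v in cities.items()}
        for start, cities in sorted(windows.items())
    }

readings = [
    (0.5, "California", "Berkeley", 60.0),
    (2.0, "California", "Berkeley", 62.0),
    (3.0, "Oregon", "Portland", 50.0),
    (6.0, "California", "Berkeley", 64.0),
    (7.5, "California", "Oakland", 66.0),
]
result = windowed_avg(readings)
```

Each key of `result` is the start time of a window, so a new set of result tuples appears once per slide interval.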
20
TelegraphCQ Architecture
21
The HiFi System
PC
Stargates
Sensor Networks and RFID Readers
22
Basic HiFi Architecture
  • Hierarchical federation of nodes
  • Each node
  • Data Stream Query Processor (DSQP)
  • HiFi Glue
  • Views drive system functionality
  • Metadata Repository (MDR)

23
HiFi Processing Pipelines
  • The CSAVA Framework: Clean, Smooth, Arbitrate,
    Validate, Analyze

[Figure: CSAVA pipeline; labels: Generalization, On-line Data Mining]
24
CSAVA Processing
Clean
CREATE VIEW cleaned_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM rfid_stream rs
   WHERE read_strength > strength_T)
25
CSAVA Processing
Smooth
CREATE VIEW smoothed_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM cleaned_rfid_stream [range by '5 sec', slide by '5 sec']
   GROUP BY receptor_id, tag_id
   HAVING count(*) > count_T)
Previous stage: Clean
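
The Clean and Smooth views can be approximated for a single window in a few lines of Python. This is only a sketch: the reading layout `(receptor_id, tag_id, read_strength)`, the threshold values, and the strict `>` comparisons are assumptions for illustration.

```python
from collections import Counter

def clean(readings, strength_t):
    """Drop weak reads (the Clean view's WHERE clause)."""
    return [(r, t) for r, t, s in readings if s > strength_t]

def smooth(cleaned, count_t):
    """Keep (receptor, tag) pairs seen more than count_t times in the window."""
    counts = Counter(cleaned)
    return {pair: n for pair, n in counts.items() if n > count_t}

# One hypothetical 5-second window of raw RFID reads.
window = [
    ("shelf0", "tagA", 0.9), ("shelf0", "tagA", 0.8), ("shelf0", "tagA", 0.7),
    ("shelf1", "tagA", 0.6),   # single stray read from the neighbouring shelf
    ("shelf0", "tagB", 0.1),   # too weak, removed by clean()
]
smoothed = smooth(clean(window, strength_t=0.5), count_t=1)
```

The stray single read on `shelf1` survives cleaning but is dropped by smoothing, which is the effect the HAVING clause is meant to achieve.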
26
CSAVA Processing
Arbitrate
CREATE VIEW arbitrated_rfid_stream AS
  (SELECT receptor_id, tag_id
   FROM smoothed_rfid_stream rs [range by '5 sec', slide by '5 sec']
   GROUP BY receptor_id, tag_id
   HAVING count(*) >= ALL
     (SELECT count(*)
      FROM smoothed_rfid_stream [range by '5 sec', slide by '5 sec']
      WHERE tag_id = rs.tag_id
      GROUP BY receptor_id))
Previous stages: Smooth, Clean
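
Arbitration assigns each tag to the receptor that saw it most often in the window. A minimal Python sketch follows, assuming the input is a mapping from `(receptor_id, tag_id)` to a read count; the function name and the choice to keep every receptor on a tie are assumptions of this example.

```python
from collections import defaultdict

def arbitrate(smoothed_counts):
    """Assign each tag to the receptor(s) that read it most often.

    `smoothed_counts` maps (receptor_id, tag_id) -> read count in one window.
    Ties keep every top receptor (an "at least as many as all others" test).
    """
    best = defaultdict(int)
    for (_, tag), n in smoothed_counts.items():
        best[tag] = max(best[tag], n)
    return sorted(
        (receptor, tag)
        for (receptor, tag), n in smoothed_counts.items()
        if n == best[tag]
    )

counts = {("shelf0", "tagA"): 7, ("shelf1", "tagA"): 3, ("shelf1", "tagB"): 5}
owners = arbitrate(counts)
```

Here `tagA` is read by both shelves, but only `shelf0` keeps it because it has the higher count.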
27
CSAVA Processing
Validate
CREATE VIEW validated_tags AS
  (SELECT tag_name
   FROM arbitrated_rfid_stream rs [range by '5 sec', slide by '5 sec'],
        known_tag_list tl
   WHERE tl.tag_id = rs.tag_id)
Previous stages: Arbitrate, Smooth, Clean
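
Validation is a join between the stream and a stored table. In the sketch below the table is a plain dictionary from `tag_id` to `tag_name`; that representation and the sample tag names are hypothetical.

```python
def validate(arbitrated, known_tag_list):
    """Inner-join arbitrated reads with a static table of known tags.

    `arbitrated` holds (receptor_id, tag_id) pairs; `known_tag_list` maps
    tag_id -> tag_name. Reads whose tag_id is unknown are dropped.
    """
    return [
        known_tag_list[tag_id]
        for _, tag_id in arbitrated
        if tag_id in known_tag_list
    ]

known = {"tagA": "soap", "tagB": "towel"}
names = validate(
    [("shelf0", "tagA"), ("shelf1", "tagX"), ("shelf1", "tagB")], known
)
```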
28
CSAVA Processing
Analyze
CREATE VIEW tag_count AS
  (SELECT tag_name, count(*)
   FROM validated_tags vt [range by '5 min', slide by '1 min']
   GROUP BY tag_name)
Previous stages: Validate, Arbitrate, Smooth, Clean
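
Unlike the earlier stages, the Analyze window overlaps itself: it covers 5 minutes but advances every minute. The Python sketch below shows that behaviour on a finite list of events. The `(timestamp_seconds, tag_name)` layout and the convention that a window ending at `end` covers `(end - range_s, end]` are assumptions of this example.

```python
from collections import Counter

def sliding_tag_counts(events, range_s=300, slide_s=60):
    """Count tag names over sliding windows.

    `events` is a list of (timestamp_seconds, tag_name) pairs (hypothetical
    layout). Returns (window_end, Counter) pairs, one per slide step.
    """
    if not events:
        return []
    last = max(ts for ts, _ in events)
    results = []
    end = slide_s
    while end - slide_s < last:
        in_window = [name for ts, name in events if end - range_s < ts <= end]
        results.append((end, Counter(in_window)))
        end += slide_s
    return results

events = [(10, "soap"), (70, "soap"), (130, "towel"), (320, "soap")]
counts = sliding_tag_counts(events)
```

Because the range is five times the slide, each event contributes to up to five consecutive results.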
29
Ongoing Work
  • Bridging the physical-digital divide
  • VICE: A Virtual Device Interface
  • Hierarchical query processing
  • Automatic query planning and dissemination
  • Complex event processing
  • Unifying event and data processing

30
Virtual Device (VICE) Layer
The branch of philosophy that deals with the
ultimate nature of reality and existence. (name
due to Shawn Jeffery)
31
The Virtues of VICE
  • A simple RFID Experiment
  • 2 Adjacent Shelves, 8 ft each
  • 10 EPC-tagged items each, plus 5 moved between
    them.
  • RFID antenna on each shelf.

32
Ground Truth
33
Raw RFID Readings
34
After VICE Processing
Under the covers (in this case): Cleaning,
Smoothing, and Arbitration
35
Other VICE Uses
  • Once you have the right abstractions
  • Soft Sensors
  • Quality and lineage streams
  • Pushdown of external validation information
  • Power management and other optimizations
  • Data Archiving
  • Model-based sensing
  • Non-declarative code

36
Hierarchical Query Processing
  • Continuous and Streaming
  • Automatic placement and optimization
  • Hierarchical
  • Temporal granularity vs. geographic scope
  • Sharing of lower-level streams

I provide national monthly values for the US
I provide avg weekly values for California
I provide avg daily values for Berkeley
I provide raw readings for Soda Hall
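
One way a higher level can reuse lower-level streams is to pass partial aggregates upward instead of raw readings. The sketch below carries `(sum, count)` pairs so that averages stay exact at every level. The function names and the sample readings are hypothetical, and this is not a description of HiFi's actual planner.

```python
def partial(readings):
    """Summarize raw readings as a (sum, count) partial aggregate."""
    return (sum(readings), len(readings))

def merge(partials):
    """Combine partial aggregates from child nodes into one."""
    partials = list(partials)
    return (sum(s for s, _ in partials), sum(n for _, n in partials))

def average(p):
    """Turn a (sum, count) pair into an average."""
    total, n = p
    return total / n

# Hypothetical readings: two buildings roll up to a city, cities to a state.
soda_hall = partial([20.0, 22.0, 24.0])
cory_hall = partial([30.0])
berkeley = merge([soda_hall, cory_hall])
oakland = partial([10.0, 14.0])
california = merge([berkeley, oakland])
```

Averaging the two building averages directly would give 26.0 for Berkeley, while the merged partials give the correct 24.0, which is why the sketch ships sums and counts rather than finished averages.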
37
Complex Event Processing
  • Needed for monitoring and actuation
  • Key to prioritization (e.g., of detail data)
  • Exploit duality of data and events
  • Shared Processing
  • Semantic Windows
  • Challenge: a single system that simultaneously
    handles events spanning seconds to years.

38
Next Steps
  • Archiving and Detail Data
  • Dealing with transient overloads
  • Rate matching between stored and streaming data
  • Scheduling large archive transfers
  • System design and deployment
  • Tools for provisioning and evaluating receptor
    networks
  • System monitoring and management
  • Leverage monitoring infrastructure for
    introspection

39
Conclusions
  • Receptors everywhere → High Fan-In Systems
  • Current middleware solutions are complex and
    brittle
  • Uniform declarative framework is the key
  • The HiFi project is exploring this approach
  • Our initial prototype:
  • Leveraged TelegraphCQ and TinyDB
  • Demonstrated RFID/multiple sensor integration
  • Validated the HiFi approach
  • We have an ambitious on-going research agenda
  • See http://hifi.cs.berkeley.edu for more info.

40
Acknowledgements
  • Team HiFi
    Shawn Jeffery, Sailesh Krishnamurthy,
    Frederick Reiss, Shariq Rizvi, Eugene Wu, Nathan
    Burkhart, Owen Cooper, Anil Edakkunni
  • Experts in VICE
  • Gustavo Alonso, Wei Hong, Jennifer Widom
  • Funding and/or Reduced-Price Gizmos from NSF,
    Intel, UC MICRO program, and Alien Technologies