Data Mining for Sensor Networks - Opportunities and Challenges - PowerPoint PPT Presentation

Slides: 38
Provided by: vipin3
Transcript and Presenter's Notes

Title: Data Mining for Sensor Networks - Opportunities and Challenges


1
Data Mining for Sensor Networks - Opportunities
and Challenges
Vipin Kumar, University of Minnesota
kumar_at_cs.umn.edu, www.cs.umn.edu/kumar
Research funded by NSF, ARL, NASA, ARDA, and DHS
2
What is Data Mining?
  • Many Definitions
  • Non-trivial extraction of implicit, previously
    unknown and potentially useful information from
    data
  • Exploration and analysis, by automatic or
    semi-automatic means, of large quantities of data
    in order to discover meaningful patterns

3
Why Mine Data? Commercial Viewpoint
  • Lots of data is being collected and warehoused
  • Web data, e-commerce
  • purchases at department/grocery stores
  • Bank/Credit Card transactions
  • Computers have become cheaper and more powerful
  • Competitive Pressure is Strong
  • Provide better, customized services for an edge
    (e.g. in Customer Relationship Management)

4
Why Mine Data? Scientific Viewpoint
  • Data collected and stored at enormous speeds
    (GB/hour)
  • Remote sensors on a satellite
  • Telescopes scanning the skies
  • Microarrays generating gene expression data
  • Scientific simulations generating terabytes of
    data
  • Data mining may help scientists
  • In classifying and segmenting data
  • In hypothesis formation

5
Why mine data? Sensor Networks Viewpoint
  • Potentially massive streams of sensor data
  • Data mining offers the hope of real-time delivery
    of actionable knowledge

Figure labels (application areas of Sensor Networks): Interactive VR Game,
Wearable Computing, Disaster Recovery, Environmental Monitoring, Earth
Science, Space Exploration, Context-Aware Computing, Immerse Environments,
Biological Monitoring, Hazard Detection, Smart Environment, RFID-based
systems, Traffic Monitoring, Mobile Data Stream Mining
Courtesy: Tian He, Hillol Kargupta, Shashi Shekhar, and CENS/UCLA
6
Multi-Robot Teams and Sensor Networks
  • Nikos Papanikolopoulos, University of Minnesota
  • Goal: A sensor network of distributed robots
    with various mobility and sensory modes for
    exploration of structures in an urban scenario.
  • Some Applications
  • Emergency response
  • Law enforcement
  • Monitoring/surveillance
  • NASA space exploration programs

7
Gateway Change Detection Project
  • Robert Grossman, University of Illinois, Chicago
  • Goal: Monitor, integrate, analyze, detect
    changes, and send alerts for data streaming from
    a distributed highway sensor system.

System detects changes from baselines in
real-time and distributes them as alerts.
  • 830 traffic sensors, 170,000 new sensor readings
    per day
  • also image, text, and semi-structured data
    (about 1 TB)

Video feeds: unstructured
Sensor data: structured
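
The idea of detecting changes from a baseline can be sketched as follows. This is a minimal illustration and not the Gateway project's actual method: it assumes one fixed baseline per sensor and raises an alert when a new reading lies more than `threshold` baseline standard deviations from the baseline mean. The function name, the threshold, and the sample speeds are all hypothetical.

```python
from statistics import mean, stdev

def detect_changes(baseline, stream, threshold=3.0):
    """Return (index, reading) pairs in `stream` that deviate from the baseline.

    baseline:  historical readings for one sensor, assumed to be "normal".
    stream:    new readings to check, in arrival order.
    threshold: number of baseline standard deviations that counts as a change.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    alerts = []
    for i, reading in enumerate(stream):
        # Skip the test if the baseline has no spread at all.
        if sigma > 0 and abs(reading - mu) / sigma > threshold:
            alerts.append((i, reading))
    return alerts

# Hypothetical vehicle speeds (mph) from a single highway sensor.
baseline_speeds = [61, 59, 60, 62, 58, 60, 61, 59]
new_speeds = [60, 58, 22, 61]
print(detect_changes(baseline_speeds, new_speeds))
```

A deployed system would presumably keep separate baselines per sensor and time of day and update them as data arrives; the fixed baseline here only keeps the sketch short.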
8
Health Monitoring
  • Real-time Health Monitoring
  • Smart shirts that collect many attributes in
    real-time
  • Health Monitoring for fire fighters for safety
    evaluation
  • Assisted Living

Objective: Improve the health assessment of
elders/disabled
  • Monitor sleep quality of elders
  • Monitor gait, falls, and other movement disorders
  • Reduce risks and improve the efficiency of
    caregivers

Figure label: Subject position
Courtesy: Tian He, Hillol Kargupta
http://www.cs.virginia.edu/wsn/medical/
9
Structure monitoring
Objective: Understand the interaction between ground
motions and structure/foundation response.
N. Xu, S. Rangwala, K. Chintalapudi, D. Ganesan,
A. Broad, R. Govindan, and D. Estrin, "A wireless
sensor network for structural monitoring,"
SenSys, 2004.
Courtesy: Tian He
10
MineFleet: A Vehicle Data Stream Management and
Mining Software System
  • On-board Module
  • Continuous data streams from the vehicle data bus
  • Onboard data stream mining
  • Communicates with a remote control station
  • Privacy management
  • Central control station
  • Data Management
  • Data mining
  • Communicates with the on-board modules over
    wireless networks
  • Privacy management

Hillol Kargupta, UMBC/AGNIK, LLC
Courtesy: Hillol Kargupta
  • Applications
  • Vehicle Health Monitoring and Maintenance
  • Fuel Consumption Analysis
  • Driver Behavior Monitoring

11
Origins of Data Mining
  • Draws ideas from machine learning/AI, pattern
    recognition, statistics, and database systems
  • Traditional techniques may be unsuitable due to
    data that is
  • Large-scale
  • High dimensional
  • Heterogeneous
  • Complex
  • Distributed

12
Data Mining Tasks
Figure labels: Data, Clustering, Predictive Modeling, Anomaly Detection,
Association Rules
13
Predictive Modeling
  • Find a model for the class attribute as a function
    of the values of other attributes

Figure: model for predicting creditworthiness (class attribute)
  • Applications
  • Targeted Marketing
  • Customer Attrition/Churn
  • Predicting damage in complex structures by using
    sensor values for mode shapes and frequencies

14
Clustering
  • Finding groupings in data such that objects
    within the same cluster are more similar to each
    other than objects in different clusters
  • Applications
  • Market Segmentation
  • Gene expression clustering
  • Document Clustering
  • Finding groups of driver behaviors based upon
    patterns of automobile motions (normal, drunken,
    sleepy, rush hour driving, etc)

Courtesy: Michael Eisen
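
The grouping idea can be illustrated with plain k-means, one of many clustering algorithms and not necessarily the one behind any application on this slide. The two driving features and all values below are hypothetical.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old one if its cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical driving features: (mean speed in mph, lane departures per hour).
points = [(60, 1), (62, 2), (58, 1), (30, 9), (28, 8), (32, 10)]
centroids, clusters = kmeans(points, [(60, 1), (30, 9)])
print(centroids)
print(clusters)
```

The initial centroids are passed in explicitly so that the run is deterministic; real use would typically pick them at random and try several restarts.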
15
Association Rule Discovery
  • Given a set of records, find dependency rules
    which will predict occurrence of an item based on
    occurrences of other items in the record
  • Applications
  • Marketing and Sales Promotion
  • Supermarket shelf management
  • Traffic pattern analysis (e.g., rules such as
    "high congestion on Intersection 58 implies high
    accident rates for left turning traffic")

Rules discovered:
  • {Milk} --> {Coke} (s=0.6, c=0.75)
  • {Diaper, Milk} --> {Beer} (s=0.4, c=0.67)
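
Support (s) is the fraction of records that contain every item in the rule, and confidence (c) is the fraction of records containing the left-hand side that also contain the right-hand side. The sketch below computes both. The slide's transaction table did not survive in this transcript, so the five transactions here are an assumption, chosen to be consistent with the quoted values.

```python
def support_and_confidence(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent --> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    # Records containing every item of the rule.
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    # Records containing the left-hand side only.
    lhs = sum(1 for t in transactions if antecedent <= t)
    support = both / len(transactions)
    confidence = both / lhs if lhs else 0.0
    return support, confidence

# Hypothetical market-basket records (assumed, not taken from the slide).
transactions = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]
s1, c1 = support_and_confidence(transactions, {"Milk"}, {"Coke"})
s2, c2 = support_and_confidence(transactions, {"Diaper", "Milk"}, {"Beer"})
print(round(s1, 2), round(c1, 2))
print(round(s2, 2), round(c2, 2))
```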
16
Deviation/Anomaly Detection
  • Detect significant deviations from normal
    behavior
  • Applications
  • Credit Card Fraud Detection
  • Network Intrusion Detection
  • Identify anomalous behavior from sensor networks
    for monitoring and surveillance.
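
One simple way to flag "significant deviations from normal behavior" in a batch of sensor readings is the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD). This is a generic sketch, not the technique used in the systems described later in the deck; the cutoff of 3.5 is a common convention, and the readings are hypothetical.

```python
from statistics import median

def mad_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread in the bulk of the data; the score is undefined.
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical temperature readings (degrees C) from one sensor.
readings = [20.1, 19.8, 20.3, 20.0, 19.9, 35.2, 20.2]
print(mad_outliers(readings))
```

Using the median and MAD rather than the mean and standard deviation keeps a single extreme reading from inflating the very scale it is judged against.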

17
Data Mining Challenges: General
  • Scale
  • High-dimensional
  • Heterogeneous
  • Spatio-temporal
  • Privacy
  • Streaming
  • Distributed

18
Specific Issues and Challenges in Data Mining for
Sensor Networks
  • Spatio-temporal
  • Streaming
  • Distributed
  • Privacy
  • Security
  • Uncertainty/noise/missing values
  • Low bandwidth communication
  • Resource/power considerations
  • Online feature extraction/mining

Courtesy: Hillol Kargupta
Courtesy: Robert Grossman
Courtesy: Nikos Papanikolopoulos
19
Discovery of Climate Patterns from Global Data
Sets
Questions that can be answered using
spatio-temporal data mining
  • General Questions
  • How is the global climate changing?
  • What are the consequences of changes in the
    climate?
  • How well can we predict future climate changes?
  • Illustrative Specific Questions
  • How are the frequency and intensity of ecosystem
    disturbance on land related to variability in
    surface climate?
  • How are land surface precipitation and temperature
    affected by ocean temperature?
  • How are the frequency and extent of wildfires
    related to variability in surface climate
    (precipitation, temperature, and wind speed)?
  • Data sources
  • Sensors
  • Ground-based remote atmospheric measurements
  • In-situ measurements from sensors in coastal
    waters and atmosphere
  • Weather observation stations
  • High-resolution EOS satellites
  • Model-based data from forecast and other models
  • Data sets created by data fusion

Earth Observing System
20
Detection of Ecosystem Disturbances
Detection of sudden changes in greenness over
extensive areas from these large global satellite
data sets allowed Earth Science researchers to
gain a deeper insight into the interplay among
natural disasters, human activities and the rise
of carbon dioxide in Earth's atmosphere during
two recent decades.
Release 03-51AR: NASA DATA MINING REVEALS A NEW
HISTORY OF NATURAL DISASTERS
NASA is using satellite data to paint a detailed
global picture of the interplay among natural
disasters, human activities and the rise of
carbon dioxide in the Earth's atmosphere during
the past 20 years.
http://www.nasa.gov/centers/ames/news/releases/2003/03_51AR.html
21
Climate Indices: Connecting the Ocean/Atmosphere
and the Land
  • A climate index is a time series of sea surface
    temperature or sea level pressure
  • Climate indices capture teleconnections
  • The simultaneous variation in climate and related
    processes over widely separated points on the
    Earth

Figure labels: El Nino Events, Nino 1+2 Index
22
Discovery of Climate Indices Using Clustering
A novel clustering technique was developed to
identify regions of uniform behavior in
spatio-temporal data. The use of clustering for
discovering climate indices is driven by the
intuition that a climate phenomenon is expected
to involve a significant region of the ocean or
atmosphere where the behavior is relatively
uniform over the entire area.

A cluster-based approach for discovering climate
indices provides better physical interpretation
than approaches based on the SVD/EOF paradigm, and
provides candidate indices with better predictive
power than known indices for some land areas.

Some SST clusters reproduce well-known climate
indices. In particular, we were able to replicate
the four El Nino SST-based indices: cluster 94
corresponds to NINO 1+2, 67 to NINO 3, 78 to
NINO 3.4, and 75 to NINO 4. The correlations of
these clusters to their corresponding indices are
higher than 0.9.

Some SST clusters, e.g., cluster 29, are
significantly different from known indices, but
provide better correlation with land climate
variables than known indices for many parts of
the globe. The bottom figure shows the
difference in correlation to land temperature
between cluster 29 and the El Nino indices. Areas
in yellow indicate where cluster 29 has higher
correlation.
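
The comparison between a cluster and a known index can be sketched as a Pearson correlation between two time series. The example assumes that a cluster is summarized by its centroid, i.e. the average time series of its grid cells; the slide does not state this, and all values below are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def cluster_centroid(series_list):
    """Average the time series of all grid cells in a cluster, step by step."""
    return [sum(values) / len(values) for values in zip(*series_list)]

# Hypothetical SST anomalies for three grid cells of one cluster (six time steps).
cells = [
    [0.1, 0.5, 1.2, 0.8, -0.2, -0.6],
    [0.0, 0.6, 1.0, 0.9, -0.1, -0.7],
    [0.2, 0.4, 1.1, 0.7, -0.3, -0.5],
]
# Hypothetical values of a known climate index over the same time steps.
known_index = [0.1, 0.5, 1.0, 0.9, -0.2, -0.5]

r = pearson(cluster_centroid(cells), known_index)
print(round(r, 3))
```

The same `pearson` function could be applied between a cluster centroid and a land temperature series to compare candidate indices against known ones.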
23
Moving Clusters in Space and Time
  • Most well-known indices are based on data
    collected at fixed land stations.
  • NAO is computed as the normalized difference
    between SLP at a pair of land stations in the
    Arctic and the subtropical Atlantic regions of
    the North Atlantic Ocean.
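
A station-based index of this kind can be sketched as the difference between two standardized sea level pressure (SLP) series. The exact normalization and the sign convention (subtropical station minus Arctic station) are assumptions here, and the pressure values are hypothetical.

```python
from statistics import mean, pstdev

def standardize(series):
    """Convert a series to zero mean and unit standard deviation."""
    mu, sigma = mean(series), pstdev(series)
    return [(v - mu) / sigma for v in series]

def station_index(slp_subtropical, slp_arctic):
    """Difference of the standardized SLP series at two stations."""
    return [s - a for s, a in zip(standardize(slp_subtropical),
                                  standardize(slp_arctic))]

# Hypothetical seasonal-mean SLP values (hPa) at the two stations.
subtropical = [1020, 1024, 1018, 1022]
arctic = [1005, 998, 1010, 1003]
index = station_index(subtropical, arctic)
print([round(v, 2) for v in index])
```

The index is positive when pressure is unusually high at the subtropical station and unusually low at the Arctic station, and negative in the opposite case.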

24
Moving Clusters in Space and Time (contd.)
  • However, the underlying phenomenon may not occur
    at the exact location of the land station
    (e.g., NAO).
  • Challenge: Given sensor readings for SLP at
    different points in the ocean, how to identify
    clusters of low/high pressure points that may
    move in space and time.

25
Spatio-temporal Associations in Climate Data
Ref: Tan et al., 2001
FPAR-Hi ==> NPP-Hi (sup=5.9, conf=55.7)
Grassland/Shrubland areas
The association rule is interesting because it
appears mainly in regions with grassland/shrubland
vegetation type.
26
Data Mining for Cyber Intrusion Detection
Incidents Reported to Computer Emergency Response
Team/Coordination Center (CERT/CC)
  • Due to the proliferation of the Internet, more and
    more organizations are becoming vulnerable to
    cyber attacks
  • The sophistication and severity of cyber attacks
    are also increasing
  • Cyber strategies can be a major force multiplier
    and equalizer
  • Security mechanisms always have inevitable
    vulnerabilities
  • Firewalls are not sufficient to ensure security
    in computer networks
  • Insider attacks
  • Traditional signature-based intrusion detection
    systems (IDSs) (e.g. SNORT) cannot detect
    emerging cyber threats
  • Data Mining can alleviate this limitation

Example of SNORT rule (MS-SQL Slammer worm):
any -> udp port 1434 (content:"81 F1 03 01 04 9B 81
F1 01" content:"sock" content:"send")
www.snort.org
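
The limitation of signature-based detection can be illustrated with a toy matcher for the rule above. This is not how Snort is implemented; it only checks the same conditions (UDP, destination port 1434, and three byte patterns anywhere in the payload). The payloads are hypothetical stand-ins, not real worm traffic.

```python
# Byte pattern quoted in the rule on the slide.
SLAMMER_SIGNATURE = bytes.fromhex("81 F1 03 01 04 9B 81 F1 01")

def matches_rule(protocol, dst_port, payload):
    """Return True if a packet satisfies every condition of the rule."""
    return (
        protocol == "udp"
        and dst_port == 1434
        and SLAMMER_SIGNATURE in payload
        and b"sock" in payload
        and b"send" in payload
    )

# A payload containing the known patterns, and a variant with one byte changed.
known = b"\x04" + SLAMMER_SIGNATURE + b"...sock...send"
variant = b"\x04" + bytes.fromhex("81 F1 03 01 04 9B 81 F1 02") + b"...sock...send"

print(matches_rule("udp", 1434, known))
print(matches_rule("udp", 1434, variant))
```

A single changed byte is enough to slip past the fixed pattern, which is the gap that anomaly-based approaches aim to cover.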
27
MINDS: Minnesota INtrusion Detection System
  • Data mining based anomaly detection system
  • Incorporated into Interrogator architecture at
    ARL Center for Intrusion Monitoring and
    Protection (CIMP)
  • Helps analyze data from multiple sensors at DoD
    sites around the country
  • MINDS anomalies are used as the primary key when
    viewing related alerts from other tools (SNORT,
    Jids, etc.)
  • MINDS is the first effective anomaly intrusion
    detection system used by ARL
  • Routinely detects attacks and intrusive behavior
    not detected by widely used intrusion detection
    systems
  • Insider Abuse / Policy Violations / Worms / Scans

28
Typical MINDS Output
  • UMN computer connecting to a remote FTP server,
    running on port 5002
  • Summarized TCP reset packets received from
    64.156.X.74, which is a victim of DoS attack, and
    we were observing backscatter, i.e. replies to
    spoofed packets
  • Summarization of FTP scan from a computer in
    Columbia, 200.75.X.2
  • Summary of IDENT lookups, where a remote computer
    tries to get user name
  • Summarization of a USENET server transferring a
    large amount of data

29
NSF Press Release
Just because an event occurs rarely doesn't mean
it won't have dramatic impacts. Consider heart
attacks, power blackouts, credit card frauds or
computer virus infections.

Vipin Kumar and colleagues at the University of
Minnesota are developing data-mining techniques
to detect rare events, such as computer
break-ins, that are difficult to detect using
traditional methods that recognize attacks only
through pre-defined patterns. The new techniques
have been incorporated in the Minnesota Intrusion
Detection System (MINDS) software, which helps
cybersecurity analysts detect computer break-ins
and other undesirable activity in real-world
networks, potentially while the break-in is
underway.

"MINDS allows cybersecurity experts to quickly
analyze massive amounts of network traffic,"
Kumar said. "They only need to evaluate the most
anomalous connections identified by the system."

The data-mining research on rare event analysis
is supported by a $300,000 award from the
National Science Foundation. MINDS is currently
being used to monitor over 40,000 computers at
the University of Minnesota. In addition, it is
an integral part of the Army's Interrogator
architecture, used at the Army Research
Laboratory's Center for Intrusion Monitoring and
Protection to analyze network traffic from
Defense Department sites around the country.
MINDS routinely detects novel intrusions, policy
violations and insider abuse that are missed by
other widely used tools.

Detecting computer intrusions is only the first
application for the Minnesota team's new
data-mining methods. The underlying techniques
could be applied to many areas beyond
cybersecurity, such as detecting financial or
health-care fraud.
http://www.nsf.gov/discoveries/disc_summ.jsp?cntn_id=100488
30
Correlation of suspicious events across network
sites
  • Needed to detect sophisticated attacks not
    identifiable by single site analyses
  • Distributed correlation algorithms
  • Grids middleware

Data Mining Middleware for Grids: NSF/ITR-funded
project, jointly with B. Grossman, S. Ranka, and
J. Weissman
How to detect a distributed network attack?
31
Map of the Global IP Space
32
Attack Traffic on Port 445
Destination IPs of suspicious connections within
the 3 class B networks at the U of M
Source IPs of suspicious connections in the
global IP space
7982 unique sources, 6184 unique destinations,
9930 total flows involved
Legend: failed connections; O = successful connections
33
Spatio-temporal Data Mining on Network Zombies
  • Spatial attack distribution of IPs on the same
    day: (Left) IPs attacking the UFL network on
    12/09/04 (712 scanners). (Middle) IPs attacking
    the UMN network on 12/09/04 (14,938 scanners).
    (Right) Intersection of the IPs attacking UFL and
    UMN (201 scanners).
  • Challenge: Given the distribution of attackers at
    many different sensors (i.e., Internet sites),
    how to find attack patterns in space and time.
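
The "intersection" panel amounts to finding source IPs that appear at more than one site. A minimal sketch, assuming each site simply reports the set of scanner IPs it saw on a given day; the site names match the slide, but the addresses are hypothetical values from documentation address ranges.

```python
def common_attackers(site_to_sources, min_sites=2):
    """Return the source IPs seen attacking at least `min_sites` sites."""
    counts = {}
    for sources in site_to_sources.values():
        # Count each IP at most once per site.
        for ip in set(sources):
            counts[ip] = counts.get(ip, 0) + 1
    return {ip for ip, n in counts.items() if n >= min_sites}

# Hypothetical per-site scanner lists for one day.
observations = {
    "UFL": ["192.0.2.10", "198.51.100.7", "203.0.113.5"],
    "UMN": ["198.51.100.7", "203.0.113.5", "203.0.113.99", "192.0.2.44"],
}
shared = common_attackers(observations)
print(sorted(shared))
```

With more than two sites, raising `min_sites` narrows the result to sources seen broadly across the network, which is one simple starting point for the space-and-time patterns the slide asks about.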

34
Resources
  • Workshop Proceedings
  • Data Mining and Wireless Sensor Networks
    (DM-WSN'06), IEEE International Conference on
    Data Mining, ICDM'06, Hong Kong, December 18-22,
    2006.
  • 2nd Workshop on Geosensor Networks (GSN2.0),
    Boston, MA, Oct 1-3 2006.
  • Data Mining in Sensor Networks, SIAM Data Mining
    Conference, April 23, 2005.
    http://www.public.asu.edu/huanliu/dmml_presentation/sdm-Sensor-Networks.pdf
  • ISSNIP 2004 Workshop on Machine Learning for
    Signal Processing
  • Data Mining in Resource Constrained Environments,
    SIAM Data Mining Conference (SDM 2004).
  • 1st Geo Sensor Networks Workshop, Portland, ME,
    Oct 9-11 2003.
  • Pervasive, Distributed, and Stream Data Mining, a
    session at the National Science Foundation
    Workshop on Next Generation Data Mining
    (NGDM'02).
  • PKDD Workshop on Ubiquitous Data Mining for
    Mobile and Distributed Environments, 2001.
    http://www.cs.umbc.edu/hillol/pkdd2001/udm.html
  • IJCAI-01 Workshop on Knowledge Discovery From
    Distributed, Dynamic, Autonomous, Heterogeneous
    Data and Knowledge Sources, August 6, 2001,
    Seattle, WA

35
Resources (contd.)
  • Journal Special Issues
  • Special Issue on Distributed Sensing for Quality
    and Productivity Improvement, IEEE Transactions
    on Automation Science and Engineering, July 2006.
  • Special Issue on Distributed and Mobile Data
    Mining, IEEE Transactions on Systems, Man, and
    Cybernetics, Part B, December 2004.
  • Special Issue on Signal Processing for Mining
    Information, IEEE Signal Processing Magazine, May
    2004.
  • Special Issue on Sensor Network Technology
    Infrastructure, Security, Data processing, and
    Deployment, SIGMOD Record, March 2004.
  • Special Section on Sensor Network Technology and
    Sensor Data Management, SIGMOD Record, December
    2003.

36
Resources (contd.)
  • Overview Articles
  • Pang-Ning Tan, Knowledge Discovery from Sensor
    Data, Sensors Magazine, March 2006.
  • Secure sensor information management and mining,
    B Thuraisingham, IEEE Signal Processing Magazine,
    May 2004.
  • A survey on sensor networks, IF Akyildiz, W Su, Y
    Sankarasubramaniam, E Cayirci, IEEE
    Communications Magazine, August 2002.
  • Bibliography
  • Distributed Data Mining Bibliography (by Hillol
    Kargupta) http://www.cs.umbc.edu/hillol/DDMBIB/

37
Resources (contd.)
  • Books
  • P-N Tan, M. Steinbach, and V. Kumar, Introduction
    to Data Mining, Addison-Wesley, 2005.
  • K. Chakrabarty and S.S. Iyengar. Scalable
    Infrastructure for Distributed Sensor Networks,
    Springer, 2005.
  • Hillol Kargupta and Philip Chan. Advances in
    Distributed and Parallel Knowledge Discovery,
    xv--xxvi, MIT/AAAI Press, 2000.