Title: DEMS: A Data Mining Based Technique to Handle Missing Data in Mobile Sensor Network Applications
1DEMS: A Data Mining Based Technique to Handle Missing Data in Mobile Sensor Network Applications
- Le Gruenwald, Md. Shiblee Sadik, Rahul Shukla, Hanqing Yang
- School of Computer Science
- University of Oklahoma
- Norman, Oklahoma, USA
- ggruenwald_at_ou.edu
2Outline
- Research objective
- Current approaches
- The proposed approach: DEMS
- Performance Evaluation
- Conclusions and future work
3Mobile Sensor Networks
- A typical mobile sensor network
- Sensor nodes are provided with motion capabilities
- Sensor nodes can relocate themselves
- Sensor nodes may move continuously/randomly
- Sensor nodes may move periodically to make up for lost/missing sensors
- Sensor nodes send data to a base station.
4Missing Sensor Data
- Missing sensor data: sensor readings that fail to reach the base station or arrive at the base station corrupted
- Reasons for missing sensor data:
- Power shortage (sensor nodes are battery-powered)
- Malfunctioning of sensor nodes (hardware failure)
- Networking issues
- Connection failures
- Data packet collision
- etc.
5Research Objective
- Goal: develop an effective algorithm to estimate missing sensor readings in mobile sensor network applications.
6Research Issues
- Issues shared with static sensor networks:
- Infinite data streams, fast arrival rates, concept drift
- Additional issues due to the mobility of the sensors:
- Spatial relations
- The spatial relation between two sensors' readings is distorted by the sensors' mobility
- Temporal relations
- A mobile sensor's historical data, generated at different locations, may not have a temporal relationship with the data in the current round of sensor readings
- Frequent power failure
- Power outages are more common in mobile sensor networks than in static sensor networks because mobility consumes additional power.
7Current Approaches
Statistics based techniques
Fig 1. A taxonomy of techniques for handling
missing data
8The Proposed Approach DEMS
- DEMS: Data Estimation for Mobile Sensors
- Based on two important concepts:
- Virtual Static Sensor (VSS)
- A fictitious static sensor which mimics a real static sensor
- Helps reconstruct the spatial and temporal relations among the sensors' readings
- Association Rule Mining
- A popular method of discovering relationships among different items
- Helps explore the relationships among the sensors' readings.
9DEMS Components
- DEMS has three major components:
- Mapping Real Mobile Sensors (RMSs) to Virtual Static Sensors (VSSs)
- Divides the entire area of coverage into small hexagons
- Each hexagon is the coverage area of a VSS, with the VSS at the center of the hexagon
- Converts RMS readings into VSS readings
- Association rule mining
- Constructs a novel data structure called MASTER-tree to capture the association rules among VSSs
- Updates MASTER-trees to capture the most recent association rules among VSSs
- Data estimation
- Uses the most recent association rules to estimate a missing VSS reading
- Uses the estimated value of the missing VSS reading as the value of the missing RMS reading.
10DEMS Mapping RMS to VSS
- What is a VSS?
- A VSS is a fictitious static sensor
- A VSS reading is based on one or more RMS readings
- A VSS has a unique identifier and a unique area of coverage
- Why do we need VSSs?
- Each VSS has a fixed location; hence the spatial relations among VSS readings can be obtained
- Each VSS reading is generated from a fixed location; hence historical readings may have strong temporal relations with the current reading.
11DEMS Mapping RMS to VSS (Cont.)
- How to construct a VSS?
- We divide the entire monitoring area into small hexagons
- A virtual static sensor sits at the center of each hexagon
- Each hexagon is the coverage area of one virtual static sensor.
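A minimal Python sketch of how a reported location could be assigned to a hexagon. The slides do not give the grid geometry, so the pointy-top orientation, the axial (q, r) index used as the VSS identifier, the `size` parameter (hexagon circumradius) and both function names are assumptions made for illustration.

```python
import math

def find_virtual_sensor(x, y, size):
    """Return the axial (q, r) index of the hexagon containing point (x, y).

    Assumes pointy-top hexagons with circumradius `size` and hexagon (0, 0)
    centred on the origin; the (q, r) pair stands in for the VSS identifier.
    """
    # Fractional axial coordinates of the point.
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    # Round in cube coordinates (cx + cy + cz == 0) to the nearest hexagon.
    cx, cz = q, r
    cy = -cx - cz
    rx, ry, rz = round(cx), round(cy), round(cz)
    dx, dy, dz = abs(rx - cx), abs(ry - cy), abs(rz - cz)
    # Recompute the component with the largest rounding error so that
    # the three components still sum to zero.
    if dx > dy and dx > dz:
        rx = -ry - rz
    elif dy > dz:
        ry = -rx - rz
    else:
        rz = -rx - ry
    return (rx, rz)

def hex_center(q, r, size):
    """Location of the VSS at the centre of hexagon (q, r)."""
    return (size * math.sqrt(3) * (q + r / 2), size * 1.5 * r)
```

Any RMS location inside a hexagon maps to the same (q, r) index, so readings taken anywhere within that hexagon are attributed to the same fixed VSS.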
12DEMS Mapping RMS to VSS
- Goal: map RMS readings to VSS readings so that spatial and temporal relations among the sensor readings can be restored.
- Two types of mapping:
- Mapping of a non-missing RMS to VSS
- Mapping of a missing RMS to VSS
13DEMS Mapping of a non-missing RMS to VSS
- If a VSS contains one RMS within its coverage area, the RMS's reading is used as the VSS reading
- If a VSS contains more than one RMS, the average of the RMSs' readings is used as the VSS reading
- If a VSS contains no RMS, the VSS is called inactive.
14DEMS Mapping of a missing RMS to VSS
- Why is mapping a missing RMS difficult?
- The RMS location is the key to RMS-to-VSS mapping
- If an RMS is missing, it is very likely that its data and location are missing together
- Hence mapping a missing RMS to a VSS requires predicting its location
- The solution:
- A missing RMS is mapped to a VSS using a trajectory mining approach for location prediction [Morzy, 2007].
15DEMS Mapping of a missing RMS to VSS (cont.)
- What is a trajectory?
- A trajectory is the sequence of hexagons that a mobile sensor traverses
- If a mobile sensor is not missing, it reports its location, and the location is contained in one hexagon
- Hence the sequence of hexagons is called a trajectory.
Example: (V14, V9, V11, V4, V3, V10) is the trajectory of M1
16DEMS Mapping of a missing RMS to VSS (cont.)
- Each RMS has a trajectory
- DEMS periodically stores the trajectories (collected from all RMSs) in a frequency pattern tree
- Frequency pattern tree:
- It has a root labeled null
- Each node consists of an ID (hexagon ID) and a count (the number of times it appears in the trajectories)
Example: 5 trajectories
1. (V14, V9, V11, V4, V2, V8, V1)
2. (V14, V9, V11, V4, V3, V10, V1)
3. (V14, V9, V5, V4, V3, V10, V8)
4. (V14, V9, V11, V4, V3, V10, V1, V8)
5. (V2, V3, V6, V10, V8, V1)
Fig 5. A frequency pattern tree
17DEMS Mapping of a missing RMS to VSS (cont.)
- If an RMS is missing, it is mapped to a VSS using the frequency pattern tree and its own trajectory
- Consider the last known trajectory of M1: (V14, V9, V11, V4)
- V3: the predicted next hexagon in the trajectory of M1
Fig 7. Frequency pattern tree
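The prediction for M1 can be reproduced with a small Python sketch. It assumes the frequency pattern tree is a prefix tree in which each node counts the trajectories passing through it, and that the predicted hexagon is the most frequent child at the end of the known trajectory; the slides do not spell out either detail, and the names `Node`, `build_tree` and `predict_next` are illustrative only.

```python
class Node:
    """One tree node: a hexagon ID, a count, and child nodes keyed by ID."""
    def __init__(self, hex_id=None):
        self.hex_id = hex_id      # None marks the null root
        self.count = 0
        self.children = {}

def build_tree(trajectories):
    """Insert every trajectory as a path from the root, counting visits."""
    root = Node()
    for trajectory in trajectories:
        node = root
        for hex_id in trajectory:
            node = node.children.setdefault(hex_id, Node(hex_id))
            node.count += 1
    return root

def predict_next(root, known):
    """Follow the known trajectory from the root; return the most frequent
    next hexagon, or None if the path is unknown or has no continuation."""
    node = root
    for hex_id in known:
        node = node.children.get(hex_id)
        if node is None:
            return None
    if not node.children:
        return None
    return max(node.children.values(), key=lambda n: n.count).hex_id

# The five example trajectories from the slides.
trajectories = [
    ["V14", "V9", "V11", "V4", "V2", "V8", "V1"],
    ["V14", "V9", "V11", "V4", "V3", "V10", "V1"],
    ["V14", "V9", "V5", "V4", "V3", "V10", "V8"],
    ["V14", "V9", "V11", "V4", "V3", "V10", "V1", "V8"],
    ["V2", "V3", "V6", "V10", "V8", "V1"],
]
tree = build_tree(trajectories)
predicted = predict_next(tree, ["V14", "V9", "V11", "V4"])  # "V3"
```

After (V14, V9, V11, V4), two stored trajectories continue to V3 and one to V2, so V3 is selected, matching the prediction shown for M1.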
18DEMS Mapping RMS to VSS (cont.)
Procedure mapReal2Virtual(RealSensorData listRSData, VirtualSensorData listVSData)
  for each real sensor rs
    if (rs is not missing)
      location ← listRSData(rs).Location
      vs ← findVirtualSensor(location)
      listVSData(vs).addReading(listRSData(rs).Reading)
    else
      location ← predictLocation(rs)
      vs ← findVirtualSensor(location)
      listVSData(vs).status ← missing
  end loop
  for each virtual static sensor vs
    if (listVSData(vs) has data)
      listVSData(vs).status ← active
      listVSData(vs).reading ← average(listVSData(vs).Readings)
    else
      if (listVSData(vs).status is not missing)
        listVSData(vs).status ← inactive
  end loop
end procedure
Fig 8. Mapping algorithm
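A runnable Python sketch of the mapping algorithm, offered as one possible reading of the pseudocode rather than the authors' implementation. The data layout (a dict from RMS ID to either `None` or a `(location, reading)` pair) and the two callables passed in as `find_virtual_sensor` and `predict_location` are assumptions introduced for the example.

```python
from collections import defaultdict
from statistics import mean

def map_real_to_virtual(rs_data, vss_ids, find_virtual_sensor, predict_location):
    """Map one round of RMS data to VSS statuses and readings.

    rs_data: {rms_id: None if the RMS is missing, else (location, reading)}
    Returns {vss_id: (status, reading)} with status in
    {"active", "missing", "inactive"}.
    """
    readings = defaultdict(list)   # VSS id -> readings of RMSs inside it
    missing = set()                # VSSs predicted to hold a missing RMS
    for rs, record in rs_data.items():
        if record is not None:
            location, reading = record
            readings[find_virtual_sensor(location)].append(reading)
        else:
            missing.add(find_virtual_sensor(predict_location(rs)))

    result = {}
    for vs in vss_ids:
        if readings.get(vs):
            # At least one real reading: the VSS is active, even if a
            # missing RMS was also predicted to be inside it.
            result[vs] = ("active", mean(readings[vs]))
        elif vs in missing:
            result[vs] = ("missing", None)
        else:
            result[vs] = ("inactive", None)
    return result

# Hypothetical round: locations are given directly as VSS ids for brevity.
round_data = {"M1": ("V1", 10.0), "M2": ("V1", 20.0), "M3": None}
mapped = map_real_to_virtual(
    round_data,
    vss_ids=["V1", "V2", "V3"],
    find_virtual_sensor=lambda location: location,
    predict_location=lambda rs: "V2",
)
```

Here V1 averages the two readings it contains, V2 is marked missing because M3 is predicted to be inside it, and V3 is inactive.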
20DEMS Association Rule Mining
- Goal: mine and represent the potential association rules among the VSS readings.
- We propose a novel data structure (called MASTER-tree) to mine and represent the association rules among VSS readings
- MASTER-tree basics:
- A MASTER-tree is capable of mining any kind of association rule among any number of VSSs
- A MASTER-tree represents potential association rules among the VSS readings
- A path in a MASTER-tree represents a potential association rule.
Fig 8. A MASTER-tree
21DEMS Association Rule Mining (cont.)
- The potential number of association rules among VSSs grows exponentially with the number of VSSs
- To restrict the number of association rules, DEMS clusters the VSSs into small groups and constructs one MASTER-tree for each group
- DEMS uses agglomerative clustering
- Agglomerative clustering starts with every VSS as an individual cluster
- At each step it merges the two closest clusters, based on their pair-wise distances, into one, provided that the total number of VSSs in the new cluster does not exceed a user-defined maximum number of VSSs per cluster.
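The size-capped clustering step can be sketched in Python as follows. Two details are assumptions, since the slides do not state them: the distance between clusters is taken as the smallest pair-wise distance between their members (single linkage), and a pair whose merge would exceed the cap is skipped in favour of the next-closest pair. The function name, the `max_size` parameter and the distance values are illustrative.

```python
from itertools import combinations

def cluster_vss(ids, dist, max_size):
    """Agglomerative clustering with a cap on the cluster size.

    dist maps frozenset({a, b}) to the distance between VSSs a and b.
    """
    clusters = [frozenset([i]) for i in ids]

    def linkage(a, b):
        # Assumed single linkage: smallest pair-wise distance.
        return min(dist[frozenset((x, y))] for x in a for y in b)

    while True:
        best = None
        for a, b in combinations(clusters, 2):
            if len(a) + len(b) > max_size:
                continue  # merged cluster would exceed the cap
            d = linkage(a, b)
            if best is None or d < best[0]:
                best = (d, a, b)
        if best is None:
            return clusters  # no admissible merge is left
        _, a, b = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]

# Hypothetical pair-wise distances among four VSSs.
distances = {
    frozenset("AB"): 4, frozenset("AC"): 3, frozenset("BC"): 5,
    frozenset("AD"): 9, frozenset("BD"): 2, frozenset("CD"): 8,
}
groups = cluster_vss("ABCD", distances, max_size=2)
```

With a cap of two VSSs per cluster, B and D merge first (distance 2), then A and C (distance 3), and no further merge is admissible; one MASTER-tree would then be built per group.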
22DEMS Association Rule Mining (cont.)
23DEMS The MASTER-tree Projection Module (cont.)
24DEMS Association Rule Mining (cont.)
V2[1, 10], V1[1, 10] → V3[11, 20] (Support 40%, Confidence 100%)
V2[11, 20], V3[11, 20] → V1[1, 20] (Support 60%, Confidence 66%)
(2, 31, 485, 7657, 121937)
Fig 11. MASTER-tree with count
25DEMS Association Rule Mining (cont.)
- Let:
- The minimum support be 50%
- The minimum confidence be 50%
- A typical association rule becomes:
- V2[11, 20], V3[11, 20] → V1[1, 20]
- The rule means: if the VSS reading of V2 is within 10 to 20 and the VSS reading of V3 is within 10 to 20, the VSS reading of V1 is most likely within 0 to 20.
- There exists a path from the root node to V1[1, 20] via V2[11, 20] and V3[11, 20] in the MASTER-tree.
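For reference, the two thresholds can be read against the standard association rule definitions, sketched below. The symbols are notation introduced here, not taken from the slides: T is the set of rounds, X the condition on the antecedent VSS readings, Y the condition on the consequent reading, and supp(X) the fraction of rounds in which X holds.

```latex
\[
\mathrm{supp}(X \Rightarrow Y)
  = \frac{\lvert \{\, t \in T : X \text{ and } Y \text{ both hold in round } t \,\} \rvert}{\lvert T \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y)
  = \frac{\mathrm{supp}(X \Rightarrow Y)}{\mathrm{supp}(X)}
\]
```

Under 50% thresholds, the rule quoted with support 60 and confidence 66 qualifies, whereas the rule with support 40 falls below the minimum support despite its confidence of 100.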
27DEMS Data Estimation
- Goal: estimate the missing VSS reading.
- The data estimation module estimates the missing VSS reading
- The estimated reading for the missing VSS is used as the estimated reading for the missing RMS.
28DEMS Data Estimation (cont.)
Fig 12. Flowchart of the data estimation module
(A step by step example)
29Performance Evaluation
- Simulation model:
- We simulate the missing data for our datasets
- A sensor goes missing at random (approximately 5-10) for a random number (10-20) of consecutive rounds
- Both data and location are missing for a missing sensor
- We use DEMS, TinyDB, SPIRIT and the Average method to estimate missing readings
- TinyDB
- An average-based technique which estimates the missing data by taking the average of the readings from other sensors in the current round.
- SPIRIT
- An auto-regression-based technique which estimates the missing data based on the readings in the previous rounds
- Average
- The average of the other sensor readings is used as the estimated reading
- We compare the techniques based on mean absolute error (MAE)
- MAE = Σ |estimation error| / number of estimations.
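A minimal Python sketch of the MAE computation, assuming the estimation error of each estimate is its absolute difference from the true reading. The function name and the sample values are illustrative and are not results from the evaluation.

```python
def mean_absolute_error(actual, estimated):
    """Sum of absolute estimation errors divided by the number of estimations."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("need two equally sized, non-empty sequences")
    total = sum(abs(a - e) for a, e in zip(actual, estimated))
    return total / len(actual)

# Hypothetical true readings and estimates for three missing readings.
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])  # 0.5
```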
30Performance Evaluation (cont.)
- Datasets:
- DAPPLE Project Dataset: a real-life dataset
- The carbon monoxide (CO) readings in the range [0, 6] were collected over a period of two weeks around Marylebone Road in London
- The mobile sensors monitoring the atmospheric CO level are attached to PDAs, which store these readings
- We chose Thursday, 20th May 2004, when three sensors were simultaneously recording for about 32 minutes, resulting in 600 rounds (after disregarding the missing rounds) of CO readings
- Factory Floor Temperature Dataset: a synthetic dataset
- A simulation of a mobile sensor network for monitoring factory floor temperatures
- Machines are placed on a floor
- Some machines are turned on for a number of rounds; the temperatures of these machines reach a high constant value and the heat disperses across the floor.
- 100 mobile sensors roam in random directions to monitor the factory floor and report temperature readings in the range [0, 100] °C from different locations.
- The mobile sensor readings were sampled once per hour; the total number of rounds of readings is 5000 from 100 mobile sensors.
31Performance Evaluation (cont.)
Fig 13. Impacts of number of rounds on MAE for
DAPPLE project dataset
Table 1. Average MAE for DAPPLE project dataset
Approach Average MAE
DEMS 0
Average 1.2717
TinyDB 0.6331
SPIRIT 0.9437
32Performance Evaluation (cont.)
Fig 14. Impacts of number of rounds on MAE for
factory floor dataset
Table 2. Average MAE for factory floor dataset
Approach Average MAE
DEMS 2.2538
Average 14.778
TinyDB 6.9621
SPIRIT 4.7472
33Conclusions and Future Work
- We proposed DEMS:
- A novel data estimation technique for mobile sensor networks based on data mining and virtual static sensor concepts
- Estimates missing sensor data with high accuracy
- Future work: extend DEMS to include
- Multiple base stations
- De-synchronized mobile sensor networks
- Clustered sensor networks.
34Thanks
35MASTER-tree Construction
36MASTER-tree projection and data estimation: An example
37Assume
- Three nodes (A, B, C)
- One dimension of data (temperature)
- Upper bound 30, lower bound 0, cell size 10
- dis(A,B) = 4, dis(A,C) = 3 and dis(B,C) = 5
- MCSS = 10
- minSup = 25%
- minConf = 75%
[Figure: relative positions of nodes A, B and C]
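With a lower bound of 0, an upper bound of 30 and a cell size of 10, each reading falls into one of three cells. The Python sketch below assumes integer readings and cells written with inclusive bounds, [0, 9], [10, 19] and [20, 29], which is consistent with the ranges that appear in the example rules; the function name and the exclusive upper bound are assumptions.

```python
def cell_of(value, lower=0, upper=30, cell_size=10):
    """Return the (low, high) bounds of the cell that contains `value`.

    Assumes integer readings with lower <= value < upper.
    """
    if not lower <= value < upper:
        raise ValueError("reading outside the configured bounds")
    index = (value - lower) // cell_size
    low = lower + index * cell_size
    return (low, low + cell_size - 1)

# Readings of node C in the example data sequence.
cells = [cell_of(v) for v in (7, 17, 14, 21, 9)]
```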
38Pattern trees
[Figure: pattern tree for A, pattern tree for B, pattern tree for C, and the final MASTER tree without GS]
39Data Sequence
A 4 14 11 18 6 8
B 8 18 15 22 10 12
C 7 17 14 21 9 ?
40[Figure sequence, slides 40-45: MASTER-tree diagrams rooted at Ø over nodes A, B and C, each shown with the data sequence of slide 39]
46MCSS 10
Rule: Ø → C[0, 29] (Supp 100%, Conf 100%)
[Figure: MASTER-tree with node counts, shown with the data sequence of slide 39]
47MCSS 10
Rule: Ø → C[0, 19] (Supp 80%, Conf 80%)
[Figure: MASTER-tree with node counts, shown with the data sequence of slide 39]
48MCSS 10
Rule: Ø → C[0, 29] (Supp 80%, Conf 80%)
Rule: A → C[0, 9] (Supp 40%, Conf 100%)
[Figure: MASTER-tree with node counts, shown with the data sequence of slide 39]
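The support and confidence figures in this example can be checked against the five complete rounds of the data sequence with a short Python sketch. It uses the standard definitions of support and confidence; the range [0, 9] for A is an assumption (the cell containing A's current reading of 8), since the slide writes the antecedent only as "A". Function and variable names are illustrative.

```python
# The five rounds in which all three readings are known.
A = [4, 14, 11, 18, 6]
B = [8, 18, 15, 22, 10]
C = [7, 17, 14, 21, 9]
rounds = [{"A": a, "B": b, "C": c} for a, b, c in zip(A, B, C)]

def holds(reading_round, conditions):
    """True if every listed node's reading lies within its inclusive range."""
    return all(lo <= reading_round[node] <= hi
               for node, (lo, hi) in conditions.items())

def support_confidence(rounds, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    matched = [r for r in rounds if holds(r, antecedent)]
    both = [r for r in matched if holds(r, consequent)]
    support = len(both) / len(rounds)
    confidence = len(both) / len(matched) if matched else 0.0
    return support, confidence

whole_range = support_confidence(rounds, {}, {"C": (0, 29)})
narrower = support_confidence(rounds, {}, {"C": (0, 19)})
given_a = support_confidence(rounds, {"A": (0, 9)}, {"C": (0, 9)})
```

The three calls return (1.0, 1.0), (0.8, 0.8) and (0.4, 1.0), matching the figures quoted for Ø → C[0, 29] on slide 46, Ø → C[0, 19] on slide 47 and A → C[0, 9] on slide 48. Since A reads 8 in the round where C is missing, the last rule applies to that round.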