Title: Creating Dynamic Social Network Models from Sensor Data
Slide 1: Creating Dynamic Social Network Models from Sensor Data
- Tanzeem Choudhury (Intel Research / Affiliate Faculty, CSE)
- Dieter Fox (CSE)
- Henry Kautz (CSE)
- James Kitts (Sociology)
Slide 2
- What are we doing?
- Why are we doing it?
- How are we doing it?
Slide 3: Social Network Analysis
- Work across the social and physical sciences is increasingly studying the structure of human interaction
  - 1967: Stanley Milgram, "six degrees of separation"
  - 1973: Mark Granovetter, "the strength of weak ties"
  - 1977: International Network for Social Network Analysis founded
  - 1992: Ronald Burt, structural holes (the social structure of competition)
  - 1998: Watts and Strogatz, small-world graphs
Slide 4: Social Networks
- Social networks are naturally represented and analyzed as graphs
Slide 5: Example Network Properties
- Degree of a node
- Eigenvector centrality
  - global importance of a node
- Average clustering coefficient
  - degree to which the graph decomposes into cliques
- Structural holes
  - opportunities for gain by bridging disconnected subgraphs
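The properties listed above can be computed directly from an adjacency structure. The following is a small stdlib-only sketch on a toy graph; the graph, the power-iteration approach, and all values are illustrative, not from the study.

```python
# Toy undirected interaction graph as an adjacency dict (hypothetical data).
from itertools import combinations

adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

# Degree of a node
degree = {n: len(nbrs) for n, nbrs in adj.items()}

# Eigenvector centrality via power iteration (global importance of a node)
x = {n: 1.0 for n in adj}
for _ in range(100):
    x = {n: sum(x[m] for m in adj[n]) for n in adj}
    norm = max(x.values())
    x = {n: v / norm for n, v in x.items()}

# Average clustering coefficient (degree to which the graph forms cliques)
def clustering(n):
    nbrs = adj[n]
    if len(nbrs) < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (len(nbrs) * (len(nbrs) - 1))

avg_clustering = sum(clustering(n) for n in adj) / len(adj)

print(degree["C"], round(avg_clustering, 2))
```

Node C, which bridges the A-B-C clique and the D-E path, ends up with the highest degree and eigenvector centrality; removing it would disconnect the graph, making it the kind of "structural hole" bridge mentioned above.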
Slide 6: Applications
- Many practical applications
  - Business: discovering organizational bottlenecks
  - Health: modeling the spread of communicable diseases
  - Architecture / urban planning: designing spaces that support human interaction
  - Education: understanding the impact of peer groups on educational advancement
- Much recent theory on finding random graph models that fit empirical data
Slide 7: The Data Problem
- Traditionally, data comes from manual surveys of people's recollections
  - Very hard to gather
  - Questionable accuracy
  - Few published data sets
  - Almost no longitudinal (dynamic) data
- 1990s: social network studies based on electronic communication
Slide 8: Social Network Analysis of Email
Slide 9: Limits of E-Data
- Email data is cheap and accurate, but misses:
  - Face-to-face speech: the vast majority of human interaction, especially complex communication
  - The physical context of communication: useless for studying the relationship between environment and interaction
- Can we gather data on face-to-face communication automatically?
Slide 10: Research Goal
- Demonstrate that we can model social network dynamics by gathering large amounts of rich face-to-face interaction data automatically
  - using wearable sensors
  - combined with statistical machine learning techniques
- Find simple and robust measures derived from sensor data
  - that are indicative of people's roles and relationships
  - that capture the connections between physical environment and network dynamics
Slide 11: Questions we want to investigate
- Changes in social networks over time
  - How do interaction patterns dynamically relate to structural position in the network?
  - Why do people sharing relationships tend to be similar?
  - Can one predict the formation or break-up of communities?
- Effect of location on social networks
  - What are the spatio-temporal distributions of interactions?
  - How do locations serve as hubs and bridges?
  - Can we predict the popularity of a particular location?
Slide 12: Other Applications of Such Data
- Research on emotional content of speech
  - Need for natural data
- Medical applications
  - Speaking rate is an indicator of mental activity
  - Overly rapid speech is a symptom of mania
  - Asperger's syndrome: abnormal conversational dynamics
- Meeting understanding
  - Interruptions indicate status and dominance
Slide 13: Support
- Human and Social Dynamics: one of five new priority areas for NSF
- $800K award to the UW / Intel / Georgia Tech team
  - Intel participating at no cost
- Intel Research donating hardware and internships
- Leveraging work on sensors and localization from other NSF and DARPA projects
Slide 14: Procedure
- Test group
  - 32 first-year incoming CSE graduate students
  - Units worn 5 working days each month
  - Collect data over one year
- Units record
  - Wi-Fi signal strength, to determine location
  - Audio features adequate to determine when conversation is occurring
- Subjects answer a short monthly survey
  - Selective ground truth on interactions
  - Research interests
- All data stored securely
  - Indexed by a code number assigned to each subject
Slide 15: Privacy
- UW Human Subjects Division approved procedures after 6 months of review and revisions
- The major concern was privacy, addressed by:
  - A procedure for recording audio features without recording conversational content
  - Procedures for handling data afterwards
Slide 16: Data Collection
- Intel Multi-Modal Sensor Board
[Diagram: real-time audio feature extraction on the sensor unit; code identifier, audio features, and Wi-Fi signal strength stored in a coded database]
Slide 17: Recording Units
Slide 18: Data Collection
- The multi-sensor board sends its sensor data stream to an iPAQ
- The iPAQ computes audio features, Wi-Fi node identifiers, and signal strengths
- The iPAQ writes the audio and Wi-Fi features to an SD card
- Each day, the subject uploads the data to the coded database using his or her code number
Slide 19: Speech Detection
- From the audio signal, we want to extract features that can be used to determine:
  - Speech segments
  - Number of different participants (but not the identity of participants)
  - Turn-taking style
  - Rate of conversation (fast versus slow speech)
- But the features must not allow the audio to be reconstructed!
Slide 20: Speech Production
The source-filter model: the fundamental frequency (F0, or pitch) and the formant frequencies (F1, F2, ...) are the most important components for speech synthesis.
Slide 21: Speech Production
- Voiced sounds: fundamental frequency (i.e., harmonic structure) and energy in the lower frequency components
- Unvoiced sounds: no fundamental frequency, and energy focused in the higher frequencies
- Our approach: detect speech by reliably detecting voiced regions
- We do not extract or store any formant information; at least three formants are required to produce intelligible speech

1. Donovan, R. (1996). Trainable Speech Synthesis. PhD thesis, Cambridge University.
2. O'Shaughnessy, D. (1987). Speech Communication: Human and Machine. Addison-Wesley.
Slide 22: Goal: Reliably Detect Voiced Chunks in the Audio Stream
Slide 23: Speech Features Computed
- Spectral entropy
- Relative spectral entropy
- Total energy
- Energy below 2 kHz (low frequencies)
- Autocorrelation peak values and number of peaks
- Higher-order mel-frequency cepstral coefficients
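Several of these features are straightforward to compute per audio frame. The sketch below (numpy only; the 8 kHz sampling rate, frame size, and test signals are assumptions for illustration) computes spectral entropy, total and low-frequency energy, and autocorrelation peak statistics, and checks that a pitched (voiced-like) frame has lower spectral entropy than a noise (unvoiced-like) frame:

```python
import numpy as np

def frame_features(frame, sr=8000):
    """Per-frame features in the spirit of the list above (illustrative)."""
    spec = np.abs(np.fft.rfft(frame)) ** 2
    p = spec / (spec.sum() + 1e-12)                  # normalized spectrum
    spectral_entropy = -np.sum(p * np.log(p + 1e-12))

    total_energy = float(np.sum(frame ** 2))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    low_energy = float(spec[freqs < 2000].sum())     # energy below 2 kHz

    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)                        # normalized autocorrelation
    peaks = [i for i in range(1, len(ac) - 1)
             if ac[i] > ac[i - 1] and ac[i] > ac[i + 1]]
    max_peak = max((ac[i] for i in peaks), default=0.0)
    return spectral_entropy, total_energy, low_energy, len(peaks), max_peak

sr = 8000
t = np.arange(256) / sr
voiced = np.sin(2 * np.pi * 150 * t)                 # pitched, harmonic frame
rng = np.random.default_rng(0)
unvoiced = rng.standard_normal(256)                  # noise-like frame

h_v = frame_features(voiced)[0]
h_u = frame_features(unvoiced)[0]
print(h_v < h_u)   # voiced frame has lower spectral entropy
```

Relative spectral entropy (entropy measured against a local mean spectrum) and the mel cepstral coefficients are omitted here for brevity.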
Slide 24: Features Used: Autocorrelation
[Figure] Autocorrelation of (a) an unvoiced frame and (b) a voiced frame. Voiced chunks have a higher non-initial autocorrelation peak and fewer peaks.
Slide 25: Features Used: Spectral Entropy
[Figure] FFT magnitude of (a) an unvoiced frame and (b) a voiced frame. Voiced chunks have lower entropy than unvoiced chunks because voiced chunks have more structure.
Slide 26: Features Used: Energy
Energy in voiced chunks is concentrated in the lower frequencies.
Higher-order mel cepstral coefficients contain pitch (F0) information; the lower-order coefficients are NOT stored.
Slide 27: Segmenting Speech Regions
Slide 28: Multi-Person Conversation Model
Group state G^t: who is holding the floor (the main speaker). Values 1..N denote the instrumented subjects; N+1 denotes silence; N+2 denotes any unmiked speaker.
Slide 29: Multi-Person Conversation Model
Individual state M_i^t: true if subject i is speaking. P(M | G) is set so as to disfavor people talking simultaneously. U: true if an unmiked subject is speaking.
Slide 30: Multi-Person Conversation Model
Voicing states V_i^t: true if the sound from mike i is a human voice, with P(V_i^t | M_i^t) = 1 and P(V_i^t | not M_i^t) = 0.5. AV^t is the logical OR of the voicing nodes.
Slide 31: Multi-Person Conversation Model
Observations O_i^t: acoustic features from mike i that are useful for detecting speech. P(O | V) is a 3D Gaussian, with covariance matrix learned from speaker-independent data.
Slide 32: Multi-Person Conversation Model
Energy E_{i,j}^t: a 2D variable containing the log energies of mikes i and j. It associates voiced regions with a speaker: if subject i talks at time t, then the energy at mike i should be higher than at mike j.
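The pairwise-energy idea reduces, in the simplest case, to attributing a voiced frame to the wearer whose microphone shows the highest log energy. A minimal sketch (the subject names and energy values are hypothetical):

```python
import math

def likely_speaker(mike_energies):
    """mike_energies: dict of subject -> linear energy for one voiced frame.
    Returns the miked subject whose log energy is highest."""
    log_e = {s: math.log(e + 1e-12) for s, e in mike_energies.items()}
    return max(log_e, key=log_e.get)

frame = {"alice": 0.80, "bob": 0.05, "carol": 0.07}  # alice's mike dominates
print(likely_speaker(frame))  # prints "alice"
```

The DBN version is softer than this argmax: it compares mike pairs probabilistically, so brief energy spikes on one mike do not immediately flip the inferred speaker.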
Slide 33: Determining the Miked Speaker
Slide 34: Multi-Person Conversation Model
Entropy H_e^t: the entropy of the log-energy distribution across all N microphones. When an unmiked subject speaks, the energy is spread roughly evenly across microphones, so the entropy will be high.
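The entropy cue can be sketched numerically (the energy values below are invented): when a miked subject speaks, their own mike dominates and the normalized energy distribution is peaked, giving low entropy; when an unmiked subject speaks, the sound reaches all mikes at similar levels and the distribution is close to uniform, giving high entropy.

```python
import math

def mike_entropy(energies):
    """Entropy of the normalized energy distribution across microphones."""
    total = sum(energies)
    probs = [e / total for e in energies]
    return -sum(p * math.log(p) for p in probs if p > 0)

miked = [0.90, 0.05, 0.05]    # wearer's own mike dominates -> peaked
unmiked = [0.35, 0.33, 0.32]  # speech reaches all mikes similarly -> spread

print(mike_entropy(miked) < mike_entropy(unmiked))  # True
```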
Slide 35: Determining the Unmiked Speaker
Slide 36: Results
Slide 37: Results
Slide 38: Analyzing Results of DBN Inference
- Compute the number of conversations between subjects
- Create a weighted graph
- Visualize with multi-dimensional scaling
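The three steps above can be sketched end to end. The conversation counts, the count-to-distance conversion, and the use of classical MDS are all illustrative assumptions; a real pipeline might use networkx plus sklearn's MDS instead of the hand-rolled version here.

```python
import numpy as np

# Hypothetical counts of detected conversations between three subjects
counts = np.array([[0, 8, 3],
                   [8, 0, 5],
                   [3, 5, 0]], dtype=float)

# Convert similarity (counts) to dissimilarity: frequent talkers sit closer
dist = 1.0 / (1.0 + counts)
np.fill_diagonal(dist, 0.0)

# Classical MDS: double-center the squared distances, take top-2 eigenvectors
n = len(dist)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (dist ** 2) @ J
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1][:2]
coords = vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))

# Subjects 0 and 1 (8 conversations) should land closer than 0 and 2 (3)
d01 = np.linalg.norm(coords[0] - coords[1])
d02 = np.linalg.norm(coords[0] - coords[2])
print(d01 < d02)
```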
Slide 39: Modeling Influence
- Goal: model the influence of subject j on subject i's conversational style
- Formally:
  - P(S_i,t | S_i,t-1): self-transition probability (the probability of continuing to speak or remaining silent)
  - Question: for a particular conversation, how much of P(S_i,t | S_i,t-1, S_j,t-1) is explained by P(S_i,t | S_j,t-1)?
- Create a mixed-memory Markov chain model and infer its parameters
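A minimal two-party version of the mixed-memory idea can be sketched as follows. Subject i's next speaking state mixes a self-transition model with a model conditioned on partner j's previous state, P(S_i,t | S_i,t-1, S_j,t-1) = a * P(S_i,t | S_i,t-1) + (1 - a) * P(S_i,t | S_j,t-1), so (1 - a) measures j's influence on i. All transition tables and the true mixing weight below are invented for illustration; the parameter is recovered from synthetic data by a simple grid search over the likelihood rather than the EM procedure a full implementation would use.

```python
import math
import random

random.seed(1)

# Hypothetical transition tables: T[prev] = probability that i speaks next
T_self = {0: 0.2, 1: 0.7}    # conditioned on i's own previous state
T_other = {0: 0.5, 1: 0.1}   # conditioned on j's previous state

true_a = 0.7                  # true self-weight; (1 - true_a) is j's influence
si, sj, data = 0, 1, []
for _ in range(10000):
    p_speak = true_a * T_self[si] + (1 - true_a) * T_other[sj]
    ni = 1 if random.random() < p_speak else 0
    data.append((si, sj, ni))
    si = ni
    sj = 1 if random.random() < 0.5 else 0  # j's chain, simplified to coin flips

def log_lik(a):
    ll = 0.0
    for pi, pj, ni in data:
        p = a * T_self[pi] + (1 - a) * T_other[pj]
        ll += math.log(p if ni == 1 else 1.0 - p)
    return ll

best_a = max((k / 100 for k in range(1, 100)), key=log_lik)
print(best_a)
```

With 10,000 samples the maximum-likelihood estimate lands close to the true mixing weight of 0.7.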
Slide 40: Influence
Slide 41: Gists
- Inferring what a conversation is about (its gist)
- Apply speech recognition
- Use the OpenMind commonsense knowledge database to associate words with classes of events (e.g., buying lunch)
- Use a simple naïve Bayes bag-of-words model to infer the gist and select key words
- Improve by conditioning on location
Slide 42: Example
Slide 43: Next Step: Locations
- Wi-Fi signal strength can be used to determine the approximate location of each speech event
  - 5-meter accuracy
- Location computation is done off-line
- Raw locations are converted to nodes in a coarse topological map before further analysis
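One common way to get room-level location from Wi-Fi signal strength is nearest-neighbor fingerprinting; the slides do not specify the exact method used, and the access points, RSSI values, and place names below are hypothetical.

```python
import math

# Calibration fingerprints: place -> mean RSSI (dBm) per access point
fingerprints = {
    "hallway":      {"ap1": -40, "ap2": -70, "ap3": -80},
    "breakout":     {"ap1": -65, "ap2": -45, "ap3": -75},
    "meeting_room": {"ap1": -80, "ap2": -60, "ap3": -42},
}

def locate(reading):
    """Return the calibrated place whose fingerprint is closest (Euclidean
    distance in signal-strength space) to the observed RSSI reading."""
    def dist(fp):
        return math.sqrt(sum((fp[ap] - rssi) ** 2 for ap, rssi in reading.items()))
    return min(fingerprints, key=lambda place: dist(fingerprints[place]))

print(locate({"ap1": -42, "ap2": -68, "ap3": -79}))  # prints "hallway"
```

The output of such a step is exactly what the next slide needs: a raw reading mapped to a node type (hallway, breakout area, and so on) in the topological map.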
Slide 44: Topological Location Map
- Nodes in the map are identified by area types:
  - Hallway
  - Breakout area
  - Meeting room
  - Faculty office
  - Student office
- Detected conversations are associated with their area type
Slide 45: Goal: Social Network Model
- Goal: Dynamic Social Network Model
  - People, places, conversations, time
- Nodes
  - Subjects (wearing sensors, have given consent)
  - Places (e.g., a particular breakout area)
  - Instances of conversations
- Edges
  - Between subjects and conversations
  - Between places and conversations
- Replicate over data collection sessions (as in a DBN)
- Compute influences between sessions: e.g., if A-B and B-C are strong at time t, then A-C is likely to be strong at time t+1
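The closing example (strong A-B and B-C ties at one session raising the expected A-C strength at the next) is a triadic-closure update. A minimal sketch, where the update rule and the boost constant are assumptions for illustration, not the study's model:

```python
def predict_next(weights, boost=0.5):
    """weights: dict of frozenset({x, y}) -> tie strength in [0, 1].
    Returns next-session strengths after a simple triadic-closure update."""
    nodes = {n for pair in weights for n in pair}
    nxt = dict(weights)
    for a in nodes:
        for b in nodes:
            for c in nodes:
                if len({a, b, c}) < 3:
                    continue
                ab = weights.get(frozenset({a, b}), 0.0)
                bc = weights.get(frozenset({b, c}), 0.0)
                ac = frozenset({a, c})
                # two strong ties through b pull a and c together
                cand = weights.get(ac, 0.0) + boost * ab * bc
                nxt[ac] = min(1.0, max(nxt.get(ac, 0.0), cand))
    return nxt

t0 = {frozenset({"A", "B"}): 0.9, frozenset({"B", "C"}): 0.8}
t1 = predict_next(t0)
print(t1[frozenset({"A", "C"})])  # approximately 0.36
```

Replicating this update across data collection sessions mirrors the DBN structure described above: each session's network is one time slice, and the update defines the inter-slice dependency.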