On the Session Structure Of Network Applications

1
On the Session Structure Of Network Applications
  • Jayanthkumar Kannan, UC Berkeley
  • Jaeyeon Jung, MIT
  • Vern Paxson, ICSI, LBL
  • Can Emre Koksal, EPFL

2
Motivation
  • "Internet traffic is far weirder than any network
    researcher can ever imagine"
  • Paxson, '99: "Why understanding anything about
    the Internet is painfully hard"
  • Many characteristics of traffic have heavy-tailed
    distributions
  • Leads to operational difficulty
  • Several hundred Internet applications in use
    (especially in enterprise and universities)
  • Hard for administrators to identify the
    applications in use
  • Knowing them is useful for specifying policies
  • Makes measurement and analysis harder
  • Need complex statistical models to capture
    network behavior

3
Our work
  • Goal
  • Characterize the connection-level behavior of
    applications
  • Session
  • Set of connections initiated/received by an
    application in response to a user event
  • Example: an FTP session consists of an FTP control
    connection followed by multiple data connections
  • Problem definition
  • Input: connection-level traces at a firewall
  • Infer the session structure of common
    applications with minimal human input

4
The Big Picture
  • Two main pieces
  • Identifying application sessions.
  • E.g. (ftp, ftp-data), (ftp, ftp-data, ftp-data,
    ftp-data)
  • Inferring typical session structure
  • E.g. FTP: (ftp) (ftp-data)*
  • Application: identify anomalous activity
  • E.g. new apps, misconfigurations, malicious attacks

[Figure: pipeline diagram with stages Connections, Identifying Sessions, Sessions, Inferring Session Structure, Session Description for Common Apps, Host-Specific Training, Identify Anomalous Activity]
5
Outline
  • Session Identification
  • Inferring Session Structure
  • Preliminary Results
  • Application: Identifying Anomalous Activity
  • Conclusion

6
Session Identification
  • Purpose
  • Given a stream of connections, parse it into
    sessions
  • Observations
  • The connections in a session are causally related
  • Such connections tend to occur close to each
    other
  • Devised a statistical test
  • Identifies pairs of causally linked connections
  • Collate causally linked connections into sessions
  • Builds a base model of what is normal, and flags
    deviations
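
The collation step above can be sketched as a connected-components pass over the linked pairs. This is a minimal illustration rather than the authors' implementation: the function name `collate_sessions`, the use of integer connection indices, and the union-find approach are all assumptions made for the example.

```python
from collections import defaultdict

def collate_sessions(num_connections, linked_pairs):
    """Group connection indices into sessions.

    linked_pairs holds (i, j) index pairs that a causality test has
    declared linked; connections joined by a chain of links end up in
    the same session. (Hypothetical helper, not from the slides.)
    """
    parent = list(range(num_connections))

    def find(i):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in linked_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as the root of the merged group.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(num_connections):
        groups[find(i)].append(i)
    # Order sessions by their earliest connection index.
    return sorted(groups.values())

# Connections 0, 1 and 3 are chained together; 2 and 4 stay on their own.
sessions = collate_sessions(5, [(0, 1), (1, 3)])
```

Unlinked connections come out as single-connection sessions, so every connection belongs to exactly one session.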

7
Session Identification: Base Model
  • Maintain a model for the arrival times of
    connections
  • Categorize connections by type and maintain rate
    estimates for the arrival of each type
  • E.g. rate of (<ftp): 10 per hr
  • Empirically known fact
  • This arrival model is roughly stationary Poisson
    over the duration of an hour
  • The arrival process of unrelated types of connections
    is a union of independent Poisson processes
  • Avoids the usual stumbling block in modeling
    studies
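
A per-type rate estimate of the kind described above might look like the following. It is a sketch under stated assumptions: the slides do not say how rates are estimated, so a plain count over a one-hour window is used here, and `estimate_rates` and the sample trace are hypothetical.

```python
from collections import Counter

def estimate_rates(arrivals, window_hours=1.0):
    """Estimate the arrival rate (per hour) of each connection type.

    arrivals is an iterable of (timestamp_seconds, conn_type) pairs
    observed within one window. Counting per window is an assumed
    estimator, chosen because the model is taken to be roughly
    stationary over an hour.
    """
    counts = Counter(conn_type for _, conn_type in arrivals)
    return {conn_type: n / window_hours for conn_type, n in counts.items()}

# Made-up one-hour trace; "<" marks incoming, ">" outgoing connections.
trace = [
    (12.0, "<ftp"),
    (340.5, ">http"),
    (900.0, ">http"),
    (2100.0, "<ftp"),
    (3300.0, ">http"),
]
rates = estimate_rates(trace, window_hours=1.0)
```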

8
Session Identification: Deviations
  • Consider connections of two types T1, T2
  • Define P[R1, R2, x] as the probability that two
    independent processes of rates R1, R2 have an
    arrival within time x
  • Test: if P[R1, R2, x] < T, declare C1, C2 in the same
    session
  • For Poisson arrivals, P[R1, R2, x] ≈ R1 R2 x
  • Tunable parameters: threshold T, timeout values
  • False positives per unit time are at most T
  • False negatives: experimentally evaluated
  • Note that we will see several instances of the same
    application

[Figure: connection C1 of type (<ftp), rate R1, and connection C2 of type (>http), rate R2]
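
The test can be sketched in a few lines, taking the chance co-occurrence of two independent Poisson streams to be the product R1 · R2 · x. The function names, the choice of hours as the time unit, and the way the timeout is applied are assumptions; only the product form and the threshold value T = 0.01 come from the slides.

```python
def chance_cooccurrence(r1, r2, x):
    """Small-x approximation of how often two independent Poisson
    streams with rates r1 and r2 produce arrivals within x of each
    other, per unit time."""
    return r1 * r2 * x

def same_session(r1, r2, gap, threshold=0.01, timeout=None):
    """Declare two connections causally linked when a gap this small
    would be unlikely for unrelated traffic.

    Rates are per hour and gap is in hours (an assumed convention).
    The optional timeout rejects pairs that are simply too far apart.
    """
    if timeout is not None and gap > timeout:
        return False
    return chance_cooccurrence(r1, r2, gap) < threshold

ONE_SECOND = 1.0 / 3600.0  # expressed in hours

# Two rare types (2/hr and 3/hr) arriving 2 s apart: 6 * 2/3600 ≈ 0.0033
rare_pair = same_session(2.0, 3.0, 2 * ONE_SECOND)
# Two busy types (10/hr and 20/hr) arriving 1 s apart: 200/3600 ≈ 0.056
busy_pair = same_session(10.0, 20.0, ONE_SECOND)
```

The rare pair falls under the threshold and is linked; the busy pair does not, because frequent connection types land close together by chance often enough that proximity alone is weak evidence.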
9
Outline
  • Session Identification
  • Inferring Session Structure
  • Preliminary Results
  • Application: Identifying Anomalous Activity
  • Conclusion

10
Inferring Session Structure: Representation
  • What language do you use to represent an
    application session?
  • Our decision: regular expressions
  • Divined based on experience with applications
  • E.g. FTP: (<ftp) ((>ftp-data)|(<ftp-data))*
  • Problem
  • Given various types of observed app sessions,
    output a regular expression that captures typical
    behavior of that app
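
To make the representation concrete, here is one way a session could be checked against such a description with Python's `re` module. The token encoding (connection types joined by spaces) is hypothetical, and the FTP pattern assumes the description means one incoming control connection followed by any number of data connections in either direction; the repetition and alternation operators are part of that assumption.

```python
import re

# Assumed FTP description: a control connection, then zero or more
# data connections, each either outgoing (>) or incoming (<).
FTP_SESSION = re.compile(r"<ftp( (>ftp-data|<ftp-data))*")

def matches(session, pattern=FTP_SESSION):
    """Return True when the whole session fits the pattern.

    session is a sequence of connection-type tokens such as "<ftp";
    tokens are joined with single spaces before matching.
    """
    return pattern.fullmatch(" ".join(session)) is not None

control_only = matches(["<ftp"])
mixed_data = matches(["<ftp", ">ftp-data", "<ftp-data"])
data_first = matches([">ftp-data", "<ftp"])
```

Using `fullmatch` rather than `search` matters here: a session must fit the description from start to finish, not merely contain a fitting fragment.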

11
Inferring Session Structure: Framework
  • Naive approach
  • Build a regexp that exactly matches the list of
    observed sessions
  • E.g. (<ftp, >ftp-data), (<ftp, <ftp-data,
    <ftp-data)
  • Challenges
  • Such regular expressions are extremely
    complicated
  • Need to generalize beyond the trace since all
    types of sessions may not be observed
  • Need to deal with false positives of session
    identification
  • Our approach
  • Build an exact DFA E based on the trace
  • Design generalization rules and provide a set of
    generalized DFAs with associated FP, FN rates
  • A human makes the final decision based on the simplicity
    of the DFA, FP, FN
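
One simple way to obtain an exact DFA from a trace is a prefix tree that accepts precisely the observed sessions. The slides do not say how E is constructed, so the sketch below is an assumption, and `build_exact_dfa` and `accepts` are hypothetical names.

```python
def build_exact_dfa(sessions):
    """Build a prefix-tree DFA accepting exactly the given sessions.

    Returns (transitions, accepting): transitions maps
    (state, symbol) -> next state, and state 0 is the start state.
    """
    transitions = {}
    accepting = set()
    next_state = 1
    for session in sessions:
        state = 0
        for symbol in session:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                next_state += 1
            state = transitions[key]
        accepting.add(state)
    return transitions, accepting

def accepts(dfa, session):
    """Run the session through the DFA and report acceptance."""
    transitions, accepting = dfa
    state = 0
    for symbol in session:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting

observed = [
    ("<ftp", ">ftp-data"),
    ("<ftp", "<ftp-data", "<ftp-data"),
]
exact = build_exact_dfa(observed)
```

This DFA rejects anything not in the trace, including the bare control connection `("<ftp",)`, which is why generalization is needed before the model is useful.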

12
Inferring Session Structure: Generalization
  • Prefix Rule
  • If session S is observed in trace, assume all
    prefixes of S are legal app sessions
  • Counting Rule
  • If (a b^m), (a b^n) occur in the trace, assume (a b*)
    are legal application sessions
  • Pruning Rule
  • Given a DFA, retain states and edges that are
    required to match k% of the trace (k% coverage)
  • 2 other rules
  • Invert Direction, Dynamic Port
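
The first three rules can be illustrated on plain session lists instead of DFAs. This is a deliberate simplification. The counting rule is read as: sessions that differ only in how often a connection type repeats generalize to arbitrary repetition. Pruning is approximated by keeping the most frequent sessions until the coverage target is met. All function names and the informal `sym*` pattern strings are hypothetical.

```python
from collections import Counter
from itertools import groupby

def prefix_rule(sessions):
    """Every non-empty prefix of an observed session is treated as legal."""
    return {s[:i] for s in sessions for i in range(1, len(s) + 1)}

def counting_rule(sessions):
    """Collapse sessions that differ only in run lengths.

    Sessions with the same symbol order but different repeat counts
    at some position get 'sym*' there; otherwise the symbol is
    repeated as observed. Returns informal pattern strings.
    """
    by_skeleton = {}
    for s in sessions:
        runs = [(sym, len(list(group))) for sym, group in groupby(s)]
        skeleton = tuple(sym for sym, _ in runs)
        by_skeleton.setdefault(skeleton, []).append([n for _, n in runs])
    patterns = set()
    for skeleton, counts in by_skeleton.items():
        parts = []
        for i, sym in enumerate(skeleton):
            lengths = {c[i] for c in counts}
            if len(lengths) > 1:
                parts.append(sym + "*")
            else:
                parts.extend([sym] * lengths.pop())
        patterns.add(" ".join(parts))
    return patterns

def pruning_rule(session_counts, k=99.0):
    """Keep the most frequent sessions until k percent of the trace
    is covered (a stand-in for pruning DFA states and edges)."""
    total = sum(session_counts.values())
    kept, covered = [], 0
    for session, n in session_counts.most_common():
        if covered * 100 >= k * total:
            break
        kept.append(session)
        covered += n
    return kept

prefixes = prefix_rule([("a", "b", "c")])
starred = counting_rule([("a", "b", "b"), ("a", "b", "b", "b")])
kept = pruning_rule(Counter({("a", "b"): 990, ("a", "c"): 9, ("x",): 1}), k=99.0)
```

With 990 of 1000 observations covered by a single session shape, the two rare shapes are pruned at 99% coverage, which is how occasional false positives from session identification can be kept out of the final description.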

13
Outline
  • Session Identification
  • Inferring Session Structure
  • Preliminary Results
  • Application: Identifying Anomalous Activity
  • Conclusion

14
Results: Setup
  • Test data
  • Connection-level traces from Feb 05
  • LBL (7879 hosts, three million connections a day)
  • ICSI (272 hosts, hundred thousand connections a
    day)
  • Parameter choice
  • Threshold for statistical test: T = 0.01
  • Timeout values
  • Coverage for pruning: k = 99%
  • Identified over 28 applications at LBL
  • Heuristics designed by verifying that output over
    trace matches published protocol information
  • Ongoing: validate heuristics over fresh traces

15
Results
16
Outline
  • Session Identification
  • Inferring Session Structure
  • Preliminary Results
  • Application: Identifying Anomalous Activity
  • Conclusion

17
Application: Identifying Anomalies
  • Approach
  • Run causality detection test to find causally
    related connections
  • Identify causal links that do not correspond to
    list of inferred session structure
  • Use some training to avoid reporting unusual
    sessions seen frequently at a single host
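
The approach above could be prototyped roughly as follows. The sketch assumes sessions have already been identified per host, that known structures are available as compiled regular expressions over space-joined tokens, and that "training" means suppressing an unmatched session once the same host has shown it a minimum number of times. The names `find_anomalies` and `min_repeats`, the sample pattern, and the sample data are all hypothetical.

```python
import re
from collections import Counter

def find_anomalies(host_sessions, known_structures, min_repeats=3):
    """Return (host, session) pairs worth raising an alarm for.

    host_sessions: list of (host, session) where session is a tuple
    of connection-type tokens. A session is anomalous when it fits
    none of the known structures, unless the same host produced the
    same session at least min_repeats times (assumed training rule).
    """
    unmatched = [
        (host, session)
        for host, session in host_sessions
        if not any(p.fullmatch(" ".join(session)) for p in known_structures)
    ]
    seen = Counter(unmatched)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [hs for hs in dict.fromkeys(unmatched) if seen[hs] < min_repeats]

# One assumed structure: FTP control followed by data connections.
known = [re.compile(r"<ftp( (>ftp-data|<ftp-data))*")]

observed = [
    ("h1", ("<ftp", ">ftp-data")),   # fits the FTP structure
    ("h2", (">smtp", ">smtp")),      # unmatched, seen once
    ("h3", (">x", ">y")),            # unmatched but habitual at h3
    ("h3", (">x", ">y")),
    ("h3", (">x", ">y")),
]
alarms = find_anomalies(observed, known)
```

Only the one-off session at `h2` is reported; the pattern repeated at `h3` is treated as that host's normal behavior and suppressed.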

18
Identifying Anomalies: Results
  • Identified several activities of interest to
    admins
  • About 10 alarms a day
  • Uncommon applications
  • Unauthorized peer-to-peer apps (Ares P2P
    application on non-standard ports)
  • Spam relays (compromised machines being used to
    send spam)
  • Unauthorized web proxies (some hosts being used
    as proxy to get Yahoo pages)
  • Identified several port-scanners
  • Weird connection patterns: 3 confirmed attacks,
    others possibly attacks

19
Conclusion
  • There is value to understanding the
    connection-level behavior of applications
  • Of interest to network operators
  • Of interest to the measurement community
  • Can characterize this behavior
  • From light-weight information available at a
    firewall
  • Minimal human intervention

20
Results(2)