Title: Real-time Traffic monitoring and containment
1Real-time Traffic monitoring and containment
A. L. Narasimha Reddy
Dept. of Electrical Engineering, Texas A&M University
reddy_at_ee.tamu.edu
http://ee.tamu.edu/reddy/
2Acknowledgements
- Deying Tong, Smitha, Phani Achanta
- Seong Soo Kim
3Outline
- Introduction & Motivation
- DOS attacks
- Partial state routers
- DDOS attacks, worms
- Aggregate Packet header data as signals
- Signal/image based anomaly/attack detectors
4Introduction
- UDP-based multimedia traffic increasing
- UDP does not have congestion control
- Applications can be selfish
- If everyone is selfish, network can break down
- Controlling selfish flows desired
- Identify Resource hogs and control them
5Impact of UDP -- Unfairness
- When UDP and TCP compete, UDP wins by pushing TCP into congestion (Floyd & Fall 99)
6Unfairness - FIFO
7Unfairness - WRR
8Loss of goodput -FIFO
- Packets dropped later in network
9Loss of goodput -WRR
10UDP -- Summary
- Individual flows need to respond to congestion
- When end-hosts don't respond to congestion
- Need to identify and contain such flows
- Need network mechanisms for such control
11Introduction (contd)
- Many Network attacks
- Exploit application, protocol, and network architecture vulnerabilities
- Denial of Service attacks
- Consume all resources
- Leave no resources for legitimate users
12TCP SYN Flooding (contd)
- The attacker initiates a TCP connection to the server with a SYN (using a legitimate or spoofed source address)
- The server replies with a SYN-ACK
- The client never sends back the final ACK, so the server, which has allocated memory for the pending connection, keeps waiting
- (If the client spoofed the initial source address, it will never receive the SYN-ACK)
13TCP SYN Flooding Results
- The half-open connection buffer on the victim server will eventually fill
- The system will be unable to accept any new incoming connections until the buffer is emptied out
- There is a timeout associated with a pending connection, so the half-open connections will eventually expire
- The attacking system can continue sending requests for new connections faster than the victim system can expire the pending connections
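The race between attack rate and timeout described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: time advances in whole seconds, every SYN is spoofed and never completed, and the function and parameter names are invented for illustration.

```python
from collections import deque

def simulate_backlog(syn_rate, timeout, backlog_size, duration):
    """Return the first second at which a SYN is refused, or None.

    syn_rate: spoofed SYNs arriving per second (none is ever completed).
    timeout: seconds a half-open entry is kept before it expires.
    backlog_size: capacity of the half-open connection queue.
    duration: number of seconds to simulate.
    """
    half_open = deque()  # expiry time of each pending connection, oldest first
    for now in range(duration):
        # Expire pending connections whose timeout has passed.
        while half_open and half_open[0] <= now:
            half_open.popleft()
        for _ in range(syn_rate):
            if len(half_open) >= backlog_size:
                return now  # queue is full: new connections are refused
            half_open.append(now + timeout)
    return None
```

With a 75-second timeout and a 128-entry queue (illustrative values, not taken from the talk), 10 SYNs per second fill the queue within seconds, while 1 SYN per second never does because entries expire as fast as they arrive.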
14TCP Three-Way Handshake
Client connecting to a TCP port
Client initiates request
Connection is now half-open
Client connection Established
Server connection Established
15SYN Flood Illustrated
[Figure: the client floods the server with spoofed SYN requests. Each request leaves a half-open connection until the queue is filled. Server: "I have ACKed these connections, but I have not received an ACK back!"]
16Smurf Example
1. Attacker sends ICMP packet with spoofed source IP: Victim -> 10.1.2.255
2. Attacker sends ICMP packet with spoofed source IP: Victim -> 192.168.1.255
3. Victim is flooded with ICMP echo responses
4. Victim hangs?
17Distributed Denial of Service Attacks (DDOS)
- Attacker logs into Master and signals slaves to launch an attack on a specific target address (victim).
- Slaves then respond by initiating a TCP, UDP, ICMP or Smurf attack on the victim.
18Network Attacks -- Summary
- Many vulnerabilities exist in Networks
- Malicious traffic increasing
- For fun and profit
- Need mechanisms to identify and control malicious traffic
- DOS and DDOS
- DOS, resource hog problem similar
- DDOS requires new approach
19Real-time traffic monitoring
- Attacks motivate us to monitor network traffic
- Potential anomaly/attack detectors
- Potentially contain/throttle them as they happen
- Line speeds are increasing
- Need simple, effective mechanisms
- Attacks constantly changing
- CodeRed yesterday, MyDoom today, what next?
20Motivation
- Most current monitoring/policing tools are tailored to known attacks
- Look for packets with port number 1434 (Slammer)
- Contain Kazaa traffic to 20% of the link
- Become ineffective when traffic patterns or attacks change
- New threats are constantly emerging
21Motivation
- Can we design generic (and generalized) mechanisms for attack detection and containment?
- Can we make them simple enough to implement at line speeds?
22Introduction
- Why look for Kazaa packets?
- They consume resources
- Consume resources more than we want
- Not much different from DOS flood
- Consumes resources to stage attacks
- Why not monitor resource usage?
- Do not want to rely on attack specific info
23Attacks
- DOS attacks
- Few sources, resource hogs
- DDOS attacks, worms
- Many sources
- Individual flows look normal
- Look at the aggregate picture
24DOS attacks Network Flows
- Too many flows to monitor each flow
- Maintain a fixed amount of state/memory
- State not enough to monitor all flows (partial state)
- Manage the state to monitor high-bandwidth flows
- How?
- Sample packets
- High-BW flows more likely to be selected
- Use a cache and employ LRU type policy
- Traffic driven
- Cache retains frequently arriving flows
25Partial State Approach
- Similar to how caches are employed in computer memory systems
- Exploit locality
- Employ an engineering solution in an architecture-transparent fashion
26Identifying resource hogs
- Lots of web flows
- Tend to corrupt the cache quickly
- Apply probabilistic admission into cache
- Flow has to arrive often to be included in cache
- Most web flows not admitted
- Works well in identifying high-BW flows
- Can apply resource management techniques to
contain cached/identified flows
27LRU with probabilistic admission
- Employ a modified LRU
- On a miss, flow admitted with probability p
- When p is small, keeps smaller flows out
- High-BW flows more likely admitted
- Allows high-BW flows to be retained in cache
- Nonresponsive flows more likely to stay in cache
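The modified LRU described above can be sketched as follows. This is a minimal illustration rather than the implementation behind the talk: the class name, the use of a per-flow packet count as the cached state, and the eviction of the least recently used entry on admission are assumptions.

```python
import random
from collections import OrderedDict

class ProbabilisticLRU:
    """LRU flow cache where a missing flow is admitted only with probability p."""

    def __init__(self, size, p, rng=None):
        self.size = size            # number of flows the cache can hold
        self.p = p                  # admission probability on a miss
        self.rng = rng or random.Random()
        self.flows = OrderedDict()  # flow id -> packet count, most recent last

    def packet(self, flow_id):
        """Process one packet; return True if the flow is cached afterwards."""
        if flow_id in self.flows:
            self.flows[flow_id] += 1
            self.flows.move_to_end(flow_id)   # hit: refresh recency
            return True
        if self.rng.random() >= self.p:
            return False                      # miss and not admitted
        if len(self.flows) >= self.size:
            self.flows.popitem(last=False)    # evict least recently used flow
        self.flows[flow_id] = 1
        return True
```

A flow that sends n packets while uncached is admitted with probability 1 - (1 - p)^n, so with a small p a short web transfer is unlikely to enter the cache while a long, high-rate flow almost certainly does.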
28Traffic Driven State Management
- Monitor top 100 flows at any time
- Don't know the identity of these flows
- Don't know how much BW these may consume
29Policy Driven State Management
- An ISP could decide to monitor flows above 1Mbps
- Will need state > link capacity / 1 Mbps
- Could monitor flows consuming more than 1% of link capacity
- For security reasons
- At most 100 flows with 1% BW consumption
30Partial State Trace-driven evaluation
31Partial State Trace-driven Evaluation
32UDP Cache Occupancy
33TCP Cache Occupancy
34Resource Management
35Preferential Dropping
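[Figure: drop probability versus queue length. Axis marks: minth and maxth on the queue-length axis, maxp and 1 on the drop-probability axis. Two curves: drop prob for high bandwidth flows and drop prob for other flows.]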
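The two drop-probability curves can be approximated with the standard RED ramp, applied with a larger maximum probability for flows identified as high bandwidth. The threshold values and the `high_bw_factor` scaling below are illustrative assumptions, not parameters from the talk.

```python
def red_drop_prob(avg_queue, min_th, max_th, max_p):
    """Standard RED ramp: 0 below min_th, linear up to max_p, 1 at or above max_th."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)

def preferential_drop_prob(avg_queue, is_high_bw,
                           min_th=5, max_th=15, max_p=0.02, high_bw_factor=5.0):
    """Drop probability for one packet; cached high-bandwidth flows get a steeper ramp."""
    scale = high_bw_factor if is_high_bw else 1.0
    return min(1.0, red_drop_prob(avg_queue, min_th, max_th, max_p * scale))
```

At an average queue length of 10 packets these defaults give a drop probability of 0.01 for ordinary flows and 0.05 for flagged flows, so congestion signals land mostly on the resource hogs.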
36Multiple possibilities
- SACRED: Monitor flows above a certain rate (policy driven), differential RED (IWQoS 99)
- LRU-RED: Traffic driven state management, differential RED (Globecom 01)
- Approximately fair BW distribution
- LRU-FQ: Traffic driven state management, fair queuing (ICC 04)
- Contain DOS attacks
- Provide shorter delays for short-term flows
37SACRED
- Sampling And Caching RED
- Maintain flow rate as state for cached flows
- If flow rate > threshold, drop at higher rate
- Drop rate keeps increasing if flow stays above threshold
- Tends to punish nonresponsive flows, high-BW flows
- If flow rate < threshold, remove from cache
- Make room for another flow
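The per-flow update rule can be sketched as below. The additive step, the dictionary used as the cache, and the treatment of a rate exactly equal to the threshold are assumptions made for illustration; how the measured rate is obtained is left out.

```python
def update_flow(cache, flow_id, rate, threshold, step=0.05):
    """Update the extra drop probability kept for one cached flow.

    cache: dict mapping flow id -> extra drop probability.
    rate: most recent measured rate of the flow (same unit as threshold).
    Returns the extra drop probability to apply (0.0 if the flow was released).
    """
    if rate < threshold:
        # Flow is behaving: release its entry to make room for another flow.
        cache.pop(flow_id, None)
        return 0.0
    # Flow is at or above the threshold: raise its penalty, capped at 1.0.
    cache[flow_id] = min(1.0, cache.get(flow_id, 0.0) + step)
    return cache[flow_id]
```

A flow that ignores drops keeps accumulating penalty on every update, while a responsive flow that backs off below the threshold is released.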
38SACRED results -10 state
39SACRED cache associativity
40SACRED --Additive
41SACRED TCP only
42LRU-FQ Resource Management
43LRU-FQ flow chart enqueue event
[Flow chart boxes: Packet arrival; Is flow in cache?; Does cache have space?; Admit flow with probability p; Is flow admitted?; Record flow details, initialize count to 0; Increment count, move flow to top of cache; Is count > threshold?; Enqueue in Normal Queue; Enqueue in Partial state Queue]
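One reading of the enqueue flow chart is sketched below. The chart's branch structure is assumed here: in particular, this sketch applies the admission probability on every miss, whether or not the cache has free space, and sends the first packet of a newly admitted flow to the normal queue. Function and constant names are invented.

```python
import random
from collections import OrderedDict

NORMAL, PARTIAL_STATE = "normal", "partial-state"

def classify(cache, flow_id, size, p, threshold, rng=random):
    """Pick the queue for one arriving packet and update the LRU cache.

    cache: OrderedDict mapping flow id -> packet count, most recently used last.
    """
    if flow_id in cache:
        cache[flow_id] += 1
        cache.move_to_end(flow_id)            # move flow to top of cache
        if cache[flow_id] > threshold:
            return PARTIAL_STATE              # identified as a heavy flow
        return NORMAL
    if rng.random() < p:                      # miss: admit with probability p
        if len(cache) >= size:
            cache.popitem(last=False)         # evict least recently used flow
        cache[flow_id] = 0                    # record flow, initialize count to 0
    return NORMAL
```

Packets of a flow move to the partial-state queue only once its cached count exceeds the threshold, so short flows that are never admitted, or never accumulate enough hits, stay in the normal queue.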
44Linux IP Packet Forwarding
[Figure: Linux IP packet forwarding path. LINK LAYER (input): packet arrival; check and store packet; enqueue packet; request scheduler to invoke bottom half. IP LAYER: scheduler invokes bottom half; error checking; verify destination; route to destination; update packet; packet enqueued. UPPER LAYERS: local packets are delivered to upper layers. LINK LAYER (output): scheduler runs device driver; device prepares packet; packet departure. A "Design space" label marks part of this path.]
45Linux Kernel traffic control
- Filters are used to distinguish between different classes of flows.
- Each class of flows can be further categorized into sub-classes using filters.
- Queuing disciplines control how the packets are enqueued and dequeued
46LRU-FQ Implementation
- LRU component of the scheme is implemented as a filter.
- All parameters (threshold, probability and cache size) are passed as parameters to the filter
- Fair Queuing employed as a queuing discipline.
- Scheduling based on queue's weight.
- Start-time Fair Queuing
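Start-time Fair Queuing can be sketched in a few lines: each packet gets a start tag max(v, previous finish tag of its queue) and a finish tag start + length / weight, and packets are served in increasing start-tag order. The sketch below is a user-space illustration, not the kernel queuing discipline; the class name is invented and the virtual-time reset at the end of a busy period is omitted.

```python
import heapq
import itertools

class StartTimeFairQueue:
    """Weighted scheduler that serves packets in increasing start-tag order."""

    def __init__(self, weights):
        self.weights = weights                        # queue name -> weight
        self.last_finish = {q: 0.0 for q in weights}  # finish tag of last packet per queue
        self.virtual_time = 0.0                       # start tag of the packet in service
        self.heap = []
        self.seq = itertools.count()                  # tie-breaker for equal start tags

    def enqueue(self, queue, length, packet):
        start = max(self.virtual_time, self.last_finish[queue])
        self.last_finish[queue] = start + length / self.weights[queue]
        heapq.heappush(self.heap, (start, next(self.seq), packet))

    def dequeue(self):
        start, _, packet = heapq.heappop(self.heap)
        self.virtual_time = start
        return packet
```

With a weight of 2 on the normal queue and 1 on the partial-state queue, and both queues backlogged with equal-size packets, the normal queue receives about two thirds of the transmissions.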
47Experimental Setup
48Long-Term flow differentiation
Normal TCP fraction 0.07
Probability 1/25, cache size 11, threshold 125
49Long-term flow differentiation
Probability 1/25, cache size 11, threshold 125
50Protecting Web Mice
51Protecting Web mice
Experimental Setup
52Protecting Web Mice
Bandwidth Results
Normal Router
LRU-FQ Router
53Protecting Web Mice
Timing Results
Normal Router
LRU-FQ Router
54Summary of Partial-State
- Sampling and caching allow simple identification of resource hogs
- Provides good control of DOS attacks with a limited number of flows
- Provides fairer distribution of link BW
- Partial state packet handling cost is not an issue at 100Mbps/1Gbps.
- 1Gbps implemented on Intel Network processor
55Applications of Partial State
- More intelligent control of network traffic
- Accounting and measurement of high bandwidth flows
- Denial of Service (DOS) attack prevention
- Tracing of high bandwidth flows
- QOS routing
56Aggregated packet analysis
57Approach
[Figure: processing pipeline: Network Traffic -> Signal Generation & Data Filtering (Address correlation) -> Statistical or Signal Analysis (Wavelets or DCT) -> Anomaly Detection (Thresholding) -> Detection Signal]
58Signal Generation
- Traffic volume (bytes or packets)
- Analyzed before
- May not be a great signal when links are always congested (typical campus access links)
- Lot more information in packet headers
- Source address
- Destination address
- Protocol number
- Port numbers
59Signal Generation
- Per packet cost is important driver
- Update a counter for each packet header field
- Too much memory to put in SRAM
- Break the field into multiple 8-bit fields
- 32-bit address into four 8-bit fields
- 1024 locations instead of 2^32 locations
- In general, 256 * (k/8) instead of 2^k
- k/8 counter updates instead of 1
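The byte-wise counting scheme above can be sketched as follows: a k-bit field is split into k/8 bytes, and each byte indexes its own 256-entry counter array. Function names are invented, and Python lists stand in for the SRAM counter arrays.

```python
import ipaddress

def new_counters(field_bits=32):
    """One 256-entry counter array per 8-bit slice of the header field."""
    return [[0] * 256 for _ in range(field_bits // 8)]

def update(counters, value):
    """Count one packet whose header field has the integer value `value`."""
    n = len(counters)
    for i in range(n):
        # i = 0 selects the most significant byte of the field.
        byte = (value >> (8 * (n - 1 - i))) & 0xFF
        counters[i][byte] += 1

counters = new_counters(32)
update(counters, int(ipaddress.IPv4Address("192.168.1.255")))
```

For a 32-bit address this uses 4 * 256 = 1024 counters and performs four increments per packet. The cost of the compression is that counts for different addresses sharing a byte value are merged.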
60Signal Generation
- What kind of signals can we generate with
addresses, port numbers and protocol numbers?
61Addresses are correlated
- Most of us have habits
- Access same web sites
- Large web sites get significant part of traffic
- Google.com, hp.com, yahoo.com
- Large downloads correlate over time
- ftp, video
- On an aggregate, addresses are correlated
62Address Correlation attacks?
- Address correlation changes when traffic patterns change abruptly
- Denial of service attacks
- Flash crowds
- Worms
- Results in differences in correlation
- High: single attack victim
- Low: lots of addresses (worm)
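One plausible way to turn per-byte address counts into a correlation signal is shown below. It is an assumption made for illustration and not necessarily the definition used in the talk: the signal is the sum, over all bins, of the product of each bin's traffic share in two consecutive samples.

```python
def address_correlation(prev_counts, curr_counts):
    """Sum over bins of the product of traffic shares in two consecutive samples.

    prev_counts, curr_counts: equal-length lists of packet counts per address
    bin (for example, the 256 counters of one address byte).
    """
    prev_total = sum(prev_counts)
    curr_total = sum(curr_counts)
    if prev_total == 0 or curr_total == 0:
        return 0.0
    return sum((a / prev_total) * (b / curr_total)
               for a, b in zip(prev_counts, curr_counts))
```

Under this definition, traffic concentrated on one address in both samples gives a value of 1.0, while traffic spread evenly over 256 values gives 1/256, matching the high and low cases listed above.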
63Address correlation signals
- Address correlation
- Simplified Address correlation
64Address Correlation Signals
65Address Correlation Signals
66Signal Analysis
- Capture information over a sampling period
- Of the order of a few seconds to minutes
- Analyze each sample to detect anomalies
- Compare with historical norms
- Post-mortem/Real-time analysis
- May use different amounts of data analysis
- Detailed information of past few samples
- Less detailed information of older samples
67Signal Analysis
- Address correlation as a time series signal
- Employ known techniques to analyze time series signals
- Wavelets: one powerful technique
- Allows analysis in both time and frequency domain
- Per-sample analysis has more flexibility
- Not in forwarding path
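As a small illustration of wavelet analysis on a time series, the sketch below uses the Haar wavelet, the simplest choice, and flags detail coefficients whose magnitude exceeds a threshold. The wavelet family, the unnormalized averaging form, the power-of-two signal length and the threshold are all assumptions made for illustration.

```python
def haar_step(signal):
    """One Haar level: pairwise averages (approximation) and half-differences (detail)."""
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return approx, detail

def haar_decompose(signal):
    """Full decomposition; returns the final approximation and detail lists per level."""
    approx, levels = list(signal), []
    while len(approx) >= 2:
        approx, detail = haar_step(approx)
        levels.append(detail)
    return approx, levels

def flag_anomalies(signal, threshold):
    """Return (level, index) of every detail coefficient larger than threshold."""
    _, levels = haar_decompose(signal)
    return [(lvl, i) for lvl, detail in enumerate(levels)
            for i, d in enumerate(detail) if abs(d) > threshold]
```

Fine levels localize an abrupt change in time, while coarse levels catch changes that fall on a pair boundary or develop slowly. For example, `flag_anomalies([0.2, 0.2, 0.9, 0.9], 0.1)` reports nothing at level 0 but flags the jump at level 1.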
68Does this work?
69Analysis of address signal
70Image based analysis
- Treat the traffic data as images
- Apply image processing based analysis
- Treat each sample as a frame in a video
- Video compression techniques lead to data reduction
- Scene change analysis leads to anomaly detection
- Motion prediction leads to attack prediction
71Signal Generation
72Two dimensional images
- Horizontal/vertical lines indicate anomalies
- Infected machine contacting multiple destinations (worm propagation)
- Multiple source machines targeting a destination (DDOS)
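A toy version of the two-dimensional view is sketched below. It assumes one byte of the source address indexes the rows and one byte of the destination address indexes the columns, so a single source contacting many destinations lights up a row and many sources hitting one destination light up a column. The axis assignment and the `min_cells` threshold are assumptions.

```python
def build_image(packets, size=256):
    """packets: iterable of (src_byte, dst_byte) pairs; returns a size x size count matrix."""
    image = [[0] * size for _ in range(size)]
    for src, dst in packets:
        image[src][dst] += 1
    return image

def find_lines(image, min_cells):
    """Return (rows, cols) that contain at least min_cells non-zero pixels."""
    rows = [r for r, row in enumerate(image)
            if sum(1 for cell in row if cell) >= min_cells]
    cols = [c for c in range(len(image[0]))
            if sum(1 for row in image if row[c]) >= min_cells]
    return rows, cols
```

A row returned by `find_lines` points to a scanning source (worm-like behavior), and a column points to a destination targeted by many sources (DDOS-like behavior).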
73DCT analysis of addresses
74Semi-random attacks
75Random attacks
76Complex attacks
77Better than volume analysis
78Evaluation
- True Positive Rate
- False Alarm Rate or False Positive Rate
- True Negative Rate
- False Negative Rate
- LR = true positive rate / false positive rate
- NLR = false negative rate / true negative rate
- Ideally, LR -> infinity, NLR -> 0
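The two ratios can be computed directly from detection counts. The helper below is a sketch with invented names; it assumes the evaluation contains at least one attack sample and one normal sample.

```python
def likelihood_ratios(tp, fp, tn, fn):
    """Return (LR, NLR) from counts of true/false positives and negatives."""
    tpr = tp / (tp + fn)   # true positive rate
    fpr = fp / (fp + tn)   # false positive (false alarm) rate
    tnr = tn / (tn + fp)   # true negative rate
    fnr = fn / (fn + tp)   # false negative rate
    lr = float("inf") if fpr == 0 else tpr / fpr
    nlr = float("inf") if tnr == 0 else fnr / tnr
    return lr, nlr
```

For example, a detector with 90 true positives, 10 false negatives, 5 false positives and 95 true negatives has LR = 0.9 / 0.05 = 18 and NLR = 0.1 / 0.95, roughly 0.105.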
79Comparison of Scalar signals
80Protocol Composition
- During attack, attack protocol volume will be higher
- Observation of changes can lead to detection
81Protocol Composition
82Address based signals
83Port Number Domain
84Thresholds vs. Detection
85Motion prediction
86End host attacks
- Common solution to several kinds of attacks?
- Do something simple in the network layer
- State maintenance and policing
- Our Key Idea: Per Resource regulation
- Hierarchical regulation (per resource, per flow) also possible
- Move regulation away from server into the network (e.g. at firewall)
87QOS Regulation to control network attacks
88End host QOS regulation
- Limit consumption of each resource
- At bastion Host
- Limit resource consumption to a traffic class so
that other classes keep getting service
89End host protection
- Have a uniform picture of resources at the network layer
- We do this at the QOS Regulator
- Resource Aggregates (resource principals)
- Memory, Protocol State Buffers, mbuf / sk_buff Clusters, Network Bandwidth, CPU Cycles...
- Charge incoming traffic to one or more of these resource aggregates
90End host protection (contd)
- What does Rate Control achieve?
- UDP flood regulation
- ICMP flood regulation
- Interrupt / packet processing regulation
- What about TCP SYN? CGI attack?
- Consume Fixed number of resources
- What does Window Control achieve?
- Regulates fixed number of resources
- Need to keep track of resource usage
- TCP SYN data structures, CGI processes, Memory
- Sometimes action required to reset system state
and free resources
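The difference between the two controls can be sketched as follows. Rate control is shown as a token bucket and window control as a cap on resources currently in use. Both classes, their names and their parameters are illustrative assumptions, not the regulator described in the talk.

```python
class RateLimiter:
    """Token bucket: limits how fast packets of a traffic class are admitted."""

    def __init__(self, rate, burst):
        self.rate = rate          # tokens added per second
        self.burst = burst        # bucket capacity
        self.tokens = burst
        self.last = 0.0           # time of the previous call, in seconds

    def allow(self, now):
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class WindowLimiter:
    """Caps the number of resources held at once, e.g. half-open TCP connections."""

    def __init__(self, limit):
        self.limit = limit
        self.in_use = 0

    def acquire(self):
        if self.in_use >= self.limit:
            return False
        self.in_use += 1
        return True

    def release(self):
        # Called when the resource is freed: handshake completed, timeout, or reset.
        self.in_use = max(0, self.in_use - 1)
```

A slow SYN flood passes any rate limit but still exhausts a fixed table, which is why a window-style limit needs an explicit `release` tied to the resource being freed.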
91Experimental results
92Results SYN attacks
93Advantages
- Not looking for specific known attacks
- Generic mechanism
- Works in real-time
- Latencies of a few samples
- Simple enough to be implemented inline
94Prototypes
- Linux-PC boxes
- On Intel Network processors
- Can push to Gbps packet forwarding rates
- Forwarding throughput not impacted
- Sampling rates of a few ms possible
95Related Work
- Resource usage monitoring
- Estan & Varghese: Bloom filters
- Kodialam & Lakshman: Run detection
- Mahajan et al: RED-PD
- Duffield (AT&T): Sampling
- Others
96Related Work Worms
- Payload monitoring
- Singh, Savage & Varghese; Tang & Chen
- Look for matches against constant length payloads
- Sampling, Rabin Signatures
- Prototype implementation
- Detects worms within 5-30 seconds
- Effective with polymorphic worms
97Related Work -- Worms
- Look for TCP Reset signals
- Weaver & Paxson
- Random host scan at a specific port
- Not all hosts open attack port
- Attacking worm will get many Resets
- Too many Resets => Attacker
- Effective for TCP based attacks
- Can detect/contain in real-time
98Related Work -- Worms
- Quick spreading worms use randomly generated addresses
- Normal users use names, DNS
- Worms don't have DNS activity
- Lots of accesses without DNS requests => Worms
- Many detectors within a campus
- Local DNS servers
99Related Work -- Worms
- Address honeypots
- Arbor Networks, Paxson, Crowcroft
- Configure machines to accept packets for unassigned addresses
- Only worms will contact these machines
- Capture payloads to analyze
- Quickly propagate signatures
100Related Work -- Worms
- IP Traceback: Savage et al
- Address spoofing makes origin of attacks difficult to detect
- Tracing, if universal, will limit attacks
- Fear of detection
- Post-attack detection
- Not helpful in mitigation or detection
- Most attack machines are innocent participants
101Related Work host based
- Limit the number of new connections of individual hosts
- Twycross & Williamson (HP)
- Reduces the speed at which a worm can spread
- Can be used to detect worms
- Monitor application execution sequences
- Profiling based indication of anomalous behavior => Detect and sandbox worms
102Conclusion
- Real-time resource accounting is feasible
- Real-time traffic monitoring is feasible
- Simple enough to be implemented inline
- Can rely on many tools from signal/image processing area
- More robust offline analysis possible
- Concise for logging and playback
103Thank you!
For more information: http://ee.tamu.edu/reddy, reddy_at_ee.tamu.edu
104LRU-RED Results
105RTT Bias -TCP flows
106Impact of Cache size
- Effect of varying cache size
- to study impact of cache size on performance of the scheme
- probability 1/55, threshold 125
- number of TCP flows = 20
- equal weights for both queues.
107Results Cache size
108Normal Workloads
- Performance under normal workloads
- working of scheme when non-responsive loads are absent or use their fair share of bandwidth
- cache size 9, threshold 125
- probability 1/55
109Results Normal workload
110Normal Mixed workload
111Interrupt processing overhead for server (incoming UDP traffic 100Mbps)
[Figure axes: Received UDP Goodput (Kpkts/sec); QoS Rate Limit on Regulator]