Continuous Online Extraction of HTTP Traces from Packet Traces, Anja Feldmann (AT&T Labs-Research)

1
Continuous Online Extraction of HTTP Traces from Packet Traces
Anja Feldmann
anja_at_research.att.com
AT&T Labs-Research
Florham Park, NJ
2
Monitoring User Accesses to the Web
  • Via modified Web browsers
  • Through Web server logs
  • Through Web proxy logs
  • From the wire via packet monitoring
    • Passive monitoring (oblivious to users)
    • No impact on network performance
    • Capture TCP and HTTP events
    • Potential to compute statistics about, or collect,
      downloaded pages
  • Related work: IBM, Berkeley, Virginia Tech

3
Web Page Access Details
[Figure: client and server timelines; axis: time]
4
From User Requests to Packets
5
Challenges of HTTP trace extraction
  • Packets cannot be processed in isolation
  • End systems cannot be throttled
  • End systems may not comply with the TCP/HTTP specs
  • Arbitrary fragments of Web pages and headers per
    packet
    • HTTP header may be spread across 10 packets
    • Retransmitted data may be fragmented differently
  • Use of TCP connections by HTTP
    • TCP connections may be terminated at any point
    • HTTP GET requests may contain data
    • Demultiplexing of packets to HTTP transactions
  • Packet sniffer may lose packets (incl. TCP
    connection packets)
  • Sanity checks on extracted information can fail
    • Inaccurate content length
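
The point that an HTTP header may be spread over several packets can be illustrated with a minimal sketch. It assumes the payloads of one direction have already been put in TCP order; the function name `extract_header` is hypothetical and not taken from the talk's C code.

```python
def extract_header(segments):
    """Accumulate in-order TCP payloads until the end of the HTTP header.

    Returns (header_bytes, body_bytes_seen_so_far), or None if the blank
    line that terminates the header never shows up (e.g. lost packets).
    """
    buf = bytearray()
    for payload in segments:
        buf += payload
        end = buf.find(b"\r\n\r\n")  # header ends at the first empty line
        if end != -1:
            return bytes(buf[:end + 4]), bytes(buf[end + 4:])
    return None


# A request header split at arbitrary points across four packets.
parts = [b"GET /index.html HT", b"TP/1.0\r\nHost: a", b"\r\n\r", b"\nBODY"]
header, rest = extract_header(parts)
```

Note that the terminator itself can straddle a packet boundary, which is why the search runs over the accumulated buffer rather than over each payload separately.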

6
Software Design Goals
  • Continuous traces
  • Avoid unnecessary I/O
  • High speed transmission medium
  • Software needs to be robust toward packet losses
  • Deployable anywhere in the network
  • Handle asymmetric routing
  • Implications
    • Memory-resident software and data
    • Priority given to packet sniffing over packet
      processing
    • TCP connections cannot be used as the demultiplexing
      unit
    • Offline matching of HTTP requests and responses
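
One way to read the last two implications: with asymmetric routing a monitor may see only one direction of a connection, so state is keyed by unidirectional flow and the two directions are paired later. The sketch below is an assumption-laden illustration, not the talk's implementation; `FlowKey`, `reverse`, and `match_offline` are hypothetical names, and pairing records by position assumes responses come back in request order.

```python
from collections import namedtuple

# Hypothetical key for one direction of traffic (not a whole connection).
FlowKey = namedtuple("FlowKey", "src_ip src_port dst_ip dst_port")


def reverse(key):
    """Key of the flow running in the opposite direction."""
    return FlowKey(key.dst_ip, key.dst_port, key.src_ip, key.src_port)


def match_offline(requests, responses):
    """Pair request records with response records after the fact.

    Both arguments map FlowKey -> list of records in log order.
    A request whose response was never observed is paired with None.
    """
    pairs = []
    for key, reqs in requests.items():
        resps = responses.get(reverse(key), [])
        for i, req in enumerate(reqs):
            pairs.append((req, resps[i] if i < len(resps) else None))
    return pairs


client = FlowKey("10.0.0.1", 4000, "10.0.0.2", 80)
pairs = match_offline({client: ["GET /a", "GET /b"]},
                      {reverse(client): ["200 /a"]})
```

Because the two logs are independent, each can come from a different monitor, which is what makes deployment "anywhere in the network" plausible.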

7
Hardware Design: AT&T Labs PacketScope
[Figure: block diagram. Labels: Measured Link, 500-MHz Alpha Workstation, 10-GB RAID, 140-GB Tape Loader, Router / Terminal Server, Out-of-band Communication]
8
Software Separation of Tasks
  • Packet sniffing
    • Tcpdump (with IP address encryption)
    • Output: files of 10,000,000 packets each
  • Control script
    • Perl script
    • Action: takes files from packet sniffing and runs
      header extraction
  • HTTP header extraction
    • C code (based on Tcpdump)
    • Output: log files with HTTP and TCP events,
      packet header files
  • HTTP header matching
    • C code
    • Output: log files with matching TCP/HTTP
      request/response

9
Control Flow
[Figure: Packet Sniffing, Control Script, HTTP header extraction]
10
HTTP Header Extraction: Basic Steps
  • Reconstruction of packet sequence
    • Demultiplex packets according to flows
    • Reorder packets according to TCP sequence numbers
    • Eliminate duplicate packets
    • Identify missing packets
  • Extraction of information
    • Extract TCP and HTTP timestamp information
    • Extract HTTP header info and body parts
    • Summarize data part, e.g., length, sequence
      number
    • Discard HTTP data part
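
The reconstruction steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the `Packet` record is hypothetical, sequence-number wraparound is ignored, and a retransmission that only partly overlaps earlier data is kept whole rather than trimmed.

```python
from collections import defaultdict, namedtuple

# Hypothetical packet record: flow key, TCP sequence number, payload bytes.
Packet = namedtuple("Packet", "flow seq payload")


def demux(packets):
    """Group packets by flow, preserving arrival order within each flow."""
    flows = defaultdict(list)
    for p in packets:
        flows[p.flow].append(p)
    return flows


def reconstruct(flow_packets):
    """Reorder one flow by sequence number, drop duplicates, report gaps.

    Returns (ordered_packets, gaps), where each gap is a (start, end)
    range of sequence numbers for which no data was captured.
    """
    ordered, gaps = [], []
    expected = None  # next sequence number not yet covered
    for p in sorted(flow_packets, key=lambda p: p.seq):
        end = p.seq + len(p.payload)
        if expected is not None:
            if end <= expected:
                continue  # data already covered: duplicate
            if p.seq > expected:
                gaps.append((expected, p.seq))  # missing data
        ordered.append(p)
        expected = end if expected is None else max(expected, end)
    return ordered, gaps


trace = [Packet("f", 100, b"abcd"), Packet("f", 108, b"ijkl"),
         Packet("f", 100, b"abcd"), Packet("f", 112, b"mn")]
ordered, gaps = reconstruct(demux(trace)["f"])
```

Here the retransmitted packet at sequence number 100 is dropped and the range 104 to 108 is reported as missing, which is the information the extraction step needs before it trusts a header.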

11
HTTP Header Extraction (cont.)
[Figure: data structure (per-flow list of packets), Extract HTTP, HTTP Log File, Cleanup (runs periodically to age flows)]
12
HTTP Header Extraction
  • Data structure
    • Indexed by unidirectional IP flows (for
      demultiplexing packets)
    • Per flow: list of packets and partially extracted
      HTTP information
  • Extraction
    • Execute basic steps for a list of packets
  • Cleanup
    • Triggers execution of Extraction
      • if more than a fixed number of packets have been
        received
      • and after a fixed timeout
    • Runs after processing a fixed number of packets
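
A minimal sketch of the data structure and the cleanup trigger follows. The class name `FlowTable`, its parameters, and the default thresholds are all assumptions made for illustration. The sketch treats the packet-count and timeout conditions as alternative triggers and forgets a flow once its packets are handed to extraction, whereas the slide indicates that partially extracted HTTP information is kept per flow.

```python
class FlowTable:
    """Per-flow packet lists, indexed by a unidirectional flow key."""

    def __init__(self, extract, max_packets=64, timeout=30.0, cleanup_every=1000):
        self.extract = extract              # callback: extract(key, packets)
        self.max_packets = max_packets      # per-flow packet threshold
        self.timeout = timeout              # seconds of inactivity
        self.cleanup_every = cleanup_every  # run cleanup every N packets
        self.flows = {}                     # key -> [packets, last_seen]
        self.seen = 0                       # total packets processed

    def add(self, key, packet, now):
        """Store a packet; run cleanup after a fixed number of packets."""
        entry = self.flows.setdefault(key, [[], now])
        entry[0].append(packet)
        entry[1] = now
        self.seen += 1
        if self.seen % self.cleanup_every == 0:
            self.cleanup(now)

    def cleanup(self, now):
        """Trigger extraction for flows that are large or have gone idle."""
        for key in list(self.flows):
            packets, last_seen = self.flows[key]
            if len(packets) > self.max_packets or now - last_seen >= self.timeout:
                self.extract(key, packets)
                del self.flows[key]


done = []
table = FlowTable(lambda key, pkts: done.append((key, list(pkts))),
                  max_packets=2, timeout=10.0, cleanup_every=3)
```

Tying cleanup to the packet count rather than to a timer keeps all work inside the packet-processing loop, so no separate thread competes with sniffing.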

13
Trace Environment: AT&T Labs PacketScope
Network
  • Trace data
  • TCP protocol events
  • HTTP protocol events
  • HTTP response, request headers
  • URL / response codes / content length ..
  • Data length
