A Flexible and Efficient API for a Customizable Proxy Cache

1
A Flexible and Efficient API for a Customizable
Proxy Cache
Vivek S. Pai, Alan L. Cox, Vijay S. Pai, and Willy Zwaenepoel
iMimic Networking, Inc. http://www.imimic.com
2
Motivation
  • More features moving into proxy caches
  • The ubiquitous layer 7 device
  • Filtering, reporting, CDN support, transformation
  • Lots of this being done one-off, ad hoc
  • Can't know everything at deployment
  • Some approaches for generalization
  • ICAP/OPES, proprietary mechanisms
  • But design considerations shifting
  • Goal: a new approach for modern environments

3
Contributions
  • Designed event-friendly proxy API
  • Implemented on iMimic DataReactor cache
  • Imposes negligible performance overhead
  • Demo modules
  • High performance
  • Low interference

4
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions

5
Proxy Cache Concepts
clients
WAN
proxy cache
LAN
origin servers
6
Why Program a Proxy?
  • It's at the right point in the network
  • Sees all client-side and server-side HTTP traffic
  • Can react to both LAN and WAN conditions
  • Already examines layer 7
  • Groundwork in place for value-adds
  • Content filtering, access control, etc.

7
Enabling Technologies
  • Moore's Law
  • CPU speeds outstripping all other components
  • Lots of cycles to burn
  • Proxy software
  • Increasing efficiency in managing connections,
    disk storage, etc.
  • Commodity OS/hardware improvements
  • No longer need specialized systems to run
    efficient proxy caches

8
Commodity System Improvements
  • 1997: Appliances 4x faster than software running
    on a 2-processor UltraSparc
  • Source: Danzig, "NetCache Architecture and
    Deployment"

9
Commodity System Improvements
  • 1997: Appliances 4x faster than software running
    on a 2-processor UltraSparc
  • Source: Danzig, "NetCache Architecture and
    Deployment"
  • 1st NLANR cacheoff (April '99): gap only 2.5x
  • 600 req/sec (Peregrine) vs. 1500 (InfoLibria)

10
Commodity System Improvements
  • 1997: Appliances 4x faster than software running
    on a 2-processor UltraSparc
  • Source: Danzig, "NetCache Architecture and
    Deployment"
  • 1st NLANR cacheoff (April '99): gap only 2.5x
  • 2nd cacheoff (Jan '00): gap only 1.7x
  • 1450 req/sec (iMimic) vs. 2400 (Compaq)

11
Commodity System Improvements
  • 1997: Appliances 4x faster than software running
    on a 2-processor UltraSparc
  • Source: Danzig, "NetCache Architecture and
    Deployment"
  • 1st NLANR cacheoff (April '99): gap only 2.5x
  • 2nd cacheoff (Jan '00): gap only 1.7x
  • 3rd cacheoff (Oct '00): gap only 15%
  • 2083 req/sec (Microsoft) vs. 2400 (Compaq)
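
The gaps quoted on these slides can be checked from the request rates given with them. The sketch below is only that arithmetic; it assumes the first figure in each pair is the commodity-based entry and the second is the appliance, which the slides imply but do not state. The function name `throughput_gap` is made up for illustration.

```c
#include <assert.h>

/* Ratio of the appliance's throughput to the commodity-based entry's
 * throughput, both in requests/sec. A result of 1.0 means no gap. */
double throughput_gap(double appliance_rps, double commodity_rps)
{
    assert(commodity_rps > 0.0);
    return appliance_rps / commodity_rps;
}
```

With the slide figures, `throughput_gap(1500, 600)` is 2.5, `throughput_gap(2400, 1450)` is about 1.66 (rounded to 1.7x on the slide), and `throughput_gap(2400, 2083)` is about 1.15, i.e. a gap of roughly 15%.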

12
Commodity System Improvements
  • 1997: Appliances 4x faster than software running
    on a 2-processor UltraSparc
  • Source: Danzig, "NetCache Architecture and
    Deployment"
  • 1st NLANR cacheoff (April '99): gap only 2.5x
  • 2nd cacheoff (Jan '00): gap only 1.7x
  • 3rd cacheoff (Oct '00): gap only 10%
  • 4th cacheoff (Dec '01): commodity system best
  • Performance record: 2700 req/sec (Cintel/iMimic)

13
How free is the CPU?
  • Stratacache Dart-10, with Nokia phone
  • 120 req/sec (7 Mbps) with 300 MHz CPU
  • CPU mostly idle; performance is disk-limited

14
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions

15
Previous Customization Approaches
  • Write your own proxy or modify Squid
  • Huge code, changes likely to conflict with
    updates
  • ICAP: TCP-based offload
  • Proxy redirects requests/responses to a separate
    server for modification
  • Filter-style processes
  • Plugins where proxy designers anticipated a need
    (e.g., content filtering)
  • Kernel modules
  • Difficult programming model, but needed for
    kernel-integrated proxies

16
Reasons for a New Approach
  • Scalability needed to > 10,000 flows
  • Filter processes may not scale
  • Limitations of ICAP-style offloading
  • Offloading small requests adds latency
  • Need for separate ICAP server with own CPU
  • Programmers want flexibility
  • Program in C using standard OS and libraries
  • Avoid problems from later code conflicts

17
Design of the Proxy API
  • Event-aware
  • Modules notified as requests/responses arrive
  • Maps well to implementation of modern proxies
  • HTTP-Complete
  • Capture all key interactions in HTTP
    request-response protocol for full flexibility
  • Support various programming models
  • Events, threads, processes
  • Communication via function call or socket

18
HTTP Data Flows
[Figure: HTTP data flows between Client, Proxy Cache, Server, and Storage System. Labeled flows: Requests, Responses, Cache Misses, New Content, Cache Hits, Cached Content.]
19
HTTP Data Flows and the API
[Figure: the Client, Proxy Cache, Server, and Storage System diagram, with five "modify" points marked on the data flows.]
20
HTTP Request-Response Structure
Header block: special first line followed by more detail about request/response
    Requested URL
    Request header line 1
    Request header line 2
    ...
    Request header line N
    <blank terminating line>
Body data: optional "request body", used in POST requests for forms, etc.
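
To make the header/body split concrete, here is a minimal sketch of how code might locate the blank terminating line in a buffered request. It assumes CRLF line endings and that the whole header block is already in one buffer; `header_block_end` is a hypothetical helper, not part of the API described in this talk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the offset of the first body byte in buf[0..len), i.e. the
 * position just past the blank line that terminates the header block,
 * or 0 if the terminator has not arrived yet. Assumes CRLF line endings. */
size_t header_block_end(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i + 4 <= len; i++) {
        /* End of one header line immediately followed by an empty line. */
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}
```

For a POST request, everything from the returned offset onward is body data; for a request with no body, the offset equals the buffer length.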
21
Design of API Notifications
typedef struct DR_FuncPtrs {
    DR_InitFunc        dfp_init;        // on module load
    DR_ReconfigureFunc dfp_reconfig;    // on config change
    DR_FiniFunc        dfp_fini;        // on module unload
    DR_ReqHeaderFunc   dfp_reqHeader;   // when req hdr done
    DR_ReqBodyFunc     dfp_reqBody;     // on each piece of req body
    DR_ReqOutFunc      dfp_reqOut;      // before req to remote srv
    DR_DNSResolvFunc   dfp_dnsResolv;   // when DNS resolution needed
    DR_RespHeaderFunc  dfp_respHeader;  // when resp hdr done
    DR_RespBodyFunc    dfp_respBody;    // on each piece of resp body
    DR_RespReturnFunc  dfp_respReturn;  // when resp returned to clt
    DR_TransferLogFunc dfp_logging;     // log entry after req done
    DR_OpaqueFreeFunc  dfp_opaqueFree;  // when each resp completes
    DR_TimerFunc       dfp_timer;       // periodic maintenance
    int                dfp_timerFreq;   // timer period (sec)
} DR_FuncPtrs;
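
As an illustration of the callback-table style, the sketch below shows a module that counts requests. The slides do not show the signatures of the `DR_*Func` types, so every type and function here (`DemoFuncPtrs`, `DemoReqHeaderFunc`, `notify_req_header`, and so on) is a hypothetical stand-in, and the convention that unused callbacks stay NULL is an assumption rather than documented DataReactor behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for the real DataReactor types,
 * whose signatures the slides do not show. */
typedef int  (*DemoInitFunc)(void);
typedef int  (*DemoReqHeaderFunc)(const char *url);
typedef void (*DemoTimerFunc)(void);

typedef struct DemoFuncPtrs {
    DemoInitFunc      dfp_init;       /* on module load */
    DemoReqHeaderFunc dfp_reqHeader;  /* when request header is complete */
    DemoTimerFunc     dfp_timer;      /* periodic maintenance */
    int               dfp_timerFreq;  /* timer period (sec) */
} DemoFuncPtrs;

static int requests_seen;

static int count_init(void)
{
    requests_seen = 0;
    return 0;
}

static int count_req_header(const char *url)
{
    (void)url;          /* this module only counts, it does not inspect */
    requests_seen++;
    return 0;
}

/* The module fills in only the events it cares about; the rest stay NULL. */
const DemoFuncPtrs counting_module = {
    .dfp_init      = count_init,
    .dfp_reqHeader = count_req_header,
};

/* Proxy side: deliver a request-header event if the module asked for it. */
int notify_req_header(const DemoFuncPtrs *m, const char *url)
{
    assert(m != NULL);
    return m->dfp_reqHeader ? m->dfp_reqHeader(url) : 0;
}

int demo_requests_seen(void)
{
    return requests_seen;
}
```

A proxy built this way would call `counting_module.dfp_init()` once at load time and `notify_req_header(&counting_module, url)` as each request header completes, so a module pays only for the events it registers.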

22
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions

23
API Functions
  • Content Adaptation
  • Content Management
  • Customized Administration
  • Utility Functions

24
Content Adaptation
  • Functions to allow modules to inspect and modify
    requests and replies through cache

[Figure: the Client, Proxy Cache, Server, and Storage System diagram, with five "modify" points marked on the data flows.]
25
Content Adaptation (cont'd)
  • Example uses
  • Integration into a CDN based on URL rewriting
  • Transcoding for mobile devices
  • Special features of cache integration
  • Store modified content
  • Return multiple versions using HTTP Vary header
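
One way to picture "multiple versions" of a stored object is a cache key that combines the URL with the value of the request header named by Vary. The sketch below is an assumption about how such keying could look, not a description of how the DataReactor cache stores variants; `variant_key` and its key layout are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds a cache key that distinguishes variants of one URL by the value of
 * the request header named in the response's Vary header.
 * Returns 0 on success, -1 if the key does not fit in out. */
int variant_key(char *out, size_t outlen, const char *url,
                const char *vary_header, const char *request_value)
{
    assert(out != NULL && url != NULL && vary_header != NULL);
    /* A newline cannot appear inside a URL, so it is a safe separator. */
    int n = snprintf(out, outlen, "%s\n%s=%s", url, vary_header,
                     request_value ? request_value : "");
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}
```

With `Vary: Accept-Encoding`, a client sending `Accept-Encoding: gzip` and a client sending no such header produce different keys for the same URL, so a compressed copy and an uncompressed copy can be stored side by side.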

26
Content Management
  • Fine-grained control over cacheability
  • Content-freshness modification/eviction
  • Content preloading
  • Content querying
  • Example uses
  • News CDN needs new home page on major event
  • Premium services

27
Customized Administration
  • Notifications on logging
  • Example uses
  • Aggregation at network operation centers
  • Detection of high error rates that indicate bad links

28
Utility Functions
  • Interfaces to underlying OS event-notification
  • Module may register or clear interest on FD
    events
  • API will automatically call back module
  • Independent of underlying OS mechanisms (e.g.,
    poll, select, /dev/poll, kevent)
  • Configuration options processing
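
The register/clear/call-back pattern for FD events can be sketched on top of plain `poll()`. This is an illustrative stand-in, not the API's actual interface: the names `fd_registry`, `fd_register_read`, `fd_clear`, `fd_dispatch`, and `count_readable` are hypothetical, and a real proxy would hide whichever mechanism the OS offers behind the same calls.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_WATCHES 64

typedef void (*fd_callback)(int fd, void *arg);

typedef struct {
    struct pollfd pfds[MAX_WATCHES];
    fd_callback   cbs[MAX_WATCHES];
    void         *args[MAX_WATCHES];
    nfds_t        count;
} fd_registry;

/* Register interest in readability of fd. Returns 0, or -1 if full. */
int fd_register_read(fd_registry *r, int fd, fd_callback cb, void *arg)
{
    assert(r != NULL && cb != NULL);
    if (r->count >= MAX_WATCHES)
        return -1;
    r->pfds[r->count].fd = fd;
    r->pfds[r->count].events = POLLIN;
    r->pfds[r->count].revents = 0;
    r->cbs[r->count] = cb;
    r->args[r->count] = arg;
    r->count++;
    return 0;
}

/* Clear interest in fd. Returns 0 if removed, -1 if it was not registered. */
int fd_clear(fd_registry *r, int fd)
{
    assert(r != NULL);
    for (nfds_t i = 0; i < r->count; i++) {
        if (r->pfds[i].fd == fd) {
            r->count--;                       /* move last entry into the hole */
            r->pfds[i] = r->pfds[r->count];
            r->cbs[i]  = r->cbs[r->count];
            r->args[i] = r->args[r->count];
            return 0;
        }
    }
    return -1;
}

/* Wait up to timeout_ms, then call back every ready watcher. Returns the
 * number of callbacks made, 0 on timeout, or -1 on poll() error. In this
 * sketch, callbacks must not register or clear watches while dispatching. */
int fd_dispatch(fd_registry *r, int timeout_ms)
{
    assert(r != NULL);
    int ready = poll(r->pfds, r->count, timeout_ms);
    if (ready <= 0)
        return ready;
    int called = 0;
    for (nfds_t i = 0; i < r->count; i++) {
        if (r->pfds[i].revents & (POLLIN | POLLHUP | POLLERR)) {
            r->cbs[i](r->pfds[i].fd, r->args[i]);
            called++;
        }
    }
    return called;
}

/* Example callback: consume one byte and bump the counter passed via arg. */
void count_readable(int fd, void *arg)
{
    char byte;
    if (read(fd, &byte, 1) == 1)
        (*(int *)arg)++;
}
```

A module would register the read end of a helper's pipe or socket once, and the proxy's event loop would then invoke the callback whenever data arrives, so the module never blocks the proxy while waiting.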

29
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions

30
Implementation in DataReactor
  • Commercial proxy server
  • Portable across hardware (x86, Alpha, Sparc) and
    operating systems (FreeBSD, Linux, Solaris)
  • Fast (exposes overheads)
  • Independently measured at Proxy Cache-Offs (alone
    or via OEMs)
  • Support requires < 1000 lines of code
  • Implementation < 6 person-months

31
Sample Modules
  • Ad Remover
  • Matches ad patterns in Hostname, URI
  • Dynamic Compressor
  • Uses zlib to compress, store, serve object
  • Image Transcoder
  • Color stripping via NetPBM and ijpeg helpers
  • Text Injector
  • Finds <head> tag, asks helper what to insert
  • Content Manager
  • Local telnet, then query, fetch, inject, evict
    objects
  • ICAP client
  • Implements ICAP 1.0 draft to use external server
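
The Ad Remover's matching step might look roughly like the sketch below, which tests the hostname and the URI against shell-style patterns using POSIX `fnmatch()`. The pattern lists and the function `is_ad_request` are hypothetical; the slides do not say what pattern syntax or matching routine the actual module uses.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical block lists; a real module would load these from its
 * configuration rather than compile them in. */
static const char *const ad_host_patterns[] = { "ads.*", "*.adserver.example" };
static const char *const ad_uri_patterns[]  = { "/banners/*", "*/ad_*.gif" };

/* Returns 1 if s matches any of the n shell-style patterns, else 0. */
static int matches_any(const char *s, const char *const *pats, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (fnmatch(pats[i], s, 0) == 0)
            return 1;
    return 0;
}

/* Returns 1 if the request looks like an ad by hostname or by URI path.
 * Matching is case-sensitive in this sketch. */
int is_ad_request(const char *host, const char *uri)
{
    assert(host != NULL && uri != NULL);
    return matches_any(host, ad_host_patterns,
                       sizeof ad_host_patterns / sizeof ad_host_patterns[0])
        || matches_any(uri, ad_uri_patterns,
                       sizeof ad_uri_patterns / sizeof ad_uri_patterns[0]);
}
```

A request-header callback could call `is_ad_request(host, uri)` and, on a match, have the proxy answer with a placeholder instead of fetching the object.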

32
Web Surfing Now
33
Web Surfing Without Ads
34
Sample Module Implementation
35
Measurement
  • Polygraph and PolyMix-3, Measurement Factory
  • De facto standard for proxy testing
  • Scales with load
  • Number of clients
  • Number of servers
  • Data set size
  • Working set size
  • Very long test time
  • Fill phase (14 hours)
  • Test phase (10 hours)

36
PolyGraph Test Phases
[Figure: timeline of the test phases (Fill Phase, 1st Load Phase, 2nd Load Phase); x-axis: Time (hours), 0 to 30.]
37
PolyGraph Hit Rates
[Figure: hit rates during the test, with curves labeled Cacheable, Offered, and Actual.]
38
Our Test Environment
  • Proxy - 1.4GHz Athlon, 2GB memory
  • 5 SCSI disks, GigE, FreeBSD
  • Harness
  • 10 Polygraph client/server machines
  • Target load: 1450 reqs/sec
  • 16000 simultaneous connections
  • Pmix-3: modified PolyMix-3
  • Single fill phase for all tests
  • Load phase time cut in half
  • Slight increase in hit rate

39
API Performance
40
Module Performance
41
Outline
  • Background
  • API Design
  • API Functions
  • Implementation and Performance
  • Conclusions

42
Summary
  • CPUs getting more idle
  • Commodity OSes are suitable choices
  • High-concurrency servers needed
  • Customizable, efficient event-friendly API
  • Implemented with low overhead
  • Sample results, deployments promising

43
Ongoing Work
  • CoDeeN: a CDN system on PlanetLab
  • Uses a customized version of DataReactor
  • Being built at Princeton
  • Prototype: 1 week reading
  • Currently 42 nodes (one per site)
  • Lessons
  • API easy enough for busy grad students
  • Logging infrastructure would be nice
  • Want to mask non-HTTP failures

44
Questions?
vivek@imimic.com
iMimic Networking, Inc. http://www.imimic.com/
45
Cacheoff-3 Hit Times
46
Cacheoff-3 Miss Times
47
Cacheoff-3 Improvements
48
Cacheoff-3 Price/Performance
49
CacheOff-3 Results
50
CacheOff-3 Results
51
Cacheoff-4 Hit Times
52
Cacheoff-4 Miss Times
53
CacheOff-4 Results