High-Bandwidth Packet Switching on the Raw General-Purpose Architecture

1
High-Bandwidth Packet Switching on the Raw
General-Purpose Architecture
  • Gleb Chuvpilo
  • Saman Amarasinghe
  • MIT LCS Computer Architecture Group
  • September 19, 2002

2
Talk at a Glance
  • Motivation
  • Architecture of Internet Routers
  • Raw Processor Overview
  • Raw Router Architecture
  • Switch Fabric Design
  • Distributed Scheduling Algorithm
  • Results and Analysis
  • Future Work and Conclusion

3
We are on
  • Motivation

4
Motivation
  • Build a fast IP router on a general-purpose
    architecture
  • Why?
  • Flexibility: new protocols and services
  • Price: economies of scale

5
We are on
  • Architecture of Internet Routers

6
Architecture of Internet Routers

7
Switch Fabric

8
Click Modular Router

9
We are on
  • Raw Processor Overview

10
Raw Processor Overview
  • 16 MIPS-like tiles on a single die
  • 2 Megabytes of SRAM on-chip
  • Over a thousand signal I/O pins
  • Over 200 Gbps of external chip bandwidth
  • Scalable to thousands of tiles!

11
Raw Layout

12
Raw Communication Mechanisms
  • Two static networks
  • Two dynamic networks

13
Raw Static Networks
  • Destinations known at compile time
  • Message size known at compile time
  • Cycle-by-cycle switch schedule
  • Three-cycle nearest neighbor send-to-use
    latency
  • No processing overhead

14
Static Network Send

15
Static Network Receive

16
Raw Dynamic Networks
  • Unpredictable events
  • External asynchronous interrupts
  • Cache misses
  • 15- to 30-cycle nearest neighbor send-to-use
    latency (message header processing overhead)

17
Raw is Good for Streaming

18
We are on
  • Raw Router Architecture

19
Given Four Networks

20
and Sixteen Tiles

21
Problem: Mapping?
  • Static Interconnect
  • Dynamic Communication

22
Solution: Rotating Crossbar
[Figure: rotating crossbar with inputs In 0 to In 3 and outputs Out 0 to Out 3]

23
We are on
  • Switch Fabric Design

24
Rotating Crossbar Highlights
  • The idea of a Token Ring network → absolute
    fairness
  • Algorithm uses two static networks; dynamic
    networks are idle
  • All deadlock-free configurations are scheduled
    at compile time
  • Four headers and token location define a global
    configuration
  • Global configuration is computed in a distributed
    manner at run time
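
The token-ring idea behind the fairness claim can be sketched as follows. This is a simplified illustration, not the scheduling algorithm from the talk: it ignores the compile-time configuration tables and the choice of deadlock-free paths on the static networks, and the names `arbitrate` and `next_token` are hypothetical. It only assumes that each of four inputs requests at most one output and that the token holder gets first pick.

```python
from typing import List, Optional

NUM_PORTS = 4  # four inputs and four outputs, as on the rotating crossbar slide


def arbitrate(headers: List[Optional[int]], token: int) -> List[Optional[int]]:
    """Return, for each input port, the output port it is granted (or None).

    headers[i] is the output requested by input i, or None if input i is idle.
    Inputs are served in ring order starting at the token holder, so the
    token holder always wins a conflict.
    """
    grants: List[Optional[int]] = [None] * NUM_PORTS
    taken = set()
    for offset in range(NUM_PORTS):
        port = (token + offset) % NUM_PORTS
        want = headers[port]
        if want is not None and want not in taken:
            grants[port] = want
            taken.add(want)
    return grants


def next_token(token: int) -> int:
    """Pass the token to the next port on the ring."""
    return (token + 1) % NUM_PORTS


# Inputs 0 and 1 both want output 2; whoever holds the token wins.
requests = [2, 2, None, 0]
print(arbitrate(requests, token=0))              # [2, None, None, 0]
print(arbitrate(requests, token=next_token(0)))  # [None, 2, None, 0]
```

Because the token advances by one port each round, every input periodically becomes the highest-priority requester, which is the sense in which a token ring is fair.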

25
Rotating Crossbar Illustrated

26
Rotating Crossbar Illustrated

27
Phases of the Algorithm
[Figure: phases shown across the TILE PROCESSOR and SWITCH PROCESSOR: headers_request, headers, send_prev_config, choose_new_config, route_body, confirm, update_token]

28
We are on
  • Distributed Scheduling Algorithm

29
Configuration Space
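  • Let's enumerate the number of configurations
  • $\mathrm{SPACE} = \mathrm{Hdr}_0 \times \dots \times \mathrm{Hdr}_3 \times \mathrm{Token}$,
  • where $\mathrm{Hdr}_0 = \dots = \mathrm{Hdr}_3 = 5$,
  • and $\mathrm{Token} = 4$,
  • therefore
  • $\mathrm{SPACE} = 5^4 \times 4 = 2{,}500$ distinct configurations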
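
The count above can be checked by brute-force enumeration. The sketch below treats the five header values and four token locations as opaque integers, since the slide gives only the counts; the variable names are illustrative.

```python
from itertools import product

HEADER_VALUES = range(5)  # 5 possible values per header (count taken from the slide)
TOKEN_VALUES = range(4)   # 4 possible token locations
NUM_HEADERS = 4           # Hdr0 .. Hdr3

# Every global configuration is (four headers, token location).
space = [
    (headers, token)
    for headers in product(HEADER_VALUES, repeat=NUM_HEADERS)
    for token in TOKEN_VALUES
]

print(len(space))  # 2500
```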

30
So What?...
  • Each tile has 8,192 words of instruction memory;
    same for the switch
  • → 8,192/2,500 ≈ 3.3 instructions per
    configuration → not enough! → need to use
    off-chip memory → slow!
  • → need to minimize SPACE

31
Minimization
[Figure: ports of a crossbar processor: in, out, cwprev, cwnext, ccwprev, ccwnext]

32
Clients and Servers of a Crossbar Processor

33
Outcome of Minimization
  • We cut the number of configurations by a factor
    of 78! Now there are only 32 entries!
  • → the program can fit in the local instruction
    memory!
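
As a quick check of the arithmetic, the snippet below recomputes the instruction budget per configuration before and after minimization, using only the figures quoted in the slides (8,192 instruction words, 2,500 configurations, 32 entries). The helper name is illustrative.

```python
INSTRUCTION_WORDS = 8192  # instruction memory per tile (same for the switch)
FULL_SPACE = 2500         # configurations before minimization
MINIMIZED = 32            # entries after minimization


def words_per_entry(entries: int) -> float:
    """Instruction words available per configuration entry."""
    return INSTRUCTION_WORDS / entries


print(round(words_per_entry(FULL_SPACE), 1))  # 3.3
print(words_per_entry(MINIMIZED))             # 256.0
print(round(FULL_SPACE / MINIMIZED))          # 78
```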

34
We are on
  • Results and Analysis

35
Implementation
  • Raw Router was tested in a cycle-accurate
    simulator of the Raw processor
  • Raw prototype clock speed is assumed to be 250
    MHz
  • The focus of research is on switch fabric, NOT on
    route lookup, etc.
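
To put the assumed clock speed in context, the sketch below converts the nearest-neighbor latencies quoted earlier in the talk (3 cycles on the static networks, 15 to 30 cycles on the dynamic networks) into nanoseconds at 250 MHz. It is a unit conversion only, not a measured result.

```python
CLOCK_HZ = 250e6            # assumed Raw prototype clock speed
CYCLE_NS = 1e9 / CLOCK_HZ   # 4.0 ns per cycle


def cycles_to_ns(cycles: int) -> float:
    """Convert a cycle count to nanoseconds at the assumed clock speed."""
    return cycles * CYCLE_NS


print(cycles_to_ns(3))                     # static network: 12.0
print(cycles_to_ns(15), cycles_to_ns(30))  # dynamic network: 60.0 120.0
```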

36
Peak Throughput

37
Average Throughput

38
We are on
  • Future Work and Conclusion

39
Future Work
  • Take advantage of dynamic networks
  • Implement IP route lookup
  • Add computation on data (encryption)
  • Add support for multicast traffic
  • Implement Quality of Service
  • Add virtual output queueing
  • Explore larger router configurations

40
Conclusion
  • Implemented a gigabit switch on Raw
  • Mapped dynamic communication to static
    interconnect
  • Can intermix switch fabric with computation
  • High-bandwidth I/O allows performance comparable to
    custom ASIC processors

41
Questions?