RAW Machines: The Need

1
RAW Machines: The Need, Architecture
  • By Gaurav Bansal and Harsh Dhand

2
Technology Trends
  • Clock frequency of processors has risen
    exponentially
  • Fraction of the chip that is reachable by a
    signal in a single clock cycle has decreased
    exponentially
  • Designers now freely duplicate logic to reduce
    wire lengths
  • Evolution of bypass networks
  • Exploiting parallelism still remains a high priority

3
Application-specific Custom Hardware Systems
  • Fine-grain communication between large numbers of
    replicated processing elements
  • Expose the complete details of the underlying
    hardware architecture to the software system
  • RAW is the result of giving these concepts high
    priority.

4
Reconfigurable Architecture Workstation (Raw)
  • Set of replicated tiles, each containing:
  • a RISC-like processor
  • configurable logic
  • memory for instructions and data
  • an associated programmable switch
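A minimal Python sketch of the replicated-tile organization above. It assumes a 2-D mesh in which each tile links only to its four nearest neighbors; the 16-tile (4 x 4) size echoes the RAW Logic slide. The Tile fields are hypothetical stand-ins for the tile's memories and switch program, and configurable logic is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Tile:
    """One replicated tile (hypothetical model, not RAW's real layout)."""
    row: int
    col: int
    imem: list = field(default_factory=list)            # local instruction memory
    dmem: dict = field(default_factory=dict)            # local data memory
    switch_program: list = field(default_factory=list)  # programmable switch

def make_mesh(rows: int, cols: int) -> dict:
    """Build a rows x cols grid of identical tiles keyed by (row, col)."""
    return {(r, c): Tile(r, c) for r in range(rows) for c in range(cols)}

def neighbors(mesh: dict, r: int, c: int) -> list:
    """Coordinates of the tiles directly wired to (r, c): N, S, W, E."""
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [rc for rc in candidates if rc in mesh]

mesh = make_mesh(4, 4)
print(len(mesh))               # 16
print(neighbors(mesh, 0, 0))   # [(1, 0), (0, 1)]
```

Since every tile is the same, scaling up amounts to calling make_mesh with larger dimensions. No shared structure grows with the tile count.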

5
Programmable, Integrated Interconnect
  • Inter-tile communication with very low latency
  • Statically scheduled
  • Dynamic support
  • Replacing the bus architecture of superscalar
    processors with a switched interconnect
  • Operations such as register renaming and
    instruction scheduling performed at compile time.
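A short Python sketch of what moving register renaming to compile time involves. It is a generic renaming pass over straight-line code, not the RAW compiler's actual implementation. The tuple instruction format and the v1, v2, ... names are assumptions made for illustration.

```python
def rename(instrs):
    """Compile-time register renaming for straight-line code.

    Each instruction is (op, dest, src1, src2, ...) over architectural names.
    Every write gets a fresh name. This removes write-after-read and
    write-after-write dependences and leaves only true (read-after-write)
    dependences for the scheduler to respect.
    """
    current = {}   # architectural name -> latest renamed version
    counter = 0
    out = []
    for op, dest, *srcs in instrs:
        new_srcs = [current.get(s, s) for s in srcs]  # read before redefining dest
        counter += 1
        new_dest = f"v{counter}"
        current[dest] = new_dest
        out.append((op, new_dest, *new_srcs))
    return out

prog = [
    ("add", "r1", "r2", "r3"),
    ("mul", "r4", "r1", "r1"),
    ("add", "r1", "r5", "r6"),   # reuses r1: WAW with line 1, WAR with line 2
    ("sub", "r7", "r1", "r4"),
]
for ins in rename(prog):
    print(ins)
```

After renaming, the third instruction no longer conflicts with the first two. A static scheduler can therefore place it in any cycle, or on any tile.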

6
Compilation
  • Identifying and partitioning for fine-grained
    ILP.
  • Lower overheads allow finer partitioning
  • Placement: one-to-one mapping from threads to
    physical tiles
  • Routing: allocating physical network resources;
    a time-space path for each inter-tile communication
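The placement and routing steps can be sketched in Python as follows. Assumptions: a greedy placement heuristic (not RAWCC's algorithm), dimension-ordered routing (column first, then row), and one cycle per hop. Each route is emitted as (cycle, tile) pairs so that it forms a time-space path.

```python
from itertools import product

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def place(threads, comm, rows, cols):
    """Greedy one-to-one placement of threads onto (row, col) tiles.

    comm maps (t1, t2) pairs to a communication volume. Threads are placed
    in the given order, each on the free tile that minimizes the
    volume-weighted Manhattan distance to threads already placed.
    """
    free = list(product(range(rows), range(cols)))
    if len(threads) > len(free):
        raise ValueError("more threads than tiles")
    where = {}
    for t in threads:
        def cost(tile):
            total = 0
            for (a, b), vol in comm.items():
                other = b if a == t else a if b == t else None
                if other in where:
                    total += vol * manhattan(tile, where[other])
            return total
        best = min(free, key=cost)   # ties go to the first tile in row-major order
        where[t] = best
        free.remove(best)
    return where

def route(src, dst, start_cycle=0):
    """Column-then-row path as (cycle, tile) pairs, one cycle per hop."""
    r, c = src
    t = start_cycle
    path = [(t, (r, c))]
    while c != dst[1]:
        c += 1 if dst[1] > c else -1
        t += 1
        path.append((t, (r, c)))
    while r != dst[0]:
        r += 1 if dst[0] > r else -1
        t += 1
        path.append((t, (r, c)))
    return path

threads = ["A", "B", "C"]
comm = {("A", "B"): 10, ("B", "C"): 1}
where = place(threads, comm, rows=2, cols=2)
print(where)                           # {'A': (0, 0), 'B': (0, 1), 'C': (1, 1)}
print(route(where["A"], where["C"]))   # [(0, (0, 0)), (1, (0, 1)), (2, (1, 1))]
```

Because each route step is tied to a cycle number, two paths can be checked for link conflicts before the compiler commits to a schedule.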

7
(No Transcript)
8
More Software Support
  • Reducing dynamic events: keep the dynamic parts
    as independent as possible from the static parts
    of the program
  • Dynamic routing
  • Software approach: reserve channel bandwidth
    between nodes that may communicate
  • Conservative estimation of delivery times
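The software approach to dynamic routing (reserve bandwidth, then estimate delivery times conservatively) can be illustrated with a small Python sketch. Everything here is hypothetical: the per-link capacity in words per cycle, the Reservations table, the one-cycle-per-hop cost and the bound formula are modeling assumptions, not figures from the slides.

```python
import math

def xy_links(src, dst):
    """Directed links used by a column-then-row route between (row, col) tiles."""
    links = []
    r, c = src
    while c != dst[1]:
        nc = c + (1 if dst[1] > c else -1)
        links.append(((r, c), (r, nc)))
        c = nc
    while r != dst[0]:
        nr = r + (1 if dst[0] > r else -1)
        links.append(((r, c), (nr, c)))
        r = nr
    return links

class Reservations:
    """Hypothetical link-bandwidth reservation table (words per cycle)."""

    def __init__(self, link_capacity=1.0):
        self.capacity = link_capacity
        self.used = {}   # link -> reserved words per cycle

    def reserve(self, src, dst, rate):
        """Reserve `rate` on every link of the route, or on none of them."""
        links = xy_links(src, dst)
        if any(self.used.get(link, 0.0) + rate > self.capacity for link in links):
            return False
        for link in links:
            self.used[link] = self.used.get(link, 0.0) + rate
        return True

def worst_case_delivery(src, dst, words, rate, cycles_per_hop=1):
    """Conservative bound: hop latency plus the time to push `words`
    through a channel guaranteed at least `rate` words per cycle."""
    hops = len(xy_links(src, dst))
    return hops * cycles_per_hop + math.ceil(words / rate)

table = Reservations()
print(table.reserve((0, 0), (0, 2), 0.5))            # True
print(table.reserve((0, 0), (0, 1), 0.75))           # False: link would be overbooked
print(worst_case_delivery((0, 0), (0, 2), 8, 0.5))   # 18
```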

9
Deadlocks
  • Deadlock avoidance and recovery.
  • Memory network: restricted usage model that relies
    on deadlock avoidance
  • General network: unrestricted, relies on deadlock
    recovery. If it deadlocks, an interrupt routine is
    activated that uses the memory network to recover
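The slides do not say how a deadlock on the general network is detected. One generic technique is to look for a cycle in a wait-for graph among blocked buffers. It appears here purely as an illustration and is not RAW's mechanism. A cycle is the condition under which the recovery interrupt described above would have to run.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps each blocked node (e.g. a network buffer) to the nodes it
    is waiting on. If the graph has a cycle, no member can ever make
    progress, so the network is deadlocked.
    """
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def dfs(u):
        color[u] = GRAY
        stack.append(u)
        for v in waits_for.get(u, ()):
            if color.get(v, WHITE) == GRAY:   # back edge: v is on the current path
                return stack[stack.index(v):]
            if color.get(v, WHITE) == WHITE:
                found = dfs(v)
                if found:
                    return found
        stack.pop()
        color[u] = BLACK
        return None

    for node in list(waits_for):
        if color.get(node, WHITE) == WHITE:
            found = dfs(node)
            if found:
                return found
    return None

print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))   # ['A', 'B', 'C']
print(find_cycle({"A": ["B"], "B": ["C"]}))               # None
```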

10
RAW: A Comparative Analysis
  • iWarp and NuMesh vs. RAW
  • Both build point-to-point networks that support
    static scheduling.
  • In both systems the cost of initiating a message is
    high, so compilers could exploit only coarse-grain
    parallelism
  • RAW: register-like communication latencies
  • Parallelism can be exploited even over small
    instruction sequences
  • Scheduling of multiple tightly coupled
    instruction streams

11
RAW and FPGAs
  • Exposing the low-level hardware details to
    facilitate compiler orchestration
  • RAW: fast compilation, because it binds commonly
    used compute mechanisms such as ALUs, registers,
    memory paths and switching channels into hardware
  • Eliminates repeated low-level compilation of
    these macro units

12
RAW and VLIWs
  • Large register name space and a distributed
    register file.
  • Multiple memory ports
  • Compiler technology to discover parallelism and
    statically schedule computations.
  • RAW: multiple instruction streams.
  • Flexibility to perform independent but statically
    scheduled computations in different tiles
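Static scheduling across several tiles can be sketched with a greedy list scheduler. Assumptions: every operation takes a fixed `latency`, and a value that crosses tiles arrives `comm` extra cycles late. Both costs are illustrative parameters, not RAW's timings, and this is a textbook heuristic rather than the RAW compiler's algorithm.

```python
def list_schedule(ops, deps, n_tiles, latency=1, comm=3):
    """Greedy static list scheduling of a dependence DAG onto tiles.

    ops: operation names in a topological order.
    deps: op -> list of predecessor ops.
    Returns op -> (tile, start_cycle). Each op goes to the tile where it
    can start earliest (ties go to the lowest-numbered tile).
    """
    tile_free = [0] * n_tiles   # next free cycle on each tile
    sched = {}
    for op in ops:
        best = None
        for t in range(n_tiles):
            ready = 0
            for p in deps.get(op, []):
                p_tile, p_start = sched[p]
                arrive = p_start + latency + (0 if p_tile == t else comm)
                ready = max(ready, arrive)
            start = max(ready, tile_free[t])
            if best is None or start < best[1]:
                best = (t, start)
        sched[op] = best
        tile_free[best[0]] = best[1] + latency
    return sched

deps = {"c": ["a", "b"], "d": ["c"]}
print(list_schedule(["a", "b", "c", "d"], deps, n_tiles=2, comm=3))
# {'a': (0, 0), 'b': (1, 0), 'c': (0, 4), 'd': (0, 5)}
```

With comm=0 the same DAG finishes three cycles sooner (c at cycle 1, d at cycle 2). That gap is why register-like inter-tile latencies matter for fine-grain parallelism.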

13
Multiprocessors
  • Like a multiprocessor, RAW uses simple replicated
    tiles and provides distributed memory
  • However, in a multiprocessor the cost of message
    startup and synchronization would hamper its
    ability to exploit fine-grain instruction-level
    parallelism.

14
RAW Logic
  • 16 identical, programmable tiles. Each tile
    contains
  • one static communication router
  • two dynamic communication routers
  • eight-stage, in-order, single-issue, MIPS-style
    processor
  • four-stage, pipelined, floating-point unit
  • 32-Kbyte data cache
  • 96 Kbytes of software-managed instruction cache.
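The per-tile figures above add up to chip-wide totals as follows. Only the numbers listed on this slide are used, so any other on-chip memory is not counted.

```python
# Per-tile figures from the slide; the totals are simple multiplication.
TILES = 16
STATIC_ROUTERS_PER_TILE = 1
DYNAMIC_ROUTERS_PER_TILE = 2
DCACHE_KB = 32   # data cache per tile
ICACHE_KB = 96   # software-managed instruction cache per tile

total_dcache_kb = TILES * DCACHE_KB   # 512
total_icache_kb = TILES * ICACHE_KB   # 1536
total_routers = TILES * (STATIC_ROUTERS_PER_TILE + DYNAMIC_ROUTERS_PER_TILE)  # 48
print(total_dcache_kb, total_icache_kb, total_routers)   # 512 1536 48
```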

15
Application Domains
  • A software Gigabit Internet protocol router on a
    225-MHz, 16-tile Raw processor runs more than five
    times faster than a hand-tuned implementation on a
    700-MHz Pentium III processor.
  • Additionally, an implementation of a video median
    filter on 128 tiles attained a 57-times speedup
    over a single Raw tile
  • Applications for Raw can be written in a
    high-level language such as C or Java
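The two results above can be normalized a little further using only the figures quoted on the slide. Because the Raw chip runs at a lower clock rate, a 5x wall-clock advantage means the Pentium III needs roughly 15.6 times as many cycles. This is a lower bound, since the slide says "more than" five times. The median filter's 57x speedup on 128 tiles corresponds to a parallel efficiency of about 44.5% relative to one tile.

```python
# Figures quoted on the slide.
raw_mhz, p3_mhz = 225, 700
router_speedup = 5            # "more than five times faster" (a lower bound)
median_speedup, tiles = 57, 128

# time_p3 / time_raw = (cycles_p3 / p3_mhz) / (cycles_raw / raw_mhz) = 5,
# so cycles_p3 / cycles_raw = 5 * p3_mhz / raw_mhz.
per_clock = router_speedup * p3_mhz / raw_mhz
efficiency = median_speedup / tiles
print(f"{per_clock:.1f}x per clock, {efficiency:.1%} parallel efficiency")
# 15.6x per clock, 44.5% parallel efficiency
```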

16
Features
  • Expose gates, wire delay and pins to the programmer.
  • More functional units; more flexible and efficient
    pin utilization
  • Higher clock frequencies.
  • Scalable: simply stamp out more tiles and I/O ports
  • No centralized resources, global buses or
    structures that get larger as the tile/pin count
    increases.
  • Wire length, design complexity and verification
    complexity are independent of transistor count.

17
Application Mapping
  • The RAW OS allows space and time multiplexing of
    processes
  • On a context switch, the OS finds a contiguous
    region of tiles matching the dimensions of the
    process and resumes execution of its physical
    threads.
  • Gang scheduling policy: the physical threads of a
    process, which are likely to communicate with each
    other, are scheduled together.
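Finding a contiguous region of tiles can be sketched as a first-fit search over the tile grid. The slides do not specify the OS's actual search policy. The row-major first-fit order and the `busy` set representation are assumptions. The sketch only finds the region; resuming the process's threads as a gang is not modeled.

```python
def find_region(busy, rows, cols, h, w):
    """Return the top-left (row, col) of the first free h x w block of tiles,
    searching in row-major order, or None if no such block exists.

    busy is a set of (row, col) tiles already running other processes.
    """
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            if all((r + dr, c + dc) not in busy
                   for dr in range(h) for dc in range(w)):
                return (r, c)
    return None

busy = {(0, 0), (0, 1), (1, 0), (1, 1)}    # a 2 x 2 process in the corner
print(find_region(busy, 4, 4, 2, 2))        # (0, 2)
print(find_region(busy, 4, 4, 3, 3))        # None
```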

18
Compute Processor
  • Register-mapped processor networks
  • Networks integrated directly into the bypass paths
    of the processor pipeline.
  • Instruction format has a bit for 2 output
    destinations, giving the tile the option of keeping
    a local copy of a transmitted value
  • Output FIFO buffers pull the oldest value out of
    the pipeline
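A toy Python model of register-mapped networks follows. Writing a designated register name pushes a value into an outgoing FIFO, and reading another name pops from an incoming FIFO. An optional second destination keeps a local copy. The register names "$csto" and "$csti" are illustrative assumptions. Real hardware would stall on an empty FIFO rather than raise an error.

```python
from collections import deque

class Core:
    """Toy register file with two network-mapped register names."""

    def __init__(self):
        self.regs = {}
        self.out_fifo = deque()   # values leaving the tile
        self.in_fifo = deque()    # values arriving from the network

    def write(self, dest, value, also=None):
        if dest == "$csto":               # network output register
            self.out_fifo.append(value)
        else:
            self.regs[dest] = value
        if also is not None:              # optional second, local destination
            self.regs[also] = value

    def read(self, src):
        if src == "$csti":                # network input register
            return self.in_fifo.popleft() # IndexError if empty; hardware would stall
        return self.regs[src]

a, b = Core(), Core()
a.write("$csto", 7 + 5, also="r3")        # send 12 and keep a local copy in r3
b.in_fifo.append(a.out_fifo.popleft())    # model one hop across the link
b.write("r1", b.read("$csti") * 2)
print(a.regs["r3"], b.regs["r1"])         # 12 24
```

Because the network appears as an ordinary operand, a value can be consumed on the neighboring tile without any explicit send or receive instruction.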

19
Network Integration: Bypass Paths
  • Network integration: networks are register-mapped
    and integrated into the bypass paths.
  • 2-D bypass networks serve as bridges between the
    bypass networks of separate tiles
  • Challenge: pipeline operation. Solution: place the
    commit point at the execute stage.

20
Static Routing
  • Two static networks
  • In-order
  • Flow-controlled
  • Low-latency communication
  • Route preparations pipelined
  • Single-cycle latency per hop; routes 2 values in
    each direction per cycle.
  • Total latency: 3 cycles

21
Static Routing
  • Communication and computation instructions are
    given equal importance
  • Static routers collectively reconfigure the
    entire communication pattern of the network on a
    cycle-by-cycle basis.
  • Suited to applications whose communication is
    predictable at compile time.
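Cycle-by-cycle reconfiguration by compile-time switch programs can be sketched with a small simulator for a single row of switches. The port names ("P" for the local processor, "W" and "E" for neighbors), the one-cycle hop and the program format are assumptions made for illustration, not RAW's switch instruction set. Flow control is not modeled: a route whose source port is empty raises KeyError, where hardware would stall.

```python
def run_static(programs, inject, cycles):
    """Simulate a row of static switches driven by compile-time programs.

    programs[i][t] lists the (src_port, dst_port) routes switch i performs
    at cycle t. inject[t] is a value tile 0's processor offers at cycle t.
    Returns, per tile, the (cycle, value) pairs delivered to its processor.
    """
    n = len(programs)
    inbox = [dict() for _ in range(n)]     # port -> value waiting at each switch
    received = [[] for _ in range(n)]
    for t in range(cycles):
        if t in inject:
            inbox[0]["P"] = inject[t]
        arrivals = [dict() for _ in range(n)]
        for i, prog in enumerate(programs):
            routes = prog[t] if t < len(prog) else []
            for src, dst in routes:
                value = inbox[i].pop(src)
                if dst == "P":
                    received[i].append((t, value))
                else:
                    j = i + 1 if dst == "E" else i - 1
                    if not 0 <= j < n:
                        raise ValueError(f"switch {i} routes off the edge")
                    # sent east -> neighbor's west port, and vice versa
                    arrivals[j]["W" if dst == "E" else "E"] = value
        for i in range(n):
            inbox[i].update(arrivals[i])   # a hop takes effect next cycle
    return received

programs = [
    [[("P", "E")], [("P", "E")]],           # switch 0: send two values east
    [[], [("W", "E")], [("W", "E")]],       # switch 1: pass them through
    [[], [], [("W", "P")], [("W", "P")]],   # switch 2: deliver to its processor
]
print(run_static(programs, {0: 42, 1: 43}, cycles=4))
# [[], [], [(2, 42), (3, 43)]]
```

The two values travel through the pipeline one cycle apart, so each link carries a new word every cycle once the schedule is in steady state.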

22
Dynamic Networks
  • The dynamic routers manage the dynamic networks
  • Transport unpredictable operations: interrupts,
    cache misses, etc.
  • Header word: (dest, field, len)
  • Latency: 2 + X + 1 + Y + 2 cycles, where X and Y
    are the hop counts in the two dimensions.
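The dynamic-network latency formula can be evaluated directly. The sketch applies 2 + X + 1 + Y + 2 literally, with X and Y taken as the absolute column and row distances between (row, col) tiles. The slide does not say whether the middle 1 applies when a message needs no turn, so the code keeps it unconditionally.

```python
def dynamic_latency(src, dst):
    """Cycles for a dynamic-network message between (row, col) tiles,
    using the slide's formula 2 + X + 1 + Y + 2 exactly as written."""
    x_hops = abs(dst[1] - src[1])   # hops along the X (column) dimension
    y_hops = abs(dst[0] - src[0])   # hops along the Y (row) dimension
    return 2 + x_hops + 1 + y_hops + 2

print(dynamic_latency((0, 0), (3, 3)))   # 11: corner to corner of a 4 x 4 array
print(dynamic_latency((0, 0), (0, 1)))   # 6: nearest neighbor
```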

23
Dynamic Event Handling: Deadlocks
  • Deadlock avoidance
  • Users must limit themselves to proven disciplines
  • Memory network
  • Trusted clients: OS, data cache, interrupts,
    hardware devices, DMA.
  • Deadlock recovery.
  • Requires the network to be drained into copious
    memory; may fail.
  • If the network deadlocks, an interrupt routine
    uses the memory network to recover.

24
Conclusion
  • Field programmability yields cost-effective custom
    hardware
  • Best suited for stream-based signal processing
  • Raw does not bind specialized logic structures
    such as register renaming logic, dynamic
    instruction issue logic, or caching into
    hardware.

25
Conclusion
  • Instead, it focuses on keeping each tile small
    and maximizing the number of tiles it can
    implement on a chip, thereby increasing both the
    amount of parallelism it can exploit and the
    clock speed it can achieve.

26
References
  • Baring It All to Software: Raw Machines. Elliot
    Waingold, Michael Taylor, Vivek Sarkar, et al.
  • The Raw Microprocessor: A Computational Fabric for
    Software Circuits and General-Purpose Programs.
    Michael Taylor, Jason Kim, et al.