Transcript and Presenter's Notes

Title: The Peregrine High-Performance RPC System

1
The Peregrine High-Performance RPC System
  • David B. Johnson and
  • Willy Zwaenepoel
  • Department of Computer Science
  • Rice University
  • Presented by Khaled Elmeleegy
  • Assisted by Moez Abdel-gawad

2
Overview
  • Peregrine is an RPC (remote procedure call)
    system.
  • It aims to minimize the latency of RPCs.
  • The paper supports its design with experimental
    measurements.

3
Key optimizations
  • No intermediate copies of arguments or results.
  • No data conversion between client and server
    (unless needed).
  • Preallocated and precomputed header templates for
    transmitted packets.

4
Key optimizations (continued)
  • No thread-specific state is saved between calls
    in the server.
  • Arguments are mapped into the server's address
    space, rather than being copied.
  • For multi-packet arguments, most copying is
    overlapped with the transmission of the next
    packet (see the sketch below).
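
A minimal sketch of the overlap, in C; the packet count, the sizes, and the start_transmit/wait_transmit routines are hypothetical stand-ins for the controller interface. Only the ordering matters: the copy of packet i+1 proceeds while packet i is on the wire.

    /* Double-buffering sketch: copying the next packet overlaps the
     * (simulated) transmission of the current one. All names and sizes
     * are illustrative, not Peregrine's actual interface. */
    #include <stdio.h>
    #include <string.h>

    #define PKTS 4            /* packets in the message (illustrative) */
    #define SZ   1024         /* data bytes per packet (illustrative) */

    static char message[PKTS][SZ];  /* the multi-packet argument data */
    static char txbuf[2][SZ];       /* one buffer sending, one being filled */

    static void start_transmit(int buf) { printf("transmitting buffer %d\n", buf); }
    static void wait_transmit(int buf)  { (void)buf; /* wait for DMA completion */ }

    int main(void) {
        memcpy(txbuf[0], message[0], SZ);       /* stage the first packet */
        for (int i = 0; i < PKTS; i++) {
            start_transmit(i % 2);              /* packet i goes on the wire */
            if (i + 1 < PKTS)                   /* overlapped: copy packet i+1 */
                memcpy(txbuf[(i + 1) % 2], message[i + 1], SZ);
            wait_transmit(i % 2);               /* then wait for packet i */
        }
        return 0;
    }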

5
Implementation
[Figure: RPC in Peregrine. The client application makes a local call into
the client stub, which traps into the kernel; the kernel transmits the
call message using gather DMA. On the server, the kernel receives the
message, reinitializes a free server thread, and jumps into the server
stub, which calls the remote procedure. On return, the server kernel
transmits the return message (DMA), and the client kernel returns from
the trap to complete the local return.]
6
Implementation (contd)
  • In Peregrine, the kernel is responsible for the
    following (a minimal sketch follows this list):
  • 1- Getting RPC messages from one address space to
    another (usually from one machine to another).
  • 2- Reinitializing a free thread in the server when
    a call message arrives; this thread handles the
    call, including the binding.
  • 3- Unblocking the client thread when the return
    message arrives.
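
A minimal user-level sketch of these three responsibilities, in C. This is not Peregrine's kernel code; the message layout and all function names are hypothetical.

    #include <stdio.h>

    enum msg_type { CALL_MSG, RETURN_MSG };

    struct message {
        enum msg_type type;
        int procedure;        /* which remote procedure is being called */
    };

    /* 2) Reinitialize a free server thread to handle an arriving call,
          including the binding to the requested procedure. */
    static void reinitialize_server_thread(const struct message *m) {
        printf("free thread dispatched to procedure %d\n", m->procedure);
    }

    /* 3) Unblock the client thread when the return message arrives. */
    static void unblock_client_thread(void) {
        printf("client thread unblocked\n");
    }

    /* 1) Move an RPC message between address spaces (here just a pointer
          hand-off), then dispatch on its type. */
    static void kernel_deliver(const struct message *m) {
        switch (m->type) {
        case CALL_MSG:   reinitialize_server_thread(m); break;
        case RETURN_MSG: unblock_client_thread();       break;
        }
    }

    int main(void) {
        struct message call = { CALL_MSG, 7 };
        struct message ret  = { RETURN_MSG, 0 };
        kernel_deliver(&call);   /* call message: a server thread is reused */
        kernel_deliver(&ret);    /* return message: the client is woken */
        return 0;
    }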

7
Implementation (contd)
  • Unlike the previous paper:
  • There is no RPC runtime; instead, it is the
    kernel's responsibility to transfer messages
    reliably.
  • A pool of threads is used instead of a pool of
    processes, which gives better performance.
  • All processing specific to the particular server
    procedure being called is performed in the stubs.

8
Hardware Requirements
  • The Peregrine implementation utilizes:
  • The ability to re-map memory pages between
    address spaces by manipulating the page-table
    entries (a user-level analogue is sketched
    below).
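
The actual remapping is done inside the kernel by editing page-table entries, which cannot be reproduced at user level. As a loose analogue (assuming Linux, since memfd_create is Linux-specific), the sketch below maps the same pages into two views, so data written through the "client" view is readable through the "server" view without any copy.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        long page = sysconf(_SC_PAGESIZE);

        int fd = memfd_create("rpc-args", 0);      /* anonymous backing file */
        if (fd < 0 || ftruncate(fd, page) < 0) { perror("setup"); return 1; }

        /* Two mappings of the same physical pages stand in for the
           client's and the server's views of the argument pages. */
        char *client_view = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
        char *server_view = mmap(NULL, page, PROT_READ | PROT_WRITE,
                                 MAP_SHARED, fd, 0);
        if (client_view == MAP_FAILED || server_view == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        strcpy(client_view, "argument data");      /* "client" writes arguments */
        printf("server sees: %s\n", server_view);  /* "server" reads, uncopied */
        return 0;
    }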

9
Hardware Requirements (contd)
  • The gather DMA capability of the Ethernet
    controller (a user-level analogue is sketched
    after the figure below).

[Figure: argument pages P1-P9 being remapped from the client's address
space, across the network, into the server's address space.]
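
Gather DMA lets the controller collect a packet's header and its argument data from separate memory locations in one transmission. A user-level analogue, not the paper's code, is POSIX writev(), which transmits several non-contiguous buffers in one call without first copying them together.

    #include <stdio.h>
    #include <sys/uio.h>
    #include <unistd.h>

    int main(void) {
        const char header[] = "HDR:";            /* stands in for the packet header */
        const char args[]   = "call arguments";  /* stands in for the argument pages */

        struct iovec iov[2] = {
            { .iov_base = (void *)header, .iov_len = sizeof(header) - 1 },
            { .iov_base = (void *)args,   .iov_len = sizeof(args)   - 1 },
        };

        /* One call transmits both fragments without first copying them
           into a single contiguous buffer. */
        if (writev(STDOUT_FILENO, iov, 2) < 0)
            perror("writev");
        return 0;
    }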
10
Implementation of the optimizations
  • Gather DMA is used to send the arguments and
    results, instead of expensive copying.
  • No data conversion (unless needed).
  • Use of packet header templates (see the sketch
    below).
  • The received packet is mapped into the thread's
    stack, to avoid copying.
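
A minimal sketch of the header-template idea, assuming a hypothetical header layout: the constant fields are filled in once at bind time, and each call patches only the fields that vary.

    #include <stdint.h>
    #include <string.h>

    struct pkt_header {
        uint8_t  dst_mac[6];   /* constant for a given binding */
        uint8_t  src_mac[6];   /* constant for this host */
        uint16_t ethertype;    /* constant protocol identifier */
        uint32_t server_id;    /* constant for a given binding */
        uint32_t sequence;     /* varies per call */
        uint16_t length;       /* varies per call */
    };

    static struct pkt_header header_template;  /* filled once, at bind time */

    static void bind_time_init(const uint8_t dst[6], const uint8_t src[6],
                               uint32_t server_id) {
        memcpy(header_template.dst_mac, dst, 6);
        memcpy(header_template.src_mac, src, 6);
        header_template.ethertype = 0x0800;     /* placeholder value */
        header_template.server_id = server_id;
    }

    /* Per call: copy the template, then patch only the fields that change. */
    static void fill_call_header(struct pkt_header *h, uint32_t seq,
                                 uint16_t len) {
        *h = header_template;
        h->sequence = seq;
        h->length   = len;
    }

    int main(void) {
        const uint8_t dst[6] = {0}, src[6] = {0};
        struct pkt_header h;
        bind_time_init(dst, src, 42);
        fill_call_header(&h, 1, 128);
        return 0;
    }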

11
Implementation of the optimizations (contd)
[Figure: received call packet in one of the server's Ethernet receive
buffer pages.]
12
Implementation of the optimizations (contd)
  • The server thread does not save or restore its
    registers between successive RPCs, because the
    dispatch is a jump, not a call (see the sketch
    below).
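
The jump-versus-call point can be illustrated at user level with setjmp/longjmp, a stand-in rather than Peregrine's actual mechanism: the thread re-enters its dispatch point directly instead of returning through saved call frames.

    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf dispatch_point;

    static void run_one_call(int n) {
        printf("handling call %d\n", n);
        longjmp(dispatch_point, n + 1);  /* jump back; the frame is discarded */
    }

    int main(void) {
        int n = setjmp(dispatch_point);  /* the thread's reinitialization point */
        if (n < 3)
            run_one_call(n);
        printf("served %d calls without saving per-call register state\n", n);
        return 0;
    }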

13
Multi-Packet Network RPC
  • When a network RPC message containing the
    argument or result values is larger than the data
    portion of a single Ethernet packet, the message
    is broken into multiple packets (see the sketch
    below).
  • As in the single-packet case, the data are
    transmitted directly from the client's address
    space using gather DMA, to avoid copying.
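
A small sketch of the fragmentation arithmetic; the 1480-byte data portion per packet is an assumed figure, not taken from the paper.

    #include <stdio.h>

    #define DATA_PER_PACKET 1480  /* assumed data bytes per packet */

    /* Ceiling division: packets needed for a message of msg_bytes bytes. */
    static unsigned packets_needed(unsigned msg_bytes) {
        return (msg_bytes + DATA_PER_PACKET - 1) / DATA_PER_PACKET;
    }

    int main(void) {
        printf("1480-byte message -> %u packet(s)\n", packets_needed(1480));
        printf("4000-byte message -> %u packet(s)\n", packets_needed(4000));
        return 0;
    }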

14
Multi-Packet Network RPC (contd)
  • Other than the transmission of the packets, the
    execution of a multi-packet network RPC is the
    same as in the single-packet case.

15
Multi-Packet Network RPC (contd)
[Figure: example multi-packet call transmission and reception.]
16
Local RPC
  • A local RPC takes place between two threads
    executing on the same machine.
  • Memory mapping is used to move the call arguments
    and results between the client's and the server's
    address spaces.
  • Otherwise, the execution is the same as for a
    network RPC.

17
Performance Numbers
[Table: Peregrine RPC performance for single-packet network RPCs, in
microseconds; the table itself is not reproduced in this transcript.]
18
Performance Numbers (contd)
[Table: Peregrine RPC performance for multi-packet network RPCs; the
table itself is not reproduced in this transcript.]
19
Effectiveness of the Optimizations
  • Avoiding copies of both the arguments and the
    results was shown to be a very effective
    optimization.
  • For multi-packet RPCs, avoiding copying on the
    critical path was a significant time saving as
    well.
  • Avoiding data-representation conversion when it
    is not needed was yet another effective
    optimization.

20
Conclusion
  • Peregrine, by avoiding expensive copies,
  • expensive data-representation conversions,
  • and recomputation of packet headers,
  • and by reducing the overhead of thread
    management,
  • achieves performance that is very close to the
    hardware latency, both for network RPCs and for
    local RPCs.