Robust Grid Computing Using Peer-to-Peer Services (PowerPoint presentation)

Provided by: aisrp
1
Robust Grid Computing Using Peer-to-Peer Services
  • Alan Sussman
  • Department of Computer Science

2
Desktop Grids and P2P Systems
3
P2P Desktop Grid System
  • Executing applications on a (widely) distributed
    set of resources
  • Decentralized, robust, highly available, and
    scalable
  • Goal is to enable resource sharing
      • Create ad hoc grids within (research) communities
  • Applications
      • Large computational requirements and relatively
        low I/O requirements
      • Physical simulations and data analysis related to
        astronomy, physics, and other scientific
        disciplines, such as parameter sweeps
          • Binary asteroid formation (co-I Richardson)
          • Comet composition (Deep Impact, co-I Wellnitz)
      • Monte Carlo simulations
      • Bioinformatics applications

4
Hard Problems / Issues
  • Job submission
      • Submit a job into the decentralized P2P system
  • Matchmaking
      • Find a resource that can meet the minimum
        resource requirements of a job
  • Load balance
      • Distribute the load (jobs) across the nodes in
        the system
      • Without any node incurring too much overhead
        (messages)
  • Resilience to failure
      • The overall system must not lose jobs to failures,
        and must allow nodes to join and leave (or fail)
        dynamically

5
System Architecture
[Figure: clients submit Job J to the peer-to-peer network (DHT); Job J is assigned a GUID]
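The slide states only that Job J is assigned a GUID before it enters the DHT. Below is a minimal Python sketch of one way to do that, assuming a random UUID serves as the GUID and that, CAN-style, the GUID is hashed to a point in the unit d-cube whose owning zone is responsible for the job. The zone layout, node names, and the linear scan that stands in for CAN routing are all hypothetical.

```python
import hashlib
import uuid
from dataclasses import dataclass


@dataclass
class Zone:
    """Axis-aligned CAN zone owned by one node: [lo, hi) in every dimension."""
    node: str
    lo: tuple[float, ...]
    hi: tuple[float, ...]

    def contains(self, p: tuple[float, ...]) -> bool:
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))


def new_guid() -> str:
    # Assumption: a random 128-bit UUID serves as the job's GUID.
    return uuid.uuid4().hex


def guid_to_point(guid: str, d: int) -> tuple[float, ...]:
    # SHA-1 yields 20 bytes; use 4 bytes per coordinate, so at most 5 dimensions.
    if not 1 <= d <= 5:
        raise ValueError("this sketch supports 1 to 5 dimensions")
    digest = hashlib.sha1(guid.encode()).digest()
    return tuple(int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
                 for i in range(d))


def owner_of(guid: str, zones: list[Zone], d: int) -> str:
    # A linear scan stands in for greedy CAN routing toward the point.
    p = guid_to_point(guid, d)
    for z in zones:
        if z.contains(p):
            return z.node
    raise LookupError("zones do not cover the point")


# Hypothetical 2-D space split into four quadrant zones.
ZONES = [
    Zone("A", (0.0, 0.0), (0.5, 0.5)), Zone("B", (0.5, 0.0), (1.0, 0.5)),
    Zone("C", (0.0, 0.5), (0.5, 1.0)), Zone("D", (0.5, 0.5), (1.0, 1.0)),
]
```

Because the hash is deterministic, any peer can recompute which zone is responsible for a given GUID in this sketch.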
6
Goals of Matchmaking Algorithm
  • Expressiveness
      • Allow users to specify any type of resource
        requirements
  • Load balance
      • Distribute load across multiple capable
        candidates
  • Parsimony
      • Resources should not be wasted
  • Completeness
      • A valid assignment of a job to a node must be
        found if such an assignment exists
  • Low overhead
      • Routing must not add significant overhead
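As a yardstick for these goals, a centralized brute-force matchmaker is easy to state: it is complete by construction because it checks every node. The Python sketch below is an illustration only, not the distributed algorithm in these slides; it breaks ties by queue length (load balance) and then by a hypothetical relative over-provisioning score (parsimony). The node names and capability keys are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    caps: dict[str, float]          # e.g. {"cpu_ghz": 2.4, "mem_gb": 8}
    queue_len: int = 0              # jobs already waiting on this node


def can_run(node: Node, reqs: dict[str, float]) -> bool:
    # Minimum match: every requirement is met or exceeded.
    return all(node.caps.get(k, 0.0) >= v for k, v in reqs.items())


def excess(node: Node, reqs: dict[str, float]) -> float:
    # Hypothetical parsimony score: summed relative over-provisioning.
    return sum((node.caps[k] - v) / v for k, v in reqs.items() if v > 0)


def match(nodes: list[Node], reqs: dict[str, float]) -> Optional[Node]:
    candidates = [n for n in nodes if can_run(n, reqs)]
    if not candidates:
        return None  # None only when no valid assignment exists
    # Load balance first (shortest queue), then parsimony (least excess).
    return min(candidates, key=lambda n: (n.queue_len, excess(n, reqs)))


nodes = [
    Node("A", {"cpu_ghz": 2.0, "mem_gb": 4}, queue_len=1),
    Node("B", {"cpu_ghz": 3.0, "mem_gb": 16}),
    Node("C", {"cpu_ghz": 2.0, "mem_gb": 8}),
]
```

For a job needing 2.0 GHz and 4 GB, all three nodes qualify; B and C both have empty queues, and C wins because it is less over-provisioned. The full scan over every node is the cost a decentralized design has to avoid.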

7
Basic Assumptions
  • Underlying Distributed Hash Table (DHT)
      • Object location and routing in the P2P network
      • Reformulate the problem of matchmaking as one of
        routing
  • Jobs in the system
      • Data and associated profile (requirements)
      • All jobs are independent
  • Optimization criterion
      • Minimize time to complete all jobs (a combination
        of throughput and response time)
      • Queuing time used as a proxy for response time
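Read literally, "time to complete all jobs" is a makespan, and queuing time is the gap between a job's submission and its start. A small Python sketch of both metrics, assuming a hypothetical `JobRecord` that holds per-job submit, start, and finish timestamps:

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    submit: float   # time the client submitted the job
    start: float    # time a node began running it
    finish: float   # time it completed


def makespan(jobs: list[JobRecord]) -> float:
    # From the first submission to the last completion.
    return max(j.finish for j in jobs) - min(j.submit for j in jobs)


def mean_queuing_time(jobs: list[JobRecord]) -> float:
    # Average wait between submission and start (the response-time proxy).
    return sum(j.start - j.submit for j in jobs) / len(jobs)


example_jobs = [JobRecord(0, 1, 5), JobRecord(2, 2, 4), JobRecord(3, 6, 9)]
```

For `example_jobs` the makespan is 9 time units and the waits are 1, 0, and 3, giving a mean queuing time of 4/3.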

8
Modified Content-Addressable Network
  • Basic CAN
      • Logical d-dimensional space
      • Zones, neighbors, greedy forwarding
  • Formulate the matchmaking problem as a routing
    problem in CAN space
      • Each resource type is a CAN dimension
      • Map nodes and jobs into the CAN space by their
        resource capabilities and requirements,
        respectively
      • Search for a node whose coordinates in all
        dimensions meet or exceed the job's requirements
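The mapping and search can be illustrated on a toy CAN whose zones form a regular grid, one cell per zone, with each zone's lower corner standing in for its owner's capabilities. In this hypothetical Python sketch, a job is forwarded greedily, one neighboring zone at a time, until every coordinate meets or exceeds its requirement; a real CAN has irregular zones and forwards by coordinate distance.

```python
from typing import Optional


def dominates(caps: tuple[int, ...], reqs: tuple[int, ...]) -> bool:
    """True if caps meets or exceeds reqs in every dimension."""
    return all(c >= r for c, r in zip(caps, reqs))


def route(start: tuple[int, ...], job: tuple[int, ...],
          grid_size: int) -> Optional[list[tuple[int, ...]]]:
    """Greedy forwarding from zone `start` to the first zone that can run `job`.

    Coordinates are integer cell indices in [0, grid_size) per resource
    dimension (e.g. CPU, memory). Returns the zones visited, or None if no
    zone in the space can satisfy the job.
    """
    if any(r >= grid_size for r in job):
        return None
    cur = list(start)
    path = [tuple(cur)]
    while not dominates(tuple(cur), job):
        # Take the first dimension still below the requirement and hop to
        # the adjacent zone one cell higher along it.
        i = next(k for k, (c, r) in enumerate(zip(cur, job)) if c < r)
        cur[i] += 1
        path.append(tuple(cur))
    return path
```

When the start zone is below the job in every dimension, the path ends exactly at the job's own cell, the least capable zone that can run it, which is in the spirit of the parsimony goal.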

9
CAN-based Algorithm
  • Employ dynamic aggregated resource information
    for load balancing

[Figure: Job routing in a 2-D CAN space (CPU dimension, Memory dimension) to Node A]
  • Aggregate resource information
  • Choose the least loaded direction
  • Push a job into an under-loaded region
  • Stop pushing
  • Choose the best run node
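The push step can be sketched on the same kind of toy grid. Moving to a higher zone in any dimension keeps the job runnable, so the owner compares an aggregate load for each upward direction and pushes the job the least loaded way until no direction beats the current zone. The aggregate used here (mean queue length of the zones further along that direction), the grid, and the load values are hypothetical simplifications; in this Python sketch the zone where pushing stops is taken as the run node.

```python
def aggregated_load(load: dict[tuple[int, ...], int], cell: tuple[int, ...],
                    dim: int, grid_size: int) -> float:
    """Hypothetical aggregate: mean queue length of the zones strictly above
    `cell` along dimension `dim`; infinity if there are none."""
    vals = []
    c = list(cell)
    for v in range(cell[dim] + 1, grid_size):
        c[dim] = v
        vals.append(load[tuple(c)])
    return sum(vals) / len(vals) if vals else float("inf")


def push(owner: tuple[int, ...], load: dict[tuple[int, ...], int],
         grid_size: int) -> tuple[int, ...]:
    cur = owner
    while True:
        # Choose the least loaded upward direction.
        best, dim = min((aggregated_load(load, cur, d, grid_size), d)
                        for d in range(len(cur)))
        if best >= load[cur]:
            return cur  # stop pushing; this zone runs the job
        nxt = list(cur)
        nxt[dim] += 1   # push into the under-loaded region
        cur = tuple(nxt)


# Hypothetical 3x3 CPU-by-memory grid: every zone has 4 queued jobs except
# the owner (0, 0) with 5 and two lightly loaded zones along the CPU axis.
LOAD = {(x, y): 4 for x in range(3) for y in range(3)}
LOAD[(0, 0)] = 5
LOAD[(1, 0)] = 1
LOAD[(2, 0)] = 1
```

Here the job moves from (0, 0) to (1, 0) and stops, because the only region further along the CPU axis is no less loaded. Requiring a strictly lower aggregate keeps a job from bouncing between equally loaded regions.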
10
Different Resource Types
  • Categorical resources
      • Architecture, operating system type (version),
        etc.
      • Exact match
  • Continuous resources
      • CPU speed, memory amount, disk space, etc.
      • Minimum match
  • Until recently, our work considered only
    continuous resource types
11
Integrating Categorical Resources
  • How to integrate all types of resources in CAN?
      • Could add new CAN dimensions for categorical
        resource types
      • Could create a distinct CAN space for each
        combination of categorical resource types
  • Our approach
      • Virtual peers and a 1-dimensional transformation

12
Virtual Peers and 1-D Transformation
  • 1-dimensional transformation
      • Transform all categorical resource dimensions
        into a single dimension
      • Keeps virtual peer management and failure
        recovery simple, and lets us reuse our existing
        algorithms
      • Uses a Hilbert space-filling curve
  • Virtual peers cover the unoccupied parts of the
    CAN space in the categorical dimension
      • Effectively divide the CAN space into multiple
        disjoint sub-CANs and connect them
13
[Figure: CAN space with a Memory dimension and a categorical dimension (C-Dim) spanning Linux Intel, OSX PPC, Windows Intel, AIX Power, and Solaris Sparc; zones labeled A to I, VBE, VEH, VH, and MJ]
14
Improving Overall Scalability
  • Partial updates
      • Limit the size of a single update message
  • Probabilistic heartbeat messaging
      • Limit the number of outgoing update messages
  • Specialized routing along the T-dimension
      • Balance the load of processing routing messages
        from one sub-CAN to another
  • All without affecting the correctness of the CAN
    algorithms
      • Must quantify the effects on performance and
        reliability
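The slide names these techniques but not their details. The Python sketch below shows one hypothetical reading of the first two: a partial update that carries only fields changed since the last message, capped per message, and a heartbeat that contacts each neighbor independently with probability p, so the expected fan-out is p times the number of neighbors. The field names, the cap, and p are assumptions.

```python
import random


def partial_update(last_sent: dict, current: dict, max_fields: int) -> dict:
    """Fields whose values differ from what was last sent, at most max_fields."""
    changed = {k: v for k, v in current.items() if last_sent.get(k) != v}
    return dict(list(changed.items())[:max_fields])


def heartbeat_targets(neighbors: list, p: float, rng: random.Random) -> list:
    """Each neighbor is chosen independently with probability p."""
    return [n for n in neighbors if rng.random() < p]


if __name__ == "__main__":
    rng = random.Random(42)
    last = {"cpu_load": 0.5, "queue_len": 3, "mem_free_gb": 6}
    now = {"cpu_load": 0.7, "queue_len": 3, "mem_free_gb": 4}
    print(partial_update(last, now, max_fields=1))  # {'cpu_load': 0.7}
    print(heartbeat_targets(["B", "C", "D", "E"], p=0.5, rng=rng))
```

A caller would record only the fields actually sent in `last_sent`, so changed fields cut by the cap go out in a later message.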

15
Experimental Results
16
Current Status
  • Resource discovery algorithms thoroughly
    simulated and verified
  • CAN-based implementation complete and being
    tested
      • All CAN services working: node join, node
        leave/fail, job assignment, load aggregation
      • Basic authentication mechanism in place, based on
        certificates and public-key authentication
      • Job management and client interface completed
      • Categorical resource types still being
        implemented
  • Large-scale deployment and measurement in
    progress
      • Binary asteroid formation application (Richardson)

17
Ongoing Work
  • Deploying the prototype system with real
    workloads on real machines
      • Practical issues, such as firewalls and the user
        interface for job submission
      • With serious logging for measuring system
        behavior
  • Dealing with multicore and multiprocessor
    machines
  • Better characterization of real workloads
      • Via consultation with astronomy collaborators,
        logging the behavior of the peer, and mining
        Condor system logs
  • High-quality peer software, with good user tools

18
The Project Team
  • Faculty members
      • Alan Sussman, Pete Keleher, Bobby Bhattacharjee,
        Derek Richardson (Astronomy), Dennis Wellnitz
        (Astronomy)
  • Matchmaking algorithms and simulations
      • Jik-Soo Kim (RA)
      • Jaehwan Lee (RA)
  • Peer implementation and other tools
      • Michael Marsh (research scientist), Beomseok Nam
        (postdoc)
      • Sukhyun Song, San Ratanasanya (RAs)

19
End of Talk