Research Issues in Cooperative Computing

Slides: 28
Provided by: dougla9
Learn more at: https://www3.nd.edu
Transcript and Presenter's Notes


1
Research Issues in Cooperative Computing
  • Douglas Thain
  • http://www.cse.nd.edu/ccl

2
Sharing is Hard!
  • Despite decades of research in distributed
    systems and operating systems, sharing computing
    resources is still very difficult.
  • Problems get worse as scale increases:
      • Office
      • Server Room
      • Distributed System
      • Computational Grid

3
Designers Go To Extremes
Cooperative Computing
Peer to Peer
Central Control
4
How Do We Share Data?
P2P File Sharing (WWW, Napster)
Central Storage Archive (NFS, UDC, StorageTank.)
5
Things I Can't Do Today
  • Let members of my project team store and retrieve
    documents from this disk in my office.
  • (Where my boss defines "project team.")
  • I must have 1 TB of space for one whole week, but
    it must be stored by someone I know.
  • (Where I give a list of trusted people.)
  • Allow a visitor in my office to use my machine.
  • (But I want her workspace isolated from mine.)
  • This bioinformatics repository can be written by
    my grad students, read by all ND faculty, and
    read by anyone approved by the NSF.
  • (Where each list comes from a different source.)
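The last wish above (one repository, three member lists, each list owned by someone else) can be sketched roughly as follows. This is only an illustration under stated assumptions: the Repository class, the grant/allowed methods, and the user names are hypothetical and do not describe any existing CCL tool. The sketch also assumes that write access does not imply read access, since the slide does not say.

```python
from typing import Callable, Dict, List, Set

# Hypothetical membership source: a callable that returns the current
# member list as maintained by some outside authority.
GroupSource = Callable[[], Set[str]]


class Repository:
    """Toy access-control model; not an existing system's API."""

    def __init__(self) -> None:
        # Maps a right ("read" or "write") to the sources that grant it.
        self.grants: Dict[str, List[GroupSource]] = {"read": [], "write": []}

    def grant(self, right: str, source: GroupSource) -> None:
        self.grants[right].append(source)

    def allowed(self, user: str, right: str) -> bool:
        # Every source is consulted at check time, so a change made by
        # the owner of a list takes effect without editing the repository.
        return any(user in source() for source in self.grants[right])


# Illustrative lists, each imagined to come from a different source.
grad_students = {"alice"}   # maintained by the repository owner
nd_faculty = {"bob"}        # maintained by the university
nsf_approved = {"carol"}    # maintained by the funding agency

repo = Repository()
repo.grant("write", lambda: grad_students)
repo.grant("read", lambda: nd_faculty)
repo.grant("read", lambda: nsf_approved)
```

With this shape, adding a name to nsf_approved is enough to let that person read; the repository owner never has to copy the list.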

6
What is Cooperative Computing?
  • CC means putting owners in charge.
      • I control who uses my resources.
      • Need tools for expressing trust.
  • CC means respect for social structures.
      • Trust is rarely symmetric.
      • Hierarchy and centralization can be important.
      • Motivation is usually external to the system.
  • CC means ease of use.
      • Resource owners need simple and effective tools.
      • Resource users need to be insulated from failures.

7
Every User Should be a Super-User
(Diagram labels: Super-User; Allocation, Accounting, Quality of Service, Security, Debugging)
8
Vision of Cooperative Storage
  • Make it easy to deploy systems that:
      • Allow sharing of storage space.
      • Respect existing human structures.
      • Provide reasonable space/performance promises.
      • Work easily and transparently without root.
  • Make the non-ideal properties manageable:
      • Limited allocation. (select, renew, migrate)
      • Unreliable networks. (useful fallback modes)
      • Changing configuration. (automatic
        discovery/configuration)
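One way to read "limited allocation (select, renew, migrate)" is as a time-limited lease on space. The sketch below is a toy in-memory model under that assumption; Pool, Lease, the server names, and the first-fit selection rule are all hypothetical, and no data is actually copied during migration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Lease:
    server: str
    size_gb: int
    expires_at: float  # seconds, on whatever clock the caller supplies


class Pool:
    """Toy model of a storage pool; free space is tracked in GB."""

    def __init__(self, free_gb: Dict[str, int]) -> None:
        self.free_gb = dict(free_gb)

    def select(self, size_gb: int, now: float, duration: float) -> Optional[Lease]:
        # First fit: take the first server with enough free space.
        for server, free in self.free_gb.items():
            if free >= size_gb:
                self.free_gb[server] -= size_gb
                return Lease(server, size_gb, now + duration)
        return None

    def renew(self, lease: Lease, now: float, duration: float) -> bool:
        # An expired lease cannot be renewed; its space may be reclaimed.
        if now >= lease.expires_at:
            return False
        lease.expires_at = now + duration
        return True

    def migrate(self, lease: Lease, now: float) -> Optional[Lease]:
        # Reserve space on another server for the remaining time, then
        # release the old reservation. Copying the data is out of scope.
        if now >= lease.expires_at:
            return None
        for server, free in self.free_gb.items():
            if server != lease.server and free >= lease.size_gb:
                self.free_gb[server] -= lease.size_gb
                self.free_gb[lease.server] += lease.size_gb
                return Lease(server, lease.size_gb, lease.expires_at)
        return None


pool = Pool({"office-a": 60, "office-b": 100})
lease = pool.select(50, now=0.0, duration=3600.0)
renewed = pool.renew(lease, now=1800.0, duration=3600.0)
moved = pool.migrate(lease, now=2000.0)
```

The point of the lease is that the owner's promise is bounded in time: a user who stops renewing loses the space without anyone having to clean up by hand.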

9
basic filesystem
10
Cooperative Storage Pool
(Diagram labels: storage server ×6, disk ×6)
11
Cooperative Computing is useful in the office,
but it is badly needed on the Grid!
12
On the Grid
(Diagram labels: job ×8, Work Queue)
13
Grid Computing Experience
  • Ian Foster, et al. (102 authors)
  • The Grid2003 Production Grid: Principles and
    Practice
  • IEEE HPDC 2004
  • The Grid2003 Project has deployed a multi-virtual
    organization, application-driven grid laboratory
    that has sustained for several months the
    production-level services required by
    ATLAS, CMS, SDSS, LIGO

14
Grid Computing Experience
  • The good news:
      • 27 sites with 2800 CPUs.
      • 40985 CPU-days provided over 6 months.
      • 10 applications with 1300 simultaneous jobs.
  • The bad news:
      • 40-70 percent utilization.
      • 30 percent of jobs would fail.
      • 90 percent of failures were local problems.
  • The lessons:
      • Most site failures were due to disk space.
      • Debugging most problems was impossible.

15
Coop Computing and the Grid
  • The Grid is a boundary case of CC.
      • Large scale, high performance.
      • Allocate resources to partially trusted visitors.
      • Everyone wants to exhaust resources.
  • Can CC scale from the office to the grid?
      • If it is easy for one person to deploy in an
        office, then it will be usable enough to work
        on the grid.

16
More Cooperative Computing
  • Nested Principals Authentication
      • Simple question: How to allow a visitor?
  • Distributed Access Control
      • Can we find something more usable than PKI?
  • Storage Abstractions
      • Can we do better than files/directories?
  • Data-Intensive Grid Computing
      • How do I use storage and CPU together?
  • Distributed Debugging
      • Consider it a distributed query problem.

17
Cooperative Computing Credo
  • Make computer structures model social
    structures...
  • Not the other way around!

18
For more information
  • The Cooperative Computing Lab
  • http://www.cse.nd.edu/ccl
  • Prof. Douglas Thain
  • dthain_at_cse.nd.edu

19
(No Transcript)
20
Two Related Problems
  • Users don't have direct control.
      • I need 50 GB of storage for one week.
      • Allow my collaborators to use my space.
      • (Usually considered administrative tasks.)
  • Users don't have direct information.
      • Why was I denied this allocation?
      • What series of steps was used to run my job?
      • (Usually considered implementation details.)
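Both problems can be made concrete with a request that returns not just yes or no, but also the reason and the steps taken. The following is a minimal sketch under assumed rules: request_space, Decision, the trusted list, and the three checks are hypothetical and chosen only to illustrate the idea.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Decision:
    granted: bool
    reason: str                      # answers "why was I denied?"
    steps: List[str] = field(default_factory=list)  # answers "what was done?"


def request_space(user: str, size_gb: int, days: int,
                  trusted: Set[str], free_gb: int, max_days: int) -> Decision:
    """Evaluate a storage request and record every check that was made."""
    steps: List[str] = []

    steps.append(f"checked {user} against the owner's trusted list")
    if user not in trusted:
        return Decision(False, f"{user} is not on the owner's trusted list", steps)

    steps.append(f"compared {size_gb} GB requested with {free_gb} GB free")
    if size_gb > free_gb:
        return Decision(False, f"only {free_gb} GB free, {size_gb} GB requested", steps)

    steps.append(f"compared {days} days requested with a limit of {max_days} days")
    if days > max_days:
        return Decision(False, f"limit is {max_days} days, {days} days requested", steps)

    return Decision(True, "all checks passed", steps)


# "I need 50 GB of storage for one week," against a server with 30 GB free.
denied = request_space("alice", 50, 7, {"alice"}, free_gb=30, max_days=14)
granted = request_space("alice", 50, 7, {"alice"}, free_gb=100, max_days=14)
```

Here denied.reason states the shortfall directly, and denied.steps lists the checks that ran before the refusal, which is exactly the information the slide says users usually cannot get.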

21
The Current Situation
(Diagram labels: storage server ×5)
22
Distributed Debugging
(Diagram labels: debugger, kerberos, batch system, auth gateway, workload manager, license manager, archival host, job, cpu ×6, storage server ×3, log file ×8)
23
Distributed Debugging
  • Big challenges!
      • Language issues: storing and combining logs.
      • Ordering: How to reassemble events?
      • Completeness: Gaps, losses, detail.
      • Systems: Distributed data collection.
  • But, could be a big win:
      • A crashes whenever X gets its creds from Y.
      • Please try again; I have turned up the detail
        on host B.
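As a rough illustration of combining logs and reassembling events, the sketch below merges per-host logs by timestamp and then asks what happened shortly before a crash. It assumes every host's log is already sorted and that clocks are synchronized well enough to compare timestamps, which is precisely the ordering challenge listed above. The Event type, the helper names, and the sample entries are hypothetical.

```python
import heapq
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    timestamp: float
    host: str
    message: str


def merge_logs(*logs: Iterable[Event]) -> List[Event]:
    # heapq.merge requires each input to be sorted by the same key.
    return list(heapq.merge(*logs, key=lambda e: e.timestamp))


def context(events: List[Event], needle: str, window: float) -> List[Event]:
    """Events in the `window` seconds up to the first message containing `needle`."""
    for hit in events:
        if needle in hit.message:
            return [e for e in events
                    if hit.timestamp - window <= e.timestamp <= hit.timestamp]
    return []


# Invented sample logs from two hosts.
auth_log = [Event(10.0, "Y", "issued creds to X"),
            Event(30.0, "Y", "issued creds to Z")]
job_log = [Event(5.0, "A", "job started"),
           Event(12.0, "A", "crash")]

merged = merge_logs(auth_log, job_log)
before_crash = context(merged, "crash", window=5.0)
```

In this invented data, the only event in the five seconds before the crash on A is Y issuing credentials to X, which is the kind of correlation behind a statement like "A crashes whenever X gets its creds from Y."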

24
Grid Computing
  • The Vision: Make large-scale computing
    resources as reliable and as simple as the
    electric power grid or the water utility.
  • The Reality: Tie together existing computing
    clusters and archival storage around the country
    into systems that are (almost) usable by experts.

25
  • Storage Allocation
  • Give me 50 GB for 24 hours
  • Technical Problem: Building Allocation
  • Distributed Debugging
  • Correlation
  • Hypothesis Proposal
  • Reasoning
  • System Building
  • Adaptation

26
I need ten more CPUs in order to finish my paper
by Friday!
CSE grads can compute here, but only when I'm not.
May I use your CPUs?
Is this person a CSE grad?
My friends in Italy need to access this data.
I'm not root!
PBs of workstation storage! Can I use this as a
cache?
If I can back up to you, you can back up to me.
(Diagram labels: CPU ×6, auth server, secure I/O, disk ×5)
27
Cooperative Computing Credo
  • Put users in charge of their resources.
  • Share resources as they see fit.
  • Expose information for debugging.
  • Mode of operation:
      • Make tools that are foolproof enough for casual
        use by one or two people in the office.
      • If they really are foolproof, then they will
        also be suitable for deployment in large-scale
        systems such as computational grids.