Introduction to CS739: Distributed Systems

About This Presentation
Slides: 12
Provided by: andreaarpa

Transcript and Presenter's Notes

1
Introduction to CS739: Distributed Systems
UNIVERSITY of WISCONSIN-MADISON
Computer Sciences Department
CS 739: Distributed Systems
Andrea C. Arpaci-Dusseau
What are distributed systems?
What are the benefits and challenges?
How will CS739 be structured?
Readings, Writeups, Presentations, Projects
2
Goals of Course
  • Learn about challenges and existing techniques
    for building distributed systems and services
  • Read and discuss influential papers from SOSP,
    OSDI, NSDI
  • Gain some experience programming in distributed
    environment
  • Warm-up project
  • Final project

3
What is a Distributed System?
  • Leslie Lamport says: "You know you have one when
    the crash of a computer you never heard of stops
    you from doing any work"
  • More technical definition: Collection of
    independent computers that appears to its users
    as a single coherent system
  • How are parallel, distributed, networked systems
    different?
  • All contain nodes (processing, memory, disk)
    connected with network

[Figure: spectrum from more unified to less unified: parallel, distributed, networked]
Consider distributed services as well
4
Benefits of Distributed Systems
  • Great price/performance
  • Leverage commodity components (nodes and
    networks)
  • Use many, many of them
  • Incremental scalability
  • Can add x new nodes (or disks or memory) to
    improve performance x
  • Improved availability
  • Continue operating when some nodes stop working
  • Improved reliability
  • Deliver correct results when some nodes
    misbehave, corrupt data
  • Allow geographically-distributed individuals to
    share data or cooperate
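
The availability claim above can be made concrete with a little arithmetic. The sketch below is only an illustration, not something from the slides: it assumes each replica is up independently with the same probability, which real deployments rarely satisfy exactly, and the function name is hypothetical.

```python
def availability_any(p_up: float, n: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes replicas fail independently, each being up with
    probability p_up (an idealization for illustration only).
    """
    return 1.0 - (1.0 - p_up) ** n


if __name__ == "__main__":
    # Compare one, two, and three replicas that are each up 99% of the time.
    for n in (1, 2, 3):
        print(n, availability_any(0.99, n))
```

Under these assumptions, each added replica multiplies the chance of a full outage by (1 - p_up), which is one way to read "continue operating when some nodes stop working".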

5
Distributed System Challenges
  • Lack of global state information
  • Different nodes have different views of system
  • What are the contents of file A?
  • How many jobs are running on node X?
  • Which nodes are currently part of the system?
  • See delays, different ordering of messages, lost
    messages, network partitions
  • Tension with goal of single coherent system
  • Handling slow, failed and misbehaving nodes
  • How do you avoid slow nodes?
  • How do you get back data or work from failed
    node?
  • When nodes disagree, how do you know who is
    wrong?
  • Tension with goal of available and reliable
  • When is it okay to have some centralized
    components?
  • Simplifies state management, but single
    point-of-failure and performance bottleneck
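
To illustrate why different message orderings give nodes different views, here is a minimal sketch. It is an assumption-laden toy, not code from the course: the two "nodes" are plain dictionaries, the update rule is last-write-wins, and all names are hypothetical.

```python
def apply_updates(updates):
    """Apply (key, value) writes in arrival order; the last write wins."""
    state = {}
    for key, value in updates:
        state[key] = value
    return state


# Two writes to the same file, sent by different clients.
msg_a = ("file_A", "hello")
msg_b = ("file_A", "world")

# The network delivers the same two messages in different orders.
node_x = apply_updates([msg_a, msg_b])
node_y = apply_updates([msg_b, msg_a])

if __name__ == "__main__":
    print("node X sees:", node_x)
    print("node Y sees:", node_y)
```

Both nodes received every message, yet they now disagree on the contents of file A, so the question on the slide has no single answer without some extra agreement on ordering.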

6
Content of 739
  • Distributed system courses can be very different
  • Theoretical distributed algorithms (e.g., to
    allow nodes to come to consensus or agreement)
  • 4 lectures
  • Practical distributed programming (e.g., using
    RPC, JAVA RMI, CORBA, DCOM, MPI, PVM)
  • Warm-up project
  • Research systems: new ideas for making
    distributed systems better
  • Focus of course
  • Implemented systems with new conceptual ideas
  • Recent papers in top systems conferences (SOSP,
    OSDI, NSDI)

7
Learning by Reading
  • Intense reading list: assumes sophisticated reader
    (736)
  • Usually cover 1 fascinating paper per class
  • No exams
  • Three types of classes
  • Formal lecture: Only for 4 theory topics
  • Discussions: Most papers
  • I ask questions, expect everyone to
    enthusiastically participate; fairly casual
  • Task 1: Read paper 2-3 times before class
  • Task 2: Email write-up to me BEFORE class
  • Task 3: Take turns being scribe (about 2 times in
    semester)
  • Write up notes from discussion in LaTeX
  • Post to web page within 72 hours

8
Learning by Reading (cont)
  • Types of classes (cont)
  • Group-led lectures: 4 topics
  • Small group gives overview of about 3-4 related
    papers
  • Topics
  • Distributed system analysis
  • Process migration
  • Programming environments
  • Specialized distributed services
  • Advantages
  • Good practice for giving presentations
  • Learn about topic in slightly more depth
  • Tasks
  • Group
  • Finalize related papers (1 week before)
  • Present to me (2 days before)
  • Use slides
  • Everyone else: Skim papers
  • Handout: State preferences by next week

9
Course Topics and Reading List
  • Distributed Operating Systems (Survey, Amoeba vs
    Sprite)
  • Network File Systems (NFS, Coda, LBFS)
  • Theory: Time, Ordering, and Distributed Snapshots
    (2 Lamport papers)
  • Analysis of Distributed Systems (1 Group
    Presentation)
  • Programming Environments (DSM, MapReduce, Group)
  • Process Migration (1 Group)
  • Specialized Distributed Services (Porcupine,
    Group)
  • SPRING BREAK
  • Theory: Consensus (Byzantine failures and
    fail-stop processors)
  • Cluster-based File Systems (Petal/Frangipani and
    GoogleFS)
  • Communication Primitives (RPC vs U-Net)
  • P2P Systems (Measurement, CFS, Amazon, Pangaea,
    LOCKSS)
  • Miscellaneous: Trust, Recovery, Mistakes,
    Speculation, Sensor Networks
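
As a taste of the "Time, Ordering" theory topic, the sketch below shows the well-known Lamport logical-clock rule. The slides do not say which Lamport papers are assigned, so treating this rule as part of the readings is an assumption, and the class and method names are hypothetical.

```python
class LamportClock:
    """Logical clock: a counter that respects happened-before ordering."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event and return the new time."""
        self.time += 1
        return self.time

    def send(self):
        """Sending is an event; the returned timestamp travels with the message."""
        return self.tick()

    def receive(self, msg_time):
        """On receipt, jump past both the local time and the message timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


if __name__ == "__main__":
    p, q = LamportClock(), LamportClock()
    p.tick()                 # local event on p
    stamp = p.send()         # p sends a message carrying its timestamp
    print(q.receive(stamp))  # q's clock moves past the sender's timestamp
```

The point of the rule is that a receive always gets a larger timestamp than the matching send, even though the two nodes share no physical clock.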

10
Learning by Doing
  • Warm-up Project
  • Goal: Become familiar with existing distributed
    programming environments
  • Examples: Hadoop (open-source MapReduce), MPI,
    PVM
  • Task 0: Get environment running
  • Task 1: Implement simple application (e.g.,
    sorting)
  • Task 2: Report sufficient numbers to indicate you
    did something
  • Final Project
  • Goal 1: Experience with research process in
    general
  • Work on open-ended project, unknown result
  • New idea where you don't know if it will work
  • Goal 2: Learn about specific topic in depth
  • Topic from my list or your own choice; work with
    project partner
  • Deliverables: 20-minute talk, short research
    paper
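
For the warm-up sorting task, one common shape is a range-partitioned sort: route each record to a worker by key range, sort locally, then concatenate. The sketch below simulates that in plain Python on one machine. It does not use the Hadoop, MPI, or PVM APIs, and the function names and splitter values are hypothetical choices for illustration.

```python
import bisect


def partition(records, splitters):
    """Route each record to the bucket that owns its key range.

    splitters must be sorted; bucket i holds keys between
    splitters[i-1] (inclusive) and splitters[i] (exclusive).
    """
    buckets = [[] for _ in range(len(splitters) + 1)]
    for record in records:
        buckets[bisect.bisect_right(splitters, record)].append(record)
    return buckets


def distributed_sort(records, splitters):
    """Sort by partitioning, sorting each bucket, and concatenating."""
    buckets = partition(records, splitters)
    # In a real cluster each bucket would be sorted on a different node.
    sorted_buckets = [sorted(bucket) for bucket in buckets]
    return [record for bucket in sorted_buckets for record in bucket]


if __name__ == "__main__":
    print(distributed_sort([5, 1, 9, 3, 7, 2], splitters=[3, 6]))
```

Because the buckets cover disjoint, ordered key ranges, concatenating the locally sorted buckets yields a globally sorted result with no merge step; choosing splitters that balance the buckets is the hard part in practice.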

11
Agenda for Next Class
  • See website: www.cs.wisc.edu/cs739-1
  • Read
  • Survey: Distributed Operating Systems. Andrew S.
    Tanenbaum and Robbert Van Renesse. ACM Computing
    Surveys, Volume 17, Issue 4 (December 1985), pp.
    419-470
  • Long paper: Focus on Sections 1 and 2
  • Answer question
  • What were the goals of distributed systems at
    this time? Which design issue (i.e.,
    communication primitives, naming and protection,
    resource management, fault tolerance, services)
    seems most challenging (or interesting)? Why?
  • Email answer to me with Subject: cs739 Survey
  • Think about group presentation papers