David P. Anderson - PowerPoint PPT Presentation

About This Presentation
Title:

David P. Anderson

Description:

Public Distributed Computing with BOINC. David P. Anderson, Space Sciences Laboratory, University of California, Berkeley (davea@ssl.berkeley.edu).


Transcript and Presenter's Notes



1
Public Distributed Computing with BOINC
  • David P. Anderson
  • Space Sciences Laboratory
  • University of California Berkeley
  • davea@ssl.berkeley.edu

2
Public-resource computing
  • Diagram: your computers (academic, business) alongside home PCs
  • Timeline (1995 to 2004): GIMPS, distributed.net; SETI@home,
    folding@home; fight@home; climateprediction.net
  • Names: public-resource computing; peer-to-peer computing (no!);
    public distributed computing; @home computing
3
The potential of public computing
  • SETI@home: 500,000 CPUs, 65 TeraFLOPs
  • 1 billion Internet-connected PCs in 2010, 50% privately owned
  • If 100M participate:
      • 100 PetaFLOPs
      • 1 Exabyte (10^18 bytes) storage
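
The per-PC figures implied by these totals can be checked with a quick calculation. This is only a back-of-the-envelope sketch: the totals are the ones on the slide, while the variable names and the per-PC breakdown are illustrative and not part of the presentation.

```python
# Totals quoted on the slide.
PARTICIPANTS = 100_000_000        # "If 100M participate"
TOTAL_FLOPS = 100e15              # 100 PetaFLOPs
TOTAL_STORAGE_BYTES = 1e18        # 1 Exabyte

SETI_CPUS = 500_000
SETI_FLOPS = 65e12                # 65 TeraFLOPs

# What each participating PC would have to contribute on average.
flops_per_pc = TOTAL_FLOPS / PARTICIPANTS
storage_per_pc = TOTAL_STORAGE_BYTES / PARTICIPANTS

# Average sustained rate per SETI@home CPU, for comparison.
seti_flops_per_cpu = SETI_FLOPS / SETI_CPUS

print(f"per PC: {flops_per_pc / 1e9:.1f} GFLOPS, {storage_per_pc / 1e9:.0f} GB")
print(f"SETI@home per CPU: {seti_flops_per_cpu / 1e6:.0f} MFLOPS")
```

In other words, the 100 PetaFLOPs and 1 Exabyte totals correspond to about 1 GFLOPS and 10 GB per participating PC.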

[Chart: CPU power and storage capacity versus cost, comparing public computing, Grid computing, cluster computing, and supercomputing]
4
Public/Grid differences
5
Economics (0th order)
  • Cluster/Grid computing: you pay for resources ($); network is free
  • Public-resource computing: you pay for Internet ($); resources are
    free
  • $1 buys 1 computer/day or 20 GB data transfer on commercial Internet
  • Suppose processing 1 GB of data takes X computer days
  • Cost of processing 1 GB: cluster/Grid $X; PRC $1/20
  • So PRC is cheaper if X > 1/20 (SETI@home: X = 1,000)
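
The break-even argument on this slide can be written out as a small calculation. This is a minimal sketch under the slide's own assumptions ($1 per computer day, $1 per 20 GB transferred); the function names are hypothetical and chosen for illustration.

```python
# Prices assumed on the slide.
DOLLARS_PER_COMPUTER_DAY = 1.0   # $1 buys 1 computer/day
GB_PER_DOLLAR = 20.0             # $1 buys 20 GB of data transfer


def cluster_cost_per_gb(x_computer_days):
    """Cluster/Grid: you pay for the X computer days needed per GB."""
    return x_computer_days * DOLLARS_PER_COMPUTER_DAY


def prc_cost_per_gb():
    """PRC: computing is donated, so you pay only to move the 1 GB."""
    return 1.0 / GB_PER_DOLLAR


def prc_is_cheaper(x_computer_days):
    """True when X exceeds the break-even point of 1/20 computer day."""
    return cluster_cost_per_gb(x_computer_days) > prc_cost_per_gb()


print(prc_cost_per_gb())      # 0.05
print(prc_is_cheaper(1000))   # True  (the SETI@home figure from the slide)
print(prc_is_cheaper(0.01))   # False (very data-intensive work)
```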
6
Economics revisited
  • Diagram: you and other institutions, linked by an underutilized free
    Internet (e.g. Internet2) and by the commodity Internet
  • Bursty, underutilized flat-rate ISP connection
  • Traffic shapers can send at zero priority, so bandwidth may be free
    also
7
Why isn't PRC more widely used?
  • Lack of platform
      • JXTA, Jabber: not a solution
      • Java: apps are in C, FORTRAN
      • commercial platforms: business issues
      • Cosm, XtremWeb: not complete
  • Need to make PRC technology easy to use for scientists

8
BOINC: Berkeley Open Infrastructure for Network Computing
  • Goals for computing projects
      • easy/cheap to create and operate projects
      • wide range of applications possible
      • no central authority
  • Goals for participants
      • easy to participate in multiple projects
      • invisible use of disk, CPU, network
  • NSF-funded; open source; in beta test
  • http://boinc.berkeley.edu

9
(No Transcript)
10
Climateprediction.net
  • Global climate study (Oxford Univ.)
  • Input: 10 MB executable, 1 MB data
  • CPU time: 2-3 months (can't migrate)
  • Output per workunit:
      • 10 MB summary (always upload)
      • 1 GB detail file (archive on client, may upload)
  • Chaotic (incomparable results)

11
Einstein@home (planned)
  • Gravity wave detection: LIGO; UW/Caltech
  • 30,000 data sets of 40 MB each
  • Each data set is analyzed w/ 40,000 different parameter sets; each
    takes 6 hrs CPU
  • Data distribution: replicated 2 TB servers
  • Scheduling problem is more complex than bag of tasks
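
The scale implied by these numbers can be worked out directly. The sketch below uses only the figures on the slide; the variable names and the conversion to CPU-years are illustrative additions, and it assumes one job per (data set, parameter set) pair.

```python
# Figures from the slide.
DATA_SETS = 30_000
DATA_SET_MB = 40
PARAMETER_SETS = 40_000
CPU_HOURS_PER_JOB = 6

# Total input data, in decimal terabytes (1 TB = 1,000,000 MB).
total_data_tb = DATA_SETS * DATA_SET_MB / 1_000_000

# One job per (data set, parameter set) pair.
total_jobs = DATA_SETS * PARAMETER_SETS
total_cpu_hours = total_jobs * CPU_HOURS_PER_JOB
total_cpu_years = total_cpu_hours / (24 * 365)

print(f"input data: {total_data_tb} TB")
print(f"jobs: {total_jobs:,}")
print(f"CPU time: {total_cpu_hours:,} hours (~{total_cpu_years:,.0f} CPU-years)")
```

The input data comes to 1.2 TB, which fits on the replicated 2 TB servers mentioned above, while the computation amounts to 1.2 billion jobs.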

12
Intel/UCB Network Study (planned)
  • Goal: map/measure the Internet
  • Each workunit lasts for 1 day but is active only
    briefly (pings, UDP)
  • Need to control time-of-day when active
  • Need to turn off other apps
  • Need to measure system load indices
    (network/CPU/VM)

13
(No Transcript)
14
Project web site features
  • Download core client
  • Create account
  • Edit preferences
      • General: disk usage, work limits, buffering
      • Project-specific: allocation, graphics
      • venues (home/school/work)
  • Profiles
  • Teams
  • Message boards, adaptive FAQs

15
General preferences
16
Project-specific preferences
17
Data architecture
  • Files
      • immutable, replicated
      • may originate on client or project
      • may remain resident on client
  • Executables are digitally signed
  • Upload certificates prevent DoS

<file_info>
    <name>arecibo_3392474_jun_23_01</name>
    <url>http://ds.ssl.berkeley.edu/a3392474</url>
    <url>http://dt.ssl.berkeley.edu/a3392474</url>
    <md5_cksum>uwi7eyufiw8e972h8f9w7</md5_cksum>
    <nbytes>10000000</nbytes>
</file_info>
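
A file descriptor like the one above carries enough information to check a downloaded file. The following is a minimal sketch of such a check, not BOINC's actual code: the element names come from the slide, while `parse_file_info`, `verify_download`, and the example payload are hypothetical and exist only for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def parse_file_info(xml_text):
    """Extract the fields shown on the slide from a <file_info> element."""
    root = ET.fromstring(xml_text.strip())
    return {
        "name": root.findtext("name").strip(),
        "urls": [u.text.strip() for u in root.findall("url")],
        "md5_cksum": root.findtext("md5_cksum").strip(),
        "nbytes": int(root.findtext("nbytes")),
    }


def verify_download(info, data):
    """Accept the data only if both its size and its MD5 digest match."""
    if len(data) != info["nbytes"]:
        return False
    return hashlib.md5(data).hexdigest() == info["md5_cksum"]


# Made-up payload and descriptor, for illustration only.
payload = b"example workunit data"
example_xml = f"""
<file_info>
    <name>example_file</name>
    <url>http://example.org/a</url>
    <url>http://example.org/b</url>
    <md5_cksum>{hashlib.md5(payload).hexdigest()}</md5_cksum>
    <nbytes>{len(payload)}</nbytes>
</file_info>
"""

info = parse_file_info(example_xml)
print(verify_download(info, payload))         # True
print(verify_download(info, payload + b"!"))  # False
```

Listing more than one URL matches the "replicated" property of files: a client could try each URL in turn and apply the same check to whichever copy it fetches.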
18
Computation abstractions
  • Applications
  • Platforms
  • Application versions
      • may involve many files
  • Work units: inputs to a computation
      • soft deadline; CPU/disk/mem estimates
  • Results: outputs of a computation
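
One way to picture how these abstractions relate is as a set of record types. This is a sketch only: every class and field name below is hypothetical, chosen to mirror the bullets above, and does not reflect BOINC's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Platform:
    name: str                      # e.g. an OS/CPU combination


@dataclass
class Application:
    name: str


@dataclass
class AppVersion:
    app: Application
    platform: Platform
    version: int
    files: List[str] = field(default_factory=list)  # may involve many files


@dataclass
class WorkUnit:
    """Inputs to a computation, plus scheduling hints."""
    name: str
    app: Application
    input_files: List[str]
    soft_deadline_seconds: float
    est_cpu_seconds: float
    est_disk_bytes: int
    est_mem_bytes: int


@dataclass
class Result:
    """Outputs of one execution of a work unit."""
    workunit: WorkUnit
    output_files: List[str] = field(default_factory=list)


app = Application("example_app")
wu = WorkUnit("wu_1", app, ["input_1"], 7 * 86400.0, 6 * 3600.0, 10**7, 10**8)
res = Result(wu, ["output_1"])
print(res.workunit.app.name)   # example_app
```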

19
Scheduling: pull model
  • Core client to scheduling server: request X seconds of work, host
    description
  • Scheduling server to core client: result 1 ... result n
  • Core client: download (data server), ...compute..., upload (data
    server)
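
The pull model can be illustrated with a toy scheduler: the client asks for a number of seconds of work, and the server hands out results until the estimates cover the request. This is a minimal sketch under assumed rules; the `Scheduler` class, the `Result` fields, and the fill-until-covered policy are all hypothetical, and the host description is accepted but ignored here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    est_cpu_seconds: float


class Scheduler:
    """Toy scheduling server holding a queue of unsent results."""

    def __init__(self, pending):
        self.pending = deque(pending)

    def request_work(self, seconds_requested, host_description):
        # Hand out results until their estimated CPU time covers the
        # request or the queue is empty. host_description is unused in
        # this sketch.
        granted = []
        total = 0.0
        while self.pending and total < seconds_requested:
            result = self.pending.popleft()
            granted.append(result)
            total += result.est_cpu_seconds
        return granted


scheduler = Scheduler([Result(f"result_{i}", 3600.0) for i in range(1, 4)])

# The client initiates the exchange; the server never contacts the client.
work = scheduler.request_work(5000, {"platform": "example_platform"})
print([r.name for r in work])      # ['result_1', 'result_2']
print(len(scheduler.pending))      # 1
```

Because the client always initiates the exchange, this style of scheduling works for hosts that the server cannot reach directly.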
20
Redundant computing
  • Diagram components: work generator, replicator, scheduler, clients,
    validator, assimilator
  • Validator: select canonical result, assign credit
  • Assimilator: canonical result
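
The validator's job, selecting a canonical result and assigning credit, can be sketched as a simple quorum check over replicated results. This is an illustration under stated assumptions, not BOINC's validator: it assumes outputs can be compared for exact equality, uses a hypothetical quorum of 2, and gives credit to every host whose output matches the canonical one.

```python
from collections import defaultdict


def validate(results, quorum=2):
    """Pick a canonical output from replicated results.

    results: list of (host, output) pairs for one work unit.
    Returns (canonical_output, credited_hosts), or (None, []) when no
    output has been returned by at least `quorum` hosts.
    """
    hosts_by_output = defaultdict(list)
    for host, output in results:
        hosts_by_output[output].append(host)

    for output, hosts in hosts_by_output.items():
        if len(hosts) >= quorum:
            return output, hosts
    return None, []


# Three replicas of one work unit; host_b returns a different answer.
canonical, credited = validate(
    [("host_a", 42), ("host_b", 41), ("host_c", 42)]
)
print(canonical, credited)   # 42 ['host_a', 'host_c']

# No agreement yet: no canonical result and nobody is credited.
print(validate([("host_a", 1), ("host_b", 2)]))   # (None, [])
```

Exact equality would not suit applications whose results are not directly comparable across hosts, such as the chaotic climate model described earlier.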
21
BOINC core client
  • File transfers: restartable, concurrent, user-limited
  • Program execution: semi-sandboxed; graphics control; checkpoint
    control; done, CPU time
  • Diagram: apps call the API, which talks to the core client through
    shared mem
22
User interface
  • Diagram components: core client, apps (graphics), screensaver,
    control panel
  • Diagram labels: activate screensaver; control/state RPCs
23
(No Transcript)
24
Anonymous platform mechanism
  • User compiles applications from source, registers
    them with core client
  • Report platform as anonymous to scheduler
  • Purposes
      • obscure platforms
      • security-conscious participants
      • performance tuning of applications

25
Project management tools
  • Python scripts for project creation/start/stop
  • Remote debugging
      • collect/store crash info (stack trace)
      • web-based browsing interface
  • Strip charts
      • record, graph system performance metrics
  • Watchdogs
      • detect system failures; dial pager

26
Conclusion
  • Public-resource computing is a distinct paradigm
    from Grid computing
  • PRC has tremendous potential for many
    applications (computing and storage)
  • BOINC: enabling technology for PRC
  • http://boinc.berkeley.edu