Title: Desktop Grids
1. Desktop Grids
- Ashok Adiga
- Texas Advanced Computing Center
- adiga@tacc.utexas.edu
2. Topics
- What makes Desktop Grids different?
- What applications are suitable?
- Three Solutions
- Condor
- United Devices Grid MP
- BOINC
3. Compute Resources on the Grid
- Traditional SMPs, MPPs, clusters
  - High speed, reliable, homogeneous, dedicated, expensive (but getting cheaper)
  - High-speed interconnects
  - Up to 1000s of CPUs
- Desktop PCs and workstations
  - Low speed (but improving!), heterogeneous, unreliable, non-dedicated, inexpensive
  - Generic connections (Ethernet)
  - 1000s-10,000s of CPUs
  - Grid compute power increases as desktops are upgraded
4. Desktop Grid Challenges
- Unobtrusiveness
  - Harness underutilized computing resources without impacting the primary desktop user
- Added security requirements
  - Desktop machines are typically not in a secure environment
  - Must protect desktop programs from each other (sandboxing)
  - Must ensure secure communications between grid nodes
- Connectivity characteristics
  - Not always connected to the network (e.g. laptops)
  - Might not have a fixed identifier (e.g. dynamic IP addresses)
- Limited network bandwidth
  - Ideal applications have a high compute-to-communication ratio
  - Data management is critical to performance
5. Desktop Grid Challenges (contd)
- Job scheduling on heterogeneous, non-dedicated resources is complex
  - Must match application requirements to resource characteristics
  - Meeting QoS is difficult since a program might have to share the CPU with other desktop activity
- Desktops are typically unreliable
  - System must detect and recover from node failures
- Scalability issues
  - Software has to manage thousands of resources
- Conventional application licensing is not set up for desktop grids
6. Application Feasibility
- Only some applications map well to desktop grids
  - Coarse-grain data parallelism
  - Parallel chunks are relatively independent
  - High computation-to-communication ratios
- Non-intrusive behavior on the client device
  - Small memory footprint on the client
  - Limited I/O activity
  - Executable and data sizes are dependent on available bandwidth
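The compute-to-communication criterion above can be checked with a back-of-envelope calculation. The sketch below is illustrative only (the function names and the example numbers are assumptions, not from any grid toolkit): a workunit is a good candidate when its compute time dwarfs the time spent shipping its executable and data over a generic link.

```python
# Rough desktop-grid feasibility check (illustrative numbers only):
# a workunit fits when compute time dominates data-transfer time.

def transfer_seconds(bytes_moved, bandwidth_bps):
    """Time to move a workunit's input + output over the available link."""
    return bytes_moved * 8 / bandwidth_bps

def compute_to_comm_ratio(compute_s, in_bytes, out_bytes, bandwidth_bps):
    """Higher is better; well above 1 means communication is negligible."""
    comm_s = transfer_seconds(in_bytes + out_bytes, bandwidth_bps)
    return compute_s / comm_s if comm_s > 0 else float("inf")

# Hypothetical workunit: 1 hour of compute, 1 MB in, 20 KB out, 10 Mbit/s link.
ratio = compute_to_comm_ratio(3600, 1_000_000, 20_000, 10_000_000)
print(f"compute/comm ratio: {ratio:.0f}")  # thousands: a good candidate
```

A ratio in the thousands, as here, is typical of the applications the next slide lists; a ratio near 1 means the desktops would mostly wait on the network.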
7. Typical Applications
- Desktop grids naturally support data-parallel applications
  - Monte Carlo methods
  - Large database searches
  - Genetic algorithms
  - Exhaustive search techniques
  - Parametric design
  - Asynchronous iterative algorithms
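Monte Carlo methods are the archetype of this list: each chunk needs only a random seed as input and returns a single count. A minimal sketch (run locally here; on a real desktop grid the server would farm the chunks out to clients):

```python
# Monte Carlo pi estimation split into independent workunits -- the
# canonical desktop-grid shape: tiny input (a seed), tiny output (a count),
# and no communication between chunks.
import random

def workunit(seed, samples):
    """One independent chunk; can run on any desktop with no coordination."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(samples)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)
    return hits

# A server would dispatch these to clients; here we just loop.
results = [workunit(seed, 10_000) for seed in range(20)]
pi_estimate = 4 * sum(results) / (20 * 10_000)
print(f"pi ~ {pi_estimate:.2f}")
```

Because the chunks share nothing, losing a desktop mid-run costs only that one chunk, which is exactly the failure model desktop grids assume.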
8. Condor
- Condor manages pools of workstations and dedicated clusters to create a distributed high-throughput computing (HTC) facility
- Created at the University of Wisconsin
  - Project established in 1985
- Initially targeted at scheduling clusters, providing functions such as
  - Queuing
  - Scheduling
  - Priority scheme
  - Resource classifications
- Then extended to manage non-dedicated resources
  - Sandboxing
  - Job preemption
9. Why use Condor?
- Condor has several unique mechanisms, such as
  - ClassAd matchmaking
  - Process checkpoint/restart/migration
  - Remote system calls
  - Grid awareness
  - Glideins
  - Support for multiple universes (Vanilla, Java, MPI, PVM, Globus, ...)
- Very simple to install, manage, and use
- Natural environment for application developers
- Free!
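Condor's standard-universe checkpointing is transparent to the application; the sketch below only illustrates the underlying idea at the application level (the file name and state layout are invented for the example): persist enough state, periodically, that a preempted job can resume on another machine instead of starting over.

```python
# Application-level checkpoint/restart sketch (illustrative; Condor's
# standard universe does this transparently). The job saves its state
# every 100 steps; if preempted, a later run resumes from the last save.
import json
import os

CKPT = "job.ckpt"  # hypothetical checkpoint file name

def long_job(total_steps):
    state = {"step": 0, "acc": 0}
    if os.path.exists(CKPT):           # resume after preemption
        with open(CKPT) as f:
            state = json.load(f)
    while state["step"] < total_steps:
        state["acc"] += state["step"]  # the "work": sum 0..total_steps-1
        state["step"] += 1
        if state["step"] % 100 == 0:   # periodic checkpoint
            with open(CKPT, "w") as f:
                json.dump(state, f)
    if os.path.exists(CKPT):           # done: clean up
        os.remove(CKPT)
    return state["acc"]

print(long_job(1000))  # prints 499500
```

The key design point is the same one Condor exploits: since the checkpoint captures everything the job needs, the machine that resumes it need not be the machine that started it.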
10. Typical Condor Pool
[Diagram: a typical Condor pool, showing the ClassAd communication pathway between machines]
11. Condor ClassAds
- ClassAds are at the heart of Condor
- ClassAds
  - are a set of uniquely named expressions; each expression is called an attribute
  - combine query and data
  - are semi-structured: no fixed schema
  - are extensible
12. Sample ClassAd
- MyType = "Machine"
- TargetType = "Job"
- Machine = "froth.cs.wisc.edu"
- Arch = "INTEL"
- OpSys = "SOLARIS251"
- Disk = 35882
- Memory = 128
- KeyboardIdle = 173
- LoadAvg = 0.1000
- Requirements = TARGET.Owner == "smith" || (LoadAvg <= 0.3 && KeyboardIdle > 15*60)
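The matchmaking behind ads like this can be sketched in a few lines. The toy below is a deliberate simplification of Condor's mechanism (Python dicts and lambdas standing in for the real ClassAd language): a machine ad and a job ad match only when each side's Requirements expression evaluates true against the other ad.

```python
# Toy ClassAd matchmaking (a simplified illustration, not Condor's real
# evaluator): each ad carries attributes plus a Requirements predicate
# over "my" own ad and the "target" ad; a match needs both to hold.

machine_ad = {
    "Machine": "froth.cs.wisc.edu", "Arch": "INTEL", "OpSys": "SOLARIS251",
    "Memory": 128, "LoadAvg": 0.10, "KeyboardIdle": 1730,
    # mirrors the slide: run smith's jobs anytime, others only when idle
    "Requirements": lambda my, target:
        target["Owner"] == "smith"
        or (my["LoadAvg"] <= 0.3 and my["KeyboardIdle"] > 15 * 60),
}
job_ad = {
    "Owner": "jones", "ImageSize": 64,
    "Requirements": lambda my, target:
        target["Arch"] == "INTEL" and target["Memory"] >= my["ImageSize"],
}

def matches(machine, job):
    """Both sides' Requirements must be satisfied by the other ad."""
    return (machine["Requirements"](machine, job)
            and job["Requirements"](job, machine))

print(matches(machine_ad, job_ad))  # True: machine is idle, job fits
```

Note how the query lives inside the data: the machine's owner policy and the job's resource needs travel with their ads, which is what makes the scheme schema-free and extensible.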
13. Condor Flocking
- Central managers can allow schedds from other pools to submit to them
[Diagram: a submit machine's schedd, its own central manager (CONDOR_HOST) running the collector and negotiator, and the Pool-Foo and Pool-Bar central managers]
14. Example: POVray on UT Grid Condor
[Diagram: total render time was 2 h 17 min; split into parallel slices of 5-8 min each, it is now 15 min]
15. Parallel POVray on Condor
- Submitting POVray to the Condor pool via a Perl script
  - Automated creation of image slices
  - Automated creation of Condor submit files
  - Automated creation of the DAG file
- Using DAGMan for job-flow control
- Multiple architecture support
  - Executable: povray.$$(OpSys).$$(Arch)
- Post-processing with a C executable
  - Stitching image slices back together into one image file
- Using xv to display the image back on the user's desktop
  - Alternatively, transferring the image file back to the user's desktop
16. POVray Submit Description File
  Universe     = vanilla
  Executable   = povray.$$(OpSys).$$(Arch)
  Requirements = (Arch == "INTEL" && OpSys == "LINUX") || \
                 (Arch == "INTEL" && OpSys == "WINNT51") || \
                 (Arch == "INTEL" && OpSys == "WINNT52")
  transfer_files = ONEXIT
  Input        = glasschess_0.ini
  Error        = Errfile_0.err
  Output       = glasschess_0.ppm
  transfer_input_files = glasschess.pov, chesspiece1.inc
  arguments    = glasschess_0.ini
  log          = glasschess_0_condor.log
  notification = NEVER
  queue
17. DAGMan Job Flow
[Diagram: parallel jobs A0, A1, A2, ..., An are PARENT nodes of job B (the CHILD); pre-processing runs prior to executing job B]
18. DAGMan Submission Script
- Submit with: condor_submit_dag povray.dag
- Filename: povray.dag
- Job A0 ./submit/povray_submit_0.cmd
- Job A1 ./submit/povray_submit_1.cmd
- Job A2 ./submit/povray_submit_2.cmd
- Job A3 ./submit/povray_submit_3.cmd
- Job A4 ./submit/povray_submit_4.cmd
- Job A5 ./submit/povray_submit_5.cmd
- Job A6 ./submit/povray_submit_6.cmd
- Job A7 ./submit/povray_submit_7.cmd
- Job A8 ./submit/povray_submit_8.cmd
- Job A9 ./submit/povray_submit_9.cmd
- Job A10 ./submit/povray_submit_10.cmd
- Job A11 ./submit/povray_submit_11.cmd
- Job A12 ./submit/povray_submit_12.cmd
- Job B barrier_job_submit.cmd
- PARENT A0 CHILD B
- PARENT A1 CHILD B
- PARENT A2 CHILD B
- PARENT A3 CHILD B
- Barrier job script:
    #!/bin/sh
    /bin/sleep 1
- Post-processing script:
    #!/bin/sh
    ./stitchppms glasschess > glasschess.ppm 2> /dev/null
    rm *_*.ppm *.ini Err* *.log povray.dag.*
    /usr/X11R6/bin/xv glasschess.ppm
19. United Devices Grid MP
- Commercial product that aggregates unused cycles on desktop machines to provide a computing resource
- Originally designed for non-dedicated resources
  - Security, non-intrusiveness, scheduling
- Screensaver/graphical GUI on the client desktop
- Support for multiple client platforms
  - Windows, Linux, Mac, AIX, Solaris clients
20. How Grid MP Works
- Grid MP services (server)
  - Authenticate users and devices
  - Dispatch jobs based on priority
  - Monitor and reschedule failed jobs
  - Collect job results
- MP agent (on each desktop)
  - Advertises capability
  - Launches jobs
  - Executes jobs securely
  - Returns results
  - Caches data for reuse
- User
  - Submits jobs
  - Monitors job progress
  - Processes results
- Interfaces for users and administrators
  - Web browser interface
  - Command-line interface
  - XML web services API
21. UD Management Features
- Enterprise features make it easier to convince traditional IT organizations and individual desktop users to install the software
- Browser-based administration tools allow local management/policy specification to manage
  - Devices
  - Users
  - Workloads
- Single-click install of the client on PCs
- Easily customizable to work with software management packages
22. Grid MP Provisioning Example
[Diagram: a root administrator and device group administrators manage device groups X, Y, and Z through Grid MP services; the device groups are provisioned to user groups A and B]
23. Application Types Supported
- Batch jobs
  - Use the mpsub command to run a single executable on a single remote desktop
- MPI jobs
  - Use the ud_mpirun command to run an MPI job across a set of desktop machines
- Data-parallel jobs
  - A single job consists of several independent workunits that can be executed in parallel
  - The application developer must create program modules and write application scripts to create workunits
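For the data-parallel case, the essential task of such an application script is splitting one large input into shippable workunits. The sketch below is illustrative only (the function name and workunit fields are invented, not the Grid MP API):

```python
# Illustrative workunit creation for a data-parallel job: slice one
# large input into independent, numbered chunks small enough to ship
# to a desktop. (Names are hypothetical, not Grid MP's real API.)

def make_workunits(records, chunk_size):
    """Package independent slices of the input as workunits."""
    return [
        {"id": i, "payload": records[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(records), chunk_size))
    ]

units = make_workunits(list(range(1000)), 250)
print(len(units), [u["id"] for u in units])  # prints: 4 [0, 1, 2, 3]
```

Chunk size is the tuning knob: large enough that compute dominates transfer, small enough that a lost desktop wastes little work.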
24. Hosted Applications
- Hosted applications are easier to manage
  - Provide users with a managed application
  - Great for applications that are run frequently but rarely updated
  - Data-parallel applications fit best in the hosted scenario
  - Users do not have to deal with application maintenance; only the developer does
- Grid MP is optimized for running hosted applications
  - Applications and data are cached at client nodes
  - Affinity scheduling minimizes data movement by re-using cached executables and data
  - A hosted application can be run across multiple platforms by registering executables for each platform
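Affinity scheduling as described above can be sketched in miniature. This toy is an illustration of the general idea, not Grid MP's actual algorithm (node names and the tie-breaking rule are invented): prefer a node that already caches the workunit's data; otherwise fall back to the node with the smallest cache as a stand-in for "least loaded".

```python
# Toy affinity scheduler (illustrative only, not Grid MP's algorithm):
# re-use cached executables/data by preferring nodes that already hold
# the workunit's dataset, avoiding a fresh download.

def pick_node(workunit_data, nodes):
    """nodes maps node name -> set of cached dataset ids."""
    cached = [n for n, cache in nodes.items() if workunit_data in cache]
    if cached:
        return cached[0]                      # cache hit: no data movement
    # miss: crude fallback, send to the node caching the fewest datasets
    return min(nodes, key=lambda n: len(nodes[n]))

nodes = {"pc1": {"datasetA"}, "pc2": {"datasetB"}, "pc3": set()}
print(pick_node("datasetB", nodes))  # pc2: cache hit, data already there
print(pick_node("datasetC", nodes))  # pc3: no hit, fewest cached items
```

On a hosted application run repeatedly over the same reference data, this kind of placement is what lets the second and later jobs skip the transfer cost entirely.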
25. Example: Reservoir Simulation
- Landmark's VIP product benchmarked on Grid MP
- Workload consisted of 240 simulations for 5 wells
- Sensitivities investigated include
  - 2 PVT cases
  - 2 fault connectivity cases
  - 2 aquifer cases
  - 2 relative permeability cases
  - 5 combinations of 5 wells
  - 3 combinations of vertical permeability multipliers
- Each simulation packaged as a separate piece of work
- A similar reservoir simulation application has been developed at TACC (with Dr. W. Bangerth, Institute of Geophysics)
26. Example: Drug Discovery
- Think / LigandFit applications
- Internet project in partnership with Oxford University: model interactions between proteins and potential drug molecules
- Virtual screening of drug molecules to reduce time-consuming, expensive lab testing by 90%
- Drug database of 3.5 billion candidate molecules
- Over 350K active computers participating all over the world
27. Think
- Code developed at Oxford University
- Application characteristics
  - Typical input data file: < 1 KB
  - Typical output file: < 20 KB
  - Typical execution time: 1000-5000 minutes
  - Floating-point intensive
  - Small memory footprint
  - Fully resolved executable is 3 MB in size
28. Grid MP POVray Application Portal
29. BOINC
- Berkeley Open Infrastructure for Network Computing (BOINC)
- Open-source follow-on to SETI@home
- General architecture supports multiple applications
- Solution targets volunteer resources, not enterprise desktops/workstations
- More information at http://boinc.berkeley.edu
- Currently being used by several internet projects
30. Structure of a BOINC project
- Ongoing tasks
  - Monitor server correctness
  - Monitor server performance
  - Develop and maintain applications
31. BOINC (contd)
- No enterprise management tools
- Focus on volunteer grids
  - Provide incentives (points, teams, website)
  - Basic browser interface to set usage preferences on PCs
  - Support for user community (forums)
- Simple interface for job management
  - Application developer creates scripts to submit jobs and retrieve results
- Provides a sandbox on the client
- No encryption; uses redundant computing to prevent spoofing
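Redundant computing deserves a concrete illustration. The sketch below shows the general idea rather than BOINC's actual validator (the quorum value and function names are assumptions): the server sends the same workunit to several volunteers and only accepts a result once enough independent clients agree, so a single spoofed reply is outvoted.

```python
# Sketch of result validation by redundant computing (the general idea
# behind BOINC's approach, not its real validator): replicate each
# workunit and accept the answer only when a quorum of clients agree.
from collections import Counter

def validate(replies, quorum=2):
    """Return the agreed result once `quorum` clients match, else None."""
    if not replies:
        return None
    value, votes = Counter(replies).most_common(1)[0]
    return value if votes >= quorum else None

print(validate([42, 42, 99]))  # prints 42: two honest clients outvote one
print(validate([7]))           # prints None: not enough replicas yet
```

The trade-off is explicit: each replica multiplies the compute cost, which volunteer grids can afford precisely because donated cycles are plentiful.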
32. Projects using BOINC
- Climateprediction.net: study climate change
- Einstein@home: search for gravitational signals emitted by pulsars
- LHC@home: improve the design of the CERN LHC particle accelerator
- Predictor@home: investigate protein-related diseases
- Rosetta@home: help researchers develop cures for human diseases
- SETI@home: look for radio evidence of extraterrestrial life
- Cell Computing: biomedical research (Japanese; requires nonstandard client software)
- World Community Grid: advance our knowledge of human disease (requires 5.2.1 or greater)
33. SETI@home
- Analysis of radio telescope data from Arecibo
  - SETI: search for narrowband signals
  - Astropulse: search for short broadband signals
- Per workunit: 0.3 MB in, 4 CPU hours, 10 KB out
34. Climateprediction.net
- Climate change study (Oxford University)
- Met Office model (FORTRAN, 1M lines)
- Input: 10 MB executable, 1 MB data
- Output per workunit
  - 10 MB summary (always uploaded)
  - 1 GB detail file (archived on client, may be uploaded)
- CPU time: 2-3 months (can't migrate)
- Trickle messages
- Preemptive scheduling
35. Why use Desktop Grids?
- Desktop grid solutions are typically complete and standalone
  - Easy to set up and manage
  - Good entry vehicle for trying out grids
- Use existing (but underutilized) resources
  - The number of desktops/workstations on a campus (or in an enterprise) is typically an order of magnitude greater than traditional compute resources
  - The power of the grid grows over time as new, faster desktops are added
- The typically large number of resources on desktop grids enables new approaches to solving problems