Title: The Computer is the Datacenter; The Operating System is SML + VM
1. The Computer is the Datacenter; The Operating System is SML + VM
- David Patterson, Director, RAD Lab, January 2007
2. Standard SW vs. Internet Services: Waterfall vs. Process
- Process Support: DADO (Develop, Assess, Deploy, Operate) Evolution, 1 group
- Waterfall: Static Handoff Model, N groups
3. What have we learned?
- 1 year since we launched the RAD Lab
- RAD grad classes → hands-on experience
- RAD Lab opened → interaction with speakers from Google, HP, Microsoft, Sun to refine ideas
- RAD Lab opened → more progress across the SML, systems, and networking disciplines
- Insight into where to borrow, where to innovate
- Borrow: Develop; Innovate: Analyze/Deploy/Operate
- Time to re-engineer the RAD Lab Vision
- Simplifying datacenter management helps ADO
- 1st, identify technological trends
- To inspire inventions that match the technology of 2010
- Won't know until 2010 which trends mattered
4. Outline
- Technology Trends for 2010
- CPU: 2X cores/chip every 2 years; flat clock rates, flat power
- DRAM: 2X size/chip every 3 years; flat latency, growing BW
- Disk: 2X size/disk every 3 years; flat latency, growing BW
- Flash: threat to small disks?
- LAN: 10X BW/link every 4-5 years
- Internet: datacenters are the new Internet backbone
- Op. Sys.: virtual machines deconstructing the OS
- RAD Lab Vision 2.0 + Milestones
- Revisiting RAD Lab Vision 1.0
5. Technology Trends: CPU
- Microprocessor: Power Wall + Memory Wall + ILP Wall = Brick Wall
- → End of uniprocessors and of faster clock rates
- Since parallel is more power efficient (W ∝ C V² F; back-of-envelope below), the new Moore's Law is 2X processors or cores per socket every 2 years, at the same clock frequency
- Conservative: 2007 4 cores, 2009 8 cores, 2011 16 cores for embedded, desktop, and server
- Sea change for the HW and SW industries, since it changes the programmer model and responsibilities
- Every program(mer) is a parallel program(mer); sequential algorithms are slow algorithms
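Why parallel wins on power, as an idealized back-of-envelope (it assumes voltage can scale down with frequency, which is optimistic in practice):

\[
P \propto C V^2 F, \qquad
P_{\text{2 cores}} \propto 2 \cdot C \left(\tfrac{V}{2}\right)^2 \cdot \tfrac{F}{2} = \tfrac{1}{4}\, C V^2 F
\]

Two cores at half the clock deliver the same aggregate instruction rate as one core at full clock, at roughly a quarter of the dynamic power in this idealized model.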
6. Technology Trends: DRAM
- DRAM capacity decelerating: capacity per chip slowing due in part to the 32-bit address limit and investment levels
- 512 Mbit chips, first sold in 2002, still dominate (1 GB DIMM)
- 2X capacity every 3 years? (vs. 4X every 3 years in the 1990s)
- DRAM performance: only BW improvements (DDR-2, DDR-3), little latency improvement
- 64-bit addresses + multiple cores/socket → majority of chips DRAM vs. logic → majority of system cost DRAM vs. logic → majority of power DRAM vs. logic
- Shift in chips, cost, and power from CPU to DRAM increases over time
7. Technology Trends: Disk
- Disk: after capacity grew 100% per year '96-'03, slowdown to 30% per year recently (1 TB in '07)
- Consolidation of industry, lack of demand by PCs
- Home video to restart PC demand, capacity wars?
- Split: ATA best $/GB, SCSI best $/performance
- Performance: interface switch from parallel to serial, Serial ATA (SATA), Serial-Attached SCSI (SAS) → low-cost disk arrays
- Disk performance: latency changes slowly; bandwidth improves, but not as fast as capacity → takes longer to read a whole disk (3 hours; arithmetic below) → takes longer to repair → must handle 2 faults in RAID, as 3X replication is too expensive in cost (and power)
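Where the 3 hours comes from, assuming a 2007-class 1 TB disk with roughly 90 MB/s sustained transfer (illustrative figures):

\[
t \approx \frac{10^{12}\ \text{B}}{90 \times 10^{6}\ \text{B/s}} \approx 1.1 \times 10^{4}\ \text{s} \approx 3\ \text{hours}
\]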
8. Technology Trends: Flash
- Flash memory is a credible threat to small disks
- Modular; ~1000X lower latency, higher BW, lower power, but ~1M write cycles
- Camera and iPod industries fund flash R&D
- Flash improvement rate: 2X GB/$ every 9 months? (projection below)
- If disk and flash rates continue, flash matches the GB/$ of SCSI in 2009, and of SATA in 2012
- Future: Phase-change RAM (PRAM): no write limit, writes 30X faster, archival; Samsung 2008?
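How such a crossover date falls out of the rates above (the starting price gap is an assumed, illustrative figure): flash at 2X GB/$ every 9 months improves \(2^{12/9} \approx 2.5\times\) per year, vs. roughly \(1.3\times\) per year for disk, so the gap shrinks about \(1.9\times\) per year. An assumed ~7X GB/$ deficit to SCSI would then close in

\[
\frac{\log 7}{\log 1.9} \approx 3\ \text{years},
\]

i.e., around 2009.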
9. Technology Trends: LAN
- Ethernet's move from shared media to switches and twisted pair shortens the time to each new generation
- But shorter distance per link over copper
- Year of standard:
- 1983: 10 Mbit/s, IEEE 802.3
- 1995: 100 Mbit/s, IEEE 802.3u
- 1999: 1000 Mbit/s, IEEE 802.3ab
- 2002: 10000 Mbit/s, IEEE 802.3ae (optical)
- 2006: 10000 Mbit/s, IEEE 802.3an (copper)
- Expect 10 Gbit/s to be economical in 2007
- 100 Gbit/s IEEE standardization started in 2006
- Standard in 2008? Economical in 2012?
10. Technology Trends: Internet
- Datacenters are the new Internet backbone
- Huge concentration of bandwidth + computation
- Shift in traffic pattern
- More and more traffic is host ↔ datacenter
- Huge data transfers between/within DCs are the norm
- Note: IP alone was not designed for such networks
11. Technology Trends: OS
- Resurgence of popularity in virtual machines
- Traditional OSes too large and brittle
- VM monitor: thin SW layer between guest OS and HW
- Advantages:
- Security, dependability via isolation
- VMs can move off a failing processor
- Rosenblum: the future of OSes could be libraries, where only the functions needed are linked into the app, on top of a thin VMM layer that provides protection and sharing of resources
- SW shipped with OS features + VM reader?
"The Impact of Virtualization on Computer Architecture and Operating Systems," Keynote Address, ASPLOS XII, Oct. 23, 2006
12. Outline
- Technology Trends
- RAD Lab Vision 2.0
- The Datacenter is the Computer
- SML + VMM provides the DC OS: Policy + Mechanism
- Sensors (-trace, X-trace, Backplane, D-trace, ...)
- Actuators (VMM, Identity-based Routing Layer, ...)
- File System = Web Services (GFS, Bigtable, Amazon S3)
- Ruby on Rails is the programming language
- Libraries = Web Services (MapReduce, Chubby)
- Research Accelerator for MP (RAMP) is the simulator
- Automatic Workload Evaluator (AWE) is the scaffolding
- Web 2.0 apps are the benchmarks
- RAD Milestones: Energy Conservation, Web 2.0
- Revisiting RAD Lab Vision 1.0
13. Datacenter is the Computer
- (From Luiz Barroso's talk at the RAD Lab, 12/11)
- Google program = Web search, Gmail, ...
- Google computer =
- Thousands of computers, networking, storage
- "Warehouse-sized facilities and workloads may be unusual today but are likely to be more common in the next few years"
14. Datacenter is the Computer
- Datacenter composed of 20 ft. containers
- Power/cooling for 200 KW of racked HW
- External taps for electricity, network, water
- 250 servers, 7 TB DRAM, or 1.5 PB disk
- Project Blackbox, 10/17/06
15. Datacenter is the Computer
- Orders of magnitude more devices (CPUs/rack, racks/datacenter, number of datacenters)
- "450,000 servers at 25 datacenters" (John Markoff, "Google's not-so-very-secret weapon," NY Times, 6/13/06)
16. Datacenter is the Computer
- Re-inventing Client/Server Computing
17. Datacenter management challenges
- Management: efficient use of resources under dynamic demand, a dynamic cost model, and reliability requirements
- Resources: order(s) of magnitude more devices
- Dynamic demand: peaks 5-10X averages; provisioning is hard
- Reliability: SW/HW failures commonplace
- Software churn: Google search rewritten twice in the last 5 years; eBay rewritten 4 times in 9 years
- Dynamic cost model: nonlinear cost models for power, cooling, network traffic
- → Too large scale, too complex for manual administration by people → SML?
18. Managing the Datacenter
- An OS manages shared resources, so a datacenter OS would manage datacenter resources
- Historically, the OS grew from an I/O library into functions that replace operators
- OS 101: separate mechanism (must be efficient and protected, so in the kernel) from policy (should be easy to change)
19. Datacenter OS = policy + mechanism?
- Policy: SML makes the global decisions: scheduling, allocation, migration, ... (see the sketch below)
- Mechanism: a Virtual Machine Monitor per computer provides local isolation, protection, allocation, ...
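A minimal sketch of that split, assuming a hypothetical actuator interface (VMMActuator, migrate_vm, and the 0.4 threshold are illustrative names and numbers, not a real VMM API):

    # Mechanism: a thin, trusted per-node interface (names hypothetical).
    class VMMActuator
      def migrate_vm(vm_id, target_node)
        puts "migrating #{vm_id} -> #{target_node}"   # local, protected action
      end
    end

    # Policy: the (SML-driven) global decision maker, easy to change.
    class PolicyMaker
      def initialize(actuator)
        @vmm = actuator
      end

      # loads: { node_name => utilization between 0.0 and 1.0 }
      def rebalance(loads)
        hot  = loads.max_by { |_, u| u }
        cold = loads.min_by { |_, u| u }
        # Threshold is illustrative; an SML policy would learn it.
        @vmm.migrate_vm("some-vm-on-#{hot[0]}", cold[0]) if hot[1] - cold[1] > 0.4
      end
    end

    PolicyMaker.new(VMMActuator.new).rebalance("n1" => 0.95, "n2" => 0.30)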
20. SML as DC OS Policymaker
- 5 SML strengths:
- Train rather than write the logic (handles SW churn)
- Learns how to handle and make policy during transitions between steady states (unlike queueing theory)
- Copes with a complex cost function (unlike simple control theory)
- Finds trends, needles in the data haystack
- Fast enough to run online (today)
- Algorithms infeasible in the 1960s are fast in 2007
- Computers cheap enough that it's OK to just monitor
21. SML as DC OS Policymaker
- 4 potential SML weaknesses:
- Risk: not perfectly accurate
- Design systems to accommodate some inaccuracy
- Starting dumb
- Can initialize based on static knowledge (e.g., queueing theory to initialize Reinforcement Learning; see the sketch below)
- Opaque: learns vs. being told
- Unlike neural networks, some models can be interrogated to ask how they make decisions (e.g., Bayes networks)
- Model lifecycle: how do you know it's obsolete?
- Open question, but change-point detection research may help
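A toy sketch of that initialization, assuming an M/M/k-style utilization bound for the starting point (rates, latencies, and thresholds are invented for illustration):

    # Start "smart": size the server pool from queueing theory, then let
    # observations adjust it. All numbers are illustrative.
    def initial_servers(arrival_rate, service_rate, target_util = 0.6)
      # Choose k so utilization rho = lambda / (k * mu) <= target_util.
      (arrival_rate / (service_rate * target_util)).ceil
    end

    k = initial_servers(900.0, 100.0)   # 900 req/s, 100 req/s/server => 15
    puts "initial allocation: #{k} servers"

    # Crude online refinement standing in for a learned policy:
    latencies_ms = [45, 60, 210, 230]   # pretend sensor readings
    k += 1 if latencies_ms.last > 200   # SLA slipping: add capacity
    k -= 1 if latencies_ms.max < 50     # comfortably idle: save power
    puts "adjusted allocation: #{k} servers"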
22. Sensors for SML
- SML needs data to analyze
- Some components come with sensors
- CPUs (performance counters), disks (SMART interface), ...
- Can add sensors to software
- Log files; D-trace for Solaris, Mac OS
- How to collect data from 10,000s of nodes distributed over LANs inside a datacenter? Between datacenters?
23. DC-wide sensor
- Imagine a world where path information is always passed along, so that user requests can always be tracked throughout the system
- Across apps, OS, network components and layers, different computers on a LAN, ...
- Unique request ID
- Components touched
- Time of day
- Parent of this request
24. Trace: The 1% Solution
- Trace goal: make path-based analysis have low enough overhead that it can be always on inside the datacenter
- Baseline path info collection with 1% overhead
- Selectively add more local detail for specific requests
- Trace: an end-to-end path recording framework (sketched below)
- Capture a timestamp + a unique request ID across all system components
- Top-level log contains the path traces
- Local logs contain additional detail, correlated to the path ID
- Built on X-trace
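A sketch of the kind of record such a framework carries along a request path (field names are illustrative, not the actual X-trace wire format):

    # Illustrative path-tracing record (not the real X-Trace format).
    require 'securerandom'

    TraceRecord = Struct.new(:request_id, :parent_id, :component, :timestamp)

    LOG = []   # stands in for the top-level path-trace log

    def traced(component, request_id, parent_id = nil)
      rec = TraceRecord.new(request_id, parent_id, component, Time.now)
      LOG << rec    # record the hop before doing the work
      yield rec
    end

    # One user request crossing three components, causally linked:
    rid = SecureRandom.hex(8)   # unique request ID
    traced("load-balancer", rid) do |lb|
      traced("app-server", rid, lb.component) do |app|
        traced("database", rid, app.component) { }
      end
    end

    LOG.each do |r|
      puts "#{r.timestamp} #{r.request_id} #{r.component} (parent: #{r.parent_id || '-'})"
    end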
25. X-Trace: comprehensive tracing through layers, networks, apps
- Traces connectivity of distributed components
- Captures causal connections between requests/responses
- Cross-layer
- Includes network and middleware services such as IP and LDAP
- Cross-domain
- Multiple datacenters, composed services, overlays, mash-ups
- Control left to individual administrative domains
- Network path sensor
- Puts individual requests/responses, at different network layers, in the context of an end-to-end request
26. RAD Lab Sensor Projects
- -trace
- Application layer (inside the datacenter)
- Augments distributed app logging with path info
- X-Trace
- Comprehensive causal connectivity tracing
- Cross-layer, cross-network, cross-domain
- D-trigger
- A communication-efficient anomaly detection and diagnosis framework across datacenters
- Liblog
- Single-application deterministic replay based on distributed tracing of O/S interfaces
- Instrumentation Backplane to feed SML, Ops
- Common SW bus to gather trace data
27. Actuators for SML
- SML needs to be able to take actions to change behavior and reallocate resources
- 4 examples:
- Networked on/off switches to unfailingly stop equipment
- Micro-reboot: SW designed to have fast reboot
- Virtual Machine Monitor
- Identity-based Routing Layer
28. Actuator: Virtualization
- VMware, Xen: low-overhead support for OS/version heterogeneity and migration via a VM Monitor
- Adjust a program's resources while it is running, migrate a program to another computer, faithfully kill a program, ...
29. Middle boxes in today's DC
- Middle boxes inserted on the physical path
- Policy via plumbing
- Weakest link: 1 point of failure, bottleneck
- Expensive to upgrade and to introduce new functionality
- Proposal: add an Identity-based Routing Layer to classify and send packets to middle boxes by policy rather than by plumbing
- [Figure: firewall, load balancer, and intrusion detector inline on the path to a high-speed network]
30. Actuator: Identity-based Routing Layer
- Assign an ID to incoming packets (hash-table lookup; sketch below)
- Route based on IDs, not locations (i.e., not IP addresses)
- Sets up logical paths without changing the network topology
- A set of common middle boxes gets a single ID
- No single weakest link: robust, scalable throughput
- So simple it could be done in an FPGA?
- More general than MPLS
- [Figure: an Identity-based Routing Layer connecting Firewall (IDF), Intrusion Detection (IDID), Load Balancer (IDLB), and Service (IDS)]
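A sketch of the classification step, assuming the routing policy is a hash table from packet identity to an ordered chain of middlebox IDs (ports and chains invented for illustration):

    # Policy as data: identity -> ordered middlebox IDs (not IP addresses).
    POLICY = {
      [80,  :inbound] => [:IDF, :IDID, :IDLB, :IDS],  # web: full chain
      [443, :inbound] => [:IDF, :IDLB, :IDS],         # encrypted: skip IDID
    }
    POLICY.default = [:IDF, :IDS]                     # everything else

    # Assigning an ID chain to a packet is one hash-table lookup.
    def route_for(packet)
      POLICY[[packet[:dst_port], packet[:direction]]]
    end

    pkt = { dst_port: 80, direction: :inbound }
    puts route_for(pkt).join(" -> ")    # IDF -> IDID -> IDLB -> IDS
    # Any replica registered under an ID can serve that hop, so there is
    # no single weakest link and throughput scales with replicas.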
31. File System is Web Services
- Use Web Services as the datacenter file system (sketch below)
- E.g., Google FS, Bigtable
- Use Hadoop versions?
- E.g., Amazon Simple Storage Service (S3)
- SmugMug (cf. Flickr) saves $0.5M/yr using S3
- Actually pay for storage?
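A sketch of the programming model: store and fetch objects by key over HTTP. The endpoint below is hypothetical; the real S3 API additionally requires authentication and request signing.

    # "File system = web service" sketch; storage.example.com is made up.
    require 'net/http'

    BASE = URI("http://storage.example.com/mybucket/")

    def put_object(key, data)
      Net::HTTP.start(BASE.host, BASE.port) do |http|
        http.put("#{BASE.path}#{key}", data)   # PUT /mybucket/<key>
      end
    end

    def get_object(key)
      Net::HTTP.get(URI.join(BASE, key))       # GET /mybucket/<key>
    end

    put_object("logs/2007-01-15", "request trace bytes...")
    puts get_object("logs/2007-01-15")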
32. Outline
- Technology Trends
- RAD Lab Vision 2.0
- The Datacenter is the Computer
- SML + VMM provides the operating system
- Sensors (-trace, X-trace, Backplane, D-trace, ...)
- Actuators (VMM, Identity-based Routing Layer, ...)
- File System = Web Services (GFS, Bigtable, Amazon S3)
- Ruby on Rails is the programming language
- Libraries = Web Services (MapReduce, Chubby)
- Research Accelerator for MP (RAMP) is the simulator
- Automatic Workload Evaluator (AWE) is the scaffolding
- Web 2.0 apps are the benchmarks
- RAD Milestones: Energy Conservation, Web 2.0
- Revisiting RAD Lab Vision 1.0
33. Ruby on Rails = DC PL
- "Ruby on Rails is an open source Web framework that's optimized for programmer happiness and sustainable productivity. It lets you write beautiful code by favoring convention over configuration."
- Testimonials:
- Lines of code, Java vs. RoR: 3:1
- Lines of configuration, Java vs. RoR: 10:1
- More than a fad
- Highest form of flattery: Java on Rails, Python on Rails, ...
See http://www.theserverside.com/news/thread.tss?thread_id=33120
See http://web2.wsj2.com/ruby_on_rails_11_web_20_on_rocket_fuel.htm
34. Ruby on Rails = DC PL
- Reasons to love Ruby on Rails (example below):
- Convention over Configuration
- Rails framework features enabled by Ruby language features (Meta Object Programming)
- Scaffolding: automatic, Web-based, (pedestrian) user interface to stored data
- Program the client: v1.1 lets you write browser-side code in Ruby, then compiles it to JavaScript
- Duck Typing / Mix-Ins
- Looks like a string, responds like a string: it's a string!
- Mix-ins: an improvement over multiple inheritance
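A taste of convention over configuration, assuming a Rails application environment (this is a fragment of an app, not a standalone script):

    # No XML, no mapping files. ActiveRecord infers everything from names:
    #   class Order          -> table "orders", primary key "id"
    #   belongs_to :customer -> foreign key "customer_id", class Customer
    class Order < ActiveRecord::Base
      belongs_to :customer
      validates_presence_of :total
    end

    Order.find_by_customer_id(42)   # dynamic finder generated at runtime

All of the mapping that a Java framework of the day put in configuration files is carried here by naming conventions alone.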
35. Ruby on Rails: scale up?
- So far RoR used for small apps?
- 37signals.com did 5 RoR apps that serve 400K users on 13 servers
- No reason in principle it can't scale up
- Can replace defaults with industrial-strength components
- Lighttpd → Apache, MySQL → Oracle
- May need help with some components
- E.g., HAProxy
- May be a place for us to innovate
36. Web Services as DC Library
- Web Services = RoR library functions
- E.g., MapReduce (sketched below), Chubby distributed lock manager
- Academics use Hadoop versions? Mashups?
- Potentially a place for us to innovate if useful library functions are missing
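The canonical word-count example from the MapReduce paper (Dean & Ghemawat), here as a single-process Ruby sketch of the two user-written functions; the real library handles partitioning, scheduling, and fault tolerance across thousands of nodes:

    # Word count: the two functions a MapReduce user writes.
    def map_fn(_key, text)              # emit (word, 1) per occurrence
      text.downcase.scan(/\w+/).map { |w| [w, 1] }
    end

    def reduce_fn(_word, counts)        # sum the counts for one word
      counts.sum
    end

    docs = { "d1" => "the datacenter is the computer",
             "d2" => "the OS is SML and VMs" }

    pairs   = docs.flat_map { |k, v| map_fn(k, v) }   # map phase
    grouped = pairs.group_by(&:first)                 # shuffle phase
    grouped.each { |w, kv| puts "#{w}: #{reduce_fn(w, kv.map(&:last))}" }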
37. RAMP as DC Simulator
- RAMP (Research Accelerator for MultiProcessing) to emulate 1000-CPU computers to aid the parallelism challenge
- FPGAs: fast enough, cheap, very flexible
- 6 universities, 10 faculty cooperatively create and distribute gateware and software
- Microsoft to help make RAMP 2 boards
- New boards available end of 2007?
- Repurpose RAMP to emulate the datacenter
- Experiment with DC configurations: processors, disks, switches, networks
- Evaluate future systems before buying vs. after
- Also to build the Identity-based Routing Layer
- [Photo: 256-node cluster on 8 RAMP 1 boards running NAS benchmarks]
38. AWE as DC test scaffolding
- How to create a future workload to test a future system?
- Given proprietary systems with sensitive information, how to study realistic workloads?
- Use SML clustering and classification to characterize how workload request types affect resource utilization, and the mix of types in the workload
- Kernelized Canonical Correlation Analysis to cluster trace requests that elicit similar system-level behavior
- Run the Automatic Workload Evaluator behind the firewall; ship realistic, sanitized abstract models
- 2 uses: AWE-gen to generate synthetic workloads, AWE-sim to simulate applications
39. AWE-gen and AWE-sim
- AWE-gen then uses stratified sampling to scale up the workload, while maintaining the right request mix
- Drives AWE-sim
- Also drive a RoR app by mining configuration data?
- AWE-sim as application behavior simulator (sketched below)
- Mimics the resource consumption of a particular request (spin CPU, send msgs, emit gibberish)
- Simulates an app's effect on the system, without simulating its functional behavior as seen by the end user
- Enables study of 100s of interdependent apps
- AWE + RAMP to emulate workload + system
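A sketch of what such mimicry might look like: replay a request type's resource profile (CPU time, bytes emitted) without any application logic. Profile numbers are invented for illustration:

    # AWE-sim-style mimicry: consume the resources a request type would,
    # without its functional behavior. Profiles below are made up.
    PROFILES = {
      search:   { cpu_ms: 12, reply_bytes: 4_096 },
      checkout: { cpu_ms: 85, reply_bytes: 512 },
    }

    def mimic(type)
      p = PROFILES.fetch(type)
      deadline = Time.now + p[:cpu_ms] / 1000.0
      x = 0.0
      x += Math.sqrt(x + 1) while Time.now < deadline  # spin CPU
      "x" * p[:reply_bytes]                            # emit gibberish
    end

    reply = mimic(:search)
    puts "mimicked search: #{reply.bytesize} bytes of gibberish"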
40. RAD Datacenter, RAD Node
- [Figure: the RAD node SW stack, replicated per node across the datacenter cluster (or RAMP): Web 2.0 Applications and services (Firewall (IDF), Intrusion Detection (IDID), Load Balancer (IDLB), Service (IDS)) run over a Ruby on Rails interpreter, Web Svc APIs, and Trace/X-trace; beneath them sit local OS functions (trace, X-trace, Liblog, D-trigger, ...), the Identity-based Routing Layer, and the Virtual Machine Monitor; a datacenter-wide Policy Maker consumes sensor data and issues actuator commands]
- 1. Energy? 2. 1 person to run killer apps?
41. Outline
- Technology Trends
- RAD Lab Vision 2.0
- The Datacenter is the Computer
- SML + VMM provides the operating system
- Sensors (-trace, X-trace, Backplane, D-trace, ...)
- Actuators (VMM, Identity-based Routing Layer, ...)
- File System = Web Services (GFS, Bigtable, Amazon S3)
- Ruby on Rails is the programming language
- Libraries = Web Services (MapReduce, Chubby)
- Research Accelerator for MP (RAMP) simulator
- Automatic Workload Evaluator (AWE) tester
- Web 2.0 apps are the benchmarks
- RAD Milestones: Energy Conservation, Web 2.0
- Revisiting RAD Lab Vision 1.0
42. RAD Lab 2.0, 1st Milestone: Conserve Energy
- Datacenters limited by power
- Increasing cost of power vs. hardware:
- 2005: $1 spent on servers required an additional $0.48 to power and cool them
- 2010: $0.71 to power and cool
- 2005: $26B was spent to power and cool servers, to grow to $45B by 2010
- 450M KWh → 300M metric tons of CO2 emissions
- If we could save 1/3 of the energy of servers, we would save $15B per year and 100M tons of CO2 per year (arithmetic below)
- California's goal is to cut 175M tons/year by 2020
- California was 16th in the world for greenhouse gases
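Taking the 2010 figures above at face value, the savings arithmetic is:

\[
\tfrac{1}{3} \times \$45\text{B} = \$15\text{B/yr}, \qquad
\tfrac{1}{3} \times 300\,\text{M tons CO}_2 = 100\,\text{M tons CO}_2/\text{yr}
\]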
43. RAD Lab 2.0, 1st Milestone: Conserve Energy
- Good match to SML:
- An optimization, so imperfection is not catastrophic
- Lots of data to measure, dynamically changing workload, complex cost function
- Not steady state, so not queueing theory
- PG&E trying to change the behavior of datacenters
- Properly state the problem:
- Preserve 100% of Service Level Agreements
- Don't hurt hardware reliability
- Then conserve energy
- Radical idea: can conserving energy improve hardware reliability?
44. 1st Milestone: Conserve Energy + Improve Reliability
- Improve component reliability?
- Disks: lifetimes measured in powered-on hours, but limited to 50,000 start/stop cycles
- Idea: if disks are turned off 50% of the time, then 50% of the annual failure rate, as long as we don't exceed 50,000 start/stop cycles (at most once per hour; arithmetic below)
- Integrated circuits: lifetimes affected by thermal cycling (fast change is bad), electromigration (turning off helps), dielectric breakdown (turning off helps)
- Idea: if we limit the number of times chips are cycled thermally, could we cut the IC failure rate due to EM and DB by 30%?
See "A Case For Adaptive Datacenters To Conserve Energy and Improve Reliability," Peter Bodik, Michael Armbrust, Kevin Canini, Armando Fox, Michael Jordan and David Patterson, submitted to HotOS 2007.
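The once-per-hour bound keeps hourly on/off cycling within a disk's start/stop budget over a typical service life:

\[
\frac{50{,}000\ \text{cycles}}{1\ \text{cycle/hour}} = 50{,}000\ \text{hours} \approx 5.7\ \text{years},
\]

while halving the powered-on hours in which disk lifetimes are quoted.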
45. RAD Lab 2.0, 2nd Milestone: Killer Web 2.0 Apps
- Demonstrate the RAD Lab vision of 1 person creating the next great service and scaling it up
- Where to get example great apps, given that grad students are creating the technology?
- Use undergraduate computing clubs to create exciting apps in RoR using RAD Lab equipment and technology
- Recruit lecturer Dan Garcia to be the RoR club leader
- Recruit a real-world RoR programmer (part-time or consultant) to develop code and advise the RoR computing club during the school year
- Hire the best undergrads to work summers building RoR apps
46. Outline
- Technology Trends
- RAD Lab Vision 2.0
- The Datacenter is the Computer
- SML + VMM provides the operating system
- Sensors (-trace, X-trace, Backplane, D-trace, ...)
- Actuators (VMM, Identity-based Routing Layer, ...)
- Ruby on Rails is the programming language
- Libraries as Web Services (MapReduce, Chubby)
- Storage as Web Services (GFS, Bigtable, Amazon S3)
- Research Accelerator for MP (RAMP) is the simulator
- Automatic Workload Evaluator (AWE) is the scaffolding
- Web 2.0 apps are the benchmarks
- RAD Milestones: Energy Conservation, Web 2.0
- Revisiting RAD Lab Vision 1.0
47. RAD Lab 5-year Mission
- Today's Internet systems: complex, fragile, manually managed, rapidly evolving
- To scale to eBay size, must build an eBay-sized company
- Moon-shot mission statement:
- Enable a single person to Develop, Assess, Deploy, and Operate the next-generation IT service
- The "Fortune 1 Million," by enabling rapid innovation
- Create core technology to enable the vision via synergy across systems, networking, and SML
- RAD 2.0: making the datacenter easier to manage enables the vision of 1 person Analyzing, Deploying, and Operating a scalable IT service
48. Conclusion
- Datacenter = Computer, SML + VM = OS, Web Storage = File System, RoR = Prog. Language, Web Services = Library, RAMP = simulator, AWE = tester, Web 2.0 apps = benchmarks
- Milestones: 1) Energy Conservation + Reliability Enhancement, 2) DADO Web 2.0 apps
- Where to borrow: Ruby on Rails, web services, VMM, network middle boxes, OS, ...
- Where to invent: Policy Maker, Trace, X-trace, Identity-based Routing Layer, AWE, RAMP, ...
- Technology trends to guide inventions:
- Parallel CPUs, shift to DRAM, bigger disks, flash?, 10Gb LAN, datacenter as backbone, VM ubiquity
49. Backup Slides
50. Autonomic vs. RADical?
- IBM Autonomic Computing Manifesto:
- Self-configuring/upgrading
- Self-optimizing (performance)
- Self-healing (from failures)
- Self-protecting (from attacks)
- 2 approaches:
- 1) Automate the entire process (get rid of sys admins from the waterfall model)
- 2) Libraries, tools to increase the productivity of Developer/Operator teams
51. Datacenter is the Computer
- Orders of magnitude more devices (CPUs/rack, no. racks/managed facility)
- 1980s: B of A datacenter, 20 IBM 370 mainframes: 100 CPUs, 100 disks
- 1995: Network of Workstations-1: 100 CPUs, 200 disks; NOW-2: 1000 CPUs, 2000 disks
- 2000: Inktomi: 10,000 nodes in 4 DCs
- 2005: Google: 450,000 nodes in 25 DCs
- Computer Architecture → Architecture!
(John Markoff, "Google's not-so-very-secret weapon," NY Times, 6/13/06)
52. References
- Amazon Simple Storage Service: http://aws.amazon.com/s3
- Google results available at http://labs.google.com/papers/
- Jeffrey Dean and Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters"
- Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System"
- Mike Burrows, "The Chubby Lock Service for Loosely-Coupled Distributed Systems"
- Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage System for Structured Data"
- Rob Pike, Sean Dorward, Robert Griesemer, and Sean Quinlan, "Interpreting the Data: Parallel Analysis with Sawzall"