Title: Scaleable Computing
1Scaleable Computing
Jim Gray
Researcher, US-WAT MSR San Francisco, Microsoft Corporation
Gray@Microsoft.com
2Outline
- Why scaleable servers?
- Problems and solutions for scaleable servers
- How Internet Information Server revolutionizes OLTP
- Wolfpack Windows NT clusters for scaleability, availability, manageability
- ActiveX object model as structuring principle
- OLE DB (DAO) for data sources
- MTX as a new programming paradigm
- MTX as a server
- Distributed transactions to coordinate components
- Falcon queues for asynchronous processing
3Kinds Of Information Processing
- Immediate (network): conversation, money (point-to-point); lecture, concert (broadcast)
- Time-shifted (database): mail (point-to-point); book, newspaper (broadcast)
- It's ALL going electronic
- Immediate is being stored for analysis (so ALL database)
- Analysis and automatic processing are being added
4Why Put Everything In Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
5Magnetic Storage Cheaper Than Paper
- File cabinet: cabinet (four drawer) $250, paper (24,000 sheets) $250, space (2x3 @ $10/ft2) $180, total $700, 3 cents/sheet
- Disk: disk (4 GB) $800
- ASCII: 2 million pages, 0.04 cents/sheet (80x cheaper)
- Image: 200,000 pages, 0.4 cents/sheet (8x cheaper)
- Store everything on disk
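The per-sheet costs on this slide follow from simple division. The sketch below is a minimal check using the slide's own figures, assuming the currency units lost in extraction are dollars and cents; the variable and function names are illustrative only.

```python
# Figures taken from the slide; dollars are an assumed currency unit.
CABINET_TOTAL_DOLLARS = 700   # cabinet + paper + floor space, as rounded on the slide
CABINET_SHEETS = 24_000
DISK_DOLLARS = 800            # one 4 GB disk
ASCII_PAGES = 2_000_000
IMAGE_PAGES = 200_000


def cents_per_sheet(total_dollars: float, sheets: int) -> float:
    """Return the storage cost of one sheet in cents."""
    return 100.0 * total_dollars / sheets


paper = cents_per_sheet(CABINET_TOTAL_DOLLARS, CABINET_SHEETS)
ascii_disk = cents_per_sheet(DISK_DOLLARS, ASCII_PAGES)
image_disk = cents_per_sheet(DISK_DOLLARS, IMAGE_PAGES)

print(f"paper: {paper:.2f} cents/sheet")
print(f"disk, ASCII: {ascii_disk:.2f} cents/sheet ({paper / ascii_disk:.0f}x cheaper)")
print(f"disk, image: {image_disk:.2f} cents/sheet ({paper / image_disk:.0f}x cheaper)")
```

The unrounded ratios come out a little below the slide's 80x and 8x, which appear to be rounded for effect.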
6Databases: Information at Your Fingertips
Information Network, Knowledge Navigator
- All information will be in an online database (somewhere)
- You might record everything you
- Read: 10 MB/day, 400 GB/lifetime (eight tapes today)
- Hear: 400 MB/day, 16 TB/lifetime (three tapes/year today)
- See: 1 MB/s, 40 GB/day, 1.6 PB/lifetime (maybe someday)
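The daily and lifetime volumes on this slide can be cross-checked against each other. The sketch below divides each lifetime figure by its daily rate to recover the lifetime the slide implies; decimal units (1 MB = 10^6 bytes) are an assumption, and the names are illustrative.

```python
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units, an assumption

# (bytes per day, bytes per lifetime) as quoted on the slide
streams = {
    "read": (10 * MB, 400 * GB),
    "hear": (400 * MB, 16 * TB),
    "see": (40 * GB, 1600 * TB),   # 1.6 PB
}

implied_days = {name: lifetime // per_day for name, (per_day, lifetime) in streams.items()}
for name, days in implied_days.items():
    print(f"{name}: {days:,} days, about {days / 365:.0f} years")

# 1 MB/s around the clock would be 86.4 GB/day, so 40 GB/day
# corresponds to this many hours of viewing per day.
viewing_hours = (40 * GB) / (1 * MB) / 3600
print(f"viewing hours per day: {viewing_hours:.1f}")
```

All three rows imply the same lifetime, so the slide's figures are internally consistent.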
7Databases Store ALL Data Types
- The new world
- Billions of objects
- Big objects (1 MB)
- Objects have behavior (methods)
- The old world
- Millions of objects
- 100-byte objects
- Paperless office
- Library of Congress online
- All information online
- Entertainment
- Publishing
- Business
- WWW and Internet
8Billions Of Clients
- Every device will be intelligent
- Doors, rooms, cars
- Computing will be ubiquitous
9Billions Of Clients Need Millions Of Servers
- All clients networked to servers
- May be nomadic or on-demand
- Fast clients want faster servers
- Servers provide
- Shared Data
- Control
- Coordination
- Communication
[Figure: clients (mobile clients, fixed clients) connected to servers (server, super server)]
10Conclusion
- Commodity hardware allows new applications
- New applications need huge servers
- Ideally, clients and servers are built of the same stuff
- Servers should be built from
- Commodity software and
- Commodity hardware
- Servers should be able to
- Scale up (grow by adding CPUs, disks, networks)
- Scale down (can start small)
11Scaleable Systems: BOTH SMP And Cluster
- Grow up with SMP: 4xP6 is now standard
- Grow out with cluster: cluster has inexpensive parts
[Figure: personal system, departmental server, SMP superserver; cluster of PCs]
12SMPs Have Advantages
- Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)
- 4x SMP is commodity
- Software capable of 16x
- Problems
- >4 not commodity
- Scale-down problem (starter systems expensive)
- There is a BIGGEST one
[Figure: personal system, departmental server, SMP superserver]
13The TPC-C Revolution Shows How Far SMPs Have Come
- Performance is amazing
- 2,000 users is the min!
- 30,000 users on a 4x12 alpha cluster (Oracle)
- Prices dropping fast
14The TPC-C Revolution Shows How Far NT and SQL Server Have Come
- Economy of scale on Windows NT
- Recent Microsoft SQL Server benchmarks are Web-based
15TPC-C Web-Based Benchmarks
- Client is a Web browser (6,000 of them!)
- Submits
- Order
- Invoice
- Query to server via Web page interface
- Web server (Internet Information Server) acts as a TP monitor
- Translates request to ODBC
[Figure: browser clients reach the Web server over HTTP]
16TPC-C Web-Based Benchmarks
- SQL Server executes, returns ODBC
- Web server builds HTML page
- Sends it to client via HTTP
- 6,750 tpmC (TPC-C transactions per minute) on 4xP6
- Net: Internet server performance is GREAT!
17What Happens To Prices?
- No expensive UNIX front end ($20/tpmC)
- No expensive TP monitor software ($10/tpmC)
- => $81/tpmC
18Scaleable Systems: Clusters Scale Beyond Largest SMP
[Figure: personal system, departmental server, SMP superserver; cluster of PCs]
19Clusters Have Advantages
- Clients and servers made from the same stuff
- Inexpensive
- Built with commodity components
- Fault tolerance
- Spare modules mask failures
- Modular growth
- Grow by adding small modules
- Unlimited growth: no biggest one
20Parallelism: The OTHER aspect of clusters
- Clusters of machines allow two kinds of parallelism
- Many little jobs: online transaction processing
- TPC-A, B, C
- A few big jobs: data search and analysis
- TPC-D, DSS, OLAP
- Both give automatic parallelism
21The Parallel Law Of Computing
- Grosch's Law: 2x $ is 4x performance
- Parallel Law: 2x $ is 2x performance (1 MIPS for $1; 1,000 MIPS for $1,000)
- Parallel Law needs linear speedup and linear scale-up
- Not always possible
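The two laws on this slide can be compared numerically. The sketch below models Grosch's Law as performance growing with the square of cost and the Parallel Law as performance growing linearly with cost, both normalized so that one unit of money buys 1 MIPS. The normalization, the dollar unit, and the function names are assumptions made for illustration.

```python
def grosch_mips(dollars: float) -> float:
    """Grosch's Law: performance grows as the square of cost (2x cost, 4x performance)."""
    return float(dollars) ** 2


def parallel_mips(dollars: float) -> float:
    """Parallel Law: performance grows linearly with cost (2x cost, 2x performance)."""
    return float(dollars)


for dollars in (1, 2, 1_000):
    print(f"${dollars}: Grosch {grosch_mips(dollars):,.0f} MIPS, "
          f"parallel {parallel_mips(dollars):,.0f} MIPS")
```

The linear model holds only when the workload achieves linear speedup and scale-up, which the slide notes is not always possible.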
22Thesis: Many little beat few big
[Figure: processor classes (mainframe, mini, micro, nano, pico processor) at price points of 1 million, 100 K, and 10 K; memory hierarchy from 10 picosecond RAM and 1 MB through 100 MB, 10 GB, and 1 TB to 100 TB; disk form factors 14", 9", 5.25", 3.5", 2.5", 1.8"]
- 1 M SPECmarks, 1 TFLOP
- 10^6 clocks to bulk RAM
- Event horizon on chip
- VM reincarnated
- Multi-program cache, on-chip SMP
- Smoking, hairy golf ball
- How to connect the many little parts?
- How to program the many little parts?
- Fault tolerance?
23Future Super Server: 4T Machine
- Array of 1,000 4B machines
- 1 Bips processors
- 1 BB DRAM
- 10 BB disks
- 1 Bbps comm lines
- 1 TB tape robot
- A few megabucks
- Challenge
- Manageability
- Programmability
- Security
- Availability
- Scaleability
- Affordability
- As easy as a single system
- Cyber Brick: a 4B machine
- Future servers are CLUSTERS of processors, discs
- Distributed database techniques make clusters work
24The Hardware Is In Place: And then a miracle occurs?
- SNAP: scaleable network and platforms
- Commodity-distributed OS built on
- Commodity platforms
- Commodity network interconnect
- Enables parallel applications
25Two Scaleability Projects: 1-TB DB and 1 billion TPD
1 Terabyte DB
Grow UP and grow OUT
1 billion transactions per day
26Building The Biggest Node
- There is a biggest node (size grows over time)
- Today, with Windows NT, it is probably 1TB
- We are building it (with help from DEC and SPOT)
- 1 TB GeoSpatial SQL Server database
- (1.4 TB of disks, 280 drives)
- 30K BTU, 8 KVA, 1.5 metric tons
- We plan to put it on the Web as a demonstration application
- It will hold satellite images of the entire planet
- One pixel per 10 meters
- Better resolution in U.S. (courtesy of USGS)
27What's A TeraByte?
1 Terabyte is:
- 1,000,000,000 business letters (150 miles of book shelf)
- 100,000,000 book pages (15 miles of book shelf)
- 50,000,000 FAX images (7 miles of book shelf)
- 10,000,000 TV pictures (mpeg) (10 days of video)
- 4,000 LandSat images (16 earth images at 100 m)
- Library of Congress (in ASCII) is 25 TB
- 1980: $200 million of disc (10,000 discs); $5 million of tape silo (10,000 tapes)
- 1996: $200,000 of magnetic disc (120 discs); $50,000 of nearline tape (50 tapes)
- Terror Byte!
28User Interface
Next
29What The 1-Billion TPD Project Is Doing
- Building a 20-node Windows NT Cluster (with help from Intel)
- All commodity parts
- Using SQL Server and DTC distributed transactions
- Each node has 1/20th of the DB
- Each node does 1/20th of the work
- 15% of the transactions are distributed
- Uses the Viper distributed transaction coordinator
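The partitioning idea on this slide can be sketched in a few lines. The code below assumes a simple modulo scheme for assigning records to the 20 nodes; the slide does not say how the project partitions its data, so `home_node` and `is_distributed` are hypothetical helpers, not the project's design.

```python
NODES = 20  # cluster size from the slide


def home_node(key: int) -> int:
    """Map a record key to the node that stores it (hypothetical modulo scheme)."""
    return key % NODES


def is_distributed(keys: list[int]) -> bool:
    """A transaction is distributed when its records live on more than one node."""
    return len({home_node(k) for k in keys}) > 1


print(is_distributed([40, 60]))  # both keys map to node 0
print(is_distributed([40, 41]))  # keys map to nodes 0 and 1
```

Transactions whose keys all map to one node commit locally; the remainder need the distributed transaction coordinator.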
30How Much Is 1 Billion Transactions Per Day?
- 1 Btpd = 11,574 tps (transactions per second), about 700,000 tpm (transactions/minute)
- AT&T
- 185 million calls (peak day worldwide)
- Visa: 20 M tpd
- 400 M customers
- 250,000 ATMs worldwide
- 7 billion transactions / year (card + cheque) in 1994
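The rate conversions on this slide are plain unit arithmetic, sketched below. The variable names are illustrative; the inputs are the slide's figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60

transactions_per_day = 1_000_000_000
tps = transactions_per_day / SECONDS_PER_DAY   # transactions per second
tpm = tps * 60                                 # transactions per minute

# Visa's yearly volume from the slide, spread evenly over 365 days.
visa_tpd = 7_000_000_000 / 365

print(f"{tps:,.0f} tps, {tpm:,.0f} tpm")
print(f"Visa: {visa_tpd / 1e6:.1f} M tpd")
```

The per-minute figure comes out just under the 700,000 quoted on the slide, which is a rounded value.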
31How Much Is 1 Billion Transactions Per Day?
- New York Stock Exchange
- 600,000 tpd
- Bank of America
- 20 M tpd checks cleared (more than any other bank)
- 1.4 M tpd ATM transactions
32Outline
- Why scaleable servers?
- Problems and solutions for scaleable servers
- How Internet Information Server revolutionizes OLTP
- Wolfpack Windows NT clusters for scaleability, availability, manageability
- ActiveX object model as structuring principle
- OLE DB (DAO) for data sources
- MTX as a new programming paradigm
- MTX as a server
- Distributed transactions to coordinate components
- Falcon queues for asynchronous processing
33Wolfpack Windows NT Clusters: The great hope
- Tandem, Teradata, VAX clusters are proprietary
- Microsoft and 60 vendors defining Windows NT Clusters
- Code name Wolfpack
- Almost all big hardware and software vendors involved
- No special hardware needed, but it may help
34Wolfpack Windows NT Clusters: The Great Hope
- Fault-tolerant first, scaleable second
- First products (97H1): two-node failover
- Oracle and Microsoft giving demos today
- Next (98): scale to 16 or more nodes
- Will enable
- Commodity fault-tolerance
- Commodity parallelism (data mining, virtual reality)
- Also great for workgroups!
35Wolfpack clusters
- Key goals
- Easy to install, manage, program
- Reliable: more reliable than single node
- Scaleable: added parts add throughput
- Initial Wolfpack is two-node failover
- Each node can be 4x (or more) SMP
- File, print, Internet, mail, DB, other services
- Easy to manage
- Next (NT5) Wolfpack is modest size cluster
- About 16 nodes (so 64 to 128 CPUs)
- No hard limit, algorithms designed to go further
36SQL Server Failover Using Wolfpack Windows NT Clusters
- Each server owns half the database
- When one fails
- The other server takes over the shared disks
- Recovers the database and serves it
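The failover behavior described on this slide can be modeled as a change of partition ownership. The sketch below is a toy in-memory model, not the Wolfpack API; the class, server, and partition names are invented for illustration.

```python
class Cluster:
    """Two-node failover: each server owns half the database (names are illustrative)."""

    def __init__(self) -> None:
        self.owner = {"db_half_A": "server1", "db_half_B": "server2"}
        self.alive = {"server1", "server2"}

    def fail(self, server: str) -> None:
        """Mark a server as failed and move its partitions to the survivor."""
        self.alive.discard(server)
        if not self.alive:
            raise RuntimeError("no surviving server to take over")
        survivor = next(iter(self.alive))
        for partition, owner in self.owner.items():
            if owner == server:
                # In a real cluster the survivor would take over the shared
                # disks and run database recovery before serving requests.
                self.owner[partition] = survivor


cluster = Cluster()
cluster.fail("server1")
print(cluster.owner)
```

After the failure, the surviving server owns both halves, so clients see one node serving the whole database.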
37Outline
- Why scaleable servers?
- Problems and solutions for scaleable servers
- How Internet Information Server revolutionizes OLTP
- Wolfpack Windows NT clusters for scaleability, availability, manageability
- ActiveX object model as structuring principle
- OLE DB (DAO) for data sources
- MTX as a new programming paradigm
- MTX as a server
- Distributed transactions to coordinate components
- Falcon queues for asynchronous processing
38The BIG Picture: Components and transactions
- Software modules are objects
- Object Request Broker (a.k.a. Transaction Processing Monitor) connects objects (clients to servers)
- Standard interfaces allow software plug-ins
- Transaction ties execution of a job into an atomic unit: all-or-nothing, durable, isolated
[Figure: Object Request Broker]
39Component Object Model
- COM is Microsoft model, engine inside OLE
- ALL Microsoft software is based on COM (ActiveX)
- CORBA + OpenDoc is equivalent
- Heated debate over which is best
- Both share same key goals
- Encapsulation: hide implementation
- Polymorphism: generic operations, key to GUI and reuse
- Versioning: allow upgrades
- Transparency: local/remote
- Security: invocation can be remote
- Shrink-wrap: minimal inheritance
- Automation: easy
- COM now managed by the Open Group
40Linking And Embedding: Objects are data modules, transactions are execution modules
- Link: pointer to object somewhere else
- Think URL in Internet
- Embed: bytes are here
- Objects may be active: can call back to subscribers
41OLE DB: Objects Meet Databases
The basis for universal data servers, access, integration
- OLE DB: object-oriented (COM oriented) programming interface to data
- Breaks DBMS into components
- Anything can be a data source
- Optimization/navigation on top of other data sources
- A way to componentize a DBMS
- Makes an RDBMS an O-RDBMS (assumes optimizer understands objects)
[Figure: DBMS engine]
42Commodity Software Components: Inexpensive OS, DBMS and plug-ins
- Recent TPC-C prices
- Oracle on DEC UNIX: 30.4 k tpmC @ $305/tpmC
- Informix on DEC UNIX: 13.6 k tpmC @ $277/tpmC
- DB2 on Solaris: 6.4 k tpmC @ $200/tpmC
- SQL Server on Compaq, Windows NT: 6.7 k tpmC @ $90/tpmC (using Web, no TP monitor!)
- Oracle on Windows NT: 3.1 k tpmC @ $198/tpmC
- Net: Open solutions can do even biggest jobs
- Thousands of online users per node of cluster
- ActiveX, VBX, and Java plug-ins
- Spreadsheets, GeoQuery, FAX, voice, image libraries, commodity component market
43Transactions Coordinate Components (ACID)
- Programmer's view: bracket a collection of actions
- A simple failure model
- Only two outcomes
Begin() action action action action Commit(): Success!
Begin() action action action Rollback(): Failure!
Begin() action action action Fail! Rollback(): Failure!
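The Begin/Commit/Rollback bracket on this slide can be tried out with any transactional store. The sketch below uses Python's built-in sqlite3 module as a stand-in; the account table, the names, and the simulated failure are all invented for illustration and are not part of the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(amount: int, fail: bool = False) -> None:
    """Move money from alice to bob as one all-or-nothing unit."""
    try:
        # sqlite3 opens a transaction implicitly before the first UPDATE.
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'alice'", (amount,))
        if fail:
            raise RuntimeError("simulated failure between the two actions")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'bob'", (amount,))
        conn.commit()      # Commit(): both updates take effect together
    except RuntimeError:
        conn.rollback()    # Rollback(): the first update is undone


def balances() -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))


transfer(30)              # success path
transfer(50, fail=True)   # failure path: balances stay as they were
print(balances())
```

The failed transfer leaves no trace, which is the "only two outcomes" property the slide describes.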
44Transactions Coordinate Components (ACID)
- Transaction properties
- Atomic: all or nothing
- Consistent: old and new values
- Isolated: automatic locking or versioning
- Durable: once committed, effects survive
- Transactions are built into modern OSs
- MVS/TM, Tandem TMF, VMS DEC-DTM, NT-DTC
45OLE Transactions
- DTC is standard part of Windows NT and Windows 97 (it is plumbing), basis for Microsoft Transaction Services
- Upward compatible with X/Open Distributed Transaction Model
- Interoperates with SQL Server, DB2, CICS, Encina, Tuxedo, Topend
- Object managers register with Transaction Manager
46OLE Transactions
- Application requests transaction identifier (XID)
- XID flows with method invocations
- Object Managers join (enlist) in transaction
- Distributed Transaction Manager coordinates commit/abort
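The flow on this slide (obtain an XID, enlist object managers, let the coordinator decide) can be sketched as follows. The slide does not name the commit protocol; the sketch assumes two-phase commit, the usual technique for this job. Every class and method name here is hypothetical and is not the OLE Transactions API.

```python
import itertools


class ResourceManager:
    """Hypothetical object manager that votes on and then finishes a transaction."""

    def __init__(self, name: str, can_commit: bool = True) -> None:
        self.name = name
        self.can_commit = can_commit
        self.state = {}  # xid -> "prepared" | "committed" | "aborted"

    def prepare(self, xid: int) -> bool:
        if self.can_commit:
            self.state[xid] = "prepared"
        return self.can_commit

    def finish(self, xid: int, commit: bool) -> None:
        self.state[xid] = "committed" if commit else "aborted"


class TransactionManager:
    """Hands out transaction identifiers and coordinates commit or abort."""

    def __init__(self) -> None:
        self._next_xid = itertools.count(1)
        self._enlisted = {}  # xid -> list of enlisted resource managers

    def begin(self) -> int:
        xid = next(self._next_xid)
        self._enlisted[xid] = []
        return xid

    def enlist(self, xid: int, rm: ResourceManager) -> None:
        self._enlisted[xid].append(rm)

    def commit(self, xid: int) -> bool:
        participants = self._enlisted.pop(xid)
        # Phase 1: ask every participant to vote; all must say yes.
        decision = all([rm.prepare(xid) for rm in participants])
        # Phase 2: tell every participant the single outcome.
        for rm in participants:
            rm.finish(xid, decision)
        return decision


tm = TransactionManager()
sales = ResourceManager("sales")
inventory = ResourceManager("inventory", can_commit=False)
xid = tm.begin()
tm.enlist(xid, sales)
tm.enlist(xid, inventory)
outcome = tm.commit(xid)
print(outcome, sales.state[xid], inventory.state[xid])
```

Because one participant votes no, every participant aborts; a real coordinator would also log its decision so it can recover after a crash, which this sketch omits.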
47Distributed Transactions Enable Huge Throughput
- Each node capable of 7 KtpmC (7,000 active users!)
- Can add nodes to cluster (to support 100,000 users)
- Transactions coordinate nodes
- ORB / TP monitor spreads work among nodes
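A rough sizing estimate follows from the two user counts on this slide. The sketch assumes throughput scales linearly with node count, which the slide does not claim and which ignores the cost of distributed transactions; the names are illustrative.

```python
import math

USERS_PER_NODE = 7_000    # from the slide: about 7,000 active users per node
TARGET_USERS = 100_000    # from the slide

# Linear scaling is an assumption; real clusters lose some capacity
# to coordination between nodes.
nodes_needed = math.ceil(TARGET_USERS / USERS_PER_NODE)
print(nodes_needed)
```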
48Distributed Transactions Enable Huge DBs
- Distributed database technology spreads data among nodes
- Transaction processing technology manages nodes
49Microsoft Transaction Service: A new programming paradigm
- Develop your ActiveX object on the desktop
- Better yet: download them from the Net
- Script your work flows as invocations of ActiveX objects
- All on desktop
- Then, move work flows and objects to server(s)
- Gives desktop development, three-tier deployment
[Figure: design and development phase versus deployment phase; presentation layer, workflow layer, application objects, and database layer placed on client and server(s); MTX execution environment]
50MTX Provides Server-Side Execution Environment
- Accepts ActiveX objects
- Manages bindings (it's an ORB)
- Efficient (pre-bound servers)
- Manages thread pools
- Manages security
- Includes transaction services
- Provides operator interface
- GUI administrative interface
[Figure: structure of a scaleable server. Labels: clients; network; directory registration, congestion and flow control; receiver; queue; authentication; connections; object handles; context; security; national language; management; configuration; thread pool; scheduling and load balancing; service logic; synchronization; deadlocks and starvation; shared data]
51MTX Also Coordinates And Interoperates
- Coordinates distributed transactions
[Figure: client application and three Windows NT servers: Warranty, Sales, Inventory]
52MTX Also Coordinates And Interoperates
- Interoperates with Internet and with legacy
systems
[Figure labels: Browser/client, HTTP, DCOM, Windows NT Server 4.0, MTX, Internet Information Server, ActiveX Components, SNA Server, XA, LU6.2, OLETX]
53Falcon Queue Management: Asynchronous transaction processing
- Many tasks are time-shifted
- Falcon gives a QUEUE mechanism
- Message-oriented middleware
- Decouples client from server
- Server works on priority queues
[Figure: information-processing grid. Immediate (network): conversation, money (point-to-point); lecture, concert (broadcast). Time-shifted (database): mail (point-to-point); book, newspaper (broadcast). Client and server.]
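The queueing pattern on this slide, in which a client enqueues work and a server drains it by priority, can be sketched with Python's standard library. This is an in-process stand-in, not the Falcon API; the task names and priority values are invented for illustration.

```python
import queue
import threading

requests = queue.PriorityQueue()
processed = []


def server() -> None:
    """Drain the queue in priority order; lower numbers are served first."""
    while True:
        priority, task = requests.get()
        if task is None:        # sentinel: stop the server
            break
        processed.append(task)


# The client enqueues work and moves on without waiting for the server.
requests.put((2, "print monthly statement"))
requests.put((1, "authorize payment"))
requests.put((99, None))        # shutdown sentinel, served last

worker = threading.Thread(target=server)
worker.start()
worker.join()
print(processed)
```

The client never calls the server directly, so the two can run at different times and speeds, which is the decoupling the slide describes.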
54Outline
- Why scaleable servers?
- Problems and solutions for scaleable servers
- How Internet Information Server revolutionizes OLTP
- Wolfpack Windows NT clusters for scaleability, availability, manageability
- ActiveX object model as structuring principle
- OLE DB (DAO) for data sources
- MTX as a new programming paradigm
- MTX as a server
- Distributed transactions to coordinate components
- Falcon queues for asynchronous processing