THsort PennySort Award Ceremony, Beijing, China, 19 October 2002



1
THsort PennySort Award Ceremony
Beijing, China, 19 October 2002
  • Peng Liu, Yao Shi, Li Zhang, Kuo Zhang, Tian
    Wang, ZunChong Tian, Hao Wang, Xiaoge Wang
  • Trophy presentation by Jim Gray

2
Outline
  • Penny Sort history and Award
  • The need for long-range research
  • Some long-range systems research goals.
  • What I have been doing.

3
Benchmark History
(Timeline figure, 1970 to 2000; benchmarks and their authors in the order shown)
  • IBM TP 1-7CA and Tony Lukes
  • DebitCredit: Gray
  • Wisconsin: Bitton, Boral, DeWitt, Turbyfill
  • Datamation: Anon et al.
  • Sort
  • MCC: Boral ...
  • Teradata: Bollinger ...
  • TPC-A, TPC-B, TPC-C, TPC-D
  • PennySort, MinuteSort
  • TPC-W ?

4
A Short History of Sort
  • April Fools 1985: Datamation Sort
  • Sort 1M 100-byte records
  • An IO benchmark: 15 minutes to 1 hour!
  • 1993: Minute and Penny sorts, in Daytona and Indy categories
  • 1998: TeraByte Sort
  • Web site: http://research.Microsoft.com/barc/SortBenchmark/

5
Ground Rules
  • How much can you sort for a penny (in a minute).
  • Hardware and Software cost
  • Depreciated over 3 years
  • $1M system gets about 1 second,
  • $1K system gets about 1,000 seconds.
  • Time (seconds) = 946,080 / SystemPrice ($)
  • Input and output are disk resident
  • Input is:
  • 100-byte records (random data)
  • key is first 10 bytes.
  • Must create output file and fill with sorted
    version of input file.
  • Daytona (product) and Indy (special) categories
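The penny rule reduces to one formula. A penny is 1/100 of a dollar, and 3 years is 94,608,000 seconds (365-day years), so a system's penny of time is 946,080 / price seconds. The sketch below computes that budget. It also shows the record format as a toy in-memory sort keyed on the first 10 bytes. The function names are hypothetical. The toy only illustrates the key order, whereas the rules require a disk-to-disk sort.

```python
import os

SECONDS_IN_3_YEARS = 3 * 365 * 24 * 60 * 60  # 94,608,000 s, ignoring leap days
PENNIES_PER_DOLLAR = 100
RECORD_SIZE = 100  # bytes per record
KEY_SIZE = 10      # the key is the first 10 bytes of each record


def penny_time_budget(system_price_dollars: float) -> float:
    """Seconds of a system's 3-year depreciated life worth one penny (946,080 / price)."""
    return SECONDS_IN_3_YEARS / PENNIES_PER_DOLLAR / system_price_dollars


def sort_records(data: bytes) -> bytes:
    """Toy in-memory sort of 100-byte records by their 10-byte key.

    Benchmark entries read and write disk files; this only shows the key order.
    """
    if len(data) % RECORD_SIZE != 0:
        raise ValueError("input is not a whole number of 100-byte records")
    records = [data[i:i + RECORD_SIZE] for i in range(0, len(data), RECORD_SIZE)]
    records.sort(key=lambda record: record[:KEY_SIZE])  # bytes compare lexicographically
    return b"".join(records)


if __name__ == "__main__":
    print(round(penny_time_budget(1_000_000), 3))  # 0.946 seconds for a $1M system
    print(round(penny_time_budget(1_000), 1))      # 946.1 seconds for a $1K system
    sorted_data = sort_records(os.urandom(RECORD_SIZE * 1_000))  # random input, as in the rules
    keys = [sorted_data[i:i + KEY_SIZE] for i in range(0, len(sorted_data), RECORD_SIZE)]
    print(keys == sorted(keys))  # True
```

A $1M system gets just under 1 second and a $1K system just under 1,000 seconds, which matches the rounded figures in the ground rules.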

6
PennySort
  • Hardware
  • 266 MHz Intel PPro
  • 64 MB SDRAM (10ns)
  • Dual Fujitsu DMA 3.2GB EIDE disks
  • Software
  • NT workstation 4.3
  • NT 5 sort
  • Performance
  • sort 15 M 100-byte records (1.5 GB)
  • Disk to disk
  • elapsed time 820 sec
  • cpu time 404 sec

7
1999 PennySort
  • Daytona and Indy: 2.58 GB in 917 sec
  • HMsort: Brad Helmkamp, Keith McCready,
    Stenograph LLC
  • Intel 400 MHz, 2 IDE disks
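The two PennySort results can be compared with the 946,080 / price rule. For each run, the sketch computes the average sort rate and the highest system price at which that run length still costs a penny. These ceilings are upper bounds implied by the rule, not the prices reported for the entries. The sketch assumes 1 GB = 10^9 bytes, and the helper names are hypothetical.

```python
# Figures from the two PennySort slides; 1 GB taken as 10**9 bytes (assumption).
PENNY_BUDGET = 946_080  # seconds x dollars: one penny of a 3-year depreciation

results = {
    "NT 5 sort, 15M records (1.5 GB)": (1.5, 820),
    "1999 HMsort (2.58 GB)": (2.58, 917),
}


def throughput_mb_per_s(gigabytes: float, elapsed_seconds: float) -> float:
    """Disk-to-disk sort rate in decimal megabytes per second."""
    return gigabytes * 1000 / elapsed_seconds


def max_price_for_penny(elapsed_seconds: float) -> float:
    """Highest system price (dollars) at which a run of this length costs at most a penny."""
    return PENNY_BUDGET / elapsed_seconds


for name, (gigabytes, seconds) in results.items():
    print(f"{name}: {throughput_mb_per_s(gigabytes, seconds):.2f} MB/s, "
          f"price ceiling ${max_price_for_penny(seconds):,.0f}")
```

This gives about 1.83 MB/s with a ceiling near $1,154, and 2.81 MB/s with a ceiling near $1,032. HMsort sorted more data in a similar time on a comparable budget.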

8
1998 TB Sort
  • Chris Nyberg, Nsort, SGI 32x Origin2000, 151 minutes

9
1999 Terabyte Sort
  • Daytona: David Cossock, Sam Fineberg, Pankaj
    Mehra, John Peck; Tandem/Sandia TSort, 68 CPUs,
    ServerNet, 47 minutes
  • Indy: IBM SPsort
  • 408 nodes, 1,952 CPUs, 2,168 disks
  • 17.6 minutes (1,057 sec)
  • (all for 1/3 of $94M; slice price is $64K for
    4 CPUs, 2 GB RAM, six 9 GB disks, interconnect)
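The TeraByte Sort times translate into average sort rates. This sketch assumes a decimal terabyte (10^12 bytes), since the slides do not say. It counts only the data sorted, not the total disk traffic of a multi-pass sort.

```python
TB = 10**12  # assumption: decimal terabyte (the slides do not say)

# Elapsed times from the 1998 and 1999 TeraByte Sort slides.
runs = {
    "1998 Nsort, SGI Origin2000": 151 * 60,
    "1999 Daytona, Tandem/Sandia TSort": 47 * 60,
    "1999 Indy, IBM SPsort": 1057,
}


def mb_per_s(nbytes: int, seconds: float) -> float:
    """Average sort rate in decimal megabytes per second."""
    return nbytes / seconds / 10**6


for name, seconds in runs.items():
    print(f"{name}: {mb_per_s(TB, seconds):.0f} MB/s")
```

The rates come out near 110, 355 and 946 MB/s, so SPsort averaged roughly 8.6 times the 1998 rate.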

10
SP sort
  • 2 4 GBps!

11
1999 Sort Records

   
12
The THsort Team (and friend)
13
2x/year!
  • Partly hardware
  • Partly software
  • Partly economics

(Chart label: THsort 1TB/)
14
Progress on Sorting
  • Speedup comes from Moore's law: 40%/year
  • Processor/disk/network arrays: 60%/year (this is
    a software speedup).
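The two rates compound because growth factors multiply: 1.4 × 1.6 = 2.24, roughly the "2x/year" of the preceding slide. This minimal sketch shows that product and the doubling time it implies. The variable names are illustrative.

```python
import math

HARDWARE = 1.40  # Moore's law: 40%/year
SOFTWARE = 1.60  # processor/disk/network arrays: 60%/year

combined = HARDWARE * SOFTWARE                    # growth factors multiply
doubling_years = math.log(2) / math.log(combined)

print(f"{combined:.2f}x per year")                # 2.24x per year
print(f"doubling every {doubling_years:.2f} years")
```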

(Chart label: THsort 1TB/)
15
Musings: PennySort TBsort
  • Sorts 1TB in 1 Minute
  • 2 pass, so 3TB of disk
  • 10 disks if 330GB/disk
  • 5 GBps (if each disk is 50 MBps)
  • So, 600 seconds (3TB / 5GBps)
  • So, node costs $1.5K
  • Costs 100x that today
  • Maybe in 4 years?
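The musing is a short chain of estimates. The sketch follows it step by step: disk space, then disk count, then elapsed time, then the node price that keeps the run to a penny. It uses the slide's 5 GBps aggregate bandwidth as stated and assumes decimal units. The constant names are hypothetical.

```python
import math

DATA_TB = 1.0
DISK_SPACE_TB = 3 * DATA_TB   # slide: a 2-pass sort, so 3 TB of disk
DISK_CAPACITY_GB = 330        # slide's assumed disk size
AGGREGATE_GBPS = 5.0          # slide's assumed aggregate bandwidth, taken as given
PENNY_BUDGET = 946_080        # seconds x dollars: one penny over a 3-year life

disks = math.ceil(DISK_SPACE_TB * 1000 / DISK_CAPACITY_GB)
seconds = DISK_SPACE_TB * 1000 / AGGREGATE_GBPS
max_node_price = PENNY_BUDGET / seconds

print(disks)                   # 10
print(seconds)                 # 600.0
print(round(max_node_price))   # 1577
```

The last step gives about $1,577, which the slide rounds to $1.5K.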

16
Outline
  • Penny Sort history and Award
  • The need for long-range research
  • Some long-range systems research goals.
  • What I have been doing.

17
Properties of a Research Goal
  • Simple to state.
  • Not obvious how to do it.
  • Clear benefit.
  • Can be broken into smaller steps
  • So that you can see intermediate progress.
  • Progress and solution are testable.

18
I was motivated by a simple goal
  • Devise an architecture that scales up: grow the
    system without limits.
  • This is impossible (without limits?),
    but... This meant automatic parallelism,
  • automatic management,
  • distributed,
  • fault tolerant,
  • high performance
  • Benefits
  • long term vision guides research problems
  • simple to state, so attracts colleagues and
    support
  • Can tell your friends and family what it is that
    you do.

(Figure: scaleup 1,000,000:1)
19
Three Seminal Papers
  • Babbage: computers
  • Bush: automatic information storage and access
  • Turing: intelligent machines
  • Note:
  • Previous Turing lectures described several
    theory problems.
  • Problems here are systems problems.
  • Some include an "and prove it" clause.
  • They are enabling technologies, not applications.
  • Newell's Intelligent Universe (ubiquitous
    computing) is missing because I could not find
    simple-to-state problems.

20
Charles Babbage (1791-1871)
  • Babbage's computing goals have been realized
  • But we still need better algorithms and faster
    machines
  • What happens when
  • Computers are free and infinitely powerful?
  • Bandwidth and storage are free and infinite?
  • Remaining limits:
  • Content: the core asset of cyberspace
  • Software: bugs, $100 per line of code (!)
  • Operations: $1,000/node/year

21
ops/s/$ Had Three Growth Curves, 1890-1990
Combination of Hans Moravec, Larry Roberts, and
Gordon Bell: WordSize × ops/s / sysprice
  • 1890-1945
  • Mechanical
  • Relay
  • 7-year doubling
  • 1945-1985
  • Tube, transistor,..
  • 2.3 year doubling
  • 1985-2000
  • Microprocessor
  • 1.0 year doubling
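A doubling time T corresponds to a yearly growth factor of 2^(1/T). The three eras therefore work out to about 10%, 35% and 100% a year. This sketch converts each era's doubling time into a yearly rate and a total growth factor over the era. The era boundaries are taken from the bullets, and the function names are illustrative.

```python
# Eras and doubling times as listed on the slide.
eras = [
    ("1890-1945 mechanical, relay", 1890, 1945, 7.0),
    ("1945-1985 tube, transistor", 1945, 1985, 2.3),
    ("1985-2000 microprocessor", 1985, 2000, 1.0),
]


def annual_growth(doubling_years: float) -> float:
    """Yearly growth factor implied by a doubling time T: 2 ** (1 / T)."""
    return 2 ** (1 / doubling_years)


def era_growth(start: int, end: int, doubling_years: float) -> float:
    """Total growth factor over an era: 2 ** (years / T)."""
    return 2 ** ((end - start) / doubling_years)


for name, start, end, t in eras:
    yearly = 100 * (annual_growth(t) - 1)
    print(f"{name}: {yearly:.0f}%/year, about {era_growth(start, end, t):.3g}x over the era")
```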

22
Trouble-Free Appliances
  • Appliance just works. TV, PDA, desktop, ...
  • State replicated in safe place (somewhere else)
  • If hardware fails, or is lost or stolen,
    replacement arrives next day (plug & play).
  • If software faults, software and state refresh
    from server.
  • If you buy a new appliance, it plugs in and
    refreshes from the server (as though the old one
    failed)
  • Most vendors are building towards this vision.
  • Browsers come close to working this way.

23
Trouble-Free Systems
  • Manager
  • Sets goals
  • Sets policy
  • Sets budget
  • System does the rest.
  • Everyone is a CIO (Chief Information Officer)
  • Build a system
  • used by millions of people each day
  • Administered and managed by a ½ time person.
  • On hardware fault, order replacement part
  • On overload, order additional equipment
  • Upgrade hardware and software automatically.

24
Trustworthy Systems
  • Build a system used by millions of people that
  • Only services authorized users
  • Service cannot be denied (can't destroy data or
    power).
  • Information cannot be stolen.
  • Is always available (out less than 1 second per
    100 years, 8 9s of availability)
  • 1950s: 90% availability; today: 99% uptime for
    web sites, 99.99% for well-managed sites (50
    minutes/year): 3 extra 9s in 45 years.
  • Goal: 5 more 9s: 1 second per century.
  • And prove it.
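The availability figures can be checked by converting each availability into downtime and counting nines as -log10 of the fraction of time down. This sketch assumes a 365.25-day year. By this arithmetic, 99.99% is about 53 minutes of downtime a year, and 1 second per century works out to roughly 9.5 nines.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year (assumption)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected outage per year, in minutes, for a given availability fraction."""
    return (1 - availability) * SECONDS_PER_YEAR / 60


def nines(unavailability: float) -> float:
    """Count of nines: -log10 of the fraction of time the system is down."""
    return -math.log10(unavailability)


for availability in (0.90, 0.99, 0.9999):
    print(f"{availability:.2%}: {downtime_minutes_per_year(availability):,.1f} min/year, "
          f"{nines(1 - availability):.1f} nines")

one_second_per_century = 1 / (100 * SECONDS_PER_YEAR)
print(f"1 s per century: {nines(one_second_per_century):.1f} nines")
```

Computing nines directly from the unavailable fraction avoids subtracting a value very close to 1.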

25
$100/line of code? 1 bug per thousand lines?
  • $20 to design and write it.
  • $30 to test and document it.
  • $50 to maintain it.
  • $100 total
  • The only thing in Cyber Space that is getting
    MORE expensive and LESS reliable
  • Solution so far:
  • Write fewer lines: high-level languages
  • Non-procedural
  • 10x, not 1,000x, better; very domain-specific
  • Application generators
  • Web sites, databases, ...
  • Semi-custom apps
  • SAP, PeopleSoft, ...
  • Scripting and objects
  • JavaScript, DOM
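The per-line figures scale linearly with program size. This sketch applies the $20/$30/$50 split and the 1-bug-per-thousand-lines rate to a hypothetical 1-million-line program. The result is about $100M over the program's life and on the order of 1,000 bugs. The function names and program size are illustrative only.

```python
# Per-line lifetime cost split and bug rate, as given on the slide.
COST_PER_LINE = {"design and write": 20, "test and document": 30, "maintain": 50}  # dollars
BUGS_PER_THOUSAND_LINES = 1


def lifetime_cost(lines: int) -> dict[str, int]:
    """Dollar cost of each phase for a program of the given size."""
    return {phase: dollars * lines for phase, dollars in COST_PER_LINE.items()}


def expected_bugs(lines: int) -> float:
    """Bug count implied by a fixed rate per thousand lines."""
    return lines / 1000 * BUGS_PER_THOUSAND_LINES


program_lines = 1_000_000  # hypothetical program size
costs = lifetime_cost(program_lines)
print(costs)
print(sum(costs.values()))           # 100000000
print(expected_bugs(program_lines))  # 1000.0
```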

26
Automatic Programming: Do What I Mean (not $100/line
of code!, no programming bugs). The holy
grail of programming languages and systems
  • Devise a specification language or UI
  • That is easy for people to express designs
    (1,000x easier),
  • That computers can compile, and
  • That can describe all applications (is complete).
  • System should reason about application
  • Ask about exception cases.
  • Ask about incomplete specification.
  • But not be onerous.
  • This already exists in domain-specific areas
    (i.e., 2 out of 3 already exist).
  • An imitation game for a programming staff.

27
Outline
  • Penny Sort history and Award
  • The need for long-range research
  • Some long-range systems research goals.
  • What I have been doing.

28
What I Have Been Doing
  • Traveling and talking
  • Helping Alex build the SkyServer
  • Loading data
  • Helping build the Virtual Observatory
  • Doing spatial geometry in SQL (no kidding)!
  • Learning about web services (and implementing
    some)