Mark Silberstein, CS, Technion - PowerPoint PPT Presentation

1
Computational Biology Laboratory
Distributed Systems Laboratory
Superlink-Online: Harnessing the world's computers to hunt for disease-provoking genes
  • Mark Silberstein, CS, Technion
  • Dan Geiger, Computational Biology Lab
  • Assaf Schuster, Distributed Systems Lab
  • Genetics Research Institutes in Israel, EU, US

2
Purpose of disease gene hunting
  • Why search?
    • Detection of diseases before birth
    • Risk assessment and corresponding lifestyle changes
    • Finding the mutant proteins and developing medicines
    • Understanding basic biological functions
  • How to search?
    • Find families segregating the disease (linkage analysis) or collect unrelated healthy and affected persons (association analysis, also called linkage disequilibrium or LD mapping)
    • Take a simple blood test from some individuals
    • Analyze the DNA in the lab
    • Compute the most likely location of the disease gene

3
Steps in Gene Hunting
Linkage analysis (10^6-10^7 bp)
4
Recombination During Meiosis
5
Familial onychodysplasia and dysplasia of distal phalanges (ODP)
[Figure: affected individuals III-15, IV-10, IV-7]
6
Family Pedigree
7
Marker Information Added
8
Maximum Likelihood Evaluation
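The computational problem: find the value of θ (the recombination fraction) maximizing Pr(data | θ)
LOD score (to quantify how confident we are):
$$Z(\theta) = \log_{10} \frac{\Pr(\text{data} \mid \theta)}{\Pr(\text{data} \mid \theta = \tfrac{1}{2})}$$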
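The LOD score is easiest to see in the textbook two-point case. The sketch below assumes phase-known, fully informative meioses, so that Pr(data | θ) = θ^R (1 - θ)^(N - R) for R recombinants among N meioses. The function names are hypothetical, and this closed form is only an illustration: Superlink evaluates whole pedigrees through the Bayesian network model described on the next slides.

```python
import math


def log10_term(count: int, p: float) -> float:
    """count * log10(p), with the convention 0 * log10(0) = 0."""
    if count == 0:
        return 0.0
    if p == 0.0:
        return float("-inf")
    return count * math.log10(p)


def lod_score(theta: float, recombinants: int, meioses: int) -> float:
    """Z(theta) for `meioses` phase-known, fully informative meioses,
    `recombinants` of which show a recombination (textbook two-point case)."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    log_lik = (log10_term(recombinants, theta)
               + log10_term(meioses - recombinants, 1.0 - theta))
    log_null = meioses * math.log10(0.5)  # log10 Pr(data | theta = 1/2)
    return log_lik - log_null


def max_lod(recombinants: int, meioses: int, steps: int = 500):
    """Grid search over theta in [0, 0.5] for the value maximizing Z(theta)."""
    grid = [i * 0.5 / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(t, recombinants, meioses))
    return best, lod_score(best, recombinants, meioses)


if __name__ == "__main__":
    # 10 informative meioses, none recombinant: Z(0) = 10 * log10(2)
    print(lod_score(0.0, 0, 10))
    print(max_lod(2, 10))  # maximum at theta = 2/10
```

A maximum Z of 3 or more is the conventional threshold for declaring linkage.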
9
Results of Multipoint Analysis
10
The Bayesian network model



[Figure: Bayesian network over Locus 1, Locus 2 (Disease), Locus 3, Locus 4, with nodes Li1f, Li1m, Li2f, Li2m, Li3f, Li3m, Xi1, Xi2, Xi3, Y1, Y2, Y3, Si3f, Si3m]
This model depicts the qualitative relations between the variables. We also need to specify the joint distribution over these variables.
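One way to picture the local tables behind such a model, assuming the selector nodes (such as Si3f and Si3m) record which parental chromosome a child inherits at each locus, and that neighboring selectors disagree with probability θ when a recombination occurs between the loci. The toy sketch below covers one parent and two or more loci; the function names are hypothetical and this is not Superlink's code. The joint probability of a full assignment is the product of the local tables, and the probability of the observed child alleles is that product summed over the unobserved selectors.

```python
from itertools import product


def selector_transition(s_prev: int, s: int, theta: float) -> float:
    """P(S_j = s | S_{j-1} = s_prev): the selector flips with probability
    theta, i.e. when a recombination occurs between loci j-1 and j."""
    return theta if s != s_prev else 1.0 - theta


def prob_transmitted(parent_haplotypes, child_alleles, theta: float) -> float:
    """Pr(child inherits `child_alleles` from this parent | theta).

    parent_haplotypes[0] / [1] hold the parent's paternal / maternal
    alleles at each locus; selector value 0 / 1 picks one of them.
    """
    n_loci = len(child_alleles)
    total = 0.0
    for selectors in product((0, 1), repeat=n_loci):
        # P(S_1) * prod_j P(S_j | S_{j-1}), uniform prior on the first selector
        p = 0.5
        for j in range(1, n_loci):
            p *= selector_transition(selectors[j - 1], selectors[j], theta)
        # Allele tables are deterministic: 1 if the selected allele matches, else 0
        if all(parent_haplotypes[s][j] == child_alleles[j]
               for j, s in enumerate(selectors)):
            total += p
    return total


if __name__ == "__main__":
    father = (("A", "B"), ("a", "b"))  # phase known: A-B on one chromosome
    print(prob_transmitted(father, ("A", "B"), 0.1))  # non-recombinant
    print(prob_transmitted(father, ("A", "b"), 0.1))  # recombinant
```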
11
The Computational Task
  • Computing Pr(data | θ) for a specific value of θ
  • Exponential time and space in:
    • variables
      • five per person
      • markers
      • gene loci
    • values per variable
      • alleles
      • non-typed persons
    • table dimensionality
      • cycles in pedigree
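The role of the variables and table sizes can be seen on the simplest possible case, a chain of binary selector variables. The toy sketch below (not Superlink's algorithm; names are hypothetical) computes the same sum twice: by enumerating all 2^n assignments, and by variable elimination, which sums out one variable at a time and never builds a table larger than 2 × 2. A chain keeps the intermediate tables tiny; loops in a pedigree force larger, higher-dimensional intermediate tables, which is where the exponential cost comes back.

```python
from itertools import product


def switch_prob(u: int, v: int, theta: float) -> float:
    """P(S_j = v | S_{j-1} = u) for a binary selector chain."""
    return theta if u != v else 1.0 - theta


def brute_force(n: int, theta: float, evidence: dict) -> float:
    """Sum over all 2**n joint assignments: exponential in n."""
    total = 0.0
    for s in product((0, 1), repeat=n):
        if any(s[i] != v for i, v in evidence.items()):
            continue  # inconsistent with the observed values
        p = 0.5
        for j in range(1, n):
            p *= switch_prob(s[j - 1], s[j], theta)
        total += p
    return total


def eliminate(n: int, theta: float, evidence: dict) -> float:
    """Same sum, eliminating S_1, S_2, ... in turn: each step combines a
    2-entry table with a 2x2 table, so the cost is linear in n."""
    def ok(i, v):
        return evidence.get(i, v) == v

    msg = [0.5 if ok(0, v) else 0.0 for v in (0, 1)]
    for j in range(1, n):
        msg = [sum(msg[u] * switch_prob(u, v, theta) for u in (0, 1))
               if ok(j, v) else 0.0
               for v in (0, 1)]
    return sum(msg)


if __name__ == "__main__":
    ev = {0: 0, 9: 0}  # observe the first and last selectors
    print(brute_force(10, 0.1, ev), eliminate(10, 0.1, ev))
```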

12
Task length distribution
  • Task length is unknown upon submission
  • Ranges from seconds to millennia
  • Computing the task length in advance? NP-hard
  • Estimate task length as we go

[Figure: distribution of task lengths, binned as <3 minutes, <2 hours, <2 days, <2 weeks, <3 months, >3 months]
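The slide does not say how the estimate is computed. A minimal sketch, assuming the running task can report the fraction of its work already completed (for example, finished jobs out of all jobs generated so far): extrapolate the total linearly and map it onto the task-length categories from the slide. The function names are hypothetical, and treating 3 months as 90 days is an assumption.

```python
# Upper bounds (in seconds) of the slide's task-length categories;
# 3 months is approximated as 90 days (assumption).
CATEGORIES = [
    (3 * 60, "<3 minutes"),
    (2 * 3600, "<2 hours"),
    (2 * 86400, "<2 days"),
    (14 * 86400, "<2 weeks"),
    (90 * 86400, "<3 months"),
]


def estimate_total_seconds(elapsed_s: float, fraction_done: float) -> float:
    """Linear extrapolation: total time ~ elapsed time / fraction completed."""
    if not 0.0 < fraction_done <= 1.0:
        raise ValueError("fraction_done must be in (0, 1]")
    return elapsed_s / fraction_done


def category(total_s: float) -> str:
    """Map an estimated run time onto the slide's categories."""
    for limit, label in CATEGORIES:
        if total_s < limit:
            return label
    return ">3 months"


if __name__ == "__main__":
    # 1% of the work done after 10 minutes
    est = estimate_total_seconds(600, 0.01)
    print(est, category(est))
```

The estimate is refreshed as progress reports arrive, so an early misclassification is corrected as the task runs.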
13
Divisible Tasks through Variable Conditioning
Non-trivial parallelization overhead
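Conditioning on a variable splits one large sum into independent pieces, one per value of the conditioned variables, and the results simply add up; each piece can become a separate job. The sketch below reuses a toy chain of binary variables (hypothetical names, not Superlink's code). In this brute-force version the split adds no work, but when each job runs a smarter algorithm such as variable elimination, fixing variables can prevent jobs from sharing intermediate results, which is one plausible source of the overhead noted on the slide.

```python
from itertools import product


def chain_weight(s, theta: float) -> float:
    """Product of the local tables for one full assignment `s` of a toy
    chain of binary variables (hypothetical model)."""
    p = 0.5
    for j in range(1, len(s)):
        p *= theta if s[j] != s[j - 1] else 1.0 - theta
    return p


def make_jobs(cond_vars):
    """One job per joint value of the conditioned variables."""
    return [dict(zip(cond_vars, vals))
            for vals in product((0, 1), repeat=len(cond_vars))]


def run_job(n: int, theta: float, fixed: dict) -> float:
    """Sum the weights of all assignments that agree with `fixed`.
    Jobs share nothing, so each could run on a different machine."""
    free = [i for i in range(n) if i not in fixed]
    total = 0.0
    for vals in product((0, 1), repeat=len(free)):
        s = [0] * n
        for i, v in fixed.items():
            s[i] = v
        for i, v in zip(free, vals):
            s[i] = v
        total += chain_weight(s, theta)
    return total


if __name__ == "__main__":
    jobs = make_jobs([2, 5])  # conditioning on 2 variables gives 4 jobs
    partial = [run_job(8, 0.1, job) for job in jobs]
    print(len(jobs), sum(partial), run_job(8, 0.1, {}))
```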
14
Free resource pools, grids
  • Weak/no quality of service
  • Random failures of execution machines
  • Preemption due to higher priority tasks
  • Hardware bugs may lead to incorrect results
  • Potentially unbounded execution/queue waiting
    time
  • Dynamic/abrupt changes of resource availability
  • High network delays (communication over WAN)
  • Multiple tasks

15
Terminology
  • Basic unit of execution: batch job
    • Non-interactive mode: enqueue, wait, execute, return
    • Self-contained execution sandbox
  • A linkage analysis request: a task
    • A bag of (millions of) jobs
    • Turnaround time is important

16
Requirements
  • The system must be geneticist-friendly
  • Interactive experience
    • Low response time for short tasks
    • Prompt user feedback
  • Simple, secure, reliable, stable, and overload-resistant, with support for concurrent tasks, multiple users, etc.
  • Fast computation of previously infeasible long tasks via parallel execution
    • Harness all available resources: grids, clouds, clusters
    • Use them efficiently!

17
Grids or Clouds?
[Figure: remaining jobs in queue over time, with a long tail due to failures]
  • Small tasks are severely slowed down on grids
    • Takes 5 minutes on a 10-node dedicated cluster
    • May take several hours on a grid

Should we move scientific loads to the cloud? YES!
18
Grids or Clouds?
  • Consider 3.2 × 10^6 jobs, 40 min each
  • It took 21 days on 6000-8000 CPUs
  • It would cost about 10K on Amazon's EC2
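The numbers on this slide can be checked with simple arithmetic. The sketch below computes the total compute in CPU-hours and the wall-clock time a perfectly packed schedule would need; the 7000-CPU midpoint and any hourly price passed to `cloud_cost` are assumptions, since the slide gives a CPU range and no price. Function names are hypothetical.

```python
def cpu_hours(jobs: int, minutes_per_job: float) -> float:
    """Total compute, in CPU-hours."""
    return jobs * minutes_per_job / 60.0


def ideal_days(total_cpu_hours: float, cpus: int) -> float:
    """Wall-clock days if the work were perfectly packed onto `cpus` CPUs
    with no failures, queueing, or stragglers (a lower bound)."""
    return total_cpu_hours / cpus / 24.0


def cloud_cost(total_cpu_hours: float, price_per_cpu_hour: float) -> float:
    """On-demand cost for an assumed hourly price (not given on the slide)."""
    return total_cpu_hours * price_per_cpu_hour


if __name__ == "__main__":
    work = cpu_hours(3_200_000, 40)
    print(f"{work:,.0f} CPU-hours")
    print(f"{ideal_days(work, 7000):.1f} days on 7000 CPUs (ideal)")
```

For 3.2 × 10^6 jobs of 40 minutes this gives about 2.13 million CPU-hours, or about 12.7 days on 7000 CPUs with no failures or queueing, compared with the 21 days the slide reports.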

Should we move scientific loads to the cloud? NO!
19
Clouds or Grids? Clouds and Grids!
[Figure labels: Opportunistic, Dedicated, Burst computing, Throughput computing]
20
Cheap and Expensive Resources
  • Task sensitivity to QoS differs across stages

[Figure: remaining jobs in queue over time]
  • Use cheap, unreliable resources
    • Grids
    • Community grids
    • Non-dedicated clusters
  • Use expensive, reliable resources
    • Dedicated clusters
    • Clouds
  • Dynamically determine when the task enters tail mode
  • Switch to expensive resources (gracefully)
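The slide does not specify how tail mode is detected. A minimal sketch, assuming a simple rule: switch once every remaining job (queued, or still running on opportunistic hosts) fits on the dedicated CPUs. To switch gracefully, jobs already running on the grid are not killed; replicas start on dedicated CPUs, and the intent is that whichever copy finishes first is used. The names and the threshold rule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TaskState:
    queued: list           # jobs not yet started
    running_on_grid: list  # jobs running on cheap, unreliable hosts


def in_tail_mode(state: TaskState, dedicated_cpus: int) -> bool:
    """Hypothetical rule: enter tail mode once all remaining work fits on
    the dedicated (expensive, reliable) CPUs."""
    return len(state.queued) + len(state.running_on_grid) <= dedicated_cpus


def dispatch(state: TaskState, dedicated_cpus: int):
    """Return (pool, jobs_to_send)."""
    if not in_tail_mode(state, dedicated_cpus):
        return "opportunistic", list(state.queued)
    # Graceful switch: grid copies keep running and replicas start on
    # dedicated CPUs; the caller keeps whichever copy finishes first.
    return "dedicated", list(state.queued) + list(state.running_on_grid)


if __name__ == "__main__":
    early = TaskState(queued=list(range(5000)), running_on_grid=list(range(5000, 7000)))
    late = TaskState(queued=[1, 2], running_on_grid=[3, 4, 5])
    print(dispatch(early, 60)[0], dispatch(late, 60))
```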

21
Glue pools together via overlay
[Figure: resource pools glued together by an overlay; label: Submitter to Grid 2]
Issues: granularity, load balancing, firewalls, failed resources, scheduler scalability
22
Practical considerations
  • Overlay scalability and firewall penetration
    • The server may not initiate connections to agents
  • Compatibility with community grids
    • The server is based on BOINC
    • Agents are upgraded BOINC clients
  • Elimination of failed resources from scheduling
    • Performance statistics are analyzed
  • Resource allocation depending on the task state
    • Dynamic policy update via the Condor ClassAd mechanism
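Two of these points can be sketched together: because the server may not open connections to agents, agents pull work by asking for it, and the server uses per-host statistics to stop handing jobs to hosts that keep failing. This is a toy model with hypothetical names and thresholds (failure rate above 50% after at least 4 results); it does not use the BOINC or Condor APIs.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class HostStats:
    done: int = 0
    failed: int = 0

    def failure_rate(self) -> float:
        total = self.done + self.failed
        return self.failed / total if total else 0.0


class PullScheduler:
    """Toy server: agents call request_work() and report(); the server never
    contacts an agent. Hosts that fail too often stop receiving jobs."""

    def __init__(self, jobs, max_failure_rate=0.5, min_samples=4):
        self.queue = deque(jobs)
        self.stats = {}
        self.max_failure_rate = max_failure_rate
        self.min_samples = min_samples

    def is_excluded(self, host) -> bool:
        s = self.stats.get(host)
        if s is None or s.done + s.failed < self.min_samples:
            return False  # not enough evidence about this host yet
        return s.failure_rate() > self.max_failure_rate

    def request_work(self, host):
        """Agent-initiated request; returns a job, or None if there is none
        or the host has been excluded from scheduling."""
        if self.is_excluded(host) or not self.queue:
            return None
        return self.queue.popleft()

    def report(self, host, job, ok: bool) -> None:
        """Record the outcome; failed jobs go back to the queue."""
        s = self.stats.setdefault(host, HostStats())
        if ok:
            s.done += 1
        else:
            s.failed += 1
            self.queue.append(job)
```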

23
(No Transcript)
24
Superlink-online 1.0: http://bioinfo.cs.technion.ac.il
25
Task Submission
26
Superlink-online statistics
  • 1720 CPU years for 18,000 tasks during 2006-2008 (and counting)
  • 37 citations (several mutations found)
    • Examples: ichthyosis, "uncomplicated" hereditary spastic paraplegia (1-9 people per 100,000)
  • Over 250 users (and counting), Israeli and international
    • Soroka H., Be'er Sheva; Galil Ma'aravi H., Nahariya; Rabin H., Petah Tikva; Rambam H., Haifa; Beney Tzion H., Haifa; Sha'arey Tzedek H., Jerusalem; Hadassa H., Jerusalem; Afula H.; NIH; universities and research centers in the US, France, Germany, UK, Italy, Austria, Spain, Taiwan, Australia, and others
  • Task example
    • 250 days on a single computer vs. 7 hours on 300-700 computers
  • Short tasks: a few seconds, even during severe overload

27
Using our system in Israeli Hospitals
  • Rabin Hospital, by Motti Shochat's group
    • New locus for mental retardation
    • Infantile bilateral striatal necrosis
  • Soroka Hospital, by Ohad Birk's group
    • Lethal congenital contractural syndrome
    • Congenital cataract
  • Rambam Hospital, by Eli Shprecher's group
    • Congenital recessive ichthyosis
    • CEDNIK syndrome
  • Galil Ma'aravi Hospital, by Tzipi Falik's group
    • Familial onychodysplasia and dysplasia
    • Familial juvenile hypertrophy

28
Utilizing Community Computing
3.4 TFLOPS, 3,000 users from 75 countries
29
Superlink-online V2 (beta) deployment
[Figure: deployment diagram; components: submission server, EGEE-II BIOMED VO, dedicated cluster, UW Madison Condor pool, Superlink@Campus, Superlink@Technion, OSG GLOW VO]
12,000 hosts operational during the last month
30
3.1 million jobs in 21 days
60 dedicated CPUs only
31
Conclusions
  • Our system integrates clusters, grids, clouds,
    community grids, etc.
  • Geneticist-friendly
  • Minimizes use of expensive resources while
    providing QoS for tasks
  • Generic mechanism for scheduling policy
  • Can dynamically reroute jobs from one pool to
    another according to a given optimization
    function (budget, energy, etc.)

32
Why GPUs?
  • Memory bandwidth: 88 GB/s peak, 56 GB/s observed on NVIDIA GTX8800 ($550)
  • Memory bandwidth: 21 GB/s peak on 3.0 GHz Intel Core 2 Quad ($1100)
  • Annual growth: CPUs 1.4x, GPUs 1.7x
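The growth rates compound. Assuming both keep growing at the annual rates on the slide, the sketch below computes how far GPU capability pulls ahead of CPU capability after a given number of years, together with the memory-bandwidth ratios from the slide. The function name is hypothetical.

```python
def relative_growth(years: float, cpu_rate: float = 1.4, gpu_rate: float = 1.7) -> float:
    """Ratio of GPU to CPU capability growth after `years`, assuming both
    compound at the annual rates quoted on the slide."""
    return (gpu_rate / cpu_rate) ** years


# Memory bandwidth figures from the slide, in GB/s
GPU_PEAK_BW = 88.0
GPU_OBSERVED_BW = 56.0
CPU_PEAK_BW = 21.0

if __name__ == "__main__":
    print(f"GPU peak / CPU peak bandwidth:     {GPU_PEAK_BW / CPU_PEAK_BW:.1f}x")
    print(f"GPU observed / CPU peak bandwidth: {GPU_OBSERVED_BW / CPU_PEAK_BW:.1f}x")
    for years in (1, 3, 5):
        print(f"growth gap after {years} year(s): {relative_growth(years):.2f}x")
```

After five years the compounded ratio (1.7 / 1.4)^5 is roughly 2.6x; the peak-bandwidth ratio on the slide is 88 / 21, about 4.2x.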
33
NVIDIA Compute Unified Device Architecture (CUDA)
[Figure: CUDA architecture; labels: GPU, 1 cycle, TB/s, Global Memory]
34
Key ideas (joint work with John Owens, UC Davis)
  • Software-managed cache
    • We implement the cache replacement policy in software
  • Maximization of data reuse
    • Better compute/memory access ratio
  • A simple model for performance bounds
    • Yes, we are (optimal)
  • Use special function units (SFUs) for hardware-assisted execution
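A software-managed cache keeps a small, fast buffer (on the GPU, shared memory) whose contents and replacement policy are chosen by the program rather than by hardware. The Python sketch below only models the bookkeeping: every miss costs one load from slow global memory, and every hit is data reuse. LRU is an illustrative choice, not necessarily the policy used in this work, and the class name is hypothetical.

```python
from collections import OrderedDict


class SoftwareCache:
    """Toy model of a software-managed cache in front of slow memory."""

    def __init__(self, capacity, backing):
        self.capacity = capacity
        self.backing = backing        # stands in for slow global memory
        self.lines = OrderedDict()    # address -> value, least recently used first
        self.hits = 0
        self.misses = 0               # each miss = one global-memory load

    def read(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)  # mark as most recently used
            return self.lines[addr]
        self.misses += 1
        value = self.backing[addr]
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        self.lines[addr] = value
        return value


if __name__ == "__main__":
    data = list(range(16))
    fits = SoftwareCache(16, data)
    for _ in range(100):
        for i in range(16):
            fits.read(i)
    print("fits:", fits.hits, fits.misses)
    thrash = SoftwareCache(8, data)
    for _ in range(10):
        for i in range(16):
            thrash.read(i)
    print("thrash:", thrash.hits, thrash.misses)
```

With a 16-entry working set that fits in the cache, 100 passes cost only 16 global loads. Shrink the cache to 8 entries and the cyclic access pattern makes LRU miss on every read, the kind of case where choosing the replacement policy in software can pay off.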

35
Results summary
  • Experiment setup
    • CPU: single core of an Intel Core 2, 2.4 GHz, 4 MB L2
    • GPU: NVIDIA G80 (GTX8800), 750 MB GDDR4, 128 SPs, 16K mem / 512 threads
    • Only kernel runtime included (no memory transfers, no CPU setup time)

[Figure: results chart; labels: 2500, 2 x 25 x 25 x 2, Hardware, Software-managed caching]
Use of SFU: expf is about 6x slower than on GPU, but 200x slower on CPU
36
Acknowledgments
  • Superlink-online team
    • Alumni: Anna Tzemach, Julia Stolin, Nikolay Dovgolevsky, Maayan Fishelson, Hadar Grubman, Ophir Etzion
    • Current: Artyom Sharov, Oren Shtark
  • Prof. Miron Livny (Condor pool, UW Madison; OSG)
  • EGEE BIOMED VO and OSG GLOW VO
  • Microsoft TCI program, NIH grant, SciDAC Institute for Ultrascale Visualization

If your grid is underutilized, let us know!
Visit us at http://bioinfo.cs.technion.ac.il/superlink-online
Superlink@TECHNION project home page: http://cbl-boinc-server2.cs.technion.ac.il/superlinkattechnion
37
  • QUESTIONS???
