Title: Improving the Research Bootstrap of Condor High Throughput Computing for Non-Cluster Experts Based on Knoppix Instant Computing Technology
1Improving the Research Bootstrap of Condor High
Throughput Computing for Non-Cluster Experts
Based on Knoppix Instant Computing Technology
- RIKEN Genomic Science Center
- Fumikazu KONISHI
2Background
- Biologists need high-performance computing
systems for their research. However, many of them
do not know how to build a cluster system by
themselves.
3Meet Chie-san.
I borrowed slides from Condor.
4Chie-san's Application
- Run a sequence sweep of InterProScan over mouse
cDNAs, 103,000 clones in total.
- InterProScan takes 1 minute on average to
compute one sequence on a typical workstation
(total: 103,000 × 1 minute = 103,000 minutes ≈ 1,716 hours).
- InterProScan requires a 6 GB public database
set for each installation.
http://www.ebi.ac.uk/interpro/README1.html
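The estimate above can be checked with a short calculation. The sketch below reproduces the single-workstation figure from the slide and, purely as an illustration, shows how the wall-clock time would shrink if the sweep were split evenly across several machines. The node counts are hypothetical, and perfect scaling with no scheduling or I/O overhead is assumed.

```python
import math

N_SEQUENCES = 103_000        # mouse cDNA clones (figure from the slide)
MINUTES_PER_SEQUENCE = 1.0   # average InterProScan time per sequence (from the slide)


def total_minutes(n_sequences=N_SEQUENCES, minutes_each=MINUTES_PER_SEQUENCE):
    """Total compute time in minutes when everything runs on one workstation."""
    return n_sequences * minutes_each


def wall_clock_hours(n_nodes, n_sequences=N_SEQUENCES,
                     minutes_each=MINUTES_PER_SEQUENCE):
    """Idealised wall-clock hours when sequences are split evenly over n_nodes.

    Assumption: one job at a time per node and no overhead, so the slowest
    node processes ceil(n_sequences / n_nodes) sequences.
    """
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    per_node = math.ceil(n_sequences / n_nodes)
    return per_node * minutes_each / 60.0


if __name__ == "__main__":
    serial_hours = total_minutes() / 60.0
    print(f"serial: {total_minutes():.0f} minutes = {serial_hours:.1f} hours "
          f"({serial_hours / 24:.1f} days)")
    for nodes in (10, 50, 100):   # hypothetical cluster sizes
        print(f"{nodes:>3} nodes: {wall_clock_hours(nodes):.1f} hours")
```

The serial figure corresponds to roughly ten weeks of continuous computing, which is why borrowing many lab machines at once is attractive.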
5I have 103,000 sequences to search for gene
functional domains, and I am not a cluster expert.
Who will help me?
6Getting Knoppix for InterProScan High Throughput
computing Edition
- Available as a free download from
- Google Search fumikazu.
- Download the image file.
- The image includes
- InterProScan4.1
- Condor 6.6.10
- PVFS2 1.2
- Ganglia 3.0.1
7Chie-san can boot the lab's machines from an
Instant High Throughput Computing image that
includes her application
She can borrow the lab's computers over the weekend
without installing any software.
8Goal
- The goal of this research is to provide an instant
high-performance bioinformatics research workbench
for all biology researchers, and to allow easy setup
in collaborative projects without side effects on
the local system.
9Instant Setup Technologies
- Install-Based Deploy System
  - RPM-based automatic configuration technology
    (Red Hat)
  - NPACI Rocks toolkit (UCSD)
- Image-Based Deploy System
  - Live-CD technology (Knoppix)
10Key Solutions
- Knoppix
  - A GNU/Linux distribution that brings up a working
    machine without installation to the hard disk.
- Parallel File System
  - PVFS is intended as a high-performance parallel
    file system for cluster computing. It provides
    high-bandwidth access and a large storage
    volume.
11Parallel File System on RAM Disk
12Knoppix for InterProScan4.1 High Throughput
Computing Edition
13Worker Node
PXE Boot
Head Node
Database download server
14Step 1: Booting the image
Boot the head node. The IP address leased by the
DHCP server is displayed after the boot sequence.
15Step 2: After a successful boot, two setup
options, EASY and ADVANCED, are displayed on the
screen.
16Step 3: Boot the worker nodes
All nodes must support PXE boot. The system must
automatically assess whether sufficient resources
are available for the database arrangement of
InterProScan4.1.
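The slides do not say how the resource assessment is done. As a minimal sketch, assuming the 6 GB database set is spread over a parallel file system backed by the worker nodes' RAM disks, the check could amount to comparing the pooled spare memory with the database size. The `Node` type, the `reserve_mb` value, and the example node sizes are all hypothetical.

```python
from dataclasses import dataclass

DATABASE_SIZE_MB = 6 * 1024   # 6 GB InterProScan database set (from the slides)


@dataclass
class Node:
    """A booted worker node; the fields are illustrative, not from the slides."""
    name: str
    ram_mb: int


def usable_ram_disk_mb(node, reserve_mb=512):
    """Memory a node could give to shared storage.

    reserve_mb is an assumed amount kept back for the OS and running jobs.
    """
    return max(0, node.ram_mb - reserve_mb)


def has_sufficient_resources(nodes, database_mb=DATABASE_SIZE_MB, reserve_mb=512):
    """True if the pooled RAM-disk space can hold the whole database set."""
    pooled = sum(usable_ram_disk_mb(n, reserve_mb) for n in nodes)
    return pooled >= database_mb


if __name__ == "__main__":
    # Hypothetical lab: eight machines with 1 GB of RAM each.
    lab = [Node(f"worker{i:02d}", 1024) for i in range(1, 9)]
    print("enough space for the database:", has_sufficient_resources(lab))
```

A real check would also have to account for file system metadata and striping overhead, which this sketch ignores.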
17Step 4: Building the cluster system
19Download InterProScan database set
20Testing
The system submits a single test job, which
completes in a few minutes. The Condor job status
is displayed in the browser, and Ganglia provides
detailed information on all nodes. All
configurations can be tested in this phase.
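For readers unfamiliar with Condor, a test job like the one described here is defined by a small submit description file. The sketch below generates such a file; it is not the configuration shipped on the image. The wrapper script name `run_iprscan.sh`, the input file naming, and the log file name are hypothetical, and a file system shared by all nodes is assumed so that no file transfer settings are needed.

```python
def submit_description(executable, n_jobs=1,
                       input_pattern="seq_$(Process).fasta"):
    """Build the text of a Condor submit description file.

    executable    -- program or wrapper script each job runs (hypothetical here)
    n_jobs        -- number of jobs to queue; 1 gives a single test job
    input_pattern -- per-job argument; Condor expands $(Process) to 0..n_jobs-1
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {input_pattern}",
        "output = job_$(Process).out",   # stdout of each job
        "error = job_$(Process).err",    # stderr of each job
        "log = sweep.log",               # Condor's own event log
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # A single test job; run_iprscan.sh is an assumed wrapper script name.
    print(submit_description("run_iprscan.sh"))
```

The generated text would be saved to a file and passed to `condor_submit`. Raising `n_jobs` turns the same description into a sweep with one job per input file.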
21Results
23Web site
http://big.gsc.riken.jp/index_html/Members/fumikazu/htc
24Questions