The CBSU Computational Biology Service Unit is a Cornell University bioinformatic core facility.

Provided by: cornellthe

1
Computational Biology Service Unit
  • The CBSU (Computational Biology Service Unit) is
    a Cornell University bioinformatic core facility.
  • Initiated by a collaboration among Cornell
    University, Rockefeller University, and Memorial
    Sloan-Kettering Cancer Center.
  • Since February 2006 CBSU has been a Microsoft HPC
    Institute.
  • Hosted by the Cornell Theory Center, which provides
    administrative and system support.

2
Computational Biology Service Unit: Mission
  • Research Collaborations
  • Web Computing / HPC Computing
  • Software Development
3
Computational Biology Service Unit: Hardware
We have significant dedicated MS Windows
computational resources for use in our
collaborative research, service and web
applications.
  • 3 clusters with 242 dual-CPU nodes combined
  • 3 general purpose servers for web applications
    and software development
  • 2 SQL servers, 1 ftp server, 1 file server; 6TB
    combined storage.

4
Computational Biology Service Unit
Web Computing / HPC Computing
We design web-based interfaces that give biologists
with limited computing experience easy access to
our dedicated HPC computational resources.
5
Computational Biology Service Unit
  • Web Computing / HPC Computing
  • Porting bioinformatics and computational biology
    applications to an HPC environment:
      • by turning an application into a genuinely
        parallel application, or
      • by running it under a parallel wrapper program.
  • Massively parallel applications of both kinds
    are available on the CBSU web pages.
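
The second porting route, a parallel wrapper around an unmodified serial program, can be sketched as follows. This is only an illustration under stated assumptions: `gc_content` is a hypothetical stand-in for a serial tool, and local threads stand in for cluster nodes (on a real cluster each chunk would more likely be a separate scheduler job). None of these names come from the CBSU code.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, near-equal parts."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks get one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def gc_content(sequence):
    """Hypothetical serial tool: fraction of G/C bases in one sequence."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def run_serial_tool(chunk):
    """Process one chunk exactly as the unmodified serial program would."""
    return [gc_content(seq) for seq in chunk]


def parallel_wrapper(sequences, n_workers=4):
    """Run the serial tool on every chunk concurrently, keeping input order."""
    chunks = split_into_chunks(sequences, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_chunk = list(pool.map(run_serial_tool, chunks))
    # Flatten the per-chunk results back into one list.
    return [result for chunk in per_chunk for result in chunk]
```

The wrapper approach leaves the scientific code untouched, which is why it suits tools whose inputs are independent of each other; tools with coupled computations need the first route, a genuinely parallel rewrite.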

6
Web computing general scheme
Diagram (labels only): FTP server; File server;
network disk; queue submission script; Web server;
web page / .NET; Database server (SQL); e-mail /
web page; User; Clusters
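
The slide names the components but the transcript does not preserve how they connect. The sketch below assumes one plausible flow (the user submits through the web page, the job is queued by the submission script, runs on a cluster, and the user is notified by e-mail or web page). The stage names, the `Job` class and its fields are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stage names; the slide only lists the components involved.
STAGES = ["submitted", "queued", "running", "finished", "notified"]


@dataclass
class Job:
    job_id: int
    application: str
    user_email: str
    stage: str = "submitted"
    history: list = field(default_factory=list)

    def advance(self):
        """Move the job to the next stage and record the stage it left."""
        position = STAGES.index(self.stage)
        if position == len(STAGES) - 1:
            raise ValueError("job has already reached its last stage")
        self.history.append(self.stage)
        self.stage = STAGES[position + 1]
        return self.stage


def run_to_completion(job):
    """Advance a job through every remaining stage and return the full path."""
    while job.stage != STAGES[-1]:
        job.advance()
    return job.history + [job.stage]
```

Keeping the stage in a single record is what would let both a web page and an e-mail notifier report the same status, for example from a row in the SQL database.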
7
Computational Biology Service Unit
  • http://cbsuapps.tc.cornell.edu/
    Easy-to-use interface to various computational
    biology tools
  • Parallel and serial tools
  • Many applications from different classes
    (sequence analysis, protein structure prediction,
    population genetics)
  • Communication with interface via web page and
    e-mail
  • Used in many research projects at Cornell and all
    over the world: 26,000 job submissions since
    6/13/2003
  • ASP.NET, written in C#.

9
Protein structure prediction
SER200
HIS440
GLU327
1EA5 Acetylcholinesterase
DDHSELLVNTKSGKVMGTRVPVLSSHISAFLGIPFAEPPVGNMRFRRPEP
KKPWSGVWNASTYPNNCQQYVDEQFPGFSGSEMWNPNREMSEDCLYLNIW
SSEEEBSVPSPRPKSTTVMVWIYGGGFYSGSSTLDVYNGKYLAYTEEVVL
VSLSYRVGAFGFLALHGSQEAPGNVGLLDQRMALQWVHDNIQFFGGDPKT
VTIFGESAGGASVGMHILSPGSRDLFRRAILQSGSPNCPWASVSVAEGRR
RAVELGRNLNCNLNSDEELIHCLREKKPQELIDVEWNVLPFDSIFRFSFV
PVIDGEFFPTSLESMLNSGNFKKTQILLGVNKDEGSFFLLYGAPGFSKDS
ESKISREDFMSGVKLSVPHANDLGLDAVTLQYTDWMDDNNGIKNRDGLDD
IVGDHNVICPLMHFVNKYTKFGNGTYLYFFNHRASNLVWPEWMGVIHGYE
IEFVFGLPLVKELNYTAEEEALSRRIMHYWATFAKTGNPNEPHSQESKWP
LFTTKEQGGTGGGKFIDLNTEPMKVHQRLRVQMCVFWNQFLPKLLNATAC
12
Computational Biology Service Unit
The most popular applications: job submissions
from 6/13/2003 to 6/21/2006

Application        Jobs     Category
LOOPP              10,351   protein structure prediction
MDIV                9,822   population genetics
P-BLAST             2,223   sequence analysis / data mining
MrBayes             1,049   population genetics
All applications   26,551

  • LOOPP: parallel, uses 5-20 nodes for 3-10 hours
  • MDIV: serial, uses 1 node for a few hours to
    two weeks (average 2-5 days)
  • P-BLAST: parallel, restricted resource, uses
    10-100 nodes for a few days to a week (average
    a few days)
  • MrBayes: parallel, uses 8-20 nodes for a few
    hours to two weeks (average a week)
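
As a small worked example, the job counts quoted on this slide can be turned into shares of the total. The counts are taken from the slide; the helper name `share_percent` and the "other applications" remainder are only illustrative.

```python
# Job counts as quoted on the slide (6/13/2003 to 6/21/2006).
jobs = {"LOOPP": 10351, "MDIV": 9822, "P-BLAST": 2223, "MrBayes": 1049}
total_all = 26551  # "All applications" on the slide


def share_percent(count, total):
    """Return count as a percentage of total, rounded to one decimal."""
    return round(100.0 * count / total, 1)


shares = {name: share_percent(count, total_all) for name, count in jobs.items()}

# Jobs submitted to applications other than the four listed.
other_jobs = total_all - sum(jobs.values())

for name, pct in shares.items():
    print(f"{name:10s} {pct:5.1f}%")
print(f"{'other':10s} {share_percent(other_jobs, total_all):5.1f}%")
```

The four listed applications account for 23,445 of the 26,551 submissions, so roughly one job in nine went to some other tool.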
13
Computational Biology Service Unit
The interface allows for easy administration of
users and applications.

14
Computational Biology Service Unit
Web Computing / HPC Computing
Currently users interact with our applications
through web pages, using the web server and the ftp
server directly. We plan to expand the interface by
allowing web service access, and possibly by
integrating some applications with MS Excel. Some
of our applications are used by bioinformatics
metaservers.
15
GENOME-WIDE SCREENING FOR GENES IMPORTANT FOR
Listeria monocytogenes NICHE ADAPTATION
Computational Biology Service Unit: Research
Collaboration Example
  • Renato Orsi, Martin Wiedmann
  • Department of Food Science
  • Cornell University

16
Computational Biology Service Unit: Research
Collaboration Example
Listeria monocytogenes and listeriosis
  • Causes listeriosis in humans, cows, sheep and
    goats
  • Estimated to be responsible for (among known
    pathogens):
      • 0.01% of foodborne illnesses
      • 3.8% of foodborne hospitalizations
      • 27.6% of foodborne deaths

17
Computational Biology Service Unit: Research
Collaboration Example
PathogenTracker: an open, web-based database for
information exchange on bacterial subtypes and
strains and for studies on bacterial biodiversity
and strain diversity. The database is useful in
finding the source of bacterial food contamination,
in epidemiology, and in research on food
pathogens. Developed in collaboration with CBSU.
Microsoft SQL Server/ASP.NET Web Server
18
Computational Biology Service Unit: Research
Collaboration Example
GENOME SEQUENCES AVAILABLE NOW
  • CLIP 11262: non-pathogenic
  • F2365: cheese isolate associated with the 1985
    listeriosis outbreak in California
  • H7858: associated with a multi-state outbreak in
    the US in 1998-99
  • F6854: associated with a sporadic listeriosis
    case in Oklahoma, in 1998
  • EGD-e: laboratory strain first associated with
    animal cases in 1924
19
Computational Biology Service Unit: Research
Collaboration Example
GOALS
  • Identify genes involved in pathogenicity (whether
    or not a type of bacteria causes disease)
  • Identify putative:
      • Vaccine targets
      • Diagnostic assay targets
      • Therapeutic/drug targets
      • Growth inhibitor targets

20
Computational Biology Service Unit: Research
Collaboration Example
Procedures
  • Identify and align the evolutionarily related
    genes (orthologs)
      • BLAST and TribeMCL (10 nodes, 1 hour)
  • Run analyses
      • Positive selection: PAML (50 nodes, 5 hours)

21
Computational Biology Service Unit: Research
Collaboration Example
Selection criteria
  • Equal rates of DNA mutations that do and do not
    change the protein sequence (nonsynonymous and
    synonymous changes): neutral evolution
  • Higher rate of synonymous than nonsynonymous
    changes: negative selection
  • Higher rate of nonsynonymous than synonymous
    changes: positive selection
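
The three criteria above amount to comparing the nonsynonymous rate (dN) with the synonymous rate (dS). The sketch below shows only that comparison; it is not how PAML decides, since PAML fits codon models and uses likelihood-based tests. The function name and the `tolerance` band around a ratio of 1 are assumptions made for illustration.

```python
def classify_selection(dn, ds, tolerance=0.05):
    """Classify a gene by the ratio of nonsynonymous to synonymous rates.

    dn: nonsynonymous substitution rate
    ds: synonymous substitution rate (must be positive)
    tolerance: hypothetical band around dn/ds == 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("synonymous rate must be positive")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    if omega > 1.0:
        return "positive selection"
    return "negative selection"


# Made-up example rates, one per category.
for dn, ds in [(0.20, 0.20), (0.50, 0.10), (0.01, 0.20)]:
    print(dn, ds, classify_selection(dn, ds))
```

In practice a ratio slightly above 1 is not evidence on its own, which is why the real analysis needs a statistical test and the cluster time quoted on the previous slide.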

22
Computational Biology Service Unit: Research
Collaboration Example
Summary of results
  • 110 genes under positive selection in overall
    analyses
  • Approximately 30-50 genes under positive
    selection in lineage I and lineage II

23
Computational Biology Service Unit: Research
Collaboration Example
FUTURE WORK
  • Experimentally confirm positive selection and
    recombination of selected genes (10) using a
    set of 40 isolates
  • Apply a similar approach to other foodborne
    pathogens, including Salmonella.

24
Computational Biology Service Unit
  • Experience with Microsoft Compute Cluster Server
  • We are now using the final release of CCS
    (recently upgraded), which is linked to our web
    interface (users can run jobs on the cluster).
  • CCS is relatively simple to deploy and upgrade;
    however, it requires access rights to Active
    Directory. We experienced only minor problems
    related to network configuration (NIC1 must be
    external) and RIS.
  • It is easy to use, both from an experienced
    user's point of view (GUI, cmd) and
    programmatically from our web interface.
  • The scheduler is very robust and handles jobs
    well. We have not had any problems with the many
    jobs submitted through the interface since the
    middle of April.

25
Computational Biology Service Unit
  • Experience with Microsoft Compute Cluster Server
  • Problems / Suggestions
  • Users should have Remote Desktop access to
    allocated nodes, controlled by the scheduler.
  • Due to authentication problems, users cannot run
    MPI jobs manually on the nodes, only via
    scheduled jobs.
  • A 32-bit version of CCS is needed! We would be
    happy to use CCS on our 190-node Windows cluster
    as well.

27
Computational Biology Service Unit: Hardware
  • 192-node cluster (queue name cbsu1). 192 2-CPU
    Dell nodes with Pentium 4 Xeon 2.4GHz
    processors, 2GB RAM and 56GB local HD space. OS:
    MS Windows Server 2003 with CTC's Cluster
    Controller.
  • 20-node experimental Microsoft Compute Cluster
    (queue name cbsum). 20 2-CPU Dell PowerEdge 1855
    nodes with x64 Pentium 4 Xeon 3.4GHz, 4GB RAM
    and 144GB local HD space. OS: MS Windows Compute
    Cluster Server 2003.
  • 40-node cluster (queue name cbsu2). 40 2-CPU
    Dell PowerEdge 1855 nodes with x64 Pentium 4 Xeon
    3.4GHz, 4GB RAM and 144GB local HD space. OS: MS
    Windows Server 2003 with CTC's Cluster
    Controller.
  • 3 general purpose Windows servers for web
    applications and software development. OS: MS
    Windows Server 2003. Hardware: cbsusrv02, Dell
    PowerEdge 2550, 2 Pentium 3 1.3GHz, 2GB RAM, 70GB
    HD; cbsusrv02, Dell PowerEdge 2650, 2 Pentium 4
    Xeon 2.6GHz, 4GB RAM, 70GB HD; cbsusrv03, Dell
    PowerEdge 2850, 2 Pentium 4 Xeon 3.4GHz, 2GB
    RAM, 320GB HD.
  • 2 database servers running Microsoft SQL Server
    2005. OS: MS Windows 2003 Server. Hardware: Dell
    PowerEdge 4600, 2 Pentium Xeon 3.0GHz, 12GB
    RAM, 1.0TB HD; Dell PowerEdge 2850, 2 Pentium
    Xeon 3.4GHz, 8GB RAM, 1.5TB HD.
  • File server: Dell PowerEdge 2850 with 2 Pentium
    Xeon 3.4GHz, 2GB RAM, 1.5TB HD. OS: MS Windows
    Server 2003.
  • 3 general purpose Linux servers. OS: RedHat
    Enterprise Linux AS 4. Hardware: cbsuss02, Dell
    PowerEdge 2650, 2 Pentium 4 Xeon 2.8GHz, 8GB RAM,
    32GB HD; cbsuss01, Dell PowerEdge 2850, 2 Pentium
    4 Xeon 3.4GHz, 2GB RAM, 320GB HD; cbsuss03,
    Dell PowerEdge 2850, 2 Pentium 4 Xeon 3.4GHz,
    2GB RAM, 320GB HD.
  • 1 microarray analysis server. Dell PowerEdge 4600
    server dedicated to running GeneTraffic
    microarray data analysis software. OS: Linux.
  • 3 CBSU Collaboratory desktops. We have 3 desktop
    computers available for our collaborators.
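
For a quick sense of scale, the per-cluster figures listed on this slide can be added up. The numbers are copied from the three cluster entries above; the dictionary layout and the `cluster_totals` helper are only an illustrative way to do the arithmetic.

```python
# Per-cluster figures as listed on this slide (RAM and disk are per node).
clusters = {
    "cbsu1": {"nodes": 192, "cpus_per_node": 2, "ram_gb": 2, "disk_gb": 56},
    "cbsum": {"nodes": 20, "cpus_per_node": 2, "ram_gb": 4, "disk_gb": 144},
    "cbsu2": {"nodes": 40, "cpus_per_node": 2, "ram_gb": 4, "disk_gb": 144},
}


def cluster_totals(spec):
    """Sum nodes, CPUs, RAM and local disk over all clusters."""
    totals = {"nodes": 0, "cpus": 0, "ram_gb": 0, "disk_gb": 0}
    for cluster in spec.values():
        n = cluster["nodes"]
        totals["nodes"] += n
        totals["cpus"] += n * cluster["cpus_per_node"]
        totals["ram_gb"] += n * cluster["ram_gb"]
        totals["disk_gb"] += n * cluster["disk_gb"]
    return totals


print(cluster_totals(clusters))
```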

28
Computational Biology Service Unit: PathogenTracker
  • Sample information
      • GPS, time, source (human, food, soil, etc.),
        patient pathology, etc.
  • Isolate information
      • Genus, species, ribotype, pulsed-field gel
        electrophoresis patterns, sequences of a set
        of marker genes, etc.
  • Data source and users
      • Cornell Food Safety laboratory
      • Dairy farms, food processors in New York State
  • Input method
      • Web page
      • Scripts for loading Excel spreadsheets
  • Data access
      • SQL direct access
      • Web search pages through PathogenTracker web
        site
29
Computational Biology Service Unit
PathogenTracker
PathogenTracker allows web-based database
searches using all data fields including ribotype
pattern, DNA sequence, phenotypic
characteristics, PFGE patterns and all text
fields.
Current data status of PathogenTracker
30
Computational Biology Service Unit: Unit Members
  • Dr. Jaroslaw Pillardy
  • Dr. Qi Sun
  • Dr. Daniel Ripoll
  • Dr. Tamara Galor
  • Dr. Robert Bukowski
  • Academic director Prof. Ron Elber (CS)
  • More than 40 ongoing scientific collaborations