Title: Dr. Rosemary Renaut, renaut@asu.edu, Director, Computational Biosciences
http://math.asu.edu/cbs/
02/23/2006
ATLANTA
A Professional Science Master's Program
- Mathematics and Statistics
- The School of Life Sciences
- Computer Science and Engineering
- W. P. Carey School of Business
OUTLINE
- THE CBS PROGRAM AT ASU: OVERVIEW
- CBS CURRICULUM
- REQUIREMENTS
- SOME HISTORY
- FUTURE
- PROJECTS: WHAT DO THEY INVOLVE?
- OUR CASE STUDIES COURSE(S)
- INTRODUCING BASIC MATHEMATICS TO LS/CSE STUDENTS
CORE REQUIREMENTS (30 hours)
- Scientific Computing for Biosciences (4)
- Case Studies/Projects in Biosciences (4)
- Structural and Molecular Biology (4)
- Statistics and Experimental Design (6)
- Business Practice and Ethics (6)
- Internship and Applied Project (6)
ELECTIVE TRACKS (12 hours)
- Genomics/Proteomics
- Data Mining/Databases
- Medical Imaging
- Molecular/Functional Genomics
- Microarray Analysis
- Individualized
PREREQUISITES
- Calculus and Differential Equations
- Basic Statistics (junior level)
- Discrete Algorithms and Data Structures
- Programming skills (C/Java)
- Cell biology, genetics (junior level)
- Organic Chemistry and Biochemistry (junior level)
- Motivation, creativity, determination!
- Interdisciplinary Training/Teamwork
- Internship/Applied Project Report
- Business, Management and Ethics
- (Health Services Administration MBA)
- Small Groups/Close Faculty Involvement
- Computer Laboratory
- Extensive Project Work/Consulting
- Internship of at least 400 hours, possibly full time in the summer
- Student must write a project report in the required format
- Student presents the report to a committee in an oral exam
- International students can work off campus using the EIP program
- Encourage students to seek projects outside AZ
DATA
- Year 4: 74 students in total, 30 currently enrolled
- Graduates: 33 (11 left without graduating)
- Internships: NIH, ASU, Tgen, AZ Game and Fish, US Water Conservation Laboratory, AZ Biodesign
- Jobs: Tgen, ASU, Codon Solutions, medical record keeping, Matlab, St. Jude (Memphis), Walt Disney!, Cisco, Google (shortly arriving in Tempe!!), Ingenuity, AZ Game and Fish
- PhD programs (10): Biology, Computer Science, Biochemistry (France, UK and ASU)
OTHER DEVELOPMENTS
- Undergraduate NIH MARC
- Calculus for Life Sciences (sophomore)
- Quantitative Skills (sophomore)
- Modeling Comp Bio (junior)
- PhD Program in Computational Biosciences
- Molecular Cellular Biology / Mathematics
WHAT DO WE DO? SPRING 2004
- Database Construction/Mining of Pathology Specimens (Tgen)
- Gegenbauer high-resolution reconstruction for MRI (ASU)
- TLS-SVM for Feature Extraction of Microarray Data (ASU)
- Automated video analysis for cell behavior (Tgen)
- EST DB for the marine dinoflagellate Crypthecodinium cohnii (ASU)
- Data mining for microsatellites in ESTs from Arabidopsis thaliana and Brassica species (US Water Conservation Laboratory)
- The Genome Assembler (Tgen)
- A user interface to support navigation for scientific discovery (ASU)
- Cell Migration Software Tool (Tgen)
WHAT DO WE DO? SPRING 2005
- EVALUATION OF BIOINFORMATICS RESOURCES (Tgen/NIH/ASU)
- Pattern recognition: automated cytoskeleton reconstruction (ASU)
- Develop a workable database on the crop Lesquerella using Integrated Crop Information Systems (ICIS) (US Water Conservation Laboratory)
- Investigation of Xylella fastidiosa within an almond tree population: a model system for Golden Death (ASU Mathecology, AZ)
- Search for epigenetic properties of DNA and RNA involved in X chromosome inactivation (Codon Solutions LLC)
WHAT DID WE NEED FOR THESE PROJECTS?
- Image Analysis
- Data Mining
- Fourier Analysis
- Modeling
- Differential Equations
- Sequence Comparisons
- Mathematics for Genetic Analysis
- Statistics
- Database development for biological applications
- Geographic Information Systems
- PERL/BIOPERL/MATLAB/MYSQL
Bioinformatics: Managing Scientific Data tackles this challenge head-on by discussing the current approaches and the variety of systems available to help bioinformaticians with this increasingly complex issue. The heart of the book lies in the collaborative efforts of eight distinct bioinformatics teams that describe their own unique approaches to data integration and interoperability. Each system receives its own chapter, in which the lead contributors provide detailed insight into the specific problems the system addresses, why its particular architecture was chosen, and the system's strengths and weaknesses. In closing, the editors provide important criteria for evaluating these systems that bioinformatics professionals will find valuable.
Computational Modeling Skills Motivated by Case Studies
- Phylogenetics and Tree Building (for the data, make the tree)

               Human(A)     Chimp(B)     Gorilla(C)   Orang-Utan(D)  Gibbon(E)
Human(A)       -            .09190       .1082        .1790          .2057
Chimp(B)       .0919/.0821  -            .1134        .1940          .2168
Gorilla(C)     .1057/.1083  .1161/.1330  -            .1882          .2170
Orang-Utan(D)  .1806/.1838  .1910/.1838  .1895/.1838  -              .2172
Gibbon(E)      .2067/.2142  .2171/.2142  .2156/.2142  .2172/.2142    -
All additive trees with 5 branches: which is the correct one?
Repeat for all trees. Use MATLAB to understand:
- Least squares
- Nonnegative constraints
- Constrained LS
- Exhaustive search
- Genetic algorithms

For this tree we can calculate the patristic distances between sequences, e.g. $p_{BD} = e_2 + e_6 + e_7 + e_4$. This should match the distance from the measured data. We do a goodness of fit for all distances:

$$\|p - d\|_2 = \|Ae - d\|_2$$

What is $A$, what is $e$? Any conditions on $e$?
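A minimal NumPy sketch of this least-squares fit. It assumes the unrooted topology ((A,B),C,(D,E)) with leaf edges $e_1,\dots,e_5$ (for A to E) and internal edges $e_6, e_7$, a hypothetical labeling chosen so that $p_{BD} = e_2 + e_6 + e_7 + e_4$, and it takes the upper-triangle entries of the first distance matrix as the measured distances $d$ (the slide does not say which entries are measured). The course uses MATLAB; here $e$ comes from the normal equations $A^T A e = A^T d$.

```python
import numpy as np

# Hypothetical edge labels for the unrooted tree ((A,B),C,(D,E)):
# e1..e5 are the leaf edges of A..E, e6 and e7 are the two internal edges.
PATHS = {
    "AB": [1, 2], "AC": [1, 6, 3], "AD": [1, 6, 7, 4], "AE": [1, 6, 7, 5],
    "BC": [2, 6, 3], "BD": [2, 6, 7, 4], "BE": [2, 6, 7, 5],
    "CD": [3, 7, 4], "CE": [3, 7, 5], "DE": [4, 5],
}
# Measured distances d (upper triangle of the first matrix; an assumption).
MEASURED = {
    "AB": 0.0919, "AC": 0.1082, "AD": 0.1790, "AE": 0.2057,
    "BC": 0.1134, "BD": 0.1940, "BE": 0.2168,
    "CD": 0.1882, "CE": 0.2170, "DE": 0.2172,
}

def design_matrix(paths, n_edges=7):
    """Row k of A has a 1 in column j-1 when edge e_j lies on the path for pair k."""
    A = np.zeros((len(paths), n_edges))
    for row, edges in enumerate(paths.values()):
        for j in edges:
            A[row, j - 1] = 1.0
    return A

A = design_matrix(PATHS)
d = np.array([MEASURED[pair] for pair in PATHS])

# Normal equations A^T A e = A^T d (A has full column rank for this tree).
e = np.linalg.solve(A.T @ A, A.T @ d)
p = A @ e                          # fitted patristic distances
misfit = np.linalg.norm(p - d)     # ||p - d||_2 = ||A e - d||_2

print("branch lengths e1..e7:", np.round(e, 4))
print("misfit:", round(float(misfit), 5))
print("all branch lengths nonnegative:", bool(np.all(e >= 0)))
```

A negative entry in $e$ would describe an unrealistic tree, which is where the nonnegative (constrained) least-squares variant comes in; repeating the fit for each candidate topology and comparing misfits is the exhaustive-search step.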
Computational Modeling Skills Motivated by Case Studies
- Phylogenetics and Tree Building (for the data, make the tree)

               Human(A)  Chimp(B)  Gorilla(C)  Orang-Utan(D)  Gibbon(E)
Human(A)       -         79        92          144            162
Chimp(B)       79        -         95          154            169
Gorilla(C)     92        96.7      -           150            169
Orang-Utan(D)  149.3     154       152.1       -              169
Gibbon(E)      166.2     170.9     169         169            -
An ultrametric tree: what are the distances $e_i$? Solve the linear programming problem

$$\min L(e) = \min \sum_i e_i,$$

where $L(e)$ is the total length of the tree. Moreover, each length is positive, and the total (root-to-leaf) lengths are preserved, e.g. $e_1 = e_2$ and $e_4 + e_8 = e_1 + e_6 + e_7$.

This is an LP problem with constraints: $\max c^T x$ subject to $Ax \le b$, $x \ge 0$. Students identify $x$, $c$, $b$ and $A$. Use MATLAB linprog.
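Students set this up for MATLAB's linprog. As a solver-free cross-check for one fixed rooted topology, and assuming the LP also requires each fitted distance to be at least the measured one ($p_{ij} \ge d_{ij}$, a constraint the slide does not state), the optimum can be found directly: writing each branch as a difference of node heights makes every internal height enter $L(e)$ with a positive coefficient, so the minimum puts each internal node as low as the constraints allow. The sketch below is that structural shortcut in plain Python, not linprog; the topology, the labels, and the use of the upper triangle of the second matrix as measured data are assumptions.

```python
from itertools import product

def label(tree):
    """Newick-style label of a subtree: a leaf name or '(left,right)'."""
    if isinstance(tree, str):
        return tree
    return "(" + label(tree[0]) + "," + label(tree[1]) + ")"

def min_ultrametric(tree, d):
    """Shortest ultrametric branch lengths for one fixed rooted binary tree.

    tree: a leaf name (str) or a pair (left, right) of subtrees.
    d:    dict mapping frozenset({x, y}) to the measured distance d_xy.
    Returns (node height, leaves below the node, [(child label, branch length), ...]).
    Assumed constraints: p_xy >= d_xy, branch lengths >= 0, equal root-to-leaf depth.
    """
    if isinstance(tree, str):
        return 0.0, [tree], []
    left, right = tree
    h_left, leaves_left, br_left = min_ultrametric(left, d)
    h_right, leaves_right, br_right = min_ultrametric(right, d)
    # Leaves on opposite sides meet here, so p_xy = 2h >= d_xy; h must also be at
    # least each child's height so that both new branch lengths are nonnegative.
    h = max([h_left, h_right] + [d[frozenset((x, y))] / 2
                                 for x, y in product(leaves_left, leaves_right)])
    branches = br_left + br_right + [(label(left), h - h_left), (label(right), h - h_right)]
    return h, leaves_left + leaves_right, branches

# Upper triangle of the second distance matrix, taken as the measured distances (assumption).
D = {frozenset(pair): dist for pair, dist in {
    ("A", "B"): 79, ("A", "C"): 92, ("A", "D"): 144, ("A", "E"): 162,
    ("B", "C"): 95, ("B", "D"): 154, ("B", "E"): 169,
    ("C", "D"): 150, ("C", "E"): 169, ("D", "E"): 169,
}.items()}
TREE = (((("A", "B"), "C"), "D"), "E")   # hypothetical rooted topology

height, leaves, branches = min_ultrametric(TREE, D)
for child, length in branches:
    print(f"branch above {child}: {length}")
print("total length L(e) =", sum(length for _, length in branches))
```

For this topology both branches of the (A,B) cherry come out equal (39.5 each), consistent with an equality such as $e_1 = e_2$ on the slide, and the total length is 333. Comparing totals across topologies is again a search over tree shapes.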
BUT THERE ARE MANY DIFFERENT TREE SHAPES, AND WHICH IS CORRECT? DO WE NEED EXHAUSTIVE SEARCH OR GENETIC ALGORITHMS?
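A brute-force sketch of the exhaustive search, in Python with NumPy rather than MATLAB: generate every unrooted binary topology by stepwise addition of taxa, fit unconstrained least-squares branch lengths to each, and keep the topology with the smallest misfit $\|Ae - d\|_2$. The helper names and the tree encoding are hypothetical, the nonnegativity condition on $e$ is not enforced here, and the distances are again the upper triangle of the first matrix, taken as measured.

```python
from itertools import combinations
import numpy as np

def enumerate_trees(taxa):
    """All unrooted binary topologies on `taxa`, built by stepwise leaf addition.

    Each tree is a list of (u, v) edges; leaves are taxon names, internal nodes are ints.
    """
    trees = [[(0, taxa[0]), (0, taxa[1]), (0, taxa[2])]]   # the single 3-taxon star
    for node_id, leaf in enumerate(taxa[3:], start=1):
        grown = []
        for edges in trees:
            for i, (u, v) in enumerate(edges):
                # Split edge (u, v) with a new internal node and attach the new leaf there.
                rest = edges[:i] + edges[i + 1:]
                grown.append(rest + [(u, node_id), (node_id, v), (node_id, leaf)])
        trees = grown
    return trees

def leaf_paths(edges, taxa):
    """For every taxon pair, the indices of the edges on the (unique) path between them."""
    adj = {}
    for idx, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, idx))
        adj.setdefault(v, []).append((u, idx))
    paths = {}
    for s, t in combinations(taxa, 2):
        stack, seen = [(s, [])], {s}
        while stack:                       # depth-first search from s to t
            node, used = stack.pop()
            if node == t:
                paths[(s, t)] = used
                break
            for nxt, idx in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, used + [idx]))
    return paths

def fit_tree(edges, taxa, d):
    """Unconstrained least-squares branch lengths and the misfit ||A e - d||_2."""
    paths = leaf_paths(edges, taxa)
    pairs = list(combinations(taxa, 2))
    A = np.zeros((len(pairs), len(edges)))
    for row, pair in enumerate(pairs):
        A[row, paths[pair]] = 1.0
    dvec = np.array([d[pair] for pair in pairs])
    e = np.linalg.lstsq(A, dvec, rcond=None)[0]
    return e, float(np.linalg.norm(A @ e - dvec))

def exhaustive_search(taxa, d):
    """Fit every topology and keep the one with the smallest misfit."""
    best = None
    for edges in enumerate_trees(taxa):
        e, misfit = fit_tree(edges, taxa, d)
        if best is None or misfit < best[2]:
            best = (edges, e, misfit)
    return best

taxa = ["A", "B", "C", "D", "E"]
d = {("A", "B"): 0.0919, ("A", "C"): 0.1082, ("A", "D"): 0.1790, ("A", "E"): 0.2057,
     ("B", "C"): 0.1134, ("B", "D"): 0.1940, ("B", "E"): 0.2168,
     ("C", "D"): 0.1882, ("C", "E"): 0.2170, ("D", "E"): 0.2172}
edges, e, misfit = exhaustive_search(taxa, d)
print(len(enumerate_trees(taxa)), "topologies; best misfit", round(misfit, 5))
```

For $n$ taxa there are $(2n-5)!!$ unrooted binary topologies, 15 for $n = 5$, and the count grows super-exponentially, which is why heuristics such as genetic algorithms become necessary for larger data sets.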
HOW WAS THIS USEFUL?
- Introduction to data fitting, optimization, genetic algorithms, exhaustive search, and MATLAB routines
- Realistic solutions (positive branch lengths)
- A start on some multivariable calculus to derive the normal equations

OTHER APPLICATIONS USING SIMILAR TECHNIQUES
- Neural networks for classification: how do they learn?
- Data mining: k-means clustering
- Minimize energy: gradient descent
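The link between the normal equations and "minimize energy, gradient descent" can be sketched as follows, with hypothetical function names: the energy $E(e) = \tfrac{1}{2}\|Ae - d\|_2^2$ has gradient $A^T(Ae - d)$, so setting the gradient to zero recovers the normal equations, while stepping downhill and clipping at zero keeps branch lengths nonnegative (projected gradient descent). This is one standard way to impose $e \ge 0$, not necessarily the routine used in the course; the step size $1/L$, with $L$ the largest eigenvalue of $A^T A$, is a conventional safe choice.

```python
import numpy as np

def projected_gradient_ls(A, d, iters=10000, tol=1e-12):
    """Minimise E(e) = 0.5 * ||A e - d||^2 subject to e >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2         # 1/L, L = largest eigenvalue of A^T A
    e = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ e - d)                    # gradient of the energy E
        e_next = np.maximum(e - step * grad, 0.0)   # move downhill, then project onto e >= 0
        if np.linalg.norm(e_next - e) < tol:
            return e_next
        e = e_next
    return e

# Small check: the unconstrained minimiser of this problem is already nonnegative,
# so the result should agree with the normal-equations solution.
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
d = np.array([1.0, 2.0, 3.0])
print(projected_gradient_ls(A, d))                  # close to np.linalg.solve(A.T @ A, A.T @ d)
```

When the unconstrained solution has a negative component, the clipping step pins that component at zero, which is the "realistic solutions (positive branch lengths)" point above.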
Clustering has recently been demonstrated to be an important preprocessing step prior to parametric estimation from dynamic PET images. Clustering, as a form of segmentation, is useful in improving the accuracy of voxel-level quantification in PET images. Classical clustering algorithms such as hierarchical clustering and K-means clustering can be applied to dynamic PET data using an appropriate weighting technique. New variants of hierarchical clustering with different preprocessing criteria were recently developed by Dr. Guo. Our research focus is to validate these different algorithms with respect to their efficiency and accuracy. Different inter- and intra-cluster measures and statistical tests are considered to assess the quality of the different cluster results.
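A minimal sketch of one of the classical algorithms mentioned, K-means applied to voxel time-activity curves with a weighted squared distance $\sum_t w_t (x_t - c_t)^2$. The per-frame weights, the synthetic curves and the function names are hypothetical, and this is not Dr. Guo's hierarchical variant. The returned within-cluster sum of squares is one example of an intra-cluster measure.

```python
import numpy as np

def weighted_sq_dist(X, centres, w):
    """Weighted squared distance sum_t w[t] * (x[t] - c[t])**2 from every row of X to every centre."""
    return (((X[:, None, :] - centres[None, :, :]) ** 2) * w).sum(axis=2)

def weighted_kmeans(X, w, k, iters=100, seed=0):
    """K-means on rows of X (one time-activity curve per voxel) with per-frame weights w.

    Because every voxel shares the same frame weights, the best centre of a cluster
    under this distance is still the plain mean of its members.
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = weighted_sq_dist(X, centres, w).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
                        for j in range(k)])
        if np.allclose(new, centres):
            break
        centres = new
    labels = weighted_sq_dist(X, centres, w).argmin(axis=1)
    # Within-cluster weighted sum of squares: one simple intra-cluster quality measure.
    wcss = float(sum(weighted_sq_dist(X[labels == j], centres[j:j + 1], w).sum()
                     for j in range(k)))
    return labels, centres, wcss

# Synthetic demo: two kinetic shapes plus noise (hypothetical frames and weights).
rng = np.random.default_rng(1)
t = np.linspace(0, 60, 12)                   # frame mid-times in minutes
fast, slow = 1 - np.exp(-t / 5), 1 - np.exp(-t / 40)
X = np.vstack([fast + 0.02 * rng.standard_normal((50, 12)),
               slow + 0.02 * rng.standard_normal((50, 12))])
w = np.ones(12)                              # e.g. proportional to frame duration
labels, centres, wcss = weighted_kmeans(X, w, k=2)
print("cluster sizes:", np.bincount(labels), "WCSS:", round(wcss, 4))
```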
Otolith Aging and Analysis
William T. Stewart
Advisors: Dr. Rosemary Renaut, Dr. Paul Marsh (Arizona State University); Scott Byan, Kirk Young, Marianne Meding (Arizona Game and Fish Department)
- Otoliths, also known as earstones, are paired calcified structures used for balance and hearing in teleost fish. An otolith is acellular and metabolically inert, providing biologists with a record of exposure to both the temperature and the composition of the ambient water. Otoliths provide an abundance of information, including temperature history, detection of anadromy, determination of migration pathways, stock identification, use as a natural tag and, most importantly, age validation. Growth rings (annuli) on the otolith record the age and growth of a fish from birth to death. Using MATLAB, the goal of this project is to design a program that uses digital otolith images to semi-automate the aging process. There are three main components to this task.
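The three components are not listed here. Purely as a hypothetical illustration of the kind of step such a program might include, the sketch below counts candidate annuli as bright bands along a one-dimensional intensity transect from the otolith core to its edge: smooth with a moving average, then count local maxima that rise above a threshold. The transect, window size, threshold and function name are assumptions, and the project itself was developed in MATLAB.

```python
import numpy as np

def count_rings(profile, window=5, min_height=0.1):
    """Count candidate annuli as bright peaks in a core-to-edge intensity profile (hypothetical).

    window:     length of the moving-average smoother (odd).
    min_height: how far above the median of the smoothed profile a peak must rise.
    """
    half = window // 2
    padded = np.pad(np.asarray(profile, dtype=float), half, mode="edge")
    smooth = np.convolve(padded, np.ones(window) / window, mode="valid")  # same length as profile
    interior = smooth[1:-1]
    is_peak = (interior > smooth[:-2]) & (interior >= smooth[2:])         # interior local maxima
    peaks = np.flatnonzero(is_peak) + 1
    peaks = peaks[smooth[peaks] - np.median(smooth) > min_height]
    return len(peaks), peaks

# Synthetic transect with four interior bright bands (made-up data).
r = np.linspace(0, 1, 400, endpoint=False)
profile = 0.5 + 0.4 * np.cos(2 * np.pi * 5 * r)
n_rings, positions = count_rings(profile)
print(n_rings, positions)
```

On real images, the transect would first have to be extracted from the digital otolith image, and the band at the core or edge may need special handling.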
New technology allows hundreds of pathology specimens from human diseases to be sampled as 0.6 mm punches of tissue that are arrayed into new TMA paraffin blocks. These blocks are then sectioned with microtomes to produce hundreds of slides containing hundreds of human tissue specimens (tissue microarrays, TMAs). Databases to support analysis of these high-throughput TMAs will include information on diagnosis, treatment, disease response, and multiple images from follow-on studies linked to the coordinates of each of the hundreds of punches on the TMA. Data mining of the results of TMA experiments will allow text mining and image feature extraction. In this project, we present the requirements, design, and a prototype of a web-based TMA database application.
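A minimal relational sketch of the kind of schema described above, with clinical information and follow-on images linked to the grid coordinates of each punch. It uses Python's built-in sqlite3 only to stay self-contained; the table and column names, the query function and the sample rows are hypothetical and do not describe the actual prototype, which was a web-based application.

```python
import sqlite3

SCHEMA = """
CREATE TABLE specimen (
    specimen_id INTEGER PRIMARY KEY,
    diagnosis   TEXT NOT NULL,
    treatment   TEXT,
    response    TEXT
);
CREATE TABLE tma_block (
    block_id    INTEGER PRIMARY KEY,
    description TEXT
);
-- One row per tissue punch: its grid position on a block and its source specimen.
CREATE TABLE punch (
    punch_id    INTEGER PRIMARY KEY,
    block_id    INTEGER NOT NULL REFERENCES tma_block(block_id),
    grid_row    INTEGER NOT NULL,
    grid_col    INTEGER NOT NULL,
    specimen_id INTEGER NOT NULL REFERENCES specimen(specimen_id),
    UNIQUE (block_id, grid_row, grid_col)
);
-- Any number of images from follow-on studies can be linked to one punch.
CREATE TABLE image (
    image_id INTEGER PRIMARY KEY,
    punch_id INTEGER NOT NULL REFERENCES punch(punch_id),
    study    TEXT,
    path     TEXT NOT NULL
);
"""

def build_demo_db():
    """In-memory database filled with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")     # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?, ?)",
                     [(1, "diagnosis X", "treatment 1", "responder"),
                      (2, "diagnosis Y", "treatment 2", "non-responder")])
    conn.execute("INSERT INTO tma_block VALUES (1, 'demo block')")
    conn.executemany("INSERT INTO punch VALUES (?, ?, ?, ?, ?)",
                     [(1, 1, 0, 0, 1), (2, 1, 0, 1, 2), (3, 1, 1, 0, 1)])
    conn.executemany("INSERT INTO image (punch_id, study, path) VALUES (?, ?, ?)",
                     [(1, "study A", "img/p1_a.png"), (1, "study B", "img/p1_b.png"),
                      (3, "study A", "img/p3_a.png")])
    conn.commit()
    return conn

def images_for_diagnosis(conn, diagnosis):
    """Block, grid coordinates and image path for every imaged punch with this diagnosis."""
    return conn.execute(
        """SELECT p.block_id, p.grid_row, p.grid_col, i.path
           FROM specimen AS s
           JOIN punch AS p ON p.specimen_id = s.specimen_id
           JOIN image AS i ON i.punch_id = p.punch_id
           WHERE s.diagnosis = ?
           ORDER BY p.grid_row, p.grid_col, i.path""",
        (diagnosis,),
    ).fetchall()

conn = build_demo_db()
print(images_for_diagnosis(conn, "diagnosis X"))
```

The UNIQUE constraint on (block, row, column) enforces that each grid position on a block holds exactly one punch, while the separate image table lets many follow-on images attach to the same coordinates.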
Sequencing a Microbial Genome
Maulik Shah
Advisors: Dr. Jeffrey Touchman, Dr. Rosemary Renaut, Dr. Phillip Stafford
- Although many genomes are available for download today, the underlying technologies should not be taken for granted. By using shotgun sequencing techniques and a gauntlet of informatics, we are able to produce high-quality DNA sequence. We will first look at some of the robotics and chemistries used to prepare DNA samples for the sequencing instruments. Then we will look at the series of applications used to take raw data signals, convert them to sequence, and finally assemble the data into a single genome. Highlighted will be some of the techniques used to speed up the informatics processes, as well as some of the challenges that informatics faces in processing data and assembling the genome.
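The final step, assembling reads into a genome, can be illustrated with a toy greedy overlap assembler: repeatedly merge the pair of sequences with the longest suffix-prefix overlap. This is a teaching sketch only, assuming error-free reads and no repeats; production assemblers handle sequencing errors, quality scores and repeats, and work very differently at genome scale. The function names and reads are made up.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` equal to a prefix of `b`, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Toy greedy assembler: repeatedly merge the pair of contigs with the longest overlap."""
    unique = list(dict.fromkeys(reads))                    # drop exact duplicate reads
    # Drop reads that are contained in another read.
    contigs = [r for r in unique if not any(r != s and r in s for s in unique)]
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:                                     # no overlaps left: done
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)] + [merged]

reads = ["ACGTTGCA", "TTGCATGA", "CATGACCT", "GACCTAGG"]   # made-up error-free reads
print(greedy_assemble(reads))
```

A repeat longer than the overlap threshold would make the greedy choice ambiguous, which is one of the assembly challenges the abstract alludes to.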
Supertree Analysis of the Plant Family Fabaceae
Tiffany J. Morris
Advisor: Martin F. Wojciechowski, School of Life Sciences, Arizona State University
The Tree of Life is a national and international project to collect information about the origin, evolution, and diversity of organisms, with the goal of producing a tree of all life on Earth (Pennisi, 2003). The obstacles to achieving this goal are many: from questions related to the kinds and amount of data to be used, to building that phylogeny, to the methodological and computational resources required to analyze the massive amounts of data expected to be necessary to bring this to fruition. The goal of this project is to obtain a supertree for the plant family Fabaceae using phylogenetic trees found in previously published studies.
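One widely used way to build a supertree from published trees is matrix representation with parsimony (MRP): each clade in each source tree becomes a binary character, and the combined matrix is then analysed with a parsimony program. The abstract does not say which supertree method the project used, so the sketch below only illustrates MRP coding; the nested-tuple tree format, the function names and the taxon names are hypothetical placeholders.

```python
def clades(tree):
    """Return (leaf set of `tree`, list of the leaf sets of every clade in it, itself last).

    A tree is a taxon name (str) or a tuple of subtrees; rooted source trees are assumed.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_found = clades(child)
        leaves |= child_leaves
        found += child_found
    return leaves, found + [leaves]

def mrp_matrix(trees):
    """MRP coding: one binary character per informative clade of each source tree.

    1 = taxon inside the clade, 0 = in that source tree but outside the clade,
    ? = taxon not sampled by that source tree.
    """
    all_taxa = sorted(set().union(*(clades(t)[0] for t in trees)))
    columns = []
    for tree in trees:
        leaves, found = clades(tree)
        for clade in found:
            if 1 < len(clade) < len(leaves):   # single taxa and the full tree carry no grouping
                columns.append({x: ("1" if x in clade else "0") if x in leaves else "?"
                                for x in all_taxa})
    return {x: "".join(col[x] for col in columns) for x in all_taxa}

# Two hypothetical source trees with partially overlapping taxa.
t1 = ((("A", "B"), "C"), "D")
t2 = (("B", "C"), ("D", "E"))
for taxon, row in mrp_matrix([t1, t2]).items():
    print(taxon, row)
```

The parsimony search that turns this matrix into a supertree is not shown.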
PROTEIN INTERACTION MAPPING: USE OF OSPREY TO MAP SURVIVAL OF MOTOR NEURON PROTEIN INTERACTIONS
by Margaret Barnhart
Advisor: Dr. Ron Nieman
- Spinal Muscular Atrophy is one of the leading genetic causes of death in infants. In humans, the disease state is characterized by homozygous deletion of the telomeric copy of the survival of motor neuron gene (SMN1). The centromeric copy, SMN2, rescues lethality by producing a small amount of full-length SMN protein as its minor product. The SMN gene was first characterized in 1995, and research efforts to describe the molecular mechanisms of SMN protein in the cell have since revealed a highly complex set of functions and interactions for SMN. The large amount of protein-protein interaction data collected for SMN exceeds the capacity of current methods of interaction visualization. Osprey allows a network representation of protein-protein interactions and has been used to describe the recorded sets of interactions of SMN. This method of interaction visualization allows relationships to be drawn between the functions of SMN and analogous proteins, clustering of interactions based on level of interaction or function, and, ultimately, the derivation of clues to the critical function of SMN.
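Osprey itself is an interactive visualization tool; the network representation it draws can be sketched as an adjacency structure. The minimal sketch below, with hypothetical function names, protein names, interaction list and functional annotations, builds that graph and groups a protein's interaction partners by annotated function, one of the groupings the abstract mentions. It does not reproduce Osprey's features or the SMN data.

```python
from collections import defaultdict

def build_network(interactions):
    """Undirected interaction graph: protein -> set of interaction partners."""
    graph = defaultdict(set)
    for a, b in interactions:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def partners_by_function(graph, protein, annotation):
    """Group the direct partners of `protein` by functional category ('unknown' if unannotated)."""
    groups = defaultdict(list)
    for partner in sorted(graph.get(protein, ())):
        groups[annotation.get(partner, "unknown")].append(partner)
    return dict(groups)

# Placeholder data only: not curated SMN interactions.
INTERACTIONS = [("SMN", "P1"), ("SMN", "P2"), ("SMN", "P3"), ("P1", "P2"), ("P3", "P4")]
ANNOTATION = {"P1": "RNA processing", "P2": "RNA processing", "P3": "transport"}

net = build_network(INTERACTIONS)
print("partners of SMN:", len(net["SMN"]))
print(partners_by_function(net, "SMN", ANNOTATION))
```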
- For more information, please contact renaut_at_asu.edu
- More information on projects: www.asu.edu/compbiosci