Title: MB280 Introduction to Bioinformatics (3 credits): Genes, machines and you
1MB280 Introduction to Bioinformatics (3
credits): Genes, machines and you. Learn the
basics of analyzing DNA and protein sequences
using state-of-the-art computer software.
Note: This is an experimental class for graduate
students, with limited space for undergraduates.
Last year's class, which incorporated both student
groups, was highly successful.
MB535-Lab Genomic Analysis (4 credits)
Learn the basics of Bioinformatics database
searching, sequence analysis, phylogenetic
reconstruction and in silico research design.
Take advantage of biological databases that will
inform and direct your future research agenda.
- Tuesday lecture 2:00-3:15
- Thursday laboratory 2:00-5:00
- INBRE Bioinformatics Lab
- Cooley Lab B2
- Professor Marcie McClure
- mars_at_parvati.msu.montana.edu
2What was covered last lecture?
A little bit about the machines. General
concepts about how hardware and software
interact. Basic definitions.
3- Tentative syllabus.
- 8/28/07 1) Class orientation, Introduction to computers and operating languages
- 8/30/07 1st Lab Assignment: know the machines your monitor is accessing/using your own computer
- i. Know how to access the Internet; send me an email.
- ii. Do the Unix/Linux tutorials.
- iii. Do a web search for the terms bioinformatics and computational biology.
- iv. Create a list of sites and types of methods necessary to do bioinformatics and computational biology.
- 9/4/07 2) What is Bioinformatics/Computational Biology
- 9/6/07 2nd Lab Assignment
- i. Loading software on your own machine.
- ii. The NCBI/EBI/PR sites, familiarize yourself with them.
- iii. Do a conceptual translation of a nucleic acid sequence.
- iv. Choose a gene family to work on for the rest of the semester.
- 9/11/07 3) Database searching and pairwise alignments
- 9/13/07 3rd Lab Assignment
4Computational Biology is biology that cannot be
done without the intensive use of computers.
There are many domains in Computational Biology:
- Ecology
- Evolutionary Biology
- Structural Biology
- Bioinformatics
- Physiology
McClure, 2000
5What's in a name?
Encodeome, Expressome, Interactome,
Mobilome, Retrome: "Oming" is very
sexy!
- Genomics--DNA/RNA
- Transcriptomics--RNA
- Proteomics--Proteins
- Phenomics--Proteins
- Operomics--NA/Proteins
Biological Informatics, In silico research,
Computational Biology, Bioinformatics: Phrase of
the month!
McClure, 2000
6What is Bioinformatics/Computational Biology?
These terms are used to describe technical and
methodological approaches to studying biology
that are computer-based. The goal of this
research is the creation of new knowledge, or
meta-data, from existing primary data. This
type of research takes place in silico and
includes the development and testing of the
software tools necessary to analyze biological
data.
McClure, 2000
7Multidisciplinary Nature of Bioinformatics/Computational Biology
Biological Sciences
How much of which does who need to learn?
Computer/Systems Science
Mathematics, Statistics
McClure 2002
8Opportunities in Bioinformatics/Computational
Biology
Training
Ph.D.
M.A.
B.A.
Service Providers, Research Staff, Principal
Investigators
Industry
Government
Academia
McClure 2002
9Bioinformatics/Computational Biology
New Knowledge
Evolution
Structure
Function
McClure, 2000
10The practice of Bioinformatics/Computational
Biology is an interplay between knowledge of
empirically derived data, bioinformatic tools
and human decision making. Exactly which
information and tools are to be accessed is
dependent on the nature of the question of
interest.
McClure, 2000
11- What is this class?
- Bioinformatics/Computational Biology
- 1) A brief intro to computing concepts
- 2) Basic concepts
- a) databases
- b) search
- c) align
- d) structure/function methods
- 3) How to use Bioinformatic methods
- 4) Designing in silico experiments
- 5) Interpretation of results
12Central Dogma of Bioinformatics
- i. databases
- ii. searches
- iii. sequence annotation/gene ontologies
- iv. alignments
- v. phylogenetics
- vi. functional predictions
- vii. structural predictions
- gene expression pathways and microarray data
- modeling
- math and algorithms
- programming
13McClure, 2000
14Why is critical reading of the literature
important? Why is reading the software manual
important?
McClure, 2000
15Levels of Analysis of Primary Structure Data
COMPLETE GENOME (global relationships)
- 1) universal versus unique genes
- 2) consensus phylogenetic relationship
- 3) genome architecture (deviation from tree-like behavior)
INDIVIDUAL GENES (local relationships)
- 1) congruency of phylogenies for individual genes
- 2) relative rates of change
- 3) recombination
- 4) gene architecture
- 5) gene product structure and interactions
INTRAGENIC (sub-local relationships)
- 1) rate of change
- 2) recombination
- 3) motif analysis
McClure, 2000
16Primary Structure: the Sequence
Sequence Alignment
Phylogenetics: >70% id N.A., <70% id A.A.
2-D and 3-D Predictions
OSM: <30% id A.A.
function
evolution
structure
McClure, 2001
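The percent-identity thresholds on this slide (70% and 30%) presuppose a way of measuring identity between two aligned sequences. The following is a minimal sketch, assuming one common definition: identical residues divided by the number of aligned columns in which neither sequence has a gap. Other definitions exist, and the slide does not say which one is intended. The function name `percent_identity` is hypothetical; the two example strings are motif II of HIV1 and FIV from the reverse transcriptase motif slide in this deck.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same alignment, so they must
    have equal length. This is only one of several definitions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # columns where those residues match
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    hiv1_motif_ii = "VTVLDVGDA"
    fiv_motif_ii = "VTVLDIGDA"
    # 8 of the 9 positions match (V versus I at the sixth position).
    print(round(percent_identity(hiv1_motif_ii, fiv_motif_ii), 1))
```

Short conserved motifs such as these are far more similar than the full-length proteins they come from, so a value computed on a motif should not be compared directly with the whole-sequence thresholds above.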
18From McClure, 1991
19Ordered Series of Motifs for the Reverse Transcriptase
Name       I          II         III       IV          V          VI
LINE       ILIPKPGRD  LMNIDAKIL  TGTRQGCP  SLFADDMIVY  RIKYLGIQL  PCSWVGRIN
LHERV      WPVQKTDGS  YAAIDLANA  TVLPQGYI  VHYIDDIMLI  SVKFLGSSG  HISYLGVLF
EHERV      LPVPKPGTK  FTCLDLKDA  TQLPQRFK  LQYVDDLLLG  QVCYLGFTI  VREFLGAVG
FHERV      ILPIKKPDG  FSVLDFKDF  TILHQGFR  LQHEDDLLLC  KVSYLGLII  LLSFLGLVG
WHERV      LGVQKPNRQ  FTVLDLQDA  TILPQGFR  SVGVDDLLLA  SQQYLGLKL  LRGFLGVIG
FRDHERV    ILTVKKTNG  FSVLDFKNF  TVLPQGFR  LQYMDDLLIC  AIQYLGIIM  FAFLGITR
SHERV      WPVRKPDGT  HFVVDLANA  TMLPQGYV  FHYIDDIMIL  SAKLLGVIW  FVGFLGYQ
RHERV      NLSGKKQYP  FTVLDLKDA  TVLPQGFK  LQYVDDLLIS  TIEYLGFLL  LKGFLGMAG
T47DHERV   ILPVKKSDG  FTVIDLKVD  TVLPQGFT  LQYMDDLLIS  EVKYLGHLI  LRKFLGLVT
KHERV      FVIQKKSGK  LIIIDLKDC  KVLPQGML  IHCIDDILCA  PFHYLGMQI  FQKLLGDIN
IHERV      ILPVKKSDG  FTVIDLKDA  TVLPQGFM  LQYVDDILIS  KVKYLGRLI  LRKFLGLVG
HHERV      LPVQKPDKS  YSVLDLKDG  TVLPQGFR  IQYIDELLLC  SVTYLGIIL  LLSFLGMVG
FMuLV      LPVKKPGTN  YTVLDLKDA  TRLPQGFK  LQYVDDLLLA  QVKYLGYLL  LREFLGTAG
HTLV1      FPVKKANGT  LQTIDLKDA  RVLPQGFK  LQYMDDILLA  TIKFLGQII  LQALLGEIQ
SRV2       FVIKKKSGK  KIVIDLKDC  KVLPQGMA  IHYMDDILIA  PYTYLGFQI  FQKLLGDIN
Snakehead  WPVGKPDGS  YSSLDISNG  TRLPQGFH  LQYVDDILLM  QVQYLGVNV  LRSALGLFN
Spuma      YPVPKPDGR  KTTLDLANG  TRLPQGFL  QVYVDDIYLS  TVEFLGFNI  LQSILGLLN
FIV        FAIKKKSGK  VTVLDIGDA  CSLPQGWI  YQYMDDIYIG  PYTWMGYEL  LQKLAGKIN
HIV1       FAIKKKDST  VTVLDVGDA  NVLPQGWK  YQYMDDLYVG  PFLWMGYEL  IQKLVGKLN
Dirs       FTVPKPGTN  MVKLDIKKA  KTMPFGLS  IAYLDDLLIV  SITFLGLQI  PRKLAGLKG
Gypsy      VLVPKKDGT  FTTLDLHSG  TVMPFGLV  NVYLDDILIF  ETEFLGYSI  AQRFLGMIN
Caulimo    KRRGKKRMV  FSSFDCKSG  NVVPFGLK  CVYVDDILVF  KINFLGLEI  LQRFLGILT
Badna      EVAQKPRIV  FSKFDLKAG  NVCPFGIA  LLYIDDILIA  EVEYLGVEI  LQAYLGLLN
HBV        FLVDKNPHN  WLSLDVSAA  RKIPMGVG  FSYMDDVVLG  SLNFMGYVI  IVGLLGFAA
Copia      WTITKRPEN  KYQIDYEET  MRLPQGIS  LLYVDDVVIA  IKHFIGIRI  CRSLIGCLM
Intron     VGGEKGPYS  TGRIDDQEN  GLTPKTEF  VRYADDLLLG  TVEFPGMVI  KFRNLGNSI
Retron     TVEKKGPEK  ILNIDLEDF  NLLPQGAP  TRYADDLTLS  QRKVTGLVI  HHIFCGKSS
PMAUP      VYIPKANGK  FPSVDLAYL  NGVPQGAS  IMYADDGILC  SVKFLGLEF  YIQVLGYLP
Archaea    IEIPKKSGG  LLEFDIKGL  KGTPQGGV  ERYADDSVIH  KFDFLGYTF  WVNYYGLFY
HTERT      RFIPKPDGL  FVKVDVTGA  QGIPQGSI  LRLVDDFLLV  EDEALGGTA  RRKLFGVLR
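One way to read an ordered series of motifs is to ask which alignment columns are invariant and which sequences depart from a shared pattern. The sketch below does this for six motif IV entries copied from the slide above. The helper names `fully_conserved_columns` and `lacking_pattern` are hypothetical, and the choice of the paired aspartates ("DD") as the pattern to check is an illustrative assumption, not a method stated in the lecture.

```python
# Motif IV for six of the sequences listed on the slide (copied verbatim).
MOTIF_IV = {
    "LINE":  "SLFADDMIVY",
    "HHERV": "IQYIDELLLC",
    "FMuLV": "LQYVDDLLLA",
    "HIV1":  "YQYMDDLYVG",
    "HBV":   "FSYMDDVVLG",
    "HTERT": "LRLVDDFLLV",
}


def fully_conserved_columns(seqs: list[str]) -> list[int]:
    """Return 0-based column indices where every sequence has the same residue."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all motif strings must have the same length")
    return [i for i in range(length) if len({s[i] for s in seqs}) == 1]


def lacking_pattern(motifs: dict[str, str], pattern: str) -> list[str]:
    """Return the names whose motif string does not contain the pattern."""
    return [name for name, seq in motifs.items() if pattern not in seq]


if __name__ == "__main__":
    # Which columns are identical across all six sequences?
    print(fully_conserved_columns(list(MOTIF_IV.values())))
    # Which entries lack the paired aspartates seen in the others?
    print(lacking_pattern(MOTIF_IV, "DD"))
```

In this subset only the first aspartate column is invariant, because the HHERV entry reads "DE" where the other five read "DD".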
20- Structure predictions
- Primary sequence
- Secondary
- Tertiary
- Fold prediction
- Homology modeling
- Disorder prediction
- Interaction predictions
- Among and between proteins
- Expression predictions
21Predicting Interactions of the Replication/Transcription Complex
Multiple Alignment
Experimental data regarding interactions of L, N
and P
N, P and L sequences
Evolutionary Dynamics Analysis
Predict regions of disorder
Inter-CM analysis
Phylogenetic reconstruction
ESF-analysis
Intra-CM analysis
Integration of Heterogeneous Data Sources in a
Bayesian Framework
Most Probable Amino Acid Contact Points
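The slide does not say how the heterogeneous data sources are combined. As a rough illustration only, the sketch below assumes a naive Bayes scheme: each evidence source contributes a likelihood ratio for "these two residues are in contact", the sources are treated as independent, and the ratios are multiplied into the prior odds. The function name and every number used are hypothetical and are not taken from the lecture.

```python
from math import prod


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Combine a prior probability with independent likelihood ratios.

    Assumption (naive Bayes): the evidence sources are conditionally
    independent, so posterior odds = prior odds * product of the ratios.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")

    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Hypothetical values: a residue pair with a 50% prior chance of contact,
    # supported by two evidence sources with likelihood ratios of 2 and 3.
    print(posterior_probability(0.5, [2.0, 3.0]))
```

Ranking candidate residue pairs by this posterior would be one way to arrive at a list of most probable contact points. The independence assumption is strong, and correlated evidence sources would overstate the result.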
22Search Databases: Sequence, Literature, Structural, Other??
Data: Retrieve, Annotate, Manage
Determine Methodological Limitations
Analyze Data:
- Multiple Alignment of Sequences
- OSM/MIR Determination
- 2D and 3D Modeling
- Phylogenetic Reconstruction
- Gene and Genome Architecture
- Structural Determination
McClure, 2001
23A) Experimental Types
- 1) analytical
- a) datamining
- b) functional determination
- c) structural determination
- d) hypothesis generation and testing
- 2) technical
- a) algorithm/software development
- b) comparative testing of methods
B) Data acquisition +/- a priori information
- 1) search methods and limitation
- 2) databases
C) Experimental Design
- 1) literature/experimental knowledge
- 2) data maskers/biological knowledge
- 3) variables
- 4) controls
D) Running programs and collecting the resulting data
- 1) program limitations
- 2) data manipulation
E) Analyzing data
- 1) numerical data
- 2) qualitative data
F) Data presentation
McClure, 2001
24- 9/6/07
- 2nd Lab Assignment
- i. Loading software on your own machine.
- ii. The NCBI/EBI/PR sites, familiarize yourself with them.
- iii. Do a conceptual translation of a nucleic acid sequence.
- iv. Choose a gene family to work on for the rest of the semester.
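For the conceptual translation item in the assignment above, the following minimal sketch may help show what the operation involves: read a nucleic acid sequence three bases at a time and look each codon up in the standard genetic code. It assumes the standard code only. The function name `translate`, the `frame` parameter, and the use of "X" for unrecognized codons are illustrative choices, and the example sequence is made up.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order
# (first base varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str, frame: int = 0) -> str:
    """Conceptually translate a DNA or RNA sequence in one forward frame.

    frame is 0, 1 or 2 (number of leading bases skipped). Codons that
    contain anything other than A, C, G, T/U are reported as "X".
    Trailing bases that do not fill a codon are ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1 or 2")
    dna = sequence.upper().replace("U", "T")
    protein = []
    for start in range(frame, len(dna) - 2, 3):
        codon = dna[start:start + 3]
        protein.append(CODON_TABLE.get(codon, "X"))
    return "".join(protein)


if __name__ == "__main__":
    example = "ATGGCCTAA"  # made-up sequence: Met, Ala, stop
    for frame in range(3):
        print(frame, translate(example, frame))
```

A full conceptual translation would usually also cover the three frames of the reverse complement, which this sketch leaves out.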