Computational Virology - PowerPoint PPT Presentation

1
Computational Virology
Lectures in Bioinformatic Studies on the Evolution, Structure
and Function of RNA-based Life Forms
Marcella A. McClure, Ph.D.
Department of Microbiology and the Center for Computational Biology
Montana State University, Bozeman, MT
mars_at_parvati.msu.montana.edu
2
Summary Lecture I
  • 1) Introduction to RNA-based life forms.
  • 2) Methods to test the hypothesis.
  • 3) Testing the hypothesis.
  • 4) Predicting protein contacts.

3
The World of Viruses
[Diagram labels: DNA viruses; RNA viruses; RdDp; ssRNA; dsRNA; ssDNA; dsDNA; RdRp; host Pol II; ssRNA; - ssRNA]
Does the RT domain of the RdDp share common
ancestry with the RdRp of negative and positive
polarity, single-stranded viruses?
4
Rhabdoviridae
Paramyxoviridae
Filoviridae
Retroviridae
Picornaviridae
5
Retroid Agents
Retroviruses, retrotransposons, pararetroviruses, retroposons,
retroplasmids, retrointrons, and retrons:
reverse transcriptase-mediated replication or transposition
RNA viruses, e.g., Ebola, rabies, influenza, polio
All cellular systems, most DNA viruses
[Diagram labels: RNA; DNA; transcription; replication by DNA-dependent DNA polymerase; replication by RNA-dependent RNA polymerase; translation; snRNAs, ribozymes, tRNA, rRNA; protein synthesis]
McClure, 2000
6
Mononegavirales
OLD FOES: rabies (Rhabdoviridae); measles, RSV, mumps (Paramyxoviridae)
EMERGING THREATS: Ebola, Marburg (Filoviridae); equine morbillivirus, Nipah virus (Paramyxoviridae)
MODEL AGENT: vesicular stomatitis virus (Rhabdoviridae)
7
Roles of Retroid Agents
1) Disease
   a) Retroviruses
      1) exogenous, infectious: HIV, HTLV
      2) endogenous, associations: breast cancer, testicular tumors,
         insulin-dependent diabetes, multiple sclerosis, rheumatoid
         arthritis, schizophrenia and systemic lupus erythematosus
   b) LINEs: insertional mutagenesis
      1) Hemophilia A
      2) muscular dystrophies: Duchenne and Fukuyama-type congenital
      3) X-linked disorders: Alport Syndrome-Diffuse Leiomyomatosis
         and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms
8
Plus-strand RNA Virus Families and Human Diseases
Togaviridae - Rift Valley Fever
Flaviviridae - Dengue Fever virus, West Nile virus
Coronaviridae - Infectious Bronchitis
Caliciviridae - Hepatitis E virus
Picornaviridae - Human poliovirus, Hepatitis A
9
VSV Transcription
[Figure: VSV transcription and VSV replication. Labels: leader; N; P; L; 5'; 3'; read through; co-assembly]
10
RNA Template
11
Replication
12
Model of a poliovirus polymerase-dsRNA complex
HIV-1 Reverse Transcriptase
Poliovirus Polymerase
Poliovirus Polymerase Oligomer
Model of a poliovirus polymerase-dsRNA complex
based on the structure of HIV-1 RT complexed to
dsDNA (Huang et al., 1998).
13
Rhabdoviridae Genome
Paramyxoviridae Genome
Filoviridae Genome
N VP35 VP40 G
VP30 VP24 RdRp
MMLV Genome
Picornaviridae Genome
RdRp
VPg
Poly(A)
L P4 P2 P3 P1 2A 2B 2C
3A 3B 3C 3D
14
RdRp of Plus strand viruses
GDD
RdRp of Mononegavirales
GDNQ
RdDp

FADDM
RT
RH
HYPOTHESIS: The Reverse Transcriptase domain of
the RNA-dependent DNA Polymerase shares common
ancestry with the RNA-dependent RNA Polymerase
of the Order Mononegavirales and Plus Strand RNA
viruses.
15
Biological Patterns
Whether randomness can be measured is a
difficult problem. One cannot judge the absence
of pattern without specifying which pattern, and
what is a pattern to you may not be a pattern to
me.
McClure, 2000
16
Basic Strategy
Search Databases
Annotation and Preparation of Sequences
Multiple Alignment of Sequences
Refined Multiple Alignment
Analysis of Multiple Alignment
McClure, 2000
17
What is an ordered series of motifs (OSM)?
An OSM, which may span hundreds of residues, is
defined as a set of conserved or semi-conserved
motifs (1-9 contiguous amino acid residues) found
in the same arrangement relative to one another
in all sequences of a protein family. The amino
acids of these patterns are involved in catalysis
or structural integrity. The spacing between
motifs, or motif intervening regions (MIRs), can be
highly variable, reflecting the regions of a
protein that are less restricted by functional or
structural constraints. MIRs may evolve more
rapidly and be more subject to insertion/deletion
events and duplications than the OSM.
Why is OSM identification important?
The OSM of a protein family can be used to predict
function. The identification of an OSM common among
protein sequences with as little as 8% amino acid
identity has led to successful prediction of
function. If a multiple alignment method (be it
global or local) cannot correctly identify the
highly conserved residues of a given sequence
that are critical for function and structure,
then it is of little value.
McClure 2002
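A minimal sketch of what checking for an OSM could look like, assuming each motif is written as a short regular expression and that an OSM is "present" when every motif is found in the stated order, with any spacing (the MIRs) in between. The function name `find_osm`, the three motif patterns and the toy sequences are hypothetical illustrations, not taken from the lecture, and the greedy left-to-right scan is only a simple stand-in for a real motif detection method.

```python
import re

def find_osm(sequence, motifs):
    """Return the start position of each motif if all motifs occur
    in the given order (left to right), otherwise return None."""
    positions = []
    start = 0
    for pattern in motifs:
        match = re.compile(pattern).search(sequence, start)
        if match is None:
            return None            # a motif is missing or out of order
        positions.append(match.start())
        start = match.end()        # the next motif must lie downstream
    return positions

# Hypothetical motifs and sequences, for illustration only.
motifs = ["D[LIV]", "G..G", "[YF].DD"]
family = {
    "seq_a": "MKDLAAAGSSGPPPPYMDDK",
    "seq_b": "DVGAAGTTTTTTTTTFADDM",
    "seq_c": "YMDDAAAGSSGDL",       # same motifs, but in a different order
}

for name, seq in family.items():
    print(name, find_osm(seq, motifs))
```

The spacing between hits differs between seq_a and seq_b, which is the point of the definition: only the relative order of the motifs is required, not a fixed distance between them.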
18
Levels of Sequence Comparisons
McClure, 2000
19
Example of local subsequences or OSM
McClure, 2000
20
Strategy for Assessing Protein Sequence Homology
Protein Sequence Data
SEQUENCE COMPARISON
>30% identical: homology
<30% identical
MOTIF DETECTION
Support for homology: statistical tests
OSM present: functionally equivalent, likely homologue
Functional identification, phylogenetic
analysis, structural prediction
Support for homology: gene order and size,
common function
McClure, 2000
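The first branch of this strategy can be sketched as a percent identity calculation on a pairwise alignment, followed by the 30 percent cut-off. This is an illustrative sketch only: the function names, the toy aligned sequences, and the choice to count identity over positions where neither sequence has a gap are assumptions (other denominators are also in common use).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue               # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

def next_step(identity, threshold=30.0):
    """Route a sequence pair according to the 30 percent identity cut-off."""
    if identity > threshold:
        return "homology"
    return "motif detection"

# Hypothetical pre-aligned sequences, for illustration only.
a = "MKT-AYIAKQR"
b = "MRTGAY-AHQK"
identity = percent_identity(a, b)
print(round(identity, 1), next_step(identity))
```

Pairs that fall below the cut-off are not rejected; they are passed on to motif detection, where the presence of a shared OSM is used as the evidence for homology.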
21
Do RNA-Dependent Polymerases Share Common Ancestry?
22
Experimental Design for Testing Motif Detection
Methods
Methods: appropriateness, availability, assumptions,
limitations, user-specific parameters
Benchmark sequences: biologically informative
markers, sequence length distribution, evolutionary
distribution, set size
Parameter Range Tests
Types of Test Data
Evaluate Results for Correct Identification of
Biologically Informative Marker
Method(s) that Accurately Identify Biologically
Informative Marker
RdRp and RdDp sequences
Test hypothesis: RdRp share common ancestry with
RdDp
23
RdRp of Plus strand viruses
GDD
RdRp of Mononegavirales
GDNQ
RdDp

FADDM
RT
RH
HYPOTHESIS: The Reverse Transcriptase domain of
the RNA-dependent DNA Polymerase shares common
ancestry with the RNA-dependent RNA Polymerase of
the Order Mononegavirales and Plus Strand RNA
viruses.
25
Sequence Length, Percent Identity and Distance
Values
26
Small Dataset Output
27
Large Dataset Output
28
New work
A Functional Genomics Approach to Inferring Amino
Acid Contacts Among the L, P and N proteins of
the Replication/Transcription Complex of the
Order Mononegavirales
1) Protein disorder
  • Low hydrophobicity and high mean net charge are
    good indicators of natively unfolded proteins
  • Predictor of Naturally Disordered Regions
    (PONDR): utilizes neural networks to distinguish
    disordered from ordered regions
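The hydrophobicity and net charge indicator can be sketched as a charge-hydropathy calculation. The slides do not give a formula, so the details here are assumptions: hydrophobicity uses the Kyte-Doolittle scale rescaled to 0-1, mean net charge counts K and R as +1 and D and E as -1, and the boundary line <R> = 2.785<H> - 1.151 is the one reported by Uversky and colleagues (2000). This is not PONDR, which uses neural networks; the function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled from [-4.5, 4.5] to [0, 1]."""
    values = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq if aa in KYTE_DOOLITTLE]
    return sum(values) / len(values)

def mean_net_charge(seq):
    """Absolute net charge per residue (histidine treated as neutral)."""
    residues = [aa for aa in seq if aa in KYTE_DOOLITTLE]
    net = sum(CHARGE.get(aa, 0) for aa in residues)
    return abs(net) / len(residues)

def looks_natively_unfolded(seq):
    """True if the sequence lies on the disordered side of the assumed boundary."""
    h = mean_hydrophobicity(seq)
    r = mean_net_charge(seq)
    return r > 2.785 * h - 1.151

# Hypothetical test sequences, for illustration only.
print(looks_natively_unfolded("EEEEKKKKDDDDSSSS"))   # charged, polar
print(looks_natively_unfolded("IIIIVVVVLLLLFFFF"))   # hydrophobic, uncharged
```

In practice this kind of score is computed over a whole protein or a sliding window; a whole-sequence value is used here only to keep the sketch short.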

2) Evolutionary Dynamic Approaches
   A) Intermolecular compensatory mutations (Pazos and Valencia)
      1) predicting interacting partners
      2) detecting correlated mutations between two interacting
         proteins
      3) extending to three interacting partners
   B) Evolutionary-Structure Function (ESF) (Simon and Sidow):
      determines the number of amino acid replacements given a
      fixed phylogenetic topology, ranking constrained regions
   C) Intramolecular compensatory mutations (Pollack):
      calculates likelihood estimates allowing for rate variation
      and robustly discriminates coevolution of intra-sites versus
      random effects
3) Use experimental results to model and validate
expectations
4) Test the predicted structure for
the Ebola
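To make the idea of correlated (compensatory) mutations concrete, here is a small sketch that scores covariation between two alignment columns with mutual information. This is not the method of Pazos and Valencia, Simon and Sidow, or Pollack; mutual information is a simpler covariation score swapped in for illustration, and it ignores phylogeny. For the intermolecular case it assumes the two columns come from alignments of two different proteins whose rows are paired by virus. All names and toy columns are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.
    col_x[i] and col_y[i] must come from the same virus."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of rows")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi

# Hypothetical columns: one site in protein x, two sites in protein y,
# each listed for the same four viruses.
site_in_x = ["K", "K", "E", "E"]
covarying_site_in_y = ["D", "D", "R", "R"]   # changes whenever site_in_x changes
invariant_site_in_y = ["A", "A", "A", "A"]   # carries no covariation signal

print(mutual_information(site_in_x, covarying_site_in_y))
print(mutual_information(site_in_x, invariant_site_in_y))
```

A high score only says that two sites change together across the sampled sequences. Shared ancestry alone can produce that signal, which is why the approaches listed above work on a phylogeny or model rate variation explicitly.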
29
Figure 1. Schematic of VSV RNA Synthesis. The L
and P proteins interact with the ribonucleoprotein
complex, N-RNA, and the 5 individual RNA
messages of the genome are transcribed. The same
complex also replicates nascent genomes that
undergo co-assembly with the N protein. (Figure
from J. Perrault, personal communication.)
30
Rhabdoviridae Genome
VSV
Paramyxoviridae Genome
Sendai
31
N, P and L Proteins
required for replication
[Figure: domain maps of the N, P and L proteins of Sendai virus and VSV]
N protein: Sendai (residues 1-524), VSV (residues 1-422); labeled regions: RNA-BS, PPBS, PCS
P protein: Sendai (residues 1-568), VSV (residues 1-265); labeled regions: oligomerization domain, NPBS, LPBS, RSR, RES, GTP binding
L protein: Sendai (residues 1-2228), VSV (residues 1-2109); labeled regions: I-VI, RSR, PPBS, MT, RNA-BS
32
Mtase of Ebola virus
33
Update Mononegavirales Sequence and Literature
Database
Annotated N, P, L protein maps with ALL
information regarding positions of
experimentally determined functions and
interactions
N, P and L sequences
Multiple Alignment
Evolutionary Dynamics Analysis
Predict regions of disorder
Inter-CM analysis
Phylogenetic reconstruction
PONDR
Calculate H/R
ESF-analysis
Intra-CM analysis
Integration of heterogeneous data in Bayesian
Inference Network
34
sequence-based experiments
replication
transcription
x-y contact in virus 1
Fig. 4. A small Bayesian network for inferring
protein-protein contact for one virus.
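A minimal sketch of how such a network could combine evidence, assuming the simplest possible structure: a single hidden "contact" node with the three evidence types (sequence-based, replication experiments, transcription experiments) as conditionally independent children. The actual network structure and probabilities are not given in the transcript, so the function name and every number here are hypothetical placeholders.

```python
def posterior_contact(prior, likelihoods, evidence):
    """Posterior probability of an x-y contact given binary evidence.

    likelihoods[name] = (P(evidence positive | contact),
                         P(evidence positive | no contact))
    evidence[name]    = True or False for each evidence type observed
    """
    p_contact = prior
    p_no_contact = 1.0 - prior
    for name, observed in evidence.items():
        p_pos_contact, p_pos_no_contact = likelihoods[name]
        if observed:
            p_contact *= p_pos_contact
            p_no_contact *= p_pos_no_contact
        else:
            p_contact *= 1.0 - p_pos_contact
            p_no_contact *= 1.0 - p_pos_no_contact
    return p_contact / (p_contact + p_no_contact)

# Hypothetical probabilities, for illustration only.
likelihoods = {
    "sequence_based": (0.7, 0.2),
    "replication":    (0.8, 0.3),
    "transcription":  (0.6, 0.4),
}
evidence = {"sequence_based": True, "replication": True, "transcription": True}
print(posterior_contact(0.1, likelihoods, evidence))
```

Each additional line of evidence shifts the posterior by its own likelihood ratio, which is what allows heterogeneous data (computational predictions and bench experiments) to be integrated in one framework.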
35
sequence-based experiments
replication
transcription
replication
transcription
x-y contact in virus type 1
x-y contact in virus type 2
Fig. 5. A Bayesian network representing multiple
instances of protein-protein contact inference
for more than one virus.
36
[Timeline columns: Year 1, Year 2]
Integration of heterogeneous data into a Bayesian
Framework
Update Sequences, literature and construct
structure/function maps
Determine disordered regions
Construct multiple alignments and initiate
Inter-CM analysis
Initiate phylogenetic reconstruction
Conduct ESF and Intra-CM analysis
Figure 5. Two-year project timeline for proposed
studies.