
Title: A Systematic Search for Genes Encoding Proline Rich Proteins in Arabidopsis


1
A Systematic Search for Genes Encoding Proline
Rich Proteins in Arabidopsis
  • Aaron Newman
  • CMPS290N Project

2
Synopsis
  • Background
  • Objective
  • Methods & Results
  • Analysis of Putative PRPs
  • Future work

3
Proline Physicochemical Properties
  • An Imino Acid
  • Non-Polar
  • Planar Cyclic Molecule
  • Cis & Trans forms
  • Rotationally Hindered
  • α-Helix Breaker
  • Often Present in Structural Proteins

4
A well-characterized example of a Proline Rich
Structural Protein
  • Collagen, a fibrous protein, contains relatively
    long regions of tandem repeats of the motif
    Gly-X-Pro/Hyp.
  • Collagen strands assemble into a triple helix.

5
Arabidopsis thaliana, a Model Plant
  • Complete genome is sequenced
  • An immense database awaits biological inquiry.
  • Arabidopsis, like other plants, incorporates
    Proline-Rich Proteins into cell wall matrices.
    PRPs assist in structural support and may also
    help defend against physical damage and pathogens.

6
Outline
  • Background
  • Objective
  • Methods & Results
  • Analysis of Putative PRPs
  • Future work

7
Project Objective
  • Mine the Arabidopsis Protein Databank for
    hypothetical/unknown/unnamed proteins that
    satisfy the following broad criteria:
  • Sufficient sequence similarity to predefined
    PRPs. This requirement can be understood in terms
    of the presence of:
      • Known Proline-rich motifs and/or proposed
        Pro-Rich motifs
      • Tandem or regular repeats of one or more of the
        above
  • Predicted N-terminal hydrophobic subsequence
    indicative of a secretion signal
  • The genes underlying candidate PRPs will be
    flagged for further investigation.

8
Outline
  • Background
  • Objective
  • Methods & Results
  • Analysis of Putative PRPs
  • Future work

9
Methods & Results
  • Given the set of known AtPRPs, arbitrarily select
    PRP4 to probe the Arabidopsis protein database
    using Protein-Protein BLAST.
  • Use the sequences with the top five scores as the
    training set.
  • Since PRP1 and PRP3 are not present in the
    training set, add them to enrich the sequence
    space for motif detection.
  • Training Set
  • i) PRP4 GI7620015
  • ii) PRP2 GI7620011
  • iii) PRP1 GI25456291
  • iv) PRP3 GI25456294
  • v) Putative extensin protein GI24030361
  • vi) Extensin-like protein GI24030361
  • vii) At2g21140 GI30017301

10
Methods & Results continued
  • Use MEME to extract motifs from the training set,
    with the maximum-number-of-motifs parameter set
    to three.
  • MEME output
  • i) PVPVYKPP
  • ii) IPKKPCPP
  • iii) SPPYYTPP

11
Methods & Results continued
  • Use MEME to extract motifs from the training set,
    with the maximum-number-of-motifs parameter set
    to three.
  • MEME output
  • i) PVPVYKPP
  • ii) IPKKPCPP
  • iii) SPPYYTPP
  • Notice the similarity of sequence (ii) to the
    known PRP2 and PRP4 motif, KKPCPP, and the
    similarity of sequence (i) to the defined PRP4
    motif, PPPKIEHPPPVPVYK.
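
The similarity noted above can be made concrete by looking for the longest run of residues that a MEME motif shares with a known motif. The sketch below is only an illustration: the function name and the choice of "longest common substring" as the similarity measure are assumptions, not part of the original analysis.

```python
def longest_common_substring(a: str, b: str) -> str:
    """Return the longest contiguous run of residues present in both a and b."""
    best = ""
    # prev[j] holds the length of the common suffix of a[:i-1] and b[:j]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > len(best):
                    best = a[i - curr[j]:i]
        prev = curr
    return best


# Motifs as listed on the slide.
meme_motifs = {"i": "PVPVYKPP", "ii": "IPKKPCPP", "iii": "SPPYYTPP"}

shared_ii = longest_common_substring(meme_motifs["ii"], "KKPCPP")
shared_i = longest_common_substring(meme_motifs["i"], "PPPKIEHPPPVPVYK")
print(shared_ii, shared_i)
```

MEME motif (ii) contains the whole known hexapeptide, while motif (i) shares a six-residue stretch with the longer PRP4 motif.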

12
Methods & Results continued
  • Use the three consensus sequences produced by
    MEME to query the Arabidopsis Protein Database
    using an instance of Protein-Protein BLAST that
    searches for short, nearly exact matches.
  • Combine the output for all consensus sequences to
    produce one set of hits.
  • For this set of hits, manually remove:
  • Sequences not annotated as
    hypothetical/unknown/unnamed
  • Each protein sequence that fails to exhibit 8 or
    more hits to at least one consensus sequence.
    (The threshold of 8 hits is arbitrarily chosen.)
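
The hit-count filter can be approximated in a few lines. This is a minimal sketch, not the procedure used in the project: it stands in for BLAST's "short, nearly exact matches" with a simple sliding window that tolerates a fixed number of mismatches, and the function names, the `max_mismatches` parameter, and the toy sequences are all hypothetical.

```python
def count_hits(sequence: str, motif: str, max_mismatches: int = 1) -> int:
    """Count windows of `sequence` that differ from `motif` at no more than
    `max_mismatches` positions (overlapping windows are counted)."""
    k = len(motif)
    hits = 0
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = sum(1 for x, y in zip(window, motif) if x != y)
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def passes_filter(sequence, motifs, min_hits=8, max_mismatches=1):
    """Keep a sequence if at least one consensus motif reaches `min_hits`."""
    return any(count_hits(sequence, m, max_mismatches) >= min_hits for m in motifs)


consensus = ["PVPVYKPP", "IPKKPCPP", "SPPYYTPP"]  # MEME output from the slides

# Toy sequences, invented purely for illustration.
toy_repetitive = "PVPVYKPP" * 8
toy_unrelated = "MKTAYIAKQR" * 5

print(passes_filter(toy_repetitive, consensus))
print(passes_filter(toy_unrelated, consensus))
```

The threshold of 8 is kept as the default only because the slides use it; like the original, it is arbitrary.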

13
Methods & Results continued
  • The following list of sequences emerged:
  • gi|11994465|dbj|BAB02467.1| unnamed protein
    product [Arabidopsis thaliana]
  • gi|10178119|dbj|BAB11412.1| unnamed protein
    product [Arabidopsis thaliana]
  • gi|23308313|gb|AAN18126.1| At2g10940/F15K19.1
    [Arabidopsis thaliana]
  • gi|8978351|dbj|BAA98204.1| unnamed protein
    product [Arabidopsis thaliana]
  • gi|24417264|gb|AAN60242.1| unknown
    [Arabidopsis thaliana]
  • gi|5430752|gb|AAD43152.1| hypothetical protein
    [Arabidopsis thaliana]
  • gi|20259488|gb|AAM13864.1| unknown protein
    [Arabidopsis thaliana]
  • gi|5306245|gb|AAD41978.1| unknown protein
    [Arabidopsis thaliana]
  • gi|5306260|gb|AAD41992.1| hypothetical protein
    [Arabidopsis thaliana]

14
Methods & Results continued
  • Let the new set of potential PRPs as well as the
    training set be independently queried against the
    PFAM database.

15
Methods & Results continued
  • Explanation of conserved domains of several
    putative PRPs according to the PFAM database:
  • Extensin 2: Family of hydroxyproline-rich
    glycoproteins found in the plant extracellular
    matrix and characterized by repetitive motifs.
  • Root Cap: Conserved region within plant root cap
    proteins.
  • LRR_1: Leucine-rich repeat region, implicated in
    a diverse array of functions and found in a broad
    range of organisms.
  • tryp_alpha_amal: Family in plants composed of
    trypsin-alpha amylase inhibitors, seed storage
    proteins and lipid transport proteins.
  • Proteasome: Broad family of proteins that
    function to degrade other proteins.
  • Description of the conserved domain identified in
    defined PRPs:
  • DUF1210: Representative region of a family of
    Proline-Rich Proteins. Aside from serving as a
    PRP indicator, the significance of this region is
    not understood or published.
  • Interestingly, AtPRP1 and AtPRP3 do not harbor
    DUF1210, nor do they contain any other
    discernible regions of conservation when compared
    to the PFAM database.

16
Methods & Results continued
  • In order to narrow down the set of putative PRPs
    further, a CLUSTALW multiple alignment was
    performed.
  • The following set of sequences has the highest
    degree of relatedness. This claim is based on
    manual inspection of the alignment.
  • GI 5306260
  • GI 20259488
  • GI 5306245
  • GI 5430752
  • GI 23308313
  • GI 24417264

17
Methods & Results continued
  • Furthermore,
  • Sequences of the same color were observed to be
    very similar to one another. CLUSTALW alignments
    of each set of sequences of the same color in
    isolation from the other potential PRP sequences
    supported this categorization scheme.
  • Sequence       PFAM HMM
  • GI 5306260     -
  • GI 20259488    LRR_1
  • GI 5306245     LRR_1
  • GI 5430752     LRR_1
  • GI 23308313    tryp_alpha_amal
  • GI 24417264    -

18
Methods & Results continued
  • Furthermore,
  • Sequences of the same color were observed to be
    very similar to one another. CLUSTALW alignments
    of each set of sequences of the same color in
    isolation from the other potential PRP sequences
    supported this categorization scheme.
  • Let sequence i be denoted by the label in the
    last column:
  • Sequence       PFAM HMM           Label
  • GI 5306260     -                  A
  • GI 20259488    LRR_1              B1
  • GI 5306245     LRR_1              B2
  • GI 5430752     LRR_1              B3
  • GI 23308313    tryp_alpha_amal    C1
  • GI 24417264    -                  C2

19
Outline
  • Background
  • Objective
  • Methods & Results
  • Analysis of Putative PRPs
  • Future work

20
Analysis-Secretion Signal
  • Each defined pre-processed PRP has a signal
    peptide at the N-terminus. The signal peptide is:
  • 20-70 successive amino acids
  • hydrophobic
  • cleaved upon translocation of the nascent PRP
    into the ER
  • in general, the cellular version of a destination
    address that, in this case, allows a PRP to be
    transported out of the cell

21
Analysis-Secretion Signal
  • Two signal peptide detection programs were
    run on the known AtPRP set (PRP1, 2, 3, 4) and
    the refined potential AtPRP set:
  • SPScan in GCG (empirically derived scoring matrix)
  • SignalP 3.0 (HMMs & neural networks)

22
Analysis-Secretion Signal
  • Characterized PRPs
  • SPScan predicted the same signal peptides as those
    reported in the literature (perhaps the same
    program was used).
  • SignalP also yielded identical predictions, with
    the exception of PRP2, where the cleavage site is
    inferred to be two amino acids downstream from
    the cleavage site predicted by SPScan and the
    publication.
  • PRP1 and PRP3 exhibit a high degree of
    relatedness in regard to signal peptides and
    motifs.
  • Similarly, PRP2 and PRP4 show a high level of
    similarity in terms of signal peptides and
    motifs.
  • Thus, in Arabidopsis, there is variation within
    both PRP signal peptides and PRP motif
    composition (will be supported shortly).
  • Fowler J., Characterization and Expression of
    Four Proline Rich Proteins in Arabidopsis

23
Analysis-Secretion Signal
  • Putative PRPs
  • By and large, SPScan and SignalP converged on the
    same secretory signal predictions.
  • Both programs rejected the presence of an
    N-terminal secretion signal in B2 when parameters
    were set to default values.
  • By reducing the maximum score threshold from 7 to
    5.5 in SPScan, a secretion signal for B2 was
    predicted with a probability score of 0.9 (the
    closer to zero, the better).
  • Sequence: first residue, predicted signal
    peptide, last residue
  • A   1 mrvplidflrflvlilslsgasvaad 26
  • B1  1 mtrrtmekpfgcflllfcftisiffys 27
  • B2  1 mphiykqplgifqgfvptltdaev 24
  • B3  1 merpfgcffilllisytvvatf 22
  • C1  1 mdssklsslslclfliciiylpqhslacg 29
  • C2  1 mdssklsslslclfliciiylpqhslacg 29
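
Since signal peptides are expected to be hydrophobic, a quick sanity check on the predictions above is to average a hydropathy scale over each predicted peptide. The sketch below uses the Kyte-Doolittle scale, which is an assumption made here for illustration; neither SPScan nor SignalP scores sequences this way, and the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Predicted signal peptides exactly as listed on the slide.
predicted_signal_peptides = {
    "A": "mrvplidflrflvlilslsgasvaad",
    "B1": "mtrrtmekpfgcflllfcftisiffys",
    "B2": "mphiykqplgifqgfvptltdaev",
    "B3": "merpfgcffilllisytvvatf",
    "C1": "mdssklsslslclfliciiylpqhslacg",
    "C2": "mdssklsslslclfliciiylpqhslacg",
}


def mean_hydropathy(peptide: str) -> float:
    """Average Kyte-Doolittle value over all residues of the peptide."""
    residues = peptide.upper()
    return sum(KYTE_DOOLITTLE[r] for r in residues) / len(residues)


scores = {name: mean_hydropathy(seq)
          for name, seq in predicted_signal_peptides.items()}

# Print from least to most hydrophobic.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {score:.2f}")
```

On this crude measure B2 comes out as the least hydrophobic of the six peptides, which is at least consistent with both programs rejecting it at default settings.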

24
Analysis-Signal Comparison
  • Predicted signal peptide A observed to be similar
    to PRP3 and PRP1 predicted signal peptides.
  • CLUSTAL W (1.82) multiple sequence alignment
  • PRP3  MAITRSSLA--ICLILSLVTITTA 22
  • PRP1  MAITRASFA--ICILLSLATIATA 22
  • A     MRVPLIDFLRFLVLILSLSGASVA 24
  • Predicted signal peptides B1,B2,B3 are mildly
    similar to PRP4 and PRP2 signal peptides. B1 and
    B3 are fairly similar to one another.
  • CLUSTAL W (1.82) multiple sequence alignment
  • B3    -----MERPFG---CFFILLLISYTVVA--- 20
  • B1    MTRRTMEKPFG---CFLLLFCFTISIFF--- 25
  • PRP4  --MRILPEPRGSVPCLLLLVSVLLSATLSLA 29
  • PRP2  --MRILPKSGGGALCLLFVF-ALCSVAHS-- 26
  • B2    -MPHIYKQPLG----IFQGFVPTLTDA---- 22
  • Predicted signal peptides C1 and C2 are
    identical.
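
The degree of similarity in these alignments can be quantified as percent identity between pairs of aligned rows. The sketch below is one possible convention (identical residues divided by columns where neither row has a gap); the function name is hypothetical and other definitions of identity would give different numbers.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Fraction of identical residues over columns where neither row is gapped."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


# Aligned rows copied from the first CLUSTAL W alignment on the slide.
alignment = {
    "PRP3": "MAITRSSLA--ICLILSLVTITTA",
    "PRP1": "MAITRASFA--ICILLSLATIATA",
    "A":    "MRVPLIDFLRFLVLILSLSGASVA",
}

for first, second in [("PRP3", "PRP1"), ("PRP3", "A"), ("PRP1", "A")]:
    identity = percent_identity(alignment[first], alignment[second])
    print(f"{first} vs {second}: {identity:.0%}")
```

The same function can be applied to any two rows of the second alignment, since all five rows there span the same 31 columns.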

25
Analysis of Motifs
  • [Table not preserved in transcript: known PRP
    motifs and proposed PRP motifs, compared across
    known PRPs and potential PRPs A, B1, B2, B3, C1,
    C2]
  • Cooper, J., A New Proline-rich Early Nodulin
    from Medicago truncatula

26
Analysis - sequence A
  • Colored regions correspond to proposed motifs
    with the exception of PPVHK which is a previously
    described motif. This analysis is not thorough.

27
Analysis - group B
  • Comparison of group B sequences

28
Analysis - group B cont.
  • Comparison of group B sequences

29
Analysis - group C
  • Comparison of group C sequences

30
Analysis Upshot A
  • So far, A is the strongest candidate for
    classification as a PRP because:
  • It contains many occurrences of the previously
    established PRP motif, PPVHK.
  • The majority of the hypothetical sequence is both
    proline rich and in tandem repeat configuration.
  • The predicted signal peptide is strikingly
    similar to the predicted signal peptides for PRP1
    and PRP3.

31
Analysis Upshot B
  • The sequences of group B are potential PRPs
    because:
  • They contain multiple occurrences of the proposed
    PRP motifs PPVHS and PPVYS. These proposed motifs
    differ from the following known PRP motifs only
    in the last amino acid, Serine. Serine and
    Threonine are physicochemically similar.
  • Defined PRP motifs    Estimated PRP motifs
  • PPVHT                 PPVHS
  • PPVYT                 PPVYS
  • B1 and B3 have probable signal peptides.
  • B2, while similar in amino acid sequence to B1
    and B3, is somewhat of an anomaly in regard to
    its improbable signal peptide.
  • The potential presence of LRRs among the
    sequences of group B does not seem to diminish
    the probability that the sequences of B are
    PRPs.
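
The argument that the proposed group B motifs are near neighbours of known motifs rests on a single Thr-to-Ser substitution. A minimal sketch of that check follows; the function names are hypothetical, and treating only the Ser/Thr pair as "conservative" is a simplification adopted for this example rather than a general substitution model.

```python
def substitutions(known: str, proposed: str):
    """List (position, known_residue, proposed_residue) for every mismatch."""
    if len(known) != len(proposed):
        raise ValueError("motifs must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(known, proposed)) if x != y]


def is_single_ser_thr_swap(known: str, proposed: str) -> bool:
    """True if the motifs differ at exactly one position by a Ser/Thr exchange."""
    diffs = substitutions(known, proposed)
    return len(diffs) == 1 and {diffs[0][1], diffs[0][2]} == {"S", "T"}


# Defined vs estimated motif pairs from the slide.
motif_pairs = [("PPVHT", "PPVHS"), ("PPVYT", "PPVYS")]

for known, proposed in motif_pairs:
    print(known, proposed, substitutions(known, proposed),
          is_single_ser_thr_swap(known, proposed))
```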

32
Analysis Upshot C
  • The sequences of group C are putative PRPs
    because
  • They contain multiple tandem repeats of a
    decapeptide proposed proline rich motif, which is
    of amino acid composition comparable to known PRP
    motifs.
  • Both have probable signal peptides.
  • The probable presence of a trypsin alpha amylase
    inhibitor domain in C1 is unexpected, but it is
    far from clear that the presence of this domain
    raises sufficient doubt to remove C1 or group C
    from the set of potential PRPs.

33
Outline
  • Background
  • Objective
  • Methods & Results
  • Analysis of Putative PRPs
  • Future work

34
Future Work
  • Continue using bioinformatics tools to
    increase/decrease confidence levels regarding A,
    B, and C.
  • Another approach to detect PRPs: the
    Subtraction Method
  • Acquire the raw Arabidopsis genome.
  • Remove all inferred non-coding regions.
  • Translate estimated exons into amino acid
    sequences.
  • Remove all inferred proteins without probable
    N-terminal signal peptides.
  • Search for tandem repeats with similar character
    composition to known PRP motifs.
  • Admit remaining sequences to the set of potential
    PRPs.
  • Perform analysis as previously described.
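
The tandem-repeat search step of the subtraction method could start from something as simple as the sketch below. It is only a rough illustration under stated assumptions: it finds exact tandem repeats with a backreferencing regular expression and keeps those whose repeat unit is proline rich, whereas real PRP repeats are often imperfect. The function name, parameter defaults, and the toy sequence are all hypothetical.

```python
import re


def find_proline_rich_repeats(sequence, min_unit=3, max_unit=10,
                              min_copies=3, min_pro_fraction=0.3):
    """Return (start, unit, copies) for exact tandem repeats whose unit
    has at least `min_pro_fraction` proline residues."""
    results = []
    for k in range(min_unit, max_unit + 1):
        # A unit of k residues followed by at least (min_copies - 1) exact copies.
        pattern = re.compile(r"([A-Z]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if unit.count("P") / k >= min_pro_fraction:
                copies = len(match.group(0)) // k
                results.append((match.start(), unit, copies))
    return results


# Toy sequence invented for illustration: four copies of the known motif PPVHK.
toy_sequence = "MKT" + "PPVHK" * 4 + "GSA"
print(find_proline_rich_repeats(toy_sequence))
```

A practical version would need to tolerate substitutions within repeats and would be run only on sequences that had already passed the signal peptide step.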