Protein Sequence - PowerPoint PPT Presentation

Description:

Protein Sequence: Amino Acid Composition, IEC, RP HPLC, Ancient Sequencing methods, Modern Sequencing methods, Sequencing the Gene, Then what?

Slides: 32
Provided by: Gary4221
Transcript and Presenter's Notes

1
Protein Sequence
  • Amino Acid Composition
  • IEC
  • RP HPLC
  • Ancient Sequencing methods
  • Modern Sequencing methods
  • Sequencing the Gene
  • Then what?

2
Amino Acid Composition
  • 1952 - Complete Acid Hydrolysis
  • Ion Exchange Chromatography with programmed
    buffer changes (3 hr)
  • Post-column derivatization with ninhydrin or
    fluorescamine
  • 1980 - Complete Acid Hydrolysis
  • Precolumn derivatization to Phenylthiohydantoins
  • Reversed-Phase HPLC (30 min)

3
Sequencing
  • Sanger Endgroup Analysis
  • Modify the protein's free amines with
    fluorodinitrobenzene (FDNB, Sanger's reagent).
  • Alternative reagent, dansyl chloride,
    fluorescent.
  • Hydrolyze protein
  • Separate by TLC
  • Identify N-terminal amino acid by Rf
  • Treat protein with Aminopeptidase
  • Repeat until the end gets ragged
  • Use proteolytic fragments for simplicity

4
Sequencing
  • Generate proteolytic fragments
  • Use more than one protease in separate
    experiments
  • Trypsin cleaves after Arg and Lys residues
  • Chymotrypsin cleaves after Phe, Tyr, Trp
  • Separate fragments (HV paper electrophoresis/HPLC)
  • Sequence all peptides independently
  • Assemble the sequence using overlap info
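
The cleavage rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full digestion model: the function name `digest` and the test peptide are made up for the example, and real proteases have exceptions (for instance, reduced cleavage before proline) that are ignored here.

```python
def digest(sequence, cleave_after):
    """Split `sequence` after every residue found in `cleave_after`."""
    fragments, current = [], ""
    for residue in sequence:
        current += residue
        if residue in cleave_after:
            fragments.append(current)
            current = ""
    if current:  # C-terminal fragment that does not end at a cleavage site
        fragments.append(current)
    return fragments


TRYPSIN = "KR"        # cleaves after Lys (K) and Arg (R)
CHYMOTRYPSIN = "FYW"  # cleaves after Phe (F), Tyr (Y), Trp (W)

peptide = "GAFKLYRTWEK"  # hypothetical test sequence
print(digest(peptide, TRYPSIN))       # ['GAFK', 'LYR', 'TWEK']
print(digest(peptide, CHYMOTRYPSIN))  # ['GAF', 'KLY', 'RTW', 'EK']
```

Because the two enzymes cut at different residues, each chymotryptic fragment spans a junction between tryptic fragments, which is what makes the overlap step possible.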

[Figure: fragments from trypsin and chymotrypsin (Chtr) digests]
5
Automated Sequencing
  • Use proteolytic fragments
  • Sequence each peptide using automated Edman
    Degradation
  • Each Edman cycle removes one amino acid
  • Converts it to PTH amino acid for HPLC
  • Assemble the sequence using overlap info
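
One way to picture "assemble the sequence using overlap info" is a brute-force search: try every order of the peptides from one digest and keep the orders whose concatenation contains every peptide from the other digest. The sketch below assumes the peptides have already been sequenced; the function name `assemble` and the peptide lists are hypothetical, and the approach is only practical for a handful of fragments.

```python
from itertools import permutations


def assemble(primary, overlapping):
    """Return every concatenation of the `primary` peptides that contains
    all `overlapping` peptides as substrings."""
    candidates = set()
    for order in permutations(primary):
        sequence = "".join(order)
        if all(peptide in sequence for peptide in overlapping):
            candidates.add(sequence)
    return sorted(candidates)


tryptic = ["LYR", "TWEK", "GAFK"]           # unordered, as they come off the column
chymotryptic = ["RTW", "EK", "GAF", "KLY"]  # second digest supplies the overlaps
print(assemble(tryptic, chymotryptic))      # ['GAFKLYRTWEK']
```

If more than one sequence comes back, the two digests do not determine the order and a third protease would be needed.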

[Figure: fragments from trypsin and chymotrypsin (Chtr) digests]
6
N-Terminal Edman Degradation
  • Peptide: N-terminal amine attacks phenylisothiocyanate
  • Acid (H+) releases the N-terminal residue as an
    anilinothiazolinone amino acid, leaving peptide N-1
  • Rearrangement to the PTH-amino acid: absorbs at
    260-275 nm, RP-HPLC compatible
7
C-Terminal Edman Degradation
  • Activation of the C-terminal carboxyl by acetic anhydride
  • Attack by thiocyanate
  • Hydrolysis (H2O) releases the TH-amino acid and
    leaves peptide N-1
8
Alternative Sequencing - MS
  • Use non-fragmenting ionization
  • Electrospray ionization (ESI), traditional mass spec
  • Matrix-assisted laser desorption-ionization
    time-of-flight mass spec (MALDI-TOF)
  • Measures mass of mature, intact protein and/or
    complexes
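
For comparison with a measured intact mass, the expected average mass of an unmodified chain can be estimated from its sequence. This is a rough sketch: the residue masses below are standard average values quoted to four decimals and should be checked against a reference table before real use, and the function name `average_mass` is an assumption for the example. Post-translational modifications, disulfides, and bound ligands are ignored.

```python
# Average residue masses in daltons (free amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water for the free N- and C-termini


def average_mass(sequence):
    """Average mass (Da) of a linear, unmodified polypeptide."""
    return sum(AVERAGE_RESIDUE_MASS[residue] for residue in sequence) + WATER


print(f"{average_mass('MGKARSMVLKHSTKARS'):.1f} Da")
```

A measured mass that differs from this estimate is a hint that the mature protein has been processed or modified.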

9
Sequencing the Gene
  • DNA synthesis in vitro requires:
  • Template (the DNA you want to sequence)
  • Primer (complementary to the region upstream of
    where you want to sequence)
  • Polymerase
  • dXTPs, Mg2+
  • Primer pairs with template, free 3'-OH group
    ready for action
  • As dXTPs base-pair with the template, the 3'-OH
    attacks the α-phosphate of the dXTP, displacing
    PPi, making a phosphodiester, extending the
    nascent DNA chain by one base

10
The Polymerase Reaction
Elongation of a primer that is base-paired with a
template. Requires a free 3'-OH group.
[Figure: primer base-paired to its template, 5' and 3' ends labeled, with the free 3'-OH at the growing end and PPi released from the incoming nucleotide]
11
Di-deoxy Terminators
  • If only 2',3'-dideoxy nucleoside triphosphates were
    used, the reaction would proceed for only one
    cycle because there would be no free 3'-OH group
    to attack the next dXTP
  • If a fraction of a percent of ONE 2',3'-dideoxy
    nucleoside triphosphate (say ddTTP) were used,
    SOME polymer would be terminated EACH time that
    base was incorporated, i.e., each time dA occurs
    in the template.
  • If 1/1000th of the dTTP were ddTTP, then 1/1000th
    of the growing polymers would terminate at each dA
    in the template; the rest would continue
  • You would get many polymers of different sizes,
    each corresponding to the occurrence of a dA in
    the template
  • Use four separate reactions, one with ddTTP, one
    with ddATP, one with ddGTP, and one with ddCTP
    (and all other components)
  • For every base in the template, one of the four
    reaction mixtures would contain a polymer that
    terminated at that base
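
The four-reaction idea can be sketched as a small simulation. It assumes an example template written 3' to 5' (so the new strand grows left to right), ignores the primer length, and reports only which polymer lengths can appear in each lane, not their abundance. The function name `terminated_lengths` is made up for the illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def terminated_lengths(template_3to5, dd_base):
    """Lengths of the polymers that end in the dideoxy form of `dd_base`.

    Length 1 is the first base added after the primer.
    """
    return [position
            for position, template_base in enumerate(template_3to5, start=1)
            if COMPLEMENT[template_base] == dd_base]


template = "ATGTCACAGGACAGA"  # example template, written 3' -> 5'
for base in "TAGC":
    print(f"dd{base}TP lane:", terminated_lengths(template, base))
# ddTTP lane: [1, 6, 8, 11, 13, 15]
# ddATP lane: [2, 4]
# ddGTP lane: [5, 7, 12]
# ddCTP lane: [3, 9, 10, 14]
```

Every length from 1 to 15 appears in exactly one lane, which is why the four lanes together read out the whole sequence.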

12
Dideoxy Terminators
  • Use fluorescent or radioactive primer so you can
    see every polymer
  • Separate them by size (gel electrophoresis)
  • Read sequence of polymers from gel
  • Infer the sequence of the template by
    Watson-Crick

Template: 3' A T G T C A C A G G A C A G A 5'
Polymer:  5' T A C A G T G T C C T G T C T 3'
[Figure: Agarose gel. Lanes labeled by base in polymer; fragments ordered small to large; sequence of template read alongside.]
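
Reading the gel amounts to sorting the fragments by length and noting which lane each one is in. The sketch below assumes the lane contents are already known as lists of fragment lengths (made-up numbers for a 15-base read); the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def read_gel(lanes):
    """Read the polymer 5' -> 3' from the smallest fragment to the largest.

    `lanes` maps each dideoxy base to the fragment lengths seen in its lane.
    """
    base_at_length = {length: base
                      for base, lengths in lanes.items()
                      for length in lengths}
    return "".join(base_at_length[length] for length in sorted(base_at_length))


def infer_template(polymer_5to3):
    """Watson-Crick partner of each polymer base; the result reads 3' -> 5'."""
    return "".join(COMPLEMENT[base] for base in polymer_5to3)


lanes = {
    "T": [1, 6, 8, 11, 13, 15],
    "A": [2, 4],
    "G": [5, 7, 12],
    "C": [3, 9, 10, 14],
}
polymer = read_gel(lanes)
print("polymer  5'", polymer, "3'")                  # TACAGTGTCCTGTCT
print("template 3'", infer_template(polymer), "5'")  # ATGTCACAGGACAGA
```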
13
A, T, G, and C. What are the Amino Acids?
Standard Genetic Code
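
Going from bases to amino acids is a table lookup, three bases at a time. The sketch below builds the standard genetic code from the usual compact 64-letter string (codons ordered T, C, A, G at each position, `*` for stop). The function name `translate` and the example DNA are assumptions for the illustration, and the input is taken to be the coding strand, already in frame.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"   # TTT ... TGG
               "LLLLPPPPHHQQRRRR"   # CTT ... CGG
               "IIIMTTTTNNKKSSRR"   # ATT ... AGG
               "VVVVAAAADDEEGGGG")  # GTT ... GGG
CODON_TABLE = {"".join(codon): amino_acid
               for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate coding-strand DNA, reading frame starting at the first base."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


print(translate("ATGGGTAAAGCTCGTTAA"))  # MGKAR*
```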
14
ORFs - Look for longest uninterrupted sequence
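
A minimal sketch of the ORF search: scan all six reading frames (three on each strand) for the longest stretch that starts with ATG and runs uninterrupted to a stop codon. The function names and the test DNA are made up, and real gene finders use more evidence than length alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(dna):
    """Longest ATG...stop stretch over all six reading frames."""
    best = ""
    for strand in (dna, reverse_complement(dna)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best


print(longest_orf("CCATGGGTAAAGCTCGTTAACC"))  # ATGGGTAAAGCTCGTTAA
```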
15
So, you've got the sequence. So what?
Next topic: Bioinformatics. Inferences based on
homology
16
Questions
  1. Has the gene been sequenced before? (Will I be
    able to publish?)
  2. What is the sequence of the protein encoded by
    the gene?
  3. Has the protein been sequenced before?
  4. Is the gene similar to one that has been
    sequenced before?
  5. Did I sequence the right gene?
  6. Will I be able to find structural or functional
    relatives?
  7. Is the protein similar to one that has been
    sequenced before?
  8. How similar?
  9. What does the similarity mean?
  10. Can I predict the function of the gene product,
    or is the predicted function consistent with what
    I know about the protein?
  11. Can I get information about structural features
    of the gene product?
      • Secondary structure
      • Folding domains or other common patterns
      • Hydropathy profiles
      • How might predicted helices and/or sheets pack?
      • Is it likely to be a membrane protein, a
        transmembrane protein?

17
Answers: Sequence Similarities and Similarity
Searches
  1. Search sequence databases for homologous
    proteins.
  2. Find families of proteins that are similar to
    your protein.
  3. Use information about the structure and
    properties of the similar protein(s) to establish
    inferences about your protein. If the exact
    sequence is in the database, the similarity
    search routines will find that, too.
  4. Determine whether two sequences are related (or
    identical) by aligning them so that homologous
    regions are adjacent.
  5. For two identical sequences:

MGKARSMVLKHSTKARS
MGKARSMVLKHSTKARS
18
But, what about
  • Imperfect homology
  • MGKARSMLLKHSTKARS
  • MGKARTMVLKHSTRARS
  • Gaps/insertions
  • MGKARSMLLKHSLKARS
  • MGRA----LKHSLRART
  • And, how homologous is homologous?
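
The crudest answer to "how homologous?" is percent identity over an existing alignment. The sketch below assumes the two sequences are already aligned to the same length, with `-` marking a gap; the function name `percent_identity` is made up for the example. It counts exact matches only, so it gives no credit for conservative substitutions such as K for R.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * matches / len(aligned_a)


# Imperfect homology: 14 of 17 positions are identical.
print(f"{percent_identity('MGKARSMLLKHSTKARS', 'MGKARTMVLKHSTRARS'):.1f}")  # 82.4
```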

19
Need
  • Similarity scores for pairs of amino acids
  • Method for dealing with gaps
  • Algorithms for comparing a sequence with a
    database
  • Ways to assess the degree of homology
  • Ways to link structural info with sequence info

20
Dynamic Programming
  • Needleman-Wunsch Algorithm
  • Compares similarity of two proteins, a and b, at
    positions i and j
  • $NW_{i,j} = \max\bigl(NW_{i-1,j-1} + s(a_i, b_j),\ NW_{i-1,j} + g,\ NW_{i,j-1} + g\bigr)$
  • $NW_{i-1,j-1}$ = running total
  • $s(a_i, b_j)$ = similarity between residue i of protein
    a and residue j of protein b
  • $g$ = gap penalty
  • http://www.avatar.se/molbioinfo2001/dynprog/dynamic.html
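
The recurrence can be sketched directly as a matrix fill. This version returns only the final score, not the alignment itself (that needs a traceback step). The defaults match = 1, mismatch = 0, gap = 0 are an assumption chosen to give the simplest possible scoring; a real search would take s from a scoring matrix and use a negative gap penalty.

```python
def needleman_wunsch(a, b, match=1, mismatch=0, gap=0):
    """Global alignment score of sequences a and b by dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    nw = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):        # first column: a aligned against gaps only
        nw[i][0] = nw[i - 1][0] + gap
    for j in range(1, cols):        # first row: b aligned against gaps only
        nw[0][j] = nw[0][j - 1] + gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            nw[i][j] = max(nw[i - 1][j - 1] + s,   # align a_i with b_j
                           nw[i - 1][j] + gap,     # gap in b
                           nw[i][j - 1] + gap)     # gap in a
    return nw[-1][-1]


print(needleman_wunsch("MGKARSMLLKHSTKARS", "MGKARTMVLKHSTRARS"))  # 14
```

With these defaults the score is simply the largest number of residues that can be matched in order, so identical sequences score their full length.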

21
Fill a Matrix with all possibilities
Simple example: s = 1 or 0 (match or mismatch) and g = 0
22
Smith-Waterman
  • Always compare NW terms to zero so that the running
    total never drops below zero.

$NW_{i,j} = \max\bigl(NW_{i-1,j-1} + s(a_i, b_j),\ NW_{i-1,j} + g,\ NW_{i,j-1} + g,\ 0\bigr)$
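
A minimal sketch of the zero-floored version, which finds the best local match instead of forcing the whole sequences to align. The scoring values (match = 1, mismatch = -1, gap = -1) are assumptions for the example; note that the floor only matters when mismatches or gaps are penalized. The test strings are made up.

```python
def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b."""
    sw = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            sw[i][j] = max(sw[i - 1][j - 1] + s,
                           sw[i - 1][j] + gap,
                           sw[i][j - 1] + gap,
                           0)               # restart rather than go negative
            best = max(best, sw[i][j])      # a local alignment can end anywhere
    return best


print(smith_waterman("AAALKHSGGG", "TTLKHSTT"))  # 4 (the shared LKHS)
```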
23
BLAST & FASTA
  • FASTA - great, we won't talk about it
  • much faster and more selective than SW, but less
    sensitive
  • BLAST - Basic Local Alignment Search Tool
  • less selective and more sensitive than FASTA,
    i.e., you may get more hits, but some of them may
    be wrong

24
BLAST
  • Divide sequence into words of length W (e.g.,
    BLASTp, initial W = 3)
  • Compare all W-length words
  • Retain only pairs with similarity above a
    threshold, T
  • Call them High-Scoring Pairs (HSPs)
  • Increase W, repeat with HSPs
  • Keep going while remaining above a minimum
    similarity, and compare to random probability (E)
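
The word-and-extend idea can be sketched as follows. This is a simplification, not how BLAST is actually implemented: it seeds only on exactly identical words instead of scoring word pairs against a threshold T, extends only while residues stay identical, and computes no E value. The function names are made up for the illustration.

```python
def words(sequence, w=3):
    """All overlapping words of length w, with their start positions."""
    return [(i, sequence[i:i + w]) for i in range(len(sequence) - w + 1)]


def seed_hits(query, subject, w=3):
    """(query position, subject position, word) for every shared word."""
    index = {}
    for j, word in words(subject, w):
        index.setdefault(word, []).append(j)
    return [(i, j, word)
            for i, word in words(query, w)
            for j in index.get(word, [])]


def extend(query, subject, i, j, w=3):
    """Grow a seed in both directions while the residues stay identical."""
    start_i, start_j = i, j
    while start_i > 0 and start_j > 0 and query[start_i - 1] == subject[start_j - 1]:
        start_i, start_j = start_i - 1, start_j - 1
    end_i, end_j = i + w, j + w
    while end_i < len(query) and end_j < len(subject) and query[end_i] == subject[end_j]:
        end_i, end_j = end_i + 1, end_j + 1
    return query[start_i:end_i]


query, subject = "MGKARSMLLKHSLKARS", "MGRALKHSLRART"
print(seed_hits(query, subject))      # [(8, 4, 'LKH'), (9, 5, 'KHS'), (10, 6, 'HSL')]
print(extend(query, subject, 8, 4))   # LKHSL
```

Indexing the words first is what makes this faster than filling a full dynamic-programming matrix for every database sequence.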

25
Scoring Matrices - Making similarity quantitative
  • Compare the actual frequency to the frequency
    expected by chance alone.
  • Probability that alanine appears at position x in
    a protein = fraction of Ala in all proteins = $p_{Ala}$
  • Probability that one protein has Ala at position
    x, and another protein has Gly = $p_{Ala} \, p_{Gly}$
  • This is the frequency due to chance alone.

26
Similarity
  • $q_{Ala,Gly}$ = ACTUAL frequency that Ala and Gly are
    at position x in two proteins (in your database)
  • $R_{i,j} = q_{i,j} / (p_i p_j)$
  • Score $S_{i,j} = \log_2(R_{i,j}) = \log_2\bigl(q_{i,j} / (p_i p_j)\bigr)$
  • Log-Odds Scores
  • Remember Chou-Fasman?
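
A short numerical sketch of the log-odds score. The frequencies are toy numbers chosen to make the arithmetic easy, not values from any real database, and the function name `log_odds` is an assumption for the example.

```python
import math


def log_odds(q_ij, p_i, p_j):
    """log2 of the observed pair frequency over the chance expectation."""
    return math.log2(q_ij / (p_i * p_j))


p_i = p_j = 0.1                 # toy background frequencies
expected = p_i * p_j            # 0.01 by chance alone
print(f"{log_odds(0.02, p_i, p_j):+.2f}")   # +1.00: seen twice as often as chance
print(f"{log_odds(0.005, p_i, p_j):+.2f}")  # -1.00: seen half as often as chance
```

Positive scores mark substitutions that occur more often than chance predicts, negative scores mark those that occur less often, and a score of zero means the pair is no more informative than random.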

27
PAM Matrices
  • Margaret Dayhoff assembled the Atlas of Protein
    Sequence and Structure
  • Evolutionarily accepted mutations
  • Calculated $q_{i,j}$ for all amino acids in closely
    related proteins
  • These were accepted by Nature as similar/close
    enough
  • Generate half matrices: Point Accepted
    Mutation/Percent Accepted Mutations
  • Scale, so PAM1 reflects 1 accepted mutation per 100
    residues; PAM50, 50 accepted mutations per 100

28
BLOSUM
  • Henikoff and Henikoff
  • BLOcks of Amino Acid SUbstitution Matrix
  • BLOCKS is a database of related proteins

29
BLAST Search
  • Go to BLAST Website
  • Enter Nucleotide or AA sequence
  • Choose BLAST type
  • Nucleotide-nucleotide, BLASTn
  • Protein-protein, BLASTp
  • 6-frame-translated nucleotide-protein, BLASTx
  • others

30
Then?
  • Does it make sense?
  • Multisequence Alignment
  • Secondary structure prediction
  • Domains
  • Families

31
Caveat
  • It ain't what you don't know that'll kill you,
  • it's what you know that ain't so.