Title: Extra credit problem
1 Extra credit problem
- For the Functional and GO exercises,
- Eukaryotic Annotation Course
2 Example 1
- This protein from Ixodes scapularis has 386 amino
acids, and was submitted by a member of the
community.
- >example_sequence_1
- MKKKPISKVFVPRSRNSESDAIFKIPMAALKFTGLFWNTTCRPARLLSFL
LKISIVTTQA KLLSDAFTYETVDMVLYGSRILTANVSFIIFALQERNLR
NAIKDLSDKASFLLPLQRQRK IRTLSCSLACVSAIIIAVFLSGPAYVLF
FTDKRLQTDLLSRFVAYLNEVCFAVVIWYPLC
FMPILFVNVSQTFAELLSQYNEMIPKLFCTENHNIYSLNCKFRHSREQRH
EMRRLLSVCG KIFAPCLFIWYGPTFLGCCAELSNFMRQSDAWVHRYYKA
VTSAHGWAMFWGVSLAAHHVY ATGRASWDVLQDCTLRLPLDVGVHMELV
MLKEDCRKIAMAFTIGGFYKLTLRTAFSVFSC
MLTYAFVWYQIGPGSQPNVASHTNSD
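Because the sequence above is wrapped and contains stray spaces, it can be worth confirming the stated length (386 amino acids) before pasting it into the tools that follow. A minimal sketch in Python, assuming the record is available as a FASTA-formatted string; the function name `read_fasta_record` is illustrative and is not part of the course material:

```python
def read_fasta_record(text):
    """Return (header, sequence) for a single FASTA record.

    Whitespace inside the sequence (spaces and line breaks left over
    from wrapping) is removed before the sequence is returned.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0][1:].strip()
    # Join all sequence lines and drop any internal spaces.
    sequence = "".join("".join(line.split()) for line in lines[1:])
    return header, sequence


if __name__ == "__main__":
    # Toy record for illustration only, not the tick protein itself.
    toy = ">toy_record\nMKKK PISK\nVF\n"
    header, sequence = read_fasta_record(toy)
    print(header, len(sequence))  # toy_record 10
```

Applying the same function to the full record and taking `len(sequence)` gives the residue count to compare with the submitted length.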
3 Check the gene structure
4 BLASTP
http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
5 BLASTP Result
Conclusion: No significant homology.
6 Pfam
>example_sequence_1
http://pfam.sanger.ac.uk/
7 Pfam Result
8 HMM score significance
For each Pfam family, there is a "trusted cutoff"
and a "noise cutoff", TC1 and NC1. TC1 is the
lowest score for sequences included in the family
(e.g. in the Full alignment). NC1 is the highest
score for sequences not included in the Full
alignment.
There are two HMMs for each Pfam entry: one to
represent full-length matches (ls model), and one
to represent fragment matches (fs model).
No significant hits. The hit to the 7TM_7 domain
does not even score above the Noise Cutoff, and
its E-value is poor.
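The cutoff logic described above can be sketched as a small comparison. This is only an illustration of the definitions on this slide: the class and function names are hypothetical, and the numeric cutoffs in the usage lines are made-up values, not the real TC1/NC1 for 7tm_7 or any other Pfam family.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PfamCutoffs:
    trusted: float  # TC1: lowest score among sequences included in the family
    noise: float    # NC1: highest score among sequences not included


def classify_hit(score, cutoffs):
    """Place an HMM bit score relative to the trusted and noise cutoffs."""
    if score >= cutoffs.trusted:
        return "trusted"          # scores like a family member
    if score > cutoffs.noise:
        return "between cutoffs"  # above noise, below trusted
    return "below noise"          # no better than known non-members


if __name__ == "__main__":
    # Hypothetical cutoffs, for illustration only.
    example = PfamCutoffs(trusted=25.0, noise=20.0)
    for score in (30.0, 22.0, 10.0):
        print(score, classify_hit(score, example))
```

A hit such as the one in this example, scoring at or below the noise cutoff, falls in the last category and would not normally be used as evidence for a domain assignment.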
9 Look at HMM alignment
10 Transmembrane Domains (TMHMM)
http://www.cbs.dtu.dk/services/TMHMM/
11 TMHMM Result
Five transmembrane helices predicted.
Red bars denote significant result.
12 Looking for a Signal Sequence: SignalP
http://www.cbs.dtu.dk/services/SignalP/
13 SignalP Result
No signal sequence found.
14 Look for a Targeting Sequence: TargetP
http://www.cbs.dtu.dk/services/TargetP/
15 TargetP Result
Result: ambiguous.
RC: Reliability class, from 1 to 5, where 1
indicates the strongest prediction. RC is a
measure of the size of the difference ('diff')
between the highest (winning) and the second
highest output scores. There are 5 reliability
classes, defined as follows:
1: diff > 0.800
2: 0.800 > diff > 0.600
3: 0.600 > diff > 0.400
4: 0.400 > diff > 0.200
5: 0.200 > diff
Thus, the lower the value of RC, the safer the
prediction.
SP: Secretory pathway, i.e. the sequence contains
SP, a signal peptide. Ambiguous result.
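The reliability class defined above is just a binning of the gap between the top two output scores. A minimal sketch in Python, assuming the per-location scores are available as a dictionary; the function name and the example scores are hypothetical. The definition uses strict inequalities and does not say where an exact boundary value (e.g. diff = 0.800) belongs, so assigning it to the less reliable class here is an assumption.

```python
def reliability_class(scores):
    """Return the reliability class (1 to 5) for a mapping of output scores.

    diff is the winning score minus the second-highest score.
    """
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two output scores")
    diff = ranked[0] - ranked[1]
    # Assumption: exact boundary values fall into the less reliable class.
    if diff > 0.800:
        return 1
    if diff > 0.600:
        return 2
    if diff > 0.400:
        return 3
    if diff > 0.200:
        return 4
    return 5


if __name__ == "__main__":
    # Hypothetical scores, not the result for the example protein.
    print(reliability_class({"mTP": 0.10, "SP": 0.95, "other": 0.02}))  # 1
    print(reliability_class({"mTP": 0.40, "SP": 0.50, "other": 0.30}))  # 5
```

A prediction in class 4 or 5, where the winning score barely beats the runner-up, corresponds to the kind of ambiguous result reported on this slide.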
16 Superfamily
17 There is no significant Superfamily result,
either.
18 Other information
ClustalW alignment of other members of this
putative family
19 Viewing the phylogenetic tree
20 Discussion
- BLASTP: No significant matches.
- Domains: Possibly a diverged tick domain. Weak
hit (below Noise Cutoff) to Pfam08395 7tm_7, 7tm
Chemosensory receptor. This family includes a
number of gustatory and odorant receptors mainly
from insect species such as A. gambiae and D.
melanogaster. They are classified as
G-protein-coupled receptors (GPCRs), or
seven-transmembrane receptors. They show high
sequence divergence, consistent with an ancient
origin for the family.
- 5 transmembrane helices.
- Possibly secreted, ambiguous.
- Not closely matched to others in the putative
family. Not grouped in a domain-based family
calculation.
- What's a curator to do?
In this case, the conservative approach would be
to name the protein "hypothetical protein".
Alternatively, we could call it "transmembrane
protein, putative".
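The naming choice in this example can be written down as a small rule, purely to make the reasoning explicit. This is a hypothetical sketch that covers only the situation discussed on this slide (no significant BLASTP match, no domain hit above cutoff); the function name and parameters are illustrative and do not represent an official annotation policy.

```python
def suggest_name(blast_significant, domain_above_cutoff, tm_helices,
                 conservative=True):
    """Suggest a product name for a protein with little or no evidence.

    Returns None when homology or domain evidence exists, because naming
    from that evidence is outside the scope of this sketch.
    """
    if blast_significant or domain_above_cutoff:
        return None
    if not conservative and tm_helices > 0:
        # Only the predicted helices support this more descriptive name.
        return "transmembrane protein, putative"
    return "hypothetical protein"


if __name__ == "__main__":
    # Evidence from this example: no BLASTP hit, no domain above cutoff,
    # 5 predicted transmembrane helices.
    print(suggest_name(False, False, 5))
    print(suggest_name(False, False, 5, conservative=False))
```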