Distribution of Introns among Full Length cDNA

1
Distribution of Introns among Full Length cDNA
Bioinformatics Capstone
  • By Xin Hong
  • Advisors: Dr. Michael Lynch and Dr. Sun Kim

2
Main Points
  • Motivation
  • Background
  • Data sources
  • Method
  • Results and discussion

3
Motivation
  • Genomic sequences
  • Full length cDNA project
  • Gene prediction programs do not include UTR
    regions.
  • UTR structure and function, and the NMD theory.

4
Definition of UTRs and Introns
  • 5' UTR sequences were defined as the mRNA region
    spanning from the cap site to the start codon
    (excluded).
  • 3' UTR sequences were defined as the mRNA region
    spanning from the stop codon (excluded) to the
    poly(A) start site.
  • The coding region begins with the initiation
    codon, which is normally ATG. It ends with one of
    three termination codons: TAA, TAG or TGA.

[Figure: genomic sequence, pre-mRNA (segments labeled 1, 2, 3) and mRNA (5' UTR, CDS, 3' UTR)]
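
The definitions on this slide can be illustrated with a minimal Python sketch. It assumes the CDS coordinates are already known from an annotation; the function name `split_cdna` and its parameters are hypothetical and are not taken from the presentation.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def split_cdna(seq, cds_start, cds_end):
    """Split a cDNA into (5' UTR, CDS, 3' UTR, starts_with_atg).

    cds_start: 0-based index of the first base of the initiation codon.
    cds_end:   0-based index one past the last base of the termination codon.
    Both coordinates are assumed inputs (e.g. from a CDS annotation).
    """
    seq = seq.upper()
    cds = seq[cds_start:cds_end]
    if len(cds) < 6 or len(cds) % 3 != 0:
        raise ValueError("CDS must hold whole codons, including start and stop")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("CDS does not end with TAA, TAG or TGA")
    # The initiation codon is normally ATG, so this is reported, not enforced.
    starts_with_atg = cds[:3] == "ATG"
    # The UTRs exclude the start and stop codons, which stay in the CDS.
    return seq[:cds_start], cds, seq[cds_end:], starts_with_atg
```

For example, `split_cdna("GGCACCATGAAATGACCTTAAAA", 6, 15)` separates the six leading bases as 5' UTR and the eight trailing bases as 3' UTR.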
5
Function of UTRs
  • Translational control
  • mRNA subcellular localization
  • mRNA stability

Pesole, 2001
6
Nonsense-Mediated Decay (NMD)
  • An mRNA is immune to NMD if translation
    terminates less than 50-55 nucleotides upstream
    of, or downstream of, the 3'-most exon-exon
    junction, which marks the site of the last intron
    in the cDNA.
  • NMD is an mRNA surveillance mechanism that leads
    to selective degradation of transcripts
    containing a premature termination codon.
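
The positional rule above can be written as a small check. This is only a sketch: the function name, the coordinate convention and the single cutoff of 50 nt (the slide gives a range of 50-55) are assumptions made for illustration.

```python
def escapes_nmd(stop_end, last_junction, threshold=50):
    """Return True if the positional rule predicts immunity to NMD.

    stop_end:      mRNA coordinate (0-based, exclusive) where the
                   termination codon ends.
    last_junction: mRNA coordinate of the 3'-most exon-exon junction,
                   i.e. the number of bases that precede it.
    threshold:     assumed cutoff in nt, taken from the 50-55 nt range.
    """
    # Positive distance: the stop codon lies upstream of the junction.
    # Zero or negative: translation terminates at or downstream of it.
    distance_upstream = last_junction - stop_end
    return distance_upstream < threshold
```

With these conventions, a stop codon ending 20 nt upstream of the last junction is predicted to be immune, while one ending 100 nt upstream is a candidate for degradation.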

7
Objectives
  • To explore introns in the UTR regions.
  • To find the rules of intron distribution within
    UTR regions.
  • To compare intron distribution between UTRs
    and CDS.
  • To compare intron distribution rules among
    different species.

8
Data source
  • Full length cDNA sequences
      • MGC (Mammalian Gene Collection): mammalian
      • BDGP: fruit fly
      • KOME: plant
  • Genomic sequences
      • GenBank
      • Ensembl
  • CDS prediction (Furuno et al. 2003)
      • ProCrest
      • rsCDS
      • NCBI predictor
      • DECODER
      • Experiment

Human (hs) 15504 15458
Mouse (mm) 12828 12803
Rat (rn) 641 634
Drosophila melanogaster (dm) 9152 9096
Arabidopsis thaliana (at) 18415 18414
9
Method
  • Align the cDNA sequences to the genomic
    sequence
  • How should gaps, overlaps and even polymorphisms
    be handled?
  • BLAST, MegaBLAST, ...
  • sim4, gap2, Spidey, BLAT and GeneSeqer

Jim Kent - the Blat Rap
10
Steps
  1. Clean the full length cDNA and genomic sequences.
  2. Split each cDNA into three parts: 5' UTR, CDS and
    3' UTR.
  3. Align the cDNA to the genomic sequence with BLAT.
  4. Parse the BLAT result to get the locations of
    exons and introns.
  5. Get the sequences of the exons and introns.
  6. Check that the summed exon lengths equal the cDNA
    length, and remove suspect candidates.
  7. Calculate the average length of the cDNA, the
    average number of introns in cDNA, etc.
  8. Compare the intron distribution of the 5' UTR,
    CDS and 3' UTR regions.
  9. Compare the intron distribution rules among
    different species.
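
Steps 4 and 6 can be sketched in Python, assuming BLAT was run with its default PSL output (21 tab-separated columns per alignment, after any header lines have been skipped). The names `PslHit`, `parse_psl_line` and `is_fully_aligned`, and the `min_intron` cutoff of 20 nt, are hypothetical choices for illustration, not the presenter's code. Only genomic (target) coordinates are used, so strand-specific handling of the cDNA coordinates is left out.

```python
from typing import List, NamedTuple, Tuple

Span = Tuple[int, int]  # half-open [start, end) on the genomic sequence


class PslHit(NamedTuple):
    q_name: str          # cDNA identifier
    q_size: int          # cDNA length
    t_name: str          # genomic sequence identifier
    strand: str
    aligned: int         # total aligned bases (sum of block sizes)
    exons: List[Span]
    introns: List[Span]


def parse_psl_line(line: str, min_intron: int = 20) -> PslHit:
    """Turn one PSL alignment line into exon and intron spans."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 21:
        raise ValueError("expected 21 tab-separated PSL columns")
    sizes = [int(x) for x in fields[18].rstrip(",").split(",")]
    t_starts = [int(x) for x in fields[20].rstrip(",").split(",")]
    exons: List[Span] = []
    for start, size in zip(t_starts, sizes):
        # Genomic gaps shorter than min_intron (an assumed cutoff) are
        # treated as small indels and merged into the current exon.
        if exons and start - exons[-1][1] < min_intron:
            exons[-1] = (exons[-1][0], start + size)
        else:
            exons.append((start, start + size))
    introns = [(left[1], right[0]) for left, right in zip(exons, exons[1:])]
    return PslHit(fields[9], int(fields[10]), fields[13], fields[8],
                  sum(sizes), exons, introns)


def is_fully_aligned(hit: PslHit) -> bool:
    """Step 6: keep a hit only if the aligned blocks account for the whole cDNA."""
    return hit.aligned == hit.q_size
```

Hits for which `is_fully_aligned` returns False would be the suspect candidates removed in step 6.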

11
Flow Chart
12
Objectives
  • To explore introns in the UTR regions.
  • To find the rules of intron distribution within
    UTR regions.
  • To compare intron distribution between UTRs
    and CDS.
  • To compare intron distribution rules among
    different species.

13
Introns Do Exist in UTRs
  • Introns do exist in UTRs.
  • However, taking Arabidopsis as an example, 80% of
    5' UTR sequences don't have introns, and 90% of
    3' UTR sequences don't have introns.

14
Introns in CDS
  • 80% of CDS sequences have introns.

15
Intron number: UTRs vs. CDS
  • Most CDS sequences have introns, but most
    UTR sequences don't have introns.

[Figure; axes: number of sequences, number of introns]
16
Objectives
  • To explore introns in the UTR regions
  • To find the rules of intron distribution within
    UTR regions
  • To compare intron distribution between UTRs
    and CDS
  • To compare intron distribution rules among
    different species

17
Introns in UTR
  • Introns in the 5' UTR and 3' UTR are spread
    throughout the region, but not evenly or
    uniformly distributed.
  • If evenly distributed, the expected intron
    location = 1/(number of introns + 1)

[Figure; axes: intron number, number of introns]
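
The even-distribution baseline on this slide can be made concrete with a short sketch. It assumes that positions are expressed as fractions of the region length and reads the slide's 1/(number of introns + 1) as the expected spacing, so that the k-th of n introns is expected at k/(n + 1). The function names are hypothetical.

```python
def expected_positions(n_introns):
    """Expected relative positions k/(n+1), k = 1..n, on a region scaled to [0, 1]."""
    return [k / (n_introns + 1) for k in range(1, n_introns + 1)]


def deviation_from_even(intron_sites, region_length):
    """Observed minus expected relative position for each intron, 5' to 3'.

    intron_sites:  intron positions in nt from the 5' end of the region.
    region_length: length of the region (e.g. a UTR) in nt.
    """
    observed = sorted(site / region_length for site in intron_sites)
    expected = expected_positions(len(observed))
    return [o - e for o, e in zip(observed, expected)]
```

Negative deviations indicate introns lying closer to the 5' end than the even spacing would predict, positive ones closer to the 3' end.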
18
Introns in UTR
  • The number of introns increases as the length of
    the sequence increases.
  • For the human 5' UTR, there is on average one
    intron per 100 nt.
  • Introns in the 3' UTR tend to concentrate toward
    the center of the 3' UTR.

[Figures; axes: location of introns, length of sequences, number of introns]
19
Objectives
  • To explore introns in the UTR regions.
  • To find the rules of intron distribution within
    UTR regions.
  • To compare intron distribution between UTRs
    and CDS.
  • To compare intron distribution rules among
    different species.

20
Introns in CDS
  • Introns in the CDS are spread throughout the
    region.
  • For human, if there is more than one intron, the
    interval between 2 introns is about 140 nt. (In
    other words, the average exon in the CDS is
    140 nt.)
  • Introns are shifted toward the 5' end.
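
The interval between introns can be computed as sketched below. The sketch assumes each cDNA is represented by a list of intron positions in cDNA coordinates (the exon-exon junctions); the function names and the input layout are hypothetical.

```python
def intron_intervals(junctions):
    """Distances in nt between consecutive intron positions of one cDNA."""
    ordered = sorted(junctions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def mean_interval(junctions_per_cdna):
    """Mean interval over all cDNAs that have more than one intron.

    Returns None when no cDNA contributes an interval.
    """
    intervals = [d for junctions in junctions_per_cdna
                 for d in intron_intervals(junctions)]
    if not intervals:
        return None
    return sum(intervals) / len(intervals)
```

Because the interval between two neighboring introns spans exactly one internal exon, the mean interval is also the mean internal exon length.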

21
Intron distribution: UTRs vs. CDS
  • Human as an example
  • The frequency of introns in the 5' UTR is
    higher than in the CDS.
  • The frequency of introns in the CDS is higher
    than in the 3' UTR.

[Figures; axis label: number of introns]
22
Intron distribution: UTRs vs. CDS
                            | 5' UTR          | CDS                          | 3' UTR
Interval between 2 introns  | 100 nt          | 140 nt                       | uncertain
Intron frequency            | Higher than CDS | Higher than 3' UTR           | Lowest
Distribution                | Even            | Shifted toward 5' end of CDS | Concentrated toward the center of 3' UTR
23
Objectives
  • To explore introns in the UTR regions.
  • To find the rules of intron distribution within
    UTR regions.
  • To compare intron distribution between UTRs
    and CDS.
  • To compare intron distribution rules among
    different species.

24
Different species: UTRs vs. CDS
  • The number of introns increases with the length
    of the sequence in both UTRs and CDS.
  • 5' UTR sequences shorter than 100 nt don't have
    introns in human, mouse, rat, Arabidopsis and
    fruit fly.
  • CDS sequences shorter than 800 nt don't have
    introns in human, mouse, Arabidopsis and
    fruit fly. For rat this boundary is 500 nt.
  • Fruit fly sequence length increases faster
    than that of the other species in both UTRs
    and CDS.

[Figures; axis label: number of introns]
25
Different species: UTRs vs. CDS
  • For all 5 species, most UTRs don't have introns.
  • For all 5 species, most CDS have introns.
  • The intron distribution rules hold for human,
    mouse, rat, Arabidopsis and fruit fly.

[Figures; axes: number of sequences, number of introns]
26
Summary
  • Introns do exist in UTRs.
  • The intron distributions in the 5' UTR, CDS and
    3' UTR differ within the same organism.
  • The intron distribution rules are shared by
    human, mouse, rat, Arabidopsis and fruit fly.
  • 5' UTR sequences shorter than 100 nt don't have
    introns in human, mouse, rat, Arabidopsis and
    fruit fly.
  • CDS sequences shorter than 800 nt don't have
    introns in human, mouse, Arabidopsis and
    fruit fly; for rat the boundary is 500 nt.
  • Fruit fly fl-cDNA sequence length increases
    faster than that of the other species in both
    UTRs and CDS.

                            | 5' UTR          | CDS                          | 3' UTR
Sequences with introns (%)  | 20              | 80                           | 10
Interval between 2 introns  | 100 nt          | 140 nt                       | uncertain
Intron frequency            | Higher than CDS | Higher than 3' UTR           | Lowest
Distribution                | Even            | Shifted toward 5' end of CDS | Concentrated toward the center of 3' UTR
27
Future work
  • NMD exists widely among different species.
  • The reason why most UTRs don't have introns.
  • The reason why intron frequency decreases from
    5' to 3' along the full length cDNA.

28
Reference
  • Lynch, Michael and Kewalramani, Avinash (2003)
    Messenger RNA surveillance and the evolutionary
    proliferation of introns. Mol. Biol. Evol.
    20(4):563-571
  • Mignone F, Gissi C, Liuni S, Pesole G. (2002)
    Untranslated regions of mRNAs. Genome Biology
    3(3):reviews0004.1-0004.10
  • Pesole G, Grillo G, Larizza A, Liuni S. (2000)
    The untranslated regions of eukaryotic mRNAs:
    structure, function, evolution and bioinformatic
    tools for their analysis. Briefings in
    Bioinformatics 1(3):236-249
  • Kent, W. James (2002) BLAT: the BLAST-like
    alignment tool. Genome Res. 12(4):656-664
  • Furuno M, Kasukawa T, Saito R, Adachi J, Suzuki
    H, Baldarelli R, Hayashizaki Y, Okazaki Y. (2003)
    CDS annotation in full-length cDNA sequence.
    Genome Res. 13(6B):1478-1487
  • Strausberg RL et al. (2002) Generation and
    initial analysis of more than 15,000 full-length
    human and mouse cDNA sequences. Proc Natl Acad
    Sci U S A. 99(26):16899-16903
  • http://www.ncbi.nlm.nih.gov

29
Acknowledgement
  • Dr. Michael Lynch
  • Dr. Sun Kim
  • Dr. Douglas G. Scofield

30
THE END