ASTD - PowerPoint PPT Presentation

Provided by: exter75

Title: ASTD


1
ASTD
  • Alternative Splicing and Transcript Diversity
    database

2
What/who are we?
  • Firstly, AltExtron
  • Secondly, ASD (Alternative Splicing Database) and
    the AltSplice pipeline
      • a database of alternative splice events and the
        resultant isoform splice patterns of genes from
        human and other model species
  • Thirdly, for grant purposes, ATD (Alternative
    Transcript Diversity database) and the AltTrans
    pipeline
      • formation of transcript isoforms on a genome-wide
        scale by creating a value-added database of
        full-length alternative transcripts from human
        and other model species
  • We also host the AEdb database of manual
    annotations
  • The two, ASD and ATD, have been blended into one
    pipeline, so we are now
      • ASTD
      • Alternative Splicing and Transcript Diversity
        database
      • www.ebi.ac.uk/astd

3
Pipeline in a nutshell
  • Main pipeline steps
      1. Ensembl gene slices and EMBL
         EST/mRNA/HTC/HInv download
      2. Immunoglobulin filtering (Blast)
      3. Redundant gene filtering (Blat)
      4. Alignment of genes against ESTs/mRNAs (Blast)
      5. HSP collection
      6. Intron/exon delineation
      7. Splice pattern delineation
      8. Event prediction
      9. Data generation
  • Associated pipelines: Poly(A), TSS, Peptide, SNP
    and Conservation

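Steps 5 to 7 can be illustrated with a minimal sketch. It assumes each EST/mRNA has already been aligned to the forward strand of a gene slice, that its HSPs are given as 1-based inclusive genome coordinates, and that gaps of up to a hypothetical `max_gap` bases are alignment noise rather than introns. The slide does not describe the pipeline's actual rules, so this is only an illustration of the idea.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive genome coordinates


def delineate_exons_introns(
    hsps: List[Interval], max_gap: int = 10
) -> Tuple[List[Interval], List[Interval]]:
    """Merge nearby HSPs into exons and report the gaps between them as introns.

    max_gap is a hypothetical tolerance, not a documented ASTD parameter.
    """
    exons: List[List[int]] = []
    for start, end in sorted(hsps):
        if exons and start - exons[-1][1] - 1 <= max_gap:
            # Overlapping or nearly adjacent HSPs: extend the current exon.
            exons[-1][1] = max(exons[-1][1], end)
        else:
            exons.append([start, end])
    exon_list = [(s, e) for s, e in exons]
    # Each gap between consecutive exons is called an intron.
    introns = [(a[1] + 1, b[0] - 1) for a, b in zip(exon_list, exon_list[1:])]
    return exon_list, introns


# The ordered intron chain can then serve as the transcript's splice pattern.
exons, introns = delineate_exons_introns([(100, 200), (205, 300), (1000, 1100)])
print(exons)    # [(100, 300), (1000, 1100)]
print(introns)  # [(301, 999)]
```

Treating the intron chain as the splice pattern means two transcripts share a pattern exactly when they use the same introns, which makes patterns easy to compare.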
4
Limitations of the pipeline
  • The pipeline defines consensus splice sites, so
    some true biology is removed
  • Dicistronic transcripts
  • Nested genes
  • Single-exon genes
  • Small exons
  • Large introns
  • Manual annotation would resolve these issues
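A minimal sketch of what a consensus splice site filter does, assuming the consensus is the canonical GT...AG intron boundary on the forward strand (the slide does not state the exact rule). It also shows a non-canonical GC...AG intron being discarded, which is one way true biology can be removed.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive coordinates on the forward strand


def has_consensus_splice_sites(genome: str, intron: Interval) -> bool:
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    start, end = intron
    seq = genome[start - 1:end].upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def filter_introns(
    genome: str, introns: List[Interval]
) -> Tuple[List[Interval], List[Interval]]:
    """Split introns into (kept, dropped) according to the assumed GT-AG rule."""
    kept: List[Interval] = []
    dropped: List[Interval] = []
    for intron in introns:
        (kept if has_consensus_splice_sites(genome, intron) else dropped).append(intron)
    return kept, dropped


genome = "AAAA" + "GTCCCCAG" + "TTTT" + "GCCCCAG" + "A"
kept, dropped = filter_introns(genome, [(5, 12), (17, 23)])
print(kept)     # [(5, 12)]
print(dropped)  # [(17, 23)]
```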

5
Improvements
  • New, user-friendly web interfaces
  • New database schema that is normalised,
    extendable and maintainable
  • Pipeline improvements: some steps are now
    automated, bugs have been corrected, and Blat
    replaces Blast for filtering redundant genes
  • The database allows external features (Ensembl
    and VEGA annotations) to be included for
    comparison with our transcripts
  • The schema allows data export in standard
    formats: GTF2, GFF3, EMBL flat file, FASTA and
    Excel spreadsheet
  • Transcripts for the complete genome, not
    restricted to genes with alternative splice events
  • Introduction of unique identifiers
  • Addition of the HTC and HInv datasets as input to
    the pipeline
  • Extension of the 5' and 3' UTRs to capture more
    TSSs and poly(A) sites
  • Annotation of TSSs (by aligning 5'-capped mRNAs
    from human and mouse to transcripts) and of
    poly(A) sites, to generate full-length transcripts
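To illustrate the GFF3 export format listed above, here is a minimal sketch that writes one gene, one transcript and its exons using GFF3's nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The identifiers, the `ASTD` source label and the choice of `gene`/`mRNA`/`exon` types are illustrative assumptions, not ASTD's actual export code.

```python
from typing import List, Tuple


def gff3_line(seqid: str, source: str, ftype: str, start: int, end: int,
              strand: str, attributes: str) -> str:
    # Score and phase are left as "." (undefined).
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      ".", strand, ".", attributes])


def transcript_to_gff3(seqid: str, gene_id: str, transcript_id: str, strand: str,
                       exons: List[Tuple[int, int]], source: str = "ASTD") -> str:
    exons = sorted(exons)
    start, end = exons[0][0], max(e for _, e in exons)
    lines = [
        "##gff-version 3",
        gff3_line(seqid, source, "gene", start, end, strand, f"ID={gene_id}"),
        gff3_line(seqid, source, "mRNA", start, end, strand,
                  f"ID={transcript_id};Parent={gene_id}"),
    ]
    for i, (s, e) in enumerate(exons, start=1):
        lines.append(gff3_line(seqid, source, "exon", s, e, strand,
                               f"ID={transcript_id}.exon{i};Parent={transcript_id}"))
    return "\n".join(lines) + "\n"


# Hypothetical identifiers for illustration only.
print(transcript_to_gff3("chr1", "GENE_1", "TX_1", "+", [(1000, 1100), (100, 300)]), end="")
```

The `Parent` attributes link exons to their transcript and the transcript to its gene, so a reader of the file can reassemble each isoform.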

6
www.ebi.ac.uk/astd - Query tools
  • Three query tools are available to retrieve
    entries:
      • Simple text search on the main page
      • Genome browsing
      • Advanced search

7
Gene information
8
Genomic region information 1
9
Genomic region information 2
10
Transcript information
11
Evidence for transcript 1
12
Evidence for transcript 2
13
Expression information
14
Splice event 1
15
Splice event 2
16
Peptide information
17
Statistics
  • Human
      • Number of genes with an ASTD transcript: 16715
      • Number of genes with an ASTD
        transcription_start_site: 4936
      • Number of genes with an ASTD polyA_site: 15376
      • Number of genes with an ASTD splicing event:
        11316
      • Number of genes with multiple ASTD transcripts:
        14101
      • Proportion of genes undergoing alternative
        splicing: 68%
      • Proportion of genes undergoing alternative
        polyadenylation: 92%
      • Proportion of genes with alternative
        transcription start sites: 30%
  • Mouse
      • Number of genes with an ASTD transcript: 16491
      • Number of genes with an ASTD
        transcription_start_site: 948
      • Number of genes with an ASTD polyA_site: 13556
      • Number of genes with an ASTD splicing event: 9474
      • Number of genes with multiple ASTD transcripts:
        13028
      • Proportion of genes undergoing alternative
        splicing: 57%
      • Proportion of genes undergoing alternative
        polyadenylation: 82%
      • Proportion of genes with alternative
        transcription start sites: 6%
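The slide does not state how the proportions were calculated, but they are consistent with dividing each count by the number of genes with an ASTD transcript. A quick check using the counts from the slide (the dictionary keys are shorthand introduced here):

```python
# Counts copied from the Statistics slide; keys are shorthand labels.
STATS = {
    "human": {"transcript": 16715, "splicing": 11316, "polya": 15376, "tss": 4936},
    "mouse": {"transcript": 16491, "splicing": 9474, "polya": 13556, "tss": 948},
}


def proportions(counts: dict) -> dict:
    """Percentage of genes with an ASTD transcript that have each feature."""
    total = counts["transcript"]
    return {k: round(100 * counts[k] / total) for k in ("splicing", "polya", "tss")}


for species, counts in STATS.items():
    print(species, proportions(counts))
# human {'splicing': 68, 'polya': 92, 'tss': 30}
# mouse {'splicing': 57, 'polya': 82, 'tss': 6}
```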

18
Graph of human growth
19
Controlled vocabularies/ontologies
  • GO
  • SOFA
  • eVOC
  • Splice event ontology
  • MeSH terms

20
Future 1
  • Addition of new species
  • Experimental validation of transcript structure
    and alternative poly(A) sites
  • Use of EMBL CDS as another source of alignments
    to the genome
  • More frequent releases (every 3 months)
  • Addition of regulatory motifs: exonic splicing
    silencers and enhancers (ESS, ESE) and intronic
    splicing silencers and enhancers (ISS, ISE)
  • microRNA target sites from the EURASNET NoE
    (University of Basel)
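At its simplest, annotating regulatory motifs such as ESEs means scanning exon or intron sequence for known short motifs. The sketch below reports every occurrence, including overlapping ones. The motif strings are illustrative placeholders, not the motif sets ASTD plans to use.

```python
import re
from typing import Dict, List, Tuple

# Illustrative placeholder motifs only; a real annotation would load
# curated or computationally derived ESE/ESS/ISE/ISS sets.
MOTIFS: Dict[str, List[str]] = {
    "ESE": ["GAAGAA"],
    "ESS": ["TAGGGA"],
}


def scan_motifs(seq: str,
                motifs: Dict[str, List[str]] = MOTIFS) -> List[Tuple[int, str, str]]:
    """Return (1-based position, category, motif) for every motif occurrence."""
    seq = seq.upper()
    hits = []
    for category, patterns in motifs.items():
        for pattern in patterns:
            # A zero-width lookahead lets overlapping matches be reported.
            for m in re.finditer(f"(?={re.escape(pattern)})", seq):
                hits.append((m.start() + 1, category, pattern))
    return sorted(hits)


print(scan_motifs("ccGAAGAAGAAttaggga"))
# [(3, 'ESE', 'GAAGAA'), (6, 'ESE', 'GAAGAA'), (13, 'ESS', 'TAGGGA')]
```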

21
Future 2
  • The introduction of unique identifiers means:
      • Addition as an xref in EMBL, so transcripts in
        the INSDC can be grouped into one gene
      • Addition to UniParc, so translations can be
        linked to UniProt IsoIds and again grouped as
        variants of one gene
      • UniParc translations also undergo a full
        InterPro scan plus TM and SignalP predictions,
        so these data are precomputed rather than
        computed on the fly
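Grouping records through a shared identifier is essentially a group-by operation. A minimal sketch, assuming each INSDC record carries a cross-reference to exactly one ASTD gene identifier; the accessions and identifiers below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_gene(xrefs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each ASTD gene identifier to the sorted accessions that reference it."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for accession, gene_id in xrefs:
        groups[gene_id].append(accession)
    return {gene_id: sorted(accessions) for gene_id, accessions in groups.items()}


# Hypothetical (accession, ASTD gene identifier) pairs.
pairs = [("ACC3", "GENE_A"), ("ACC1", "GENE_A"), ("ACC2", "GENE_B")]
print(group_by_gene(pairs))  # {'GENE_A': ['ACC1', 'ACC3'], 'GENE_B': ['ACC2']}
```

The same pattern would apply to translations linked to UniProt IsoIds: group by the shared gene identifier and the isoforms of one gene fall together.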

22
Future 3
  • The EBI sequence database group and Ensembl have
    merged to form the Hinxton Sequencing Forum (HSF)
  • The outcome is that ASTD will be a vehicle for
    augmenting the Ensembl transcript views:
      • Full-length transcripts with TSSs, splice events
        and poly(A) sites
      • Definition of the major transcript set using
        features annotated to transcripts, e.g.
        expression state and exon array or splice
        junction array evidence
      • VEGA/Havana annotations also included
  • Time scale: within 2 years

23
Acknowledgements
  • The ASTD Team
      • Gautier Koscielny
      • Vincent Le Texier
      • Eleanor Whitfield
      • Chellapa Gopalakrishnan
      • Vasudev Kumanduri
  • Sequence Database Group and External Services
  • ASD consortium (Stefan Stamm for AEdb)
  • ATD consortium (Daniel Gautheret for AltPAS)
  • EURASNET consortium
  • The ASTD project at EBI is supported by a grant
    from the EC Eurasnet Network of Excellence
    (LSHG-CT-2005-518238).