
1
Genomes with Ensembl
Dr. Giulietta M. Spudich
European Bioinformatics Institute, Hinxton, UK
2
Today
  • Introduction to the Ensembl project
  • Walk-through of the browser
  • BioMart
  • Variation
  • Comparative Genomics

3
Introduction to Ensembl
  • Why do we have genome browsers?
  • Why Ensembl?
  • Ensembl genes and genomes
  • Help and tutorials

4
Genome browsers provide a map
Figure adapted from the ENCODE project: www.nature.com/nature/focus/encode/
5
Genome Browsers
  • Ensembl Genome Browser
  • http://www.ensembl.org
  • NCBI Map Viewer
  • http://www.ncbi.nlm.nih.gov/mapview/
  • UCSC Genome Browser
  • http://genome.ucsc.edu

6
What Distinguishes Ensembl from the UCSC and NCBI Browsers?
  • The gene set. Automatic annotation based on mRNA
    and protein information.
  • Programmatic access via the Perl API (open
    source)
  • BioMart
  • Integration with other databases (DAS)
  • Comparative analysis (gene trees)

7
Subjects
  • Why do we have genome browsers?
  • Why Ensembl?
  • How can we extract data from Ensembl?
  • Where can I find help?

8
To meet a challenge
  • Ensembl's aim: to provide freely available,
    high-quality annotation for the biological
    community
  • Started in 2000
  • Joint project between EBI and Sanger
  • Funded primarily by the Wellcome Trust,
    additional funding by EMBL, NIH-NIAID, EU, BBSRC
    and MRC

9
Vertebrates are available
Extension to other genomes (plants, microorganisms): www.ensemblgenomes.org
Non-chordates: D. melanogaster, C. elegans, S. cerevisiae
10

Extending Ensembl across the taxonomic space
Slide design by Jeff Almeida-King
F. D. Ciccarelli, T. Doerks, C. von Mering, C. J.
Creevey, B. Snel and P. Bork. Towards automatic
reconstruction of a highly resolved tree of life.
Science, 3 March 2006.
11
Exploring genomes
  • Vertebrate focus: www.ensembl.org
  • Other species: www.ensemblgenomes.org

12
Subjects
  • Why do we have genome browsers?
  • Why Ensembl?
  • Ensembl (vertebrate) genes and genomes
  • Help and tutorials

13
What is known?
Genomic assemblies from sequencing consortia
14
What is known?
Proteins and cDNA/mRNA sequences from the
research community, found in:
  • UniProt/Swiss-Prot (manually curated)
  • UniProt/TrEMBL
  • www.uniprot.org
  • NCBI RefSeq (manually curated)
  • www.ncbi.nlm.nih.gov/RefSeq

15
Combining genes and genomes
tgcctgttag...
16
Too many pieces
17
Ensembl shows one transcript
with underlying evidence
18
VEGA/Havana
  • Automatic annotation pipeline: gene building all
    at once (whole genome)
  • Ensembl
  • Manual curation: case-by-case basis
  • VEGA (Vertebrate Genome Annotation)
  • Havana

19
HAVANA
  • http://www.sanger.ac.uk/HGP/havana/

20
Genes and Transcripts in Ensembl
  • Ensembl known transcripts
  • Ensembl novel transcripts
  • Ensembl merged transcripts (Havana)
  • EST clusters
  • More manual curation (SGD, WormBase, FlyBase)

21
Ensembl/Havana
  • Transcripts are labelled
  • Ensembl
  • Havana
  • Ensembl/Havana merge

22
Names in Ensembl
  • ENSG: Ensembl Gene ID
  • ENST: Ensembl Transcript ID
  • ENSP: Ensembl Peptide ID
  • ENSE: Ensembl Exon ID
  • For species other than human, a species code is
    added after ENS
  • MUS (Mus musculus) for mouse: ENSMUSG
  • DAR (Danio rerio) for zebrafish: ENSDARG,
    etc.
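
The naming scheme on this slide can be illustrated with a small parser. This is only a sketch: the function name `parse_stable_id`, the lookup tables, and the sample IDs are illustrative, and the pattern assumes a three-letter species code (as in the MUS and DAR examples) followed by a numeric part. Only the four feature types and three species named on the slide are covered.

```python
import re

# Feature-type letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}

# Species codes listed on the slide; human IDs carry no species code.
SPECIES_CODES = {"": "human", "MUS": "mouse", "DAR": "zebrafish"}

# Assumed layout: "ENS" + optional three-letter species code
# + one feature letter + digits.
STABLE_ID = re.compile(r"^ENS([A-Z]{3})?([GTPE])(\d+)$")


def parse_stable_id(stable_id):
    """Split an Ensembl-style stable ID into species, feature type and number."""
    match = STABLE_ID.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised stable ID: {stable_id!r}")
    code = match.group(1) or ""
    return {
        "species_code": code,
        # Codes not listed on the slide are reported as "unknown".
        "species": SPECIES_CODES.get(code, "unknown"),
        "feature": FEATURE_TYPES[match.group(2)],
        "number": int(match.group(3)),
    }


# Sample IDs chosen only to exercise the pattern.
for example in ("ENSG00000139618", "ENSMUSG00000017167", "ENSDARG00000024771"):
    print(example, parse_stable_id(example))
```

Species with other codes still parse, but are reported as "unknown" because the lookup table holds only the codes named above.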

23
Low-coverage genomes
  • High-coverage sequencing is time-consuming and
    expensive
  • BAC clones (>10x): Human, Mouse, Zebrafish
  • Whole Genome Shotgun (6x): Chimp, Rat,
    Chicken,...
  • Low (2x) coverage genome sequencing
  • Faster, cheaper, but only useful when annotated
  • Assembled into lots of scaffolds
  • Classic Ensembl gene-build would result in many
    partial and fragmented genes
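
A rough calculation shows why 2x coverage leaves so many gaps. The sketch below assumes the idealised Lander-Waterman (Poisson) model of random shotgun sequencing, which is not mentioned on the slides: at mean coverage c, the expected fraction of bases covered by no read is exp(-c). Real assemblies deviate from this because of repeats and cloning bias, so the numbers are indicative only.

```python
import math


def uncovered_fraction(coverage):
    """Expected fraction of bases with zero reads under a Poisson model."""
    return math.exp(-coverage)


# Coverage levels taken from the slide (10x stands in for ">10x").
for label, coverage in (("low coverage", 2.0),
                        ("whole genome shotgun", 6.0),
                        ("BAC clones", 10.0)):
    fraction = uncovered_fraction(coverage)
    print(f"{label:>22} ({coverage:g}x): {fraction:.4%} of bases uncovered")
```

Under this model a 2x genome leaves more than a tenth of its bases unsequenced, which is consistent with the many scaffolds and fragmented genes described above.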

24
Some 2X genomes
25
Low-Coverage Gene-Build
  • Whole Genome Alignment to an annotated
    high-quality reference genome
  • Guided re-ordering of scaffolds
  • Annotation of longer, more complete gene
    structures

26
2X Genebuild
Human gene
Human genome
Cat scaffold 2
Cat scaffold 1
Human or dog gene (projected)
27
What other annotation?
  • Non-coding (nc)RNAs
  • IDs in other databases
  • microarray probes, clonesets, BAC maps
  • Other features of the genome
  • repeats, CpG islands
  • Comparative data
  • orthologues and paralogues, protein families,
    whole genome alignments, syntenic regions
  • Variation data
  • SNPs, InDels
  • Regulatory data (a first guess at promoter and
    enhancer elements)
  • Data from external sources (DAS)

28
Sources of Variation
  • NCBI dbSNP
  • Import alleles, flanking sequence, frequencies,
  • Calculate position, transcript effect
  • http://www.ncbi.nlm.nih.gov/SNP/
  • For human also:
  • HGVbase
  • Affy GeneChip 100K and 500K Mapping Array
  • Affy Genome-Wide SNP array 6.0
  • Ensembl-called SNPs (from Celera reads and Jim
    Watson's and Craig Venter's genomes)
  • For mouse, rat, dog and chicken also:
  • Sanger- and Ensembl-called SNPs (other strains /
    breeds)
  • STAR Project for rat, other projects

29
External Sources
  • Large-scale variations in
  • DECIPHER
  • Database of Chromosomal Imbalance and Phenotype
    in Humans using Ensembl Resources
  • DGV loci
  • Database of Genomic Variants
  • CNVs, Inversions, InDels

30
Subjects
  • Why do we have genome browsers?
  • Why Ensembl?
  • Ensembl genes and genomes
  • Help and tutorials

31
How is this information organised?
  • Ensembl Views (Website)
  • Ensembl Database (open source)
  • BioMart DataMining tool

32
Help and Information
  • Comments and questions?
  • helpdesk@ensembl.org
  • Check out our tutorials page
  • www.ensembl.org/info/website/tutorials/index.html
  • Videos: http://www.youtube.com/user/EnsemblHelpdesk
  • Mailing list: ensembl-announce@ebi.ac.uk
  • Come visit our blog! http://ensembl.blogspot.com/
  • FTP site: ftp://ftp.ensembl.org
  • Amazon Web Services: http://aws.amazon.com/publicdatasets

33
Ensembl Team
Ensembl: Paul Flicek (EBI), Steve Searle (Sanger Institute)
Software: Glenn Proctor, Andreas Kähäri, Stephen Keenan, Rhoda Kinsella, Eugene Kulesha, Ian Longden, Daniel Rios, Iliana Toneva
Comparative Genomics: Javier Herrero, Kathryn Beal, Stephen Fitzgerald, Leo Gordon, Albert Vilella
Functional Genomics: Ian Dunham, Nathan Johnson, Steven Wilder
Variation: Fiona Cunningham, Yuan Chen, Pontus Larrson, Will McLaren
Analysis and Annotation: Bronwen Aken, Julio Banet, Susan Fairley, Jan-Hinnerck Vogel, Simon White, Amonida Zadissa
Web Team: Anne Parker, Eugene Bragin, Bethan Pritchard, Steve Trevanion (VEGA)
Outreach: Xosé M Fernández, Jeff Almeida-King, Bert Overduin, Michael Schuster (QC), Giulietta Spudich
Systems Support: Guy Coates, James Beal, Gen-Tao Chiang, Peter Clapham, Simon Kelley, Shelley Goddard, Tracy Mumford, Kerry Smith
Research: Benoît Ballester, Damian Keefe, Dace Ruklisa, Petra Catalina Schwalie, Guy Slater
Vertebrate Genomics: Illka Lappalainen, Chao-Kung Chen, Laura Clark, Jonathan Hinton, Vasudev Kumanduri, Edoardo Marcora, Damian Smedley, Richard Smith, Phil Wilkinson, Holly Zheng-Bradley
Ensembl Genomes: Paul Kersey, Paul Derwent, Matthias Haimel, Arnaud Kerhornou, Uma Maheswari, Michael Nuhn, Dan Staines, Andy Yates
VectorBase: Dan Lawson, Gautier Koscielny, Karyn Megy
Zebrafish: Kerstin Howe, Kim Brugger (GRC), Will Chow, Britt Reimholz, James Torrance
Ensembl Strategy: Ewan Birney, Richard Durbin, Tim Hubbard
34
The Wellcome Trust Genome Campus