XOMICS FIRST MEETING June 17th 2005 - PowerPoint PPT Presentation

About This Presentation
Title:

XOMICS FIRST MEETING June 17th 2005

Slides: 23
Provided by: mikegil

Transcript and Presenter's Notes


1
X-OMICS FIRST MEETING June 17th 2005
  • Bioinformatics Workpackage
  • And
  • Related Issues
  • Frederic Brunet
  • Nicolas Pollet
  • Mike Gilchrist

2
Agenda - I
  • REPORTS
  • Status of genome assembly (MG)
  • Expansion of EST collection clustering (MG)
  • Full length insert sequencing (MG)
  • Annotation jamboree, Cambridge (MG)
  • Xenbase (NP)
  • Online resources (NP)

3
Agenda - II
  • DISCUSSIONS
  • Co-ordinating data between sites (MG)
  • Ontologies (NP)
  • Xenopus resource portal (MG/NP)
  • Orthologous gene sets (MG/FB)
  • Comparing X. laevis and X. tropicalis (FB/MG)

4
Status of genome assembly
  • JGI Genome Assembly v3
  • v4 will be released in late July
  • Mostly long range joins
  • End-sequencing large inserts
  • Very useful even if not complete
  • Maybe 90% of genes
  • Sequence is on fragments (scaffolds)
  • Not complete enough for chromosomes
  • Much finishing could be done
  • But very useful for gene structure

5
Expansion of EST collection clustering
  • 2004: 400,000 ESTs
  • 2005: JGI sequenced 325,000 more clones
  • 600,000 ESTs
  • 450,000 released to GenBank
  • Many diverse libraries, 20,000 per library
  • Esp. adult tissues and metamorphosis stages
  • The current one million ESTs have been clustered
    (MG) and are awaiting release from JGI
  • Milestone!
  • 40,000 clusters
  • 40,000 singletons
  • (14,000 EF1-alpha)

6
Full length insert sequencing
  • Full length cDNAs
  • 2883 Xenopus tropicalis MGC
  • 6000 from Wellcome FL project
  • 4100 sequenced (some overlap)
  • 3100 validated FL
  • 1920 more from JGI currently being sequenced at
    Stanford (some overlap)
  • 5000 more in pipeline
  • Cloning vector is significant: pCS107/8
  • Sanger will sequence some more

7
Annotation jamboree, Cambridge
  • July 11th-15th, Sanger Institute
  • Annotate 4,100 cDNA sequences
  • Mark ends of ORF
  • Identify genes (functional annotation)
  • Exons on genomic sequence - ?
  • Should have partially automated genome annotation
  • Issues
  • Identify genes from Mouse/Human/many??
  • Concerns about accuracy
  • Gene symbols style?
  • More annotation in September/October - JGI

8
Xenbase
  • Primarily concerned with community information
    and literature annotation
  • Collaboration
  • Avoid duplication of effort

9
Online resources
  • There is a lot of Xenopus data out there

10
Agenda - II
  • DISCUSSIONS
  • Co-ordinating data between sites (MG)
  • Ontologies (NP)
  • Xenopus resource portal (MG/NP)
  • Orthologous gene sets (MG/FB)
  • Comparing X. laevis and X. tropicalis (FB/MG)

11
Co-ordinating data between sites
  • Why?
  • What data is there?
  • Where is it? And who owns it?
  • Gene-centric
  • Key Words and Ontologies
  • Data sharing and access
  • Centralised vs. Distributed

12
What? Where? Who?
  • Related data is kept in various separate
    databases
  • It would undoubtedly be useful to be able to ask
    questions of the entire dataset at once
  • Find me all the genes that are expressed
  • For this gene find all the mutant phenotypes
  • Need to identify what there is
  • How it's managed
  • Is it accessible
  • Who owns it
  • What the long-term plans for it are

13
Gene-centric
  • How to reach a state where we can compare data
    between different databases using a gene
    identity?
  • Annotation will help
  • Gene symbols may be unreliable
  • Sequence is best!
  • Won't happen on its own
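
The "sequence is best" point can be illustrated with a minimal sketch, assuming two databases that describe the same cDNA under different gene symbols. The record layouts, symbols and sequences below are invented for illustration, and exact matching on a normalised sequence stands in for the alignment-based comparison a real pipeline would more likely need.

```python
import hashlib

def sequence_key(seq):
    """Normalise a nucleotide sequence and return a digest usable as a join key."""
    normalised = "".join(seq.split()).upper()
    return hashlib.sha1(normalised.encode("ascii")).hexdigest()

# Hypothetical records: same (made-up) sequence, different symbols and formatting.
expression_db = [
    {"symbol": "symbolA", "seq": "ATGGGAAAGG aaaagact", "stage": "gastrula"},
]
phenotype_db = [
    {"symbol": "symbolB", "seq": "atgggaaaggaaaagact", "phenotype": "example phenotype"},
    {"symbol": "symbolA", "seq": "atgccctttgggaaatga", "phenotype": "unrelated record"},
]

def index_by_sequence(records):
    """Group records by their sequence-derived key."""
    index = {}
    for record in records:
        index.setdefault(sequence_key(record["seq"]), []).append(record)
    return index

def link_by_sequence(left, right):
    """Pair records from two databases that share an identical normalised sequence."""
    right_index = index_by_sequence(right)
    return [(l, r) for l in left for r in right_index.get(sequence_key(l["seq"]), [])]

links = link_by_sequence(expression_db, phenotype_db)
symbol_links = [(l, r) for l in expression_db for r in phenotype_db
                if l["symbol"] == r["symbol"]]
```

In this toy data, matching on symbol pairs the expression record with an unrelated sequence, while matching on sequence pairs it with the record filed under a different symbol.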

14
Key Words and Ontologies
  • There are other sets of information that must be
    consistent between databases for co-ordination to
    work
  • Must be consistently able to describe (for
    instance) where in the organism and at what stage
    of development some observed effect has taken
    place
  • Key words vs ontologies
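
One way to see the difference between key words and ontologies is a small sketch, assuming a toy anatomy hierarchy and invented gene annotations (neither is taken from a real Xenopus ontology or database). A key-word search only finds exact matches, whereas an ontology lets a query for a broad term also find annotations made to its more specific parts.

```python
# Toy hierarchy: each term points to its parent term (illustrative only).
PARENT = {
    "forebrain": "brain",
    "hindbrain": "brain",
    "brain": "central nervous system",
    "spinal cord": "central nervous system",
}

def ancestors(term):
    """Return every broader term above the given term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found

def matches(annotation_term, query_term):
    """True if the annotation is the query term or one of its narrower terms."""
    return annotation_term == query_term or query_term in ancestors(annotation_term)

# Hypothetical expression annotations: (gene, where it was observed).
annotations = [("gene1", "forebrain"), ("gene2", "spinal cord"), ("gene3", "heart")]

keyword_hits = [gene for gene, term in annotations if term == "brain"]
ontology_hits = [gene for gene, term in annotations if matches(term, "brain")]
cns_hits = [gene for gene, term in annotations if matches(term, "central nervous system")]
```

Here the key-word search for "brain" finds nothing, while the ontology-aware search finds the forebrain annotation. The same idea would apply to developmental stages.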

15
Data sharing and access
  • Public Data
  • Shareable Data
  • Private Data
  • Make sure the technological effort is not
    disproportionate
  • What are the timescales?

16
Centralised vs. Distributed
  • Three pots of the same type of data
  • E.g. in situ expression data
  • Assuming all the experimental and informational
    problems have been dealt with, what are our
    options?
  • Amalgamate the data
  • Distributed model
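
The two options can be sketched side by side, assuming three hypothetical sites that each hold in situ expression records in the same simple format. The site names, record fields and genes are invented, and the per-site filter stands in for whatever remote query interface each site would actually expose.

```python
# Three hypothetical "pots" of in situ expression records held at different sites.
POTS = {
    "site_a": [{"gene": "gene1", "tissue": "notochord"}],
    "site_b": [{"gene": "gene1", "tissue": "somite"},
               {"gene": "gene2", "tissue": "eye"}],
    "site_c": [{"gene": "gene2", "tissue": "heart"}],
}

def amalgamate(pots):
    """Centralised option: copy every record into one store, tagged with its source."""
    return [dict(record, source=site)
            for site, records in pots.items()
            for record in records]

def query_central(store, gene):
    """Ask the single amalgamated store."""
    return [record for record in store if record["gene"] == gene]

def query_distributed(pots, gene):
    """Distributed option: ask each site in turn and combine the answers."""
    results = []
    for site, records in pots.items():
        hits = [r for r in records if r["gene"] == gene]  # stands in for a remote call
        results.extend(dict(hit, source=site) for hit in hits)
    return results

central_store = amalgamate(POTS)
central_answer = query_central(central_store, "gene1")
distributed_answer = query_distributed(POTS, "gene1")
```

Both routes return the same answer here. The practical difference is where the work falls: the centralised store has to be kept in step with its sources, while the distributed query depends on every site being reachable and answering in a compatible format.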

17
Centralised

18
Distributed

19
Ontologies
  • The glue that holds data sets together

20
Xenopus resource portal
  • A single point of reference
  • Links to all the key Xenopus resources
  • But with some idea of the sort of data to be
    found, and how to use it

21
Orthologous gene sets
  • For each Xenopus gene find the gene that has the
    same function (!?) in other model organisms
  • N.b. potential difference between sequence
    ortholog and functional ortholog
  • What might this resource be good for?
  • Frederic
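
As an illustration of what a "sequence ortholog" might mean in practice, here is a minimal sketch of reciprocal best hits, a commonly used heuristic and not necessarily the approach proposed at this meeting. The gene names and similarity scores are invented, and the sketch says nothing about whether the paired genes share a function.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Keep only pairs where each gene is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)

# Hypothetical similarity scores between Xenopus (xt_) and mouse (mm_) genes.
xenopus_vs_mouse = {
    ("xt_gene1", "mm_geneA"): 950,
    ("xt_gene1", "mm_geneB"): 400,
    ("xt_gene2", "mm_geneB"): 700,
    ("xt_gene3", "mm_geneB"): 650,
}
mouse_vs_xenopus = {
    ("mm_geneA", "xt_gene1"): 940,
    ("mm_geneB", "xt_gene2"): 710,
    ("mm_geneB", "xt_gene3"): 640,
}

pairs = reciprocal_best_hits(xenopus_vs_mouse, mouse_vs_xenopus)
```

In this toy data xt_gene3 also hits mm_geneB but is not its best hit in return, so it is left unpaired. Cases like this are where the distinction between sequence and functional orthologs would need a closer look.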

22
Comparing X. laevis and X. tropicalis
  • Frederic
  • Looking for pairs of X. laevis genes