Title: Value-adding, Access, and Use: Biological Databases as a Case Study
1 Value-adding, Access, and Use: Biological
Databases as a Case Study
2 Genes...
3 ...make proteins
4 Proteins form complex 3D structures
5 Molecules interact
6 The right molecules need to be present at the
right time
8 EMBL-Bank DNA sequences
9 EMBL-Bank DNA sequences
SWISS-PROT TrEMBL InterPro
10 EMBL-Bank DNA sequences
SWISS-PROT TrEMBL InterPro
EnsEMBL Metazoan Genome Gene Annotation
11 EMBL-Bank DNA sequences
SWISS-PROT TrEMBL InterPro
EnsEMBL Metazoan Genome Gene Annotation
Array-Express Microarray Expression Data
13 EMBL-Bank DNA sequences
SWISS-PROT TrEMBL InterPro
EnsEMBL Metazoan Genome Gene Annotation
Array-Express Microarray Expression Data
EMSD Macromolecular Structure Data
15 EMBL-Bank DNA sequences
EnsEMBL
Array-Express Microarray Expression Data
SWISS-PROT TrEMBL InterPro
IntAct Protein-Protein Interaction Data
EMSD Macromolecular Structure Data
18 Integr8
19 EMBL-Bank DNA sequences
EnsEMBL
Array-Express Microarray Expression Data
SWISS-PROT TrEMBL InterPro
IntAct Protein-Protein Interaction Data
EMSD Macromolecular Structure Data
20 EMBL-Bank DNA sequences
SWISS-PROT TrEMBL InterPro
IntAct Protein-Protein Interaction Data
21 Running a database project
Database design
End Users
Service Tools
Service DB
Genomes Genes Patents Updates
Submitters
Add value (computation)
Releases Updates
Q/C etc
Add value (review etc.)
22 Running a database project
Database design
End Users
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submitters
Add value (computation)
Releases Updates
Q/C etc
Add value (review etc.)
23 Running a database project
Database design
End Users
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submission tools
Submitters
Add value (computation)
Releases Updates
Q/C etc
Add value (review etc.)
27 Running a database project
Database design
End Users
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submission tools
Submitters
Data Distrib.
Add value (computation)
Releases Updates
Q/C etc
Add value (review etc.)
28 Running a database project
Other archives
Database design
End Users
Data exchange
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submission tools
Submitters
Data Distrib.
Releases Updates
Q/C etc
Add value (review etc.)
29 Running a database project
Other archives
Database design
Development DB
End Users
Data exchange
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submission tools
Submitters
Data Distrib.
Releases Updates
Q/C etc
Add value (review etc.)
30 Running a database project
Other archives
Database design
Development DB
End Users
Data exchange
Service Tools
Production DB
Service DB
Genomes Genes Patents Updates
Submission tools
Submitters
Data Distrib.
Add value (computation)
Releases Updates
Q/C etc
Add value (review etc.)
31 EMBL nucleotide sequence database
32 Dataflow
33 EMBL Flat File
34 EMBL Relational Schema
Sequence Info
Reference Info
Location Info
Taxonomy Info
Feature Info
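The step from flat file to relational schema can be pictured as sorting an entry's lines into the information groups listed above. The following is a minimal sketch, assuming the standard two-letter EMBL line codes (ID, AC, DE, OS, OC, RN, RL, FT, SQ, and the `//` terminator). The mapping of codes to the slide's group names, the function name `group_entry`, and the toy entry are assumptions made for illustration; this is not the actual EMBL loader, and multi-line feature qualifiers are ignored.

```python
from collections import defaultdict

# Assumed mapping from two-letter line codes to the information groups
# named on the schema slide (illustrative; not the real table layout).
GROUPS = {
    "ID": "sequence", "AC": "sequence", "DE": "sequence",
    "RN": "reference", "RA": "reference", "RT": "reference", "RL": "reference",
    "OS": "taxonomy", "OC": "taxonomy",
}

def group_entry(text):
    """Split one flat-file entry into per-group records."""
    grouped = defaultdict(list)
    residues = []
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # entry terminator
            break
        code, rest = line[:2], line[2:].strip()
        if in_sequence:
            # Sequence lines hold residue blocks plus a running count;
            # keep only the letters.
            residues.append("".join(ch for ch in line if ch.isalpha()))
        elif code == "SQ":
            in_sequence = True
        elif code == "FT":
            parts = rest.split()
            if len(parts) >= 2 and not parts[0].startswith("/"):
                # Feature key goes to feature info, its span to location info.
                grouped["feature"].append(parts[0])
                grouped["location"].append(parts[1])
        elif code in GROUPS:
            grouped[GROUPS[code]].append((code, rest))
    grouped["sequence"].append(("SQ", "".join(residues)))
    return dict(grouped)

# Made-up entry for illustration only; not a real database record.
TOY_ENTRY = """\
ID   TOY00001 standard; DNA; UNC; 12 BP.
AC   TOY00001;
DE   Made-up example sequence
OS   Homo sapiens
OC   Eukaryota; Metazoa.
RN   [1]
RL   Unpublished.
FT   source          1..12
FT   CDS             1..12
SQ   Sequence 12 BP;
     atggcctaat ag                                                      12
//
"""

record = group_entry(TOY_ENTRY)
print(record["feature"], record["location"])
```

Keeping feature keys and locations in separate lists mirrors the separate Feature Info and Location Info groups; a real schema would link them by identifier rather than by list position.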
35 Data Access and Use
- Network services
- Sequence Retrieval System (SRS) integrating and
linking the main nucleotide and protein databases
plus many specialized databases
- Database releases are produced quarterly, via
FTP (inc. mirror sites) and CD-ROM
- Daily and cumulative updates via FTP
- Sequence search servers
36 April 2003: TrEMBL 23.4, SWISS-PROT 41.2
- 829,111 TrEMBL entries
- 123,721 SWISS-PROT entries
- weekly production of a non-redundant and
comprehensive protein sequence database
consisting of SWISS-PROT, TrEMBL, and TrEMBLnew
ftp.ebi.ac.uk/pub/databases/sp_tr_nrdb/
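One way to picture the weekly non-redundant build is as a merge keyed on sequence identity. The sketch below assumes that "redundant" means an identical sequence and that SWISS-PROT entries take precedence over TrEMBL and TrEMBLnew. Both assumptions, the function name `merge_non_redundant`, and the toy records are illustrative and are not taken from the actual production pipeline.

```python
def merge_non_redundant(*sources):
    """Merge (name, entries) sources, keeping the first entry per sequence.

    Each entries value is a list of (accession, sequence) pairs. Sources
    listed earlier win when two entries share the same sequence.
    """
    seen = {}
    for source_name, entries in sources:
        for accession, sequence in entries:
            key = sequence.upper()         # compare case-insensitively
            if key not in seen:
                seen[key] = (source_name, accession)
    return [(src, acc, seq) for seq, (src, acc) in seen.items()]

# Toy records with made-up accessions and sequences.
swissprot = [("S0001", "MKTAYIAK")]
trembl = [("T0001", "MKTAYIAK"), ("T0002", "MSLLPQ")]
tremblnew = [("N0001", "msllpq"), ("N0002", "MAAGT")]

nrdb = merge_non_redundant(
    ("SWISS-PROT", swissprot), ("TrEMBL", trembl), ("TrEMBLnew", tremblnew)
)
for source, accession, sequence in nrdb:
    print(source, accession, sequence)
```

Listing the curated source first is the design choice that makes a reviewed entry survive when an unreviewed entry carries the same sequence.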
37 Goals
- High level of annotation
- Minimal redundancy
- High level of integration with other databases
- Complete and up-to-date
- Availability
39 Automatic annotation of TrEMBL
- Data-mining to extract conditions from InterPro
- Extract SWISS-PROT reference entries fulfilling
the conditions
- Extract common annotation
- Store conditions and common annotation in
RuleBase
- Group TrEMBL by conditions
- Add common annotation to TrEMBL
InterPro
SWISS-PROT
TrEMBL
RuleBase
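The annotation steps on this slide can be sketched roughly as follows. This is a simplified illustration, assuming that a condition is a set of InterPro signature identifiers an entry must contain and that common annotation means the annotation shared by every reference entry meeting the condition. The conditions are taken as given rather than mined, and the data structures, identifiers, and function names are hypothetical; none of this reproduces the actual RuleBase implementation.

```python
def build_rulebase(conditions, reference_entries):
    """For each condition, store the annotation common to all matching
    reference entries."""
    rulebase = {}
    for condition in conditions:
        matching = [e for e in reference_entries
                    if condition <= e["signatures"]]
        if not matching:
            continue
        common = set.intersection(*(set(e["annotation"]) for e in matching))
        if common:
            rulebase[condition] = common
    return rulebase

def annotate(entries, rulebase):
    """Add each rule's common annotation to the entries meeting its condition."""
    for entry in entries:
        for condition, common in rulebase.items():
            if condition <= entry["signatures"]:
                entry["annotation"] |= common
    return entries

# Made-up signatures and annotation terms, for illustration only.
conditions = [frozenset({"SIG_A"}), frozenset({"SIG_A", "SIG_B"})]
swissprot = [
    {"id": "REF1", "signatures": {"SIG_A", "SIG_B"},
     "annotation": {"kinase", "ATP-binding"}},
    {"id": "REF2", "signatures": {"SIG_A"},
     "annotation": {"ATP-binding", "membrane"}},
]
trembl = [
    {"id": "NEW1", "signatures": {"SIG_A", "SIG_B"}, "annotation": set()},
    {"id": "NEW2", "signatures": {"SIG_C"}, "annotation": set()},
]

rulebase = build_rulebase(conditions, swissprot)
annotate(trembl, rulebase)
for entry in trembl:
    print(entry["id"], sorted(entry["annotation"]))
```

Taking the intersection over all matching reference entries is deliberately conservative: a term is transferred only if every curated entry meeting the condition carries it.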
40 Cross-references
42 Funding
- EMBL
- European Commission
- NIH
- Industrial licenses
- MRC
- IUPHAR
44 SWISS-PROT, TrEMBL, InterPro, etc., at EBI and SIB
- Group leaders: Rolf Apweiler, Amos Bairoch
- Co-ordinators: Wolfgang Fleischmann, Henning
Hermjakob, Michele Magrane, Maria-Jesus Martin,
Nicola Mulder, Claire O'Donovan, Manuela Pruess
- Annotators/curators: Philippe Aldebert, Andrea
Auchincloss, Kirsty Bates, Marie-Claude Blatter
Garin, Brigitte Boeckmann, Silvia Braconi
Quintaj, Paul Browne, Evelyn Camon, Danielle
Coral, Elisabeth Coudert, Tania de Oliveria
Lima, Kirill Degtyarenko, Sylvie Dethiollaz, Ann
Estreicher, Livia Famiglietti, Nathalie
Farriol-Mathis, Stephanie Federico, Serenella
Ferro, Gill Fraser, Raffaella Gatto, Vivienne
Gerritsen, Arnaud Gos, Nadine Gruaz-Gumowski,
Ursula Hinz, Chantal Hulo, Janet James, Florence
Jungo, Vivien Junker, Youla Karavidopoulou, Maria
Krestyaninova, Kati Laiho, Minna Lehvaslaiho,
Karine Michoud, Virginie Mittard, Madelaine
Moinat, Sandra Orchard, Sandrine Pilbout, Sylvain
Poux, Sorogini Reynaud, Catherine Rivoire, Bernd
Röchert, Michel Schneider, Christian Sigrist,
Andre Stutz, Shyamala Sundaram, Michael Tognolli,
Sandra van den Broek, Bob Vaughan, Eleanor
Whitfield
- Programmers: Daniel Barrell, David Binns, Michael
Darsow, Ujjwal Das, Eduardo de Castro, Alexander
Fedotov, Astrid Fleischmann, Elisabeth Gasteiger,
Alain Gateau, Andre Hackmann, Ivan Ivanyi, Eric
Jain, Alexander Kanapin, Paul Kersey, Ernst
Kretschmann, Corinne Lachaize, Chris Lewington,
Xavier Martin, John Maslen, Peter McLaren,
Rupinder Singh Mazara, Lorna Morris, John
O'Rourke, Isabelle Phan, Astrid Rakow, Kai Runte,
Florence Servant, Allyson Williams, Dan Wu
- Research staff: Kristian Axelsen, Pierre-Alain
Binz, Nicolas Hulo, Anne-Lise Veuthey
- Clerical/secretarial assistance: Veronique
Mangold, Claudia Sapsezian, Margaret Shore-Nye,
Veronique Verbegue
- Students: Pavel Dobrokhotov, Alexandre Gattiker,
various MCF, etc.