Data retrieval

Provided by: Nur108
1
Data retrieval
BioMart
Export View
Data sets on ftp site
MySQL queries of databases
Perl API access to databases
2
ExportView
3
Data Mining in Ensembl with EnsMart
August 2005
4
Possible queries
  • All genes from a candidate region
  • Genes with a particular protein domain
  • Members of a protein family
  • Genes associated with SNPs

5
More specific queries
  • Human genes with upstream regions conserved
    w.r.t. mouse
  • Upstream sequence for all Ensembl genes mapped to
    U95A chip (similarly, complete genomic annotation
    of MG_U74).
  • Genomic location and description of all mouse,
    rat and fugu homologues of all human genes that
    have transmembrane domains, are expressed in the
    cardiovascular system and have non-synonymous
    SNPs.

6
Ensembl core database
  • Normalised
    • Each data point stored only once
    • Quick updates
    • Minimal storage requirements
  • But
    • Many tables
    • Many joins for complicated queries
    • Slow for data mining questions

7
BioMart and EnsMart
  • Large-scale data retrieval tool
  • Query builder interface
  • Databases: Ensembl, SNP, Vega, (MSD, UniProt)
  • Associated features or sequences
  • Flexible output formats
  • http://www.ebi.ac.uk/biomart/
  • http://www.ensembl.org/EnsMart/

8
Mart database
  • De-normalised
  • Tables with redundant information
  • Query-optimised
  • Fast and flexible
  • Designed for data mining
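
The contrast between a normalised core database and a de-normalised mart can be illustrated with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database; the table names, column names and gene records are hypothetical and do not reflect the real Ensembl or mart schemas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalised layout (hypothetical schema): each fact is stored once,
# in its own table.
cur.executescript("""
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT, chrom TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, gene_id INTEGER);
CREATE TABLE protein_domain (transcript_id INTEGER, domain TEXT);
INSERT INTO gene VALUES (1, 'GENE_A', '1'), (2, 'GENE_B', '2');
INSERT INTO transcript VALUES (10, 1), (20, 2);
INSERT INTO protein_domain VALUES (10, 'kinase'), (20, 'zinc_finger');
""")

# "Genes with a particular protein domain" needs two joins here.
normalised = cur.execute(
    """
    SELECT g.name
    FROM gene g
    JOIN transcript t ON t.gene_id = g.gene_id
    JOIN protein_domain d ON d.transcript_id = t.transcript_id
    WHERE d.domain = ?
    """,
    ("kinase",),
).fetchall()

# De-normalised layout: one wide table that repeats the gene columns
# on every domain row, so the same question needs no join.
cur.executescript("""
CREATE TABLE gene_mart AS
SELECT g.gene_id, g.name, g.chrom, d.domain
FROM gene g
JOIN transcript t ON t.gene_id = g.gene_id
JOIN protein_domain d ON d.transcript_id = t.transcript_id;
""")

denormalised = cur.execute(
    "SELECT name FROM gene_mart WHERE domain = ?", ("kinase",)
).fetchall()

print(normalised, denormalised)
```

Both queries return the same genes. The wide table trades extra storage and slower updates for simpler, join-free queries, which is the trade-off the two slides describe.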

9
Primary Data Sets
  • Ensembl genes
  • SNP
    • Single nucleotide polymorphisms
    • Deletion-insertion polymorphisms
    • Short tandem repeats
  • Vega genes
  • (MSD protein structures)
  • (UniProt proteomes)

10
Secondary Data Sets
  • Markers
  • Diseases
  • Gene ontology
  • Gene expression information
  • Homology predictions
  • Protein annotation

11
Information flow
start
filter
output
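
The start, filter, output flow can be sketched in a few lines of Python. This is only an illustration of the idea, not how BioMart is implemented: the function `run_query`, the field names and the gene records are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, object]


def run_query(
    dataset: Sequence[Record],
    filters: Sequence[Callable[[Record], bool]],
    attributes: Sequence[str],
) -> List[Tuple[object, ...]]:
    """Keep records that pass every filter, then project the chosen attributes."""
    rows = []
    for record in dataset:
        if all(passes(record) for passes in filters):
            rows.append(tuple(record[name] for name in attributes))
    return rows


# Start: pick a data set (hypothetical records).
genes = [
    {"gene": "GENE_A", "chrom": "1", "start": 1000, "has_snp": True},
    {"gene": "GENE_B", "chrom": "1", "start": 5000, "has_snp": False},
    {"gene": "GENE_C", "chrom": "2", "start": 2000, "has_snp": True},
]

# Filter: genes on chromosome 1 that are associated with SNPs.
filters = [
    lambda r: r["chrom"] == "1",
    lambda r: bool(r["has_snp"]),
]

# Output: choose which attributes to report, here as tab-separated text.
result = run_query(genes, filters, ["gene", "start"])
text = "\n".join("\t".join(str(value) for value in row) for row in result)
print(text)
```

Keeping the three stages separate means the same data set can be reused with different filters and different output columns.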
12
BioMart
http://www.biomart.org/
13
BioMart - Features
14
BioMart - Sequences
15
Output formats
16
What about queries not possible to do in EnsMart?
  • Direct database access at ensembldb.ensembl.org
  • martdb.ebi.ac.uk
  • MySQL client

Download MySQL for Windows: http://www.winmysql.com/page4.html (file wmysr11.zip)
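
As a rough sketch of direct access with the MySQL client, the following Python snippet only assembles the command line and does not run it. The host comes from the slide; the user name `anonymous` is an assumption that the slides do not state, and the helper `mysql_command` is hypothetical. The `-h`, `-u` and `-e` options are standard options of the `mysql` client.

```python
import shlex
from typing import List


def mysql_command(host: str, user: str, statement: str) -> List[str]:
    """Build an argument list for the mysql client (hypothetical helper)."""
    return ["mysql", "-h", host, "-u", user, "-e", statement]


# Host taken from the slide; the user name is an assumption.
cmd = mysql_command("ensembldb.ensembl.org", "anonymous", "SHOW DATABASES")

# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
```

With a MySQL client installed and network access, the list could be passed to `subprocess.run(cmd)` to list the available databases before writing more specific SQL queries.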
17
Access via Perl object API
  • Based on bioperl
  • Ensembl modules
  • For an introduction, see the tutorial at
    http://www.ensembl.org/info/software/core/

18
There are other ways
MartShell
Command-line interface to Mart, written in Java.
It works with a Mart Query Language.
19
MartExplorer