NCBI Highlights - PowerPoint PPT Presentation

About This Presentation
Title:

NCBI Highlights


Slides: 22
Provided by: bch7


Transcript and Presenter's Notes

Title: NCBI Highlights


1
Lecture 3: NCBI Highlights and Text Search Tools (Entrez, PubMed, OMIM)
2
(No Transcript)
3
Haaretz Publication, June 2000
4
The Central Paradigm of Bio-informatics
Molecular structure
Biochemical function
Genetic information
Symptoms
5
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/milestones.html
6
-First complete genome of a free-living organism (Haemophilus influenzae, 2 Mb).
-First eukaryote genome (Saccharomyces cerevisiae, 12 Mb).
-First multicellular eukaryote (Caenorhabditis elegans, 100 Mb).
-A model organism for the animal kingdom (Drosophila melanogaster).
-A model organism for the plant kingdom (Arabidopsis thaliana).
7
What Is a Biological Database?
A biological database is a large, organized body
of data, usually associated with computer
software designed to update, query, and
retrieve stored components of the data.
Example: a nucleotide sequence database will
contain information such as the contact name,
the sequence and description of the molecule,
the scientific name of the source organism, and
literature citations.
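To make the definition concrete, here is a minimal sketch of a toy in-memory nucleotide sequence database that supports the three operations named above (update, query, retrieve). The record fields mirror the example; the `accession` key, the class names `NucleotideRecord` and `SequenceDatabase`, and the sample values are hypothetical and are not part of any NCBI software. Real databases such as GenBank use much richer record formats and persistent storage.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NucleotideRecord:
    """One entry in a toy nucleotide sequence database (fields follow the slide)."""
    accession: str        # hypothetical identifier used as the lookup key
    contact_name: str
    sequence: str
    description: str      # description of the molecule
    organism: str         # scientific name of the source organism
    citations: list[str] = field(default_factory=list)  # literature citations


class SequenceDatabase:
    """Minimal in-memory store offering update, query and retrieve operations."""

    def __init__(self) -> None:
        self._records: dict[str, NucleotideRecord] = {}

    def update(self, record: NucleotideRecord) -> None:
        # Insert a new record, or overwrite an existing one with the same accession.
        self._records[record.accession] = record

    def retrieve(self, accession: str) -> NucleotideRecord | None:
        # Direct lookup by key; returns None when the accession is unknown.
        return self._records.get(accession)

    def query(self, organism: str) -> list[NucleotideRecord]:
        # Simple field query: every record from the given source organism.
        return [r for r in self._records.values() if r.organism == organism]
```

For example, after `db.update(NucleotideRecord("SEQ0001", "A. Researcher", "ATGGCC", "example fragment", "Homo sapiens"))`, the call `db.query("Homo sapiens")` returns that one record. A dictionary keyed by accession keeps retrieval fast; field queries here are a linear scan, which a real system would replace with indexes.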
8
Challenge of Data Retrieval
  • Store large amounts of data.
  • Update data on a regular basis.
  • Fast retrieval of information.
  • Extract as much updated information as possible.
  • Cross multiple databanks.
  • Many data sources in different locations.
  • Use connected databases.
  • Systematic classification of the data.
  • Search multiple terms in one query.
  • Use multiple search fields.
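Searching multiple terms in one query, each restricted to its own search field, is done in Entrez with Boolean operators (AND, OR, NOT) and bracketed field tags. The sketch below assumes PubMed tags such as [ti] (title), [tiab] (title/abstract) and [au] (author); check the PubMed help pages for the authoritative list. `build_query` is a hypothetical helper that only assembles the query string.

```python
def build_query(terms: list[tuple[str, str]], operator: str = "AND") -> str:
    """Combine (term, field_tag) pairs into one Entrez-style Boolean query.

    Each term is qualified with a search-field tag in square brackets,
    e.g. ("hemoglobin", "ti") -> "hemoglobin[ti]".
    """
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError(f"unsupported Boolean operator: {operator}")
    parts = []
    for term, tag in terms:
        # Quote multi-word terms so they are treated as a single phrase.
        text = f'"{term}"' if " " in term else term
        parts.append(f"{text}[{tag}]")
    # Entrez expects the Boolean operators in upper case.
    return f" {operator} ".join(parts)
```

For example, `build_query([("hemoglobin", "ti"), ("sickle cell", "tiab")])` yields `hemoglobin[ti] AND "sickle cell"[tiab]`, a single query that searches two terms in two different fields.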

9
International collaboration among NCBI, DDBJ, and EMBL
10
(No Transcript)
11
NCBI
ENTREZ - PubMed
http://www.ncbi.nlm.nih.gov/
http://www.ncbi.nlm.nih.gov/Sitemap/index.html
12
At NCBI, many of the databases are linked
through a unique search and retrieval system
called Entrez. Entrez allows a user not only to
access and retrieve specific information from
a single database, but also to access integrated
information from many NCBI databases. For example,
the protein database is cross-linked to the
taxonomy database.


http://www.ncbi.nlm.nih.gov/Entrez/
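Entrez can also be queried programmatically through NCBI's E-utilities. The sketch below only builds request URLs: one for ESearch (search a single database) and one for ELink (follow cross-links, such as protein to taxonomy). The endpoint and the `db`, `term`, `retmax`, `dbfrom` and `id` parameters are the documented E-utilities ones as far as I know, but the helper names are hypothetical, the record ID in the usage line is a placeholder, and NCBI's E-utilities documentation (including its rate limits) should be treated as authoritative.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """URL that searches one Entrez database and returns matching record IDs."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """URL that follows Entrez cross-links from records in `dbfrom` to `db`."""
    # Several IDs are sent as one comma-separated list.
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS + "elink.fcgi?" + urlencode(params)
```

For instance, `elink_url("protein", "taxonomy", ["<protein id>"])` describes the protein-to-taxonomy cross-link from the slide. Passing either URL to `urllib.request.urlopen` would send the request; that step is left out here so the sketch runs without network access.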
13
Information Flow
  • PubMed - the biomedical literature.
  • GenBank - nucleotide sequences.
  • Protein sequence databases.
  • 3D macromolecular structures.
  • Complete genome maps.
  • Taxonomy - organisms in GenBank.
  • OMIM - genetic diseases.

http://www.ncbi.nlm.nih.gov/Tour/tour.html
14
Exponential growth of biological information
Efficient storage and management tools are
essential.
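"Exponential growth" can be stated precisely with a doubling time. As a hedged illustration, with symbols introduced here rather than taken from the slides, let $N_0$ be the size of a database at some starting point, $N(t)$ its size after time $t$, and $T_d$ its doubling time:

```latex
N(t) = N_0 \, 2^{t/T_d}
\quad\Longrightarrow\quad
T_d = \frac{t \, \ln 2}{\ln\!\left( N(t) / N_0 \right)}
```

For a hypothetical database that grows tenfold in 5 years, $T_d = 5 \ln 2 / \ln 10 \approx 1.5$ years, so its storage needs double roughly every year and a half.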
15
Types of Data (Databases)
Primary (raw) databases: genomic, DNA, protein.
Secondary (analyzed) databases.
(Figure: example data types labelled "Ribbons", "Cylinders", and "Publications")
16
Types of Primary Databases
DNA sequences
  • GenBank: http://www.ncbi.nlm.nih.gov/GenBank/GenBankOverview.html
  • EMBL: http://www.ebi.ac.uk/embl/index.html
  • DDBJ (DNA Data Bank of Japan): http://www.ddbj.nig.ac.jp/
Protein sequences
  • Swiss-Prot and TrEMBL: http://www.expasy.ch/sprot/sprot-top.html
  • Protein Identification Resource (PIR): http://www-nbrf.georgetown.edu/pirwww/pirhome.shtml
Genomic databases
  • Whole genomes (NCBI): http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=genome
  • Whole microbial genomes (TIGR): http://www.tigr.org/tigr-scripts/CMR2/CMRGenomes.sp1
  • Human gene mutations: http://www.uwcm.ac.uk/uwcm/mg/hgmd0.html
Others
17
Analysis and interpretation of data may reveal
patterns and trends in biology
  • Common sequences can be identified by multiple
    alignment.
  • Sequence families or neighborhoods can be
    defined.
  • Motifs can provide clues to biochemical
    function.
  • Clustering sequences into trees reflects their
    degree of similarity and the evolutionary
    relationships between species.
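To illustrate how a motif can be searched for, the sketch below translates a PROSITE-style pattern into a Python regular expression and reports every (possibly overlapping) match position. It handles only the common pattern syntax (`-` separators, `x`, `[...]`, `{...}`, `(n)`/`(n,m)` repeats, `<`/`>` anchors); the helper names are hypothetical, and the PROSITE user manual is the authoritative reference for the full syntax. The example pattern `N-{P}-[ST]-{P}` is the well-known N-glycosylation site motif.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    regex = pattern.rstrip(".")                            # drop the terminating period
    regex = regex.replace("-", "")                         # '-' only separates elements
    regex = regex.replace("{", "[^").replace("}", "]")     # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")      # x(2,4) = 2 to 4 repeats
    regex = regex.replace("x", ".")                        # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")      # N-/C-terminal anchors
    return regex


def find_motifs(sequence: str, pattern: str) -> list[int]:
    """0-based start positions of all matches, overlapping ones included."""
    # A zero-width lookahead lets consecutive matches overlap.
    compiled = re.compile(f"(?=({prosite_to_regex(pattern)}))")
    return [m.start() for m in compiled.finditer(sequence)]
```

The braces are converted before the parentheses on purpose: the repeat counts become `{n,m}`, which must not then be mistaken for a forbidden-residue set.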

18
Types of Database Search
  • Text-based search.
  • Sequence-based database search (based on sequence
    similarities).
  • Structure-based database search (based on
    structure similarities).
  • Motif/domain-based database search (based on
    domain similarities).
  • Other.
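Sequence-based search ranks database entries by their similarity to a query sequence. The sketch below uses a deliberately simple score, the fraction of the query's k-mers (substrings of length k) that a database sequence shares. This loosely echoes the short-word seeding used by tools such as BLAST but has no alignment step and no statistics; the function names, the default `k = 3`, and the scoring rule are assumptions made for illustration.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_search(query: str, database: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank database sequences by the fraction of query k-mers they share.

    A toy similarity score, not BLAST scoring. Sequences sharing no
    k-mer with the query are left out of the result.
    """
    q = kmers(query, k)
    if not q:
        return []
    hits = []
    for name, seq in database.items():
        shared = len(q & kmers(seq, k))
        if shared:
            hits.append((name, shared / len(q)))
    # Best hits first; ties are broken by name for a stable order.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))
```

Comparing sets of k-mers is cheap compared with a full alignment, which is why word-based filtering is a common first step before slower, more accurate scoring.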

19
Biological Databases
  • DNA databanks: GenBank, DDBJ, EMBL
  • Protein databases: PIR, Swiss-Prot, GenPept, PDB, TrEMBL
  • EST databases: dbEST, DOTS, UniGene, STACK
  • Structure databases: MMDB, PDB
  • Pathway databases: KEGG, BRITE, TRANSPATH
  • Motif databases: Prosite, Pfam, BLOCKS, TransFac, PRINTS, URLs
  • Gene, protein, and disease databases: GeneCards, OMIM, OMIA
  • Taxonomy databases
  • Literature databases: PubMed, Medline
  • Patent databases: Apipa, CA-STN, IPN, USPTO, EPO, Beilstein
  • Others: RNA databases, SNP, microarray

20
Gene finding
Design tools & data collection
Over 2×10^9 bp (mainly human)
Whole genome approach
  • Huge data explosion.
  • Management of biological information is
    crucial but increasingly difficult.
  • Most biological experiments require
    bio-informatics.

http://www.ncbi.nlm.nih.gov/Web/Newsltr/Summer99/decade.html
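The simplest computational starting point for gene finding is to scan for open reading frames (ORFs): stretches that begin with an ATG start codon and run to the next in-frame stop codon (TAA, TAG or TGA). The sketch below scans only the three forward reading frames and ignores the reverse strand, introns and codon-usage statistics that real gene finders model. `find_orfs` and its `min_codons` threshold are hypothetical choices for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 30) -> list[tuple[int, int]]:
    """Return (start, end) of ATG...stop ORFs on the forward strand.

    Coordinates are 0-based and end-exclusive; the stop codon is included
    in both the coordinates and the codon count compared with min_codons.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # look for the next start codon
    return sorted(orfs)
```

For example, `find_orfs("CCATGAAATTTTAGCC", min_codons=2)` finds the single ORF `ATGAAATTTTAG` at positions 2 to 14. With the default threshold, such a short stretch is ignored, since short ORFs arise by chance very often.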
21
Growth of GenBank
GenBank Divisions
  • PLN - Plant sequences
  • PRI - Primates
  • ROD - Rodents
  • MAM - Other mammals
  • VRT - Other vertebrates
  • INV - Invertebrates
  • BCT - Bacterial
  • PHG - Phage
  • VRL - Viral
  • SYN - Synthetic
  • UNA - Un-annotated
  • PAT - Patent
  • NEW - New
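Each GenBank record names its division on the first (LOCUS) line. The sketch below looks up the division codes listed on this slide, assuming the common LOCUS layout in which the division code is the token just before the trailing date. The set of divisions has changed across GenBank releases, so the table is illustrative only, and `division_of` is a hypothetical helper; the GenBank release notes describe the current LOCUS format.

```python
from __future__ import annotations

# Division codes as listed on the slide (illustrative, not the current release).
GENBANK_DIVISIONS = {
    "PLN": "Plant sequences", "PRI": "Primates", "ROD": "Rodents",
    "MAM": "Other mammals", "VRT": "Other vertebrates", "INV": "Invertebrates",
    "BCT": "Bacterial", "PHG": "Phage", "VRL": "Viral", "SYN": "Synthetic",
    "UNA": "Un-annotated", "PAT": "Patent", "NEW": "New",
}


def division_of(locus_line: str) -> tuple[str, str]:
    """Return (code, description) for the division named in a LOCUS line.

    Assumes the division code is the second-to-last whitespace-separated
    token, immediately before the date (e.g. '... DNA PLN 21-JUN-1999').
    """
    tokens = locus_line.split()
    if len(tokens) < 3 or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    code = tokens[-2]
    return code, GENBANK_DIVISIONS.get(code, "unknown division")
```

For a sample line such as `LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999`, the helper returns `("PLN", "Plant sequences")`.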
22
EMBL - European Molecular Biology Laboratory
http://www1.embl-heidelberg.de/
EMBL Database Entries by species