Databases at NCBI - PowerPoint PPT Presentation

Provided by: lslSin

1
Databases at NCBI
  • Shiau, Cheng-Kai

2
Database
  • A database is a structured collection of records
    or data that is stored in a computer system. The
    structure is achieved by organizing the data
    according to a database model. The model in most
    common use today is the relational model. Other
    models, such as the hierarchical model and the
    network model, use a more explicit representation
    of relationships.

http://en.wikipedia.org/wiki/Database
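To make the relational model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The organism and gene tables, their columns, and the rows are hypothetical examples, not an NCBI schema. The point is that relationships are stored as shared key values, and a query joins on those values.

```python
import sqlite3

# In-memory database; table names, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (
        org_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL
    );
    CREATE TABLE gene (
        gene_id INTEGER PRIMARY KEY,
        symbol  TEXT NOT NULL,
        org_id  INTEGER NOT NULL REFERENCES organism(org_id)
    );
    INSERT INTO organism VALUES (1, 'Homo sapiens'), (2, 'Mus musculus');
    INSERT INTO gene VALUES (10, 'TP53', 1), (11, 'Trp53', 2), (12, 'BRCA1', 1);
""")

# The gene -> organism relationship is expressed by matching key values in a
# query (a join), not by explicit pointers between records.
rows = conn.execute("""
    SELECT g.symbol, o.name
    FROM gene AS g JOIN organism AS o ON g.org_id = o.org_id
    WHERE o.name = ?
    ORDER BY g.symbol
""", ("Homo sapiens",)).fetchall()
print(rows)  # [('BRCA1', 'Homo sapiens'), ('TP53', 'Homo sapiens')]
```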
3
Database
4
Database
5
Database
6
Database
7
Database
8
Database
AmiGO
9
Database
AmiGO
10
Database
11
Database
AmiGO
13
About NCBI
  • What does NCBI do?
  • Established in 1988 as a national resource for
    molecular biology information, NCBI creates
    public databases, conducts research in
    computational biology, develops software tools
    for analyzing genome data, and disseminates
    biomedical information - all for the better
    understanding of molecular processes affecting
    human health and disease.

http://www.ncbi.nlm.nih.gov/
14
About NCBI
15
Databases at NCBI
  • Literature databases
      • PubMed, PubMed Central, Books, OMIM
  • Molecular databases
      • Sequences: EST, STS, GSS, HTGS, HTC, FLIC,
        UniGene, RefSeq, HomoloGene
      • Structures: MMDB, CDD
      • Taxonomy
  • Other databases
      • GEO, SKY/CGH

16
Databases at NCBI
http://www.ncbi.nlm.nih.gov/
18
Literature Databases
  • Literature databases
      • PubMed
      • PubMed Central
      • Books
      • OMIM

19
PubMed
  • PubMed
  • The PubMed database was designed to provide access to
    citations (with abstracts) from biomedical
    journals.
  • Subsequently, a linking feature was added to
    provide access to full-text journal articles at
    web sites of participating publishers, as well as
    to other related web resources.

20
PubMed
  • Data sources
      • MEDLINE: NLM's premier bibliographic database,
        covering the fields of medicine, nursing,
        dentistry, veterinary medicine, the health care
        system, and the preclinical sciences, such as
        molecular biology.
      • Non-MEDLINE: out-of-scope articles (e.g., the
        plate tectonics or astrophysics articles in
        Science magazine) from general science and
        chemistry journals whose life sciences articles
        are indexed for MEDLINE.
      • Other databases: HealthSTAR, AIDSLINE, HISTLINE,
        SPACELINE, BIOETHICSLINE, and POPLINE.

21
PubMed
  • All electronic data are supplied to NCBI via FTP
    in XML format, in accordance with NLM's
    specifications (document type definition, or
    DTD).
  • XML: extensible markup language
  • DTD: document type definition
  • Example
      • Data: A, 160 cm, 50 kg; B, 170 cm, 60 kg
      • XML: <N>A</N><H>160</H><W>50</W><N>B</N><H>170</H><W>60</W>
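The slide's XML example can be read with Python's standard xml.etree.ElementTree, as sketched below. This rests on two assumptions. First, N, H and W are read as name, height (cm) and weight (kg). Second, the fragment is wrapped in a hypothetical <records> root element, because a well-formed XML document must have exactly one root. Real PubMed records follow NLM's DTD and are structured differently.

```python
import xml.etree.ElementTree as ET

# The slide's flat example, wrapped in a hypothetical <records> root element.
xml_text = (
    "<records>"
    "<N>A</N><H>160</H><W>50</W>"
    "<N>B</N><H>170</H><W>60</W>"
    "</records>"
)

def parse_records(text):
    """Group consecutive <N>, <H>, <W> elements into (name, height_cm, weight_kg)."""
    children = list(ET.fromstring(text))
    records = []
    # Walk the children three at a time: name, height, weight.
    for i in range(0, len(children), 3):
        name, height, weight = children[i:i + 3]
        records.append((name.text, int(height.text), int(weight.text)))
    return records

print(parse_records(xml_text))  # [('A', 160, 50), ('B', 170, 60)]
```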

22
PubMed
  • PubMed citations are indexed by MeSH (Medical
    Subject Headings) terms.

NCBI Handbook
23
PubMed Central
  • PubMed Central (PMC) is the National Library of
    Medicine's digital archive of full-text journal
    literature.
  • Journals deposit material in PMC on a voluntary
    basis.
  • Articles in PMC may be retrieved either by
    browsing a table of contents for a specific
    journal or by searching the database.
  • Certain journals allow the full text of their
    articles to be viewed directly in PMC.
  • Other journals require that PMC direct users to
    the journal's own web site to see the full text
    of an article. In this case, the material is
    usually available only to the journal's
    subscribers for the first 6 months to 1 year, but
    it always becomes freely available to any user no
    more than 1 year after publication.

25
NCBI BookShelf
  • The BookShelf is a collection of biomedical books
    that can be searched directly in Entrez or found
    via keyword links in PubMed abstracts.
  • Books have been added to the BookShelf in
    collaboration with authors and publishers, and
    the complete content (including all figures and
    tables) is free to use for anyone with an
    Internet connection.
  • The online books are displayed one section at a
    time, with navigation provided to other parts of
    the current chapter or to other chapters within
    the book.
  • Many of the books on the BookShelf can be browsed
    without any restriction; others offer less
    flexibility for navigating the complete content.
  • The publisher (or the owner of the content)
    defines the rules for access.
  • The books are linked to PubMed through the
    citations of research papers within the text.

26
NCBI BookShelf
27
NCBI BookShelf
29
OMIM
  • Online Mendelian Inheritance in Man (OMIM™) is
    a timely, authoritative compendium of
    bibliographic material and observations on
    inherited disorders and human genes. It is the
    continuously updated electronic version of
    Mendelian Inheritance in Man (MIM).
  • MIM was last published in 1998 and is authored
    and edited by Dr. Victor A. McKusick and a team
    of science writers, editors, scientists, and
    physicians at The Johns Hopkins University and
    around the world. Curation of the database and
    editorial decisions take place at The Johns
    Hopkins University School of Medicine.

30
OMIM
31
OMIM
32
OMIM
33
OMIM
36
Molecular Databases
  • Sequence databases
      • HTGS, HTC & FLIC
      • EST
      • STS
      • GSS
      • UniGene
      • RefSeq
      • HomoloGene
  • Structure databases
      • MMDB
      • CDD
  • Taxonomy

37
HTGS
  • High-throughput genomic sequence (HTGS) entries
    are submitted in bulk by genome centers,
    processed by an automated system, and then
    released to GenBank.
  • To submit sequences in bulk to the HTG processing
    system, a center or group must set up an FTP
    account. Submitters frequently use two tools to
    create HTG submissions: Sequin and fa2htgs.

38
HTGS
  • Phase 0 sequences are one-to-few reads of a
    single clone and are not usually assembled into
    contigs. They are low-quality sequences that are
    often used to check whether another center is
    already sequencing a particular clone.
  • Phase 1 entries are assembled into contigs that
    are separated by sequence gaps, the relative
    order and orientation of which are not known.
  • Phase 2 entries are also unfinished sequences
    that may or may not contain sequence gaps. If
    there are gaps, then the contigs are in the
    correct order and orientation.
  • Phase 3 sequences are of finished quality and
    have no gaps.

NCBI Handbook
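The phase definitions above can be read as a small decision procedure, and the sketch below encodes them. The CloneRecord fields are hypothetical summaries chosen for illustration. They are not fields of an actual HTGS submission.

```python
from dataclasses import dataclass

@dataclass
class CloneRecord:
    """Hypothetical summary of one clone's sequencing status."""
    assembled: bool        # have the reads been assembled into contigs?
    num_gaps: int          # sequence gaps remaining between contigs
    contigs_ordered: bool  # are contig order and orientation known?
    finished: bool         # is the sequence of finished quality?

def htgs_phase(rec: CloneRecord) -> int:
    """Return the HTGS phase (0-3) implied by the phase definitions."""
    if rec.finished and rec.num_gaps == 0:
        return 3  # finished quality, no gaps
    if not rec.assembled:
        return 0  # one-to-few reads, not assembled into contigs
    if rec.num_gaps == 0 or rec.contigs_ordered:
        return 2  # unfinished; any gaps lie between ordered, oriented contigs
    return 1      # contigs separated by gaps, order/orientation unknown

print(htgs_phase(CloneRecord(assembled=True, num_gaps=4,
                             contigs_ordered=False, finished=False)))  # 1
```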
39
Genome Sequencing
  • Bacterial artificial chromosome (BAC) Sequencing

http://www.genomenewsnetwork.org/articles/06_00/sequence_primer.shtml
40
Genome Sequencing
Nature, Vol. 381, 364-366 (1996)
http://en.wikipedia.org/
41
Genome Sequencing
  • Whole Genome Shotgun (WGS) Sequencing

42
Genome Sequencing
Nature, Vol. 381, 364-366 (1996)
43
Genome Sequencing
  • BAC sequencing
      • High precision
      • Slow
  • Shotgun sequencing
      • High throughput
      • Consumes large computational resources
      • Fast at early stages, but complicated at later
        stages
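The difficulty of assembling shotgun reads at the later stage can be illustrated with a toy greedy overlap assembler. This is a minimal sketch that assumes error-free reads from one strand and no repeats. It repeatedly merges the pair of contigs with the longest suffix-prefix overlap. It is not the algorithm used by any particular genome project; real assemblers must also cope with sequencing errors, both strands and repeats.

```python
import random

def shotgun_reads(genome, read_len, n_reads, seed=0):
    """Sample error-free reads at random positions (toy model, one strand only)."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a, b, min_len):
    """Longest suffix of a that equals a prefix of b, if at least min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the two contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    # Drop reads contained entirely inside another read.
    contigs = [c for c in contigs if not any(c != o and c in o for o in contigs)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no overlaps left: remaining contigs stay separate (gaps)
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

# Made-up overlapping reads from the sequence ATTAGACCTGCCGGAATAC.
reads = ["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]
print(greedy_assemble(reads))  # ['ATTAGACCTGCCGGAATAC']
```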

44
HTGS
  • Submission tools
      • fa2htgs: command-line program
      • tbl2asn: command-line program
      • Sequin: stand-alone bulk submission tool

http://www.ncbi.nlm.nih.gov/Sequin/QuickGuide/sequin.htm
45
HTC & FLIC
  • HTC records are High-Throughput cDNA/mRNA
    submissions that are similar to ESTs but often
    contain more information.
  • FLIC (Full-Length Insert cDNA) records contain
    the entire sequence of a cloned cDNA/mRNA.
    Therefore, FLICs are generally longer, and
    sometimes even full-length, mRNAs. They are
    usually annotated with genes and coding regions,
    although these may be lab systematic names rather
    than functional names.

47
What Are Expressed Sequence Tags?
  • ESTs are small pieces of DNA sequence (usually
    200 to 500 nucleotides long) that are generated
    by sequencing either one or both ends of an
    expressed gene. The idea is to sequence bits of
    DNA that represent genes expressed in certain
    cells, tissues, or organs from different
    organisms and use these "tags" to fish a gene out
    of a portion of chromosomal DNA by matching base
    pairs. The challenge associated with identifying
    genes from genomic sequences varies among
    organisms and is dependent upon genome size as
    well as the presence or absence of introns, the
    intervening DNA sequences interrupting the
    protein coding sequence of a gene.

http://www.ncbi.nlm.nih.gov/About/primer/est.html
48
What Are Expressed Sequence Tags?
http://www.ncbi.nlm.nih.gov/About/primer/est.html
49
What Are Expressed Sequence Tags?
[Figure: a cDNA clone sequenced from each end, yielding a 5' EST and a 3' EST]
  • Usually 200-500 nucleotides long

50
What Are Expressed Sequence Tags?
[Figure: 5' and 3' ESTs mapped back to the chromosome sequence]
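Mapping a tag back to the chromosome can be illustrated with exact string matching on both strands. This is a minimal sketch with made-up sequences. It ignores introns and sequencing errors, which is why real EST-to-genome mapping relies on alignment tools such as BLAST or spliced aligners rather than exact matching.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]

def map_tag(tag, chromosome):
    """Return (strand, start) for every exact match of tag on either strand.

    start is a 0-based position in + strand coordinates. A palindromic tag
    (equal to its own reverse complement) is reported on both strands."""
    hits = []
    for strand, query in (("+", tag), ("-", reverse_complement(tag))):
        start = chromosome.find(query)
        while start != -1:
            hits.append((strand, start))
            start = chromosome.find(query, start + 1)
    return hits

chromosome = "CCATGGCTTACGGA"         # made-up sequence
print(map_tag("ATGGC", chromosome))  # [('+', 2)]
print(map_tag("CGTAA", chromosome))  # [('-', 7)]
```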
51
Expressed Sequence Tags (ESTs)
53
Sequence clustering
  • Because a gene can be expressed as mRNA many,
    many times, ESTs ultimately derived from this
    mRNA may be redundant. That is, there may be many
    identical, or similar, copies of the same EST.
    Such redundancy and overlap means that when
    someone searches dbEST for a particular EST, they
    may retrieve a long list of tags, many of which
    may represent the same gene. Searching through
    all of these identical ESTs can be very time
    consuming.
  • To resolve the redundancy and overlap problem,
    NCBI investigators developed the UniGene
    database.
  • UniGene automatically partitions GenBank
    sequences into a non-redundant set of
    gene-oriented clusters.

http://www.ncbi.nlm.nih.gov/About/primer/est.html
54
Sequence clustering
[Figure: chromosome, pre-mRNA, mRNA, and cDNA library clones No. 1 to No. 6]
55
Sequence clustering
56
Sequence clustering
57
Sequence clustering
[Figure: sequences grouped into UniGene clusters UG No. 1 to UG No. 4]
58
Introduction of UniGene database
  • UniGene Build Procedure: Transcriptome Based
  • Clustering is the process of finding subsets of
    sequences that belong together within a larger
    set. This is done by converting discrete
    similarity scores to Boolean links between
    sequences. That is, two sequences are considered
    linked if their similarity exceeds a threshold.
    UniGene clustering proceeds in several stages,
    with each stage adding less reliable data to the
    results of the preceding stage. This staged
    clustering affords greater control than a more
    egalitarian treatment of all links between
    sequences.

http://www.ira.cinvestav.mx:8080/GenBioMolI_05/DOCUMENTOS/HTML/NCBI/UniGene%20Build%20Procedures.htm
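The "Boolean link" idea can be sketched as threshold-based single-linkage clustering. Two sequences are linked when their similarity score exceeds a threshold, and the clusters are the connected components, found here with union-find. The identifiers, scores and threshold are hypothetical. The sketch does not model UniGene's staged procedure, in which later stages add less reliable data.

```python
def cluster(seq_ids, scores, threshold):
    """Single-linkage clustering via union-find.

    scores maps (id_a, id_b) -> similarity score. A pair is linked when its
    score exceeds threshold; clusters are the connected components of links."""
    parent = {s: s for s in seq_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), score in scores.items():
        if score > threshold:               # discrete score -> Boolean link
            parent[find(a)] = find(b)

    groups = {}
    for s in seq_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

ids = ["mRNA1", "est1", "est2", "est3", "est4"]
scores = {("mRNA1", "est1"): 95, ("est1", "est2"): 90,
          ("est2", "est3"): 40, ("est3", "est4"): 88}
print(cluster(ids, scores, threshold=80))
# [['est1', 'est2', 'mRNA1'], ['est3', 'est4']]
```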
59
UniGene database
60
Sequence clustering
61
Sequence clustering
62
UniGene database
63
UniGene database
64
UniGene database
65
Overview of the Cancer Genome Anatomy Project
66
Overview of the Cancer Genome Anatomy Project
  • The goal of the Cancer Genome Anatomy Project
    (CGAP) is to determine the gene expression
    profiles of normal, precancer, and cancer cells.

67
Overview of the Cancer Genome Anatomy Project
68
Digital Differential Display
[Figure: UniGene, dbEST, and CGAP data combined into EST counts (EST No.) for Genes A to D in Tissues A, B, C, and D]
69
Digital Differential Display
[Figure: UniGene, dbEST, and CGAP data combined into EST counts (EST No.) for Genes A to D in Tissue A and Tissue B]
70
Digital Differential Display
  • DDD is a tool for comparing EST-based expression
    profiles among the various libraries, or pools of
    libraries, represented in UniGene. These
    comparisons allow the identification of those
    genes that differ among libraries of different
    tissues, making it possible to determine which
    genes may be contributing to a cell's unique
    characteristics, e.g., those that make a muscle
    cell different from a skin or liver cell.
  • Along similar lines, DDD can be used to try to
    identify genes for which the expression levels
    differ between normal, premalignant, and
    cancerous tissues or different stages of
    embryonic development.
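The slide does not say which statistical test DDD uses. As an assumption, this sketch compares one gene's EST counts between two library pools with a two-sided Fisher's exact test. The test runs on a 2x2 table of gene ESTs versus all other ESTs in each pool. It uses math.comb, so only the standard library is needed. The function names and counts are hypothetical, and this illustrates the comparison rather than reproducing the DDD implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Add up every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))

def compare_gene(gene_ests_1, total_ests_1, gene_ests_2, total_ests_2):
    """Hypothetical DDD-style comparison of one gene between two library pools.

    Returns (frequency in pool 1, frequency in pool 2, two-sided p-value)."""
    p = fisher_exact_two_sided(gene_ests_1, total_ests_1 - gene_ests_1,
                               gene_ests_2, total_ests_2 - gene_ests_2)
    return gene_ests_1 / total_ests_1, gene_ests_2 / total_ests_2, p

# Made-up counts: 8 of 10 ESTs in pool 1 vs 1 of 6 in pool 2 come from the gene.
print(compare_gene(8, 10, 1, 6))
```

Real library pools contain thousands of ESTs, so frequencies are usually reported per fixed number of ESTs. The toy counts just keep the arithmetic checkable.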

71
Digital Differential Display
72
Digital Differential Display
73
Digital Differential Display
74
Digital Differential Display
75
Digital Differential Display
76
Digital Differential Display
77
Digital Differential Display
78
Digital Differential Display
79
Digital Differential Display
81
STS
  • In the National Research Council (NRC)
    Committee's discussions, two problems were
    identified in generating a genome map by PCR:
      • The difficulty of merging mapping data gathered
        by diverse methods in different laboratories
        into a consensus physical map.
      • The logistics and expense of managing the huge
        collections of cloned segments on which the
        mapping data would depend almost absolutely.

82
STS
  • Sequence tagged sites (STSs) are short genomic
    landmark sequences. They are operationally unique
    in that they are specifically amplified from the
    genome by PCR. In addition, they define a
    specific location on the genome and are,
    therefore, useful for mapping.
  • In most instances, 200 to 500 bp of sequence
    define an STS that is operationally unique in the
    human genome.
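An STS is defined by a PCR primer pair, so locating one in a sequence can be sketched as an in-silico PCR. The steps are to find the forward primer, then look downstream for the reverse complement of the reverse primer within an allowed product size. This is a minimal sketch that assumes exact primer matches on the given strand only. The primer sequences, template and size limits are hypothetical, and NCBI's own electronic PCR tools are not reproduced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def find_sts(genome, fwd_primer, rev_primer, min_size=100, max_size=600):
    """Toy in-silico PCR on one strand: return (start, product size) pairs.

    The reverse primer anneals to the opposite strand, so on this strand we
    look for its reverse complement downstream of the forward primer."""
    target = reverse_complement(rev_primer)
    hits = []
    start = genome.find(fwd_primer)
    while start != -1:
        end = genome.find(target, start + len(fwd_primer))
        while end != -1:
            size = end + len(target) - start
            if size > max_size:
                break  # later matches would give even longer products
            if size >= min_size:
                hits.append((start, size))
            end = genome.find(target, end + 1)
        start = genome.find(fwd_primer, start + 1)
    return hits

# Made-up template and primers, with size limits shrunk to fit the toy template.
genome = "GG" + "ACGTAC" + "A" * 10 + "CTGCAA" + "GG"
hits = find_sts(genome, "ACGTAC", "TTGCAG", min_size=10, max_size=50)
print(hits)           # [(2, 22)]
print(len(hits) == 1) # True: a single product, i.e. operationally unique here
```

In this framing, a primer pair is "operationally unique" when it yields exactly one product across the genome. A fuller check would also search the opposite strand.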

83
STS
84
STS
86
GSS
  • The genome survey sequences (GSS) division of
    GenBank is similar to the EST division, with the
    exception that most of the sequences are genomic
    in origin, rather than cDNA (mRNA). It should be
    noted that two classes (exon trapped products and
    gene trapped products) may be derived via a cDNA
    intermediate. Care should be taken when analyzing
    sequences from either of these classes, as a
    splicing event could have occurred and the
    sequence represented in the record may be
    interrupted when compared to genomic sequence.
    The GSS division contains (but is not limited to)
    the following types of data:
  • random "single pass read" genome survey
    sequences.
  • cosmid/BAC/YAC end sequences
  • exon trapped genomic sequences
  • Alu PCR sequences
  • transposon-tagged sequences

87
GSS
  • Many labs have approached GenBank over the last
    few months, interested in submitting these types
    of sequences. We have been reluctant to introduce
    them via the existing GenBank divisions. On the
    other hand, such sequences are of value to the
    genome community, and require similar processing
    and access tools as have been provided for ESTs
    and STSs. GSS sequences will be used,
    amongst other things, as a framework for the
    mapping and sequencing of genome size pieces
    which will be present in the standard GenBank
    divisions.
  • Sequence data appropriate for the new GSS
    division are, to date, generated by genome labs
    performing human genome sequencing; we expect
    that similar data will be generated for other
    model organisms, such as the mouse.

89
RefSeq
  • RefSeq biological sequences (also known as
    RefSeqs) are derived from GenBank records but
    differ in that each RefSeq is a synthesis of
    information, not an archived unit of primary
    research data.
  • RefSeq provides a non-redundant framework of
    information to facilitate database searches,
    whether they are searched via genomic location,
    sequence, or text annotation.

90
RefSeq
  • The RefSeq database is the result of data
    extraction from GenBank, curation, and
    computation, combined with extensive
    collaboration with authoritative groups. Each
    molecule is annotated as accurately as possible
    with the organism name, strain (or breed,
    ecotype, cultivar, or isolate), gene symbol for
    that organism, and informative protein name.
  • In cases when a molecule is represented by
    multiple sequences for an organism in GenBank, an
    effort is made by NCBI staff to select the "best"
    sequence to be presented as a RefSeq. The goal is
    to avoid known mutations, sequencing errors,
    cloning artifacts, and erroneous annotation.

91
RefSeq
92
RefSeq
93
RefSeq
94
RefSeq
95
RefSeq
96
RefSeq
97
RefSeq
99
HomoloGene
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html
100
HomoloGene
101
HomoloGene
  • HomoloGene Build Procedure
  • The input for HomoloGene processing consists of
    the proteins from the input organisms. These
    sequences are compared to one another (using
    blastp) and then matched up and put into groups,
    using a tree built from sequence similarity to
    guide the process: more closely related organisms
    are matched up first, and further organisms are
    added as the tree is traversed toward the root.
    The protein alignments are mapped back to their
    corresponding DNA sequences, where distance
    metrics can be calculated (e.g., molecular
    distance, Ka/Ks ratio). Sequences are matched
    using synteny when applicable. Remaining
    sequences are matched up by using an algorithm
    that maximizes the score globally, rather than
    locally, in a bipartite matching. Cutoffs on bits
    per position and Ks values are set to prevent
    unlikely "orthologs" from being grouped together.
    These cutoffs are calculated from the respective
    score distribution for the given groups of
    organisms. Paralogs are identified by finding
    sequences that are more similar to each other
    within a species than to sequences in other
    species.
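The difference between maximizing a bipartite matching score globally and locally shows up even in a two-by-two example. The gene names and scores below are hypothetical. The greedy matcher takes the best available pair first, which is the local choice. The global matcher tries every one-to-one assignment, which is only feasible for tiny inputs; a practical implementation would use an assignment algorithm such as the Hungarian method. This is not HomoloGene's code.

```python
from itertools import permutations

def greedy_matching(scores):
    """Local strategy: repeatedly accept the highest-scoring pair still available."""
    match, used_a, used_b = {}, set(), set()
    for score, a, b in sorted(((s, a, b) for (a, b), s in scores.items()), reverse=True):
        if a not in used_a and b not in used_b:
            match[a] = b
            used_a.add(a)
            used_b.add(b)
    return match

def global_matching(genes_a, genes_b, scores):
    """Global strategy: the one-to-one assignment with the highest total score.

    Brute force over permutations, so only for tiny inputs; assumes
    len(genes_a) <= len(genes_b). Missing pairs score 0."""
    best, best_total = {}, float("-inf")
    for perm in permutations(genes_b, len(genes_a)):
        total = sum(scores.get(pair, 0) for pair in zip(genes_a, perm))
        if total > best_total:
            best, best_total = dict(zip(genes_a, perm)), total
    return best

# Hypothetical similarity scores between genes of species A and species B.
scores = {("A1", "B1"): 100, ("A1", "B2"): 95, ("A2", "B1"): 90, ("A2", "B2"): 10}
print(greedy_matching(scores))                              # {'A1': 'B1', 'A2': 'B2'}, total 110
print(global_matching(["A1", "A2"], ["B1", "B2"], scores))  # {'A1': 'B2', 'A2': 'B1'}, total 185
```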

102
HomoloGene
103
HomoloGene
105
MMDB
  • Molecular modeling database (MMDB) is based on
    the structures within Protein Data Bank (PDB) and
    can be queried using the Entrez search engine, as
    well as via the more direct but less flexible
    structure summary search. Once found, any
    structure of interest can be viewed using Cn3D, a
    piece of software that can be freely downloaded
    for Mac, PC, and UNIX platforms.

106
MMDB
107
MMDB
108
MMDB
109
MMDB
110
MMDB
  • VAST Search is a WWW service which allows you to
    compare the 3-dimensional structure of an input
    protein with other protein structures in NCBI's
    MMDB, using the VAST algorithm.
  • VAST Search is NCBI's structure-structure
    similarity search service. It compares 3D
    coordinates of a newly determined protein
    structure to those in the MMDB/PDB database. VAST
    Search computes a list of structure neighbors
    that you may browse interactively, viewing
    superpositions and alignments by molecular
    graphics.
  • The output of the pre-computed VAST searches is a
    list of structure records, each representing one
    of the non-redundant PDB chain sets (nr-PDB),
    which can also be downloaded. There are four
    clustered subsets of MMDB that compose nr-PDB,
    each consisting of clusters having a preset level
    of sequence similarity.
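VAST results are viewed as structural superpositions. A common way to summarize how well two superposed structures agree is the root-mean-square deviation (RMSD) of corresponding atoms. The sketch assumes the coordinates are already superposed and paired, for example aligned C-alpha atoms. It does not perform the superposition, and it is not the VAST algorithm. The coordinates are made up.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) points.

    Assumes the two structures are already superposed and that coords_a[i]
    corresponds to coords_b[i] (e.g., aligned C-alpha atoms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))

a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 3.0)]
print(rmsd(a, b))  # sqrt(3), about 1.732
```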

111
MMDB
112
MMDB
114
CDD
  • The collections of domain alignments in the
    Conserved Domain Database (CDD) are imported from
    two databases outside NCBI, Pfam and the Simple
    Modular Architecture Research Tool (SMART); from
    the NCBI COG database; from another NCBI
    collection named Library of Ancient Domains
    (LOAD); and from a database curated by the CDD
    staff.

115
CDD
116
CDD
117
CDD
118
CDD
119
CDD
  • Given a query sequence, CDART shows the
    functional domains that make up a protein and
    then lists proteins with a similar domain
    architecture. The functional domains for a
    sequence are found by RPS-BLAST, which defines a
    domain by a PSSM (position-specific scoring
    matrix), derived from the probabilities of each
    amino acid occurring at each position of the domain.
    RPS-BLAST is known as a "profile" search, which
    is a sensitive way to look for sequence
    homologues.
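Here is a toy version of scoring with a PSSM. Per-position amino acid frequencies from a small alignment of domain instances are converted to log-odds scores, using a pseudocount and an assumed uniform background of 1/20. The matrix is then slid along a query without gaps. RPS-BLAST itself uses BLAST's gapped alignment and statistics, so this only illustrates the matrix idea. The aligned sequences and the query are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(aligned, pseudocount=1.0):
    """Log-odds PSSM (log2) from equal-length, gap-free aligned sequences.

    Assumes a uniform background frequency of 1/20 for every amino acid."""
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(len(aligned[0])):
        column = [seq[pos] for seq in aligned]
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        pssm.append({aa: math.log2((column.count(aa) + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm

def best_hit(pssm, query):
    """Ungapped scan: return (best score, start offset) of the PSSM along query.

    query must use only the 20 standard one-letter amino acid codes."""
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(query) - width + 1):
        window = query[start:start + width]
        score = sum(column[aa] for column, aa in zip(pssm, window))
        best = max(best, (score, start))
    return best

pssm = build_pssm(["HCKW", "HCRW", "HCKW"])  # made-up domain instances
score, start = best_hit(pssm, "AAAAHCKWAAAA")
print(start)  # 4
```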

120
CDD
121
CDD
123
Taxonomy
  • The NCBI Taxonomy database is a curated set of
    names and classifications for all of the
    organisms that are represented in GenBank. When
    new sequences are submitted to GenBank, the
    submission is checked for new organism names,
    which are then classified and added to the
    Taxonomy database.
  • Of the several different ways to build a
    taxonomy, our group maintains a phylogenetic
    taxonomy. In a phylogenetic classification
    scheme, the structure of the taxonomic tree
    approximates the evolutionary relationships among
    the organisms included in the classification.
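A phylogenetic taxonomy is a tree. Two common operations are walking parent links to get an organism's lineage and finding the lowest common ancestor of two organisms. The sketch uses a tiny hand-written tree with many intermediate ranks omitted. It is not data from the NCBI Taxonomy database, whose nodes are identified by numeric taxonomy IDs.

```python
# Tiny hand-written tree (child -> parent); many real ranks are omitted.
PARENT = {
    "Homo sapiens": "Homo",
    "Homo": "Hominidae",
    "Hominidae": "Primates",
    "Primates": "Mammalia",
    "Mus musculus": "Mus",
    "Mus": "Muridae",
    "Muridae": "Rodentia",
    "Rodentia": "Mammalia",
    "Mammalia": "Vertebrata",
    "Vertebrata": "root",
}

def lineage(taxon):
    """List of nodes from the root down to taxon."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]

def lowest_common_ancestor(a, b):
    """Deepest node that appears in both lineages."""
    common = None
    for x, y in zip(lineage(a), lineage(b)):
        if x != y:
            break
        common = x
    return common

print(lineage("Homo sapiens"))
print(lowest_common_ancestor("Homo sapiens", "Mus musculus"))  # Mammalia
```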

124
Taxonomy
125
Taxonomy
126
Taxonomy
129
Other Databases
  • Other databases
      • GEO
      • SKY/CGH

130
GEO
  • The Gene Expression Omnibus (GEO) project was
    initiated at NCBI in 1999 in response to the
    growing demand for a public repository for data
    generated from high-throughput microarray
    experiments. GEO has a flexible and open design
    that allows the submission, storage, and
    retrieval of many types of data sets, such as
    those from high-throughput gene expression,
    genomic hybridization, and antibody array
    experiments.

131
GEO
132
GEO
133
GEO
134
GEO
135
GEO
136
GEO
138
SKY/CGH
  • Spectral Karyotyping (SKY) and Comparative
    Genomic Hybridization (CGH) are complementary
    fluorescent molecular cytogenetic techniques that
    have revolutionized the detection of chromosomal
    abnormalities.
  • SKY permits the simultaneous visualization of all
    human or mouse chromosomes, each in a different
    color, facilitating the detection of chromosomal
    translocations and rearrangements.
  • CGH uses the hybridization of differentially
    labeled tumor and reference DNA to generate a map
    of DNA copy number changes in tumor genomes.
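CGH signals are commonly read as a log2 ratio of tumor to reference signal per genomic region. A value near 0 suggests equal copy number, positive values suggest gains and negative values suggest losses. This is a minimal sketch that assumes already-normalized intensities. The ±0.3 thresholds and the signals are hypothetical illustrative values, not ones used by the SKY/CGH database.

```python
import math

def copy_number_calls(tumor, reference, gain=0.3, loss=-0.3):
    """Label each region from normalized tumor and reference signals.

    Uses log2(tumor / reference): about 0 suggests equal copy number.
    The gain/loss thresholds are hypothetical illustrative values."""
    calls = []
    for t, r in zip(tumor, reference):
        ratio = math.log2(t / r)
        if ratio >= gain:
            calls.append(("gain", ratio))
        elif ratio <= loss:
            calls.append(("loss", ratio))
        else:
            calls.append(("normal", ratio))
    return calls

# Made-up signals for three regions.
print(copy_number_calls([200, 100, 50], [100, 100, 100]))
# [('gain', 1.0), ('normal', 0.0), ('loss', -1.0)]
```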

139
SKY/CGH
140
SKY/CGH
141
SKY/CGH
142
SKY/CGH
143
SKY/CGH
144
SKY/CGH
145
SKY/CGH
146
SKY/CGH
149
Entrez
  • Entrez is the text-based search and retrieval
    system used at NCBI for all of the major
    databases, including PubMed, Nucleotide and
    Protein Sequences, Protein Structures, Complete
    Genomes, Taxonomy, OMIM, and many others. Entrez
    is at once an indexing and retrieval system, a
    collection of data from many sources, and an
    organizing principle for biomedical information.
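Entrez searches can also be issued programmatically through NCBI's E-utilities. The sketch below only builds an ESearch URL and does not contact NCBI. The query term is an example that uses PubMed's [MeSH Terms] field tag, and the retmax default is an arbitrary choice.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def esearch_url(db, term, retmax=20):
    """Build an E-utilities ESearch URL for a query against an Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)

url = esearch_url("pubmed", "neoplasms[MeSH Terms]")
print(url)
```

To run the query, the URL could be fetched with urllib.request.urlopen and the returned XML parsed. Check NCBI's E-utilities usage guidelines (request rate limits, contact details) before sending automated requests.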

150
Entrez
151
Entrez
152
Entrez
153
Databases at NCBI