Introduction to Entrez: The Life Science Search Engine

1
Introduction to Entrez: The Life Science Search Engine
  • Helen Hed
  • Umeå universitetsbibliotek
  • http://www.ub.umu.se/tjanste/hehe/

2
Contents
  • An overview of Entrez
  • From a librarian's point of view
  • The databases in Entrez
  • Contents and relationships
  • Short presentation of each database
  • The individual databases and PubMed
  • From data to literature or the other way
    around?
  • Summary / other useful resources

3
PubMed
  • Help-text on searching PubMed
  • This topic was covered yesterday
  • Just a reminder that all relevant literature is
    indexed in PubMed if it has to do with data in
    any of the Entrez databases.

4
Everything is linked to PubMed?
  • All entries in all of the Entrez databases are
    connected to relevant entries in PubMed
  • If there are one or more articles in PubMed about
    data found in one of the Entrez databases, they
    are linked to the relevant data entries in those
    databases

7
Relationship between Entrez databases
8
Entrez portal
  • Portal to all NCBI genome-related resources,
    including PubMed
  • A search here shows in which of the separate
    databases information on a subject is available

9
Bookshelf
  • 50 e-books are available from the PubMed
    starting page (March 2006)
  • 5 are handbooks and guides to using Entrez
  • Some are textbooks that are used here at Umeå
    University
  • Alberts: Molecular Biology of the Cell (4th ed.)
  • Griffiths: Introduction to Genetic Analysis
    (7th ed.)

10
Genes and Disease
  • Information about genetic diseases
  • An e-book
  • Available from NCBI's Bookshelf
  • Find the book in the booklist
  • Search for the disease you need information about
  • Organised by type of disease and by chromosome

11
OMIM
  • Online Mendelian Inheritance in Man
  • A database with information about human genes,
    genetic diseases and phenotypes
  • Links to nucleotide, protein, structure and
    genome.
  • Now also OMIA (Online Mendelian Inheritance in
    Animals)
  • Info about OMIA

12
OMIM - example
  • A search for diabetes will give 500 hits in
    OMIM
  • Click on the record number -> the full record is
    displayed with a table of contents and links out
    in the blue field
  • Also link to clinical synopsis
  • And to gene map

13
  • Table of contents - in this case for the record
    about diabetes in the previous slide
  • Headings are dependent on available information

14
OMIM example: SOD
  • This is the entry for SOD1 (147450)
  • * (asterisk) means gene with known sequence
  • Information about symbols, the numbering system
    and more:
    http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=OMIM
  • Notice that the table of contents does not have
    identical headings for all entries.

15
FAQs
  • http://www.ncbi.nlm.nih.gov/Omim/omimfaq.html
  • There is a help-file or FAQ-file for all the
    databases
  • Can be found in the table of contents on the left
    side of the screen

16
Map Viewer: organism map
  • Search for organism and find the right chromosome
  • Graphical view:
    http://www.ncbi.nlm.nih.gov/mapview/MVgraph.html?
  • FAQ at
    http://www.ncbi.nlm.nih.gov/mapview/static/MapViewerHelp.html

17
Genome
  • Contains data on the nucleotide sequences that
    constitute a whole genome.
  • There are both whole chromosomes and pieces of
    chromosomes from about 800 different organisms
  • Gives a graphical overview of genomes and
    chromosomes (called sequence maps).

18
Genome - cont
  • Limit the search to one species by either
  • Writing the species name in the search box
  • Entering the organism under Limits
  • In general, a search will find one (1) entry per
    chromosome (including the mitochondrial genome)

19
Genome - cont
  • Alternative entry points
  • Choose organism group and type of genome in
    left-hand column
  • Example
  • Eukaryota Genome
  • Choose Homo sapiens
  • Or choose Drosophila melanogaster

20
GenBank - Nucleotide
  • Database of nucleotide sequences from 130 000
    organisms
  • Gathered from DDBJ (DNA Data Bank of Japan), EMBL
    (European Molecular Biology Laboratory), and NCBI
    (National Center for Biotechnology Information).
  • Updated daily
  • Example of a sequence document

21
Nucleotide
  • Contains data about nucleotide sequences (both
    DNA and RNA) from different databases. The
    biggest is GenBank.
  • In every sequence document there are references
    to the article where the sequence was first
    presented.
  • Genetic code scheme. Translation from DNA
    molecule to amino acid.
  • Amino acid abbreviations.
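
The translation step mentioned above can be sketched in a few lines of Python. This is an illustration only, assuming the standard genetic code, a sequence that starts in the correct reading frame, and one-letter amino acid abbreviations; the names CODON_TABLE and translate are made up for this example and are not part of Entrez.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding DNA into one-letter amino acid codes.

    Reading starts at the first base and stops at the first stop codon.
    Trailing bases that do not fill a codon are ignored; characters
    other than A, C, G, T raise a KeyError.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCTAA"))
```

Here ATG codes for methionine (M), GCC for alanine (A), and TAA is a stop codon.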

22
Nucleotide - search example
  • Enter name of protein (and maybe more search
    terms) for which you want to know the nucleotide
    sequence
  • insulin AND homo sapiens
  • Gives 9000 hits in the database
  • RefSeq: 1181
  • RefSeq: the best non-redundant and
    comprehensive collection of naturally occurring
    DNA, RNA, and protein molecules for major
    organisms
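
The same kind of search can also be expressed as a URL for a script. The sketch below assumes NCBI's E-utilities "esearch" service, which is not covered in these slides, and it only builds the URL without sending any request. The database name "nucleotide" follows the slide; the name the service expects may differ. build_search_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumption: the E-utilities esearch endpoint (not part of the slides).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_search_url(db: str, *terms: str) -> str:
    """Join the search terms with AND and return an esearch URL."""
    query = " AND ".join(terms)
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": query})


# The search from the slide: insulin AND homo sapiens
print(build_search_url("nucleotide", "insulin", "homo sapiens"))
```

Hit counts will differ from the 2006 figures on the slide, since the databases are updated daily.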

23
Search result display of
  • Hits are sorted into groups
  • All RefSeq entries are collected in one of the
    lists
  • http://www.ncbi.nlm.nih.gov/RefSeq/

24
RefSeq accession numbers
  • Curated sequences
  • RefSeq accession numbers can be distinguished
    from GenBank accessions by their distinct prefix
    format: 2 characters followed by an underbar
    (underscore). For example, the RefSeq protein
    accession number for citrate synthase is
    NP_015325.
  • Each sequence has a status (predicted,
    provisional, reviewed, validated, etc.):
    http://www.ncbi.nlm.nih.gov/RefSeq/key.html
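
The prefix rule can be checked mechanically. The pattern below is a simplified assumption based only on the rule stated above (two uppercase letters, an underscore, the rest of the accession, and an optional version suffix); it is not NCBI's official definition, and is_refseq_accession is a made-up name.

```python
import re

# Simplified assumption: 2 uppercase letters + "_" + letters/digits,
# optionally followed by ".version" (for example NP_067726.3).
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def is_refseq_accession(accession: str) -> bool:
    """Return True if the accession has the RefSeq-style prefix."""
    return REFSEQ_PATTERN.match(accession) is not None


for accession in ["NP_015325", "NM_021694", "U49845"]:
    print(accession, is_refseq_accession(accession))
```

The first two accessions carry the two-character prefix and underscore; the third is a GenBank-style accession without it.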

25
Part of list describing accession formats
26
Nucleotide example cont.
  • Enter accession number - NM_021694
  • Will produce a single hit in the database
  • Click on accession number
  • Displays the complete record
  • At the end of this record you will find the amino
    acid sequence (the translated part of the gene)
    and the DNA sequence

27
Reports
  • Different formats in which the information is
    available

28
Links
  • How many links depends on how much data exists
    and in which databases
  • Well-studied proteins have many links; others can
    have very few

29
Changing database
  • From Nucleotide database to Protein database
  • In the complete record, find the CDS protein ID:
    NP_067726.3 ->
  • By clicking on the link we will move to the
    protein database
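
When a record has been saved as text, the protein ID can also be picked out with a script. This sketch assumes the GenBank flat-file convention of writing the ID as a /protein_id="..." qualifier under the CDS feature. The fragment is a one-line illustration built from the ID on this slide, not a copy of the real record, and find_protein_ids is a made-up name.

```python
import re

# Illustrative one-line fragment; a real record contains many more lines.
RECORD_FRAGMENT = '                     /protein_id="NP_067726.3"'


def find_protein_ids(record_text: str) -> list[str]:
    """Return every protein ID given in a /protein_id qualifier."""
    return re.findall(r'/protein_id="([^"]+)"', record_text)


print(find_protein_ids(RECORD_FRAGMENT))
```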

30
Protein
  • Contains protein sequence data collected from
    different databases.
  • In every sequence document there is a reference
    to the article where the discovery of the
    sequence was described.

31
Structure
  • Offers visualization of 3D representations of
    structures
  • For viewing the structures you need a plug-in
    that can be downloaded for free (Cn3D)

32
Example - Azurin
  • 53 hits
  • Click on the link above the short description or
    on MMDB to open a new page where the 3D structure
    of the molecule can be viewed.
  • Tutorial and software
  • http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

33
Gene
  • Search for genes that are defined in either the
    Nucleotide database or in Map Viewer.
  • Search by
  • Free text (human muscular dystrophy)
  • Gene name (BRCA1[sym])
  • Chromosome number (Y[CHR] AND human[ORGN])
  • EC number (1.9.3.1[EC])
  • etc
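
Fielded searches like these attach a field tag in square brackets to each term and combine the clauses with AND. The helper names fielded and all_of below are made up for this illustration; only the resulting query strings are meant to match what is typed into the Gene search box.

```python
def fielded(term: str, field: str) -> str:
    """Attach an Entrez field tag to a term, e.g. BRCA1[sym]."""
    return f"{term}[{field}]"


def all_of(*clauses: str) -> str:
    """Combine clauses so that all of them must match."""
    return " AND ".join(clauses)


print(fielded("BRCA1", "sym"))
print(all_of(fielded("Y", "CHR"), fielded("human", "ORGN")))
print(fielded("1.9.3.1", "EC"))
```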

34
BLAST
  • BLAST stands for Basic Local Alignment Search
    Tool: http://www.ncbi.nlm.nih.gov/BLAST/
  • A tool for comparing sequences
  • Introduction to BLAST
  • BLAST program selection guide
  • There is more than one version
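
To give a feeling for what "local alignment" means, the sketch below scores the best-matching stretch shared by two sequences. Note that this is the exact Smith-Waterman dynamic-programming method, shown only for illustration; BLAST itself uses faster heuristics. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score of a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair_score = match if a[i - 1] == b[j - 1] else mismatch
            current[j] = max(
                0,                            # start a new local alignment
                previous[j - 1] + pair_score,  # align a[i-1] with b[j-1]
                previous[j] + gap,            # gap in b
                current[j - 1] + gap,         # gap in a
            )
            best = max(best, current[j])
        previous = current
    return best


# The two sequences share the stretch "ACG".
print(local_alignment_score("TTACGTT", "GGACGGG"))
```

A score of 0 means the two sequences have no matching stretch at all under this scoring.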

35
Other ways of accessing data
  • DDBJ (DNA Data Bank of Japan)
  • EMBL (European Molecular Biology Laboratory):
    Ensembl, a very neat genome browser
  • NCBI (National Center for Biotechnology
    Information): Entrez
  • These three are the largest
  • They exchange data daily
  • Most data end up in these

36
Web resources
  • Looking for an easy way to search for literature
  • Google Scholar: a new Google tool with less junk
  • Scirus: specialized search engine
  • OAIster: searches digital archives (mostly free
    material)
  • Use Google to find a definition: define:genetics

37
Practice exercises
  • How many chromosomes does Anopheles gambiae have?
  • Search for curated sequences (RefSeq) on actin
    mRNA for a mouse.
  • Find a 3D picture of a stress protein.
  • Search for documents on human albumin and find
    the information about who submitted the
    information.
  • I don't know anything about porphyria but would
    like to know what it is. Is it heritable? And if
    so, on what chromosome is the gene for porphyria
    situated?

38
Homework?
  • Renata C. Geer (2003) Entrez: Making use of its
    power. Briefings in Bioinformatics, vol. 4, no. 2,
    pp. 179-184.
  • http://www.ncbi.nlm.nih.gov/entrez/query/static/help/entrez_tutorial_BIB.pdf