National Center for Biotechnology Information - PowerPoint PPT Presentation

Slides: 31
Provided by: jimo153

Transcript and Presenter's Notes

Title: National Center for Biotechnology Information


1
National Center for Biotechnology Information
  • Created by Public Law 100-607 in 1988 as part of
    National Library of Medicine at NIH to
  • Create automated systems for knowledge about
    molecular biology, biochemistry, and genetics.
  • Perform research into advanced methods of
    analyzing and interpreting molecular biology
    data.
  • Enable biotechnology researchers and medical care
    personnel to use the systems and methods
    developed.
  • Builders and providers of GenBank, Entrez, Blast,
    PubMed. Online systems host about 1.8 million
    users per day at peak rates of 3,200 web hits a
    second.
  • Center for basic research and training in
    computational biology.

2
NCBI is the most heavily used site in biomedicine. Why?
3
Data, the Next Intel Inside
4
Comparative Analysis of Genes Enables Innovation in Assembly
Human   638 RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC 697
Yeast   657 RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC 716
E.coli  584 RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA 642
Colon cancer gene sequence
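
The comparison on this slide can be reproduced in a few lines. The following is a minimal sketch, not NCBI's method: it takes the three aligned fragments from the slide, computes pairwise percent identity over ungapped columns, and marks the columns conserved in all three species. The function names are illustrative.

```python
# Aligned fragments copied from the slide; "-" marks an alignment gap.
ALIGNMENT = {
    "Human":  "RHACVEVQDEIAFIPNDVYFEKDKQMFHIITGPNMGGKSTYIRQTGVIVLMAQIGCFVPC",
    "Yeast":  "RHPVLEMQDDISFISNDVTLESGKGDFLIITGPNMGGKSTYIRQVGVISLMAQIGCFVPC",
    "E.coli": "RHPVVEQVLNEPFIANPLNLSPQRR-MLIITGPNMGGKSTYMRQTALIALMAYIGSYVPA",
}


def percent_identity(a, b):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


def conserved_columns(seqs):
    """Return a string with the residue where all rows agree, '.' elsewhere."""
    return "".join(
        col[0] if len(set(col)) == 1 and col[0] != "-" else "."
        for col in zip(*seqs)
    )


if __name__ == "__main__":
    names = list(ALIGNMENT)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            pid = percent_identity(ALIGNMENT[first], ALIGNMENT[second])
            print(f"{first} vs {second}: {pid:.1f}% identity")
    print(conserved_columns(ALIGNMENT.values()))
```

The conserved-column string makes the shared GPNMGGKST block easy to spot across human, yeast, and E. coli.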
5
Ignoring the Central Dogma in Bioinformatics is
Evidence of Stupid Design
6
It Guides Innovative Assembly of Separate
Resources
GenBank, RefSeq, Human Genome, Bacterial Genome, Virus Genome, MMDB,
PubMed, UniGene(s), LocusLink, OMIM, Taxonomy, GEO, PopSet, BLAST,
Entrez, ePCR, Sequ in
7
Entrez Pathway to Discovery
Term frequency statistics
MEDLINE abstracts
Literature citations in sequence databases
Literature citations in sequence databases
Protein sequences
Nucleotide sequences
Amino acid sequence similarity
Nucleotide sequence similarity
Coding region features
8
Entrez Increases Discovery Space
9
Entrez is Intrinsically Component-Based
  • NCBI C Toolkit enforces common modules in
    internal pipelines, external applications, and
    web components.
  • Entrez has a common model for Booleans and
    Summaries, and unique models for deep data.
  • New projects can be easily added or extended.
  • Long-standing use of the productotype keeps
    NCBI agile but (fairly) robust.

10
Web Services Provide Access to Entrez
  • Eutils supports about 5 million service requests
    a day
  • SOAP versions support about 38,000 service
    requests a day (about 0.8% of the total), similar
    to Amazon's experience with REST and SOAP
  • Eutils allows outside sites to recreate Entrez,
    and NCBI does not know who does so or why
  • The current NCBI Sequence Viewer itself uses Eutils
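
As an illustration of the kind of request Eutils serves, here is a minimal sketch that builds an ESearch URL and parses the ID list from a response. Assumptions: the base URL is the present-day E-utilities address rather than the one in use when the slides were written, and the sample response and its IDs are made up for the example. No network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Present-day E-utilities base URL (an assumption; the slides do not give one).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """Build an ESearch request URL for an Entrez database and Boolean query."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch(xml_text):
    """Return (total hit count, list of record IDs) from an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count", default="0"))
    ids = [node.text for node in root.findall("./IdList/Id")]
    return count, ids


# Made-up response in the shape ESearch returns; the IDs are placeholders.
SAMPLE_RESPONSE = """\
<eSearchResult>
  <Count>2</Count>
  <IdList>
    <Id>111</Id>
    <Id>222</Id>
  </IdList>
</eSearchResult>
"""

if __name__ == "__main__":
    print(esearch_url("pubmed", "influenza AND neuraminidase"))
    print(parse_esearch(SAMPLE_RESPONSE))
```

An outside site would fetch the URL, then pass the returned IDs to further Eutils calls to rebuild its own view of Entrez.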

11
Harnessing Collective Intelligence in BioMedicine
12
Bibliographic Resources
  • PubMed: Citations and Abstracts from publishers,
    MEDLINE indexing
  • PMC: PubMed Central, full text journal articles
    from publishers (and NIHMS).
  • pPMC: portable mirror of PMC content
  • NIHMS: NIH Manuscript Submission System for
    Public Access policy
  • NLM DTD: Modular DTD for bibliographic material
  • pNIHMS: portable NIHMS
  • XML Authoring System: MS Word/XML authoring
  • Bookshelf: Books and monographs in XML from
    publishers and authors.

13
PubMed Central XML
  • Why XML?
  • Preserves structure of an article
  • Lends itself to intelligent processing
  • Human-readable and not dependent on technology
  • Is based on SGML, a publishing industry standard
  • Portable and migratable
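
A small sketch of what "preserves structure" and "lends itself to intelligent processing" can mean in practice: because sections and titles are explicit elements, a program can outline an article without guessing at layout. The sample document is illustrative only; its element names are modeled loosely on journal-article markup and it is not claimed to be a valid PMC instance.

```python
import xml.etree.ElementTree as ET

# Illustrative article fragment (not a validated PMC document).
SAMPLE_ARTICLE = """\
<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
    </article-meta>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p><p>Third paragraph.</p></sec>
  </body>
</article>
"""


def outline(xml_text):
    """Return the article title and a (section title, paragraph count) list."""
    root = ET.fromstring(xml_text)
    title = root.findtext(".//article-title")
    sections = [
        (sec.findtext("title"), len(sec.findall("p")))
        for sec in root.findall("./body/sec")
    ]
    return title, sections


if __name__ == "__main__":
    print(outline(SAMPLE_ARTICLE))
```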

14
PMC2
  • Content is converted to a standard XML format on
    ingest and then stored and rendered from the one
    format.
  • But what format?

15
Harvard E-journal Archiving Project
  • The Mellon Foundation funded the Harvard Library
    to study the feasibility of using one DTD for
    archiving journal articles.
  • Harvard commissioned Inera, Inc. for the
    E-Journal Archive DTD Feasibility Study.
  • Conclusion: yes, it is feasible, but the right
    DTD does not exist.
  • Recommendations from the study were used in a
    modified PMC DTD. NCBI collaborated with Harvard
    to broaden the scope of the new PMC DTD to
    accommodate journals from all disciplines (not
    just life sciences).

16
NLM Journal Article DTDs: Establishing Standards
from Practice
  • Archiving and Interchange DTD
      • Purpose is to preserve journals' intellectual
        content
      • Written for
          • ease of conversion (from other DTDs)
          • completeness (union of current journal DTDs)
  • Journal Publishing DTD
      • A subset of the Archiving DTD
      • Written for
          • authoring article content
          • initial tagging of non-XML content
          • creating consistent structures

17
Adoption
  • HighWire Press
  • JSTOR's Electronic Archiving Initiative
  • Australia's Commonwealth Scientific and
    Industrial Research Organization
  • PLoS and other PMC contributors
  • Atypon Systems (over 150 titles) and other
    conversion vendors and journal service providers
  • Wiley, Nature, Blackwell: common format (PXI)

18
Support
  • Complete documentation for both DTDs available
    online.
  • Established public discussion lists for user
    questions
  • Generic transformations to HTML and PDF forms of
    articles
  • Public XML validation tool
  • Working group of leaders in printing and markup
    industries provides advice on changes to Tagset

19
Portable PubMed Central (pPMC)
  • Provides a local mirror of PMC content
  • Updated daily from NCBI
  • Multiple site archiving
  • Provides rendering of PMC XML into HTML
  • Provides searching through NCBI EUtils
  • Provides for controlled local content in
    presentation
  • Provides first step toward collaborative
    archiving
  • Collaboration with Microsoft on support

20
What's on the Bookshelf?
21
Diabetes
Obesity
  • Health information with links to molecular data
  • NIDDK advisors on content
  • 10,000 users per month
  • "a truly valuable resource" (Gene Barrett,
    President, American Diabetes Association)

22
Books
  • Authoring in MS Word
  • Simple mark-up based on Word styles
  • WordML to XML conversion
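
The idea of style-based mark-up can be sketched as follows. This is a hypothetical illustration, not the Bookshelf conversion pipeline: the input is a plain list of (Word style name, text) pairs rather than real WordML, and the style names and output element names are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def paragraphs_to_xml(paragraphs):
    """Convert (style, text) pairs into nested XML.

    Hypothetical mapping: a "Heading 1" paragraph opens a new <sec> with a
    <title>; a "Normal" paragraph becomes a <p> in the current section.
    """
    root = ET.Element("chapter")
    current = root  # paragraphs before the first heading attach to the root
    for style, text in paragraphs:
        if style == "Heading 1":
            current = ET.SubElement(root, "sec")
            ET.SubElement(current, "title").text = text
        elif style == "Normal":
            ET.SubElement(current, "p").text = text
        else:
            raise ValueError(f"no mapping for Word style {style!r}")
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    doc = [
        ("Heading 1", "Overview"),
        ("Normal", "Diabetes is a chronic condition."),
        ("Heading 1", "Genetics"),
        ("Normal", "Several loci are involved."),
    ]
    print(paragraphs_to_xml(doc))
```

Rejecting unmapped styles, rather than silently dropping them, keeps authors within the small agreed set of styles.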

23
(No Transcript)
24
BioMedicine Moves to the Web
  • Electronic Authoring and Distribution of Articles
  • Linking and annotating factual data as a side
    effect
  • Ability to mine data and text together
  • Richer data between supported databases
  • High Throughput Biology generates large datasets
    stored in public repositories
  • Common factual data roadmap
  • Greater transparency
  • Greater incidental collaboration for discovery
  • New private sites for discussion on this
    armature
  • New products arise from a public infrastructure

25
Influenza Anti-viral Compounds
26
Influenza Anti-viral Compounds
27
Influenza Anti-viral/Protein Binding
28
Influenza Neuraminidase Gene
29
Influenza Genome Project
30
Influenza Assembly Archive