Entrez Retrieval System ... - PowerPoint PPT Presentation

Slides: 54
Provided by: X718

Transcript and Presenter's Notes

Title: Entrez Retrieval System ...


1
Part 3
Essentials
2
Global Entrez Search Page
All[Filter]
3
Overall Goal: an online resource providing
comprehensive information on the biological
activities of small molecules
4
Why Are Small Molecules Important?
  • Constituents of all macromolecules (DNA, RNA,
    protein, carbohydrates, etc.)
  • Serve as cofactors and signaling molecules to
    thousands of proteins
  • The chemistry part of biochemistry
  • Most drug entities and drug types are small
    molecules
  • Most biomarkers used in clinical chemistry are
    small molecules

5
PubChem Databases and Tools
http://pubchem.ncbi.nlm.nih.gov/
6
The Molecular Libraries Roadmap: An Integrated
Initiative
Technology Development
Screening
Informatics
Chem-informatics Research Centers
Molecular Libraries Screening Centers Network (MLSCN)
Assay Development
Instrumentation
Compound Repository (MLSMR)
Chemical Diversity
Predictive ADMET
7
PubChem
  • Repository for small molecules and bioactivity
    assay data
  • Part of the Entrez search and linking system
  • Links to other NCBI databases, e.g.,
    • PubMed, MeSH
    • Protein structures (MMDB)
    • Protein/Nucleotide sequences (GenPept/GenBank)
  • Contains complete chemical structures
    • Standardized for uniformity
    • Small set of computed properties
    • Structure similarity searching

8
Other Depositors to PubChem
and more
9
PubChem Bird's-Eye View
Depositors
PubChem Substance
PubChem BioAssays
PubChem Compound
Chemical Structure Similarity
10
How does data get into PubChem?
11
PubChem integration in Entrez
VAST Structure Similarity
Term Frequency Statistics
Literature
3D Structures
Bioactivity Assay Results
Small Molecule Structures
Chemical Structure Similarity
Protein Sequences
Activity Profile Similarity
12
(No Transcript)
13
Primary Database
14
Depositor Data
  • No global rules or standards
  • Based on organizational needs
  • Lots of data overlap
  • Often based on individual scientists' preferences
  • PubChem accepts data from many organizations
  • Previously unseen data representations
  • Combinatorial explosion of ways to draw the
    same structure

15
Redundancy, mixtures
Mixture
16
Derivative Database
17
Chemical Structures may be represented in many
different ways
18
Chemical Structures may be represented in many
different ways
Substance
Compound
20
Substance
Compound
Unknown E/Z isomers
Unknown stereo
Known stereochemistry
21
(No Transcript)
22
PubChem Compound Processing
  • Chemical Data Verification
    • Atom description (label, element?)
    • Functional group clean-up
    • Atom valence verification to prevent nonsense
      structures
  • Normalize and Standardize
    • Valence-bond canonicalization (for tautomer
      invariance)
    • Aromaticity detection and self-consistency
    • Stereochemistry detection
    • Explicit hydrogen assignment
  • Calculation
    • 2-D coordinate generation
    • Image depictions
    • Fingerprints
    • IUPAC name
    • SMILES, InChI, hash codes
    • XLogP, TPSA, HBD, HBA, MW, MF
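
Two of the simplest "Calculation" outputs, the molecular formula (MF) and the heavy-atom count, can be sketched from an atom list once hydrogens have been made explicit. This is a minimal illustration, not PubChem's code: it assumes the input is a plain list of element symbols, and it writes the formula in Hill order, the common convention for molecular formulas.

```python
from collections import Counter


def hill_formula(atoms: list[str]) -> str:
    """Molecular formula (MF) in Hill order from explicit-hydrogen atoms.

    Hill order: with carbon present, C first, then H, then the remaining
    elements alphabetically; without carbon, all elements alphabetically.
    """
    counts = Counter(atoms)
    if "C" in counts:
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    # A count of 1 is left implicit ("O", not "O1").
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def heavy_atom_count(atoms: list[str]) -> int:
    """Number of non-hydrogen atoms."""
    return sum(1 for a in atoms if a != "H")


# Ethanol with hydrogens made explicit: 2 C, 1 O, 6 H.
ethanol = ["C", "C", "O"] + ["H"] * 6
print(hill_formula(ethanol), heavy_atom_count(ethanol))  # C2H6O 3
```

The explicit-hydrogen assignment step matters here: without it, the hydrogen count (and therefore the formula and MW) could not be computed from the atom list alone.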

23
Chemical Structure Sanitization
  • Chemical structures that fail sanitization
    • are not part of the aggregated PubChem Compound
      database
    • are still searchable via the PubChem Substance
      database
  • Keeps the PubChem Compound database clean for
    cheminformatics analysis
  • Collapses structures represented in various ways
    into a uniform, identical representation
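
The routing rule described on this slide (failing records stay in Substance but never reach Compound) can be sketched with a toy valence check. Everything here is an assumption for illustration: the `MAX_VALENCE` table is a small hypothetical subset, hydrogens are treated as implicit, and formal charges and radicals are ignored, which a real sanitizer would have to handle.

```python
# Illustrative maximum valences for neutral atoms (hypothetical subset;
# a real checker also handles formal charges, radicals and more elements).
MAX_VALENCE = {"H": 1, "C": 4, "N": 3, "O": 2, "S": 6, "Cl": 1}


def passes_valence_check(atoms, bonds):
    """atoms: element symbols; bonds: (i, j, order) tuples over atom indices."""
    used = [0] * len(atoms)
    for i, j, order in bonds:
        used[i] += order
        used[j] += order
    for element, total in zip(atoms, used):
        limit = MAX_VALENCE.get(element)
        if limit is None or total > limit:
            return False  # unknown element or over-bonded atom
    return True


def split_depositions(substances):
    """Partition SIDs into (eligible for Compound, Substance-only)."""
    accepted, substance_only = [], []
    for sid, (atoms, bonds) in substances.items():
        target = accepted if passes_valence_check(atoms, bonds) else substance_only
        target.append(sid)
    return accepted, substance_only


substances = {
    1: (["C", "O"], [(0, 1, 1)]),  # C-O single bond: acceptable
    2: (["C", "O"], [(0, 1, 3)]),  # uncharged O with three bonds: rejected
    3: (["C", "Cl", "Cl", "Cl", "Cl", "Cl"], [(0, k, 1) for k in range(1, 6)]),  # 5-bonded C
}
print(split_depositions(substances))  # ([1], [2, 3])
```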

24
Compound for mixture
Component compounds
25
Components of a mixture
26
Substance vs. Compound
Substance summary
Compound summary
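
The Substance-to-Compound relationship is many-to-one: after standardization, every SID whose structure reduces to the same representation shares one CID. The sketch below assumes each SID has already been reduced to an opaque standardized key (the strings `KEY_A`, `KEY_B` are placeholders) and hands out hypothetical sequential CIDs; it is not how PubChem actually assigns identifiers.

```python
def assign_compound_ids(substance_keys: dict[int, str]) -> dict[int, int]:
    """Map each SID to a CID; SIDs with the same standardized key share a CID.

    substance_keys maps SID -> standardized structure key (a placeholder for
    whatever canonical form standardization produces). CIDs are hypothetical.
    """
    cid_for_key: dict[str, int] = {}
    sid_to_cid: dict[int, int] = {}
    for sid in sorted(substance_keys):
        key = substance_keys[sid]
        if key not in cid_for_key:
            cid_for_key[key] = len(cid_for_key) + 1  # next unused CID
        sid_to_cid[sid] = cid_for_key[key]
    return sid_to_cid


def substances_per_compound(sid_to_cid: dict[int, int]) -> dict[int, list[int]]:
    """Invert the mapping: CID -> sorted list of contributing SIDs."""
    groups: dict[int, list[int]] = {}
    for sid, cid in sid_to_cid.items():
        groups.setdefault(cid, []).append(sid)
    return {cid: sorted(sids) for cid, sids in groups.items()}


# SIDs 101 and 103: the same structure drawn differently by two depositors.
mapping = assign_compound_ids({101: "KEY_A", 102: "KEY_B", 103: "KEY_A"})
print(mapping)                           # {101: 1, 102: 2, 103: 1}
print(substances_per_compound(mapping))  # {1: [101, 103], 2: [102]}
```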
27
Substance vs. Compound
28
Examples of queries
  • 200[MW]
  • dopamine[CompleteSynonym]
  • 300:500[MW]
  • "pcsubstance structure"[Filter]
  • "ca"[Element] AND 300:500[MW] AND
    "chemidplus"[SourceName]
  • "InChI=1/Ca.3H2O/h31H2/q+2/p-3/fCa.3HO/h31h/qm3-1"[InChI]
  • "lipinski"[Filter] AND "antineoplastic
    agents"[PharmAction]
  • Lipinski rule of 5: a molecule is likely to be
    orally active if it has
    • no more than 5 hydrogen bond donors (OH and NH
      groups)
    • no more than 10 hydrogen bond acceptors (N or O
      atoms)
    • a molecular weight of no more than 500 Da
    • a LogP of no more than 5
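
The rule of 5 is simple enough to express directly as a filter over computed properties. In this sketch a value above each threshold (more than 5 donors, more than 10 acceptors, MW over 500, LogP over 5) counts as a violation. The `Properties` class and the `max_violations` parameter are hypothetical names; the parameter exists because usage varies and some workflows tolerate one violation.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    hbd: int      # hydrogen bond donors (OH and NH groups)
    hba: int      # hydrogen bond acceptors (N or O atoms)
    mw: float     # molecular weight, Da
    logp: float   # computed logP, e.g. an XLogP value


def lipinski_violations(p: Properties) -> list[str]:
    """Names of the rule-of-5 criteria that the molecule exceeds."""
    failed = []
    if p.hbd > 5:
        failed.append("HBD > 5")
    if p.hba > 10:
        failed.append("HBA > 10")
    if p.mw > 500:
        failed.append("MW > 500")
    if p.logp > 5:
        failed.append("LogP > 5")
    return failed


def passes_lipinski(p: Properties, max_violations: int = 0) -> bool:
    """True if the molecule has at most max_violations violations."""
    return len(lipinski_violations(p)) <= max_violations


# Hypothetical property values, not taken from any PubChem record.
candidate = Properties(hbd=6, hba=4, mw=650.0, logp=2.0)
print(lipinski_violations(candidate))  # ['HBD > 5', 'MW > 500']
print(passes_lipinski(candidate), passes_lipinski(candidate, max_violations=2))  # False True
```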

29
Examples of PubChem Index Fields
  • All [ALL] -- All of the following fields are
    searched; the default search field.
  • Uid [UID] -- The integer represents the SID for
    the PCSubstance database. By default, an integer
    without a field alias is recognized as a UID.
    Same as SID.
  • Filter [Filter] -- Limits the records to various
    indexed filters.
  • ActiveAid [AA] -- Active BioAssay identifier,
    integer.
  • ActiveAidCount [AC, ACNT] -- Number of bioassays
    in which the record tested active.
  • AtomChiralCount [ACC, ACCNT] -- Total count of
    chiral atoms in a given compound.
  • BioAssayID [BAID, AID] -- BioAssay identifier.
  • BondChiralCount [BCC, BCCNT] -- Number of chiral
    bonds.
  • Comment [CMT] -- Substance or bioassay comment.
  • CompleteSynonym [CSYN, CSYNO] -- Exactly matching
    name for a substance/compound.
  • CompoundID [CID] -- Compound identifier, integer.
  • DepositDate [DDAT, DEPDAT] -- Deposition
    timestamp for a substance.
  • Element [ELMT, EL] -- Chemical element in a
    substance/compound.
  • ExactMass [EMAS, EXMASS] -- The calculated mass
    of an ion or molecule with the most likely
    isotopic composition for a single random
    molecule, corresponding to the mass of the most
    intense ion/molecule peak in a mass spectrum.
    A real number.
  • HeavyAtomCount [HAC, HACNT] -- Count of atoms in
    a compound excluding hydrogen, integer.
  • HydrogenBondAcceptorCount [HBAC, HBACNT] --
    Hydrogen bond acceptors for a compound, integer.
  • HydrogenBondDonorCount [HBDC, HBDCNT] --
    Hydrogen bond donors for a compound, integer.
  • InChI [inchi] -- IUPAC International Chemical
    Identifier.
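
Field-tagged queries like the ones in the query examples can also be composed programmatically and sent through NCBI's E-utilities. The sketch below only builds the request URL; it does not fetch anything. The ESearch endpoint and the Entrez database names `pcsubstance`/`pccompound` are real, while the helper functions `tagged`, `tagged_range` and `esearch_url` are hypothetical, and the `low:high[FIELD]` range form is assumed from the `300:500[MW]` example.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint. "pcsubstance" and "pccompound" are
# the Entrez database names for PubChem Substance and Compound.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(value: str, tag: str, quote: bool = False) -> str:
    """Attach an index-field tag, e.g. tagged("ca", "Element", quote=True)."""
    text = f'"{value}"' if quote else value
    return f"{text}[{tag}]"


def tagged_range(low: float, high: float, tag: str) -> str:
    """Range clause in low:high[FIELD] form, e.g. 300:500[MW]."""
    return f"{low}:{high}[{tag}]"


def esearch_url(db: str, *clauses: str) -> str:
    """AND the clauses together and return (not fetch) an ESearch URL."""
    term = " AND ".join(clauses)
    return ESEARCH + "?" + urlencode({"db": db, "term": term})


url = esearch_url(
    "pcsubstance",
    tagged("ca", "Element", quote=True),
    tagged_range(300, 500, "MW"),
    tagged("chemidplus", "SourceName", quote=True),
)
print(url)
```

`urlencode` percent-encodes the quotes, brackets and colon in the query term, so the Entrez syntax survives transport in the URL unchanged.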
30
Examples of PubChem Index Fields, contd.
  • IUPACName [UPAC, IUPAC] -- Standard IUPAC name
    for a compound.
  • MeSHDescription [MHD]
  • MeSHTerm [MSHT, MESHT] -- Medical Subject
    Heading term.
  • MeSHTreeNode [MSHN, MESHTN] -- Medical Subject
    Heading tree node (tree structures).
  • MolecularWeight [MW, MWT, MOLWT] -- Mass of a
    molecule calculated using the average mass of
    each element weighted for its natural isotopic
    abundance. E.g., carbon has two natural isotopes,
    12C and 13C, with relative abundances of 98.9%
    and 1.1%, yielding an average mass of 12.011
    g/mol. A real number.
  • MonoisotopicMass [MMAS, MIMASS] -- Mass of a
    molecule calculated using the mass of the most
    abundant isotope of each element. E.g., carbon
    has a monoisotopic mass of 12.000 g/mol. A real
    number.
  • PharmAction [PHMA, PHARMA] -- MeSH
    pharmacological action heading.
  • RotatableBondCount [RBC, RBCNT] -- Number of
    rotatable bonds.
  • SourceCategory [SRCC, SRCCAT, SRCCATG] --
    Depositor categories.
  • SourceID [SRID, SRCID] -- Depositor's external
    ID.
  • SourceName [SRC, SRCNAM, SRCNAME] -- Official
    depositor name.
  • SubstanceID [SID] -- Substance ID. Same as UID.
  • Synonym [SYNO] -- Synonyms for a substance.
  • TautomerCount [TC, TCNT, TTMC] -- Possible
    tautomer count for each given structure, 200.
  • TotalFormalCharge [TFC, CHG, CHRG] -- Total
    formal charge.
  • TPSA [TPSA] -- Topological Polar Surface Area.
  • XLogP [XLGP, LOGP]
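
The difference between MolecularWeight and MonoisotopicMass comes down to which per-element mass is summed. This sketch reproduces the carbon example (the 12.011 g/mol weighted average from 98.9% and 1.1% abundances) and then applies both definitions to ethanol, C2H6O. The per-element tables cover only C, H and O and are included for illustration; function names are hypothetical.

```python
# Carbon isotopes as (mass in u, natural abundance), using the rounded
# 98.9% / 1.1% abundances from the MolecularWeight example.
CARBON_ISOTOPES = [(12.000000, 0.989), (13.003355, 0.011)]

# Per-element masses for a small illustrative subset of elements:
# standard atomic weights (average) and most-abundant-isotope masses.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}
MONOISOTOPIC_MASS = {"C": 12.000000, "H": 1.007825, "O": 15.994915}


def average_mass(isotopes):
    """Abundance-weighted mean of isotope masses."""
    total = sum(abundance for _, abundance in isotopes)
    return sum(mass * abundance for mass, abundance in isotopes) / total


def most_abundant_isotope_mass(isotopes):
    """Mass of the isotope with the highest natural abundance."""
    return max(isotopes, key=lambda iso: iso[1])[0]


def formula_mass(counts, table):
    """Sum per-element masses over a formula given as {element: count}."""
    return sum(table[element] * n for element, n in counts.items())


ethanol = {"C": 2, "H": 6, "O": 1}
print(round(average_mass(CARBON_ISOTOPES), 3))             # 12.011
print(round(formula_mass(ethanol, AVERAGE_MASS), 3))       # 46.069
print(round(formula_mass(ethanol, MONOISOTOPIC_MASS), 6))  # 46.041865
```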
31
Preview/Index Tab
32
History Tab
Substances of MW 300-500 Da with antineoplastic
properties that obey Lipinski's rule of 5
33
(No Transcript)
34
Property Report
35
SDF format
36
(No Transcript)
37
(No Transcript)
38
(No Transcript)
39
Medical Subject Headings (MeSH)
  • MeSH is the National Library of Medicine's
    controlled vocabulary thesaurus.
  • Consists of sets of terms naming descriptors in a
    hierarchical and alphabetic structure, e.g.,
    "Mental Disorders", "Pharmacological action",
    "Catecholamine hormones", etc.
  • Permits searching at various levels of
    specificity
  • MeSH thesaurus is used for indexing articles for
    the MEDLINE/PubMed database
  • MeSH is continually updated
  • PubChem assigns MeSH headings to Compound records
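
Searching "at various levels of specificity" follows from the hierarchy: a broad heading implicitly covers everything filed beneath it. A minimal sketch, assuming dotted tree numbers where a child's number extends its parent's. The descriptor names and tree numbers below are made up, and whether PubChem expands MeSH terms exactly this way is not stated on the slide.

```python
# Made-up descriptor -> tree-number table. A descriptor can sit at several
# places in the hierarchy, hence a list of tree numbers per descriptor.
TREE = {
    "Term A": ["X01"],
    "Term A1": ["X01.100"],
    "Term A1a": ["X01.100.200"],
    "Term A2": ["X01.300"],
    "Term B": ["X02"],
    "Term C": ["X01.300.400", "X02.500"],
}


def descendants(tree_number: str, tree: dict[str, list[str]] = TREE) -> list[str]:
    """Descriptors at tree_number or anywhere beneath it."""
    prefix = tree_number + "."  # the trailing dot keeps "X01" from matching "X010"
    return sorted(
        name
        for name, numbers in tree.items()
        if any(n == tree_number or n.startswith(prefix) for n in numbers)
    )


print(descendants("X01.300"))  # narrow search: ['Term A2', 'Term C']
print(descendants("X01"))      # broad search: everything under X01
```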

40
Primary Database
  • Contains bioactivity screens of chemical
    substances described in PubChem Substance
  • Provides searchable descriptions of each
    bioassay, including descriptions of the
    conditions and readouts specific to a screening
    protocol
  • Depositor decides on data definitions and
    interpretation
  • Data can be plotted as statistical histograms
  • Cross-indexed to other Entrez databases
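
A histogram of assay readouts is just a count of values per fixed-width bin. This sketch assumes a depositor-supplied activity score on a 0-100 scale with 10-point bins; both the scale and the sample scores are hypothetical, since the depositor defines what each readout means.

```python
def histogram(values, bin_width, start=0):
    """Count values into half-open bins [start + k*w, start + (k+1)*w).

    Returns {left_edge: count} ordered by edge; empty bins are omitted.
    """
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)  # index of the bin holding v
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))


def render(counts, bin_width):
    """One text row per non-empty bin: '<left>-<right> ####'."""
    return "\n".join(
        f"{left:>4}-{left + bin_width:<4} {'#' * n}" for left, n in counts.items()
    )


# Hypothetical activity scores for seven tested substances.
scores = [5, 12, 18, 47, 95, 99, 100]
print(render(histogram(scores, 10), 10))
```

Because the bins are half-open, a maximal score of 100 lands in its own [100, 110) bin; a closed last bin would be an equally reasonable choice.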

41
(No Transcript)
42
(No Transcript)
43
(No Transcript)
44
(No Transcript)
45
(No Transcript)
46
(No Transcript)
47
Click to view structure
48
(No Transcript)
49
NCBI FTP >> PubChem Folder
50
Entrez PubChem Help and Tabs
51
Brief Summary
  • PubChem is part of the NIH Molecular Libraries
    Roadmap for Medicine Initiative
  • PubChem consists of three databases (Substance,
    Compound and BioAssay) and a powerful structure
    search engine
    • Substance: samples
    • Compound: calculated structures, properties
  • PubChem is integrated into NCBI's Entrez search
    and linking system of databases
  • Records are indexed using a number of terms
  • Records are linked to each other and to other
    databases at NCBI

52
For More Information
53
For More Information
E-mail addresses
  • General Help: info@ncbi.nlm.nih.gov
  • BLAST: blast-help@ncbi.nlm.nih.gov
  • Telephone
    • Voice: 1 (301) 496-2475
    • Fax: 1 (301) 480-9241

The (free!) NCBI Newsletter
http://www.ncbi.nih.gov/About/newsletter.html
The NCBI Handbook
Follow the link from the NCBI Home Page
The NCBI Education Page
http://www.ncbi.nih.gov/Education/index.html