Mopping up the Flood of Data with Web Services - PowerPoint PPT Presentation

1
Mopping up the Flood of Data with Web Services
  • Gary Wiggins
  • Indiana University
  • School of Informatics
  • wiggins_at_indiana.edu

2
Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

3
Data Mining and Knowledge Discovery (DMKD)
  • Techniques began to be used around 1989
  • Rapid growth in the mid 1990s, with DMKD field
    emerging around 1995
  • Built on DM tools such as Machine Learning

4
Data Mining
  • One of the steps in Knowledge Discovery
  • Concerned with the actual extraction of knowledge
    from data
  • Efficient and scalable methods for mining
    interesting patterns and knowledge and
    discovering hidden facts contained in large
    databases

5
Data Mining Techniques
  • Efficient classification methods
  • Clustering
  • Outlier analysis
  • Frequent, sequential, and structured pattern
    analysis
  • Visualization and spatial/temporal analysis tools
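
Frequent pattern analysis usually begins with support counting: the fraction of transactions in which a set of items occurs together. This is the first pass of association-rule miners such as Apriori. The minimal Python sketch below assumes each transaction is a set of items. It counts item pairs only, and the index-term records are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (share of transactions containing
    both items) is at least min_support."""
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical toy data: each transaction is the set of index terms on one record.
records = [
    {"data mining", "bioinformatics", "clustering"},
    {"data mining", "bioinformatics"},
    {"data mining", "QSAR"},
    {"bioinformatics", "clustering"},
]
print(frequent_pairs(records, 0.5))
# {('bioinformatics', 'clustering'): 0.5, ('bioinformatics', 'data mining'): 0.5}
```

A full Apriori implementation would grow frequent pairs into larger itemsets, pruning any candidate that has an infrequent subset.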

6
Knowledge Discovery (KD)
  • KD is a nontrivial process of identifying valid,
    novel, potentially useful, and ultimately
    understandable patterns from large collections of
    data.
  • --Fayyad et al., as quoted by Cios and Kurgan
  • The KD process involves
      • Understanding and preparation of the data
      • Data Mining (DM)
      • Verification and application of the discovered
        knowledge

7
Framework for KD Process
  • Steps range from very few, e.g.,
      • Data collection and understanding
      • Data mining
      • Implementation
  • To multi-step models, e.g., Cios and Kurgan's
    six-step DMKD process model

8
Cios and Kurgan's Six-Step DMKD Process Model
  • Understanding the problem domain
  • Understanding the data
  • Preparation of the data
      • 50% or more of the effort is spent on this step
  • Data mining
  • Evaluation of the discovered knowledge
  • Using the discovered knowledge

9
General Data Mining/Data Analysis Systems
  • SAS Enterprise Miner
  • SPSS
  • Insightful S-Plus
  • IBM DB2 Intelligent Miner
  • Microsoft SQLServer 2005
  • SGI MLC and MineSet Tree Visualizer
  • Inxight VizServer

10
Trends: Major Conferences
  • Knowledge Discovery and Data Mining (KDD) 2005
      • http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2005.html
  • International Conference on Machine Learning
    (ICML) 2006
      • http://www.icml2006.org/icml2006/technical/accepted.html
  • SIAM Conference on Data Mining 2006
      • http://www.siam.org/meetings/sdm06/proceedings.htm

11
12th Annual SIGKDD International Conference
on Knowledge Discovery and Data Mining,
Philadelphia, August 20-23, 2006
  • Areas of Interest on the Research Track
  • Applications of data mining (biomedicine,
    business, e-commerce, defense)
  • Data and result visualization
  • Data warehousing
  • Data mining for community generation, social
    network analysis and graph-structured data
  • Foundations of data mining
  • Interactive and online data mining
  • KDD framework and process
  • Mining data streams
  • Mining high-dimensional data
  • Mining sensor data
  • Mining text and semi-structured data
  • Mining multi-media data
  • Novel data mining algorithms
  • Privacy and data mining
  • Robust and scalable statistical methods
  • Pre-processing and post-processing for data
    mining
  • Security issues
  • Spatial and temporal data mining

12
Trends in DMKD
  • OLAP (On-Line Analytical Processing)
  • Data warehousing
  • Association rules
  • High Performance DMKD systems
  • Visualization techniques
  • Applications of DM
  • More recently
      • Database products that incorporate DM tools
      • New developments in design and implementation of
        the DMKD process
      • Information visualization products as end-user
        queries
      • XML

13
XML: the Key to DM and KD?
  • Or simply a data exchange protocol?
  • Allows for the description and storage of
    structured or semi-structured data and their
    relationships
  • Can be used to exchange data in a
    platform-independent way
  • BUT only one paper at the major conferences
    listed earlier dealt with XML

14
XML helps
  • Standardize communication between diverse DM
    tools and databases (I/O procedures)
  • Build standard data repositories for sharing data
    between different DM tools that run on different
    software platforms
  • Implement communication protocols between DM
    tools
  • Provide a framework for integration of and
    communication between different DMKD steps
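
To illustrate XML as a tool-neutral I/O format, the Python sketch below writes a small dataset to XML and reads it back with the standard library's ElementTree. The dataset/record/field element names are a hypothetical schema for this example, not a published standard such as PMML.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Serialize a list of flat dicts to an XML string (hypothetical schema)."""
    root = ET.Element("dataset")
    for rec in records:
        row = ET.SubElement(root, "record")
        for key, value in rec.items():
            field = ET.SubElement(row, "field", name=key)
            field.text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_records(text):
    """Parse the same schema back into dicts; all values return as strings."""
    root = ET.fromstring(text)
    return [
        {field.get("name"): field.text for field in row.findall("field")}
        for row in root.findall("record")
    ]


# One illustrative record, written by one "tool" and read by another.
xml_text = records_to_xml([{"compound": "ethanol", "mw": 46.07}])
print(xml_text)
print(xml_to_records(xml_text))
```

Because every value comes back as text, the receiving tool must still agree on types and units with the sender. That agreement is what a shared schema or dictionary adds on top of plain XML.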

15
Predictive Model Markup Language (PMML) and Other
Tools
  • PMML, an XML-based language, enables automated
    sharing of discovered knowledge (predictive
    models) between different domains and tools
  • XML-RPC
  • SOAP (Simple Object Access Protocol)
  • UDDI
  • OLAP
  • OLE DB-DM

16
Discovery Informatics Definition
  • "Discovery Informatics is the study and practice
    of employing the full spectrum of computing and
    analytical science and technology to the singular
    pursuit of discovering new information by
    identifying and validating patterns in data."
    --William W. Agresti in 2003

17
Discovery Informatics
  • Discovery and Application of Information
  • Data Mining and Machine Learning are two aspects
    of Discovery Informatics.

19
Trends: Bioinformatics Conferences
  • International Conference on Intelligent Systems
    for Molecular Biology (ISMB) 2006
      • http://ismb2006.cbi.cnptia.embrapa.br/papers.html
  • Research in Computational Molecular Biology
    (RECOMB) 2006
      • http://www.informatik.uni-trier.de/~ley/db/conf/recomb/recomb2006.html
  • Pacific Symposium on Biocomputing (PSB) 2006
      • http://helix-web.stanford.edu/psb06/

20
Main Areas of Research in Bioinformatics
  • Sequence alignment
  • Alternative splicing
  • Microarray analysis
  • Functional analysis
  • Analysis of single nucleotide polymorphisms
    (SNPs)
  • Natural language text analysis

21
DMKD Sessions at Major Bioinformatics Conferences
  • Databases and Data Integration
  • Text Mining and Information Extraction
  • Semantic Webs

22
Data Mining in Bioinformatics (Bajcsy)
  • Data cleaning, data preprocessing, and semantic
    integration of heterogeneous, distributed
    biomedical databases
  • Existing data mining tools for biodata analysis
  • Development of advanced, effective, and scalable
    data mining methods in biodata analysis

23
Preprocessing of Biodata
  • Integration of multiple microarray gene
    experiments must resolve inconsistent labels of
    genes to form a coherent data store.
  • Focus on quantitative quality metrics based on
    analytical and statistical data descriptors and
    on relationships among variables.

24
Semantic Integration of Heterogeneous Biomedical
Databases
  • Combine multiple sources into a coherent data
    store
  • Find semantically equivalent real-world entities
    from several biomedical sources
  • Problems
      • Different labels for the same concept: gene_id
        vs. g_id
      • Time asynchronization: the same gene analyzed at
        multiple developmental stages

25
Approaches for Semantic Integration of Biodata
  • Construction of integrated biodata warehouses or
    biodatabases
  • Construction of a federation of heterogeneous
    distributed biodatabases
  • Must build up mapping rules or semantic ambiguity
    resolution rules across multiple databases
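
Mapping rules of this kind can be as simple as a per-source lookup table from local labels to canonical ones. Such a table resolves the gene_id vs. g_id problem noted above. The minimal Python sketch below uses hypothetical source names, labels, and values throughout.

```python
# Hypothetical mapping rules: each source's local label -> one canonical label.
MAPPING_RULES = {
    "source_a": {"gene_id": "gene_id", "expr": "expression"},
    "source_b": {"g_id": "gene_id", "expression_level": "expression"},
}


def integrate(sources):
    """Rename every row's fields to canonical labels and tag its origin.
    Fields with no mapping rule are dropped."""
    merged = []
    for source_name, rows in sources.items():
        rules = MAPPING_RULES[source_name]
        for row in rows:
            canonical = {rules[k]: v for k, v in row.items() if k in rules}
            canonical["source"] = source_name  # keep provenance for later checks
            merged.append(canonical)
    return merged


data = {
    "source_a": [{"gene_id": "G1", "expr": 2.5}],
    "source_b": [{"g_id": "G1", "expression_level": 2.7}],
}
for row in integrate(data):
    print(row)
```

Keeping the source tag matters because semantically equivalent entities can still disagree on values, for example when the same gene is measured at different developmental stages.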

26
Existing Data Mining Tools for Biodata Analysis-I
  • Sequence Analysis, e.g.,
      • NCBI/BLAST, ClustalW, HMMER, PHYLIP, MEME,
        TRANSFAC, MDScan, Vector NTI, Sequencher,
        MacVector
  • Structure Prediction and Visualization, e.g.,
      • RasMol, Raster3D, Swiss-Model, Scope, MolScript,
        Cn3D

27
Existing Data Mining Tools for Biodata Analysis-II
  • Genome Analysis, e.g.,
      • CAP3, Paracel GenomeAssembler, GenomeScan,
        GeneMark, GenScan, X-Grail, ORF Finder,
        GeneBuilder
  • Pathway Analysis and Visualization, e.g.,
      • KEGG, EcoCyc/MetaCyc, GenMAPP
  • Microarray Analysis, e.g.,
      • ScanAlyze/Cluster/TreeView, Scanalytics
        MicroArray Suite, Profiler, Silicon Genetics

28
Biospecific Data Analysis Software Systems
  • Agilent GeneSpring
  • Spotfire
  • Invitrogen VectorNTI

29
Text Mining in Bioinformatics
  • Techniques have progressed from simple
    recognition of terms to extraction of interaction
    relationships in complex sentences.
  • Search objectives have broadened to a range of
    problems, e.g.,
      • Improving homology search
      • Identifying cellular location
      • Deriving genetic network technologies

30
Current Work in Biomedical Text Mining (Cohen and
Hersh)
  • Text mining operates at a finer level of
    granularity than information retrieval and text
    summarization.
  • TM examines relationships between specific kinds
    of information contained within and between
    documents.
  • Areas of active research
      • Named entity recognition (genes, proteins, etc.)
      • Text classification
      • Synonym and abbreviation extraction
      • Relationship extraction
      • Hypothesis generation
      • Integrated frameworks

31
Systems Biology
  • Requires a shift in focus from genes and proteins
    to the system's structure and dynamics
  • Four key properties
      • System structures
      • System dynamics
      • Control method
      • Design method
  • Systems Biology Markup Language (SBML) and CellML

32
iSpecies.org
34
Data Mining in Chemistry
  • Modern experimentation (whether classical or
    high-throughput) should be based on the
    productive interplay of statistical techniques
    (design-of-experiments), molecular modeling as
    well as cheminformatics.
  • --Ulrich S. Schubert

35
Session on Integration of Informatics and
Knowledge Management Informatics
  • Integration of Informatics at the Systems Level
    and at the Data Level. Chris L. Waller, Ph.D.,
    Director, World Wide Chemistry Informatics,
    Pfizer Global Research & Development
  • Integrated Knowledge Management at Bayer
    HealthCare: Pharmacophore Informatics. William J.
    Scott, Ph.D., Team Leader, Department for
    Chemistry Research, Bayer Pharmaceuticals
    Corporation
  • Building a Knowledge-Enabled Organization. Cory R.
    Brouwer, Ph.D., Associate Director, Knowledge
    Management Informatics, Pfizer Global Research &
    Development
  • Knowledge Management: Building a Knowledge-Enabled
    Organization. Victor Lobanov, Ph.D., Principal
    Scientist, MDI, Johnson & Johnson Pharmaceutical
    R&D
  • 10th Annual Cheminformatics Conference, May
    23-16, 2006, Philadelphia

36
Impact of HTS and Combinatorial Chemistry Research
  • Most impact in
      • the pharmaceutical industry
      • medical research
      • catalyst research
  • More recently
      • polymer and materials research

37
Diversity of Data Mining in Chemistry
  • On 5/7/2006 there were 4072 references to either
    "datamining" or "data mining" in Chemical
    Abstracts.
  • 3416 different index terms were assigned to those
    records:
      • 2772 used 1-5 times (81%)
      • 298 used 6-10 times (9%)
      • 103 used 11-15 times (3%)
      • 71 used 16-20 times (2%)
      • 38 used 21-25 times (1%)
      • 24 used 26-30 times (1%)
      • 110 used 31-480 times (3%)
  • Most frequent co-term: "bioinformatics", with 480
    hits, or 12% of the occurrences
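
The percentages follow directly from the counts. Each band's count is divided by the 3416 distinct index terms, and the co-term share is 480 divided by the 4072 records. A short Python check using the slide's own numbers, rounded to whole percents:

```python
# Index-term usage bands and counts exactly as given on the slide.
bands = {
    "1-5": 2772, "6-10": 298, "11-15": 103, "16-20": 71,
    "21-25": 38, "26-30": 24, "31-480": 110,
}
total_terms = sum(bands.values())  # should equal the 3416 distinct terms
shares = {band: round(100 * count / total_terms) for band, count in bands.items()}

for band, count in bands.items():
    print(f"used {band:>6} times: {count:5d} terms ({shares[band]}%)")

# The co-term "bioinformatics" appeared on 480 of the 4072 records.
print(f"bioinformatics share: {round(100 * 480 / 4072)}%")  # 12%
```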

38
SFS graph
39
Components of the Semantic Web for Chemistry
  • XML: eXtensible Markup Language
  • RDF: Resource Description Framework
  • RSS: Rich Site Summary
  • Dublin Core allows metadata-based newsfeeds
  • OWL for ontologies
  • BPEL4WS for workflow and web services
  • Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
    3192-3203.

40
Chemical Markup Language (CML)
  • Much of the semantics in a chemical article can
    be supported by CML:
      • Molecules
      • Structures
      • Reactions and reaction schemes
      • Spectra (including annotations)
      • Physicochemical data
  • XML dictionaries and lexicons provide linguistic
    and semantic support for markup
  • Will lead to quicker authoring and higher quality
    of embedded structures and data through machine
    validation

41
Key Factors in the Success of the Chemical
Semantic Web
  • Institutional Repositories: services deployed and
    supported at an institutional level to offer
    dissemination, management, stewardship, and, where
    appropriate, long-term preservation of both the
    intellectual work created by an institutional
    community and the records of the intellectual and
    cultural life of the institutional community
  • Open Access Movement

42
Knowledge-Driven Bioinformatics Enhanced with
Chemistry
43
Text Mining (Banville)
  • In the pharmaceutical field, it is ideally the
    marriage of biological and chemical information
    that needs to be the ultimate focus of text data
    mining applications.
  • Problems
      • Lack of universal publication standards for
        identifying each unique chemical entity
      • Selective indexing policies of A&I (abstracting
        and indexing) services
      • Need to understand how chemical structures link
        to biological processes

44
OSCAR3 Service
  • Open-source Java application under development by
    Peter Murray-Rust's group at Cambridge (not yet
    published)
  • Extracts chemical information from either a
    paragraph of experimental data or a full paper
    (e.g., melting points, infrared and NMR data, and
    mass spectral information)
  • Produces an XML instance highlighting the
    chemical information with an Extensible
    Stylesheet Language (XSL) file
  • At IU, we are attaching a SOAP input/output engine
    to build a web service based on OSCAR3.
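
OSCAR3's internals are not described here. As a much simpler illustration of the kind of extraction involved, the Python sketch below pulls melting points out of experimental text with a regular expression. The pattern is an assumption of this example, not OSCAR3's method. It covers only phrasings such as "m.p. 123-125 °C" and "mp 98 °C".

```python
import re

# Assumed phrasings only: "m.p. 123-125 °C", "mp 98 °C". Real text varies far more.
MP_PATTERN = re.compile(
    r"\bm\.?p\.?\s*"                        # "mp", "m.p." or "m.p"
    r"(?P<low>\d+(?:\.\d+)?)"               # lower (or only) temperature
    r"(?:\s*-\s*(?P<high>\d+(?:\.\d+)?))?"  # optional upper end of a range
    r"\s*°?\s*C\b"                          # Celsius, degree sign optional
)


def extract_melting_points(text):
    """Return (low, high) melting-point tuples in °C found in text."""
    results = []
    for match in MP_PATTERN.finditer(text):
        low = float(match.group("low"))
        high = float(match.group("high")) if match.group("high") else low
        results.append((low, high))
    return results


sample = ("The product was obtained as white needles, m.p. 123-125 °C. "
          "A second crop had mp 98 °C.")
print(extract_melting_points(sample))  # [(123.0, 125.0), (98.0, 98.0)]
```

A production extractor would also need to handle en dashes, Fahrenheit, "lit." comparisons, and decomposition notes. This is why dedicated tools rely on richer grammars and chemical dictionaries rather than one pattern.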

45
OSCAR at Work in the Future
46
Semantic Scholars Grid I
[Diagram: Local MDStore; Local HarvestStore; Gatherer (query and get list, fetch metadata (MD) and documents); Indexer (index all local MD); Analyzer (run filters such as OSCAR on harvested MD and documents, store new MD)]
47
Semantic Scholars Grid II
[Diagram: Local MDStore, Plug-in, Updater, Community Tools, and SSG Viewer components; labeled functions: synchronize SSG and foreign MD; update local MD; control foreign interactions; view all MD; access community tools (e.g., Instant Citation Index); update and view foreign MD]
48
Chemical Datamining Software
  • SureChem
      • http://surechem.reeltwo.com/
  • CLiDE
      • Recognizes structures, reactions, and text
      • http://www.simbiosys.ca/clide/
  • OSCAR
      • OSCAR1 to check experimental data
      • http://www.ch.cam.ac.uk/magnus/checker.html
      • http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
  • CSR (Chemical Structure Reconstruction)
      • http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
  • MDL DocSearch: combines MDL's Isentris platform
    and EMC's Documentum

50
ChemDB: http://cdb.ics.uci.edu/CHEM/Web/
51
ChEBI, Chemical Entities of Biological Interest
  • Dictionary of molecular entities focused on small
    chemical compounds
  • Features an ontological classification, showing
    the relationships between molecular entities or
    classes of entities and their parents and/or
    children

52
Vioxx Entry in ChEBI
53
The IUPAC International Chemical Identifier
(InChI)
  • Open source, non-proprietary, public-domain
    identifier for chemicals
  • String of characters that uniquely represents a
    molecular substance
  • Independent of the way the chemical structure is
    drawn
  • Enables reliable structure recognition and easy
    linking of diverse data compilations
  • Accepts MOLfiles (or SDfiles) and CML files as
    input
  • Download the program to your computer at
      • http://www.iupac.org/inchi/license.html
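
Because a structure has a single standard InChI regardless of how it is drawn, the identifier can serve as a join key between compilations. The Python sketch below assumes InChI strings have already been generated (for example with the IUPAC program). It shows only the linking step and how to read the formula layer. The ethanol and methanol strings are their standard InChIs, and the two small "compilations" are illustrative.

```python
def formula_layer(inchi):
    """Return the molecular-formula layer of a standard InChI string."""
    prefix, _, rest = inchi.partition("=")
    if prefix != "InChI":
        raise ValueError(f"not an InChI string: {inchi!r}")
    # Layers are '/'-separated; the first is the version (e.g. "1S").
    return rest.split("/")[1]


def link_by_inchi(*sources):
    """Merge records from several compilations that share the same InChI."""
    merged = {}
    for source in sources:
        for inchi, record in source.items():
            merged.setdefault(inchi, {}).update(record)
    return merged


ETHANOL = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
METHANOL = "InChI=1S/CH4O/c1-2/h2H,1H3"

# Two illustrative compilations keyed by InChI instead of by name.
names = {ETHANOL: {"name": "ethanol"}, METHANOL: {"name": "methanol"}}
boiling_points = {ETHANOL: {"bp_C": 78.4}}

for inchi, record in link_by_inchi(names, boiling_points).items():
    print(formula_layer(inchi), record)
```

In practice the fixed-length InChIKey hash is often used as the database key instead of the full string. The join logic stays the same.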

54
Generation of InChI for Vioxx with wInChI
55
Vioxx Entry in PubChem Compounds Found with InChI
56
Vioxx Bioassay Data in PubChem
57
Vioxx PubChem Link to External Sources of
Information
58
PubChem Link to Elsevier MDL
  • DiscoveryGate (www.discoverygate.com)
      • provides access to integrated scientific content
        from databases, journal articles, patent
        publications and reference works
      • information providers include Elsevier,
        Thomson-Derwent, FIZ CHEMIE, the U.S. FDA, Prous
        Science and Thieme
  • MDL Compound Index (the master list of substances
    included in DiscoveryGate data sources) now
    exceeds 14 million unique chemical structures
    with the addition of 5 million chemical
    structures from the PubChem database.

59
The Elsevier MDL/NIH Link via PubChem and
DiscoveryGate
  • Cross-indexes PubChem to the Compound Index
    hosted on Elsevier MDL's DiscoveryGate platform
  • MDL added 5 million structures from PubChem to
    their index, resulting in over 14 million unique
    chemical structures
  • Links go both ways
      • Can move from biological data in PubChem to
        bioactivity, chemical sourcing, synthetic
        methodology, and EHS data in DiscoveryGate
        sources

60
Elsevier MDL's xPharm
  • Comprehensive set of records linking
      • Agents (compounds) (2300)
      • Targets (600)
      • Disorders (450)
      • Principles that govern their interactions (180)
  • Answers questions such as
      • What targets are associated with control of blood
        pressure?
      • What adverse effects are associated with
        monoamine oxidase inhibitors?

61
Web Guide for Essential Cheminformatics Resources
  • http://www.chembiogrid.org
  • http://www.indiana.edu/~cheminfo/cicc/

62
ChemBioGrid Chemical Databases
64
Web Services Overview
  • What are Web Services?
      • A distributed invocation system built on Grid
        computing
      • Independent of platform and programming language
      • Built on existing Web standards
      • A service-oriented architecture with
          • Interfaces based on Internet protocols
          • Messages in XML (except for binary data
            attachments)
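
To make "messages in XML" concrete, the Python sketch below builds a SOAP 1.1 request envelope with the standard library. The envelope namespace is the one defined by SOAP 1.1. The service namespace, the getCompound operation, and its inchi parameter are hypothetical and do not describe any IU service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem-service"  # hypothetical service namespace


def build_soap_request(operation, params):
    """Wrap one operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical call: ask a compound service about methanol by its InChI.
request = build_soap_request("getCompound", {"inchi": "InChI=1S/CH4O/c1-2/h2H,1H3"})
print(request)
```

A real client would POST this body over HTTP, using the endpoint and SOAPAction value published in the service's WSDL, and parse the response envelope the same way.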

65
Web Services for Chemistry: Problems
  • Performance and scalability
  • Proprietary data
  • Competition from high-performance desktop
    applications
  • -- Geoff Hutchison, its a puzzle blog,
    2005-01-05
  • ALSO
  • Lack of a substantial body of trustworthy Open
    Access databases
  • Non-standard chemical data formats (over 40 in
    regular use and requiring normalization to one
    another)

66
DM Internet Toolbox Architecture
68
Indiana University Planned Projects
(http://www.chembiogrid.org)
  • Application of a Grid-based distributed data
    architecture to chemistry
  • Development of tools for HTS data analysis and
    virtual screening
  • Database for quantum mechanical simulation data
  • Chemical prototype projects
      • Novel routes to enzymatic reaction mechanisms
      • Mechanism-based drug design
      • Data-inquiry-based development of new methods in
        natural product synthesis

69
Web Services for Chemistry at IU
70
NCI Developmental Therapeutics Program (DTP)
  • Downloadable data
      • In vitro 60 cell line results
      • In vitro anti-HIV results
      • Yeast assay
      • 200,000 chemical structures
      • Molecular targets
      • Microarray data
  • Or search the database at
      • http://dtp.nci.nih.gov/docs/dtp_search.html

71
IU Database of NIH DTP Data
  • Contains over 200,000 chemical structures tested
    in 60 cellular assays from different human tumor
    cell lines
  • Also includes microarray assay profiles for the
    untreated cell lines (14,000 datapoints)
  • The data are held in a local PostgreSQL database
    that is exposed as a web service
  • Using workflows and complex SQL queries, we can
    do advanced data mining that exploits the
    chemical, biological, and genomic information for
    particular audiences (chemists, biologists, etc.)
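
The kind of SQL query such a service might run can be sketched with Python's built-in sqlite3 module standing in for PostgreSQL. The two-table schema, the log_gi50 potency column, and all rows are assumptions for illustration. The real IU schema is not given on this slide.

```python
import sqlite3

# In-memory SQLite stands in for the IU PostgreSQL database. The schema,
# column names, and rows are hypothetical, chosen only to illustrate a query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (cid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE assay_result (
    cid INTEGER REFERENCES compound(cid),
    cell_line TEXT,
    log_gi50 REAL  -- assumed potency measure: lower means more potent
);
INSERT INTO compound VALUES (1, 'cmpd-A'), (2, 'cmpd-B');
INSERT INTO assay_result VALUES
    (1, 'line-1', -7.2), (1, 'line-2', -6.8),
    (2, 'line-1', -4.1), (2, 'line-2', -4.5);
""")

# Compounds whose mean potency across tested cell lines beats a cutoff,
# most potent first.
QUERY = """
SELECT c.name, COUNT(*) AS n_lines, AVG(r.log_gi50) AS mean_log_gi50
FROM compound AS c
JOIN assay_result AS r ON r.cid = c.cid
GROUP BY c.cid, c.name
HAVING AVG(r.log_gi50) < ?
ORDER BY mean_log_gi50
"""
rows = conn.execute(QUERY, (-6.0,)).fetchall()
for name, n_lines, mean_value in rows:
    print(f"{name}: {n_lines} cell lines, mean log GI50 {mean_value:.2f}")
```

Passing the cutoff as a bound parameter (the ? placeholder) rather than formatting it into the SQL string is the safer pattern once queries arrive from a web-service client.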

72
Mining the NIH DTP database
  • 14,000 gene expression values
  • 60 cell lines
      • Cell lines can be clustered based on gene
        expression similarity
  • 200,000 compounds
      • Compounds can be clustered based on similarity of
        their profiles across cell lines, or by chemical
        structure fingerprint similarity
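
Clustering compounds by the similarity of their profiles across cell lines can be sketched with Pearson correlation and single-linkage grouping. Two compounds are linked when their profiles correlate at or above a threshold, and clusters are the connected groups. This is one simple choice among many. The profiles below are hypothetical and span four cell lines rather than 60.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles, threshold):
    """Single-linkage clusters: connected groups of compounds whose
    profiles correlate at or above threshold (union-find)."""
    parent = {name: name for name in profiles}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    for a, b in combinations(profiles, 2):
        if pearson(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical activity profiles across four cell lines.
profiles = {
    "cmpd-A": [1.0, 2.0, 3.0, 4.0],
    "cmpd-B": [1.1, 2.1, 2.9, 4.2],
    "cmpd-C": [4.0, 3.0, 2.0, 1.0],
}
print(cluster_profiles(profiles, 0.9))  # [['cmpd-A', 'cmpd-B'], ['cmpd-C']]
```

The same routine clusters cell lines by gene expression if each profile is a cell line's expression vector. At the scale of 200,000 compounds, the all-pairs loop would have to give way to indexed or approximate methods.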
73
Use of Taverna at IU
  • A protein implicated in tumor growth is supplied
    to the docking program (in this case HSP90 taken
    from the PDB 1Y4 complex)
  • A 2D structure is supplied for input into the
    similarity search (in this case, the extracted
    bound ligand from the PDB 1Y4 complex)
  • The workflow employs our local NIH DTP database
    service to search 200,000 compounds tested in
    human tumor cellular assays for structures
    similar to the ligand.
  • Client portlets are used to browse these
    structures.
  • Similar structures are filtered for drugability
    and automatically passed to the OpenEye FRED
    docking program for docking into the target
    protein.
  • Once docking is complete, the user visualizes the
    high-scoring docked structures in a portlet using
    the Jmol applet.
  • Correlation of docking results and biological
    fingerprints across the human tumor cell lines
    can help identify potential mechanisms of action
    of DTP compounds.
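
The slide does not say which fingerprint, similarity measure, or drugability filter the workflow uses. The Python sketch below assumes Tanimoto similarity on fingerprints stored as sets of on-bit indices. It uses Lipinski's rule of five as the stand-in drug-likeness filter, allowing at most one violation. Fingerprints and property values are hypothetical, and the properties are taken as precomputed.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def rule_of_five_violations(props):
    """Count Lipinski rule-of-five violations from precomputed properties."""
    return sum([
        props["mw"] > 500,   # molecular weight (Da)
        props["logp"] > 5,   # octanol-water partition coefficient
        props["hbd"] > 5,    # hydrogen-bond donors
        props["hba"] > 10,   # hydrogen-bond acceptors
    ])


def similar_druglike(query_fp, library, cutoff):
    """Names of library compounds at or above the similarity cutoff with at
    most one rule-of-five violation, most similar first."""
    hits = []
    for name, (fp, props) in library.items():
        score = tanimoto(query_fp, fp)
        if score >= cutoff and rule_of_five_violations(props) <= 1:
            hits.append((score, name))
    return [name for _, name in sorted(hits, reverse=True)]


# Hypothetical fingerprints (sets of on bits) and precomputed properties.
query = {1, 2, 3, 4}
library = {
    "cmpd-A": ({1, 2, 3, 4, 5}, {"mw": 320, "logp": 2.1, "hbd": 2, "hba": 5}),
    "cmpd-B": ({1, 2, 3}, {"mw": 710, "logp": 6.3, "hbd": 1, "hba": 4}),
    "cmpd-C": ({1, 2, 7, 8}, {"mw": 250, "logp": 1.0, "hbd": 1, "hba": 3}),
}
print(similar_druglike(query, library, 0.7))  # ['cmpd-A']
```

Here cmpd-B is similar enough (0.75) but is dropped for two violations, and cmpd-C passes the filter but falls below the similarity cutoff. Only the survivors would go on to docking.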

74
Taverna Workflow
[Screenshot panels: workflow definition; available web services (WSDL); visual depiction of workflow]
75
Taverna in Action
76
CGL Contributions to CICC
  • Build Web/Grid services for connecting
      • Data sources
      • Applications (simulation, data mining, data
        assimilation, imaging, etc.)
      • Computing resources
      • Information services
  • Third-party tool evaluation
      • Workflow (Taverna)
      • Grid tools: Globus and Condor (for interacting
        with TeraGrid)
  • Building standards-based Web portal environments
      • OGCE grid portal project
      • JSR 168 Java standards
  • This activity will begin in earnest over the
    summer.

77
Digital Chemistry (BCI) Clustering Service Methods
78
Local Web Service Methods for WWMM of PMRs Group
79
More Services
80
ToxTree
  • An in silico toxicology prediction suite
  • Based on the CDK toolkit
  • Built on CML
  • Released as OpenSource under the GPL
  • Standalone PC software
  • User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf

81
ToxTree Service
  • An open-source Java application by Nina
    Jeliazkova
  • Estimates toxic hazard by applying a decision
    tree approach
  • Encodes the Cramer scheme
      • (Cramer, G. M.; Ford, R. A.; Hall, R. L.
        Estimation of Toxic Hazard - A Decision Tree
        Approach. Food Cosmet. Toxicol. 1978, 16,
        255-276, Pergamon Press)
  • Can be applied to datasets in various compatible
    file formats
  • We are converting this GUI application to a
    text-based web service
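
A decision-tree classifier of this kind is a chain of yes/no questions ending in a class. The Cramer scheme assigns compounds to Class I, II, or III (low, intermediate, or high toxic concern). The Python sketch below shows only the traversal mechanics. Its two questions and the 300 Da threshold are placeholders, not Cramer's actual rules or ToxTree's code.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    question: str                  # human-readable text, kept for the audit trail
    test: Callable[[dict], bool]   # answers the question for one compound
    yes: "Tree"
    no: "Tree"


Tree = Union[Node, Leaf]


def classify(tree, compound):
    """Walk from the root to a leaf, recording each question and answer."""
    path = []
    while isinstance(tree, Node):
        answer = tree.test(compound)
        path.append((tree.question, answer))
        tree = tree.yes if answer else tree.no
    return tree.label, path


# Placeholder questions only: these are NOT the real Cramer scheme questions.
toy_tree = Node(
    question="Contains a (hypothetical) structural alert?",
    test=lambda c: c["has_alert"],
    yes=Leaf("Class III"),
    no=Node(
        question="Molecular weight below a (hypothetical) 300 Da cutoff?",
        test=lambda c: c["mw"] < 300,
        yes=Leaf("Class I"),
        no=Leaf("Class II"),
    ),
)

label, path = classify(toy_tree, {"has_alert": False, "mw": 180})
print(label)  # Class I
for question, answer in path:
    print(f"{question} -> {'yes' if answer else 'no'}")
```

Returning the path alongside the class lets a text-based web service report why a compound was placed where it was, not just the final class.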

83
Chemoinformatics Education at IU
  • School of Informatics degree programs
      • BS, MS, PhD
  • Programs offered at both the Indianapolis (IUPUI)
    and Bloomington (IUB) campuses

84
Other Educational Activities
  • Graduate Certificate Program in Chemical
    Informatics (4 courses by Distance Education)
      • I571 Chemical Information Technology (3 cr.)
      • I572 Computational Chemistry and Molecular
        Modeling (3 cr.)
      • I573 Programming Techniques for Chemical and Life
        Science Informatics (3 cr.)
      • I553 Independent Study in Chemical Informatics
        (3 cr.)
  • I571 as a CIC CourseShare offering with Michigan
  • Experiments with teleconferencing as a distance
    education tool

85
PhD in Informatics
  • Began in August 2005
  • Tracks
      • bioinformatics
      • chemical informatics
      • health informatics
      • human-computer interaction design
      • social and organizational informatics
  • Under development
      • complex systems, networks, modeling and
        simulation
      • cybersecurity
      • discovery and application of information
      • logical and mathematical foundations
      • music informatics

86
Graduate Enrollment Chemo-, Laboratory, Bio-,
Health Informatics
87
Software/DBs Used in the Program
  (Company: Products and/or Target Area)
  • ArgusLab (Molecular modeling)
  • Digital Chemistry: Toolkit (Clustering)
  • Cambridge Crystallographic Data Centre: Cambridge
    Structural Database, GOLD
  • CambridgeSoft: ChemDraw Ultra
  • Chemical Abstracts Service: SciFinder Scholar
  • ChemAxon: Marvin (and other software)
  • Daylight Chemical Information Systems: Toolkit
  • FIZ Karlsruhe: Inorganic Crystal Structure Database
  • IO-Informatics: Sentient
  • MDL: CrossFire Beilstein and Gmelin
  • OpenEye: Toolkit (and other software)
  • Sage Informatics: ChemTK
  • Serena Software: PCMODEL
  • Spotfire: DecisionSite
  • STN International: STN Express with Discover
    (Analytical Edition)
  • Wavefunction: Spartan

88
Closing quote
  • The future of chemistry depends on the
    automated analysis of chemical knowledge,
    combining disparate data sources in a single
    resource, . . . which can be analysed using
    computational techniques to assess and build on
    these data.
  • Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

89
We all need help when overloaded!
90
Bibliography
  • Agresti, William W. Discovery informatics.
    Communications of the ACM 2003, 46(8), 25-28.
  • Bajcsy, Peter; Han, Jiawei; Liu, Lei; Yang,
    Jiong. "Survey of bio-data analysis from a data
    mining perspective." Chapter 2 in Wang, Jason T.
    L.; Zaki, Mohammed J.; Toivonen, Hannu T. T.;
    Shasha, Dennis (eds.), Data Mining in
    Bioinformatics. London, Springer Verlag, 2005,
    pp. 9-39.
  • Banville, Debra L. Mining chemical structural
    information from the drug literature. Drug
    Discovery Today 2006, 11(1/2), 35-42.
  • Cios, Krzysztof J.; Kurgan, Lukasz A. Trends in
    data mining and knowledge discovery. Chapter 1
    in Pal, N. R.; Jain, L. C.; Teodoresku, N. (eds.),
    Knowledge Discovery in Advanced Information
    Systems. N.Y., Springer Verlag, 2002, pp. 1-26.
  • Cohen, Aaron M.; Hersh, William R. "A survey of
    current work in biomedical text mining."
    Briefings in Bioinformatics 2005, 6(1), 57-71.
  • Corbett, Peter T.; Murray-Rust, Peter; Day, Nick
    E.; Townsend, Joe A.; Rzepa, Henry S.
    Chemistry publications in CML. Abstracts of
    Papers, 231st ACS National Meeting, Atlanta, GA,
    United States, March 26-30, 2006, CINF-055.

91
Bibliography
  • Fayyad, U. M.; Piatetsky-Shapiro, G.; Smyth, P.;
    Uthurusamy, R. Advances in Knowledge Discovery
    and Data Mining. AAAI/MIT Press, 1996. (Quoted by
    Cios and Kurgan.)
  • Gardner, Stephen P. Ontologies and semantic data
    integration. Drug Discovery Today 2005, 10(14),
    1001-1007.
  • Guha, R.; Howard, M. T.; Hutchison, G. R.;
    Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner,
    J.; Willighagen, E. L. The Blue Obelisk:
    Interoperability in chemical informatics. Journal
    of Chemical Information and Modeling 2006, Web
    Release Date 22-Feb-2006, DOI 10.1021/ci050400b.
  • Holliday, Gemma L.; Murray-Rust, Peter; Rzepa,
    Henry S. Chemical Markup, XML, and the World
    Wide Web. 6. CMLReact, an XML Vocabulary for
    Chemical Reactions. Journal of Chemical
    Information and Modeling 2006, 46(1), 145-157.
  • Jónsdóttir, S. O.; Jorgensen, F. S.; Brunak, S.
    Prediction methods and databases within
    chemoinformatics: emphasis on drugs and drug
    candidates. Bioinformatics 2005, 21(10),
    2145-2160.

92
Bibliography
  • Karthikeyan, M.; Krishnan, S.; Pankey, Anil
    Kumar. Harvesting chemical information from the
    Internet using a distributed approach:
    ChemXtreme. Journal of Chemical Information and
    Modeling. DOI 10.1021/ci050329.
  • Krallinger, Martin; Alonso-Allende Erhardt,
    Ramon; Valencia, Alfonso. Text-mining approaches
    in molecular biology and biomedicine. Drug
    Discovery Today 2005, 10(6), 439-445.
  • Scherf, Uwe; Ross, Douglas T.; Waltham, Mark;
    Smith, Lawrence H.; Lee, Jae K.; Tanabe, Lorraine;
    Kohn, Kurt W.; Reinhold, William C.; Myers,
    Timothy G.; Andrews, Darren T.; Scudiero, Dominic
    A.; Eisen, Michael B.; Sausville, Edward A.;
    Pommier, Yves; Botstein, David; Brown, Patrick
    O.; Weinstein, John N. A gene expression database
    for the molecular pharmacology of cancer. Nature
    Genetics 2000, 24, 236-244.
  • Schubert, Ulrich S. "Materials informatics: from
    data to knowledge towards integrated e-science
    approaches." QSAR & Combinatorial Science 2005,
    24(1), 5. (NB: the entire issue is devoted to
    this topic.)
  • SIAM International Conference on Data Mining
    (5th, 2005, Newport Beach, CA). Data Mining
    Proceedings. Kargupta, Hillol et al., eds. SIAM,
    2005.
  • Torr-Brown, Sheryl. Advances in knowledge
    management for pharmaceutical research and
    development. Current Opinion in Drug Discovery &
    Development 2005, 8(3), 316-322.

93
Web 2.0
  • Social Software allows group interactions
  • Enables groups to form and organize themselves
  • Examples
      • Wikis
      • Blogs
      • RSS (now found on chemistry.org)
      • Podcasting/Coursecasting
      • Webcasting/Webinars
      • Flickr
      • Jybe
      • FURL

94
FURL (Frame Uniform Resource Locator)
  • For archiving and sharing of web pages
  • Furlers can capture pages for a discussion
    group
  • Tracks useful pages for a discussion
  • http://www.furl.net/home.jsp

95
Jybe (Join Your Browser with Everyone)
  • Collaboration and communication in real time with
    IE and Firefox
  • Screen-sharing AND editing
  • Privacy-protected: participants must be invited
  • Upload documents to convert to HTML
  • http://www.jybe.com