Mopping up the Flood of Data with Web Services
  • Gary Wiggins
  • Indiana University
  • School of Informatics

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Data Mining and Knowledge Discovery (DMKD)
  • Techniques began to be used around 1989
  • Rapid growth in the mid 1990s, with DMKD field
    emerging around 1995
  • Built on DM tools such as Machine Learning

Data Mining
  • One of the steps in Knowledge Discovery
  • Concerned with the actual extraction of knowledge
    from data
  • Efficient and scalable methods for mining
    interesting patterns and knowledge and
    discovering hidden facts contained in large

Data Mining Techniques
  • Efficient classification methods
  • Clustering
  • Outlier analysis
  • Frequent, sequential, and structured pattern
  • Visualization and spatial/temporal analysis tools

Knowledge Discovery (KD)
  • KD is a nontrivial process of identifying valid,
    novel, potentially useful, and ultimately
    understandable patterns from large collections of
    data
  • --Fayyad et al., as quoted by Cios and Kurgan
  • The KD process involves
  • Understanding and preparation of the data
  • Data Mining (DM)
  • Verification and application of the discovered
    knowledge

Framework for KD Process
  • Steps range from very few, e.g.,
  • Data collection and understanding
  • Data mining
  • Implementation
  • To multi-step models, e.g., Cios and Kurgan's
    six-step DMKD process model

Cios and Kurgan's Six-Step DMKD Process Model
  • Understanding the problem domain
  • Understanding the data
  • Preparation of the data
  • 50% or more of the effort is spent on this step
  • Data mining
  • Evaluation of the discovered knowledge
  • Using the discovered knowledge

General Data Mining/Data Analysis Systems
  • SAS Enterprise Miner
  • SPSS
  • Insightful S-Plus
  • IBM DB2 Intelligent Miner
  • Microsoft SQL Server 2005
  • SGI MLC++ and MineSet Tree Visualizer
  • Inxight VizServer

Trends: Major Conferences
  • Knowledge Discovery and Data Mining (KDD) 2005
  • International Conference on Machine Learning
    (ICML) 2006
  • SIAM Conference on Data Mining 2006

12th Annual SIGKDD International Conference on
Knowledge Discovery and Data Mining,
Philadelphia, August 20-23, 2006
  • Areas of Interest on the Research Track
  • Applications of data mining (biomedicine,
    business, e-commerce, defense)
  • Data and result visualization
  • Data warehousing
  • Data mining for community generation, social
    network analysis and graph-structured data
  • Foundations of data mining
  • Interactive and online data mining
  • KDD framework and process
  • Mining data streams
  • Mining high-dimensional data
  • Mining sensor data
  • Mining text and semi-structured data
  • Mining multi-media data
  • Novel data mining algorithms
  • Privacy and data mining
  • Robust and scalable statistical methods
  • Pre-processing and post-processing for data
    mining
  • Security issues
  • Spatial and temporal data mining

Trends in DMKD
  • OLAP (On-Line Analytical Processing)
  • Data warehousing
  • Association rules
  • High Performance DMKD systems
  • Visualization techniques
  • Applications of DM
  • More recently
  • Database products that incorporate DM tools
  • New developments in design and implementation of
    the DMKD process
  • Information visualization products as end-user
  • XML

XML: the Key to DM and KD?
  • Or simply a data exchange protocol?
  • Allows for the description and storage of
    structured or semi-structured data and their
  • Can be used to exchange data in a
    platform-independent way
  • BUT: only one paper at the major conferences
    listed earlier dealt with XML
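
The platform-independent exchange described on this slide can be illustrated with a minimal sketch. It assumes a made-up compound record (the `compound`, `name`, and `formula` names are illustrative, not from any DM tool) and uses only Python's standard library to write the record as XML and read it back.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative only.
record = {"id": "cmpd-001", "name": "ethanol", "formula": "C2H6O"}


def to_xml(rec):
    """Serialize a flat dict into a small XML document (returned as text)."""
    root = ET.Element("compound", attrib={"id": rec["id"]})
    for key in ("name", "formula"):
        ET.SubElement(root, key).text = rec[key]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML text back into the same flat dict."""
    root = ET.fromstring(text)
    rec = {"id": root.get("id")}
    for child in root:
        rec[child.tag] = child.text
    return rec


xml_text = to_xml(record)
print(xml_text)
print(from_xml(xml_text) == record)
```

Any tool that can parse XML, on any platform, could read `xml_text`; that is the sense in which XML serves as an exchange format rather than as a mining method in itself.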

XML helps
  • Standardize communication between diverse DM
    tools and databases (I/O procedures)
  • Build standard data repositories sharing data
    between different DM tools that work on different
    software platforms
  • Implement communication protocols between DM
  • Provide a framework for integration of and
    communication between different DMKD steps

Predictive Model Markup Language (PMML) and Other
  • In conjunction with XML, PMML enables the
    automation of sharing of discovered knowledge
    between different domains and tools
  • SOAP (Simple Object Access Protocol)
  • UDDI
  • OLAP

Discovery Informatics Definition
  • "Discovery Informatics is the study and practice
    of employing the full spectrum of computing and
    analytical science and technology to the singular
    pursuit of discovering new information by
    identifying and validating patterns in data."
    --William W. Agresti in 2003

Discovery Informatics
  • Discovery and Application of Information
  • Data Mining and Machine Learning are two aspects
    of Discovery Informatics.

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Trends: Bioinformatics Conferences
  • International Conference on Intelligent Systems
    for Molecular Biology (ISMB) 2006
  • Research in Computational Molecular Biology
    (RECOMB) 2006
  • Pacific Symposium on Biocomputing (PSB) 2006

Main Areas of Research in Bioinformatics
  • Sequence alignment
  • Alternative splicing
  • Microarray analysis
  • Functional analysis
  • Analysis of single nucleotide polymorphisms
  • Natural language text analysis

DMKD Sessions at Major Bioinformatics Conferences
  • Databases and Data Integration
  • Text Mining and Information Extraction
  • Semantic Webs

Data Mining in Bioinformatics (Bajcsy)
  • Data cleaning, data preprocessing, and semantic
    integration of heterogeneous, distributed
    biomedical databases
  • Existing data mining tools for biodata analysis
  • Development of advanced, effective, and scalable
    data mining methods in biodata analysis

Preprocessing of Biodata
  • Integration of multiple microarray gene
    experiments must resolve inconsistent labels of
    genes to form a coherent data store.
  • Focus on quantitative quality metrics based on
    analytical and statistical data descriptors and
    on relationships among variables.

Semantic Integration of Heterogeneous Biomedical
  • Combine multiple sources into a coherent data
  • Find semantically equivalent real-world entities
    from several biomedical sources
  • Problems
  • Different labels for the same concept: gene_id
    vs. g_id
  • Time asynchronization: same gene analyzed at
    multiple development stages
Approaches for Semantic Integration of Biodata
  • Construction of integrated biodata warehouses or
  • Construction of a federation of heterogeneous
    distributed biodatabases
  • Must build up mapping rules or semantic ambiguity
    resolution rules across multiple databases
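
A minimal sketch of what such mapping rules might look like, assuming two made-up sources that label the same concept differently (`gene_id` vs. `g_id`). The source names, field names, and values are hypothetical and only illustrate the idea of resolving labels before merging.

```python
# Hypothetical mapping rules: each source's field name -> shared field name.
FIELD_MAP = {
    "source_a": {"gene_id": "gene_id", "expr": "expression"},
    "source_b": {"g_id": "gene_id", "level": "expression"},
}


def normalize(source, row):
    """Rename a row's fields to the shared vocabulary, dropping unmapped ones."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in row.items() if key in mapping}


def integrate(batches):
    """Group normalized rows from all sources under their shared gene_id."""
    store = {}
    for source, rows in batches.items():
        for row in rows:
            rec = normalize(source, row)
            store.setdefault(rec["gene_id"], []).append((source, rec["expression"]))
    return store


# Toy rows, invented for the example.
batches = {
    "source_a": [{"gene_id": "BRCA1", "expr": 2.5}],
    "source_b": [{"g_id": "BRCA1", "level": 2.7}, {"g_id": "TP53", "level": 1.1}],
}
merged = integrate(batches)
print(merged)
```

The same table of rules could back either a warehouse load or a federated query layer; the sketch does not address harder cases such as the same gene measured at different development stages.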

Existing Data Mining Tools for Biodata Analysis-I
  • Sequence Analysis, e.g.,
    TRANSFAC, MDScan, Vector NTI, Sequencher,
  • Structure Prediction and Visualization, e.g.,
  • RasMol, Raster3D, Swiss-Model, Scope, MolScript,

Existing Data Mining Tools for Biodata Analysis-II
  • Genome Analysis, e.g.,
  • CAP3, Paracel GenomeAssembler, GenomeScan,
    GeneMark, GenScan, X-Grail, ORF Finder,
  • Pathway Analysis and Visualization, e.g.,
  • KEGG, EcoCyc/MetaCyc, GenMapp
  • Microarray Analysis, e.g.,
  • ScanAlyze/Cluster/TreeView, Scanalytics
    MicroArray Suite, Profiler, Silicon Genetics

Biospecific Data Analysis Software Systems
  • Agilent GeneSpring
  • Spotfire
  • Invitrogen VectorNTI

Text Mining in Bioinformatics
  • Techniques have progressed from simple
    recognition of terms to extraction of interaction
    relationships in complex sentences.
  • Search objectives have broadened to a range of
    problems, e.g.,
  • Improving homology search
  • Identifying cellular location
  • Deriving genetic network technologies

Current Work in Biomedical Text Mining (Cohen and Hersh)
  • Text mining operates at a finer level of
    granularity than information retrieval and text
  • TM examines relationships between specific kinds
    of information contained within and between
  • Areas of active research
  • Named entity recognition (genes, proteins, etc.)
  • Text classification
  • Synonym and abbreviation extraction
  • Relationship extraction
  • Hypothesis generation
  • Integrated frameworks
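
As a rough illustration of one item on this list, abbreviation extraction, here is a deliberately naive sketch. It assumes the pattern "long form (ABBR)" and accepts a pair only when the initials of the preceding words spell the abbreviation; real systems surveyed in the literature are far more robust, and this heuristic is not taken from any of them.

```python
import re

# Matches an all-caps abbreviation in parentheses, e.g. "(HSP)".
ABBR = re.compile(r"\(([A-Z]{2,10})\)")


def extract_abbreviations(text):
    """Return {abbreviation: long form} for simple 'long form (ABBR)' patterns."""
    pairs = {}
    for match in ABBR.finditer(text):
        abbr = match.group(1)
        # Words that appear before the opening parenthesis.
        words = re.findall(r"[A-Za-z][A-Za-z-]*", text[:match.start()])
        candidate = words[-len(abbr):]
        # Accept only if there is one word per letter and the initials agree.
        if len(candidate) == len(abbr) and all(
            word[0].upper() == letter for word, letter in zip(candidate, abbr)
        ):
            pairs[abbr] = " ".join(candidate)
    return pairs


sample = (
    "The heat shock protein (HSP) binds ATP. "
    "Named entity recognition (NER) is one active research area."
)
found = extract_abbreviations(sample)
print(found)
```

The heuristic misses plurals such as "SNPs" and abbreviations that skip words, which is part of why synonym and abbreviation extraction remains an active research area.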

Systems Biology
  • Requires a shift in focus from genes and proteins
    to the system's structure and dynamics
  • Four key properties
  • System structures
  • System dynamics
  • Control method
  • Design method
  • Systems Biology Markup Language (SBML) and CellML

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Data Mining in Chemistry
  • Modern experimentation (whether classical or
    high-throughput) should be based on the
    productive interplay of statistical techniques
    (design-of-experiments), molecular modeling as
    well as cheminformatics.
  • --Ulrich S. Schubert

Session on Integration of Informatics and
Knowledge Management Informatics
  • Integration of Informatics at the Systems Level
    and at the Data Level -- Chris L. Waller, Ph.D.,
    Director, World Wide Chemistry Informatics,
    Pfizer Global Research & Development
  • Integrated Knowledge Management at Bayer
    HealthCare Pharmacophore Informatics -- William J.
    Scott, Ph.D., Team Leader, Department for
    Chemistry Research, Bayer Pharmaceuticals
  • Building a Knowledge Enabled Organization -- Cory R.
    Brouwer, Ph.D., Associate Director, Knowledge
    Management Informatics, Pfizer Global Research
  • Knowledge Management: Building a Knowledge
    Enabled Organization -- Victor Lobanov, Ph.D.,
    Principal Scientist, MDI, Johnson & Johnson
    Pharmaceutical R&D
  • 10th Annual Cheminformatics Conference, May
    23-16, 2006, Philadelphia

Impact of HTS and Combinatorial Chemistry Research
  • Most impact in
  • the pharmaceutical industry
  • medical research
  • catalyst research
  • More recently
  • polymer and materials research.

Diversity of Data Mining in Chemistry
  • On 5/7/2006 there were 4072 references to either
    datamining or data mining in Chemical
  • 3416 different index terms were assigned to those
  • 2772 used 1-5 times (81%)
  • 298 used 6-10 times (9%)
  • 103 used 11-15 times (3%)
  • 71 used 16-20 times (2%)
  • 38 used 21-25 times (1%)
  • 24 used 26-30 times (1%)
  • 110 used 31-480 times (3%)
  • Most frequent co-term: bioinformatics, with 480
    hits or 12% of the occurrences
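
The figures in parentheses on this slide are shares of the 3416 index terms, and the final figure is a share of the 4072 references. A short sketch reproduces them from the counts given above; only the variable names are invented.

```python
# Counts of index terms by how often each term was used (from the slide).
bins = [
    ("1-5", 2772),
    ("6-10", 298),
    ("11-15", 103),
    ("16-20", 71),
    ("21-25", 38),
    ("26-30", 24),
    ("31-480", 110),
]

total_terms = sum(count for _, count in bins)

# Share of all index terms falling in each usage bin, rounded to whole percent.
shares = {label: round(100 * count / total_terms) for label, count in bins}

# Share of the 4072 references that also carry the co-term "bioinformatics".
co_term_share = round(100 * 480 / 4072)

print(total_terms)
print(shares)
print(co_term_share)
```

The bins sum to the stated total of 3416 terms, and more than four out of five terms were used five times or fewer, which is the "diversity" the slide title refers to.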

SFS graph

Components of the Semantic Web for Chemistry
  • XML: eXtensible Markup Language
  • RDF: Resource Description Framework
  • RSS: Rich Site Summary
  • Dublin Core allows metadata-based newsfeeds
  • OWL for ontologies
  • BPEL4WS for workflow and web services
  • Murray-Rust et al. Org. Biomol. Chem. 2004, 2,

Chemical Markup Language (CML)
  • Much of the semantics in a chemical article can
    be supported by CML
  • Molecules
  • Structures
  • Reactions and reaction schemes
  • Spectra (including annotations)
  • Physicochemical data
  • XML dictionaries and lexicons provide linguistic
    and semantic support for markup
  • Will lead to quicker authoring and higher quality
    of embedded structures and data through machine
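
To give a feel for how a molecule might be marked up and then read by a machine, here is a small sketch. It assumes a simplified CML-style fragment (the `molecule`, `atomArray`, `atom`, `bondArray`, and `bond` elements with `elementType` and `atomRefs2` attributes, in the `http://www.xml-cml.org/schema` namespace); treat the fragment as illustrative and check the CML schema before relying on any detail.

```python
import xml.etree.ElementTree as ET
from collections import Counter

CML_NS = "http://www.xml-cml.org/schema"  # assumed CML namespace

# Illustrative CML-style markup for water; not validated against the schema.
cml_text = f"""
<molecule xmlns="{CML_NS}" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>
""".strip()


def element_counts(text):
    """Count atoms per element symbol in the marked-up molecule."""
    root = ET.fromstring(text)
    return Counter(atom.get("elementType") for atom in root.iter(f"{{{CML_NS}}}atom"))


def bonds(text):
    """Return each bond as a pair of atom ids."""
    root = ET.fromstring(text)
    return [tuple(b.get("atomRefs2").split()) for b in root.iter(f"{{{CML_NS}}}bond")]


print(element_counts(cml_text))
print(bonds(cml_text))
```

Because the structure is explicit in the markup, a program can recover the composition and connectivity without any image or text recognition step.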

Key Factors in the Success of the Chemical
Semantic Web
  • Institutional Repositories services deployed and
    supported at an institutional level to offer
    dissemination management, stewardship, and where
    appropriate, long-term preservation of both the
    intellectual work created by an institutional
    community and the records of the intellectual and
    cultural life of the institutional community
  • Open Access Movement

Knowledge-Driven Bioinformatics Enhanced with
Text Mining (Banville)
  • In the pharmaceutical field, it is ideally the
    marriage of biological and chemical information
    that needs to be the ultimate focus of text data
    mining applications.
  • Problems
  • Lack of universal publication standards for
    identifying each unique chemical entity
  • Selective indexing policies of AI services
  • Need to understand how chemical structures link
    to biological processes

OSCAR3 Service
  • Open-source Java application under development by
    Peter Murray-Rust's group at Cambridge (not
    published yet)
  • Extracts chemical information from either a
    paragraph of experimental data or a full paper
    (e.g. melting points, infra-red and NMR data, and
    mass spectral information)
  • Produces an XML instance highlighting the
    chemical information with an Extensible
    Stylesheet Language (XSL) file
  • At IU, we are attaching a SOAP input/output engine
    for a web service based on OSCAR3.
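
The kind of extraction described here can be hinted at with a toy example. The sketch below is not OSCAR3's method; it assumes melting points are written as "mp 134-136 °C" or "m.p. 88.5 °C" and pulls them out with a single regular expression. The sample paragraph is invented.

```python
import re

# Hypothetical pattern: "mp" or "m.p.", a number, an optional range, then "°C".
MP = re.compile(
    r"\b(?:mp|m\.p\.)\s*(\d+(?:\.\d+)?)(?:\s*[-\u2013]\s*(\d+(?:\.\d+)?))?\s*°\s*C"
)


def melting_points(paragraph):
    """Return (low, high) melting points in °C; low == high for single values."""
    results = []
    for low, high in MP.findall(paragraph):
        low_value = float(low)
        results.append((low_value, float(high) if high else low_value))
    return results


sample = (
    "White solid, mp 134-136 °C; IR (KBr) 1715 cm-1. "
    "The second product had m.p. 88.5 °C."
)
points = melting_points(sample)
print(points)
```

A real extractor also has to cope with spectra, units, and the many ways authors abbreviate, which is why a dedicated tool and an XML output format are attractive.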

OSCAR at Work in the Future

Semantic Scholars Grid I
  • Local MD store
  • Local harvest store
  • Fetch MD and documents
  • Query and get list
  • Index all local MD
  • Run filter such as OSCAR on harvested MD and
    documents; store new MD

Semantic Scholars Grid II
  • Local MD store
  • Synchronize SSG and foreign MD
  • Instant Citation Index etc.
  • Update local MD; control foreign interactions;
    view all MD; access community tools
  • Update and view foreign MD

Chemical Datamining Software
  • SureChem
  • CLiDE
  • Recognizes structures, reactions, and text
  • OSCAR1 to check experimental data
  • CSR (Chemical Structure Reconstruction)
  • MDL DocSearch: combines MDL's Isentris platform
    and EMC's Documentum

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

ChemDB

ChEBI, Chemical Entities of Biological Interest
  • Dictionary of molecular entities focused on small
    chemical compounds
  • Features an ontological classification, showing
    the relationships between molecular entities or
    classes of entities and their parents and/or

Vioxx Entry in ChEBI

The IUPAC International Chemical Identifier (InChI)
  • Open source, non-proprietary, public-domain
    identifier for chemicals
  • String of characters that uniquely represents a
    molecular substance
  • Independent of the way the chemical structure is
    drawn
  • Enables reliable structure recognition and easy
    linking of diverse data compilations
  • Accepts as input MOLfiles (or SDfiles) and CML
  • Download the program to your computer at
  • http//
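
To make "string of characters" concrete, the sketch below splits an identifier into its slash-separated layers. It assumes the ethanol string `InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3` (the "1S" prefix marks a later, standard-InChI release than the software available in 2006) and handles only the simple case where a formula layer is present; the function name is made up.

```python
def split_inchi(inchi):
    """Split an InChI string into its version, formula, and prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {"version": version, "formula": formula}
    for layer in rest:
        # Each remaining layer starts with a one-letter tag, e.g. "c" or "h".
        layers[layer[0]] = layer[1:]
    return layers


ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
layers = split_inchi(ethanol)
print(layers)
```

Here the `c` layer carries the atom connectivity and the `h` layer the hydrogen positions, so two drawings of the same molecule lead to the same string, which is what makes linking across data compilations reliable.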

Generation of InChI for Vioxx with wInChI

Vioxx Entry in PubChem Compounds Found with InChI

Vioxx Bioassay Data in PubChem

Vioxx PubChem Link to External Sources of

PubChem Link to Elsevier MDL
  • DiscoveryGate
  • provides access to integrated scientific content
    from databases, journal articles, patent
    publications and reference works
  • information providers include Elsevier,
    Thomson-Derwent, FIZ CHEMIE, the U.S. FDA, Prous
    Science and Thieme
  • MDL Compound Index (the master list of substances
    included in DiscoveryGate data sources) now
    exceeds 14 million unique chemical structures
    with the addition of 5 million chemical
    structures from the PubChem database.

The Elsevier MDL/NIH Link via PubChem and
  • Cross-indexes PubChem to the Compound Index
    hosted on Elsevier MDL's DiscoveryGate platform
  • MDL added 5 million structures from PubChem to
    their index, resulting in over 14 million unique
    chemical structures
  • Links go both ways
  • Can move from biological data in PubChem to
    bioactivity, chemical sourcing, synthetic
    methodology, and EHS data in DiscoveryGate

Elsevier MDL's xPharm
  • Comprehensive set of records linking
  • Agents (compounds) (2300)
  • Targets (600)
  • Disorders (450)
  • Principles that govern their interactions (180)
  • Answers questions such as
  • What targets are associated with control of blood
  • What adverse effects are associated with
    monoamine oxidase inhibitors?

Web Guide for Essential Cheminformatics Resources
  • http//
  • http//

ChemBioGrid Chemical Databases

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Web Services Overview
  • What are Web Services?
  • A distributed invocation system built on Grid
  • Independent of platform and programming language
  • Built on existing Web standards
  • A service oriented architecture with
  • Interfaces based on Internet protocols
  • Messages in XML (except for binary data
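
A sketch of what "messages in XML" can mean in practice: a SOAP 1.1 request envelope built with Python's standard library. The envelope namespace is the SOAP 1.1 one; the service namespace `http://example.org/chem`, the `findSimilar` operation, and its parameters are hypothetical and do not describe any service from this talk.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/chem"  # hypothetical service namespace


def build_request(operation, params):
    """Build a SOAP request as text: Envelope > Body > operation > parameters."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# Hypothetical operation and parameters, for illustration only.
message = build_request("findSimilar", {"smiles": "CCO", "threshold": 0.8})
print(message)
```

The message would normally be sent over HTTP to the service endpoint; because both the envelope and the payload are plain XML, the client and server can be written in different languages on different platforms.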

Web Services for Chemistry Problems
  • Performance and scalability
  • Proprietary data
  • Competition from high-performance desktop
  • -- Geoff Hutchison, it's a puzzle blog,
  • ALSO
  • Lack of a substantial body of trustworthy Open
    Access databases
  • Non-standard chemical data formats (over 40 in
    regular use and requiring normalization to one

DM Internet Toolbox Architecture

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Indiana University Planned Projects
  • Application of a Grid-based distributed data
    architecture to chemistry
  • Development of tools for HTS data analysis and
    virtual screening
  • Database for quantum mechanical simulation data
  • Chemical prototype projects
  • Novel routes to enzymatic reaction mechanisms
  • Mechanism-based drug design
  • Data-inquiry-based development of new methods in
    natural product synthesis

Web Services for Chemistry at IU

NCI Developmental Therapeutics Program (DTP)
  • Downloadable data
  • In vitro 60 cell line results
  • In vitro anti-HIV results
  • Yeast assay
  • 200,000 chemical structures
  • molecular targets
  • microarray data
  • Or search the database at
  • http//

IU Database of NIH DTP Data
  • Contains over 200,000 chemical structures tested
    in 60 cellular assays from different human tumor
    cell lines
  • Also includes microarray assay profiles for the
    untreated cell lines (14,000 datapoints)
  • A local PostgreSQL database containing the data
    that is exposed as a web service
  • Using workflows and complex SQL queries, we can
    do advanced data mining that exploits the
    chemical, biological and genomic information for
    particular audiences (chemists, biologists, etc)
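
The sort of SQL query alluded to here might look like the following sketch. It uses an in-memory SQLite database as a stand-in for the PostgreSQL database, and the two-table schema, column names, and values are all invented for illustration; the actual IU schema is not described in the slides.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database; schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE compounds (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE assay_results (compound_id INTEGER, cell_line TEXT, activity REAL);
    INSERT INTO compounds VALUES (1, 'toy-A'), (2, 'toy-B');
    INSERT INTO assay_results VALUES
        (1, 'line-1', 5.0), (1, 'line-2', 7.0),
        (2, 'line-1', 4.0), (2, 'line-2', 4.5);
    """
)


def most_active(connection, min_mean):
    """Return (label, mean activity) for compounds at or above min_mean."""
    query = """
        SELECT c.label, AVG(r.activity) AS mean_activity
        FROM compounds AS c
        JOIN assay_results AS r ON r.compound_id = c.id
        GROUP BY c.id, c.label
        HAVING AVG(r.activity) >= ?
        ORDER BY mean_activity DESC
    """
    return connection.execute(query, (min_mean,)).fetchall()


rows = most_active(conn, 4.5)
print(rows)
```

Wrapping a function like `most_active` behind a web service interface is one way a database query could be offered to workflows without exposing the database itself.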

Mining the NIH DTP database
  • 14,000 gene expression values
  • 60 cell lines
  • Cell lines can be clustered based on gene
    expression similarity
  • 200,000 compounds
  • Compounds can be clustered based on similarity of
    profile across cell lines, or by chemical
    structure fingerprint similarity

Use of Taverna at IU
  • A protein implicated in tumor growth is supplied
    to the docking program (in this case HSP90 taken
    from the PDB 1Y4 complex)
  • A 2D structure is supplied for input into the
    similarity search (in this case, the extracted
    bound ligand from the PDB 1Y4 complex)
  • The workflow employs our local NIH DTP database
    service to search 200,000 compounds tested in
    human tumor cellular assays for structures
    similar to the ligand.
  • Client portlets are used to browse these
  • Similar structures are filtered for drugability,
    and are automatically passed to the OpenEye FRED
    docking program for docking into the target
  • Once docking is complete, the user visualizes the
    high-scoring docked structures in a portlet using
    the Jmol applet.
  • Correlation of docking results and biological
    fingerprints across the human tumor cell lines
    can help identify potential mechanisms of action
    of DTP compounds
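
The similarity search step in this workflow can be sketched with the Tanimoto coefficient on structure fingerprints. This is a common choice for fingerprint similarity, not necessarily the measure used by the IU service; the fingerprints below are toy sets of "on" bit positions and the compound names are invented.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    a, b = set(fp_a), set(fp_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def similar_compounds(query_fp, library, threshold):
    """Return (name, score) pairs at or above threshold, best match first."""
    scored = [(name, tanimoto(query_fp, fp)) for name, fp in library.items()]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints: sets of bit positions that are switched on.
ligand_fp = {1, 4, 7, 9}
library = {"toy-1": {1, 4, 7, 9}, "toy-2": {1, 4, 8}, "toy-3": {2, 3}}

hits = similar_compounds(ligand_fp, library, 0.4)
print(hits)
```

In a workflow, the hits from a step like this would be the structures handed on to the drugability filter and then to the docking program.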

Taverna Workflow
  • Workflow definition
  • Available web services (WSDL)
  • Visual depiction of workflow

Taverna in Action

CGL Contributions to CICC
  • Build Web/Grid services for connecting
  • Data sources
  • Applications (simulation, data mining, data
    assimilation, imaging, etc).
  • Computing resources
  • Information services.
  • Third party tool evaluation
  • Workflow (Taverna)
  • Grid tools: Globus and Condor (for interacting
    with TeraGrid)
  • Building standards-based Web portal environments.
  • OGCE grid portal project
  • JSR 168 Java standards.
  • This activity will begin in earnest over the

Digital Chemistry (BCI) Clustering Service Methods

Local Web Service Methods for WWMM of PMR's Group

More Services
  • An in silico toxicology prediction suite
  • Based on the CDK toolkit
  • Built on CML
  • Released as open source under the GPL
  • Standalone PC software
  • User Manual http//

ToxTree Service
  • An open-source Java application by Nina
  • Estimates toxic hazard by applying a decision
    tree approach.
  • Encodes the Cramer scheme
  • (Cramer G. M., R. A. Ford, R. L. Hall,
    Estimation of Toxic Hazard - A Decision Tree
    Approach, J. Cosmet. Toxicol., Vol.16, pp.
    255-276, Pergamon Press, 1978)
  • Could be applied to datasets from various
    compatible file types.
  • We are converting this GUI application to a
    text-based web service
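
The decision tree approach can be sketched as a chain of yes/no questions that ends in a hazard class. The three class labels follow the Cramer scheme (Class I, II, III); the two questions and the property names below are placeholders invented for the example and are not the Cramer rules or ToxTree's code.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Question:
    """One yes/no node; each branch is another Question or a final class label."""
    text: str
    test: Callable[[dict], bool]
    if_yes: Union["Question", str]
    if_no: Union["Question", str]


def classify(node, compound):
    """Walk the tree until a class label (a plain string) is reached."""
    while isinstance(node, Question):
        node = node.if_yes if node.test(compound) else node.if_no
    return node


# Placeholder questions for illustration only; NOT the actual Cramer questions.
TREE = Question(
    "Placeholder question A?",
    lambda c: c["property_a"],
    "Class I",
    Question(
        "Placeholder question B?",
        lambda c: c["property_b"],
        "Class III",
        "Class II",
    ),
)

result = classify(TREE, {"property_a": False, "property_b": True})
print(result)
```

Because the function takes a plain dictionary and returns a string, logic of this shape is straightforward to call from a text-based service rather than a GUI.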

Overview of the Talk
  • Data Mining and Knowledge Discovery
  • DMKD in Bioinformatics
  • DMKD in Chemistry
  • Public Chemistry Databases for DMKD
  • Overview of Web Services
  • NIH-funded Projects Underway or Planned at
    Indiana University
  • Educational Opportunities at IU

Chemoinformatics Education at IU
  • School of Informatics degree programs
  • BS, MS, PhD
  • Programs offered at both the Indianapolis (IUPUI)
    and Bloomington (IUB) campuses

Other Educational Activities
  • Graduate Certificate Program in Chemical
    Informatics (4 courses by Distance Education)
  • I571 Chemical Information Technology (3 cr.)
  • I572 Computational Chemistry and Molecular
    Modeling (3 cr.)
  • I573 Programming Techniques for Chemical and Life
    Science Informatics (3 cr.)
  • I553 Independent Study in Chemical Informatics (3
    cr.)
  • I571 as CIC Courseshare offering w. Michigan
  • Experiments with teleconferencing as a distance
    education tool

PhD in Informatics
  • Began in August 2005
  • Tracks
  • bioinformatics; chemical informatics; health
    informatics; human-computer interaction design;
    social and organizational informatics
  • Under development
  • complex systems, networks, modeling and
    simulation; cybersecurity; discovery and
    application of information; logical and
    mathematical foundations; music informatics

Graduate Enrollment Chemo-, Laboratory, Bio-,
Health Informatics

Software/DBs Used in the Program
  • Company: Products and/or (Target
  • ArgusLab (Molecular modeling)
  • Digital Chemistry: Toolkit (Clustering)
  • Cambridge Cryst Data Ctr: Cambridge Structural DB
  • CambridgeSoft: ChemDraw Ultra
  • Chemical Abstracts Service: SciFinder Scholar
  • Chemaxon: Marvin (and other software)
  • Daylight Chemical Info System: Toolkit
  • FIZ Karlsruhe: Inorganic Crystal Structure DB
  • IO-Informatics: Sentient
  • MDL: CrossFire Beilstein and Gmelin
  • OpenEye: Toolkit (and other software)
  • Sage Informatics: ChemTK
  • Serena Software: PCMODEL
  • Spotfire: DecisionSite
  • STN International: STN Express with Discover
    (Anal Ed)
  • Wavefunction: Spartan
Closing quote
  • The future of chemistry depends on the
    automated analysis of chemical knowledge,
    combining disparate data sources in a single
    resource, . . . which can be analysed using
    computational techniques to assess and build on
    these data.
  • Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

We all need help when overloaded!

References
  • Agresti, William W. Discovery informatics.
    Communications of the ACM 2003, 46(8), 25-28.
  • Bajcsy, Peter; Han, Jiawei; Liu, Lei; Yang,
    Jiong. Survey of bio-data analysis from a data
    mining perspective. Chapter 2 in Wang, Jason T.
    L.; Zaki, Mohammed J.; Toivonen, Hannu T. T.;
    Shasha, Dennis (eds.), Data Mining in
    Bioinformatics. London, Springer Verlag, 2005,
  • Banville, Debra L. Mining chemical structural
    information from the drug literature. Drug
    Discovery Today 2006, 11(1/2), 35-42.
  • Cios, Krzysztof J.; Kurgan, Lukasz A. Trends in
    data mining and knowledge discovery. Chapter 1
    in Pal, N.R.; Jain, L.C.; Teodoresku, N. (eds.),
    Knowledge Discovery in Advanced Information
    Systems. N.Y., Springer Verlag, 2002, pp. 1-26.
  • Cohen, Aaron M.; Hersh, William R. A survey of
    current work in biomedical text mining.
    Briefings in Bioinformatics March 2005, 6(1),
  • Corbett, Peter T.; Murray-Rust, Peter; Day, Nick
    E.; Townsend, Joe A.; Rzepa, Henry S.
    Chemistry publications in CML. Abstracts of
    Papers, 231st ACS National Meeting, Atlanta, GA,
    United States, March 26-30, 2006, CINF-055.

  • Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P.;
    Uthurusamy, R. Advances in Knowledge Discovery
    and Data Mining. AAAI/MIT Press, 1996. (quoted by
    Cios and Kurgan)
  • Gardner, Stephen P. Ontologies and semantic data
    integration. Drug Discovery Today 2005, 10(14),
  • Guha, R.; Howard, M.T.; Hutchison, G.R.;
    Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner,
    J.; Willighagen, E.L. The Blue Obelisk:
    Interoperability in chemical
    informatics. Journal of Chemical Information and
    Modeling 2006, Web Release Date 22-Feb-2006, DOI
  • Holliday, Gemma L.; Murray-Rust, Peter; Rzepa,
    Henry S. Chemical Markup, XML, and the World
    Wide Web. 6. CMLReact, an XML Vocabulary for
    Chemical Reactions. Journal of Chemical
    Information and Modeling 2006, 46(1), 145-157.
  • Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S.
    Prediction methods and databases within
    chemoinformatics: emphasis on drugs and drug
    candidates. Bioinformatics 2005 May 15, 21(10)

  • Karthikeyan, M.; Krishnan, S.; Pankey, Anil
    Kumar. Harvesting chemical information from the
    Internet using a distributed approach:
    ChemXtreme. Journal of Chemical Information and
    Modeling. DOI 10.1021/ci050329.
  • Krallinger, Martin; Alonso-Allende Erhardt,
    Ramon; Valencia, Alfonso. Text-mining approaches
    in molecular biology and biomedicine. Drug
    Discovery Today 2005, 10(6), 439-445.
  • Scherf, Uwe; Ross, Douglas T.; Waltham, Mark;
    Smith, Lawrence H.; Lee, Jae K.; Tanabe,
    Lorraine; Kohn, Kurt W.; Reinhold, William C.;
    Myers, Timothy G.; Andrews, Darren T.; Scudiero,
    Dominic A.; Eisen, Michael B.; Sausville, Edward
    A.; Pommier, Yves; Botstein, David; Brown,
    Patrick O.; Weinstein, John N. A
    gene expression database for the molecular
    pharmacology of cancer. Nature Genetics 2000,
    24, 236-244.
  • Schubert, Ulrich S. Materials informatics from
    data to knowledge towards integrated escience
    approaches. QSAR & Combinatorial Science 2005,
    24(1), 5. (NB Entire issue is devoted to this
  • SIAM International Conference on Data Mining
    (5th, 2005, Newport Beach, CA). Data Mining
    Proceedings. Kargupta, Hillol et al., eds. SIAM,
  • Torr-Brown, Sheryl. Advances in knowledge
    management for pharmaceutical research and
    development. Current Opinion in Drug Discovery &
    Development 2005, 8(3), 316-322.

Web 2.0
  • Social Software allows group interactions
  • Enables groups to form and organize themselves
  • Examples
  • Wikis
  • Blogs
  • RSS (now found on
  • Podcasting/Coursecasting
  • Webcasting/Webinars
  • Flickr
  • Jybe
  • FURL

FURL (Frame Uniform Resource Locater)
  • For archiving and sharing of web pages
  • Furler can capture the pages for a discussion
  • Tracks useful pages for a discussion
  • http//

Jybe (Join Your Browser with Everyone)
  • Collaboration and communication in real time with
    IE and Firefox
  • Screen-sharing AND editing
  • Privacy protected: must be invited
  • Upload documents to convert to html
  • http//