Title: Mopping up the Flood of Data with Web Services

1
Mopping up the Flood of Data with Web Services

Gary Wiggins
Indiana University
School of Informatics
wiggins_at_indiana.edu

2
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

3
Data Mining and Knowledge Discovery (DMKD)

Techniques began to be used around 1989
Rapid growth in the mid 1990s, with DMKD field
emerging around 1995
Built on DM tools such as Machine Learning

4
Data Mining

One of the steps in Knowledge Discovery
Concerned with the actual extraction of knowledge
from data
Efficient and scalable methods for mining
interesting patterns and knowledge and
discovering hidden facts contained in large
databases

5
Data Mining Techniques

Efficient classification methods
Clustering
Outlier analysis
Frequent, sequential, and structured pattern
analysis
Visualization and spatial/temporal analysis tools
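As a concrete illustration of the outlier analysis listed above, here is a minimal Python sketch that flags values lying more than a chosen number of standard deviations from the mean. The z-score rule, the threshold of 2.0, and the readings are illustrative assumptions, not a method from the talk; production DM tools generally use more robust outlier tests.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Hypothetical assay readings; the last one is far from the rest.
readings = [5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0, 12.0]
print(zscore_outliers(readings))  # [(7, 12.0)]
```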

6
Knowledge Discovery (KD)

KD is a nontrivial process of identifying valid,
novel, potentially useful, and ultimately
understandable patterns from large collections of
data.
--Fayyad et al., as quoted by Cios and Kurgan
The KD process involves
Understanding and preparation of the data
Data Mining (DM)
Verification and application of the discovered
knowledge

7
Framework for KD Process

Steps range from very few, e.g.,
Data collection and understanding
Data mining
Implementation
To multi-step models, e.g., Cios and Kurgan's
six-step DMKD process model

8
Cios and Kurgan's Six-Step DMKD Process Model

Understanding the problem domain
Understanding the data
Preparation of the data
50% or more of the effort is spent on this step
Data mining
Evaluation of the discovered knowledge
Using the discovered knowledge

9
General Data Mining/Data Analysis Systems

SAS Enterprise Miner
SPSS
Insightful S-Plus
IBM DB2 Intelligent Miner
Microsoft SQL Server 2005
SGI MLC++ and MineSet Tree Visualizer
Inxight VizServer

10
Trends: Major Conferences

Knowledge Discovery and Data Mining (KDD) 2005
http://www.informatik.uni-trier.de/~ley/db/conf/kdd/kdd2005.html
International Conference on Machine Learning
(ICML) 2006
http://www.icml2006.org/icml2006/technical/accepted.html
SIAM Conference on Data Mining 2006
http://www.siam.org/meetings/sdm06/proceedings.htm

11
12th Annual SIGKDD International Conference
on Knowledge Discovery and Data Mining,
Philadelphia, August 20-23, 2006

Areas of Interest on the Research Track
Applications of data mining (biomedicine,
business, e-commerce, defense)
Data and result visualization
Data warehousing
Data mining for community generation, social
network analysis and graph-structured data
Foundations of data mining
Interactive and online data mining
KDD framework and process
Mining data streams
Mining high-dimensional data
Mining sensor data
Mining text and semi-structured data
Mining multi-media data
Novel data mining algorithms
Privacy and data mining
Robust and scalable statistical methods
Pre-processing and post-processing for data
mining
Security issues
Spatial and temporal data mining

12
Trends in DMKD

OLAP (On-Line Analytical Processing)
Data warehousing
Association rules
High Performance DMKD systems
Visualization techniques
Applications of DM
More recently
Database products that incorporate DM tools
New developments in design and implementation of
the DMKD process
Information visualization products as end-user
queries
XML
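Association rules, one of the trends listed above, rest on two simple measures: the support of an itemset (the fraction of transactions that contain it) and the confidence of a rule A -> B (support of A together with B, divided by support of A). The Python sketch below computes both with exact fractions. The "transactions" are hypothetical sets of structural features for five compounds, chosen only to make the arithmetic easy to follow.

```python
from fractions import Fraction

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(transactions, antecedent, consequent):
    """conf(A -> B) = support(A together with B) / support(A)."""
    sup_a = support(transactions, antecedent)
    if sup_a == 0:
        return Fraction(0)
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / sup_a

# Hypothetical transactions: structural features flagged for five compounds.
compounds = [
    {"aromatic_ring", "amine", "active"},
    {"aromatic_ring", "amine"},
    {"aromatic_ring", "active"},
    {"amine", "active"},
    {"aromatic_ring", "amine", "active"},
]

print(support(compounds, {"amine", "active"}))      # 3/5
print(confidence(compounds, {"amine"}, {"active"}))  # 3/4
```

Using `Fraction` instead of floats keeps results such as 3/4 exact; a real miner (e.g. Apriori) adds candidate generation and pruning on top of these two measures.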

13
XML: the Key to DM and KD?

Or simply a data exchange protocol?
Allows for the description and storage of
structured or semi-structured data and their
relationships
Can be used to exchange data in a
platform-independent way
BUT only one paper at the major conferences
listed earlier dealt with XML

14
XML helps

Standardize communication between diverse DM
tools and databases (I/O procedures)
Build standard data repositories sharing data
between different DM tools that work on different
software platforms
Implement communication protocols between DM
tools
Provide a framework for integration of and
communication between different DMKD steps

15
Predictive Model Markup Language (PMML) and Other
Tools

In conjunction with XML, PMML enables the
automation of sharing of discovered knowledge
between different domains and tools
XML-RPC
SOAP (Simple Object Access Protocol)
UDDI
OLAP
OLE DB-DM
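To show what an XML-based protocol such as XML-RPC (listed above) actually puts on the wire, the Python sketch below uses the standard-library `xmlrpc.client.dumps` and `loads` functions to encode a remote call as XML and decode it again. The method name `chem.lookup` and its arguments are hypothetical; no server is contacted.

```python
import xmlrpc.client

# Encode a call to a hypothetical remote method "chem.lookup" as an XML-RPC request.
request_xml = xmlrpc.client.dumps(("rofecoxib", 10), methodname="chem.lookup")
print(request_xml)

# The receiving tool decodes the same XML back into native values.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)  # chem.lookup ('rofecoxib', 10)
```

Because both sides only agree on the XML encoding, the caller and the service can be written in different languages and run on different platforms.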

16
Discovery Informatics Definition

"Discovery Informatics is the study and practice
of employing the full spectrum of computing and
analytical science and technology to the singular
pursuit of discovering new information by
identifying and validating patterns in data."
--William W. Agresti in 2003

17
Discovery Informatics

Discovery and Application of Information
Data Mining and Machine Learning are two aspects
of Discovery Informatics.

18
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

19
Trends: Bioinformatics Conferences

International Conference on Intelligent Systems
for Molecular Biology (ISMB) 2006
http://ismb2006.cbi.cnptia.embrapa.br/papers.html
Research in Computational Molecular Biology
(RECOMB) 2006
http://www.informatik.uni-trier.de/~ley/db/conf/recomb/recomb2006.html
Pacific Symposium on Biocomputing (PSB) 2006
http://helix-web.stanford.edu/psb06/

20
Main Areas of Research in Bioinformatics

Sequence alignment
Alternative splicing
Microarray analysis
Functional analysis
Analysis of single nucleotide polymorphisms
(SNPs)
Natural language text analysis

21
DMKD Sessions at Major Bioinformatics Conferences

Databases and Data Integration
Text Mining and Information Extraction
Semantic Webs

22
Data Mining in Bioinformatics (Bajcsy)

Data cleaning, data preprocessing, and semantic
integration of heterogeneous, distributed
biomedical databases
Existing data mining tools for biodata analysis
Development of advanced, effective, and scalable
data mining methods in biodata analysis

23
Preprocessing of Biodata

Integration of multiple microarray gene
experiments must resolve inconsistent labels of
genes to form a coherent data store.
Focus on quantitative quality metrics based on
analytical and statistical data descriptors and
on relationships among variables.

24
Semantic Integration of Heterogeneous Biomedical
Databases

Combine multiple sources into a coherent data
store
Find semantically equivalent real-world entities
from several biomedical sources
Problems
Different labels for the same concept: gene_id
vs. g_id
Time asynchronization: the same gene analyzed at
multiple development stages

25
Approaches for Semantic Integration of Biodata

Construction of integrated biodata warehouses or
biodatabases
Construction of a federation of heterogeneous
distributed biodatabases
Must build up mapping rules or semantic ambiguity
resolution rules across multiple databases
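The mapping rules mentioned above can be pictured as per-source renaming tables that bring every record into one shared vocabulary before the records are merged. The Python sketch below resolves the gene_id vs. g_id example from the previous slide. The source names, field names, and values are hypothetical, and real semantic-integration rules also have to handle units, identifiers, and conflicting values.

```python
# Hypothetical mapping rules: each source's local labels -> one shared vocabulary.
MAPPING_RULES = {
    "lab_a": {"gene_id": "gene_id", "expr": "expression"},
    "lab_b": {"g_id": "gene_id", "level": "expression"},
}

def to_common_schema(source, record):
    """Rename a record's fields using its source's rules; drop unmapped fields."""
    rules = MAPPING_RULES[source]
    return {rules[field]: value for field, value in record.items() if field in rules}

def integrate(batches):
    """Merge (source, records) batches into one store keyed by the shared gene_id."""
    store = {}
    for source, records in batches:
        for record in records:
            common = to_common_schema(source, record)
            store.setdefault(common["gene_id"], []).append({**common, "source": source})
    return store

batches = [
    ("lab_a", [{"gene_id": "TP53", "expr": 2.1}]),
    ("lab_b", [{"g_id": "TP53", "level": 1.8}]),
]
print(integrate(batches))
```

Keeping the source name on each merged record preserves provenance, which matters when the sources later disagree.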

26
Existing Data Mining Tools for Biodata Analysis-I

Sequence Analysis, e.g.,
NCBI/BLAST, ClustalW, HMMER, PHYLIP, MEME,
TRANSFAC, MDScan, Vector NTI, Sequencher,
MacVector
Structure Prediction and Visualization, e.g.,
RasMol, Raster3D, Swiss-Model, Scope, MolScript,
Cn3D

27
Existing Data Mining Tools for Biodata Analysis-II

Genome Analysis, e.g.,
CAP3, Paracel GenomeAssembler, GenomeScan,
GeneMark, GenScan, X-Grail, ORF Finder,
GeneBuilder
Pathway Analysis and Visualization, e.g.,
KEGG, EcoCyc/MetaCyc, GenMapp
Microarray Analysis, e.g.,
ScanAlyze/Cluster/TreeView, Scanalytics
MicroArray Suite, Profiler, Silicon Genetics

28
Biospecific Data Analysis Software Systems

Agilent GeneSpring
Spotfire
Invitrogen VectorNTI

29
Text Mining in Bioinformatics

Techniques have progressed from simple
recognition of terms to extraction of interaction
relationships in complex sentences.
Search objectives have broadened to a range of
problems, e.g.,
Improving homology search
Identifying cellular location
Deriving genetic network technologies

30
Current Work in Biomedical Text Mining (Cohen and
Hersh)

Text mining operates at a finer level of
granularity than information retrieval and text
summarization.
TM examines relationships between specific kinds
of information contained within and between
documents.
Areas of active research
Named entity recognition (genes, proteins, etc.)
Text classification
Synonym and abbreviation extraction
Relationship extraction
Hypothesis generation
Integrated frameworks
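Synonym and abbreviation extraction, one of the research areas listed above, can be illustrated with a deliberately simple heuristic: for each parenthesised short form, check whether the initial letters of the words just before it spell that short form. The Python sketch below is that heuristic only. It is an assumption made for illustration, not a method from Cohen and Hersh, and published systems are considerably more careful.

```python
import re

SHORT_FORM = re.compile(r"\(([A-Za-z]{2,10})\)")

def extract_abbreviations(text):
    """Map each parenthesised short form to the words just before it, if their
    initial letters spell the short form (case-insensitive)."""
    pairs = {}
    for match in SHORT_FORM.finditer(text):
        short = match.group(1)
        preceding = text[:match.start()].split()
        candidate = preceding[-len(short):]
        if len(candidate) < len(short):
            continue  # not enough words before the parenthesis
        initials = "".join(word[0] for word in candidate)
        if initials.lower() == short.lower():
            pairs[short] = " ".join(candidate)
    return pairs

sentence = ("Models are exchanged in Systems Biology Markup Language (SBML) "
            "or Predictive Model Markup Language (PMML).")
print(extract_abbreviations(sentence))
# {'SBML': 'Systems Biology Markup Language', 'PMML': 'Predictive Model Markup Language'}
```

The heuristic misses plural forms such as SNPs and acronyms such as XML whose letters do not all come from word initials.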

31
Systems Biology

Requires a shift in focus from genes and proteins
to the system's structure and dynamics
Four key properties
System structures
System dynamics
Control method
Design method
Systems Biology Markup Language (SBML) and CellML

32
iSpecies.org
33
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

34
Data Mining in Chemistry

Modern experimentation (whether classical or
high-throughput) should be based on the
productive interplay of statistical techniques
(design-of-experiments), molecular modeling as
well as cheminformatics.
--Ulrich S. Schubert

35
Session on Integration of Informatics and
Knowledge Management Informatics

Integration of Informatics at the Systems Level
and at the Data Level (Chris L. Waller, Ph.D.,
Director, World Wide Chemistry Informatics,
Pfizer Global Research & Development)
Integrated Knowledge Management at Bayer
HealthCare Pharmacophore Informatics (William J.
Scott, Ph.D., Team Leader, Department for
Chemistry Research, Bayer Pharmaceuticals
Corporation)
Building a Knowledge Enabled Organization (Cory R.
Brouwer, Ph.D., Associate Director, Knowledge
Management Informatics, Pfizer Global Research &
Development)
Knowledge Management: Building a Knowledge
Enabled Organization (Victor Lobanov, Ph.D.,
Principal Scientist, MDI, Johnson & Johnson
Pharmaceutical R&D)
10th Annual Cheminformatics Conference, May
23-16, 2006, Philadelphia

36
Impact of HTS and Combinatorial Chemistry Research

Most impact in
the pharmaceutical industry
medical research
catalyst research
More recently
polymer and materials research.

37
Diversity of Data Mining in Chemistry

On 5/7/2006 there were 4072 references to either
"datamining" or "data mining" in Chemical
Abstracts.
3416 different index terms were assigned to those
records.
2772 used 1-5 times (81%)
298 used 6-10 times (9%)
103 used 11-15 times (3%)
71 used 16-20 times (2%)
38 used 21-25 times (1%)
24 used 26-30 times (1%)
110 used 31-480 times (3%)
Most frequent co-term: bioinformatics, with 480
hits, or 12% of the 4072 references
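The percentages on this slide can be re-derived from the raw counts. The Python sketch below recomputes each band's share of the 3416 distinct index terms and the share of the 4072 references carrying the co-term "bioinformatics"; the counts are the slide's, only the arithmetic is added.

```python
# Index-term frequency bands reported on the slide: band -> number of distinct terms.
bands = {
    "1-5": 2772, "6-10": 298, "11-15": 103, "16-20": 71,
    "21-25": 38, "26-30": 24, "31-480": 110,
}
total_terms = sum(bands.values())  # should equal the 3416 distinct index terms

for band, n in bands.items():
    print(f"{band:>7}: {n:5d} terms  {100 * n / total_terms:5.1f}%")

# Share of the 4072 references that also carry the co-term "bioinformatics".
print(f"bioinformatics: {100 * 480 / 4072:.1f}% of references")
```

The bands sum to exactly 3416, and 480 of 4072 is about 11.8%, which the slide rounds to 12%.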

38
SFS graph
39
Components of the Semantic Web for Chemistry

XML: eXtensible Markup Language
RDF: Resource Description Framework
RSS: Rich Site Summary
Dublin Core: allows metadata-based newsfeeds
OWL: for ontologies
BPEL4WS: for workflow and web services
Murray-Rust et al. Org. Biomol. Chem. 2004, 2,
3192-3203.

40
Chemical Markup Language (CML)

Much of the semantics in a chemical article can
be supported by CML
Molecules
Structures
Reactions and reaction schemes
Spectra (including annotations)
Physicochemical data
XML dictionaries and lexicons provide linguistic
and semantic support for markup
Will lead to quicker authoring and higher quality
of embedded structures and data through machine
validation
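To make the idea of machine-validated markup concrete, the Python sketch below builds a minimal CML-style molecule with the standard-library `xml.etree.ElementTree` and runs one small consistency check (every bond must refer to declared atoms). The element and attribute names (`molecule`, `atomArray`, `atom`, `elementType`, `bondArray`, `bond`, `atomRefs2`, `order`) follow common CML usage as far as we know, but the fragment has not been validated against the CML schema, and the check is an illustrative assumption rather than a real CML validator.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"
ET.register_namespace("", CML_NS)  # serialize CML elements without a prefix

def q(tag):
    """Qualify a tag name with the CML namespace."""
    return f"{{{CML_NS}}}{tag}"

def cml_molecule(mol_id, atoms, bonds):
    """Build a minimal <molecule> from (id, element) atoms and (id1, id2, order) bonds."""
    mol = ET.Element(q("molecule"), id=mol_id)
    atom_array = ET.SubElement(mol, q("atomArray"))
    for atom_id, element in atoms:
        ET.SubElement(atom_array, q("atom"), id=atom_id, elementType=element)
    bond_array = ET.SubElement(mol, q("bondArray"))
    for a1, a2, order in bonds:
        ET.SubElement(bond_array, q("bond"), atomRefs2=f"{a1} {a2}", order=order)
    return mol

def bonds_reference_known_atoms(mol):
    """A tiny machine-validation check: every bond must point at declared atoms."""
    atom_ids = {atom.get("id") for atom in mol.iter(q("atom"))}
    return all(set(bond.get("atomRefs2").split()) <= atom_ids
               for bond in mol.iter(q("bond")))

# Heavy atoms of ethanol only (hydrogens omitted to keep the example short).
ethanol = cml_molecule("ethanol",
                       atoms=[("a1", "C"), ("a2", "C"), ("a3", "O")],
                       bonds=[("a1", "a2", "1"), ("a2", "a3", "1")])
print(ET.tostring(ethanol, encoding="unicode"))
print(bonds_reference_known_atoms(ethanol))  # True
```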

41
Key Factors in the Success of the Chemical
Semantic Web

Institutional Repositories: services deployed and
supported at an institutional level to offer
dissemination, management, stewardship, and where
appropriate, long-term preservation of both the
intellectual work created by an institutional
community and the records of the intellectual and
cultural life of the institutional community
Open Access Movement

42
Knowledge-Driven Bioinformatics Enhanced with
Chemistry
43
Text Mining (Banville)

In the pharmaceutical field, it is ideally the
marriage of biological and chemical information
that needs to be the ultimate focus of text data
mining applications.
Problems
Lack of universal publication standards for
identifying each unique chemical entity
Selective indexing policies of A&I (abstracting and indexing) services
Need to understand how chemical structures link
to biological processes

44
OSCAR3 Service

Open-source Java application under development by
Peter Murray-Rust's group at Cambridge (not
published yet)
Extracts chemical information from either a
paragraph of experimental data or a full paper
(e.g. melting points, infra-red and NMR data, and
mass spectral information)
Produces an XML instance highlighting the
chemical information with an Extensible
Stylesheet Language (XSL) file
At IU, we are attaching a SOAP input/output engine
to build a web service based on OSCAR3.
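The kind of extraction OSCAR3 performs can be hinted at with a much smaller tool: regular expressions that pick out a melting-point range and IR bands from an experimental paragraph and wrap them in XML. The Python sketch below is only that. The patterns and the output element names (`experimentalData`, `meltingPoint`, `irBand`) are hypothetical and are not OSCAR3's grammar or schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns; OSCAR3 itself covers far more data types and notations.
MELTING_POINT = re.compile(r"m\.p\.\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*°C")
IR_BAND = re.compile(r"(\d{3,4})\s*cm-1")

def annotate(paragraph):
    """Return an XML element listing the melting points and IR bands found."""
    root = ET.Element("experimentalData")
    for low, high in MELTING_POINT.findall(paragraph):
        ET.SubElement(root, "meltingPoint", min=low, max=high, units="C")
    for wavenumber in IR_BAND.findall(paragraph):
        ET.SubElement(root, "irBand", units="cm-1").text = wavenumber
    return root

text = "Colourless needles, m.p. 162-164 °C; IR (KBr) 1720 cm-1, 1605 cm-1."
print(ET.tostring(annotate(text), encoding="unicode"))
```

Wrapping the result in XML rather than returning plain strings is what lets a SOAP front end pass it on to other services unchanged.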

45
OSCAR at Work in the Future
46
Semantic Scholars Grid I
[Diagram] Components: Local MD Store, Local Harvest Store, Gatherer, Indexer, Analyzer
Functions: query and get list; fetch MD and documents; index all local MD; run a filter such as OSCAR on harvested MD and documents; store new MD
47
Semantic Scholars Grid II
[Diagram] Components: Local MD Store, Plug-in, Updater, SSG Viewer, Community Tools (Instant Citation Index, etc.)
Functions: synchronize SSG and foreign MD; update local MD; control foreign interactions; view all MD; access community tools; update and view foreign MD
48
Chemical Datamining Software

SureChem
http://surechem.reeltwo.com/
CLiDE
Recognizes structures, reactions, and text
http://www.simbiosys.ca/clide/
OSCAR
OSCAR1 to check experimental data
http://www.ch.cam.ac.uk/magnus/checker.html
http://www.rsc.org/Publishing/ReSourCe/AuthorGuidelines/AuthoringTools/ExperimentalDataChecker/
CSR (Chemical Structure Reconstruction)
http://www.scai.fraunhofer.de/uploads/media/MZ-ERCIM05_04.pdf
MDL DocSearch: combines MDL's Isentris platform
and EMC's Documentum

49
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

50
ChemDB http://cdb.ics.uci.edu/CHEM/Web/
51
ChEBI, Chemical Entities of Biological Interest

Dictionary of molecular entities focused on small
chemical compounds
Features an ontological classification, showing
the relationships between molecular entities or
classes of entities and their parents and/or
children

52
Vioxx Entry in ChEBI
53
The IUPAC International Chemical Identifier
(InChI)

Open source, non-proprietary, public-domain
identifier for chemicals
String of characters that uniquely represents a
molecular substance
Independent of the way the chemical structure is
drawn
Enables reliable structure recognition and easy
linking of diverse data compilations
Accepts as input MOLfiles (or SDfiles) and CML
files
Download the program to your computer at
http://www.iupac.org/inchi/license.html
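Generating an InChI requires the IUPAC software, but the resulting string is easy to take apart: after the `InChI=` prefix come the version, the formula, and then "/"-separated layers, each introduced by a letter (for example `c` for connections and `h` for hydrogens). The Python sketch below only parses that layout, using the standard InChI for ethanol as input. It assumes each layer letter appears once, which does not hold for every InChI (fixed-hydrogen layers can repeat letters).

```python
def inchi_layers(inchi):
    """Split an InChI string into (version, formula, {layer letter: content})."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len("InChI="):].split("/")
    # Assumes each layer letter occurs only once in the string.
    layers = {part[0]: part[1:] for part in rest if part}
    return version, formula, layers

print(inchi_layers("InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"))
# ('1S', 'C2H6O', {'c': '1-2-3', 'h': '3H,2H2,1H3'})
```

Because the string is derived from the structure and not from how it was drawn, two databases can link records simply by comparing these strings.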

54
Generation of InChI for Vioxx with wInChI
55
Vioxx Entry in PubChem Compounds Found with InChI
56
Vioxx Bioassay Data in PubChem
57
Vioxx PubChem Link to External Sources of
Information
58
PubChem Link to Elsevier MDL

DiscoveryGate www.discoverygate.com
provides access to integrated scientific content
from databases, journal articles, patent
publications and reference works
information providers include Elsevier,
Thomson-Derwent, FIZ CHEMIE, the U.S. FDA, Prous
Science and Thieme
MDL Compound Index (the master list of substances
included in DiscoveryGate data sources) now
exceeds 14 million unique chemical structures
with the addition of 5 million chemical
structures from the PubChem database.

59
The Elsevier MDL/NIH Link via PubChem and
DiscoveryGate

Cross-indexes PubChem to the Compound Index
hosted on Elsevier MDL's DiscoveryGate platform
MDL added 5 million structures from PubChem to
their index, resulting in over 14 million unique
chemical structures
Links go both ways
Can move from biological data in PubChem to
bioactivity, chemical sourcing, synthetic
methodology, and EHS data in DiscoveryGate
sources
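Cross-indexing two collections so that "links go both ways" amounts to joining them on a shared structure key and keeping the mapping in both directions. The Python sketch below shows that join. The record IDs and the placeholder keys are hypothetical; in practice the key could be an identifier such as InChI, and the actual PubChem/DiscoveryGate linking mechanism is not described in the talk.

```python
from collections import defaultdict

def build_cross_index(db_a, db_b):
    """Link record IDs in two databases that share a structure key.

    db_a, db_b: dicts mapping record ID -> structure key.
    Returns (a_to_b, b_to_a) so links can be followed in either direction.
    """
    ids_by_key_b = defaultdict(list)
    for rec_id, key in db_b.items():
        ids_by_key_b[key].append(rec_id)

    a_to_b, b_to_a = {}, defaultdict(list)
    for rec_id, key in db_a.items():
        matches = ids_by_key_b.get(key, [])
        if matches:
            a_to_b[rec_id] = matches
            for other in matches:
                b_to_a[other].append(rec_id)
    return a_to_b, dict(b_to_a)

# Hypothetical record IDs and placeholder structure keys.
public_db = {"P-1": "KEY-ETHANOL", "P-2": "KEY-METHANE"}
vendor_db = {"V-9": "KEY-ETHANOL", "V-7": "KEY-BENZENE"}
a_to_b, b_to_a = build_cross_index(public_db, vendor_db)
print(a_to_b, b_to_a)  # {'P-1': ['V-9']} {'V-9': ['P-1']}
```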

60
Elsevier MDL's xPharm

Comprehensive set of records linking
Agents (compounds) (2300)
Targets (600)
Disorders (450)
Principles that govern their interactions (180)
Answers questions such as
What targets are associated with control of blood
pressure?
What adverse effects are associated with
monoamine oxidase inhibitors?
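Questions like the ones above are graph queries over linked agents, targets, and effects. The Python sketch below stores a tiny knowledge base as (subject, relation, object) triples and answers a one-hop question ("which targets regulate blood pressure?") and a two-hop one ("which agents act on those targets?"). Every entity and relation name is a placeholder; this is not xPharm's data model or content.

```python
# Hypothetical miniature knowledge base stored as (subject, relation, object) triples.
TRIPLES = [
    ("Target X", "regulates", "blood pressure"),
    ("Target Y", "regulates", "blood pressure"),
    ("Target Z", "regulates", "heart rate"),
    ("Agent A", "acts_on", "Target X"),
    ("Agent B", "acts_on", "Target Z"),
    ("Agent A", "adverse_effect", "Effect 1"),
]

def subjects(relation, obj):
    """All subjects linked to obj by relation."""
    return sorted(s for s, r, o in TRIPLES if r == relation and o == obj)

def objects(subject, relation):
    """All objects linked from subject by relation."""
    return sorted(o for s, r, o in TRIPLES if s == subject and r == relation)

def agents_for(process):
    """Two-hop query: agents acting on any target that regulates process."""
    targets = set(subjects("regulates", process))
    return sorted({s for s, r, o in TRIPLES if r == "acts_on" and o in targets})

print(subjects("regulates", "blood pressure"))  # ['Target X', 'Target Y']
print(agents_for("blood pressure"))             # ['Agent A']
```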

61
Web Guide for Essential Cheminformatics Resources

http://www.chembiogrid.org
http://www.indiana.edu/~cheminfo/cicc/

62
ChemBioGrid Chemical Databases
63
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

64
Web Services Overview

What are Web Services?
A distributed invocation system built on Grid
computing
Independent of platform and programming language
Built on existing Web standards
A service oriented architecture with
Interfaces based on Internet protocols
Messages in XML (except for binary data
attachments)
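Since web-service messages are XML, a request is simply an operation and its parameters wrapped in a SOAP envelope. The Python sketch below builds and parses such a message with `xml.etree.ElementTree`. The envelope namespace is the standard SOAP 1.1 one; the service namespace `urn:example:chem` and the operation `similaritySearch` are hypothetical, and a real client would normally be generated from the service's WSDL and would send the message over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:chem"                          # hypothetical service namespace

def soap_request(operation, **params):
    """Wrap an operation name and its parameters in a SOAP 1.1 Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")

def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.split("}", 1)[-1]

def parse_soap_request(xml_text):
    """Recover (operation, params) on the receiving side; values arrive as text."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_ENV}}}Body")
    op = body[0]
    return local_name(op.tag), {local_name(child.tag): child.text for child in op}

message = soap_request("similaritySearch", smiles="CCO", threshold=0.8)
print(parse_soap_request(message))
# ('similaritySearch', {'smiles': 'CCO', 'threshold': '0.8'})
```

Note that the threshold comes back as the string "0.8": typing is the job of the schema that a WSDL description adds on top of the bare envelope.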

65
Web Services for Chemistry Problems

Performance and scalability
Proprietary data
Competition from high-performance desktop
applications
-- Geoff Hutchison, it's a puzzle blog,
2005-01-05
ALSO
Lack of a substantial body of trustworthy Open
Access databases
Non-standard chemical data formats (over 40 in
regular use and requiring normalization to one
another)

66
DM Internet Toolbox Architecture
67
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

68
Indiana University Planned Projects (http://www.chembiogrid.org)

Application of a Grid-based distributed data
architecture to chemistry
Development of tools for HTS data analysis and
virtual screening
Database for quantum mechanical simulation data
Chemical prototype projects
Novel routes to enzymatic reaction mechanisms
Mechanism-based drug design
Data-inquiry-based development of new methods in
natural product synthesis

69
Web Services for Chemistry at IU
70
NCI Developmental Therapeutics Program (DTP)

Downloadable data
In vitro 60 cell line results
in vitro anti-HIV results
Yeast assay
200,000 chemical structures
molecular targets
microarray data
Or search the database at
http//dtp.nci.nih.gov/docs/dtp_search.html

71
IU Database of NIH DTP Data

Contains over 200,000 chemical structures tested
in 60 cellular assays from different human tumor
cell lines
Also includes microarray assay profiles for the
untreated cell lines (14,000 datapoints)
A local PostgreSQL database containing the data
that is exposed as a web service
Using workflows and complex SQL queries, we can
do advanced data mining that exploits the
chemical, biological and genomic information for
particular audiences (chemists, biologists, etc)

72
Mining the NIH DTP database
14,000 gene expression values
60 cell lines
Cell lines can be clustered based on gene
expression similarity
200,000 compounds
Compounds can be clustered based on similarity of
profile across cell lines, or by chemical
structure fingerprint similarity
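Clustering by fingerprint similarity, as described above, can be sketched in two parts: a Tanimoto similarity between fingerprints (represented here as sets of "on" bit positions) and single-linkage grouping, which joins any two compounds whose similarity reaches a threshold. The Python sketch below uses hypothetical bit sets and a 0.5 threshold. Clustering by activity profile across cell lines would use the same grouping step with a profile-correlation measure in place of Tanimoto.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def single_linkage_clusters(fps, threshold):
    """Connected components of the graph linking pairs with similarity >= threshold."""
    parent = {k: k for k in fps}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(fps, 2):
        if tanimoto(fps[a], fps[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for k in fps:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical fingerprints for four compounds.
fps = {
    "c1": {1, 4, 7, 9},
    "c2": {1, 4, 7, 10},
    "c3": {2, 3, 5},
    "c4": {2, 3, 5, 8},
}
print(single_linkage_clusters(fps, 0.5))  # [['c1', 'c2'], ['c3', 'c4']]
```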
73
Use of Taverna at IU

A 2D structure is supplied for input into the
similarity search (in this case, the extracted
bound ligand from the PDB 1Y4 complex)
The workflow employs our local NIH DTP database
service to search 200,000 compounds tested in
human tumor cellular assays for structures
similar to the ligand.
Client portlets are used to browse these
structures
A protein implicated in tumor growth is supplied
to the docking program (in this case HSP90 taken
from the PDB 1Y4 complex)
Similar structures are filtered for drugability,
and are automatically passed to the OpenEye FRED
docking program for docking into the target
protein.
Once docking is complete, the user visualizes the
high-scoring docked structures in a portlet using
the Jmol applet.
Correlation of docking results and biological
fingerprints across the human tumor cell lines
can help identify potential mechanisms of action
of DTP compounds
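The slide does not say which drugability filter the workflow applies. As a stand-in, the Python sketch below uses Lipinski's rule of five (molecular weight over 500, calculated logP over 5, more than 5 hydrogen-bond donors, more than 10 acceptors), allowing at most one violation before a hit is queued for docking. The candidate names and descriptor values are hypothetical, and the docking step itself is not shown.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    mol_weight: float  # g/mol
    logp: float        # calculated octanol/water partition coefficient
    h_donors: int
    h_acceptors: int

def rule_of_five_violations(c: Candidate) -> int:
    """Count violations of Lipinski's rule of five."""
    return sum([
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ])

def druglike(candidates, max_violations=1):
    """Keep candidates with at most max_violations rule-of-five violations."""
    return [c for c in candidates if rule_of_five_violations(c) <= max_violations]

# Hypothetical hits from the similarity-search step, with made-up descriptor values.
hits = [
    Candidate("hit-1", 350.4, 2.1, 2, 5),
    Candidate("hit-2", 720.0, 6.3, 4, 12),
]
docking_queue = [c.name for c in druglike(hits)]
print(docking_queue)  # ['hit-1']
```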

74
Taverna Workflow
Workflow definition
Available web services (WSDL)
Visual depiction of workflow
75
Taverna in Action
76
CGL Contributions to CICC

Build Web/Grid services for connecting
Data sources
Applications (simulation, data mining, data
assimilation, imaging, etc).
Computing resources
Information services.
Third party tool evaluation
Workflow (Taverna)
Grid tools: Globus and Condor (for interacting
with TeraGrid)
Building standards-based Web portal environments.
OGCE grid portal project
JSR 168 Java standards.
This activity will begin in earnest over the
summer.

77
Digital Chemistry (BCI) Clustering Service Methods
78
Local Web Service Methods for WWMM of PMR's Group
79
More Services
80
ToxTree

An in silico toxicology prediction suite
Based on the CDK toolkit
Built on CML
Released as OpenSource under the GPL
Standalone PC software
User Manual: http://ecb.jrc.it/DOCUMENTS/QSAR/TOXTREE/toxTree_user_manual.pdf

81
ToxTree Service

An open-source Java application by Nina
Jeliazkova
Estimates toxic hazard by applying a decision
tree approach.
Encodes the Cramer scheme
(Cramer G. M., R. A. Ford, R. L. Hall,
Estimation of Toxic Hazard - A Decision Tree
Approach, J. Cosmet. Toxicol., Vol.16, pp.
255-276, Pergamon Press, 1978)
Can be applied to datasets in various
compatible file formats.
We are converting this GUI application to a
text-based web service
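A decision-tree approach like the one Toxtree encodes can be represented as nested yes/no questions ending in a hazard class (the Cramer scheme assigns Class I, II, or III). The Python sketch below shows only that structure and how a text-based service might walk it and report the path taken. The questions are placeholders and do NOT reproduce the real Cramer rules; Toxtree's internal representation is not described in the talk.

```python
# Toy decision tree: each internal node asks a yes/no question, each leaf assigns a class.
# The questions are PLACEHOLDERS, not the questions of the real Cramer scheme.
TREE = {
    "question": "q1_placeholder",
    "yes": {"leaf": "Class III"},
    "no": {
        "question": "q2_placeholder",
        "yes": {"leaf": "Class II"},
        "no": {"leaf": "Class I"},
    },
}

def classify(tree, answers):
    """Walk the tree using a dict of question -> bool; return (class, path taken)."""
    node, path = tree, []
    while "leaf" not in node:
        question = node["question"]
        answer = answers[question]
        path.append((question, answer))
        node = node["yes"] if answer else node["no"]
    return node["leaf"], path

print(classify(TREE, {"q1_placeholder": False, "q2_placeholder": True}))
# ('Class II', [('q1_placeholder', False), ('q2_placeholder', True)])
```

Returning the path as well as the class lets a text-based service explain each classification, which a GUI would otherwise show interactively.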

82
Overview of the Talk

Data Mining and Knowledge Discovery
DMKD in Bioinformatics
DMKD in Chemistry
Public Chemistry Databases for DMKD
Overview of Web Services
NIH-funded Projects Underway or Planned at
Indiana University
Educational Opportunities at IU

83
Chemoinformatics Education at IU

School of Informatics degree programs
BS, MS, PhD
Programs offered at both the Indianapolis (IUPUI)
and Bloomington (IUB) campuses

84
Other Educational Activities

Graduate Certificate Program in Chemical
Informatics (4 courses by Distance Education)
I571 Chemical Information Technology (3 cr.)
I572 Computational Chemistry and Molecular
Modeling (3 cr.)
I573 Programming Techniques for Chemical and Life
Science Informatics (3 cr.)
I553 Independent Study in Chemical Informatics (3
cr.)
I571 as CIC Courseshare offering w. Michigan
Experiments with teleconferencing as a distance
education tool

85
PhD in Informatics

Began in August 2005
Tracks
bioinformatics; chemical informatics; health
informatics; human-computer interaction design;
social and organizational informatics
Under development
complex systems, networks, modeling and
simulation; cybersecurity; discovery and
application of information; logical and
mathematical foundations; music informatics

86
Graduate Enrollment Chemo-, Laboratory, Bio-,
Health Informatics
87
Software/DBs Used in the Program

Company: Products and/or (Target Area)
ArgusLab (Molecular modeling)
Digital Chemistry: Toolkit (Clustering)
Cambridge Crystallographic Data Centre: Cambridge Structural Database, GOLD
CambridgeSoft: ChemDraw Ultra
Chemical Abstracts Service: SciFinder Scholar
ChemAxon: Marvin (and other software)
Daylight Chemical Information Systems: Toolkit
FIZ Karlsruhe: Inorganic Crystal Structure Database
IO-Informatics: Sentient
MDL: CrossFire Beilstein and Gmelin
OpenEye: Toolkit (and other software)
Sage Informatics: ChemTK
Serena Software: PCMODEL
Spotfire: DecisionSite
STN International: STN Express with Discover (Anal Ed)
Wavefunction: Spartan

88
Closing quote

The future of chemistry depends on the
automated analysis of chemical knowledge,
combining disparate data sources in a single
resource, . . . which can be analysed using
computational techniques to assess and build on
these data.
Townsend et al. Org. Biomol. Chem. 2004, 2, 3299.

89
We all need help when overloaded!
90
Bibliography

Agresti, William W. Discovery informatics.
Communications of the ACM 2003, 46(8), 25-28.
Bajcsy, Peter; Han, Jiawei; Liu, Lei; Yang,
Jiong. "Survey of bio-data analysis from a data
mining perspective." Chapter 2 in Wang, Jason T.
L.; Zaki, Mohammed J.; Toivonen, Hannu T. T.;
Shasha, Dennis (eds.), Data Mining in
Bioinformatics. London, Springer Verlag, 2005,
pp. 9-39.
Banville, Debra L. Mining chemical structural
information from the drug literature. Drug
Discovery Today, 2006, 11(1/2), 35-42.
Cios, Krzysztof J.; Kurgan, Lukasz A. Trends in
data mining and knowledge discovery. Chapter 1
in Pal, N.R.; Jain, L.C.; Teodoresku, N. (eds.),
Knowledge Discovery in Advanced Information
Systems. N.Y., Springer Verlag, 2002, pp. 1-26.
Cohen, Aaron M.; Hersh, William R. "A survey of
current work in biomedical text mining."
Briefings in Bioinformatics March 2005, 6(1),
57-71.
Corbett, Peter T.; Murray-Rust, Peter; Day, Nick
E.; Townsend, Joe A.; Rzepa, Henry S.
Chemistry publications in CML. Abstracts of
Papers, 231st ACS National Meeting, Atlanta, GA,
United States, March 26-30, 2006, CINF-055.

91
Bibliography

Fayyad, U.M.; Piatetsky-Shapiro, G.; Smyth, P.;
Uthurusamy, R. Advances in Knowledge Discovery
and Data Mining. AAAI/MIT Press, 1996. (Quoted by
Cios and Kurgan)
Gardner, Stephen P. Ontologies and semantic data
integration. Drug Discovery Today 2005, 10(14),
1001-1007.
Guha, R.; Howard, M.T.; Hutchison, G.R.;
Murray-Rust, P.; Rzepa, H.; Steinbeck, C.; Wegner,
J.; Willighagen, E.L. The Blue Obelisk:
Interoperability in chemical informatics. Journal
of Chemical Information and Modeling 2006, Web
Release Date 22-Feb-2006, DOI 10.1021/ci050400b.
Holliday, Gemma L.; Murray-Rust, Peter; Rzepa,
Henry S. Chemical Markup, XML, and the World
Wide Web. 6. CMLReact, an XML Vocabulary for
Chemical Reactions. Journal of Chemical
Information and Modeling 2006, 46(1), 145-157.
Jónsdóttir, S.O.; Jorgensen, F.S.; Brunak, S.
Prediction methods and databases within
chemoinformatics: emphasis on drugs and drug
candidates. Bioinformatics 2005 (May 15), 21(10),
2145-2160.

92
Bibliography

Karthikeyan, M.; Krishnan, S.; Pandey, Anil
Kumar. Harvesting chemical information from the
Internet using a distributed approach:
ChemXtreme. Journal of Chemical Information and
Modeling. DOI 10.1021/ci050329.
Krallinger, Martin; Alonso-Allende Erhardt,
Ramon; Valencia, Alfonso. Text-mining approaches
in molecular biology and biomedicine. Drug
Discovery Today 2005, 10(6), 439-445.
Scherf, Uwe; Ross, Douglas T.; Waltham, Mark;
Smith, Lawrence H.; Lee, Jae K.; Tanabe, Lorraine;
Kohn, Kurt W.; Reinhold, William C.; Myers,
Timothy G.; Andrews, Darren T.; Scudiero, Dominic
A.; Eisen, Michael B.; Sausville, Edward A.;
Pommier, Yves; Botstein, David; Brown, Patrick O.;
Weinstein, John N. A gene expression database for
the molecular pharmacology of cancer. Nature
Genetics 2000, 24, 236-244.
Schubert, Ulrich S. "Materials informatics from
data to knowledge towards integrated escience
approaches." QSAR & Combinatorial Science 2005,
24(1), 5. (NB: Entire issue is devoted to this
topic.)
SIAM International Conference on Data Mining
(5th, 2005, Newport Beach, CA). Data Mining
Proceedings. Kargupta, Hillol et al., eds. SIAM,
2005.
Torr-Brown, Sheryl. Advances in knowledge
management for pharmaceutical research and
development. Current Opinion in Drug Discovery &
Development 2005, 8(3), 316-322.

93
Web 2.0

Social Software allows group interactions
Enables groups to form and organize themselves
Examples
Wikis
Blogs
RSS (now found on chemistry.org)
Podcasting/Coursecasting
Webcasting/Webinars
Flickr
Jybe
FURL

94
FURL (Frame Uniform Resource Locator)

For archiving and sharing of web pages
Furler can capture the pages for a discussion
group
Tracks useful pages for a discussion
http://www.furl.net/home.jsp

95
Jybe (Join Your Browser with Everyone)

Collaboration and communication in real time with
IE and Firefox
Screen-sharing AND editing
Privacy protected: participants must be invited
Upload documents to convert to HTML
http://www.jybe.com
