Computational Discovery of Communicable Knowledge

Slides: 36
Provided by: Lang8
Learn more at: http://www.isle.org
1
Computational Discovery of Communicable
Scientific Models
Pat Langley
Center for the Study of Language and Information
Stanford University, Stanford, California
http://cll.stanford.edu/langley
langley_at_csli.stanford.edu
Thanks to N. Asgharbeygi, K. Arrigo, S. Bay, S.
Dzeroski, J. Sanchez, Oren Shiran, and L.
Todorovski for their contributions to this
research, which is funded by a grant from the
National Science Foundation.
2
Data Mining vs. Scientific Discovery
There exist two computational paradigms for
discovering explicit knowledge from data:
  • Data mining generates knowledge cast as decision
    trees, logical rules, or other notations
    invented by AI researchers.
  • Computational scientific discovery instead uses
    equations, structural models, reaction pathways,
    or other formalisms invented by scientists and
    engineers.

Both approaches draw on heuristic search to find
regularities in data, but they differ
considerably in their emphases.
3
Lesson 1
Traditional notations from machine learning are
not communicated easily to domain scientists.
Ecosystem model:
  NPPc = Σ_month max(E · IPAR, 0)
  E = 0.56 · T1 · T2 · W
  T1 = 0.8 + 0.02 · Topt − 0.0005 · Topt^2
  T2 = 1.18 / [(1 + e^(0.2 · (Topt − Tempc − 10))) · (1 + e^(0.3 · (Tempc − Topt − 10)))]
  W = 0.5 + 0.5 · EET / PET
  PET = 1.6 · (10 · Tempc / AHI)^A · PET-TW-M  if Tempc > 0
  PET = 0  if Tempc ≤ 0
  A = 0.00000068 · AHI^3 − 0.000077 · AHI^2 + 0.018 · AHI + 0.49
  IPAR = 0.5 · FPAR-FAS · Monthly-Solar · Sol-Conver
  FPAR-FAS = min((SR-FAS − 1.08) / SR(UMD-VEG), 0.95)
  SR-FAS = −(Mon-FAS-NDVI + 1000) / (Mon-FAS-NDVI − 1000)
Gene regulation model: [figure]
4
Lesson 2
Scientists often have initial models that should
influence the discovery process.
[Diagram: an initial model and observations feed a discovery step that yields a revised model]
5
Lesson 3
Scientific data are often rare and difficult to
obtain rather than being plentiful.
Ecosystem model
Gene regulation model
Number of variables: 9
Number of initial links: 11
Number of possible links: ?70
Number of samples: 20
6
Lesson 4
Scientists want models that move beyond
description to provide explanations of their data.
Ecosystem model
Gene regulation model
7
Lesson 5
Scientists want computational assistance rather
than automated discovery systems.
[Diagram: an initial model and observations feed a discovery step that yields a revised model]
8
The Nature of Systems Science
Disciplines like Earth science and computational
biology differ from traditional fields in that
they:
  • focus on synthesis rather than analysis in their
    operation
  • rely on computer modeling as one of their central
    methods
  • develop system-level models with many variables
    and relations
  • require that models make contact with known
    mechanisms.

However, existing methods for computational
scientific discovery were not designed with
systems science in mind.
9
Time Series from the Ross Sea Ecosystem
10
Inductive Process Modeling
Our approach is to design and implement
computational methods for inductive process
modeling, which:
  • represent scientific models as sets of
    quantitative processes
  • use these models to predict and explain
    observational data
  • search a space of process models to find good
    candidates
  • utilize background knowledge to constrain this
    search.

This framework has great potential both for
modeling scientific reasoning and aiding
practicing scientists.
11
Existing Formalisms Are Inadequate
12
A Process Model for an Aquatic Ecosystem
model AquaticEcosystem
  variables: phyto, zoo, nitro, residue
  observables: phyto, nitro

  process phyto_loss
    equations: d[phyto,t,1] = −0.307 × phyto
               d[residue,t,1] = 0.307 × phyto

  process zoo_loss
    equations: d[zoo,t,1] = −0.251 × zoo
               d[residue,t,1] = 0.251 × zoo

  process zoo_phyto_grazing
    equations: d[zoo,t,1] = 0.615 × 0.495 × zoo
               d[residue,t,1] = 0.385 × 0.495 × zoo
               d[phyto,t,1] = −0.495 × zoo

  process nitro_uptake
    conditions: nitro > 0
    equations: d[phyto,t,1] = 0.411 × phyto
               d[nitro,t,1] = −0.098 × 0.411 × phyto

  process nitro_remineralization
    equations: d[nitro,t,1] = 0.005 × residue
               d[residue,t,1] = −0.005 × residue
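To make the semantics of such a model concrete, here is a minimal simulation sketch. It assumes that the contributions of all active processes to a variable are summed and that simple forward-Euler integration is adequate. The rate constants are taken from the slide; the initial values, step size, and horizon are illustrative assumptions, and the residue term of zoo_loss is taken to be 0.251 × zoo by analogy with phyto_loss.

```python
# Forward-Euler simulation of the AquaticEcosystem process model.
# Rate constants come from the slide; initial values, step size, and
# horizon are illustrative assumptions, not values from the talk.

def derivatives(state):
    """Sum the contributions of every active process to each variable."""
    phyto, zoo = state["phyto"], state["zoo"]
    nitro, residue = state["nitro"], state["residue"]
    d = {"phyto": 0.0, "zoo": 0.0, "nitro": 0.0, "residue": 0.0}

    # process phyto_loss
    d["phyto"] -= 0.307 * phyto
    d["residue"] += 0.307 * phyto
    # process zoo_loss (residue term assumed to be 0.251 * zoo)
    d["zoo"] -= 0.251 * zoo
    d["residue"] += 0.251 * zoo
    # process zoo_phyto_grazing
    d["zoo"] += 0.615 * 0.495 * zoo
    d["residue"] += 0.385 * 0.495 * zoo
    d["phyto"] -= 0.495 * zoo
    # process nitro_uptake, active only while its condition holds
    if nitro > 0:
        d["phyto"] += 0.411 * phyto
        d["nitro"] -= 0.098 * 0.411 * phyto
    # process nitro_remineralization
    d["nitro"] += 0.005 * residue
    d["residue"] -= 0.005 * residue
    return d


def simulate(initial, steps, dt):
    """Return the list of states visited, starting with a copy of `initial`."""
    state = dict(initial)
    trajectory = [dict(state)]
    for _ in range(steps):
        rates = derivatives(state)
        state = {name: state[name] + dt * rates[name] for name in state}
        trajectory.append(state)
    return trajectory


initial_state = {"phyto": 1.0, "zoo": 0.1, "nitro": 10.0, "residue": 0.0}
trajectory = simulate(initial_state, steps=100, dt=0.1)
```

Only phyto and nitro are declared observable, so a fit to data would compare just those two trajectories against measurements while zoo and residue remain hidden state.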
13
Advantages of Quantitative Process Models
Process models offer scientists a promising
framework because:
  • they embed quantitative relations within
    qualitative structure
  • that refer to notations and mechanisms familiar
    to experts
  • they provide dynamical predictions of changes
    over time
  • they offer causal and explanatory accounts of
    phenomena
  • while retaining the modularity needed for
    induction/abduction.

Quantitative process models provide an important
alternative to formalisms used currently in
computational discovery.
14
Challenges of Inductive Process Modeling
Process model induction differs from typical
learning tasks in that:
  • process models characterize behavior of dynamical
    systems
  • variables are continuous but can have
    discontinuous behavior
  • observations are not independently and
    identically distributed
  • models may contain unobservable processes and
    variables
  • multiple processes can interact to produce
    complex behavior.

Compensating factors include a focus on
deterministic systems and the availability of
background knowledge.
15
Encoding Background Knowledge
To constrain candidate models, we can utilize
available background knowledge about the domain.
Previous work has encoded background knowledge in
terms of:
  • Horn clause programs (e.g., Towell & Shavlik,
    1990)
  • context-free grammars (e.g., Dzeroski &
    Todorovski, 1997)
  • prior probability distributions (e.g., Friedman
    et al., 2000).

However, none of these notations are familiar to
domain scientists, which suggests the need for
another approach.
16
Generic Processes as Background Knowledge
We cast background knowledge as generic processes
that specify:
  • the variables involved in a process and their
    types
  • the parameters appearing in a process and their
    ranges
  • the forms of conditions on the process and
  • the forms of associated equations and their
    parameters.

Generic processes are building blocks from which
one can compose a specific process model.
17
Generic Processes for Aquatic Ecosystems
generic process exponential_loss
  variables: S{species}, D{detritus}
  parameters: α [0, 1]
  equations: d[S,t,1] = −1 × α × S
             d[D,t,1] = α × S

generic process remineralization
  variables: N{nutrient}, D{detritus}
  parameters: π [0, 1]
  equations: d[N,t,1] = π × D
             d[D,t,1] = −1 × π × D

generic process grazing
  variables: S1{species}, S2{species}, D{detritus}
  parameters: ρ [0, 1], γ [0, 1]
  equations: d[S1,t,1] = γ × ρ × S1
             d[D,t,1] = (1 − γ) × ρ × S1
             d[S2,t,1] = −1 × ρ × S1

generic process constant_inflow
  variables: N{nutrient}
  parameters: ν [0, 1]
  equations: d[N,t,1] = ν

generic process nutrient_uptake
  variables: S{species}, N{nutrient}
  parameters: τ [0, ∞], β [0, 1], μ [0, 1]
  conditions: N > τ
  equations: d[S,t,1] = μ × S
             d[N,t,1] = −1 × β × μ × S
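One way to picture generic processes as data is sketched below. The `GenericProcess` class, the `bindings` helper, the parameter names (`alpha`, `rho`, `gamma`), and the assignment of types to the four aquatic variables are all assumptions made for illustration; the slides do not say how the actual system stores this knowledge.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class GenericProcess:
    """Hypothetical container for one generic process."""
    name: str
    variables: tuple   # (role, type) pairs, e.g. ("S", "species")
    parameters: dict   # parameter name -> (low, high) range
    equations: dict    # role -> function(values, params) giving d[role,t,1]


exponential_loss = GenericProcess(
    name="exponential_loss",
    variables=(("S", "species"), ("D", "detritus")),
    parameters={"alpha": (0.0, 1.0)},
    equations={
        "S": lambda v, p: -1 * p["alpha"] * v["S"],
        "D": lambda v, p: p["alpha"] * v["S"],
    },
)

grazing = GenericProcess(
    name="grazing",
    variables=(("S1", "species"), ("S2", "species"), ("D", "detritus")),
    parameters={"rho": (0.0, 1.0), "gamma": (0.0, 1.0)},
    equations={
        "S1": lambda v, p: p["gamma"] * p["rho"] * v["S1"],
        "D": lambda v, p: (1 - p["gamma"]) * p["rho"] * v["S1"],
        "S2": lambda v, p: -1 * p["rho"] * v["S1"],
    },
)


def bindings(process, typed_variables):
    """Yield every role-to-variable assignment that respects the declared types."""
    roles = process.variables
    for combo in permutations(typed_variables, len(roles)):
        if all(typed_variables[name] == role_type
               for name, (_, role_type) in zip(combo, roles)):
            yield {role: name for name, (role, _) in zip(combo, roles)}


# Assumed typing of the variables in the aquatic ecosystem example.
typed_variables = {"phyto": "species", "zoo": "species",
                   "nitro": "nutrient", "residue": "detritus"}
loss_bindings = list(bindings(exponential_loss, typed_variables))
grazing_bindings = list(bindings(grazing, typed_variables))
```

Under these assumptions, exponential_loss can be bound to either species with residue as detritus, and grazing can be bound with either species in the S1 role, so the type constraints already prune most variable combinations.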
18
Inducing Process Models
training data
process model
Induction
generic processes
19
A Method for Process Model Construction
The IPM algorithm constructs explanatory models
from generic components in four stages:
  1. Find all ways to instantiate known generic
     processes with specific variables, subject to
     type constraints.
  2. Combine instantiated processes into candidate
     generic models, subject to additional
     constraints (e.g., number of processes).
  3. For each generic model, carry out search
     through parameter space to find good
     coefficients.
  4. Return the parameterized model with the best
     overall score.
Our typical evaluation metric is squared error,
but we have also explored other measures of
explanatory adequacy.
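The four stages can be sketched on a deliberately tiny problem. Everything here is a toy under stated assumptions: the two generic processes (`decay`, `inflow`), the single variable `x`, the Euler simulator, and the exhaustive grid search that stands in for the real parameter search are all hypothetical and chosen only to keep the example short.

```python
from itertools import combinations, product

# Hypothetical background knowledge: each generic process names the variable
# type it applies to and a one-parameter rate law with range [0, 1].
GENERIC = {
    "decay": {"type": "stock", "rate": lambda x, k: -k * x},
    "inflow": {"type": "stock", "rate": lambda x, k: k},
}
VARIABLES = {"x": "stock"}


def instantiate(generic, variables):
    """Stage 1: bind each generic process to every variable of the right type."""
    return [(name, var) for name, g in generic.items()
            for var, vtype in variables.items() if g["type"] == vtype]


def structures(instances, max_processes):
    """Stage 2: combine instantiated processes into candidate structures."""
    for size in range(1, max_processes + 1):
        yield from combinations(instances, size)


def simulate(structure, params, start, steps, dt=0.1):
    """Euler-integrate the summed process rates; return a series per variable."""
    state = dict(start)
    series = {var: [value] for var, value in state.items()}
    for _ in range(steps):
        rates = {var: 0.0 for var in state}
        for (name, var), k in zip(structure, params):
            rates[var] += GENERIC[name]["rate"](state[var], k)
        state = {var: state[var] + dt * rates[var] for var in state}
        for var in state:
            series[var].append(state[var])
    return series


def squared_error(predicted, observed):
    return sum((p - o) ** 2 for var in observed
               for p, o in zip(predicted[var], observed[var]))


def fit(structure, observed, start, grid):
    """Stage 3: grid search over parameters (a stand-in for the real search)."""
    steps = len(next(iter(observed.values()))) - 1
    best = None
    for params in product(grid, repeat=len(structure)):
        err = squared_error(simulate(structure, params, start, steps), observed)
        if best is None or err < best[0]:
            best = (err, params)
    return best


def induce(observed, start, max_processes=2):
    """Stage 4: return the parameterized structure with the lowest error."""
    grid = [i / 10 for i in range(1, 11)]
    candidates = []
    for structure in structures(instantiate(GENERIC, VARIABLES), max_processes):
        err, params = fit(structure, observed, start, grid)
        candidates.append((err, structure, params))
    return min(candidates, key=lambda c: c[0])


# Synthetic observations generated by pure decay with rate 0.5.
observed = simulate((("decay", "x"),), (0.5,), {"x": 1.0}, steps=20)
error, structure, params = induce(observed, {"x": 1.0})
```

On this synthetic data the search recovers the decay-only structure with rate 0.5, because the grid happens to contain the generating value. With real measurements no grid point would match exactly, which is one reason a more careful parameter search is needed.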
20
Estimating Parameters in Process Models
To estimate the parameters for each generic model
structure, the IPM algorithm:
  1. Selects random initial values that fall within
     ranges specified in the generic processes.
  2. Improves these parameters using the
     Levenberg-Marquardt method until it reaches a
     local optimum.
  3. Generates new candidate values through random
     jumps along dimensions of the parameter vector
     and continues the search.
  4. If no improvement occurs after N jumps,
     restarts the search from a new random initial
     point.
This multi-level method gives reasonable fits to
time-series data from a number of domains, but it
is computationally intensive.
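The control flow of this multi-level search can be sketched as follows. To stay dependency-free, the local step here is a shrinking-step coordinate search, which is a simple stand-in and not the Levenberg-Marquardt method named on the slide. The function names, the fixed number of restarts, and the toy loss are assumptions for illustration.

```python
import random


def local_improve(loss, params, ranges, step=0.1, tol=1e-6):
    """Shrinking-step coordinate search (stand-in for Levenberg-Marquardt)."""
    params = list(params)
    best = loss(params)
    while step > tol:
        improved = False
        for i, (low, high) in enumerate(ranges):
            for delta in (step, -step):
                trial = list(params)
                trial[i] = min(high, max(low, trial[i] + delta))
                value = loss(trial)
                if value < best:
                    params, best, improved = trial, value, True
        if not improved:
            step /= 2
    return params, best


def estimate(loss, ranges, max_jumps=20, restarts=5, seed=0):
    """Random start, local search, random jumps, and restarts."""
    rng = random.Random(seed)
    overall = None
    for _ in range(restarts):
        # Step 1: random initial values inside the declared ranges.
        start = [rng.uniform(low, high) for low, high in ranges]
        # Step 2: improve to a local optimum.
        params, best = local_improve(loss, start, ranges)
        # Step 3: random jumps along one dimension, then continue the search.
        failures = 0
        while failures < max_jumps:
            i = rng.randrange(len(ranges))
            jumped = list(params)
            jumped[i] = rng.uniform(*ranges[i])
            candidate, value = local_improve(loss, jumped, ranges)
            if value < best:
                params, best, failures = candidate, value, 0
            else:
                failures += 1
        # Step 4: max_jumps failures in a row trigger a fresh restart.
        if overall is None or best < overall[1]:
            overall = (params, best)
    return overall


def decay_series(rate, steps=30, dt=0.1):
    """Euler trajectory of dx/dt = -rate * x starting from x = 1."""
    xs = [1.0]
    for _ in range(steps):
        xs.append(xs[-1] - dt * rate * xs[-1])
    return xs


target = decay_series(0.3)


def toy_loss(params):
    """Squared error between a simulated series and the synthetic target."""
    return sum((p - o) ** 2 for p, o in zip(decay_series(params[0]), target))


fitted, fitted_error = estimate(toy_loss, [(0.0, 1.0)])
```

In practice the local step could be delegated to an existing Levenberg-Marquardt routine, such as `scipy.optimize.least_squares` with `method="lm"`, with the jump and restart logic wrapped around it. Every loss evaluation requires a full simulation, which is consistent with the slide's remark that the method is computationally intensive.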
21
Observations from the Ross Sea
22
Results on Training Data from Ross Sea
23
Results on Test Data from Ross Sea
24
Results on a Protist Ecosystem
25
Results on Ringkøbing Fjord
26
Results on Biochemical Kinetics
[Plot: observed trajectories vs. predicted trajectories]
27
Interfacing with Scientists
Because few scientists want to be replaced, we
are developing an interactive environment,
PROMETHEUS, that lets users:
  • specify a quantitative process model of the
    target system
  • display and edit the model's structure and
    details graphically
  • simulate the model's behavior over time and
    situations
  • compare the model's predicted behavior to
    observations
  • invoke a revision module in response to detected
    anomalies.

The environment offers computational assistance
in forming and evaluating models but lets the
user retain control.
28
Viewing a Process Model Graphically
29
Indicating Processes to Consider Adding
30
Specifying Data and Search Parameters
31
Inspecting Revised Process Models
32
Intellectual Influences
Our approach to computational discovery
incorporates ideas from many traditions:
  • computational scientific discovery (e.g., Langley
    et al., 1983)
  • theory revision in machine learning (e.g.,
    Towell, 1991)
  • qualitative physics and simulation (e.g., Forbus,
    1984)
  • languages for scientific simulation (e.g.,
    STELLA, MATLAB)
  • interactive tools for data analysis (e.g.,
    Shneiderman, 2001).

Our work combines, in novel ways, insights from
machine learning, AI, programming languages, and
human-computer interaction.
33
Contributions of the Research
In summary, our work on computational scientific
discovery has, in responding to various
challenges, produced:
  • a new formalism for representing scientific
    process models
  • a computational method for simulating these
    models' behavior
  • an encoding for background knowledge as generic
    processes
  • an algorithm for inducing process models from
    time-series data
  • an interactive environment for model
    construction/utilization.

We have demonstrated this approach to model
creation on domains from Earth science,
microbiology, and engineering.
34
Some Recent Extensions
In recent work, we have extended our approach to
incorporate:
  • heuristic beam search through the space of
    process models
  • hierarchical generic processes that further
    constrain search
  • an ensemble-like method that mitigates
    overfitting effects
  • metrics for explanatory adequacy based on
    trajectory shapes.

Inductive process modeling has great potential to
speed progress in systems science and engineering.
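The slides do not describe how the beam search is organized, so the following is only a generic sketch of the idea: grow model structures one process at a time and keep the few best at each level. The `beam_search` function, the process names, and the toy score (distance from a hidden target set) are hypothetical; in a real system the score would be the error of the fitted, parameterized model.

```python
def beam_search(candidates, score, beam_width=2, max_size=3):
    """Grow structures one process at a time, keeping the best few per level."""
    beam = [frozenset()]
    best = beam[0]
    for _ in range(max_size):
        # Expand every structure in the beam by one unused process.
        expanded = {model | {p} for model in beam
                    for p in candidates if p not in model}
        if not expanded:
            break
        # Lower scores are better; names break ties deterministically.
        ranked = sorted(expanded, key=lambda m: (score(m), sorted(m)))
        beam = ranked[:beam_width]
        if score(beam[0]) < score(best):
            best = beam[0]
    return best


# Toy setup: the score counts how many processes differ from a hidden target.
processes = ["phyto_loss", "zoo_loss", "grazing", "nitro_uptake", "inflow"]
hidden_target = frozenset({"phyto_loss", "grazing", "nitro_uptake"})


def toy_score(model):
    return len(model ^ hidden_target)


found = beam_search(processes, toy_score)
```

With five candidate processes, exhaustive enumeration of structures up to size three would score 25 models, whereas this beam of width two scores far fewer per level, at the cost of possibly missing the best structure when early scores are misleading.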
35
End of Presentation