Title: Scientific Models, and an overview of Computational approaches to Modelling
1 Scientific Models, and an overview of Computational approaches to Modelling and Model Refinement
Derek Sleeman
Computing Science Department
The University of ABERDEEN
www.csd.abdn.ac.uk/sleeman
2 SYSTEMS BIOLOGY SEMINAR (OVERVIEW)
- An overview of scientific model building
- Types of knowledge which need to be accommodated
- Modelling: numerical and statistical approaches
- Modelling with symbolic knowledge
- Refinement of symbolic models
- Towards a check list: what modellers would like to know about a scientist's data set / model / goals.
3 - THE SCIENTIFIC PROCESS (A NAÏVE VIEW)
[Diagram labels: THEORY; Experiments; Data; Analysis; Merge / Reconcile Data and Theory; Scientist suggests experiments on basis of Discrepancies and Gaps]
This will comprise different types of Knowledge: numerical, symbolic, procedural, causality, uncertainty
4 RELATIONSHIP BETWEEN DATA, EXPERIMENTS AND THEORIES
- Experiments run to challenge / extend / confirm an existing model.
- Exploratory experiments (e.g., where the scientist believes that parameter X is influenced by variables V1..Vn): Classical Physics experiments, Reaction Mechanism equations from Chemistry,
- ..
- Comments
- - In all cases, the major purpose of a model is that it be used predictively.
- - In theory and in practice, refining models is generally easier than creating (new) ones.
5 TYPE OF MODEL being developed
- 1. CLASSICAL NUMERIC
- Simple Algebraic Equations
- v = u + at (Newton's law of motion)
- f = m × a
- 1a) More complex types of functions
- Ordinary Differential Equations
- Partial Differential Equations
- Complex functions (e.g. Bessel)
- 1b) NUMERICAL ANALYSIS TECHNIQUES
- - Finding the MINIMUM or MAXIMUM of a function (where in theory there are many local minima).
- - Integration by summation of areas under the curve.
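Integration by summation of areas can be illustrated with a minimal sketch. It assumes the trapezium rule as the summation scheme and arbitrary illustrative values for u and a; the function names are hypothetical, not from the talk. Integrating v = u + at over time should recover the distance s = ut + (1/2)at^2.

```python
def velocity(t, u=2.0, a=3.0):
    """v = u + a*t; u and a are arbitrary illustrative values."""
    return u + a * t

def trapezoid_area(f, lower, upper, strips=1000):
    """Approximate the integral of f on [lower, upper] by summing trapezium areas."""
    width = (upper - lower) / strips
    total = 0.0
    for i in range(strips):
        left = lower + i * width
        total += 0.5 * (f(left) + f(left + width)) * width
    return total

distance = trapezoid_area(velocity, 0.0, 4.0)
exact = 2.0 * 4.0 + 0.5 * 3.0 * 4.0 ** 2   # s = u*t + (1/2)*a*t^2
print(distance, exact)
```

For a straight-line integrand the trapezium rule is exact up to rounding; for curved functions the error shrinks as the number of strips grows.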
6 TYPE OF MODEL Contd
- 2. STATISTICAL MODELLING
- a) HYPOTHESIS TESTING (within a given confidence level)
- b) CORRELATION OF VARIABLES
- (Variant on curve (line) fitting: y = mx + c)
- c) DATA CLUSTERING
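As a rough illustration of item b), the sketch below fits y = mx + c by ordinary least squares and reports the Pearson correlation coefficient. The data points are made up for the example and the function name is hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares slope m, intercept c and Pearson correlation r for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = mean_y - m * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return m, c, r

# Illustrative data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, c, r = fit_line(xs, ys)
print(m, c, r)
```

With noisy data r falls below 1, which gives a simple measure of how well the straight line accounts for the observations.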
7 - 3. PROCESS MODELLING
- a) Explaining how a lavatory cistern works
- - RELEASE OF WATER
- - HOW THE VALVE OPENS
- - THE MECHANISM used to close a valve
- b) How a JET ENGINE WORKS (e.g. Combustion Chamber)
8 PROCESS MODELLING Contd
c1) PROCESS-TYPE EXPLANATIONS for why a certain amount of salt dissolved in water produces a different freezing point (FP) than expected.
THEORY: the change in FP is proportional to the (molar) concentration of salt.
OBSERVATION: the change is greater than expected.
QUALITATIVE / PROCESS explanation: the salt molecules dissociate (i.e. split into smaller components) in the liquid.
c2) Some chemicals have a lesser effect than expected, so we argue that the molecules associate in the liquid (i.e. huddle together).
d) PROCESS models for how Na+ and K+ ions move between cells.
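The freezing-point argument can be made concrete with a small sketch. It assumes the standard colligative relation, change in FP = i × Kf × m, where m is the molality, Kf is the cryoscopic constant of water (about 1.86 K kg/mol) and i is the number of dissolved particles per formula unit. The ideal values i = 2 (full dissociation into two ions) and i = 0.5 (pairs of molecules associating) are illustrative assumptions, not figures from the talk.

```python
KF_WATER = 1.86  # cryoscopic constant of water in K*kg/mol

def freezing_point_depression(molality, particles_per_unit=1.0):
    """Change in freezing point (K) = i * Kf * m, with i = particles_per_unit."""
    return particles_per_unit * KF_WATER * molality

expected = freezing_point_depression(0.5)          # THEORY: no dissociation
dissociated = freezing_point_depression(0.5, 2.0)  # each unit splits into 2 particles
associated = freezing_point_depression(0.5, 0.5)   # units pair up (idealised)
print(expected, dissociated, associated)
```

Dissociation raises the particle count and so gives a larger change than the naive theory predicts; association lowers it.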
9 - 4. QUALITATIVE REASONING
- (Not a Numerical and not a conventional Symbolic formalism)
- if a↑ then b↑
- if a↓ then b↓
- if a↑ and c↓ then b (no change)
- Detailed mathematical relationships may not be available, e.g.
- The physics of Snooker
- Why a car is harder to start in cold weather
- Effects of EXERCISE on HYPERTENSION
- BUT qualitative explanations may be available
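A minimal sketch of rule-based qualitative prediction follows. It assumes the three rules above relate directions of change (up, down, steady) of a, c and b; that reading, the rule table and the function name are all assumptions made for illustration.

```python
# Each rule: (required directions of change, resulting direction for b).
# More specific rules come first so that they take priority.
RULES = [
    ({"a": "up", "c": "down"}, "steady"),
    ({"a": "up"}, "up"),
    ({"a": "down"}, "down"),
]

def predict_b(observed):
    """Return the qualitative change in b for the observed changes, or 'unknown'."""
    for conditions, outcome in RULES:
        if all(observed.get(var) == direction for var, direction in conditions.items()):
            return outcome
    return "unknown"

print(predict_b({"a": "up"}))
print(predict_b({"a": "up", "c": "down"}))
print(predict_b({"c": "down"}))
```

No magnitudes appear anywhere: the prediction is a direction of change only, which is what distinguishes this style from numerical modelling.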
10 5. TAXONOMIC / SYMBOLIC REASONING
After a brief discussion of moving between different types of models
11 MOVING BETWEEN DIFFERENT TYPES OF MODELS
So often a scientist's primary model is a PROCESS model, but to see if it is consistent with experimental data it is often necessary to convert the process into some kind of QUANTITATIVE equations. But to make this possible/tractable the expert often has to introduce ASSUMPTIONS.
Example: much of classical Physics assumes that objects are WEIGHTLESS, FRICTIONLESS, RIGID.
Clearly it is important to record all assumptions made, as in some circumstances they may not apply.
So the CORRELATION between THEORY and DATA is often done using QUANTITATIVE equations of the types outlined above (algebraic, quantitative, statistical, etc.). THE RESULTS then often have to be reinterpreted by the scientist in terms of their PROCESS models. (Mention Simon's experiment on length and TIME)
12 MOVING BETWEEN DIFFERENT TYPES OF MODEL (footnote)
Process models themselves might be subject to major constraints / higher level principles of the subject domain. For example, any model of Reaction Kinetics (in chemistry) should predict that solutions will be neutral and not have an overall POSITIVE or NEGATIVE charge.
13 5. TAXONOMIC/SYMBOLIC MODELLING
IF member (herbivore, mammal) THEN is_a (herbivore, mammal)
IF member (zebra, herbivore) THEN is_a (zebra, herbivore)
=> is_a (zebra, mammal)
IF part-of (hand, arm) AND part-of (arm, body) THEN part-of (hand, body)
IF part-of (x, y) AND part-of (y, z) THEN part-of (x, z)
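The general rule (from part-of (x, y) and part-of (y, z) infer part-of (x, z)) is transitivity, and the same pattern gives is_a (zebra, mammal). A minimal sketch, assuming relations are stored as sets of pairs; the function name is hypothetical:

```python
def transitive_closure(pairs):
    """Repeatedly apply: (x, y) and (y, z) imply (x, z), until nothing new is added."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for x, y in list(closure):
            for y2, z in list(closure):
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

is_a = transitive_closure({("herbivore", "mammal"), ("zebra", "herbivore")})
part_of = transitive_closure({("hand", "arm"), ("arm", "body")})
print(("zebra", "mammal") in is_a, ("hand", "body") in part_of)
```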
14 - SYMBOLIC / LOGIC REASONING
- Theorem Provers
- Reasoning Engines which, given symbolic statements / rules and information / data, will produce decisions
- Diagnosis of a patient's illness (e.g. MYCIN)
- Design of a Lift
- Critique of a design
- Classification of a biopsy
- Tutoring / Training systems (e.g. NeoMYCIN)
- It is frequently important to persuade users that the inferences are correct, and so these systems often have EXPLANATION mechanisms.
15 HANDLING UNCERTAINTY IN KBSs
- Not all our knowledge is 100% certain
- Different approaches to uncertainty can be looked at along the following dimensions:
- What knowledge and data has uncertainty associated with it?
- How is this information represented?
- How are different pieces of evidence combined?
- How do different levels of certainty affect what the system does?
16 UNCERTAINTY IN MYCIN
- Uncertainty is represented using Certainty Factors (CF) whose value is in the range
- -1 ≤ CF ≤ 1
- CF = 1: the fact or rule is certainly true
- CF = 0: we know nothing about whether the fact or rule is true or not
- CF = -1: the fact or rule is certainly not true
17 UNCERTAINTY IN MYCIN
- Both knowledge and data can be represented as being uncertain.
- Rules (knowledge)
- IF
- the stain of the organism is Gram negative
- AND the morphology of the organism is rod
- AND the aerobicity of the organism is aerobic
- THEN the class of the organism is
enterobacteriaceae with confidence 0.9
18 UNCERTAINTY IN MYCIN: DATA
- The stain of the organism is definitely Gram negative (1.0).
- the morphology is rod, with confidence 0.8.
- the morphology is coccus, with confidence 0.2
- the aerobicity is aerobic, with confidence 0.6
- the aerobicity is anaerobic, with confidence 0.3
- the aerobicity is BOTH, with confidence 0.1
- THEN the class of the organism is
enterobacteriaceae with confidence 0.9
19 Uncertainty in MYCIN: conclusion
CFconclusion = CFrule × CFdata, where CFdata = min (CFd1, CFd2, .. CFdn)
In this instance:
CFconclusion = .9 × min (1.0, .8, .6)
CFconclusion = .9 × .6
CFconclusion = .54
IF several rules reach the same conclusion then there is an algorithm for combining the CFs
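The calculation on this slide can be sketched in a few lines, reading it as: the CF of a conclusion is the rule's CF multiplied by the smallest CF among the matched premises. The function name is hypothetical; the numbers are those of the enterobacteriaceae example (rule 0.9; Gram negative 1.0, rod 0.8, aerobic 0.6).

```python
def conclusion_cf(rule_cf, premise_cfs):
    """CF of a conclusion = CF of the rule * minimum CF over its (ANDed) premises."""
    return rule_cf * min(premise_cfs)

cf = conclusion_cf(0.9, [1.0, 0.8, 0.6])
print(round(cf, 2))
```

Taking the minimum reflects the conjunction in the rule's IF part: the premises together are only as certain as the weakest of them.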
20 UNCERTAINTY IN MYCIN: MULTIPLE CONCLUSIONS/RULES
- Example of several rules, hence several conclusions
- if A then B
- if C and D and E then B
- Calculate the CF for each Conclusion (for B); call these CFm and CFn
- Combine the various CFs for 2 Conclusions (CFm and CFn)
- If CFm > 0 and CFn > 0 then
- CF = CFm + CFn - CFm × CFn
- Process stops when CF = 1 OR CF = -1
- OTHER formulae if CFm and CFn are negative and if they have different signs
21 REFINEMENT of KNOWLEDGE BASES
- These systems inevitably require a sizeable amount of domain knowledge. This can be acquired from
- domain experts (KA)
- detailed examples (using ML techniques) etc
- However, for complex tasks these KBs are inevitably
- incomplete, when further Knowledge Acquisition is needed
- inconsistent, when the KB needs to be refined.
- also it is likely that background knowledge will be incomplete, thus requiring an expert to be present to act as an ORACLE.
- Hence the need for Co-operative Knowledge Acquisition and Knowledge Refinement Systems
22 CO-OPERATIVE KNOWLEDGE ACQUISITION AND KNOWLEDGE REFINEMENT SYSTEMS
KRUST (Classical KB, Classification) (Susan Craw)
STALKER (Efficient ATM-based implementation, Classification) (Leo Carbonara)
REFINER/Refiner (Case-base, Classification) (Sunil Sharma and Mark Winter)
CRIMSON (Refinement of Constraints) (Mark Winter)
RETAX (Revision of Taxonomies) (Eugenio Alberdi)
TIGON (Time Series Data/Causal Model, Diagnosis) (Fraser Mitchell)
SALT (Rules, Constraints, Propose and Revise) (Piero Leo)
23 KRUST: A KNOWLEDGE-BASED REFINEMENT SYSTEM
[Diagram labels: Rule-based Inference Engine; Task, T; Expert Solutions, E; KB for particular domain; System Solution, S]
24 REFINEMENT MECHANISM
[Diagram labels: Task, T; System Solution, S; Expert's Solution, E; Current KB; Set of Standard Tasks - Solutions; KB Revision System; Revised KB (which will solve Task T correctly and all the set of standard Tasks)]
We have implemented a series of such refinement systems, including KRUST, which refines MYCIN-like rules
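The acceptance condition for a revised KB (it must solve Task T as the expert does and still solve every standard task) can be sketched as below. This is only an illustration of that condition, not KRUST's actual algorithm: the toy "KB" is a lookup table, and every name and case in the code is hypothetical.

```python
def solve(kb, task):
    """Toy stand-in for an inference engine: the KB is a task -> answer table."""
    return kb.get(task)

def accept_revision(revised_kb, task, expert_solution, standard_cases):
    """Accept only if the revised KB agrees with the expert on the task
    and still reproduces the known solution of every standard task."""
    if solve(revised_kb, task) != expert_solution:
        return False
    return all(solve(revised_kb, t) == s for t, s in standard_cases)

standard_cases = [("case1", "benign"), ("case2", "malignant")]
current_kb = {"case1": "benign", "case2": "malignant", "case3": "benign"}
good_revision = dict(current_kb, case3="malignant")    # fixes case3 only
bad_revision = dict(good_revision, case1="malignant")  # fixes case3, breaks case1

accepted = [kb for kb in (good_revision, bad_revision)
            if accept_revision(kb, "case3", "malignant", standard_cases)]
print(len(accepted))
```

Checking candidates against the standard tasks guards against a revision that repairs the new case at the cost of previously correct behaviour.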
25 APPLICATIONS OF KNOWLEDGE REFINEMENT SYSTEMS
a) Re-TAX: Given an existing (faulty) TAXONOMY and the re-classified entities, the system produced the same revised Taxonomy as the Botanists. References: www.csd.abdn.ac.uk/sleeman
b) TIGON: Revised a qualitative process model of a turbine engine on the basis of feedback provided by the engineer. References: www.csd.abdn.ac.uk/sleeman
c) Planned Application: Develop a 3-compartmental model for dialysis which will be able to predict some observable measurement for the patient. Compare that with the actual patient data (provided by the dialysis machine) and fine tune the generic model to fit the actual patient's data set.
[Diagram labels: Dialysis Liquid; Blood; Extra Cellular Fluid (ECF); Intra Cellular Fluid (ICF)]
26 GENERAL PROBLEM
DATA set
THEORY
- Various ways to make the Data set and the Theory consistent
- Modify data only
- Modify theory only
- Modify data and theory
Cooperative KA and Kn Refinement Systems; Theory Refinement
References: www.abdn.ac.uk/Sleeman/
Initial KRUST Ref: S. CRAW and D. SLEEMAN, 1990. Automating the Refinement of Knowledge-Based Systems. In Proceedings of ECCAI-90. Luigia Aiello (Ed). London: Pitman, pp 167-172.
27 INFERRING NEW KNOWLEDGE
- NUMERICAL EXTRAPOLATION
- (Assume function is continuous and well behaved)
- SYMBOLIC / LOGIC INFERENCE
- Likes (Mary, apples) AND Likes (Mary, oranges)
- THEN
- Likes (Mary, fruit)
- NB uses Background Knowledge
- Likes (Mary, Golden-Delicious) THEN likes (Mary, apples)
- This area of AI is known as Machine Learning. (Some Machine Learning techniques are used in Co-operative Kn Acquisition and Kn Refinement systems)
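The symbolic inferences above can be sketched as generalisation over background knowledge. The sketch assumes the background knowledge is a small taxonomy (Golden-Delicious is a kind of apple; apples and oranges are kinds of fruit) and generalises two liked items to their nearest shared concept. The table and function names are hypothetical.

```python
# Background knowledge: each concept's immediate parent (an assumed toy taxonomy).
PARENT = {"Golden-Delicious": "apples", "apples": "fruit", "oranges": "fruit"}

def ancestors(concept):
    """The concept followed by its parents, from most to least specific."""
    chain = [concept]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def generalise(first, second):
    """Most specific concept covering both items, or None if there is none."""
    second_chain = set(ancestors(second))
    for concept in ancestors(first):
        if concept in second_chain:
            return concept
    return None

print(generalise("apples", "oranges"))
print(generalise("Golden-Delicious", "apples"))
```

Without the taxonomy neither conclusion could be drawn, which is the point of the NB about background knowledge.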
28 CHECK LIST FOR BIOLOGY PRESENTERS (to be extended and refined)
- What is the phenomenon being studied? Describe it in terms of processes etc.
- Are you trying to extend an existing model OR develop a new one?
- On the other hand, maybe you are planning to analyse an experimental data set (blind or semi-independently) and then see how it fits with an existing model?
- What is the nature of the actual data set(s) which you have?
- Will the result of your analysis be expressed as
- - numerical equations
- - qualitative relationships
- - symbolic information
- - a combination of the above
29 OTHER COMPUTING SCIENCE TOPICS which the SYSTEMS BIOLOGY group might like to explore
- Qualitative Reasoning
- Machine Learning / Relational Learning / Data Mining
- Scientific Discovery (i.e. systems which discover scientific laws, e.g. the BACON system)
- Analyses of Real-Time data sets
- Cognitive Studies of Scientific Reasoning (individuals and at the group / lab level)
30 QUESTIONS and DISCUSSION
31 THREE STAGES TO COMBINING EVIDENCE
- Multiple conclusion combination
- suppose a fact is known with a certainty CFp and another rule brings evidence CFn.
- How the combined CF is calculated depends on the signs of CFp and CFn
- If CFp > 0 and CFn > 0 then CF = CFp + CFn - CFp × CFn
- If CFp < 0 and CFn < 0 then CF = CFp + CFn + CFp × CFn
- OTHERWISE CF = (CFp + CFn) / (1 - min(abs(CFp), abs(CFn)))
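The three cases can be put into one function. This sketch assumes the usual MYCIN-style reading of the formulae: both positive gives CFp + CFn - CFp × CFn, both negative gives CFp + CFn + CFp × CFn, and otherwise the sum is divided by 1 - min(|CFp|, |CFn|). The function name is hypothetical, and the case CFp = 1 with CFn = -1 (total contradiction) is deliberately left unhandled because it would divide by zero.

```python
def combine_cf(cf_p, cf_n):
    """Combine two certainty factors in [-1, 1] that bear on the same conclusion."""
    if cf_p > 0 and cf_n > 0:
        return cf_p + cf_n - cf_p * cf_n
    if cf_p < 0 and cf_n < 0:
        return cf_p + cf_n + cf_p * cf_n
    # Mixed signs, or at least one CF equal to zero.
    return (cf_p + cf_n) / (1 - min(abs(cf_p), abs(cf_n)))

print(round(combine_cf(0.54, 0.5), 4))   # two supporting rules
print(round(combine_cf(-0.5, -0.5), 4))  # two rules against
print(round(combine_cf(0.8, -0.4), 4))   # conflicting evidence
```

Two agreeing pieces of evidence reinforce each other without the result ever leaving the range -1 to 1, and a CF of 0 leaves the other value unchanged.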