Title: CIS 830 Advanced Topics in AI Lecture 45 of 45
Lecture 45
Course Review and Future Research Directions
Friday, May 5, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-10, 13, Mitchell; Chapters 14-21, Russell and Norvig
Main Themes: Artificial Intelligence and KDD
- Analytical Learning: Combining Symbolic and Numerical AI
  - Inductive learning
  - Role of knowledge and deduction in integrated inductive and analytical learning
- Artificial Neural Networks (ANNs) for KDD
  - Common neural representations; current limitations
  - Incorporating knowledge into ANN learning
- Uncertain Reasoning in Decision Support
  - Probabilistic knowledge representation
  - Bayesian knowledge and data engineering (KDE): elicitation, causality
- Data Mining: KDD applications
  - Role of causality and explanations in KDD
  - Framework for data mining: wrappers for performance enhancement
- Genetic Algorithms (GAs) for KDD
  - Evolutionary algorithms (GAs, GP) as optimization wrappers
  - Introduction to classifier systems
Class 0: A Brief Overview of Machine Learning
- Overview: Topics, Applications, Motivation
- Learning: Improving with Experience at Some Task
  - Improve over task T,
  - with respect to performance measure P,
  - based on experience E.
- Brief Tour of Machine Learning
  - A case study
  - A taxonomy of learning
  - Intelligent systems engineering: specification of learning problems
- Issues in Machine Learning
  - Design choices
  - The performance element: intelligent systems
- Some Applications of Learning
  - Database mining, reasoning (inference/decision support), acting
  - Industrial usage of intelligent systems
Class 1: Integrating Analytical and Inductive Learning
- Learning Specification (Inductive, Analytical)
  - Instances X, target function (concept) c: X → H, hypothesis space H
  - Training examples D: positive, negative examples of target function c
  - Analytical learning: also given domain theory T for explaining examples
- Domain Theories
  - Expressed in formal language: propositional logic, predicate logic
  - Set of assertions (e.g., well-formed formulae) for reasoning about domain
  - Expresses constraints over relations (predicates) within model
  - Example: Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y).
- Determine
  - Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
  - Such h are consistent with training data and domain theory T
- Integration Approaches
  - Explanation (proof and derivation)-based learning: EBL
  - Pseudo-experience: incorporating knowledge of environment, actuators
  - Top-down decomposition: programmatic (procedural) knowledge, advice
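The Ancestor domain-theory rule above can be sketched as a recursive predicate over a parent relation; the names and the relation table here are purely hypothetical illustration data:

```python
# Sketch of the domain-theory rule
#   Ancestor(x, y) <- Parent(x, z) ^ Ancestor(z, y)
# plus the base case Ancestor(x, y) <- Parent(x, y).
PARENTS = {
    "alice": ["bob"],    # Parent(alice, bob)
    "bob": ["carol"],    # Parent(bob, carol)
}

def ancestor(x, y):
    """True if x is an ancestor of y under the rules above."""
    for z in PARENTS.get(x, []):
        if z == y or ancestor(z, y):
            return True
    return False

print(ancestor("alice", "carol"))  # True: alice -> bob -> carol
print(ancestor("carol", "alice"))  # False
```

A learner consistent with this domain theory must label every instance the way such derivations do.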
Classes 2-3: Explanation-Based Neural Networks
- Paper
  - Topic: Explanation-Based and Inductive Learning in ANNs
  - Title: Integrating Inductive Neural Network Learning and EBL
  - Authors: Thrun and Mitchell
  - Presenter: William Hsu
- Key Strengths
  - Idea: (state, action)-to-state mappings as steps in generalizable proof (explanation) for observed episode
  - Generalizable approach (significant for RL, other learning-to-predict inducers)
- Key Weaknesses
  - Other numerical learning models (HMMs, DBNs) may be more suited to EBG
  - Tradeoff: domain theory of EBNN lacks semantic clarity of symbolic EBL
- Future Research Issues
  - How to get the best of both worlds (clear DT, ability to generate explanations)?
  - Applications to explanation in commercial, military, legal decision support
  - See work by Thrun, Mitchell, Shavlik, Towell, Pearl, Heckerman
Classes 4-5: Phantom Induction
- Paper
  - Topic: Distal Supervised Learning and Phantom Induction
  - Title: Iterated Phantom Induction: A Little Knowledge Can Go a Long Way
  - Authors: Brodie and DeJong
  - Presenter: Steve Gustafson
- Key Strengths
  - Idea: apply knowledge to generate (pseudo-experiential) training data
  - Speedup learning: learning curve significantly shortened with respect to RL by application of a small amount of knowledge
- Key Weaknesses
  - Haven't yet seen how to produce plausible, comprehensible explanations
  - How much knowledge is a small amount? (How to measure?)
- Future Research Issues
  - Control, planning domains similar (but not identical) to robot games
  - Applications: adaptive (e.g., ANN, BBN, MDP, GA) agent control, planning
  - See work by Brodie, DeJong, Rumelhart, McClelland, Sutton, Barto
Classes 6-7: Top-Down Hybrid Learning
- Paper
  - Topic: Learning with Prior Knowledge
  - Title: A Divide-and-Conquer Approach to Learning from Prior Knowledge
  - Authors: Chown and Dietterich
  - Presenter: Aiming Wu
- Key Strengths
  - Idea: apply programmatic (procedural) knowledge to select training data
  - Uses simulation to boost inductive learning performance (cf. model checking)
  - Divide-and-conquer approach (multiple experts)
- Key Weaknesses
  - Doesn't illustrate form, structure of programmatic knowledge clearly
  - Doesn't systematize and formalize model checking / simulation approach
- Future Research Issues
  - Model checking and simulation-driven hybrid learning
  - Applications: consensus under uncertainty, simulation-based optimization
  - See work by Dietterich, Frawley, Mitchell, Darwiche, Pearl
Classes 8-9: Learning Using Prior Knowledge
- Paper
  - Topic: Refinement of Approximate Domain-Theoretic Knowledge
  - Title: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks
  - Authors: Towell, Shavlik, and Noordewier
  - Presenter: Li-Jun Wang
- Key Strengths
  - Idea: build relational explanations, compile into ANN representation
  - Applies structural, functional, constraint-based knowledge
  - Uses ANN to further refine domain theory
- Key Weaknesses
  - Can't get refined domain theory back!
  - Explanations also no longer clear after compilation (transformation) process
- Future Research Issues
  - How to retain semantic clarity of explanations, DT, knowledge representation
  - Applications: intelligent filters (e.g., fraud detection), decision support
  - See work by Shavlik, Towell, Maclin, Sun, Schwalb, Heckerman
Class 10: Introduction to Artificial Neural Networks
- Architectures
  - Nonlinear transfer functions
  - Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
  - Hidden layer representations
- Backpropagation of Error
  - The backpropagation algorithm
  - Relation to error gradient function for nonlinear units
  - Derivation of training rule for feedforward multi-layer networks
  - Training issues: local optima, overfitting
- References: Chapter 4, Mitchell; Chapter 4, Bishop; Rumelhart et al.
- Research Issues: How to
  - Learn from observation, rewards and penalties, and advice
  - Distribute rewards and penalties through learning model, over time
  - Generate pseudo-experiential training instances in pattern recognition
  - Partition learning problems on the fly, via (mixture) parameter estimation
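A minimal sketch of the backpropagation training rule for a feedforward multi-layer network of sigmoid units, on toy XOR data; the hidden-layer size, seed, learning rate, and iteration count are illustrative choices, not values from the lecture:

```python
import numpy as np

# Tiny 2-layer sigmoid network trained by backpropagation of error.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR target

W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse():
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return float(np.mean((out - y) ** 2))

loss_before = mse()
for _ in range(5000):
    # Forward pass through the nonlinear (sigmoid) units
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: error gradients for each layer of nonlinear units
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates (learning rate 0.5)
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h
loss_after = mse()
print(loss_before, loss_after)
```

Since training can stall in local optima (one of the issues listed above), only the decrease in squared error is claimed here, not exact convergence.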
Classes 11-12: Reinforcement Learning and Advice
- Paper
  - Topic: Knowledge and Reinforcement Learning in Intelligent Agents
  - Title: Incorporating Advice into Agents that Learn from Reinforcements
  - Authors: Maclin and Shavlik
  - Presenter: Kiranmai Nandivada
- Key Strengths
  - Idea: compile advice into ANN representation for RL
  - Advice expressed in terms of constraint-based knowledge
  - Like KBANN, achieves knowledge refinement through ANN training
- Key Weaknesses
  - Like KBANN, lose semantic clarity of advice, policy, explanations
  - How to evaluate refinement effectively? Quantitatively? Logically?
- Future Research Issues
  - How to retain semantic clarity of explanations, DT, knowledge representation
  - Applications: intelligent agents, web mining (spiders, search engines), games
  - See work by Shavlik, Maclin, Stone, Veloso, Sun, Sutton, Pearl, Kuipers
Classes 13-14: Reinforcement Learning Over Time
- Paper
  - Topic: Temporal-Difference Reinforcement Learning
  - Title: TD Models: Modeling the World at a Mixture of Time Scales
  - Author: Sutton
  - Presenter: Vrushali Koranne
- Key Strengths
  - Idea: combine state-action evaluation function (Q) estimates over multiple time steps of lookahead
  - Effective temporal credit assignment (TCA)
  - Biologically plausible (simulates TCA aspects of dopaminergic system)
- Key Weaknesses
  - TCA methodology is effective but semantically hard to comprehend
  - Slow convergence: can knowledge help? How will we judge?
- Future Research Issues
  - How to retain clarity, improve convergence speed, of multi-time RL models
  - Applications: control systems, robotics, game playing
  - See work by Sutton, Barto, Mitchell, Kaelbling, Smyth, Shafer, Goldberg
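The core temporal-difference idea behind these multi-time models can be sketched as a one-step TD(0) value update on a toy state chain; the states, reward, step size, and discount here are hypothetical:

```python
# One-step temporal-difference (TD(0)) value update:
#   V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
ALPHA, GAMMA = 0.1, 0.9
V = {"s0": 0.0, "s1": 0.0, "s2": 0.0}

def td_update(V, s, r, s_next):
    """Apply the TD(0) update for an observed transition s -> s_next."""
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

# One observed transition: s0 -> s1 with reward 1.0
td_update(V, "s0", 1.0, "s1")
print(V["s0"])  # 0.1 after one update
```

TD(λ) and Sutton's multi-time-scale models generalize this by mixing such backups over many lookahead depths, which is where the temporal credit assignment happens.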
Classes 15-16: Generative Neural Models
- Paper
  - Topic: Pattern Recognition using Unsupervised ANNs
  - Title: The Wake-Sleep Algorithm for Unsupervised Neural Networks
  - Authors: Hinton, Dayan, Frey, and Neal
  - Presenter: Prasanna Jayaraman
- Key Strengths
  - Idea: use 2-phase algorithm to generate training instances (dream stage) and maximize conditional probability of data given model (wake stage)
  - Compare: expectation-maximization (EM) algorithm
  - Good for image recognition
- Key Weaknesses
  - Not all data admits this approach (small samples, ill-defined features)
  - Not immediately clear how to use for problem-solving performance elements
- Future Research Issues
  - Studying information theoretic properties of Helmholtz machine
  - Applications: image/speech/signal recognition, document categorization
  - See work by Hinton, Dayan, Frey, Neal, Kirkpatrick, Hajek, Ghahramani
Classes 17-18: Modularity in Neural Systems
- Paper
  - Topic: Combining Models using Modular ANNs
  - Title: Modular and Hierarchical Learning Systems
  - Authors: Jordan and Jacobs
  - Presenter: Afrand Agah
- Key Strengths
  - Idea: use interleaved EM update steps to update expert, gating components
  - Effect: forces specialization among ANN components (GLIMs); boosts performance of single experts; very fast convergence in some cases
  - Explores modularity in neural systems (artificial and biological)
- Key Weaknesses
  - Often cannot achieve higher accuracy than ML, MAP, Bayes optimal estimation
  - Doesn't provide experts that specialize in spatial, temporal pattern recognition
- Future Research Issues
  - Constructing, selecting mixtures of other ANN components (not just GLIMs)
  - Applications: pattern recognition, time series prediction
  - See work by Jordan, Jacobs, Nowlan, Hinton, Barto, Jaakkola, Hsu
Class 19: Introduction to Probabilistic Reasoning
- Architectures
  - Bayesian (Belief) Networks
    - Tree structured, polytrees
    - General
  - Decision networks
  - Temporal variants (beyond scope of this course)
- Parameter Estimation
  - Maximum likelihood (MLE), maximum a posteriori (MAP)
  - Bayes optimal classification, Bayesian learning
- References: Chapter 6, Mitchell; Chapters 14-15, 19, Russell and Norvig
- Research Issues: How to
  - Learn from observation, rewards and penalties, and advice
  - Distribute rewards and penalties through learning model, over time
  - Generate pseudo-experiential training instances in pattern recognition
  - Partition learning problems on the fly, via (mixture) parameter estimation
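The MLE/MAP distinction above can be illustrated for a Bernoulli parameter with a conjugate Beta prior; the counts and the prior are made-up illustration values:

```python
# MLE vs. MAP estimates of a coin's bias theta from 7 heads, 3 tails,
# with a Beta(2, 2) prior (hypothetical numbers).
heads, tails = 7, 3
a, b = 2, 2  # Beta prior pseudo-counts

# MLE: argmax_theta P(D | theta) -> relative frequency
theta_mle = heads / (heads + tails)

# MAP: argmax_theta P(D | theta) P(theta) -> prior pulls estimate toward 1/2
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)

print(theta_mle, theta_map)  # 0.7 versus ~0.667
```

With more data the pseudo-counts are swamped and the MAP estimate converges to the MLE; Bayes optimal classification instead averages over all of H rather than picking one estimate.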
Classes 20-21: Approaches to Uncertain Reasoning
- Paper
  - Topic: The Case for Probability
  - Title: In Defense of Probability
  - Author: Cheeseman
  - Presenter: Pallavi Paranjape
- Key Strengths
  - Idea: probability is a mathematically sound way to represent uncertainty
  - Views of probability considered: objectivist, frequentist, logicist, subjectivist
  - Argument made for meta-subjectivist belief measure concept of probability
- Key Weaknesses
  - Highly dogmatic view without concrete justification for all assertions
  - Does not quantitatively, empirically compare Bayesian, non-Bayesian methods
- Future Research Issues
  - Integrating symbolic and numerical (statistical) models of uncertainty
  - Applications: uncertain reasoning, pattern recognition, learning
  - See work by Cheeseman, Cox, Good, Pearl, Zadeh, Dempster, Shafer
Classes 22-23: Learning Bayesian Network Structure
- Paper
  - Topic: Learning Bayesian Networks from Data
  - Title: Learning Bayesian Network Structure from Massive Datasets
  - Authors: Friedman, Pe'er, and Nachman
  - Presenter: Jincheng Gao
- Key Strengths
  - Idea: can use graph constraints, scoring functions to select candidate parents in constructing directed graph model of probability (BBN)
  - Tabu search, greedy score-based methods (K2), etc. also considered
- Key Weaknesses
  - Optimal Bayesian network structure learning still intractable for conventional (single-instruction sequential) architectures
  - More empirical comparison among alternative methods warranted
- Future Research Issues
  - Scaling up to massive real-world data sets (e.g., medical, agricultural, DSS)
  - Applications: diagnosis, troubleshooting, user modeling, intelligent HCI
  - See work by Friedman, Goldszmidt, Heckerman, Cooper, Beinlich, Koller
Classes 24-25: Bayesian Networks for User Modeling
- Paper
  - Topic: Decision Support Systems and Bayesian User Modeling
  - Title: The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
  - Authors: Horvitz, Breese, Heckerman, Hovel, and Rommelse
  - Presenter: Yuhui (Cathy) Liu
- Key Strengths
  - Idea: BBN model is developed from user logs, used to infer mode of usage
  - Can infer goals, skill level of user
- Key Weaknesses
  - Need high accuracy in inferring goals to deliver meaningful content
  - May be better to use next-generation search engine (more interactivity, less passive monitoring)
- Future Research Issues
  - Designing better interactive user modeling
  - Applications: clickstream monitoring, e-commerce, web search, help
  - See work by Horvitz, Breese, Heckerman, Lee, Huang
Classes 26-27: Causal Reasoning
- Paper
  - Topic: KDD and Causal Reasoning
  - Title: Symbolic Causal Networks for Reasoning about Actions and Plans
  - Authors: Darwiche and Pearl
  - Presenter: Yue Jiao
- Key Strengths
  - Idea: use BBN to represent symbolic constraint knowledge
  - Can use to generate mechanistic explanations
  - Model actions
  - Model sequences of actions (plans)
- Key Weaknesses
  - Integrative methods (numerical, symbolic BBNs) still need exploration
  - Unclear how to incorporate methods for learning to plan
- Future Research Issues
  - Reasoning about systems
  - Applications: uncertain reasoning, pattern recognition, learning
  - See work by Horvitz, Breese, Heckerman, Lee, Huang
Classes 28-29: Knowledge Discovery from Scientific Data
- Paper
  - Topic: KDD for Scientific Data Analysis
  - Title: KDD for Science Data Analysis: Issues and Examples
  - Authors: Fayyad, Haussler, and Stolorz
  - Presenter: Arulkumar Elumalai
- Key Strengths
  - Idea: investigate how and whether KDD techniques (OLAP, learning) scale up to huge data sets
  - Answer: it depends on computational complexity and many other factors
- Key Weaknesses
  - Haven't yet developed a clear theory of how to assess how much data is really needed
  - No technical treatment or characterization of data cleaning
- Future Research Issues
  - Data cleaning (aka data cleansing), pre- and post-processing (OLAP)
  - Applications: intelligent databases, visualization, high-performance CSE
  - See work by Fayyad, Smyth, Uthurusamy, Haussler, Foster
Classes 30-31: Relevance Determination
- Paper
  - Topic: Relevance Determination in KDD
  - Title: Irrelevant Features and the Subset Selection Problem
  - Authors: John, Kohavi, and Pfleger
  - Presenter: DingBing Yang
- Key Strengths
  - Idea: cast problem of choosing relevant attributes (given top-level learning problem specification) as search
  - Effective state space search (A/A*-based) approach demonstrated
- Key Weaknesses
  - May not have good enough heuristics!
  - Can either develop them (via information theory) or use MCMC methods
- Future Research Issues
  - Selecting relevant data channels from continuous sources (e.g., sensors)
  - Applications: bioinformatics (genomics, proteomics, etc.), prognostics
  - See work by Kohavi, John, Rendell, Donoho, Hsu, Provost
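Casting feature subset selection as state-space search, as above, can be sketched with a minimal greedy forward-selection loop; the scoring function here is a made-up stand-in for, say, cross-validated inducer accuracy:

```python
# Greedy forward feature-subset selection: each search state is a
# feature subset, scored by an evaluation function `score`.
def forward_select(features, score):
    """Grow the subset one feature at a time while the score improves."""
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, selected = s, selected + [f]
                improved = True
    return selected, best

# Toy scorer: only "a" and "c" are relevant; extras slightly hurt.
relevant = {"a": 0.6, "c": 0.3}
score = lambda subset: sum(relevant.get(f, -0.1) for f in subset)

selected, best = forward_select(["a", "b", "c", "d"], score)
print(selected)  # ['a', 'c']
```

Heuristic search (A*-style) over the same state space replaces this greedy expansion with a frontier ordered by estimated subset quality.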
Classes 32-33: Learning for Text Document Categorization
- Paper
  - Topic: Text Documents and Information Retrieval (IR)
  - Title: Hierarchically Classifying Documents using Very Few Words
  - Authors: Koller and Sahami
  - Presenter: Yan Song
- Key Strengths
  - Idea: use rank-frequency scoring methods to find keywords that make a difference
  - Break into meaningful hierarchy
- Key Weaknesses
  - Sometimes need to derive semantically meaningful cluster labels
  - How to integrate this method with dynamic cluster segmentation, labeling?
- Future Research Issues
  - Bayesian architectures using non-Bayesian learning algorithms (e.g., GAs)
  - Applications: digital libraries (hierarchical, distributed dynamic indexing), intelligent search engines, intelligent displays (and help indices)
  - See work by Koller, Sahami, Roth, Charniak, Brill, Yarowsky
Classes 34-35: Web Mining
- Paper
  - Topic: KDD and the Web
  - Title: Learning to Extract Symbolic Knowledge from the World Wide Web
  - Authors: Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, and Slattery
  - Presenter: Ping Zou
- Key Strengths
  - Idea: build probabilistic model of web documents using keywords that matter
  - Use probabilistic model to represent knowledge for indexing into web database
- Key Weaknesses
  - How to account for concept drift?
  - How to explain and express constraints (e.g., proper nouns that are person names don't matter)? Not considered here
- Future Research Issues
  - Using natural language processing (NLP), image / audio / signal processing
  - Applications: searchable hypermedia, digital libraries, spiders, other agents
  - See work by McCallum, Mitchell, Roth, Sahami, Pratt, Lee
Class 36: Introduction to Evolutionary Computation
- Architectures
  - Genetic algorithms (GAs), genetic programming (GP), genetic wrappers
  - Simple vs. parameterless GAs
- Issues
  - Loss of diversity
    - Consequence: collapse of Pareto front
    - Solutions: niching (sharing, preselection, crowding)
  - Parameterless GAs
  - Other issues (not covered): genetic drift, population sizing, etc.
- References: Chapter 9, Mitchell; Chapters 1-6, Goldberg; Chapters 1-5, Koza
- Research Issues: How to
  - Design GAs based on credit assignment system (in performance element)
  - Build hybrid analytical / inductive learning GP systems
  - Use GAs to perform relevance determination in KDD
  - Control diversity in GA solutions for hard optimization problems
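A minimal simple-GA sketch (binary tournament selection, one-point crossover, bit-flip mutation) on the toy OneMax problem of maximizing the number of 1-bits; the population size, rates, and generation count are arbitrary illustrative choices:

```python
import random

# Simple GA on OneMax: fitness = number of 1-bits in the chromosome.
random.seed(0)
N_BITS, POP, GENS = 20, 30, 60

def fitness(ind):
    return sum(ind)

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    nxt = []
    while len(nxt) < POP:
        # Binary tournament selection
        p1 = max(random.sample(pop, 2), key=fitness)
        p2 = max(random.sample(pop, 2), key=fitness)
        # One-point crossover
        cut = random.randrange(1, N_BITS)
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation at rate 1/N_BITS
        child = [b ^ (random.random() < 1 / N_BITS) for b in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best))
```

This simple GA has none of the diversity controls listed above; niching methods such as fitness sharing would penalize crowded regions of the population before selection.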
Classes 37-38: Genetic Algorithms and Classifier Systems
- Paper
  - Topic: Classifier Systems and Inductive Learning
  - Title: Generalization in the XCS Classifier System
  - Author: Wilson
  - Presenter: Elizabeth Loza-Garay
- Key Strengths
  - Idea: incorporate performance element (classifier system) into GA design
  - Solid theoretical foundation: advanced building block (aka schema) theory
  - Can use to engineer more efficient GA model, tune parameters
- Key Weaknesses
  - Need to progress from toy problems (e.g., MUX learning) to real-world ones
  - Need to investigate scaling up of GA principles (e.g., building block mixing)
- Future Research Issues
  - Building block scalability in classifier systems
  - Applications: reinforcement learning, mobile robotics, other animats, a-life
  - See work by Wilson, Goldberg, Holland, Booker
Classes 39-40: Knowledge-Based Genetic Programming
- Paper
  - Topic: Genetic Programming and Multistrategy Learning
  - Title: Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach
  - Authors: Aler, Borrajo, and Isasi
  - Presenter: Yuhong Cheng
- Key Strengths
  - Idea: use knowledge-based system to calibrate starting state of MCMC optimization system (here, GP)
  - Can incorporate knowledge (as in CIS 830 Part 1 of 5)
- Key Weaknesses
  - Generalizability of HAMLET population seeding method not well established
  - General-purpose problem solving systems can become Rube Goldberg-ian
- Future Research Issues
  - Using multistrategy GP systems to provide knowledge-based decision support
  - Applications: logistics (military, industrial, commercial), other problem solving
  - See work by Aler, Borrajo, Isasi, Carbonell, Minton, Koza, Veloso
Classes 41-42: Genetic Wrappers for Inductive Learning
- Paper
  - Topic: Genetic Wrappers for KDD Performance Enhancement
  - Title: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm
  - Authors: Raymer, Punch, Goodman, Sanschagrin, and Kuhn
  - Presenter: Karthik K. Krishnakumar
- Key Strengths
  - Idea: use GA to empirically (statistically) validate inducer
  - Can use to select, synthesize attributes (aka features)
  - Can also use to tune other GA parameters (hence "wrapper")
- Key Weaknesses
  - Systematic experimental studies of genetic wrappers have not yet been done
  - Wrappers don't yet take performance element into explicit account
- Future Research Issues
  - Improving supervised learning inducers (e.g., in MLC++)
  - Applications: better combiners; feature subset selection, construction
  - See work by Raymer, Punch, Cherkauer, Shavlik, Freitas, Hsu, Cantu-Paz
Classes 43-44: Genetic Algorithms for Optimization
- Paper
  - Topic: Genetic Optimization and Decision Support
  - Title: A Niched Pareto Genetic Algorithm for Multiobjective Optimization
  - Authors: Horn, Nafpliotis, and Goldberg
  - Presenter: Li Lian
- Key Strengths
  - Idea: control representation of neighborhoods of Pareto optimal front by niching
  - Gives abstract and concrete case studies of niching (sharing) effects
- Key Weaknesses
  - Need systematic exploration, characterization of "sweet spot"
  - Shows static comparisons, not small-multiple visualizations that led to them
- Future Research Issues
  - Biologically (ecologically) plausible models
  - Applications: engineering (ag / bio, civil, computational, environmental, industrial, mechanical, nuclear) optimization; computational life sciences
  - See work by Goldberg, Horn, Schwefel, Punch, Minsker, Kargupta
Class 45: Meta-Summary
- Data Mining / KDD Problems
  - Business decision support
  - Classification
  - Recommender systems
  - Control and policy optimization
- Data Mining / KDD Solutions: Machine Learning, Inference Techniques
  - Models
    - Version space, decision tree, perceptron, winnow
    - ANN, BBN, SOM
    - Q functions
    - GA building blocks (schemata), GP building blocks
  - Algorithms
    - Candidate elimination, ID3, delta rule, MLE, Simple (Naïve) Bayes
    - K2, EM, backprop, SOM convergence, LVQ, ADP, simulated annealing
    - Q-learning, TD(λ)
    - Simple GA, GP