Title: CIS 830 Advanced Topics in AI Lecture 45 of 45
Lecture 45
Course Review and Future Research Directions
Friday, May 5, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-10, 13, Mitchell; Chapters 14-21, Russell and Norvig
Main Themes: Artificial Intelligence and KDD
- Analytical Learning: Combining Symbolic and Numerical AI
  - Inductive learning
  - Role of knowledge and deduction in integrated inductive and analytical learning
- Artificial Neural Networks (ANNs) for KDD
  - Common neural representations; current limitations
  - Incorporating knowledge into ANN learning
- Uncertain Reasoning in Decision Support
  - Probabilistic knowledge representation
  - Bayesian knowledge and data engineering (KDE): elicitation, causality
- Data Mining: KDD applications
  - Role of causality and explanations in KDD
  - Framework for data mining: wrappers for performance enhancement
- Genetic Algorithms (GAs) for KDD
  - Evolutionary algorithms (GAs, GP) as optimization wrappers
  - Introduction to classifier systems
Class 0: A Brief Overview of Machine Learning
- Overview: Topics, Applications, Motivation
- Learning: Improving with Experience at Some Task
  - Improve over task T,
  - with respect to performance measure P,
  - based on experience E.
- Brief Tour of Machine Learning
  - A case study
  - A taxonomy of learning
  - Intelligent systems engineering: specification of learning problems
- Issues in Machine Learning
  - Design choices
  - The performance element: intelligent systems
- Some Applications of Learning
  - Database mining, reasoning (inference/decision support), acting
  - Industrial usage of intelligent systems
Class 1: Integrating Analytical and Inductive Learning
- Learning Specification (Inductive, Analytical)
  - Instances X, target function (concept) c: X → H, hypothesis space H
  - Training examples D: positive, negative examples of target function c
  - Analytical learning: also given domain theory T for explaining examples
- Domain Theories
  - Expressed in formal language: propositional logic, predicate logic
  - Set of assertions (e.g., well-formed formulae) for reasoning about domain
  - Expresses constraints over relations (predicates) within model
  - Example: Ancestor(x, y) ← Parent(x, z) ∧ Ancestor(z, y).
- Determine
  - Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D
  - Such h are consistent with training data and domain theory T
- Integration Approaches
  - Explanation (proof and derivation)-based learning: EBL
  - Pseudo-experience: incorporating knowledge of environment, actuators
  - Top-down decomposition: programmatic (procedural) knowledge, advice
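The Ancestor domain-theory rule above can be sketched as a recursive predicate over a parent relation; the names and the relation table here are purely hypothetical illustration data:

```python
# Sketch of the domain-theory rule
#   Ancestor(x, y) <- Parent(x, z) ^ Ancestor(z, y)
# plus the base case Ancestor(x, y) <- Parent(x, y).
PARENTS = {
    "alice": ["bob"],    # Parent(alice, bob)
    "bob": ["carol"],    # Parent(bob, carol)
}

def ancestor(x, y):
    """True if x is an ancestor of y under the rules above."""
    for z in PARENTS.get(x, []):
        if z == y or ancestor(z, y):
            return True
    return False

print(ancestor("alice", "carol"))  # True: alice -> bob -> carol
print(ancestor("carol", "alice"))  # False
```

A learner consistent with this domain theory must label every instance the way such derivations do.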
Classes 2-3: Explanation-Based Neural Networks
- Paper
  - Topic: Explanation-Based and Inductive Learning in ANNs
  - Title: Integrating Inductive Neural Network Learning and EBL
  - Authors: Thrun and Mitchell
  - Presenter: William Hsu
- Key Strengths
  - Idea: (state, action)-to-state mappings as steps in generalizable proof (explanation) for observed episode
  - Generalizable approach (significant for RL, other learning-to-predict inducers)
- Key Weaknesses
  - Other numerical learning models (HMMs, DBNs) may be more suited to EBG
  - Tradeoff: domain theory of EBNN lacks semantic clarity of symbolic EBL
- Future Research Issues
  - How to get the best of both worlds (clear DT, ability to generate explanations)?
  - Applications to explanation in commercial, military, legal decision support
  - See work by Thrun, Mitchell, Shavlik, Towell, Pearl, Heckerman
Classes 4-5: Phantom Induction
- Paper
  - Topic: Distal Supervised Learning and Phantom Induction
  - Title: Iterated Phantom Induction: A Little Knowledge Can Go a Long Way
  - Authors: Brodie and DeJong
  - Presenter: Steve Gustafson
- Key Strengths
  - Idea: apply knowledge to generate (pseudo-experiential) training data
  - Speedup learning: learning curve significantly shortened with respect to RL by application of a small amount of knowledge
- Key Weaknesses
  - Haven't yet seen how to produce plausible, comprehensible explanations
  - How much knowledge is a small amount? (How to measure?)
- Future Research Issues
  - Control, planning domains similar (but not identical) to robot games
  - Applications: adaptive (e.g., ANN, BBN, MDP, GA) agent control, planning
  - See work by Brodie, DeJong, Rumelhart, McClelland, Sutton, Barto
Classes 6-7: Top-Down Hybrid Learning
- Paper
  - Topic: Learning with Prior Knowledge
  - Title: A Divide-and-Conquer Approach to Learning from Prior Knowledge
  - Authors: Chown and Dietterich
  - Presenter: Aiming Wu
- Key Strengths
  - Idea: apply programmatic (procedural) knowledge to select training data
  - Uses simulation to boost inductive learning performance (cf. model checking)
  - Divide-and-conquer approach (multiple experts)
- Key Weaknesses
  - Doesn't illustrate form, structure of programmatic knowledge clearly
  - Doesn't systematize and formalize model checking / simulation approach
- Future Research Issues
  - Model checking and simulation-driven hybrid learning
  - Applications: consensus under uncertainty, simulation-based optimization
  - See work by Dietterich, Frawley, Mitchell, Darwiche, Pearl
Classes 8-9: Learning Using Prior Knowledge
- Paper
  - Topic: Refinement of Approximate Domain-Theoretic Knowledge
  - Title: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks
  - Authors: Towell, Shavlik, and Noordewier
  - Presenter: Li-Jun Wang
- Key Strengths
  - Idea: build relational explanations, compile into ANN representation
  - Applies structural, functional, constraint-based knowledge
  - Uses ANN to further refine domain theory
- Key Weaknesses
  - Can't get refined domain theory back!
  - Explanations also no longer clear after compilation (transformation) process
- Future Research Issues
  - How to retain semantic clarity of explanations, DT, knowledge representation
  - Applications: intelligent filters (e.g., fraud detection), decision support
  - See work by Shavlik, Towell, Maclin, Sun, Schwalb, Heckerman
Class 10: Introduction to Artificial Neural Networks
- Architectures
  - Nonlinear transfer functions
  - Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
  - Hidden layer representations
- Backpropagation of Error
  - The backpropagation algorithm
  - Relation to error gradient function for nonlinear units
  - Derivation of training rule for feedforward multi-layer networks
  - Training issues: local optima, overfitting
- References: Chapter 4, Mitchell; Chapter 4, Bishop; Rumelhart et al.
- Research Issues: How to
  - Learn from observation, rewards and penalties, and advice
  - Distribute rewards and penalties through learning model, over time
  - Generate pseudo-experiential training instances in pattern recognition
  - Partition learning problems on the fly, via (mixture) parameter estimation
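A minimal sketch of the backpropagation training rule for a feedforward multi-layer network of sigmoid units, on toy XOR data; the hidden-layer size, seed, learning rate, and iteration count are illustrative choices, not values from the lecture:

```python
import numpy as np

# Tiny 2-layer sigmoid network trained by backpropagation of error.
rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR target

W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse():
    out = sigmoid(sigmoid(X @ W1) @ W2)
    return float(np.mean((out - y) ** 2))

loss_before = mse()
for _ in range(5000):
    # Forward pass through the nonlinear (sigmoid) units
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)
    # Backward pass: error gradients for each layer of nonlinear units
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates (learning rate 0.5)
    W2 -= 0.5 * h.T @ d_out
    W1 -= 0.5 * X.T @ d_h
loss_after = mse()
print(loss_before, loss_after)
```

Since training can stall in local optima (one of the issues listed above), only the decrease in squared error is claimed here, not exact convergence.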
Classes 11-12: Reinforcement Learning and Advice
- Paper
  - Topic: Knowledge and Reinforcement Learning in Intelligent Agents
  - Title: Incorporating Advice into Agents that Learn from Reinforcements
  - Authors: Maclin and Shavlik
  - Presenter: Kiranmai Nandivada
- Key Strengths
  - Idea: compile advice into ANN representation for RL
  - Advice expressed in terms of constraint-based knowledge
  - Like KBANN, achieves knowledge refinement through ANN training
- Key Weaknesses
  - Like KBANN, lose semantic clarity of advice, policy, explanations
  - How to evaluate refinement effectively? Quantitatively? Logically?
- Future Research Issues
  - How to retain semantic clarity of explanations, DT, knowledge representation
  - Applications: intelligent agents, web mining (spiders, search engines), games
  - See work by Shavlik, Maclin, Stone, Veloso, Sun, Sutton, Pearl, Kuipers
Classes 13-14: Reinforcement Learning Over Time
- Paper
  - Topic: Temporal-Difference Reinforcement Learning
  - Title: TD Models: Modeling the World at a Mixture of Time Scales
  - Author: Sutton
  - Presenter: Vrushali Koranne
- Key Strengths
  - Idea: combine state-action evaluation function (Q) estimates over multiple time steps of lookahead
  - Effective temporal credit assignment (TCA)
  - Biologically plausible (simulates TCA aspects of dopaminergic system)
- Key Weaknesses
  - TCA methodology is effective but semantically hard to comprehend
  - Slow convergence: can knowledge help? How will we judge?
- Future Research Issues
  - How to retain clarity, improve convergence speed, of multi-time RL models
  - Applications: control systems, robotics, game playing
  - See work by Sutton, Barto, Mitchell, Kaelbling, Smyth, Shafer, Goldberg
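The core temporal-difference idea behind these multi-time models can be sketched as a one-step TD(0) value update on a toy state chain; the states, reward, step size, and discount here are hypothetical:

```python
# One-step temporal-difference (TD(0)) value update:
#   V(s) <- V(s) + alpha * [r + gamma * V(s') - V(s)]
ALPHA, GAMMA = 0.1, 0.9
V = {"s0": 0.0, "s1": 0.0, "s2": 0.0}

def td_update(V, s, r, s_next):
    """Apply the TD(0) update for an observed transition s -> s_next."""
    V[s] += ALPHA * (r + GAMMA * V[s_next] - V[s])

# One observed transition: s0 -> s1 with reward 1.0
td_update(V, "s0", 1.0, "s1")
print(V["s0"])  # 0.1 after one update
```

TD(λ) and Sutton's multi-time-scale models generalize this by mixing such backups over many lookahead depths, which is where the temporal credit assignment happens.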
Classes 15-16: Generative Neural Models
- Paper
  - Topic: Pattern Recognition using Unsupervised ANNs
  - Title: The Wake-Sleep Algorithm for Unsupervised Neural Networks
  - Authors: Hinton, Dayan, Frey, and Neal
  - Presenter: Prasanna Jayaraman
- Key Strengths
  - Idea: use 2-phase algorithm to generate training instances (dream stage) and maximize conditional probability of data given model (wake stage)
  - Compare: expectation-maximization (EM) algorithm
  - Good for image recognition
- Key Weaknesses
  - Not all data admits this approach (small samples, ill-defined features)
  - Not immediately clear how to use for problem-solving performance elements
- Future Research Issues
  - Studying information theoretic properties of Helmholtz machine
  - Applications: image/speech/signal recognition, document categorization
  - See work by Hinton, Dayan, Frey, Neal, Kirkpatrick, Hajek, Ghahramani
Classes 17-18: Modularity in Neural Systems
- Paper
  - Topic: Combining Models using Modular ANNs
  - Title: Modular and Hierarchical Learning Systems
  - Authors: Jordan and Jacobs
  - Presenter: Afrand Agah
- Key Strengths
  - Idea: use interleaved EM update steps to update expert, gating components
  - Effect: forces specialization among ANN components (GLIMs); boosts performance of single experts; very fast convergence in some cases
  - Explores modularity in neural systems (artificial and biological)
- Key Weaknesses
  - Often cannot achieve higher accuracy than ML, MAP, Bayes optimal estimation
  - Doesn't provide experts that specialize in spatial, temporal pattern recognition
- Future Research Issues
  - Constructing, selecting mixtures of other ANN components (not just GLIMs)
  - Applications: pattern recognition, time series prediction
  - See work by Jordan, Jacobs, Nowlan, Hinton, Barto, Jaakkola, Hsu
Class 19: Introduction to Probabilistic Reasoning
- Architectures
  - Bayesian (Belief) Networks
    - Tree structured, polytrees
    - General
  - Decision networks
  - Temporal variants (beyond scope of this course)
- Parameter Estimation
  - Maximum likelihood (MLE), maximum a posteriori (MAP)
  - Bayes optimal classification, Bayesian learning
- References: Chapter 6, Mitchell; Chapters 14-15, 19, Russell and Norvig
- Research Issues: How to
  - Learn from observation, rewards and penalties, and advice
  - Distribute rewards and penalties through learning model, over time
  - Generate pseudo-experiential training instances in pattern recognition
  - Partition learning problems on the fly, via (mixture) parameter estimation
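The MLE/MAP distinction above can be illustrated for a Bernoulli parameter with a conjugate Beta prior; the counts and the prior are made-up illustration values:

```python
# MLE vs. MAP estimates of a coin's bias theta from 7 heads, 3 tails,
# with a Beta(2, 2) prior (hypothetical numbers).
heads, tails = 7, 3
a, b = 2, 2  # Beta prior pseudo-counts

# MLE: argmax_theta P(D | theta) -> relative frequency
theta_mle = heads / (heads + tails)

# MAP: argmax_theta P(D | theta) P(theta) -> prior pulls estimate toward 1/2
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)

print(theta_mle, theta_map)  # 0.7 versus ~0.667
```

With more data the pseudo-counts are swamped and the MAP estimate converges to the MLE; Bayes optimal classification instead averages over all of H rather than picking one estimate.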
Classes 20-21: Approaches to Uncertain Reasoning
- Paper
  - Topic: The Case for Probability
  - Title: In Defense of Probability
  - Author: Cheeseman
  - Presenter: Pallavi Paranjape
- Key Strengths
  - Idea: probability is a mathematically sound way to represent uncertainty
  - Views of probability considered: objectivist, frequentist, logicist, subjectivist
  - Argument made for meta-subjectivist belief measure concept of probability
- Key Weaknesses
  - Highly dogmatic view without concrete justification for all assertions
  - Does not quantitatively, empirically compare Bayesian, non-Bayesian methods
- Future Research Issues
  - Integrating symbolic and numerical (statistical) models of uncertainty
  - Applications: uncertain reasoning, pattern recognition, learning
  - See work by Cheeseman, Cox, Good, Pearl, Zadeh, Dempster, Shafer
Classes 22-23: Learning Bayesian Network Structure
- Paper
  - Topic: Learning Bayesian Networks from Data
  - Title: Learning Bayesian Network Structure from Massive Datasets
  - Authors: Friedman, Pe'er, and Nachman
  - Presenter: Jincheng Gao
- Key Strengths
  - Idea: can use graph constraints, scoring functions to select candidate parents in constructing directed graph model of probability (BBN)
  - Tabu search, greedy score-based methods (K2), etc. also considered
- Key Weaknesses
  - Optimal Bayesian network structure learning still intractable for conventional (single-instruction sequential) architectures
  - More empirical comparison among alternative methods warranted
- Future Research Issues
  - Scaling up to massive real-world data sets (e.g., medical, agricultural, DSS)
  - Applications: diagnosis, troubleshooting, user modeling, intelligent HCI
  - See work by Friedman, Goldszmidt, Heckerman, Cooper, Beinlich, Koller
Classes 24-25: Bayesian Networks for User Modeling
- Paper
  - Topic: Decision Support Systems and Bayesian User Modeling
  - Title: The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
  - Authors: Horvitz, Breese, Heckerman, Hovel, and Rommelse
  - Presenter: Yuhui (Cathy) Liu
- Key Strengths
  - Idea: BBN model is developed from user logs, used to infer mode of usage
  - Can infer goals, skill level of user
- Key Weaknesses
  - Need high accuracy in inferring goals to deliver meaningful content
  - May be better to use next-generation search engine (more interactivity, less passive monitoring)
- Future Research Issues
  - Designing better interactive user modeling
  - Applications: clickstream monitoring, e-commerce, web search, help
  - See work by Horvitz, Breese, Heckerman, Lee, Huang
Classes 26-27: Causal Reasoning
- Paper
  - Topic: KDD and Causal Reasoning
  - Title: Symbolic Causal Networks for Reasoning about Actions and Plans
  - Authors: Darwiche and Pearl
  - Presenter: Yue Jiao
- Key Strengths
  - Idea: use BBN to represent symbolic constraint knowledge
  - Can use to generate mechanistic explanations
  - Model actions
  - Model sequences of actions (plans)
- Key Weaknesses
  - Integrative methods (numerical, symbolic BBNs) still need exploration
  - Unclear how to incorporate methods for learning to plan
- Future Research Issues
  - Reasoning about systems
  - Applications: uncertain reasoning, pattern recognition, learning
  - See work by Horvitz, Breese, Heckerman, Lee, Huang
Classes 28-29: Knowledge Discovery from Scientific Data
- Paper
  - Topic: KDD for Scientific Data Analysis
  - Title: KDD for Science Data Analysis: Issues and Examples
  - Authors: Fayyad, Haussler, and Stolorz
  - Presenter: Arulkumar Elumalai
- Key Strengths
  - Idea: investigate how and whether KDD techniques (OLAP, learning) scale up to huge data sets
  - Answer: it depends on computational complexity and many other factors
- Key Weaknesses
  - Haven't yet developed a clear theory of how to assess how much data is really needed
  - No technical treatment or characterization of data cleaning
- Future Research Issues
  - Data cleaning (aka data cleansing), pre- and post-processing (OLAP)
  - Applications: intelligent databases, visualization, high-performance CSE
  - See work by Fayyad, Smyth, Uthurusamy, Haussler, Foster
Classes 30-31: Relevance Determination
- Paper
  - Topic: Relevance Determination in KDD
  - Title: Irrelevant Features and the Subset Selection Problem
  - Authors: John, Kohavi, and Pfleger
  - Presenter: DingBing Yang
- Key Strengths
  - Idea: cast problem of choosing relevant attributes (given top-level learning problem specification) as search
  - Effective state space search (A/A*-based) approach demonstrated
- Key Weaknesses
  - May not have good enough heuristics!
  - Can either develop them (via information theory) or use MCMC methods
- Future Research Issues
  - Selecting relevant data channels from continuous sources (e.g., sensors)
  - Applications: bioinformatics (genomics, proteomics, etc.), prognostics
  - See work by Kohavi, John, Rendell, Donoho, Hsu, Provost
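Casting feature subset selection as state-space search, as above, can be sketched with a minimal greedy forward-selection loop; the scoring function here is a made-up stand-in for, say, cross-validated inducer accuracy:

```python
# Greedy forward feature-subset selection: each search state is a
# feature subset, scored by an evaluation function `score`.
def forward_select(features, score):
    """Grow the subset one feature at a time while the score improves."""
    selected, best = [], score([])
    improved = True
    while improved:
        improved = False
        for f in features:
            if f in selected:
                continue
            s = score(selected + [f])
            if s > best:
                best, selected = s, selected + [f]
                improved = True
    return selected, best

# Toy scorer: only "a" and "c" are relevant; extras slightly hurt.
relevant = {"a": 0.6, "c": 0.3}
score = lambda subset: sum(relevant.get(f, -0.1) for f in subset)

selected, best = forward_select(["a", "b", "c", "d"], score)
print(selected)  # ['a', 'c']
```

Heuristic search (A*-style) over the same state space replaces this greedy expansion with a frontier ordered by estimated subset quality.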
Classes 32-33: Learning for Text Document Categorization
- Paper
  - Topic: Text Documents and Information Retrieval (IR)
  - Title: Hierarchically Classifying Documents using Very Few Words
  - Authors: Koller and Sahami
  - Presenter: Yan Song
- Key Strengths
  - Idea: use rank-frequency scoring methods to find keywords that make a difference
  - Break into meaningful hierarchy
- Key Weaknesses
  - Sometimes need to derive semantically meaningful cluster labels
  - How to integrate this method with dynamic cluster segmentation, labeling?
- Future Research Issues
  - Bayesian architectures using non-Bayesian learning algorithms (e.g., GAs)
  - Applications: digital libraries (hierarchical, distributed dynamic indexing), intelligent search engines, intelligent displays (and help indices)
  - See work by Koller, Sahami, Roth, Charniak, Brill, Yarowsky
Classes 34-35: Web Mining
- Paper
  - Topic: KDD and the Web
  - Title: Learning to Extract Symbolic Knowledge from the World Wide Web
  - Authors: Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, and Slattery
  - Presenter: Ping Zou
- Key Strengths
  - Idea: build probabilistic model of web documents using keywords that matter
  - Use probabilistic model to represent knowledge for indexing into web database
- Key Weaknesses
  - How to account for concept drift?
  - How to explain and express constraints (e.g., proper nouns that are person names don't matter)? Not considered here
- Future Research Issues
  - Using natural language processing (NLP), image / audio / signal processing
  - Applications: searchable hypermedia, digital libraries, spiders, other agents
  - See work by McCallum, Mitchell, Roth, Sahami, Pratt, Lee
Class 36: Introduction to Evolutionary Computation
- Architectures
  - Genetic algorithms (GAs), genetic programming (GP), genetic wrappers
  - Simple vs. parameterless GAs
- Issues
  - Loss of diversity
    - Consequence: collapse of Pareto front
    - Solutions: niching (sharing, preselection, crowding)
  - Parameterless GAs
  - Other issues (not covered): genetic drift, population sizing, etc.
- References: Chapter 9, Mitchell; Chapters 1-6, Goldberg; Chapters 1-5, Koza
- Research Issues: How to
  - Design GAs based on credit assignment system (in performance element)
  - Build hybrid analytical / inductive learning GP systems
  - Use GAs to perform relevance determination in KDD
  - Control diversity in GA solutions for hard optimization problems
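A minimal simple-GA sketch (binary tournament selection, one-point crossover, bit-flip mutation) on the toy OneMax problem of maximizing the number of 1-bits; the population size, rates, and generation count are arbitrary illustrative choices:

```python
import random

# Simple GA on OneMax: fitness = number of 1-bits in the chromosome.
random.seed(0)
N_BITS, POP, GENS = 20, 30, 60

def fitness(ind):
    return sum(ind)

pop = [[random.randint(0, 1) for _ in range(N_BITS)] for _ in range(POP)]
for _ in range(GENS):
    nxt = []
    while len(nxt) < POP:
        # Binary tournament selection
        p1 = max(random.sample(pop, 2), key=fitness)
        p2 = max(random.sample(pop, 2), key=fitness)
        # One-point crossover
        cut = random.randrange(1, N_BITS)
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation at rate 1/N_BITS
        child = [b ^ (random.random() < 1 / N_BITS) for b in child]
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best))
```

This simple GA has none of the diversity controls listed above; niching methods such as fitness sharing would penalize crowded regions of the population before selection.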
Classes 37-38: Genetic Algorithms and Classifier Systems
- Paper
  - Topic: Classifier Systems and Inductive Learning
  - Title: Generalization in the XCS Classifier System
  - Author: Wilson
  - Presenter: Elizabeth Loza-Garay
- Key Strengths
  - Idea: incorporate performance element (classifier system) into GA design
  - Solid theoretical foundation: advanced building block (aka schema) theory
  - Can use to engineer more efficient GA model, tune parameters
- Key Weaknesses
  - Need to progress from toy problems (e.g., MUX learning) to real-world ones
  - Need to investigate scaling up of GA principles (e.g., building block mixing)
- Future Research Issues
  - Building block scalability in classifier systems
  - Applications: reinforcement learning, mobile robotics, other animats, a-life
  - See work by Wilson, Goldberg, Holland, Booker
Classes 39-40: Knowledge-Based Genetic Programming
- Paper
  - Topic: Genetic Programming and Multistrategy Learning
  - Title: Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach
  - Authors: Aler, Borrajo, and Isasi
  - Presenter: Yuhong Cheng
- Key Strengths
  - Idea: use knowledge-based system to calibrate starting state of MCMC optimization system (here, GP)
  - Can incorporate knowledge (as in CIS 830 Part 1 of 5)
- Key Weaknesses
  - Generalizability of HAMLET population seeding method not well established
  - General-purpose problem solving systems can become Rube Goldberg-ian
- Future Research Issues
  - Using multistrategy GP systems to provide knowledge-based decision support
  - Applications: logistics (military, industrial, commercial), other problem solving
  - See work by Aler, Borrajo, Isasi, Carbonell, Minton, Koza, Veloso
Classes 41-42: Genetic Wrappers for Inductive Learning
- Paper
  - Topic: Genetic Wrappers for KDD Performance Enhancement
  - Title: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm
  - Authors: Raymer, Punch, Goodman, Sanschagrin, and Kuhn
  - Presenter: Karthik K. Krishnakumar
- Key Strengths
  - Idea: use GA to empirically (statistically) validate inducer
  - Can use to select, synthesize attributes (aka features)
  - Can also use to tune other GA parameters (hence "wrapper")
- Key Weaknesses
  - Systematic experimental studies of genetic wrappers have not yet been done
  - Wrappers don't yet take performance element into explicit account
- Future Research Issues
  - Improving supervised learning inducers (e.g., in MLC++)
  - Applications: better combiners; feature subset selection, construction
  - See work by Raymer, Punch, Cherkauer, Shavlik, Freitas, Hsu, Cantu-Paz
Classes 43-44: Genetic Algorithms for Optimization
- Paper
  - Topic: Genetic Optimization and Decision Support
  - Title: A Niched Pareto Genetic Algorithm for Multiobjective Optimization
  - Authors: Horn, Nafpliotis, and Goldberg
  - Presenter: Li Lian
- Key Strengths
  - Idea: control representation of neighborhoods of Pareto optimal front by niching
  - Gives abstract and concrete case studies of niching (sharing) effects
- Key Weaknesses
  - Need systematic exploration, characterization of "sweet spot"
  - Shows static comparisons, not small-multiple visualizations that led to them
- Future Research Issues
  - Biologically (ecologically) plausible models
  - Applications: engineering (ag / bio, civil, computational, environmental, industrial, mechanical, nuclear) optimization; computational life sciences
  - See work by Goldberg, Horn, Schwefel, Punch, Minsker, Kargupta
Class 45: Meta-Summary
- Data Mining / KDD Problems
  - Business decision support
  - Classification
  - Recommender systems
  - Control and policy optimization
- Data Mining / KDD Solutions: Machine Learning, Inference Techniques
  - Models
    - Version space, decision tree, perceptron, winnow
    - ANN, BBN, SOM
    - Q functions
    - GA building blocks (schemata), GP building blocks
  - Algorithms
    - Candidate elimination, ID3, delta rule, MLE, Simple (Naïve) Bayes
    - K2, EM, backprop, SOM convergence, LVQ, ADP, simulated annealing
    - Q-learning, TD(λ)
    - Simple GA, GP