1
A Brief Tour of Machine Learning
  • David Lindsay

2
What is Machine Learning?
  • Very multidisciplinary field: statistics,
    mathematics, artificial intelligence, psychology,
    philosophy, cognitive science
  • In a nutshell: developing algorithms that learn
    from data
  • Historically: flourished from advances in
    computing in the early 60s, with a resurgence in
    the late 90s

3
Main areas in Machine Learning
  1. Supervised learning: assumes a teacher exists
     to label/annotate data
  2. Unsupervised learning: no need for a teacher;
     tries to learn relationships automatically
  3. Reinforcement learning: biologically plausible;
     tries to learn from reward/punishment
     stimuli/feedback
4
Supervised Learning
  • Learning with a teacher

5
More about Supervised Learning
  • Perhaps the most well-studied area of machine
    learning: lots of nice theory adapted from
    statistics/mathematics.
  • Assumes the existence of a training and test set
  • Main sub-areas of research are:
  • Pattern recognition (discrete labels)
  • Regression (continuous labels)
  • Time series analysis (temporal dependence in data)

The i.i.d. assumption (examples drawn independently
from an identical distribution) is commonly made.
6
The formalisation of data
  • How do we formally describe our data?

Label: the property of the object that we want to
predict in the future using our training data,
e.g. in cancer screening the labels could be
Y = normal, benign, or malignant.

Object: commonly represented as a feature vector
that describes the object. The individual features
can be real, discrete, or symbolic, e.g. patient
symptoms: temperature, sex, eye colour.
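As a concrete illustration (not from the slides; feature names, encodings, and values are invented), here is a minimal Python sketch of how one (object, label) pair might be represented:

```python
# One (object, label) pair, using the patient example from the slide.
# Feature names and encodings are illustrative only.
patient = {
    "temperature": 38.5,    # real-valued feature
    "sex": "female",        # symbolic feature
    "eye_colour": "brown",  # symbolic feature
}
label = "benign"  # one of {normal, benign, malignant} in the screening example

# Most learning algorithms expect a numeric feature vector, so symbolic
# features are typically encoded, e.g. one-hot:
x = [38.5, 1.0, 0.0, 1.0, 0.0, 0.0]  # temperature, sex=female, sex=male, eye=...
y = label
```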
7
The formalisation of data (continued)
  • What is training and test data?

[Figure: a training set of images x with known labels y (handwritten digits); new test images whose labels are either not known or withheld from the learner]
We learn from the training data, and try to
predict new unseen test data. More formally, we
have a set of n training and test examples
(information pairs of object and label) drawn from
some unknown probability distribution P(X, Y).
8
More about Pattern Recognition
  • Lots of algorithms/techniques; the main
    contenders:
  • Support Vector Machines (SVM)
  • Nearest Neighbours
  • Decision Trees
  • Neural Networks
  • Multivariate Statistics
  • Bayesian algorithms
  • Logic programming

9
The mighty SVM algorithm
  • Very popular technique: lots of followers,
    relatively new
  • A very simple technique related to the
    Perceptron; it is a linear classifier (separates
    data into half-spaces).

Concept: keep the classifier simple and don't
overfit the data → the classifier generalises well
on new test data (Occam's razor).
Concept: if the data are not linearly separable,
use a kernel map Φ into another, higher-dimensional
feature space where the data may be separable.
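To make the kernel idea concrete, here is a minimal sketch (not from the slides) using scikit-learn; the dataset and parameters are illustrative:

```python
# A linear SVM struggles on data that is not linearly separable, while an
# RBF kernel implicitly maps it into a feature space where it is.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)  # the kernel trick: implicit map Φ

print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:   ", rbf.score(X, y))
```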
10
Hot topics in SVMs
  • Kernel design is central to the application to
    data, e.g. when the objects are text documents
    and the features are words, the kernel can
    incorporate domain knowledge about grammar.
  • Applying the kernel technique to other learning
    algorithms, e.g. Neural Networks

11
The trusty old Nearest Neighbour algorithm
  • Born in the 60s; probably the simplest of
    all algorithms to understand.
  • Decision rule: classify new test examples by
    finding the closest neighbouring example in the
    training set and predicting the same label as
    that closest example.
  • Lots of theory justifying its convergence
    properties.
  • A very lazy technique, and not very fast: it has
    to search the training set for each test example.
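The decision rule above fits in a few lines of Python; this is a bare-bones illustrative sketch, not a production implementation (real implementations use fast search structures such as k-d trees):

```python
# 1-nearest-neighbour: predict the label of the closest training example
# (Euclidean distance).
import math

def nearest_neighbour_predict(train_X, train_y, x):
    best_dist, best_label = float("inf"), None
    for xi, yi in zip(train_X, train_y):
        dist = math.dist(xi, x)  # Euclidean distance
        if dist < best_dist:
            best_dist, best_label = dist, yi
    return best_label

train_X = [[1.0, 1.0], [2.0, 2.0], [8.0, 8.0]]
train_y = ["a", "a", "b"]
print(nearest_neighbour_predict(train_X, train_y, [7.5, 8.2]))  # -> "b"
```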

12
Problems with Nearest Neighbours
  • Examples are viewed in Euclidean space, so the
    method can be very sensitive to feature scaling.
  • Finding computationally efficient ways to search
    for the nearest neighbouring example is hard.
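A quick sketch of the scaling problem (values invented for illustration): a feature measured in large units dominates the Euclidean distance.

```python
import math

a = [37.0, 70000.0]  # [body temperature in C, salary], unscaled
b = [40.0, 70100.0]  # very different temperature, similar salary
c = [37.1, 75000.0]  # similar temperature, different salary

print(math.dist(a, b))  # ~100: the salary difference dominates
print(math.dist(a, c))  # ~5000: b looks "closer" than c, despite the fever

# Standardising each feature (zero mean, unit variance) before computing
# distances makes both features contribute comparably.
```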

13
Decision Trees
  • Many different varieties: C4.5, CART, ID3
  • Algorithms build classification rules using a
    tree of if-then statements (example below).
  • Constructs the tree using Minimum Description
    Length (MDL) principles (tries to make the tree
    as simple as possible)

IF temperature > 65:
    patient has fever
    IF dehydrated = yes:
        patient has flu
    ELSE:
        patient has pneumonia
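As an illustration (the data below is invented), a tree of this kind can be learnt and printed with scikit-learn:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# features: [temperature, dehydrated (0/1)]; examples are made up
X = [[64, 0], [66, 0], [68, 0], [70, 1], [72, 1], [75, 1]]
y = ["healthy", "fever", "fever", "flu", "flu", "flu"]

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=["temperature", "dehydrated"]))
```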
14
Benefits/Issues with Decision Trees
  • Instability: minor changes to the training data
    can make huge changes to the decision tree
  • The user can visualise/interpret the hypothesis
    directly, and can find interesting classification
    rules
  • Problems with continuous real attributes: they
    must be discretised.
  • Large AI following, and widely used in industry

15
Mystical Neural Networks
  • Very flexible; learning is a gradient descent
    process (back-propagation)
  • Training neural networks involves a lot of design
    choices:
  • what network structure, how many hidden layers
  • how to encode the data (values must lie in [0, 1])
  • use momentum to speed up convergence
  • use weight decay to keep the hypothesis simple

16
Training a neural network
The learnt hypothesis is represented by the weights
that interconnect each neuron; each neuron applies
the sigmoid function σ(z) = 1 / (1 + e^(−z)) to its
weighted inputs.
The aim in training the neural network is to find
the weight vector w that minimises the error E(w)
on the training set, e.g. the squared error
E(w) = ½ Σᵢ (f(xᵢ; w) − yᵢ)².
This is a gradient descent problem: repeatedly
update w ← w − η ∇E(w) for a small learning rate η.
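A minimal numerical sketch of this idea (a single sigmoid neuron rather than a full multi-layer network; the data and learning rate are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 0.0, 1.0])        # logical AND, for illustration
w = np.zeros(2)
b = 0.0
eta = 0.5                                  # learning rate

for _ in range(2000):
    out = sigmoid(X @ w + b)               # forward pass
    err = out - y
    grad_w = X.T @ (err * out * (1 - out)) # dE/dw for squared error E(w)
    grad_b = np.sum(err * out * (1 - out))
    w -= eta * grad_w                      # gradient descent step
    b -= eta * grad_b

print(np.round(sigmoid(X @ w + b), 2))     # approaches [0, 0, 0, 1]
```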
17
Interesting applications
  • Bioinformatics
  • genetic/protein code analysis
  • microarray analysis
  • gene regulatory pathways
  • WWW
  • classifying text/html documents
  • filtering images
  • filtering emails

18
Bayesian Algorithms
  • Try to model interrelationships between variables
    probabilistically.
  • Can model expert/domain knowledge directly into
    the classifier as prior belief in certain events.
  • Use basic axioms of probability theory to extract
    probabilistic estimates

19
Bayesian algorithms in practice
  • Lots of different algorithms: Relevance Vector
    Machine (RVM), Naïve Bayes, Simple Bayes,
    Bayesian Belief Networks (BBN)
  • Has a large following, especially at Microsoft
    Research

[Figure: a Bayesian belief network; causal links between features can be modelled, e.g. Weather = sunny, Temperature, and Humidity = 100 influencing whether to Play Tennis or Play Monopoly]
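A tiny worked Bayes' rule calculation in the spirit of the Play Tennis example (all probabilities are invented for illustration):

```python
# Bayes' rule: P(play | weather) ∝ P(weather | play) * P(play)
p_play = 0.6                # prior belief: P(play tennis)
p_sunny_given_play = 0.8    # likelihood: P(weather=sunny | play)
p_sunny_given_not = 0.3     # P(weather=sunny | not play)

joint_play = p_sunny_given_play * p_play
joint_not = p_sunny_given_not * (1 - p_play)

# normalise to get the posterior
posterior = joint_play / (joint_play + joint_not)
print(f"P(play tennis | sunny) = {posterior:.2f}")  # 0.80
```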
20
Issues with Bayesian algorithms
  • Tractability: to find solutions we need numerical
    approximations or have to take computational
    shortcuts
  • Can model causal relationships between variables
  • Need lots of data to estimate probabilities using
    observed training data frequencies

21
Very important side problems
  • Feature Selection/Extraction: using Principal
    Component Analysis, Wavelets, Canonical
    Correlation, Factor Analysis, Independent
    Component Analysis (see the sketch below)
  • Imputation: what to do with missing features?
  • Visualisation: make the hypothesis human
    readable/interpretable
  • Meta-learning: how to add functionality to
    existing algorithms, or combine the predictions
    of many classifiers (Boosting, Bagging,
    Confidence and Probability Machines)
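A sketch of feature extraction with Principal Component Analysis (scikit-learn usage; the random data is purely illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))          # 100 examples, 10 features

pca = PCA(n_components=2)               # keep the 2 strongest components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                  # (100, 2)
print(pca.explained_variance_ratio_)    # variance captured per component
```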

22
Very important side problems (continued)
  • How to incorporate domain knowledge into a
    learner
  • Trade-off between complexity (accuracy on
    training) vs. generalisation (accuracy on test)
  • Pre-processing of data: normalising,
    standardising, discretising.
  • How to test: leave-one-out, cross-validation,
    stratification, online, offline (see the sketch
    below)
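These testing schemes are a few lines with scikit-learn; a sketch using an illustrative dataset and classifier:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
clf = KNeighborsClassifier(n_neighbors=3)

# stratified 10-fold cross-validation: folds keep the class proportions
scores = cross_val_score(clf, X, y, cv=StratifiedKFold(n_splits=10))
print("10-fold CV accuracy:", scores.mean())

# leave-one-out: n folds of size 1
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("leave-one-out accuracy:", loo_scores.mean())
```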

23
Unsupervised Learning
  • Learning without a teacher

24
An introduction to Unsupervised Learning
  • No need for a teacher/supervisor
  • Mainly clustering: trying to group objects into
    sensible clusters
  • Novelty detection: finding strange examples in
    data

[Figures: clustering examples; novelty detection]
25
Algorithms available
  • For clustering: the EM algorithm, K-Means, Self
    Organising Maps (SOM)
  • For novelty detection: 1-class SVM, support
    vector regression, Neural Networks
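For example, clustering with K-Means in scikit-learn (synthetic data for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the 3 learnt cluster centres
print(km.labels_[:10])       # cluster assignment of the first 10 examples
```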

26
Issues and Applications
  • Very useful for extracting information from data.
  • Used in medicine to identify disease subtypes.
  • Used to cluster web documents automatically
  • Used to identify customer target groups in
    business
  • Not much publicly available data to test
    algorithms with

27
Reinforcement Learning
  • Learning inspired by nature

28
An introduction
  • The most biologically plausible: feedback is
    given through reward/punishment stimuli
  • A field with a lot of theory, in need of
    real-life applications (other than playing
    Backgammon)
  • But it also encompasses the large field of
    Evolutionary Computing
  • Applications are more open-ended
  • Getting closer to what the public consider AI.

29
Traditional Reinforcement Learning
  • Techniques use dynamic programming to search for
    an optimal strategy
  • Algorithms search to maximise their reward.
  • Q-Learning (Chris Watkins, next door) is the most
    well-known technique.
  • The only successful applications so far are to
    games and toy problems.
  • A lack of real-life applications.
  • Very few researchers in this field.
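The core of Q-Learning is a single update rule, sketched below (the surrounding agent/environment loop is omitted; states, actions, and parameter values are illustrative):

```python
alpha, gamma = 0.1, 0.9           # learning rate, discount factor

Q = {}                            # Q[(state, action)] -> expected reward

def q_update(state, action, reward, next_state, actions):
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    # move the estimate towards reward + discounted best future value
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)

q_update("s0", "right", 1.0, "s1", actions=["left", "right"])
print(Q)
```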

30
Evolutionary Computing
  • Inspired by the process of biological evolution.
  • Essentially an optimisation technique: the
    problem is encoded as a chromosome.
  • We find new/better solutions to the problem by
    sexual reproduction (crossover) and mutation.
  • Mutation encourages exploration of new candidate
    solutions (see the sketch below).
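A toy sketch of this loop on the classic "one-max" problem (the chromosome encoding, rates, and sizes are all illustrative):

```python
import random

random.seed(0)
POP, LENGTH, GENS, MUT_RATE = 20, 16, 50, 0.05

def fitness(c):
    return sum(c)  # one-max: count the 1s in the bit-string chromosome

pop = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(POP)]
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    parents = pop[: POP // 2]                      # selection
    children = []
    while len(children) < POP - len(parents):
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, LENGTH)          # single-point crossover
        child = a[:cut] + b[cut:]
        child = [bit ^ (random.random() < MUT_RATE) for bit in child]  # mutation
        children.append(child)
    pop = parents + children

print(max(fitness(c) for c in pop))  # approaches LENGTH (all 1s)
```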

31
Techniques available in Evolutionary Computing
  • Lower-level optimisers:
  • Evolutionary Programming, Evolutionary Algorithms
  • Genetic Programming, Genetic Algorithms,
    Evolutionary Strategy
  • Simulated Annealing
  • Higher-level optimisers:
  • TABU search
  • Multi-objective optimisation

[Figure: a Pareto front of optimal solutions plotted against Objective 1 and Objective 2; which one should we pick?]
32
Issues in Evolutionary Computing
  • How to encode the problem is very important
  • Setting mutation/crossover rates is very ad hoc
  • Very computationally/memory intensive
  • Not much theory can be developed; frowned upon
    by machine learning theorists