Machine Learning to explore fish species interaction in the Northern Gulf of St Lawrence

Transcript and Presenter's Notes



1
Machine Learning to explore fish species
interaction in the Northern gulf of St Lawrence
  • Dr Allan Tucker
  • Centre for Intelligent Data Analysis
  • Brunel University
  • West London
  • UK

2
Talk Outline
  • Introduce myself and research group
  • Introduce Machine Learning
  • Describe Bayesian network models
  • Document some preliminary results on fish
    population data
  • Conclusions

3
Who Am I?
  • Research Lecturer at Brunel University, West
    London
  • Member of Centre for IDA (est 1994)

4
What is the Centre for IDA?
  • Over 25 members (academics, postdocs, and PhDs)
    with diverse backgrounds (e.g. maths, statistics,
    computing, biology, engineering)
  • Over 140 journal publications and a dozen research
    council grants since 2001
  • Many collaborating partners in UK, Europe, China
    and USA
  • Bi-annual symposia in Europe

5
Some Previous Work in
  • Machine Learning and Temporal Analysis
  • Oil Refinery Models
  • Forecasting
  • Explanation
  • Medical Data: Retinal (Visual Field)
  • Screening
  • Forecasting
  • Bioinformatics
  • Gene Clusters
  • Gene Regulatory Networks

6
Some Previous Work in
7
Part 1
What is Machine Learning?
8
What is Machine Learning?
  • (and why not statistics?)
  • Data oriented
  • Extracting useful info from data
  • As automated as possible
  • Useful when lots of data and little theory
  • Making predictions about the future

9
What Can we do with ML?
  • Classification and Clustering
  • Feature Selection
  • Prediction and Forecasting
  • Identifying Structure in Data

10
E.g. Classification
  • Given some labelled data (supervised)
  • Build a model to allow us to classify other
    unlabelled data
  • e.g. A doctor diagnosing a patient based upon
    previous cases

11
Classification e.g. medical
  • Scatterplot of patients
  • 2 variables
  • Measurement of expression of 2 genes

12
Classification
  • How do we classify them?
  • Nearest Neighbour / Linear / Complex Fn?
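
A minimal sketch of the nearest-neighbour option mentioned above, assuming Euclidean distance on two measured variables. The data points and the labels "healthy" and "disease" are hypothetical and only illustrate the idea; they are not taken from the talk.

```python
import math

def nearest_neighbour(train, query):
    """Return the label of the training point closest to `query` (Euclidean)."""
    best_label, best_dist = None, math.inf
    for point, label in train:
        d = math.dist(point, query)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical labelled data: (gene 1 expression, gene 2 expression) -> class
train = [
    ((1.0, 1.2), "healthy"),
    ((0.8, 1.0), "healthy"),
    ((3.1, 2.9), "disease"),
    ((2.8, 3.3), "disease"),
]

# Classify a new, unlabelled patient by its closest labelled neighbour
label = nearest_neighbour(train, (2.9, 3.0))
```

A linear or more complex decision function would replace the distance search with a fitted boundary; the nearest-neighbour rule is simply the easiest one to write down.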

13
Classification
  • Trivial case with Cod and Shrimp Data

14
The Data
  • Northern Gulf (region a)
  • Two ships (Needler and Hammond) combined by
    normalising according to overlap year
  • Multivariate Spatial Time Series (short)
  • Missing Data

15
Background
  • Northern Gulf considered to be one ecosystem /
    fish community
  • Quite heavily fished until about 1990
  • Most fish populations have collapsed since
  • Some say the system has moved to an alternative
    stable state and is unlikely to return to a
    cod-dominated community without some chance event
    beyond human control
  • Lots of speculation, e.g. cold water and large
    increases in predator populations
  • Examine nature and strength of interactions
    between species in the two periods
  • Ask "what if?" questions
  • For other parts of community to recover, we would
    need cod to have X strength of interaction with Y
    number of other species?

16
ML for Northern Gulf Data
  • Network building
  • knowledge and data of interactions
  • Feature Selection for Classification of relevant
    species to the cod collapse
  • State Space / Dynamic models for predicting
    populations
  • Hidden variable analysis

17
Part 2
Bayesian Networks for Machine Learning
18
Bayesian Networks
  • Method to model a domain using probabilities
  • Easily interpreted by non-statisticians
  • Can be used to combine existing knowledge with
    data
  • Essentially use independence assumptions to
    model the joint distribution of a domain

19
Bayesian Networks
  • Simple 2 variable Joint Distribution
  • can use it to ask many useful questions
  • but requires k^N probabilities (N variables with
    k states each)

P(Collapse1, Collapse2)
            Species2   Species2
Species1      0.89       0.01
Species1      0.03       0.07
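
As a rough illustration of the "useful questions" a joint table can answer, the sketch below computes marginals and a conditional from the four numbers on the slide. The transcript does not say which row or column is the collapsed state, so the code refers to states by index only; the function names are illustrative.

```python
# Joint table from the slide: rows are the two states of Species1,
# columns the two states of Species2 (which state is which is not stated).
joint = [[0.89, 0.01],
         [0.03, 0.07]]

def marginal_rows(joint):
    """P(Species1 state) obtained by summing out Species2."""
    return [sum(row) for row in joint]

def marginal_cols(joint):
    """P(Species2 state) obtained by summing out Species1."""
    return [sum(col) for col in zip(*joint)]

def conditional_given_row(joint, row):
    """P(Species2 state | Species1 is in state `row`)."""
    total = sum(joint[row])
    return [p / total for p in joint[row]]

def full_table_size(k, n):
    """Number of entries in a full joint table: k states, n variables."""
    return k ** n

rows = marginal_rows(joint)
cols = marginal_cols(joint)
cond = conditional_given_row(joint, 1)
```

The last helper shows why the full table does not scale: ten binary species would already need `full_table_size(2, 10)` = 1024 entries.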
20
Bayesian Network for Toy Domain
Structure: SpeciesA → SpeciesC ← SpeciesB, SpeciesC → SpeciesD, SpeciesC → SpeciesE

P(A) = .001
P(B) = .002

A B P(C)
T T .95
T F .94
F T .29
F F .001

C P(D)
T .90
F .05

C P(E)
T .70
F .01
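
The toy network can be queried by brute-force enumeration, as in the sketch below. It assumes the structure implied by the table headers (A and B are parents of C; C is the parent of D and E) and assumes the pairing P(D | C) = .90 / .05 and P(E | C) = .70 / .01, which is read off the order of the flattened slide text and may be the other way round. Function names are illustrative.

```python
from itertools import product

# Parameters as read from the slide (pairing of the D and E tables is assumed)
P_A = 0.001
P_B = 0.002
P_C = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_D = {True: 0.90, False: 0.05}   # P(D = T | C)
P_E = {True: 0.70, False: 0.01}   # P(E = T | C)

def bern(p_true, value):
    """Probability of a binary `value` given P(value = True)."""
    return p_true if value else 1.0 - p_true

def joint(a, b, c, d, e):
    """Joint probability as the product of the local conditional tables."""
    return (bern(P_A, a) * bern(P_B, b) * bern(P_C[(a, b)], c)
            * bern(P_D[c], d) * bern(P_E[c], e))

def query(target, evidence):
    """P(target = True | evidence), summing the joint over all 32 assignments."""
    names = ("A", "B", "C", "D", "E")
    num = den = 0.0
    for values in product([True, False], repeat=5):
        world = dict(zip(names, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if world[target]:
            num += p
    return num / den

p_c = query("C", {})                               # prior probability of C
p_a_given_de = query("A", {"D": True, "E": True})  # diagnostic query
```

The network is specified with 10 numbers, whereas a full joint table over five binary variables would need 31 independent entries; that saving is what the independence assumptions buy.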
21
Bayesian Networks
  • Bayesian Network Demo
  • Species_Net
  • Use algorithms to learn structure and parameters
    from data
  • Or build by hand (priors)
  • Also continuous nodes (density functions)

22
Informative Priors
  • To build BNs we can also use prior structures
    and probabilities
  • These are then updated with data
  • Priors are usually uniform (equal probability)
  • Informative Priors used to incorporate existing
    knowledge into BNs
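
One common way to update prior probabilities with data is to treat the prior as pseudo-counts (a Dirichlet prior) and add the observed counts. The sketch below assumes that scheme; the talk does not say which update rule was used, and the counts are hypothetical.

```python
def posterior_mean(prior_counts, data_counts):
    """Dirichlet posterior mean: add observed counts to prior pseudo-counts
    and normalise."""
    combined = [p + d for p, d in zip(prior_counts, data_counts)]
    total = sum(combined)
    return [c / total for c in combined]

# Hypothetical data: state 0 observed 3 times, state 1 observed 7 times
data = [3, 7]

# Uniform prior: one pseudo-count per state (equal probability)
uniform = posterior_mean([1, 1], data)

# Informative prior: an expert believes state 0 is far more likely
informative = posterior_mean([8, 2], data)
```

With the uniform prior the estimate stays close to the observed frequencies; with the informative prior the expert's belief still dominates after ten observations and would be overridden only as more data arrive.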

23
Bayesian Networks for Classification and Feature
Selection
  • Node that represents the class label attached to
    the data

24
Dynamic Bayesian Networks for Forecasting
  • Nodes represent variables at distinct time
    slices
  • Links between nodes over time
  • Can be used to forecast into the future
  • Species_Dynamic_Net
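
Forecasting with links over time can be sketched in its simplest form: a single discrete variable whose state at time t depends only on its state at t-1. A real DBN would have several species nodes per time slice; the two states and the transition probabilities below are hypothetical.

```python
def forecast(belief, transition, steps):
    """Propagate a distribution over states forward through `steps` time slices.

    transition[i][j] = P(X_t = j | X_{t-1} = i)
    """
    n_states = len(transition[0])
    for _ in range(steps):
        belief = [sum(belief[i] * transition[i][j] for i in range(len(belief)))
                  for j in range(n_states)]
    return belief

# Hypothetical states: 0 = "low abundance", 1 = "high abundance"
transition = [[0.9, 0.1],
              [0.3, 0.7]]

# Start from a population known to be in the "high" state
one_step = forecast([0.0, 1.0], transition, 1)
two_steps = forecast([0.0, 1.0], transition, 2)
```

Each extra step applies the same transition table again, which is why uncertainty about the forecast grows with the horizon.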

25
Hidden Markov Models
  • Like a DBN but with hidden nodes
  • Often used to model sequences

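(Figure: hidden states H(T-1) → H(T); each hidden state H emits the observation O at the same time step, O(T-1) and O(T).)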
26
Typical Algorithms for HMMs
  • Given an observed sequence and a model, how do
    we compute its probability given the model?
  • Given the observed sequence and the model, how
    do we choose an optimal hidden state sequence?
  • How do we adjust the model parameters to
    maximise the probability of the observed sequence
    given the model?
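
The first two questions are usually answered with the forward algorithm and the Viterbi algorithm respectively; the third is typically handled by Baum-Welch (an EM procedure), which is not shown here. The sketch below assumes a discrete HMM; the two-state model parameters are hypothetical.

```python
def forward(obs, start, trans, emit):
    """Question 1: P(obs | model), computed with the forward algorithm."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)

def viterbi(obs, start, trans, emit):
    """Question 2: most probable hidden state sequence for `obs`."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    back = []
    for o in obs[1:]:
        # Best predecessor of each state, then the updated path probabilities
        prev = [max(range(n), key=lambda r: delta[r] * trans[r][s])
                for s in range(n)]
        delta = [delta[prev[s]] * trans[prev[s]][s] * emit[s][o]
                 for s in range(n)]
        back.append(prev)
    # Trace the best path backwards from the most probable final state
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    return path[::-1]

# Hypothetical model: 2 hidden states, 2 observation symbols
start = [0.5, 0.5]
trans = [[0.8, 0.2], [0.2, 0.8]]   # trans[r][s] = P(H_t = s | H_{t-1} = r)
emit = [[0.9, 0.1], [0.2, 0.8]]    # emit[s][o] = P(O_t = o | H_t = s)

obs = [0, 0, 1, 1]
likelihood = forward(obs, start, trans, emit)
best_path = viterbi(obs, start, trans, emit)
```

For long sequences an implementation would work with log probabilities or rescale `alpha` and `delta` at each step to avoid underflow.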

27
Summary
  • Different learning tasks can be used to solve
    real world problems
  • Machine Learning techniques are useful when there
    is lots of data and lots of gaps in knowledge
  • Bayesian Networks are a probabilistic framework
    that can perform most key ML tasks
  • They are also transparent and can incorporate
    expert knowledge

28
Part 3
Some Preliminary Results on Northern Gulf Data
29
Expert Knowledge
  • Ask marine biologists to generate matrices of
    expected relationships
  • Can be used to compare models learnt from data
  • Also to be used as priors to improve model
    quality

30
Results Expert networks
31
Results Data networks (BN from correlation)
  • 85 conf. imputed from 70 data
  • Warning: data quality, spurious relations

(Eel pout / Ocean Sun Fish)
Witch Flounder
(Lumpfish)
Cod
Haddock
Shrimp
(Silver Hake)
(Atlantic soft pout / Bristlemouths)
32
Example DBN
  • Let's look at an example DBN
  • NGulfDynamic - range
  • Structure Encoded by knowledge
  • Updated by data
  • Explore with queries
  • Supported by previous knowledge
  • In the Northern Gulf of St. Lawrence, cod (code
    438) and redfish (792, 793, 794, 795, 796) collapsed
    to very low levels in the mid 1990s. Subsequently
    the shrimp (8111) increased greatly in biomass, so
    one will see this signal in the data. It is
    hypothesised that these are exclusive community
    states, where you never get high abundance of both
    at the same time owing to predatory interactions.

33
Feature Selection
  • Given that we know that from 1990 the cod
    population collapsed
  • Can we apply Feature Selection to see what
    species characterise this collapse
  • Learn BN and apply CV

34
Results 7 Feature Selection with Bootstrap
Filter method using Log Likelihood
Wrapper method using BNs
Redfish
35
Results Feature Selection
  • Change in Correlation of interactions between
    cod and high ranking species before and after
    1990

36
Dynamic Models
  • Given that the data is a time-series
  • Can we build dynamic models to forecast future
    states?
  • Can we use HMM to classify the time-series?

37
Multivariate Time Series
  • N Gulf is a process measured over time
  • Autocorrelation Function (ACF)
  • (here cod)
  • Cross-Correlation Function (CCF)
  • (here hake to cod)

(Plots: ACF for cod; CCF for hake to cod)
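
A minimal sketch of how the two functions can be estimated from data, using the standard sample estimator (covariance at a given lag divided by the product of the two standard deviations). The helper `best_lag` and the two periodic series are hypothetical and stand in for real species time series.

```python
def cross_correlation(x, y, lag):
    """Sample correlation between x[t] and y[t + lag], for lag >= 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((x[t] - mx) * (y[t + lag] - my) for t in range(n - lag)) / n
    sx = (sum((v - mx) ** 2 for v in x) / n) ** 0.5
    sy = (sum((v - my) ** 2 for v in y) / n) ** 0.5
    return cov / (sx * sy)

def autocorrelation(x, lag):
    """The ACF is the cross-correlation of a series with itself."""
    return cross_correlation(x, x, lag)

def best_lag(x, y, max_lag):
    """Lag in 0..max_lag with the largest cross-correlation, and its value."""
    scores = [(cross_correlation(x, y, lag), lag) for lag in range(max_lag + 1)]
    value, lag = max(scores)
    return lag, value

# Hypothetical series: y repeats x one time step later
x = [0, 1, 0, -1] * 3
y = [-1, 0, 1, 0] * 3

lag, value = best_lag(x, y, 5)
```

A pair of series would then be linked in a dynamic model only if `value` exceeds a chosen threshold within the maximum lag considered.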
38
Results 3 Fitting Dynamic Models
  • HMM Expert with CCF > 0.3 (max lag 5)

LSS 8.3237
39
Results 3 Fitting Dynamic Models
  • Learning DBN from CCF data

LSS 5.0106
Fluctuation Early Indicator of Collapse?
40
Results 4 Examining DBN Net
  • Data only Dynamic Links

Hakes
Redfish
Cod
Haddock
Witch Flounder
White Hake
Thorny Skate
Shrimp
41
Results 5 Fitting Dynamic Models
  • Learning DBN from Expert biased CCF data, CCF >
    0.5 (max lag 5)

LSS 6.1326
42
Results 6 Examining DBN Net
  • Data Biased Expert Dynamic Links

Cod
Herring
Witch Flounder
Mackerel / Capelin
43
Results 7 Linear Dynamic System
  • Instead of a discrete hidden state, use a
    continuous hidden variable
  • Could be interpreted as measure of fishing?
    Predator population (e.g. seals)? Water
    temperature?

1987 (white fur ban)
1991
1997 (white fur hunt)
1984
44
Conclusions
  • Hopefully conveyed the broad idea of machine
    learning
  • Shown how it can be used to help analyse data
    like fish population data
  • Potentially applicable to other data studied
    here at MLI

45
Potential Projects
  • Spatio-Temporal Analysis
  • Use Spatio-Temporal BNs to model fish stock data.
    Nodes would represent species in specific
    regions
  • Combining Expert Knowledge and Data for improved
    Prediction
  • Looking for Un/Stable States and the factors that
    influence them
  • Functional Analysis of Data from Multiple
    Locations

46
E.G. Spatial Analysis
  • Spatial Bayesian Network Analysis
  • NGulfCodSpatial

47
E.G. Functional Models
  • Functional Models to assimilate data from
    different oceans...

48
Acknowledgements
  • Daniel Duplisea
  • Panayiota Apostolaki
Any Questions?