Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques

Slides: 35
Provided by: cimm
Learn more at: http://www.cimms.ou.edu
Transcript and Presenter's Notes

1
Data-driven methods in Environmental Sciences:
Exploration of Artificial Intelligence Techniques
  • Valliappa.Lakshmanan_at_noaa.gov

2
Data Driven Methods
  • What is Artificial Intelligence?
  • Common AI techniques
  • Choosing between AI techniques
  • Pre and post processing

3
What is AI?
  • Machines that perceive, understand and react to
    their environment
  • Goal of Babbage, etc.
  • Oldest endeavor in computer science
  • Machines that think
  • Robots: factory floors, home vacuums
  • Still quite impractical

4
AI vs. humans
  • AI applications built on Aristotelian logic
  • Induction, semantic queries, system of logic
  • Human reasoning involves more than just induction
  • Computers never as good as humans
  • In reasoning and making sense of data
  • In obtaining a holistic view of a system
  • Computers much better than humans
  • In processing reams of data
  • Performing complex calculations

5
Successful AI applications
  • Targeted tasks more amenable to automated methods
  • Build special-purpose AI systems
  • Determine appropriate dosage for a drug
  • Classify cells as benign or cancerous
  • Called expert systems
  • Methodology based on expert reasoning
  • Quick and objective ways to obtain answers

6
Data Driven Methods
  • What is Artificial Intelligence?
  • Common AI techniques
  • Choosing between AI techniques
  • Pre and post processing

7
Fuzzy logic
  • Fuzzy logic addresses key problem in expert
    systems
  • How to represent domain knowledge
  • Humans use imprecisely calibrated terms
  • How to build decision trees on imprecise
    thresholds
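
A minimal sketch of how imprecise terms can be encoded, assuming piece-wise linear membership functions and a single made-up rule ("IF temperature is cold AND reflectivity is strong THEN snow is likely"). The function names and every threshold below are hypothetical and chosen only for illustration; they are not taken from the presentation or from any operational system.

```python
def ramp_up(x, lo, hi):
    """Membership that rises linearly from 0 at lo to 1 at hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def ramp_down(x, lo, hi):
    """Membership that falls linearly from 1 at lo to 0 at hi."""
    return 1.0 - ramp_up(x, lo, hi)


def snow_likelihood(temp_c, reflectivity_dbz):
    """Hypothetical rule: cold AND strong echo -> snow likely."""
    cold = ramp_down(temp_c, -2.0, 4.0)             # "cold" (assumed thresholds)
    strong = ramp_up(reflectivity_dbz, 15.0, 35.0)  # "strong" (assumed thresholds)
    return min(cold, strong)                        # fuzzy AND taken as the minimum


likelihood = snow_likelihood(1.0, 25.0)
```

Using min for AND is one common choice; a product of the memberships is another. Either way the expert supplies the rule and the thresholds, and no ground truth is required.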

8
Fuzzy logic example
Source: Matlab fuzzy logic toolbox tutorial,
http://www.mathworks.com/access/helpdesk/help/toolbox/fuzzy/fp350.html

9
Advantages of fuzzy logic
  • Considerable skill for little investment
  • Fuzzy logic systems piggyback on human analysis
  • Humans encode rules after intelligent analysis of
    lots of data
  • Verbal rules generated by humans are robust
  • Simple to create
  • Not much need for data or ground truth
  • Logic tends to be easy to program
  • Fuzzy rules are human understandable

10
Where not to use fuzzy logic
  • Do not use fuzzy logic if
  • Humans do not understand the system
  • Different experts disagree
  • Knowledge cannot be expressed with verbal rules
  • Gut instinct is involved
  • Not just objective analysis
  • A fuzzy logic system is limited
  • Piece-wise linear approximation to a system
  • Non-linear systems cannot be approximated
  • Many environmental applications are non-linear

11
Neural Networks
  • Neural networks can approximate non-linear
    systems
  • Evidence-based
  • Weights chosen through optimization procedure on
    known dataset (training)
  • Works even if experts can't verbalize their
    reasoning, as long as there is ground truth

12
An example neural network
Diagram from
http://www.codeproject.com/useritems/GA_ANN_XOR.asp

13
Advantages of neural networks
  • Can approximate any smooth function
  • The three-layer neural network
  • Can yield true probabilities
  • If output node is a sigmoid node
  • Not hard to train
  • Training process is well understood
  • Fast in operations
  • Training is slow, but once trained, the network
    can calculate the output for a set of inputs
    quite fast
  • Easy to implement
  • Just a sum of exponential functions
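
To make the "three-layer network with a sigmoid output node" concrete, here is a small forward-pass sketch for the XOR problem. The weights are hand-picked for illustration (an assumption); in practice they would come from the training procedure, which is not shown here.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked illustrative weights, NOT the result of training.
# Each hidden node is (weight for x1, weight for x2, bias).
HIDDEN = [
    (20.0, 20.0, -10.0),   # behaves roughly like OR
    (-20.0, -20.0, 30.0),  # behaves roughly like NAND
]
OUTPUT = (20.0, 20.0, -30.0)  # roughly AND of the two hidden nodes


def forward(x1, x2):
    """Input layer -> two sigmoid hidden nodes -> one sigmoid output node."""
    hidden = [sigmoid(w1 * x1 + w2 * x2 + b) for w1, w2, b in HIDDEN]
    w1, w2, b = OUTPUT
    return sigmoid(w1 * hidden[0] + w2 * hidden[1] + b)


outputs = {(a, b): forward(a, b) for a in (0, 1) for b in (0, 1)}
```

Once the weights are fixed, evaluating the network is only a handful of multiplications and exponentials, which is why run-time operation is fast even when training is slow.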

14
Disadvantages of neural networks
  • A black box
  • The final set of weights yields no insights
  • Magnitude of weights doesn't mean much
  • Measure of skill needs to be differentiable
  • RMS error, etc.
  • Cannot use Probability of Detection, for example
  • Training set has to be complete
  • Unpredictable output on data unlike training
  • Need lots of data
  • Need expert willing to do a lot of truthing
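
The differentiability point can be illustrated with a short sketch that compares RMS error with Probability of Detection (POD), taken here as hits / (hits + misses) after thresholding the forecast. The sample numbers are invented for illustration.

```python
import math


def rmse(pred, truth):
    """Root-mean-square error: changes smoothly as predictions change."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred))


def pod(pred, truth, threshold):
    """Probability of Detection: fraction of observed events that were forecast."""
    hits = sum(1 for p, t in zip(pred, truth) if t == 1 and p >= threshold)
    misses = sum(1 for p, t in zip(pred, truth) if t == 1 and p < threshold)
    if hits + misses == 0:
        return float("nan")  # undefined when there are no observed events
    return hits / (hits + misses)


pred = [0.9, 0.6, 0.4, 0.2]   # made-up network outputs
truth = [1, 1, 1, 0]          # made-up ground truth
error = rmse(pred, truth)
detection = pod(pred, truth, 0.5)
```

Nudging a prediction slightly changes the RMS error a little, but leaves POD unchanged unless the prediction crosses the threshold. A gradient-based training procedure therefore gets no useful signal from POD.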

15
Recap
  • Fuzzy logic
  • Humans provide the rules
  • Not optimal
  • Neural network
  • Humans cannot understand the system
  • Optimal
  • Middle ground?
  • Genetic Algorithms
  • Decision Trees

16
Genetic algorithms
  • In genetic algorithms
  • One fixes the model (rule base, equations, class
    of functions, etc.)
  • Optimize the model's parameters on a training
    data set
  • Use optimal set of parameters for unknown cases
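
A rough sketch of these three steps, assuming the fixed model is a power law y = a * x^b and the cost is the mean absolute error. The selection scheme (keep the better half, breed the rest by blending and mutating) and all numeric settings are simplifying assumptions, not the method used in the presentation.

```python
import random


def model(x, a, b):
    """The fixed model: a power law with two free parameters."""
    return a * x ** b


def cost(params, data):
    """Mean absolute error; it does not need to be differentiable."""
    a, b = params
    return sum(abs(model(x, a, b) - y) for x, y in data) / len(data)


def evolve(data, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    pop = [(rng.uniform(0.1, 10.0), rng.uniform(0.1, 3.0)) for _ in range(pop_size)]
    history = []  # best cost seen at the start of each generation
    for _ in range(generations):
        pop.sort(key=lambda p: cost(p, data))
        history.append(cost(pop[0], data))
        parents = pop[: pop_size // 2]          # survivors (includes the best)
        children = []
        while len(parents) + len(children) < pop_size:
            (a1, b1), (a2, b2) = rng.sample(parents, 2)
            w = rng.random()                    # blend crossover
            a = w * a1 + (1 - w) * a2 + rng.gauss(0, 0.1)   # plus mutation
            b = w * b1 + (1 - w) * b2 + rng.gauss(0, 0.05)
            children.append((max(a, 1e-6), max(b, 1e-6)))
        pop = parents + children
    pop.sort(key=lambda p: cost(p, data))
    history.append(cost(pop[0], data))
    return pop[0], history


training = [(x, 2.0 * x ** 1.5) for x in range(1, 11)]  # synthetic data
best_params, best_history = evolve(training)
```

Because the survivors are carried over unchanged, the best cost can never get worse from one generation to the next. The fitted parameters would then be used with model() on unseen inputs.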

17
An example genetic algorithm
Sources: http://tx.technion.ac.il/edassau/web/genetic_algorithms.htm
http://cswww.essex.ac.uk/research/NEC/

18
Advantages of genetic algorithms
  • Near-optimal parameters for given model
  • Human-understandable rules
  • Best parameters for them
  • Cost function need not be differentiable
  • The process of training uses natural selection,
    not gradient descent
  • Requires less data than a neural network
  • Search space is more limited

19
Disadvantages of genetic algorithms
  • Highly dependent on class of functions
  • If poor model is chosen, poor results
  • Optimization may not help at all
  • Known model does not always lead to better
    understanding
  • Magnitude of weights, etc. may not be meaningful
    if inputs are correlated
  • Problem may have multiple parametric solutions

20
Decision trees
  • Can automatically build decision trees from known
    data
  • Prune trees
  • Select thresholds
  • Choose operators
  • Disadvantages
  • Piece-wise linear, so typically less skilled than
    neural networks
  • Large decision trees are effectively a black box
  • Cannot do regression, only classification
  • Advantages
  • Fast to train
  • New advances: bagged and boosted decision trees
    approach the skill of neural networks, but are
    no longer fast to train
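
As a sketch of the "select thresholds" step, the code below picks the single split on one input that gives the purest two groups. Gini impurity is assumed as the purity measure; the presentation does not say which criterion it has in mind, and the sample data are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def best_threshold(values, labels):
    """Try midpoints between sorted values; return (threshold, impurity)."""
    pairs = sorted(zip(values, labels))
    best_score, best_t = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no split possible between identical values
        t = 0.5 * (pairs[i - 1][0] + pairs[i][0])
        left = [lab for v, lab in pairs if v < t]
        right = [lab for v, lab in pairs if v >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_t = score, t
    return best_t, best_score


temps = [2, 4, 6, 8, 12, 14, 16, 18]   # made-up temperatures
event = [1, 1, 1, 1, 0, 0, 0, 0]       # made-up class labels
threshold, impurity = best_threshold(temps, event)
```

A full tree builder would repeat this search over every input at every node and then prune, which is where operator choice and pruning come in.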

Example decision tree (two counts at each node):
Root: 30, 50
  T < 10C: 20, 15
    Z > 45: 18, 2
    Z < 45: 2, 13
  T > 10C: 10, 35
    V < 5: 8, 2
    V > 5: 2, 33

21
Radial Basis Functions
Diagram from A. W. Jayawardena and D. Achela K.
Fernando (1998), Use of Radial Basis Function Type
Artificial Neural Networks for Runoff Simulation,
Computer-Aided Civil and Infrastructure
Engineering 13(2)
  • Radial Basis Functions are a form of neural
    network
  • Localized gaussians
  • Linear sum of non-linear functions
  • Advantage: Can be solved by inverting a matrix,
    so very fast
  • Disadvantage: Not a general-enough model
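
A minimal sketch of why no iterative training is needed, assuming one Gaussian centred on each training point and a shared, hand-chosen width. The weights of the linear sum come from solving one linear system (equivalent to applying the inverse matrix); a small Gaussian-elimination routine is included only to keep the example self-contained.

```python
import math


def gaussian(x, center, width):
    """Localized Gaussian basis function."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_rbf(xs, ys, width):
    """Weights w such that sum_j w_j * gaussian(x_i, x_j) equals y_i."""
    phi = [[gaussian(x, c, width) for c in xs] for x in xs]
    return solve(phi, ys)


def predict(x, centers, weights, width):
    """Linear sum of the non-linear basis functions."""
    return sum(w * gaussian(x, c, width) for c, w in zip(centers, weights))


xs = [0.0, 1.0, 2.0, 3.0]   # made-up inputs
ys = [0.0, 0.8, 0.9, 0.1]   # made-up targets
weights = fit_rbf(xs, ys, width=1.0)
```

With fewer centres than data points the same idea becomes a least-squares problem, which is still a direct linear solve rather than a gradient-descent search.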

22
Data Driven Methods
  • What is Artificial Intelligence?
  • Common AI techniques
  • Choosing between AI techniques
  • Pre and post processing

23
Typical data-driven application
Diagram: Input Data -> Features -> f(features) -> Result
(the AI application in run-time)
Open questions: Which features? How do we find f()?

24
What is the role of the data?
  • Validation
  • Test known model
  • Technique
  • Difference between model output and ground truth
    helps to validate the model
  • Calibration
  • Find parameters to model with desired structure
  • Technique
  • Tuned fuzzy logic method
  • Genetic algorithms
  • Induction
  • Find model and parameters from just data
  • Technique
  • Neural network methods, bagged/boosted decision
    trees, support vector machines, etc.

25
What is the problem to solve?
  • Do you have a bunch of data and want to
  • Estimate an unknown parameter from it?
  • True rainfall based on radar observations?
  • Amount of liquid content from in-situ
    measurements of temperature, pressure, etc?
  • Regression
  • Classify what the data correspond to?
  • A water surge?
  • A temperature inversion?
  • A boundary?
  • Classification
  • Regression and classification aren't that
    different
  • Classification: estimate the probability of an
    event
  • A function with values between 0 and 1

26
Which AI technique?
  • Do you have expert knowledge?
  • Humans have a model in their head? Should the
    final f() be understandable?
  • Create fuzzy logic rules from experts' reasoning
  • Aggregate the individual fuzzy logic rules
  • Can tune the fuzzy rules based on data
  • Using regression, decision trees or neural
    networks for RMS error criterion
  • Genetic algorithms for error criteria like ROC,
    economic cost, etc.
  • Many times the original rules are just fine
  • Do you already know the model?
  • A power-law relationship? Gaussian? Quadratic?
    Rules?
  • Just need to find parameters to this model?
  • If linear, just use linear regression
  • If non-linear use genetic algorithms
  • Use continuous GAs
  • Both of these can be used for regression
    (therefore, also classification)
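
For the "known model, just need the parameters" case, a short sketch of ordinary least squares for a straight line, plus one way to reuse it for a power law: y = a * x^b becomes linear after taking logarithms (log y = log a + b log x). The function names and the synthetic data are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_power_law(xs, ys):
    """Fit y = a * x**b by fitting a line in log-log space (needs x, y > 0)."""
    b, log_a = fit_line([math.log(x) for x in xs], [math.log(y) for y in ys])
    return math.exp(log_a), b


xs = [1.0, 2.0, 4.0, 8.0]
ys = [200.0 * x ** 1.6 for x in xs]   # synthetic, noise-free data
a, b = fit_power_law(xs, ys)
```

The log transform minimizes error in log space rather than in the original units. When that is not acceptable, or the model cannot be linearized at all, a search method such as a continuous genetic algorithm is the alternative.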

27
Which AI technique (contd.)
  • Do you know nothing about the data?
  • Not the suspected equation/model (GA)?
  • Not the suspected rules (fuzzy logic)?
  • Use an AI technique that supplies its own
    equations/rules
  • Black box
  • For classification, use
  • Bagged decision trees or Support Vector Machines
  • If output is probabilistic, remember to apply
    Platt scaling
  • Summary statistics on bagged DTs can help answer
    "why?"
  • Neural Networks
  • For regression, use
  • Neural networks

28
Where do your data come from?
  • Observed data
  • Compute features
  • Choose AI technique
  • The 4 choices in the previous two slides
  • Simulated data
  • Example: trying to replicate a very complex model
  • Throw randomly-generated data at model
  • Compute features
  • Choose AI technique
  • GA for parametric approximations
  • NN when you don't know how to approximate

29
Where do you get your inputs?
  • What type of data do you have?
  • Individual observations?
  • Sample them (choose at random) and use directly
  • Sparse observations in a time series?
  • Generate time-based features (1D moving windows)
  • Signal processing features from time series
  • Data from remotely sensed 2D grids?
  • Generate image-based features using convolution
    filters
  • Do you need
  • Pixel-based regression/classification?
  • Use convolution features directly
  • Object-based regression/classification?
  • Identify regions using region growing
  • Use region-aggregate features
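
The object-based path can be sketched as follows, assuming a simple form of region growing: pixels at or above a threshold that touch each other (4-connectivity) are grouped into one region, and the mean value is used as the region-aggregate feature. The threshold and the grid values are invented for illustration.

```python
from collections import deque


def grow_regions(grid, threshold):
    """Label 4-connected groups of pixels with value >= threshold."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]   # 0 means "no region"
    n_regions = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] < threshold or labels[r][c]:
                continue
            n_regions += 1                       # new seed pixel found
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:                         # grow outward from the seed
                i, j = queue.popleft()
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if (0 <= ni < rows and 0 <= nj < cols
                            and not labels[ni][nj]
                            and grid[ni][nj] >= threshold):
                        labels[ni][nj] = n_regions
                        queue.append((ni, nj))
    return labels, n_regions


def region_means(grid, labels, n_regions):
    """Region-aggregate feature: mean pixel value of each region."""
    sums = [0.0] * (n_regions + 1)
    counts = [0] * (n_regions + 1)
    for grid_row, label_row in zip(grid, labels):
        for value, label in zip(grid_row, label_row):
            sums[label] += value
            counts[label] += 1
    return {k: sums[k] / counts[k] for k in range(1, n_regions + 1)}


grid = [
    [50, 48, 0, 0],
    [46, 0, 0, 0],
    [0, 0, 0, 42],
    [0, 0, 44, 40],
]
labels, n_regions = grow_regions(grid, threshold=40)
means = region_means(grid, labels, n_regions)
```

Other aggregates (maximum, area, standard deviation) can be accumulated in the same loop and passed to the chosen AI technique as per-object inputs.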

30
Typical data-driven application
Diagram (a data-driven application in run-time):
Observed data
  -> signal/image processing, sampling
Features
  -> normalize / create chromosome / determine confidences
f(): FzLogic / GenAlg / NN / DecTree
  -> Platt method / region-average / threshold
Result

31
Data Driven Methods
  • What is Artificial Intelligence?
  • Common AI techniques
  • Choosing between AI techniques
  • Pre and post processing

32
Preprocessing
  • Often cannot use pixel data directly
  • Too much data, too highly correlated
  • May need to segment pixels into objects and use
    features computed on the objects
  • Different data sets may not be collocated
  • Need to interpolate to line them up
  • Mapping, objective analysis
  • Noise in data may need to be reduced
  • Smoothing
  • Present statistics of the data, rather than the
    data itself
  • Features need to be extracted from data
  • Human experts are often a good source of ideas
    on signatures to extract from data
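
As a small illustration of smoothing and of presenting statistics rather than raw data, the sketch below computes a moving-window mean and standard deviation over a 1D series. The window half-width and the sample series are arbitrary assumptions.

```python
import statistics


def window_features(series, half_width):
    """For each sample, return (mean, std) over a centred moving window.

    Near the ends the window is truncated to the available samples.
    """
    features = []
    n = len(series)
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        window = series[lo:hi]
        mean = sum(window) / len(window)        # smoothed value
        spread = statistics.pstdev(window)      # local variability
        features.append((mean, spread))
    return features


series = [1.0, 1.0, 10.0, 1.0, 1.0]   # made-up series with one spike
feats = window_features(series, half_width=1)
```

The mean acts as a simple smoother, while the standard deviation is one example of a derived statistic that may carry more signal than the raw values. The same idea extends to 2D grids with a square neighbourhood.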

33
Postprocessing
  • The output of an expert system may be grid point
    by grid point
  • May need to provide output on objects
  • Storms, forests, etc.
  • Can average outputs over an object's pixels
  • May need probabilistic output
  • Scale output of maximum-margin techniques
  • Use a sigmoid function
  • Called Platt scaling
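
A simplified sketch of Platt scaling: raw classifier scores are passed through a sigmoid whose two parameters are fitted on labelled data by minimizing the log-loss. Plain gradient descent is used here for brevity, which is an assumption; Platt's original procedure uses a more careful optimizer and adjusted targets. Platt writes the sigmoid as 1 / (1 + exp(A*f + B)), so the a and b below correspond to -A and -B. The scores and labels are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit p = sigmoid(a * score + b) by gradient descent on the log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = sigmoid(a * s + b)
            grad_a += (p - y) * s   # d(log-loss)/da for this sample
            grad_b += (p - y)       # d(log-loss)/db for this sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def calibrate(score, a, b):
    """Map a raw score to a probability using the fitted sigmoid."""
    return sigmoid(a * score + b)


# Made-up, deliberately overlapping scores so the fit stays finite.
scores = [-2.0, -1.5, -1.0, -0.5, 0.5, 0.3, 1.0, 1.5, 2.0, -0.2]
labels = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0]
a, b = fit_platt(scores, labels)
```

The fit should be done on data held out from training the classifier itself, otherwise the resulting probabilities tend to be overconfident.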

34
Summary
  • What is Artificial Intelligence?
  • Data-driven methods to perform specific targeted
    tasks
  • Common AI techniques
  • Fuzzy logic, neural networks, genetic algorithms,
    decision trees
  • Choosing between AI techniques
  • Understand the role of your data
  • Do experts understand the system? (have a model)
  • Do experts expect to understand the system?
    (readability)
  • Pre and post processing
  • Image processing techniques on spatial grids