Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques


Slides: 35

Provided by: cimm

Learn more at: http://www.cimms.ou.edu


Transcript and Presenter's Notes


1
Data-driven methods in Environmental Sciences: Exploration of Artificial Intelligence Techniques

Valliappa.Lakshmanan_at_noaa.gov

2
Data Driven Methods

What is Artificial Intelligence?
Common AI techniques
Choosing between AI techniques
Pre- and post-processing

3
What is AI?

Machines that perceive, understand and react to their environment
  Goal of Babbage, etc.
  Oldest endeavor in computer science
Machines that think
  Robots: factory floors, home vacuums
  Still quite impractical

4
AI vs. humans

AI applications built on Aristotelian logic
  Induction, semantic queries, systems of logic
  Human reasoning involves more than just induction
Computers are never as good as humans
  At reasoning and making sense of data
  At obtaining a holistic view of a system
Computers are much better than humans
  At processing reams of data
  At performing complex calculations

5
Successful AI applications

Targeted tasks are more amenable to automated methods
  Build special-purpose AI systems
  Determine the appropriate dosage for a drug
  Classify cells as benign or cancerous
These are called expert systems
  Methodology based on expert reasoning
  Quick and objective ways to obtain answers

6
Data Driven Methods

What is Artificial Intelligence?
Common AI techniques
Choosing between AI techniques
Pre- and post-processing

7
Fuzzy logic

Fuzzy logic addresses a key problem in expert systems
  How to represent domain knowledge
  Humans use imprecisely calibrated terms
  How to build decision trees on imprecise thresholds

8
Fuzzy logic example
Source: MATLAB Fuzzy Logic Toolbox tutorial, http://www.mathworks.com/access/helpdesk/help/toolbox/fuzzy/fp350.html
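The following is a minimal fuzzy rule system in Python that illustrates the idea. It is not the MATLAB toolbox example cited above. The fog-likelihood problem, the function names `triangle` and `fog_likelihood`, the membership breakpoints (relative humidity in percent, wind speed in m/s) and both rules are hypothetical. The sketch walks through the usual three steps:

  Fuzzify the inputs with membership functions
  Combine rule conditions with min (AND) and max (OR)
  Defuzzify with a weighted average of the rule outputs (a Sugeno-style choice; other defuzzification methods exist)

```python
def triangle(x: float, left: float, peak: float, right: float) -> float:
    """Triangular membership: 0 outside [left, right], rising to 1 at peak."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def fog_likelihood(rel_humidity: float, wind_speed: float) -> float:
    """Hypothetical two-rule fuzzy system returning a fog likelihood in [0, 1].

    Rule 1: IF humidity is HIGH AND wind is CALM THEN likelihood is 0.9
    Rule 2: IF humidity is LOW OR wind is STRONG THEN likelihood is 0.1
    """
    # Fuzzify the inputs (breakpoints are illustrative assumptions).
    humid_high = triangle(rel_humidity, 80.0, 100.0, 120.0)
    humid_low = triangle(rel_humidity, -20.0, 0.0, 85.0)
    wind_calm = triangle(wind_speed, -5.0, 0.0, 5.0)
    wind_strong = triangle(wind_speed, 3.0, 15.0, 100.0)

    # Fire the rules: AND -> min, OR -> max.
    w1 = min(humid_high, wind_calm)
    w2 = max(humid_low, wind_strong)

    # Weighted-average defuzzification of the rule outputs.
    total = w1 + w2
    if total == 0.0:
        return 0.5  # no rule fires: fall back to a neutral value
    return (w1 * 0.9 + w2 * 0.1) / total


print(round(fog_likelihood(90.0, 4.0), 3))
```

Tuning the breakpoints against observations would be the "tuned fuzzy logic" calibration step mentioned later in the deck.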
9
Advantages of fuzzy logic

Considerable skill for little investment
  Fuzzy logic systems piggyback on human analysis
  Humans encode rules after intelligent analysis of lots of data
  Verbal rules generated by humans are robust
Simple to create
  Not much need for data or ground truth
  Logic tends to be easy to program
Fuzzy rules are human-understandable

10
Where not to use fuzzy logic

Do not use fuzzy logic if
  Humans do not understand the system
  Different experts disagree
  Knowledge cannot be expressed with verbal rules
  Gut instinct is involved, not just objective analysis
A fuzzy logic system is limited
  It is a piece-wise linear approximation to a system
  Non-linear systems cannot be approximated well
  Many environmental applications are non-linear

11
Neural Networks

Neural networks can approximate non-linear systems
Evidence-based
  Weights are chosen through an optimization procedure on a known dataset (training)
  Works even if experts can't verbalize their reasoning, as long as there is ground truth
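Here is a minimal sketch of what "weights chosen through an optimization procedure" means, assuming NumPy. The network has one hidden layer of sigmoid units and a linear output, and it is fitted by full-batch gradient descent on mean squared error.

Several choices are illustrative assumptions:
  The function name `train_mlp`
  The hidden-layer size, learning rate and iteration count
  The sin(x) target

Practical work would normally use an established library and hold out validation data.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def train_mlp(x, y, n_hidden=8, lr=0.05, n_iter=2000, seed=0):
    """Fit y ~ sigmoid(x W1 + b1) W2 + b2 by gradient descent on MSE.

    x, y: 1-D arrays of equal length. Returns (params, loss_history).
    """
    rng = np.random.default_rng(seed)
    X = x.reshape(-1, 1)                       # (n, 1) inputs
    Y = y.reshape(-1, 1)                       # (n, 1) targets
    W1 = rng.normal(0.0, 1.0, size=(1, n_hidden))
    b1 = np.zeros((1, n_hidden))
    W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))
    b2 = np.zeros((1, 1))
    n = X.shape[0]
    losses = []
    for _ in range(n_iter):
        # Forward pass: sigmoid hidden layer, linear output for regression.
        H = sigmoid(X @ W1 + b1)               # (n, n_hidden)
        err = H @ W2 + b2 - Y                  # (n, 1)
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        d_pred = 2.0 * err / n
        gW2 = H.T @ d_pred
        gb2 = d_pred.sum(axis=0, keepdims=True)
        d_H = (d_pred @ W2.T) * H * (1.0 - H)
        gW1 = X.T @ d_H
        gb1 = d_H.sum(axis=0, keepdims=True)
        # Gradient-descent update of every weight and bias.
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2
    return (W1, b1, W2, b2), losses


x = np.linspace(-2.0, 2.0, 50)
params, losses = train_mlp(x, np.sin(x))
print(f"initial MSE {losses[0]:.4f}, final MSE {losses[-1]:.4f}")
```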

12
An example neural network
Diagram from http://www.codeproject.com/useritems/GA_ANN_XOR.asp
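The cited diagram shows a small network for the XOR problem. As an illustration, here is the run-time forward pass of a 2-2-1 sigmoid network with hand-picked weights. These weights are an assumption chosen for clarity, not the ones from that page:

  Hidden unit 1 behaves like OR
  Hidden unit 2 behaves like NAND
  The output unit behaves like AND of the two

Together they give XOR. Once the weights are known, evaluation is just weighted sums passed through sigmoids.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hand-picked (illustrative) weights: hidden unit 1 ~ OR, hidden unit 2 ~ NAND.
HIDDEN = [((20.0, 20.0), -10.0), ((-20.0, -20.0), 30.0)]
# Output unit ~ AND of the two hidden units.
OUTPUT = ((20.0, 20.0), -30.0)


def xor_net(x1: float, x2: float) -> float:
    """Forward pass of a 2-2-1 sigmoid network; returns a value in (0, 1)."""
    hidden = [sigmoid(w[0] * x1 + w[1] * x2 + bias) for w, bias in HIDDEN]
    (v1, v2), c = OUTPUT
    return sigmoid(v1 * hidden[0] + v2 * hidden[1] + c)


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(xor_net(a, b), 3))
```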
13
Advantages of neural networks

Can approximate any smooth function
  With a three-layer neural network
Can yield true probabilities
  If the output node is a sigmoid node
Not hard to train
  The training process is well understood
Fast in operation
  Training is slow, but once trained, the network can calculate the output for a set of inputs quite fast
Easy to implement
  Just a sum of exponential functions

14
Disadvantages of neural networks

A black box
  The final set of weights yields no insights
  The magnitude of the weights doesn't mean much
The measure of skill needs to be differentiable
  RMS error, etc.
  Cannot use Probability of Detection, for example
The training set has to be complete
  Unpredictable output on data unlike the training data
Needs lots of data
  Needs an expert willing to do a lot of truthing
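This small example shows why the skill measure must be differentiable for gradient-based training. The forecasts and observations are made up, and the 0.5 decision threshold is an assumption. Nudging one forecast probability changes the mean squared error smoothly. The probability of detection (hits / (hits + misses)) stays the same, so its gradient is zero almost everywhere and gives the optimizer nothing to follow.

```python
def mse(probs, obs):
    """Mean squared error between forecast probabilities and 0/1 observations."""
    return sum((p - o) ** 2 for p, o in zip(probs, obs)) / len(obs)


def pod(probs, obs, threshold=0.5):
    """Probability of detection = hits / (hits + misses) after thresholding."""
    hits = sum(1 for p, o in zip(probs, obs) if o == 1 and p >= threshold)
    misses = sum(1 for p, o in zip(probs, obs) if o == 1 and p < threshold)
    return hits / (hits + misses)


obs = [1, 1, 0, 1, 0]
probs = [0.8, 0.3, 0.2, 0.6, 0.4]
eps = 1e-3
nudged = probs.copy()
nudged[1] += eps  # small change to a single forecast

# Finite-difference estimates of the derivative with respect to probs[1].
d_mse = (mse(nudged, obs) - mse(probs, obs)) / eps
d_pod = (pod(nudged, obs) - pod(probs, obs)) / eps
print(f"dMSE/dp = {d_mse:.4f}, dPOD/dp = {d_pod:.4f}")
```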

15
Recap

Fuzzy logic
  Humans provide the rules
  Not optimal
Neural network
  Humans cannot understand the system
  Optimal
Middle ground?
  Genetic algorithms
  Decision trees

16
Genetic algorithms

In genetic algorithms
  One fixes the model (rule base, equations, class of functions, etc.)
  Optimizes the parameters of the model on a training data set
  Uses the optimal set of parameters for unknown cases
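Here is a minimal continuous genetic algorithm, sketched under stated assumptions. The model is fixed as a power law y = a x^b, for example a radar reflectivity to rain-rate relation, and the GA searches only for a and b.

The following are illustrative choices, not the method used in this presentation:
  The function names `power_law_cost` and `genetic_fit`
  Population size and number of generations
  Tournament selection, blend crossover, Gaussian mutation and elitism
  Search bounds
  Synthetic data (generated with a = 200, b = 1.6, like the classic Z = 200 R^1.6 relation)

The cost is the mean absolute error of log y, which has kinks and need not be differentiable.

```python
import math
import random


def power_law_cost(params, xs, ys):
    """Mean absolute log-space error for the fixed model y = a * x**b."""
    a, b = params
    return sum(abs(math.log(a * x ** b) - math.log(y)) for x, y in zip(xs, ys)) / len(xs)


def genetic_fit(xs, ys, bounds, pop_size=40, generations=150, seed=1):
    """Continuous GA: tournament selection, blend crossover, Gaussian mutation, elitism."""
    rng = random.Random(seed)

    def clip(v, lo, hi):
        return max(lo, min(hi, v))

    def tournament(pop, costs, k=3):
        picks = rng.sample(range(len(pop)), k)
        return pop[min(picks, key=lambda i: costs[i])]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        costs = [power_law_cost(ind, xs, ys) for ind in pop]
        best = min(range(pop_size), key=lambda i: costs[i])
        best_history.append(costs[best])
        children = [list(pop[best])]  # elitism: the best survives unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(pop, costs), tournament(pop, costs)
            child = []
            for g1, g2, (lo, hi) in zip(p1, p2, bounds):
                w = rng.random()
                gene = w * g1 + (1.0 - w) * g2            # blend crossover
                gene += rng.gauss(0.0, 0.02 * (hi - lo))  # Gaussian mutation
                child.append(clip(gene, lo, hi))
            children.append(child)
        pop = children
    costs = [power_law_cost(ind, xs, ys) for ind in pop]
    best = min(range(pop_size), key=lambda i: costs[i])
    return pop[best], best_history


xs = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0]
ys = [200.0 * x ** 1.6 for x in xs]
params, history = genetic_fit(xs, ys, bounds=[(10.0, 500.0), (0.5, 3.0)])
print(f"a = {params[0]:.1f}, b = {params[1]:.2f}, best cost = {history[-1]:.4f}")
```

Elitism guarantees the best cost never gets worse from one generation to the next. Swapping in a different cost function (for example an economic cost) requires no gradients.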

17
An example genetic algorithm
Sources: http://tx.technion.ac.il/edassau/web/genetic_algorithms.htm, http://cswww.essex.ac.uk/research/NEC/
18
Advantages of genetic algorithms

Near-optimal parameters for a given model
  Human-understandable rules
  Best parameters for them
The cost function need not be differentiable
  The training process uses natural selection, not gradient descent
Requires less data than a neural network
  The search space is more limited

19
Disadvantages of genetic algorithms

Highly dependent on the class of functions
  If a poor model is chosen, the results are poor
  Optimization may not help at all
A known model does not always lead to better understanding
  Magnitudes of weights, etc. may not be meaningful if inputs are correlated
  The problem may have multiple parametric solutions

20
Decision trees

Can automatically build decision trees from known data
  Prune trees
  Select thresholds
  Choose operators
Disadvantages
  Piece-wise linear, so typically less skilled than neural networks
  Large decision trees are effectively a black box
  Cannot do regression, only classification
Advantages
  Fast to train
  New advances: bagged and boosted decision trees approach the skill of neural networks, but are no longer fast to train

Example decision tree (class counts at each node):
Root: 30, 50
  T < 10°C: 20, 15
    Z > 45: 18, 2
    Z < 45: 2, 13
  T > 10°C: 10, 35
    V < 5: 8, 2
    V > 5: 2, 33
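Tree-building tools select thresholds by scoring candidate splits. One common criterion is the decrease in Gini impurity (used by CART, for example). The sketch below applies it to the class counts of the example tree. The original does not say which class each count refers to, and the function names `gini` and `split_gain` are illustrative.

```python
def gini(counts):
    """Gini impurity of a node, given its class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def split_gain(parent, children):
    """Decrease in Gini impurity when `parent` is split into `children`."""
    n = sum(parent)
    weighted = sum(sum(ch) / n * gini(ch) for ch in children)
    return gini(parent) - weighted


# Class counts from the example tree.
root = [30, 50]
t_cold, t_warm = [20, 15], [10, 35]    # T < 10C, T > 10C
z_high, z_low = [18, 2], [2, 13]       # below T < 10C: Z > 45, Z < 45
v_low, v_high = [8, 2], [2, 33]        # below T > 10C: V < 5, V > 5

print(f"root split on T:    gain = {split_gain(root, [t_cold, t_warm]):.3f}")
print(f"T < 10C split on Z: gain = {split_gain(t_cold, [z_high, z_low]):.3f}")
print(f"T > 10C split on V: gain = {split_gain(t_warm, [v_low, v_high]):.3f}")
```

A tree builder would evaluate many candidate variables and thresholds this way and keep the split with the largest gain at each node.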
21
Radial Basis Functions
Diagram from A. W. Jayawardena and D. Achela K. Fernando (1998), "Use of Radial Basis Function Type Artificial Neural Networks for Runoff Simulation," Computer-Aided Civil and Infrastructure Engineering 13(2)

Radial Basis Functions are a form of neural network
  Localized Gaussians
  Linear sum of non-linear functions
Advantage: can be solved by inverting a matrix, so very fast
Disadvantage: not a general-enough model
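The point about solving by inverting a matrix can be made concrete with a sketch. It places one Gaussian basis function on each training point and gets the output weights from a single linear solve. It uses `numpy.linalg.solve` rather than forming an explicit inverse, which is numerically preferable.

The following are assumptions for illustration:
  Inputs are 1-D for brevity
  The width sigma
  The sine-wave sample data
  The function names `rbf_fit` and `rbf_predict`

```python
import numpy as np


def rbf_fit(centers: np.ndarray, targets: np.ndarray, sigma: float) -> np.ndarray:
    """Weights w with sum_j w_j * exp(-(c_i - c_j)^2 / (2 sigma^2)) = y_i for every i."""
    d2 = (centers[:, None] - centers[None, :]) ** 2
    phi = np.exp(-d2 / (2.0 * sigma ** 2))       # (n, n) matrix of localized Gaussians
    return np.linalg.solve(phi, targets)         # one linear solve, no iteration


def rbf_predict(x: np.ndarray, centers: np.ndarray, weights: np.ndarray, sigma: float) -> np.ndarray:
    """Linear sum of Gaussians centred on the training points, evaluated at x."""
    d2 = (x[:, None] - centers[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ weights


centers = np.linspace(0.0, 1.0, 10)
targets = np.sin(2.0 * np.pi * centers)
w = rbf_fit(centers, targets, sigma=0.1)
print(rbf_predict(np.array([0.25, 0.5]), centers, w, sigma=0.1))
```

The speed comes from the model being linear in its weights once the centers and width are fixed. That same fixed form is also what limits its generality.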

22
Data Driven Methods

What is Artificial Intelligence?
Common AI techniques
Choosing between AI techniques
Pre- and post-processing

23
Typical data-driven application
Diagram: Input data → Features → f(features) → Result
  Open questions: Which features? How do we find f()?
  Label: AI application in run-time
24
What is the role of the data?

Validation
  Test a known model
  Technique: the difference between model output and ground truth helps to validate the model
Calibration
  Find the parameters of a model with the desired structure
  Techniques: tuned fuzzy logic, genetic algorithms
Induction
  Find the model and its parameters from the data alone
  Techniques: neural networks, bagged/boosted decision trees, support vector machines, etc.

25
What is the problem to solve?

Do you have a bunch of data and want to
  Estimate an unknown parameter from it?
    True rainfall based on radar observations?
    Amount of liquid content from in-situ measurements of temperature, pressure, etc.?
    This is regression
  Classify what the data correspond to?
    A water surge?
    A temperature inversion?
    A boundary?
    This is classification
Regression and classification aren't that different
  Classification estimates the probability of an event: a function with values from 0 to 1

26
Which AI technique?

Do you have expert knowledge?
  Do humans have a model in their heads? Should the final f() be understandable?
  Create fuzzy logic rules from the experts' reasoning
  Aggregate the individual fuzzy logic rules
  The fuzzy rules can be tuned based on data
    Using regression, decision trees or neural networks for an RMS error criterion
    Using genetic algorithms for error criteria like ROC, economic cost, etc.
  Many times the original rules are just fine
Do you already know the model?
  A power-law relationship? Gaussian? Quadratic? Rules?
  Just need to find the parameters of this model?
    If linear, just use linear regression
    If non-linear, use genetic algorithms (continuous GAs)
  Both of these can be used for regression (and therefore also for classification)
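For the "if linear, just use linear regression" branch, a least-squares fit is essentially one library call. The sketch assumes NumPy and a model that is linear in its parameters, y = c0 + c1 x1 + c2 x2. The function name `fit_linear` and the synthetic, noise-free data are illustrative.

```python
import numpy as np


def fit_linear(features: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients [c0, c1, ..., ck] for y = c0 + sum_k ck * x_k."""
    design = np.column_stack([np.ones(len(y)), features])  # add intercept column
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs


rng = np.random.default_rng(0)
features = rng.uniform(0.0, 10.0, size=(100, 2))
y = 1.5 + 2.0 * features[:, 0] - 0.5 * features[:, 1]
print(fit_linear(features, y))
```

A power law y = a x^b becomes linear after taking logarithms (log y = log a + b log x). So it can also be fitted this way when its errors are reasonably described in log space.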

27
Which AI technique (contd.)

Do you know nothing about the data?
  Not the suspected equation/model (GA)?
  Not the suspected rules (fuzzy logic)?
  Use an AI technique that supplies its own equations/rules: a black box
For classification, use
  Bagged decision trees or support vector machines
    If probabilistic output is needed, remember to apply Platt scaling
    Summary statistics on bagged DTs can help answer "why"
  Neural networks
For regression, use
  Neural networks
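Platt scaling maps raw classifier scores f (for example, SVM decision values) to probabilities with a fitted sigmoid P(y = 1 | f) = 1 / (1 + exp(A f + B)). The sketch below fits A and B by plain gradient descent on the mean log-loss. Platt's original procedure uses a more careful optimizer and smoothed target values, both omitted here. The function names `platt_fit` and `platt_predict`, the learning rate, and the scores and labels are illustrative assumptions.

```python
import math


def platt_fit(scores, labels, lr=0.1, n_iter=5000):
    """Fit A, B in p = 1 / (1 + exp(A*f + B)) by gradient descent on mean log-loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(n_iter):
        grad_a = grad_b = 0.0
        for f, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(a * f + b))
            # With z = A*f + B, d(log-loss)/dz = y - p.
            grad_a += (y - p) * f / n
            grad_b += (y - p) / n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def platt_predict(score, a, b):
    """Calibrated probability for a raw classifier score."""
    return 1.0 / (1.0 + math.exp(a * score + b))


scores = [-2.0, -1.5, -1.0, -0.5, 0.2, -0.2, 0.5, 1.0, 1.5, 2.0]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
a, b = platt_fit(scores, labels)
print(f"A = {a:.3f}, B = {b:.3f}, P(y=1 | f=1.0) = {platt_predict(1.0, a, b):.3f}")
```

The fitted A comes out negative here because higher scores go with the positive class. In practice the sigmoid should be fitted on data not used to train the classifier itself.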

28
Where do your data come from?

Observed data
  Compute features
  Choose an AI technique
  The four choices in the previous two slides
Simulated data
  Example: trying to replicate a very complex model
  Throw randomly generated data at the model
  Compute features
  Choose an AI technique
    GA for parametric approximations
    NN when you don't know how to approximate

29
Where do you get your inputs?

What type of data do you have?
  Individual observations?
    Sample them (choose at random) and use them directly
  Sparse observations in a time series?
    Generate time-based features (1D moving windows)
    Signal processing features from the time series
  Data from remotely sensed 2D grids?
    Generate image-based features using convolution filters
Do you need
  Pixel-based regression/classification?
    Use convolution features directly
  Object-based regression/classification?
    Identify regions using region growing
    Use region-aggregate features
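Several of the feature types above can be sketched with NumPy and the standard library. The region growing here is the simplest variant (threshold plus connectivity), not a specific published algorithm.

The following are assumptions:
  The function names `moving_mean`, `box_filter`, `grow_regions` and `region_means`
  Window sizes
  The region threshold
  The 4-connected neighborhood
  The small example grid

Real applications would tune all of these.

```python
from collections import deque

import numpy as np


def moving_mean(series: np.ndarray, window: int) -> np.ndarray:
    """1-D moving-window mean, only where the full window fits."""
    return np.convolve(series, np.ones(window) / window, mode="valid")


def box_filter(grid: np.ndarray, size: int = 3) -> np.ndarray:
    """2-D convolution with a size x size averaging kernel (edges reflected)."""
    pad = size // 2
    padded = np.pad(grid, pad, mode="reflect")
    out = np.zeros_like(grid, dtype=float)
    for di in range(size):
        for dj in range(size):
            out += padded[di:di + grid.shape[0], dj:dj + grid.shape[1]]
    return out / (size * size)


def grow_regions(grid: np.ndarray, threshold: float) -> np.ndarray:
    """Label 4-connected regions of pixels >= threshold (0 = background)."""
    labels = np.zeros(grid.shape, dtype=int)
    rows, cols = grid.shape
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r, c] < threshold or labels[r, c]:
                continue
            next_label += 1  # seed a new region and grow it breadth-first
            labels[r, c] = next_label
            queue = deque([(r, c)])
            while queue:
                i, j = queue.popleft()
                for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
                    if (0 <= ni < rows and 0 <= nj < cols
                            and grid[ni, nj] >= threshold and not labels[ni, nj]):
                        labels[ni, nj] = next_label
                        queue.append((ni, nj))
    return labels


def region_means(grid: np.ndarray, labels: np.ndarray) -> dict:
    """Region-aggregate feature: mean grid value within each labelled region."""
    return {k: float(grid[labels == k].mean()) for k in range(1, labels.max() + 1)}


grid = np.array([[50.0, 52.0, 0.0, 0.0],
                 [48.0, 0.0, 0.0, 30.0],
                 [0.0, 0.0, 35.0, 40.0]])
labels = grow_regions(grid, threshold=30.0)
print(labels)
print(region_means(grid, labels))
```

Pixel-based applications would feed the filtered values (such as `box_filter` output) directly to f(). Object-based ones would feed per-region aggregates such as `region_means`.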

30
Typical data-driven application
Diagram: Observed data → (signal/image processing, sampling) → Features → (normalize, create chromosome, determine confidences) → f(): fuzzy logic / genetic algorithm / neural network / decision tree → (Platt method, region-average, threshold) → Result
  Label: A data-driven application in run-time
31
Data Driven Methods

What is Artificial Intelligence?
Common AI techniques
Choosing between AI techniques
Pre- and post-processing

32
Preprocessing

Often cannot use pixel data directly
  Too much data, too highly correlated
  May need to segment pixels into objects and use features computed on the objects
Different data sets may not be collocated
  Need to interpolate to line them up
  Mapping, objective analysis
Noise in data may need to be reduced
  Smoothing
  Present a statistic of the data, rather than the data itself
Features need to be extracted from data
  Human experts are often a good source of ideas on signatures to extract from data
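Two of the preprocessing steps above are sketched below with NumPy:
  Reducing noise with a running median before use
  Lining up two time series sampled at different times by linear interpolation (a simple stand-in for proper objective analysis)

The function names `collocate` and `running_median`, the window length, and the "station" and "radar" series are hypothetical.

```python
import numpy as np


def collocate(target_times, source_times, source_values):
    """Linearly interpolate a source series onto the target series' times."""
    return np.interp(target_times, source_times, source_values)


def running_median(values, window=3):
    """Smooth interior points with the median of a centred window (edges kept)."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    out = values.copy()
    for i in range(half, len(values) - half):
        out[i] = np.median(values[i - half:i + half + 1])
    return out


# Hypothetical station sampled every 10 minutes, radar-derived values every 5 minutes.
station_times = np.array([0.0, 10.0, 20.0, 30.0])
radar_times = np.array([0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0])
radar_values = np.array([1.0, 2.0, 40.0, 3.0, 4.0, 5.0, 6.0])  # 40.0 is a spike

smoothed = running_median(radar_values)
print(collocate(station_times, radar_times, smoothed))
```

A median is used rather than a mean so that a single spike does not leak into neighbouring samples.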

33
Postprocessing