Environmental applications of neural networks - PowerPoint PPT Presentation

1
Environmental applications of neural networks
2
Forecast schema
PREDICTOR
  • y(t), y(t-1), … autoregressive terms (past values)
  • u1, u2 (t-τ, t-τ-1, …) exogenous terms (meteorological
    conditions, other pollutants, …)
  • τ delay (time of concentration, reaction time, …)

3
K steps recursive forecast
First-step inputs: yt, u1(t-τ), yt-1, u1(t-τ-1), …, yt-L+1
PREDICTOR
  • The forecasted value at a given time step becomes
    the first autoregressive term for the next
    forecast

Next-step inputs: ŷt+1, u1(t-τ+1), yt, u1(t-τ), …, yt-L
PREDICTOR
K_max = min(τ1, τ2, …, τM) + 1
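The recursive scheme can be sketched in a few lines; the one-step model below is a toy linear combination (invented for illustration), the feedback of each forecast into the lag vector is the point:

```python
# Sketch of a K-step recursive forecast. The one-step model is a
# hypothetical linear combination, just to make the recursion concrete.

def one_step_predictor(y_lags, u_lags):
    """Toy one-step-ahead model: weighted sum of past outputs and
    exogenous inputs (weights are illustrative)."""
    return 0.6 * y_lags[0] + 0.3 * y_lags[1] + 0.1 * u_lags[0]

def recursive_forecast(y_history, u_future, k):
    """Iterate the one-step predictor k times, feeding each forecast
    back as the first autoregressive term of the next step."""
    y_lags = list(y_history)              # most recent value first
    forecasts = []
    for step in range(k):
        y_next = one_step_predictor(y_lags, u_future[step])
        forecasts.append(y_next)
        y_lags = [y_next] + y_lags[:-1]   # forecast becomes newest lag
    return forecasts

preds = recursive_forecast([10.0, 9.0], [[2.0], [2.0], [2.0]], k=3)
print(preds)
```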
4
ARX linear models
(figure: AR block combined with exogenous blocks X1, X2)
  • n exogenous inputs, each related to a different
    variable (2 in the example)
  • parameter estimation via linear least squares
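The least-squares estimation step can be sketched on a synthetic ARX system with one autoregressive and one exogenous term (system and coefficients invented for illustration):

```python
# Minimal sketch of ARX parameter estimation by linear least squares.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic system: y(t) = 0.7*y(t-1) + 0.2*u(t-1) + small noise
n = 200
u = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + 0.2 * u[t - 1] + 0.01 * rng.normal()

# Regression matrix: one autoregressive and one exogenous column
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)   # close to the true parameters [0.7, 0.2]
```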

5
ARX with threshold (piecewise linear)
  • Refinement of simple ARX: different parameter
    estimates for different system conditions
  • Main drawback: abrupt model switches at the
    thresholds, hardly acceptable from a physical
    viewpoint
  • Possible solution: combine (as a weighted sum)
    the outputs of the predictors.

6
A different modelling paradigm: Artificial Neural
Networks (ANN)
  • Combine the outputs of many linear models and
    introduce non-linearity

7
The biological inspiration (1)
8
The biological inspiration (2)
  • The brain is:
  • intrinsically parallel among simple units (the
    neurons)
  • distributed
  • redundant and thus fault-tolerant
  • adaptive (synapses are reinforced by learning)
  • In the brain we have:
  • ~10^11 neurons
  • ~10^4 synapses per neuron
  • 223 km

9
Artificial neurons (1)
10
Artificial neurons (2)
(figure: inputs are weighted and summed, then passed through a non-linearity)
11
Artificial neurons (3)
12
Artificial neurons (4)
The final structure is thus
13
Artificial Neural Network
hidden layer
  • The structure may be replicated (more hidden
    layers, more outputs)
  • As the information always moves forward through
    the network, such an architecture is called
    feed-forward

14
Artificial Neural Network (2)
An important result
A standard multilayer feedforward network with a
locally bounded piecewise continuous activation
function can approximate any continuous function
to any degree of accuracy if and only if the
network's activation function is not a
polynomial. Without the threshold (bias), the
theorem does not hold. (See
e.g. Leshno et al., Neural Networks, 1993.)
This means that we can, in principle, replace
any mapping (model) from some input variables
to some output variables with a neural network of
sufficient complexity, with no loss of accuracy.
15
Artificial Neural Network (3)
In practice, we have to fix the network structure
and weights in order to accomplish the task.
  • We operate in the following way:
  • Fix a structure (architecture)
  • Determine the weights that optimize the network
    performance (training)
  • Test the generalization ability of the network
  • We thus subdivide the available data into 3
    different sets:
  • Training set (to determine the parameters)
  • Testing set (to evaluate the architecture)
  • Validation set (to test the generalization ability)

16
ANN training
Training the network means determining the values
of the weights wij (including biases).
  • EXAMPLE (the perceptron, binary output)
  • Set arbitrary initial values for the weights and
    compute the output
  • If the output is not correct, the weights are
    adjusted according to the formula
  • w_new = w_old + a * (desired - output) * input,
    where a is the learning rate
  • With inputs (1, 0, 1) and weights (0.5, 0.2, 0.8):
    1*0.5 + 0*0.2 + 1*0.8 = 1.3
  • Assuming the output threshold is 1.2:
    1.3 > 1.2, so the output is 1
  • If the output was supposed to be 0,
    assume a = 1 and update the weights:
  • w1_new = 0.5 + 1*(0-1)*1 = -0.5
  • w2_new = 0.2 + 1*(0-1)*0 = 0.2
  • w3_new = 0.8 + 1*(0-1)*1 = -0.2
  • Iterate the process over several epochs until
    the correct result is obtained.
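The perceptron update, with the same numbers as in the example, can be checked in a few lines:

```python
# Perceptron update rule: w_new = w_old + a * (desired - output) * input

def perceptron_output(weights, inputs, threshold):
    """Binary output: 1 if the weighted sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

def update(weights, inputs, desired, output, a):
    """One weight correction step with learning rate a."""
    return [w + a * (desired - output) * x
            for w, x in zip(weights, inputs)]

weights = [0.5, 0.2, 0.8]
inputs = [1, 0, 1]
out = perceptron_output(weights, inputs, threshold=1.2)   # 1.3 > 1.2 -> 1
weights = update(weights, inputs, desired=0, output=out, a=1)
print(weights)   # approximately [-0.5, 0.2, -0.2], as in the example
```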

17
ANN training (2)
The most common algorithm for network training is
backpropagation.
  • The most common way to measure the error is
  • E = (target - output)^2
  • The partial derivatives of the error with respect
    to the weights can be computed
  • The calculation of these derivatives flows backwards
    through the network, hence the name
    backpropagation
  • The weights are updated with the gradient formula
  • w_new = w_old - a * ∂E/∂w_old,
    where a is the learning rate
  • The procedure is repeated through many epochs.
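The gradient update can be sketched for a single linear neuron; a real backpropagation pass chains these derivatives backwards through the hidden layers, which this toy version omits:

```python
# Gradient-descent weight update for a single linear neuron,
# illustrating w_new = w_old - a * dE/dw with E = (target - output)^2.

def train(inputs, target, weights, a, epochs):
    for _ in range(epochs):
        output = sum(w * x for w, x in zip(weights, inputs))
        # dE/dw_i = -2 * (target - output) * x_i
        weights = [w - a * (-2 * (target - output) * x)
                   for w, x in zip(weights, inputs)]
    return weights

w = train(inputs=[1.0, 2.0], target=1.0, weights=[0.0, 0.0],
          a=0.05, epochs=100)
output = w[0] * 1.0 + w[1] * 2.0
print(output)   # converges toward the target 1.0
```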

18
ANN training (3)
  • Problems with training:
  • a too small → slow convergence
  • a too large → may not converge
  • E depends on many variables → local minima
  • Too few epochs → low precision
  • Too many epochs → overfitting (low generalization
    ability)
  • Overfitting may also depend on a redundant
    structure (too many neurons)

19
ANN training improvement
  • The number of parameters (weights) can easily
    reach several hundred.
  • Overfitting can be reduced by early stopping:
  • At each epoch, compute the error on both the
    training and the validation set
  • Use the weights that minimize the error on the
    validation set

(figure: training and validation error vs. epochs; the weights at the validation-error minimum are selected)
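The early-stopping loop can be sketched as follows; the training step and the validation-error function are stand-ins for the real network routines:

```python
# Early-stopping sketch: track the validation error at each epoch and
# keep the weights at its minimum.

def train_with_early_stopping(train_step, val_error, epochs):
    """train_step() advances one epoch and returns the current weights;
    val_error(weights) evaluates the validation-set error."""
    best_weights, best_err = None, float("inf")
    for _ in range(epochs):
        weights = train_step()
        err = val_error(weights)
        if err < best_err:                # new validation minimum
            best_err, best_weights = err, weights
    return best_weights, best_err

# Toy run: validation error falls, then rises (overfitting sets in)
errors = iter([0.9, 0.5, 0.3, 0.4, 0.6])
state = {"epoch": 0}
def step():
    state["epoch"] += 1
    return state["epoch"]          # the "weights" here are just the epoch

best_w, best_e = train_with_early_stopping(step, lambda w: next(errors), 5)
print(best_w, best_e)   # epoch 3, error 0.3
```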
20
ANN training improvement (2)
Another possible improvement is PRUNING, i.e.
automatically reducing the network complexity
  • Pruning provides a method to identify and remove
    redundant/irrelevant parameters, thus reducing
    the overfitting problems.
  • It also provides a framework for automatic
    determination of a neural network (sub)optimal
    architecture.

21
ANN training improvement (3)
  • Pruning is based on the concept of the saliency
    of a parameter:
  • s_j measures how much the training error would
    increase if parameter j were removed from the
    network architecture
  • The parameter with the lowest saliency is
    tentatively removed from the network architecture

22
Pruning procedure
  • Train the fully connected (possibly
    overfitted) neural network
  • Assess the error on the training data set
  • Evaluate the saliency of each parameter
  • Remove the parameter with the lowest saliency
  • Train the new architecture (one parameter fewer
    than the previous one), and go back to step 2

23
Flood forecast: Tagliamento case study
  • Area: 2480 km²
  • Average flow Q: 90 m³/s
  • Max flow (1966): 4000 m³/s
  • 5 rain gauges
  • Dataset: 2000 hourly records (floods)

24
Standard network
  • Fully connected network (5 rain gauges: if one
    or more are unavailable during the flood, the
    forecast cannot be issued)
  • The efficiency index is 0 if the prediction is
    issued as the average of the time series, and
    rises up to 1 for a perfect predictor.
  • Forecast efficiency 5 h ahead: 85%

Rain gauge input terms with a certain delay
Autoregressive terms
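The efficiency index described above (a Nash-Sutcliffe-style coefficient) can be computed as:

```python
# Efficiency index: 1 for a perfect predictor, 0 when the prediction
# equals the mean of the observed time series.

def efficiency(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0, 4.0]
e_perfect = efficiency(obs, obs)          # 1.0 (perfect predictor)
e_mean = efficiency(obs, [2.5] * 4)       # 0.0 (mean predictor)
print(e_perfect, e_mean)
```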
25
Pruned architecture
  • Many connections removed
  • Only 3 rain gauges considered
  • 5-hour-ahead efficiency: 84.5%

26
Results
Also adding an input with the total rainfall over
the preceding 5 days
27
Conclusions of flood case study
  • ANN allows better accuracy than linear ARX
  • Pruning detects a parameter-parsimonious, yet
    effective, architecture for the neural network
  • Pruning reduces the use of (redundant) rain
    gauges without worsening the predictive accuracy
    (a more robust measurement network)

28
PM10 in Milan
  • Significant reduction of the yearly averages of
    pollutants such as SO2, NOx, CO, TSP (-90%, -50%,
    -65%, -60% over the period 1989-2001).
  • A major concern is PM10: its yearly average has
    been stable (about 45 µg/m³) since the beginning
    of monitoring (1998).
  • The limit value on the daily average (50 µg/m³)
    is exceeded about 100 days every year.
  • The application: prediction, at 9 a.m., of the
    PM10 daily average concentration of the current
    (and the following) day.

29
Air pollutant trends in Milan
  • SO2, NOx and CO: decreasing trends (catalytic
    converters, improved heating oils)
  • PM10 and O3: increasing since the early '90s

30
Prediction methodology: FFNN
  • The input set contains both pollutants (PM10,
    NOx, SO2) and meteorological data (pressure,
    temperature, etc.).
  • The hourly input time series are aggregated to
    daily ones as averages over given hourly time
    windows (chosen by means of cross-correlation
    analysis).
  • The architecture is selected via trial and error
    and trained using the Levenberg-Marquardt (LM)
    algorithm and early stopping.

31
PM10 time series analysis
  • Available dataset: 1999-2002
  • Winter concentrations are about twice the summer
    ones, both because of unfavourable dispersion
    conditions and because of higher emissions
  • On average, concentrations are about 25% lower
    on Sundays than on other days

32
Deseasonalization
  • Yearly and weekly PM10 periodicities are clearly
    detected also in the frequency domain
  • The same periodicities are also detected in NOx
    and SO2
  • For each pollutant, we fit a periodic regressor
    R(ω,t) before training the predictors.
  • PM10_pred(t) = R(ω,t) + y(t), where y(t) is the
    actual output of the ANN
  • R(ω,t) = c + f(ω1,t) + f(ω2,t), where
  • f(ω,t) = Σ_k [a_k sin(kωt) + b_k cos(kωt)]
  • ω1 = 2π/365 day⁻¹, ω2 = 2π/7 day⁻¹
  • Meteorological data are standardized as usual.
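The deseasonalization step can be sketched as a least-squares fit of the first harmonic of each periodicity on a synthetic series (the real regressor also uses higher harmonics k; the data here are invented):

```python
# Sketch of the periodic regressor R(w,t), fitted by least squares and
# subtracted from the series before training the predictor.
import numpy as np

t = np.arange(365 * 3, dtype=float)       # three years, daily
w1 = 2 * np.pi / 365                       # yearly frequency
w2 = 2 * np.pi / 7                         # weekly frequency

# Synthetic PM10-like series: mean 45 plus yearly and weekly cycles
rng = np.random.default_rng(1)
series = 45 + 10 * np.sin(w1 * t) + 3 * np.cos(w2 * t) \
         + rng.normal(0, 1, t.size)

# Design matrix: constant plus first harmonic of each periodicity
X = np.column_stack([np.ones_like(t),
                     np.sin(w1 * t), np.cos(w1 * t),
                     np.sin(w2 * t), np.cos(w2 * t)])
coef, *_ = np.linalg.lstsq(X, series, rcond=None)
residual = series - X @ coef               # deseasonalized signal for the ANN
print(coef[:2])   # close to the constant 45 and yearly amplitude 10
```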

33
Prediction at 9 a.m. for the current day t
  • Deseasonalization improves the average
    goodness-of-fit indicators
  • As a term of comparison, a linear ARX predictor
    results in ρ = 0.89 and MAE = 11 µg/m³

34
Prediction for the following day (t+1)
  • To meet such an ambitious target, we added
    further "improper" meteorological input variables
    (i.e., unknown at 9 a.m. of day t), such as
    rainfall, temperature, pressure, etc., measured
    over both day t and day t+1
  • The performance obtained in this way can be
    considered an upper bound of what can be
    achieved by inserting actual meteorological
    forecasts in the predictor
  • Pollutant time series have again been
    deseasonalized via the periodic regressor
  • Besides trial and error, we also tried a
    different identification approach for neural
    networks, namely pruning.

35
Pruned ANNs
  • The network showing the lowest validation error
    is finally chosen as optimal
  • Pruned ANNs are parsimonious: they contain one
    order of magnitude fewer parameters than
    fully-connected ones

36
Results
  • The performances of the two models are very close
    to each other, decreasing strongly with respect
    to the 1-day case
  • As a term of comparison, the network trained
    without improper meteorological information
    loses just a few percent on the different
    indicators, showing an almost irrelevant gap

37
Conclusions on PM10
  • Performance on the 1-day prediction appears to
    be satisfactory; in this case, the system can be
    really operated as a support to daily decisions
    (traffic blocks, alarm diffusion, …).
  • Deseasonalizing the data before training the
    predictors seems to help improve the
    performance.
  • 2-day forecasts are disappointing, even if
    improper meteo data are introduced. Performance
    differences between pruned and fully connected
    neural networks are negligible.
  • More comprehensive meteorological data (vertical
    profiles, mixing-layer height) may matter more
    than training methods in improving the quality
    of longer-term forecasts.

38
Other network architectures
RECURRENT NETWORKS: some of the outputs are fed
back to the input at the following iteration.
Used in various fields (see for instance
www.idsia.ch/juergen/rnn.html).
PROBLEM: how to train the network? A possible
solution: limit the number of iterations.
39
  • AUTOASSOCIATIVE NETWORKS
  • They are trained to reproduce the input with very
    few neurons in one hidden layer.
  • They may be used to detect input characteristics
    (like principal components).
  • They can highlight non-linear links between input
    variables.
  • They can also be useful, for instance, to
    diagnose faults in a sensor network (a broken
    sensor gives values different from those of the
    network output).