Environmental applications of neural networks - PowerPoint PPT Presentation

1
Environmental applications of neural networks
2
Forecast schema
PREDICTOR
  • y(t), y(t-1), … autoregressive terms (past values)
  • u1, u2 (t-τ, t-τ-1, …) exogenous terms (meteorological
    conditions, other pollutants, …)
  • τ delay (time of concentration, reaction time, …)

3
K steps recursive forecast
First-step inputs: yt, u1(t-τ), yt-1, u1(t-τ-1), …, yt-L+1
PREDICTOR
  • The forecasted value at a given time step becomes
    the first autoregressive term for the next
    forecast

Next-step inputs: ŷt+1, u1(t-τ+1), yt, u1(t-τ), …, yt-L
PREDICTOR
K_max = min(τ1, τ2, …, τM) + 1
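The recursive scheme can be sketched in a few lines; the one-step model below is a toy linear combination (invented for illustration), the feedback of each forecast into the lag vector is the point:

```python
# Sketch of a K-step recursive forecast. The one-step model is a
# hypothetical linear combination, just to make the recursion concrete.

def one_step_predictor(y_lags, u_lags):
    """Toy one-step-ahead model: weighted sum of past outputs and
    exogenous inputs (weights are illustrative)."""
    return 0.6 * y_lags[0] + 0.3 * y_lags[1] + 0.1 * u_lags[0]

def recursive_forecast(y_history, u_future, k):
    """Iterate the one-step predictor k times, feeding each forecast
    back as the first autoregressive term of the next step."""
    y_lags = list(y_history)              # most recent value first
    forecasts = []
    for step in range(k):
        y_next = one_step_predictor(y_lags, u_future[step])
        forecasts.append(y_next)
        y_lags = [y_next] + y_lags[:-1]   # forecast becomes newest lag
    return forecasts

preds = recursive_forecast([10.0, 9.0], [[2.0], [2.0], [2.0]], k=3)
print(preds)
```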
4
ARX linear models
(figure: AR block combined with exogenous blocks X1, X2)
  • n exogenous inputs, each related to a different
    variable (2 in the example)
  • parameter estimation via linear least squares
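The least-squares estimation step can be sketched on a synthetic ARX system with one autoregressive and one exogenous term (system and coefficients invented for illustration):

```python
# Minimal sketch of ARX parameter estimation by linear least squares.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic system: y(t) = 0.7*y(t-1) + 0.2*u(t-1) + small noise
n = 200
u = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + 0.2 * u[t - 1] + 0.01 * rng.normal()

# Regression matrix: one autoregressive and one exogenous column
Phi = np.column_stack([y[:-1], u[:-1]])
theta, *_ = np.linalg.lstsq(Phi, y[1:], rcond=None)
print(theta)   # close to the true parameters [0.7, 0.2]
```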

5
ARX with threshold (piecewise linear)
  • Refinement of simple ARX: different parameter
    estimates for different system conditions
  • Main drawback: abrupt model switches at the
    thresholds, hardly acceptable from a physical
    viewpoint
  • Possible solution: combine (as a weighted sum)
    the outputs of the predictors.

6
A different modelling paradigm: Artificial Neural
Networks (ANN)
  • Combine the outputs of many linear models and
    introduce non-linearity

7
The biological inspiration (1)
8
The biological inspiration (2)
  • The brain is:
  • intrinsically parallel among simple units (the
    neurons)
  • distributed
  • redundant and thus fault-tolerant
  • adaptive (synapses are reinforced by learning)
  • In the brain we have:
  • ~10^11 neurons
  • ~10^4 synapses per neuron
  • 223 km

9
Artificial neurons (1)
10
Artificial neurons (2)
(figure: inputs are weighted and summed, then passed through a non-linearity)
11
Artificial neurons (3)
12
Artificial neurons (4)
The final structure is thus
13
Artificial Neural Network
hidden layer
  • The structure may be replicated (more hidden
    layers, more outputs)
  • As the information always moves forward through
    the network, such an architecture is called
    feed-forward

14
Artificial Neural Network (2)
An important result
A standard multilayer feedforward network with a
locally bounded piecewise continuous activation
function can approximate any continuous function
to any degree of accuracy if and only if the
network's activation function is not a
polynomial. Without the threshold (bias), the
theorem does not hold. (See
e.g. Leshno et al., Neural Networks, 1993.)
This means that we can, in principle, replace
any mapping (model) from some input variables
to some output variables with a neural network of
sufficient complexity, with no loss of accuracy.
15
Artificial Neural Network (3)
In practice, we have to fix the network structure
and weights in order to accomplish the task.
  • We operate in the following way:
  • Fix a structure (architecture)
  • Determine the weights that optimize the network
    performance (training)
  • Test the generalization ability of the network
  • We thus subdivide the available data into 3
    different sets:
  • Training set (to determine the parameters)
  • Testing set (to evaluate the architecture)
  • Validation set (to test the generalization ability)

16
ANN training
Training the network means determining the values
of the weights wij (including biases).
  • EXAMPLE (the perceptron, binary output)
  • Set arbitrary initial values for the weights and
    compute the output
  • If the output is not correct, the weights are
    adjusted according to the formula
  • w_new = w_old + a * (desired - output) * input,
    where a is the learning rate
  • With inputs (1, 0, 1) and weights (0.5, 0.2, 0.8):
    1*0.5 + 0*0.2 + 1*0.8 = 1.3
  • Assuming the output threshold is 1.2:
    1.3 > 1.2, so the output is 1
  • If the output was supposed to be 0,
    assume a = 1 and update the weights:
  • w1_new = 0.5 + 1*(0-1)*1 = -0.5
  • w2_new = 0.2 + 1*(0-1)*0 = 0.2
  • w3_new = 0.8 + 1*(0-1)*1 = -0.2
  • Iterate the process over several epochs until
    the correct result is obtained.
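The perceptron update, with the same numbers as in the example, can be checked in a few lines:

```python
# Perceptron update rule: w_new = w_old + a * (desired - output) * input

def perceptron_output(weights, inputs, threshold):
    """Binary output: 1 if the weighted sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

def update(weights, inputs, desired, output, a):
    """One weight correction step with learning rate a."""
    return [w + a * (desired - output) * x
            for w, x in zip(weights, inputs)]

weights = [0.5, 0.2, 0.8]
inputs = [1, 0, 1]
out = perceptron_output(weights, inputs, threshold=1.2)   # 1.3 > 1.2 -> 1
weights = update(weights, inputs, desired=0, output=out, a=1)
print(weights)   # approximately [-0.5, 0.2, -0.2], as in the example
```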

17
ANN training (2)
The most common algorithm for network training is
backpropagation.
  • The most common way to measure the error is
  • E = (target - output)^2
  • The partial derivatives of the error with respect
    to the weights can be computed
  • The calculation of these derivatives flows backwards
    through the network, hence the name
    backpropagation
  • The weights are updated with the gradient formula
  • w_new = w_old - a * ∂E/∂w_old,
    where a is the learning rate
  • The procedure is repeated through many epochs.
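The gradient update can be sketched for a single linear neuron; a real backpropagation pass chains these derivatives backwards through the hidden layers, which this toy version omits:

```python
# Gradient-descent weight update for a single linear neuron,
# illustrating w_new = w_old - a * dE/dw with E = (target - output)^2.

def train(inputs, target, weights, a, epochs):
    for _ in range(epochs):
        output = sum(w * x for w, x in zip(weights, inputs))
        # dE/dw_i = -2 * (target - output) * x_i
        weights = [w - a * (-2 * (target - output) * x)
                   for w, x in zip(weights, inputs)]
    return weights

w = train(inputs=[1.0, 2.0], target=1.0, weights=[0.0, 0.0],
          a=0.05, epochs=100)
output = w[0] * 1.0 + w[1] * 2.0
print(output)   # converges toward the target 1.0
```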

18
ANN training (3)
  • Problems with training:
  • a too small → slow convergence
  • a too large → may not converge
  • E depends on many variables → local minima
  • Too few epochs → low precision
  • Too many epochs → overfitting (low generalization
    ability)
  • Overfitting may also depend on a redundant
    structure (too many neurons)

19
ANN training improvement
  • The number of parameters (weights) can easily
    reach several hundred.
  • Overfitting can be reduced by early stopping:
  • At each epoch, compute the error on both the
    training and the validation set
  • Use the weights that minimize the error on the
    validation set

(figure: training and validation error vs. epochs; the weights at the validation-error minimum are selected)
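The early-stopping loop can be sketched as follows; the training step and the validation-error function are stand-ins for the real network routines:

```python
# Early-stopping sketch: track the validation error at each epoch and
# keep the weights at its minimum.

def train_with_early_stopping(train_step, val_error, epochs):
    """train_step() advances one epoch and returns the current weights;
    val_error(weights) evaluates the validation-set error."""
    best_weights, best_err = None, float("inf")
    for _ in range(epochs):
        weights = train_step()
        err = val_error(weights)
        if err < best_err:                # new validation minimum
            best_err, best_weights = err, weights
    return best_weights, best_err

# Toy run: validation error falls, then rises (overfitting sets in)
errors = iter([0.9, 0.5, 0.3, 0.4, 0.6])
state = {"epoch": 0}
def step():
    state["epoch"] += 1
    return state["epoch"]          # the "weights" here are just the epoch

best_w, best_e = train_with_early_stopping(step, lambda w: next(errors), 5)
print(best_w, best_e)   # epoch 3, error 0.3
```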
20
ANN training improvement (2)
Another possible improvement is PRUNING, i.e.
automatically reducing the network complexity
  • Pruning provides a method to identify and remove
    redundant/irrelevant parameters, thus reducing
    the overfitting problems.
  • It also provides a framework for automatic
    determination of a neural network (sub)optimal
    architecture.

21
ANN training improvement (3)
  • Pruning is based on the concept of the saliency
    of a parameter:
  • s_j measures how much the training error would
    increase if parameter j were removed from the
    network architecture
  • The parameter with the lowest saliency is
    tentatively removed from the network architecture

22
Pruning procedure
  • Train the fully connected (possibly
    overfitted) neural network
  • Assess the error on the training data set
  • Evaluate the saliency of each parameter
  • Remove the parameter with the lowest saliency
  • Train the new architecture (one parameter fewer
    than the previous one), and go back to step 2

23
Flood forecast: Tagliamento case study
  • Area: 2480 km²
  • Average flow Q: 90 m³/s
  • Max flow (1966): 4000 m³/s
  • 5 rain gauges
  • Dataset: 2000 hourly records (floods)

24
Standard network
  • Fully connected network (5 rain gauges: if one
    or more are unavailable during the flood, the
    forecast cannot be issued)
  • The efficiency index is 0 if the prediction is
    issued as the average of the time series, and
    rises up to 1 for a perfect predictor.
  • Forecast efficiency 5 h ahead: 85%

Rain gauge input terms with a certain delay
Autoregressive terms
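The efficiency index described above (a Nash-Sutcliffe-style coefficient) can be computed as:

```python
# Efficiency index: 1 for a perfect predictor, 0 when the prediction
# equals the mean of the observed time series.

def efficiency(observed, predicted):
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

obs = [1.0, 2.0, 3.0, 4.0]
e_perfect = efficiency(obs, obs)          # 1.0 (perfect predictor)
e_mean = efficiency(obs, [2.5] * 4)       # 0.0 (mean predictor)
print(e_perfect, e_mean)
```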
25
Pruned architecture
  • Many connections removed
  • Only 3 rain gauges considered
  • 5-hour-ahead efficiency: 84.5%

26
Results
Also adding an input with the total rainfall over
the preceding 5 days
27
Conclusions of flood case study
  • ANN allows better accuracy than linear ARX
  • Pruning detects a parameter-parsimonious, yet
    effective, architecture for the neural network
  • Pruning reduces the use of (redundant) rain
    gauges without worsening the predictive accuracy
    (a more robust measurement network)

28
PM10 in Milan
  • Significant reduction of the yearly averages of
    pollutants such as SO2, NOx, CO, TSP (-90%, -50%,
    -65%, -60% over the period 1989-2001).
  • A major concern is PM10: its yearly average has
    been stable (about 45 µg/m³) since the beginning
    of monitoring (1998).
  • The limit value on the daily average (50 µg/m³)
    is exceeded about 100 days every year.
  • The application: prediction, at 9 a.m., of the
    PM10 daily average concentration of the current
    (and the following) day.

29
Air pollutant trends in Milan
  • SO2, NOx and CO: decreasing trends (catalytic
    converters, improved heating oils)
  • PM10 and O3: increasing since the early '90s

30
Prediction methodology: FFNN
  • The input set contains both pollutants (PM10,
    NOx, SO2) and meteorological data (pressure,
    temperature, etc.).
  • The hourly input time series are aggregated to
    daily ones as averages over given hourly time
    windows (chosen by means of cross-correlation
    analysis).
  • The architecture is selected via trial and error
    and trained using the Levenberg-Marquardt (LM)
    algorithm and early stopping.

31
PM10 time series analysis
  • Available dataset: 1999-2002
  • Winter concentrations are about twice the summer
    ones, both because of unfavourable dispersion
    conditions and because of higher emissions
  • On average, concentrations are about 25% lower
    on Sundays than on other days

32
Deseasonalization
  • Yearly and weekly PM10 periodicities are clearly
    detected also in the frequency domain
  • The same periodicities are also detected in NOx
    and SO2
  • For each pollutant, we fit a periodic regressor
    R(ω,t) before training the predictors.
  • PM10_pred(t) = R(ω,t) + y(t), where y(t) is the
    actual output of the ANN
  • R(ω,t) = c + f(ω1,t) + f(ω2,t), where
  • f(ω,t) = Σ_k [a_k sin(kωt) + b_k cos(kωt)]
  • ω1 = 2π/365 day⁻¹, ω2 = 2π/7 day⁻¹
  • Meteorological data are standardized as usual.
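The deseasonalization step can be sketched as a least-squares fit of the first harmonic of each periodicity on a synthetic series (the real regressor also uses higher harmonics k; the data here are invented):

```python
# Sketch of the periodic regressor R(w,t), fitted by least squares and
# subtracted from the series before training the predictor.
import numpy as np

t = np.arange(365 * 3, dtype=float)       # three years, daily
w1 = 2 * np.pi / 365                       # yearly frequency
w2 = 2 * np.pi / 7                         # weekly frequency

# Synthetic PM10-like series: mean 45 plus yearly and weekly cycles
rng = np.random.default_rng(1)
series = 45 + 10 * np.sin(w1 * t) + 3 * np.cos(w2 * t) \
         + rng.normal(0, 1, t.size)

# Design matrix: constant plus first harmonic of each periodicity
X = np.column_stack([np.ones_like(t),
                     np.sin(w1 * t), np.cos(w1 * t),
                     np.sin(w2 * t), np.cos(w2 * t)])
coef, *_ = np.linalg.lstsq(X, series, rcond=None)
residual = series - X @ coef               # deseasonalized signal for the ANN
print(coef[:2])   # close to the constant 45 and yearly amplitude 10
```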

33
Prediction at 9 a.m. for the current day t
  • Deseasonalization improves the average
    goodness-of-fit indicators
  • As a term of comparison, a linear ARX predictor
    results in ρ = 0.89 and MAE = 11 µg/m³

34
Prediction for the following day (t+1)
  • To meet such an ambitious target, we added
    further "improper" meteorological input variables
    (i.e., unknown at 9 a.m. of day t), such as
    rainfall, temperature, pressure, etc., measured
    over both day t and day t+1
  • The performance obtained in this way can be
    considered an upper bound of what can be
    achieved by inserting actual meteorological
    forecasts in the predictor
  • Pollutant time series have again been
    deseasonalized via the periodic regressor
  • Besides trial and error, we also tried a
    different identification approach for neural
    networks, namely pruning.

35
Pruned ANNs
  • The network showing the lowest validation error
    is finally chosen as optimal
  • Pruned ANNs are parsimonious: they contain one
    order of magnitude fewer parameters than
    fully-connected ones

36
Results
  • The performances of the two models are very close
    to each other, decreasing strongly with respect
    to the 1-day case
  • As a term of comparison, the network trained
    without improper meteorological information
    loses just a few percent on the different
    indicators, showing an almost irrelevant gap

37
Conclusions on PM10
  • Performance on the 1-day prediction appears to
    be satisfactory; in this case, the system can be
    really operated as a support to daily decisions
    (traffic blocks, alarm diffusion, …).
  • Deseasonalizing the data before training the
    predictors seems to help improve the
    performance.
  • 2-day forecasts are disappointing, even if
    improper meteo data are introduced. Performance
    differences between pruned and fully connected
    neural networks are negligible.
  • More comprehensive meteorological data (vertical
    profiles, mixing-layer height) may matter more
    than training methods in improving the quality
    of longer-term forecasts.

38
Other network architectures
RECURRENT NETWORKS: some of the outputs are fed
back to the input at the following iteration.
Used in various fields (see for instance
www.idsia.ch/juergen/rnn.html).
PROBLEM: how to train the network? A possible
solution: limit the number of iterations.
39
  • AUTOASSOCIATIVE NETWORKS
  • They are trained to reproduce the input with very
    few neurons in one hidden layer.
  • They may be used to detect input characteristics
    (like principal components).
  • They can highlight non-linear links between input
    variables.
  • They can also be useful, for instance, to
    diagnose faults in a sensor network (a broken
    sensor gives values different from those of the
    network output).