Program for North American Mobility in Higher Education - PowerPoint PPT Presentation

Description: PowerPoint presentation. Author: lafourcs. Last modified by: Agnes Devarieux-Martin. Created: 7/25/2001 7:57:15 PM.

Slides: 49
Provided by: lafo2



1
NC STATE UNIVERSITY
Program for North American Mobility in Higher Education
Introducing Process Integration for Environmental Control in Engineering Curricula
MODULE 17: Introduction to Multivariate Analysis
Created at Ecole Polytechnique de Montreal
North Carolina State University, 2003.
2
2.4 Example 3: Shorter Timescales
3
Shorter timescales
The previous two examples used daily averages for
the 130 process variables. However, we could
just as easily have chosen weekly averages,
monthly averages, or several other options. We
could also have chosen shorter timescales, such
as 8-hour averages or 30-minute averages.
Obviously, at some point the number of
observations will become unmanageable. For
instance, a spreadsheet with 3 years' worth of
1-minute averages would have over a million lines.
   
   
Simply by choosing the timescale, you are already
influencing your MVA results.
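The arithmetic behind "over a million lines" is easy to check. The Python sketch below is only an illustration: the function names are hypothetical, it assumes evenly spaced readings with no gaps, and it ignores leap days. It counts how many averaged observations three years of data produce at several timescales, and shows the simplest way of collapsing raw readings into non-overlapping block averages.

```python
# Illustrative sketch: row counts per averaging timescale, plus block averaging.
# Assumes evenly spaced readings with no gaps; names are hypothetical.
SECONDS_PER = {"10 s": 10, "1 min": 60, "30 min": 1800, "8 h": 8 * 3600, "1 day": 86400}


def row_count(span_s: int, interval_s: int) -> int:
    """Number of complete averaging intervals in a span of span_s seconds."""
    return span_s // interval_s


def block_average(samples: list[float], per_block: int) -> list[float]:
    """Average non-overlapping blocks of per_block consecutive readings.

    A trailing partial block is dropped so every average uses the same
    number of raw readings.
    """
    n_blocks = len(samples) // per_block
    return [sum(samples[k * per_block:(k + 1) * per_block]) / per_block
            for k in range(n_blocks)]


THREE_YEARS_S = 3 * 365 * 86400  # leap days ignored
for label, step in SECONDS_PER.items():
    print(f"{label:>6}: {row_count(THREE_YEARS_S, step):,} rows")
```

At 1-minute averages this gives 1,576,800 rows; at daily averages, only 1,095.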
Example 3
4
Choosing a timescale
The first thing we need to understand is what
timescales are available. For the TMP process we
have been studying, the shortest possible time
period between two logged values is 10 seconds
(note that not all tags are updated this
frequently). Several key values, such as
wood and pulp characteristics, are only measured
every few hours as shown above. These tags will
be of little or no use at a very short timescale.

IMPORTANT CONCEPT Some variables can only be
studied at longer timescales, others at shorter
timescales, depending on their sampling/logging
frequency.
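One way to make this concept concrete is to check, for a chosen timescale, which tags produce at least one genuine reading per observation. The sketch below assumes a simple rule of thumb (a tag is usable only if its logging interval is no longer than the averaging window); the tag names and intervals are hypothetical, not taken from the mill's historian.

```python
# Hypothetical logging intervals in seconds; real values come from the data historian.
LOGGING_INTERVAL_S = {
    "refiner_motor_load": 10,
    "production_rate": 10,
    "pulp_freeness": 2 * 3600,
    "chip_moisture": 4 * 3600,
}


def usable_tags(intervals: dict[str, int], timescale_s: int) -> list[str]:
    """Tags with at least one genuine reading per observation window (assumed rule)."""
    return sorted(tag for tag, dt in intervals.items() if dt <= timescale_s)


print(usable_tags(LOGGING_INTERVAL_S, 10))     # 10-second observations
print(usable_tags(LOGGING_INTERVAL_S, 86400))  # daily averages
```

With 10-second observations only the two fast tags qualify; with daily averages all four do.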
5
Shortest possible timescale
For the purposes of illustration, we will use the
shortest possible timescale in this example,
namely 10 seconds. Because some tags are updated
less frequently, we will use interpolated values
for all variables, which may or may not represent
reality. To keep the size of the dataset
manageable, we have taken these data over a
24-hour period, which corresponds to 8,640
observations (one every 10 seconds). Because we
have over 100 tags, the resulting dataset has
about one million values.
A million values per day, for only one section of
the papermaking process: if we were to include
the entire industrial plant over several years,
we would have to analyse billions of data points.
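Interpolation is what fills the gaps between slow readings. A minimal sketch, assuming plain linear interpolation between consecutive readings (MVA packages and data historians may use other schemes, such as holding the last value), also confirms the size of the dataset: 24 hours at 10-second intervals gives 8,640 observations, and 130 tags give 1,123,200 values. The function name and the freeness readings are hypothetical.

```python
from bisect import bisect_right


def interpolate(times: list[float], values: list[float], grid: list[float]) -> list[float]:
    """Linearly interpolate sparse readings onto grid times.

    times must be sorted ascending. Grid points before the first reading or
    after the last one are held at that reading (no extrapolation).
    """
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            j = bisect_right(times, t)  # times[j-1] <= t < times[j]
            t0, t1 = times[j - 1], times[j]
            v0, v1 = values[j - 1], values[j]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


DAY_S = 24 * 3600
grid = list(range(0, DAY_S, 10))   # one observation every 10 s
print(len(grid), len(grid) * 130)  # observations, values for 130 tags

# Hypothetical freeness readings taken every 2 hours (time in seconds)
freeness_t = [0, 7200, 14400]
freeness_v = [110.0, 100.0, 104.0]
filled = interpolate(freeness_t, freeness_v, grid)
```

Every value between two freeness readings is manufactured by the straight-line assumption, which is why interpolated data may or may not represent reality.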
6
PCA of entire 24-hour period
Simca found numerous components; 3 retained
The PCA for the entire 24-hour period shows quite
a strong model, with a cumulative R2 over 60%.
This is misleading, however. As shown on the
score plot, there is a major process excursion
which has totally skewed the MVA results.
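For PCA, the cumulative R2 reported by the software (often labelled R2X) is the fraction of the total variance of the scaled data explained by the retained components. A minimal NumPy sketch, assuming autoscaling (mean-centring and unit-variance scaling) and using the singular value decomposition rather than the NIPALS algorithm common in MVA packages; for a complete data matrix both should give essentially the same components. The function name and the synthetic data are hypothetical.

```python
import numpy as np


def cumulative_r2x(X: np.ndarray, n_components: int) -> np.ndarray:
    """Cumulative fraction of autoscaled X variance explained by the first components.

    Columns are mean-centred and scaled to unit variance; constant columns
    must be removed first (their standard deviation is zero).
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    s = np.linalg.svd(Z, compute_uv=False)  # singular values, largest first
    explained = s**2 / np.sum(s**2)         # R2X of each component
    return np.cumsum(explained)[:n_components]


# Hypothetical data: 500 observations of 6 variables driven by 2 hidden factors
rng = np.random.default_rng(0)
factors = rng.normal(size=(500, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(500, 6))
print(np.round(cumulative_r2x(X, 3), 3))
```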
7
Major process excursion
Major process excursion from 8h15 to 8h45
A review of the original data indicates that
production dropped below 10 t/d during a
ten-minute period (8h15 to 8h25). The cause was
a major refiner blockage known as a feedguard
event, which causes the refiner motor to shut down.
8
Exclude process excursion
The process excursion sticks out like a sore
thumb on the score plot. This means that the
process temporarily went to a radically different
place or operating regime, where relationships
between the variables are different. Trying
to do PCA on several different operating regimes
all at once is a waste of time. The software
will try to establish the correlations between
the different variables, and if these
correlations change abruptly the results will be
useless. The way to get around this problem is
to divide the observations into different
operating regimes, and study each regime
separately. In this case we will remove the
low production period to prevent it from skewing
the rest of the results.
Sticking out like a sore thumb or a solar flare
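Removing an excursion means finding where it starts and ends, then dropping a safety margin around it, as was done in this example (production fell below 10 t/d from 8h15 to 8h25, but the period from 8h10 to 8h45 was removed). A minimal Python sketch, with hypothetical function names and a simple threshold rule standing in for engineering judgement:

```python
def low_runs(production: list[float], threshold: float) -> list[tuple[int, int]]:
    """(start, end) index pairs, end exclusive, where production < threshold."""
    runs: list[tuple[int, int]] = []
    start = None
    for i, value in enumerate(production):
        if value < threshold and start is None:
            start = i                # excursion begins
        elif value >= threshold and start is not None:
            runs.append((start, i))  # excursion ends
            start = None
    if start is not None:            # excursion still running at the end
        runs.append((start, len(production)))
    return runs


def keep_indices(n_obs: int, runs: list[tuple[int, int]], before: int, after: int) -> list[int]:
    """Indices to keep after dropping each run widened by safety margins."""
    drop: set[int] = set()
    for start, end in runs:
        drop.update(range(max(0, start - before), min(n_obs, end + after)))
    return [i for i in range(n_obs) if i not in drop]


# Hypothetical production rates (t/d) with a short refiner shutdown
production = [300.0, 300.0, 5.0, 4.0, 300.0, 300.0, 300.0]
runs = low_runs(production, threshold=10.0)
print(runs, keep_indices(len(production), runs, before=1, after=2))
```

With 10-second observations, margins of 30 observations before and 120 after would approximately reproduce the 5-minute and 20-minute margins used here.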
9
PCA with process excursion removed
We removed the entire period when the process was
perturbed (8h10 to 8h45) and did a PCA on the
rest of the observations. Interestingly,
the R2 values went down slightly. This is
because many of the variables changed abruptly
all together when the process was shut down,
making it look like they were correlated with
each other. Remember, MVA knows nothing about
the process, and just uses the data as they are.
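The effect described above is easy to reproduce with synthetic data. In the sketch below (Python, random data, hypothetical shutdown window), two variables are independent noise during normal operation, but both drop sharply during a simulated shutdown. Including the shutdown produces a strong correlation; excluding it, the correlation essentially disappears.

```python
import random


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
n = 1000
shutdown = range(400, 450)                   # hypothetical excursion window
a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # two variables that are independent
b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # during normal operation
for i in shutdown:                           # ...but both drop at the shutdown
    a[i] -= 10.0
    b[i] -= 10.0

normal = [i for i in range(n) if i not in shutdown]
r_all = pearson(a, b)
r_normal = pearson([a[i] for i in normal], [b[i] for i in normal])
print(f"all data: r = {r_all:.2f}; excursion removed: r = {r_normal:.2f}")
```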
10
Score plot of normal operation
Now that we have removed the process upset, the
score plot takes on an entirely different
character. There is now an obvious time trend.
During our 24-hour period, the process snakes
around in multi-dimensional space. It is a
moving target. Almost all process data show
this characteristic, because a real process is
never really in steady state. The process
control systems are constantly responding to
outside perturbations, like changes in feed
material quality. Operator intervention is
another source of perturbation. There are many
others. One operating goal is to maintain the
snake within a certain desirable zone.
Whereas score plots for longer, averaged periods
generally resemble clouds, score plots for short
timescales resemble snakes.
11
Score plot showing time trend
Start: 01h00
End: 00h59
Obvious time trend
12
What is the significance?
This snaking of the process at short timescales
is highly significant. This was not seen when
using the daily averages. By looking at which
variables are changing with time, we can get
tremendous insight into the process dynamics.
One way to do this is to compare the contribution
plots (like we saw in Example 2) at different
times. Contribution plots for the start and end
points of our 24-hour period are shown on the
next page. Obviously it is impossible to read
the names of all the variables, but that is not
the point. Just look at the bar graphs. They
are very different, indicating a continuous
change in operating regime from start to finish.
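Contribution plots can be defined in several ways, and commercial packages typically apply their own weighting, so the sketch below uses only one simple, assumed definition: the contribution of variable k to the change in a score between two observations is the change in that (autoscaled) variable multiplied by its loading. By construction these contributions add up to the change in score. NumPy; the function name and data are hypothetical.

```python
import numpy as np


def score_contributions(X: np.ndarray, i_from: int, i_to: int, component: int = 0):
    """Contributions of each variable to the change in one PCA score.

    Assumed definition: contribution_k = (z[i_to, k] - z[i_from, k]) * p[k],
    with z the autoscaled data and p the loading vector of `component`.
    Returns (contributions, scores); contributions sum to t[i_to] - t[i_from].
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    p = vt[component]  # loading vector (unit length, arbitrary sign)
    t = Z @ p          # scores of all observations
    return (Z[i_to] - Z[i_from]) * p, t


# Hypothetical data: 5 variables, the first drifting steadily over the period
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
X[:, 0] += np.linspace(0.0, 5.0, 200)
contrib, t = score_contributions(X, i_from=0, i_to=199)
print(np.round(contrib, 2))
```

The sign of a loading vector is arbitrary, so only the relative sizes and the overall pattern of the bars matter when comparing start and end.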
13
Time trend within the process
Contribution plots at 01h00 (start) and 00h59 (end)
14
Studying the snake
To gain further insight, we can colour-code the
observations on the score plot. We did something
similar in Example 1, when we colour-coded the
days to show the seasons. This is very easy to
do with modern MVA software. Here, we have
modified the score plot so that each observation
is coloured according to the range into which one
chosen variable falls. We have chosen
freeness, an important pulp quality parameter
which the process control systems try to maintain
at a constant value. We could have chosen any
variable. Note that during the course of our
24-hour period, the freeness starts high, then
gets lower, then goes back up again. Someone
with an intimate knowledge of the process could
gain insight from this result.
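Colour-coding amounts to assigning each observation a class based on one variable. The sketch below assumes equal-width ranges between the minimum and maximum of that variable; the function name, palette and freeness readings are hypothetical, and MVA software may offer other binning schemes.

```python
def colour_classes(values: list[float], n_classes: int = 3) -> list[int]:
    """Class 0..n_classes-1 for each value, using equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes or 1.0  # avoid division by zero for a flat variable
    return [min(int((v - lo) / width), n_classes - 1) for v in values]


COLOURS = ["blue", "green", "red"]                    # low, medium, high (hypothetical)
freeness = [120.0, 110.0, 95.0, 90.0, 100.0, 118.0]   # hypothetical readings
classes = colour_classes(freeness)
print([COLOURS[c] for c in classes])
```

Equal-width ranges keep the colour boundaries at fixed freeness values, which makes it easy to relate colours to specifications; quantile-based ranges would instead give each colour roughly the same number of observations.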
15
Score plot coloured for freeness
Exactly the same score plot, coloured for pulp
freeness
16
Score plot in 3-D
Same plot, showing the 3rd component
(axes: Component 1, Component 2, Component 3)
17
MVA foresight
Another powerful use of MVA over short timescales
is to predict problems before they become more
widely visible. The residuals plot on the next
page tells the whole story. Remember we said
that the refiner shut down at 8h15 due to a
blockage? It is obvious that the process started
to move away from normal operation well before
then. The operators tend to look at a handful of
key variables when monitoring the process, but
MVA looks at all the variables at the same time
and is therefore much more sensitive. An analogy
would be a seismometer being used to predict
volcanic eruptions.
A seismometer is extremely sensitive to the
slightest vibrations.
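The residual that makes this foresight possible is a distance to the model: how far an observation lies from the plane spanned by the retained components. The sketch below computes a simplified, normalized version of DModX with NumPy: each observation's residual standard deviation divided by a pooled residual standard deviation from the training data. The degrees-of-freedom corrections are assumptions; commercial packages may use slightly different ones. The function name and data are hypothetical.

```python
import numpy as np


def dmodx(X_train: np.ndarray, X_new: np.ndarray, n_components: int) -> np.ndarray:
    """Simplified normalized distance to a PCA model (DModX-style).

    The model (scaling and loadings) is fitted on X_train; X_new is projected
    onto it. Degrees-of-freedom corrections are assumptions.
    """
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)
    Z = (X_train - mean) / std
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    P = vt[:n_components].T                               # K x A loading matrix
    N, K = Z.shape
    A = n_components
    E = Z - Z @ P @ P.T                                   # training residuals
    s0 = np.sqrt(np.sum(E**2) / ((N - A - 1) * (K - A)))  # pooled residual SD
    Z_new = (X_new - mean) / std
    E_new = Z_new - Z_new @ P @ P.T                       # residuals of new observations
    s_i = np.sqrt(np.sum(E_new**2, axis=1) / (K - A))
    return s_i / s0


# Hypothetical "normal operation": four tags that move together
rng = np.random.default_rng(2)
latent = rng.normal(size=(300, 1))
X_train = latent @ np.ones((1, 4)) + 0.1 * rng.normal(size=(300, 4))

normal = [1.0, 1.0, 1.0, 1.0]
broken = [1.0, 1.0, 1.0, -1.0]  # fourth tag stops following the others
print(np.round(dmodx(X_train, np.array([normal, broken]), n_components=1), 2))
```

The fourth tag's value of -1 is unremarkable on its own, yet breaking the usual relationship with the other tags raises the residual sharply. This is the kind of early warning that appeared before 8h15.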
18
Residuals plot showing MVA foresight
Build-up to 8h15: something is happening to the
process!
19
Using shorter timescales
  • By now it should be clear that doing MVA at
    shorter timescales is completely different from
    studying averages taken over longer timespans.
    Once again, we conclude that the best solution is
    to try many different approaches. No single MVA
    approach will provide all the answers we are
    seeking.
  • Part of the power of this technique is the way
    completely different results can be obtained from
    exactly the same database, simply by slicing and
    dicing the data in various ways:
      • Longer vs. shorter timescales
      • More vs. fewer variables
      • PCA vs. PLS
  • MVA is just a black box. Its use MUST be driven
    by an understanding of the process being studied;
    otherwise it is just meaningless number-crunching.

Number Cruncher
20
End of Example 3
One step at a time
21
End of Tier 2
Congratulations! This is the end of Tier 2.
Obviously the details of these examples are hard
to grasp for a first-timer, but hopefully some of
the overall patterns are starting to emerge. A
true understanding of MVA can only come by
actually doing it on your own, which is the
purpose of Tier 3. All that is left is to
complete the short quiz that follows.
22
Tier 2 Quiz
  • Question 1
  • What is the difference between a tag and a
    variable?
    (a) The words tag and variable are synonyms.
    (b) A tag is an identity label or address, while a
        variable is an attribute of the process.
    (c) Tags change with time, but variables are fixed.
    (d) Variables measure similar attributes, while tags
        measure dissimilar attributes.
    (e) Answers (b) and (c).

23
Tier 2 Quiz
  • Question 2
  • Does averaging reduce or increase noise?
    (a) Averaging increases noise significantly.
    (b) Averaging increases noise, but only slightly.
    (c) Averaging does not affect noise.
    (d) Averaging reduces noise.
    (e) Averaging reduces noise, but increases the
        likelihood of outliers.

24
Tier 2 Quiz
  • Question 3
  • What is the danger of interpolating between
    readings that are far apart in time?
    (a) The interpolation will give far more weight to
        these individual readings than they deserve.
    (b) The interpolated values will indicate slow upward
        and downward trends where there are none.
    (c) The effect of outliers will be enhanced
        many-fold.
    (d) The engineer will have the false sense of
        comparing variables that are similar, when in
        fact they are very different.
    (e) All of the above.

25
Tier 2 Quiz
  • Question 4
  • If interpolation is such a problem, then why
    can't we just use the discrete values instead?
    (a) This would give far too much weight to periods
        with a large number of discrete values.
    (b) Discrete values must be averaged to have meaning.
    (c) No tag is ever truly discrete.
    (d) Discrete values have no time signature.
    (e) Answers (b) and (c).

26
Tier 2 Quiz
  • Question 5
  • What is the difference between a process lag and
    a delayed reading?
    (a) One is caused by the process itself, the other by
        the measurement instruments.
    (b) They are the same thing.
    (c) A process lag is due to residence time, while a
        delayed reading is due to the time required for
        sampling, measurement and recording.
    (d) One is much longer than the other.
    (e) Answers (a) and (c).

27
Tier 2 Quiz
  • Question 6
  • Why does the MVA software reject variables that
    do not change enough with time?
    (a) Only variables which are part of the experiment
        are permitted.
    (b) Tags change with time, but these variables are
        fixed.
    (c) There are insufficient data points.
    (d) If a variable does not change with time, then it
        cannot be correlated to any other variables.
    (e) None of the above.

28
Tier 2 Quiz
  • Question 7
  • What should you do if your initial PCA gives a
    score plot with two distinct and separate data
    clouds?
    (a) Study each data cloud separately.
    (b) Try to determine what these two clouds represent.
    (c) Ignore the first component, which is probably
        being artificially induced by the two clouds.
    (d) Do an MVA on the entire dataset.
    (e) Answers (a), (b) and (c).

29
Tier 2 Quiz
  • Question 8
  • Your residual (DModX) plot shows several
    moderate outliers. What should you do?
    (a) Remove them and continue.
    (b) Leave them in and continue.
    (c) Study their contribution plots.
    (d) Look at the original data to try to determine the
        cause.
    (e) Answers (c) and (d).

30
Tier 2 Quiz
  • Question 9
  • Two variables are located in opposite corners of
    your PCA loadings plot (components 1 and 2).
    What do you conclude?
    (a) These variables are uncorrelated with each other.
    (b) These variables are negatively correlated with
        each other.
    (c) These variables contribute to both the first and
        second components.
    (d) These variables contribute to neither the first
        nor the second component.
    (e) Answers (b) and (c).

31
Tier 2 Quiz
  • Question 10
  • Theoretically, on average, what proportion of
    residuals should be above the 95% confidence
    line (the red line on the DModX plot)?
    (a) Exactly 0.05
    (b) Exactly 5%.
    (c) More than 5%.
    (d) Less than 5%.
    (e) Depends on the dataset.

32
TIER 3 Open-Ended Problem
33
Tier 3 Statement of Intent
  • The goal of Tier 3 is to finally allow the
    student to do MVA independently, though in a
    controlled context. At the end of Tier 3, the
    student should know how to do the following:
  • Prepare a spreadsheet for use in MVA
  • Import spreadsheet into MVA software
  • Set up dataset within MVA software
  • Create simple PCA plots
  • Identify and investigate major and moderate
    outliers
  • Create and interpret more elaborate PCA plots
  • In order to avoid losing the student along the
    way, each of these steps is broken down into a
    series of sub-steps with clear instructions.

Open Problem
34
Tier 3 Contents
Tier 3 is broken down into four sections:
3.1 Problem Statement and Dataset
3.2 Preparing and Importing the Spreadsheet
3.3 Initial MVA Results
3.4 Outliers and More Elaborate MVA Plots
Unlike the previous two tiers, Tier 3 has no quiz.
The student must submit the results of the above
work in a succinct project report (10-15 pages).
35
3.1 Problem Statement and Dataset
36
Problem Statement
You are the process engineer at the TMP mill
from the Tier 2 examples. Your boss, the plant
manager, wants to know why the pulp has different
properties in the summer than in the
winter. You decide to start by
generating PCA results for two different
datasets, one taken during the summer, the other
during the winter, and then comparing them to
each other.
37
Summer/Winter datasets
  • After talking to the operators, you decide to
    take two full weeks of data for 15 key tags,
    using 1-hour averages.
  • Your data have already been imported by an IT
    technician into standard spreadsheet software.
    The two files are:
  • Summerdata.xls
  • Winterdata.xls
  • Open these files, and have a look at the data.
    Can you tell anything about the summer/winter
    question just by looking?
  • Of course not!

These are the actual data files you are going to
use!
38
3.2 Preparing and Importing the Spreadsheet
Open Problem
39
Preparing the spreadsheet
  • As you can see, the spreadsheet has two names for
    each variable:
  • a long descriptive name, and
  • a short tag for easy identification on the MVA
    graphs.
  • We want to do something similar with the
    individual observations. The full time signature
    is too long, and will make the score plots
    impossible to read. Besides, we already know
    which year and month it is. This is not useful
    information. We therefore want to insert a
    column to the right of the time signature, which
    gives the number of hours from the start of the
    two-week period.
  • Do this now, for both spreadsheets. When you are
    done, save them under a new name.
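If you prefer to script this step rather than write a spreadsheet formula, the calculation is simply the elapsed time divided by one hour. The sketch below assumes the time signatures have been exported as text in a "%Y-%m-%d %H:%M" format; that format, the function name and the example timestamps are hypothetical, so adjust them to match your files.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M"  # hypothetical; match the spreadsheet's time signature


def hours_since_start(stamps: list[str], fmt: str = TIME_FORMAT) -> list[int]:
    """Whole hours elapsed since the first time signature in the list."""
    times = [datetime.strptime(s, fmt) for s in stamps]
    start = times[0]
    return [round((t - start).total_seconds() / 3600) for t in times]


print(hours_since_start(["2003-07-01 00:00", "2003-07-01 01:00", "2003-07-02 05:00"]))
```

For two weeks of 1-hour averages, the new column should run from 0 to 335.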

40
Importing the spreadsheet
Now we are ready to open the MVA software. Do it
now. The first thing we need to do is import the
data. Go to File > Import data, and select your
newly renamed file for summer. The software will
ask you a series of questions. Answer them
according to the instructions on Page 2 of the
spreadsheet file. One of these steps involves
saving the new dataset as an MVA file. Repeat
this operation for the winter spreadsheet.
41
3.3 Initial MVA Results
42
Initial MVA results
  • Re-open the summer file, and create the following
    plot:
  • Model bar chart
  • How many components does the software suggest?
    Usually for this kind of initial exercise,
    keeping 3 components is normal. Eliminate the
    components you do not intend to use.
  • Now create the following basic PCA plots:
  • Score plots t(1) vs. t(2)
  • What do you notice about the results? Right!
    There are major outliers.
  • Now do the same for the winter dataset.

Copy each plot by right-clicking it, and paste it into your
word processor file. All these plots must appear
in your report.
43
3.4 Outliers and More Elaborate MVA Plots
44
Investigating Outliers
  • The summer data contain a major process
    excursion that is clearly visible on the score
    plot. Looking at the original data, try to
    determine the cause.
  • Once you are satisfied, remove the outliers and
    save the new model.
  • The winter data look OK on the score plot, but
    that is not the entire story. Generate the
    following residuals plot:
  • DModX
  • What do you notice? Right! There is one major
    outlier. Create a contribution plot to
    investigate:
  • Contribution plot
  • What do you conclude? Remove this point and
    continue.

45
Comparing Summer and Winter
  • Now we are ready to compare the summer and winter
    results. Create the following basic PCA plots:
  • Score plots: t(1) vs. t(2), t(1) vs. t(3), and a
    3-D plot
  • Loadings plots: p(1) vs. p(2), p(1) vs. p(3), and
    a 3-D plot
  • Do you notice any major differences between
    summer and winter?
  • Of course you do! What are they?
  • And what does this imply about the cause of the
    summer/winter process differences?

46
Drawing your conclusions
Now you have something to report to your boss.
47
More Elaborate MVA Plots
  • To get familiar with some of the other MVA
    outputs, create the following for the final
    summer and winter datasets:
  • DModX
  • X/Y Contribution Plot
  • Residuals distribution
  • What do these plots indicate to you? Don't worry
    about finding the right answer; just try to
    figure out what these plots are trying to tell
    us. However, you must justify your answers.
    Don't just guess.

Don't just guess!
48
End of Tier 3

Congratulations! This is the end of
Module 17. Please submit your report to your
professor for grading. We are always interested
in suggestions on how to improve the course. You
may contact us at www.namppimodule.org.