Title: Shell 2006
1. Shell 2006
- Advisor John R. Williams
- Research Assistant Ching-Huei Tsou
2. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Next Steps
3. Summary of Our Previous / Ongoing Work
- A generic framework for data management, data mining, and model building
  - XML metadata tagging for easy searching, manipulation, data integration, and assimilation
  - A web-service-based architecture that is secure, reliable, scalable, and platform independent
- Developing robust learning algorithms suitable for analyzing large volumes of incomplete and inaccurate data
  - A high-performance machine learning library implementing state-of-the-art learning algorithms
- Applied the methodology of optimization through a measure-model-control loop to both simple and real-world systems
  - Toy applications
  - Standard benchmark problems
  - Structural health monitoring
  - Proactive stocking in retail stores
4. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Next Steps
5. Focus of the 2006 Project
- We have been developing a generic framework that helps identify and optimize systems through modeling, learning, and simulation.
- We will focus on applying these techniques to a specific system and specific types of data. Possible directions, as suggested by Dr. Jan Dirk Jansen, are:
  - Production data management and criticality
    - in a global system optimization context
    - impact on maintenance
  - Integration of maintenance and logistics in a total (producing) system optimization
- With these general objectives in mind, a natural extension of our previous work is to obtain real production data, build a realistic, dynamically updating model based on both domain knowledge and the data, and put the model in the optimization loop to help identify critical processes and events.
- We have successfully applied this methodology to several systems and achieved promising results. How it works in a reservoir, and how it compares to Shell's existing approach, has not yet been studied, and is thus one of the primary goals for next year.
6. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Next Steps
7. Optimization through Measure-Model-Control Loop
- Shell Smart Well Technology
  - A well with down-hole instrumentation, such as sensors, valves, and inflow control devices installed on the production tubing
- A smart well management system can be modeled as a closed-loop Measure-Model-Control system
  - Measure: monitoring the fluid flow rates, temperature, and pressures of the well with down-hole sensors
  - Model: a mathematical model representing the current status of the well
  - Control: adjusting the valves and inflow control devices installed on the production tubing to optimize production
- The goal is to achieve global system optimization
8. Optimization through Measure-Model-Control Loop
- The loop can also be thought of as a learning-feedback loop
- In particular, we focus on building a realistic model of the system
  - Robust against noisy and missing data
  - Takes advantage of both domain knowledge and data
  - Dynamically updated when new measurements become available
9. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Next Steps
10. Building Realistic Models
- Traditionally, two different approaches are commonly accepted in modeling
  - Explicit modeling
    - Extensive domain knowledge is required
  - Statistical modeling
    - A large amount of high-quality data is required
- Proposed approach: a hybrid model
  - Synergy: combining prior knowledge and data
  - A machine learning approach
  - Prior knowledge is used to guide the learning process when less data is available or the quality of the data is suboptimal
  - When consistent data is abundant, it can override an incorrect assumption in the prior knowledge
11. Analytical vs. Inductive
- Perfect knowledge
  - Given any Xi, we can calculate Yi using the equation
- (figure: spectrum running from pure knowledge to pure data)
- Imperfect knowledge: smooth curve
- Limited data
- Complete data
  - Given any Xi, we can look up Yi from the dataset
12. What is New?
- Creating a parametric model and determining the values of the parameters from observed data: haven't we been doing this for ages?
- True. But this is not the whole story
- In the simplest case, perform a linear regression on a set of samples
  - Data: the set of samples
  - Prior knowledge: the underlying distribution is linear
  - Model: a linear function
  - Parameters: slope and offset of the line
  - Cost function: least-square error
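The linear-regression example above can be sketched in a few lines of Python. The data and its generating relation (y = 2x + 1) are invented for illustration; the closed-form least-squares solution is standard.

```python
# Illustrative sketch of the slide's example: the "prior knowledge" is
# that the relation is linear, the parameters are slope and offset, and
# the cost function is least-square error. Data values are invented.

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + offset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    offset = mean_y - slope * mean_x
    return slope, offset

# Noise-free samples from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
print(fit_line(xs, ys))  # (2.0, 1.0)
```

With noisy samples the same formulas return the line minimizing the squared error, which is exactly the cost function named on the slide.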
13. Training a Model
- Prior distribution: linear
  - Oversimplified model
  - Large training error; cannot generalize well
- Prior distribution: mixture of Gaussians
  - Overly complex model
  - Overfitting: small training error, but does not generalize well either
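The two failure modes on this slide can be made concrete with a toy comparison (all data invented): an oversimplified model that predicts the training mean everywhere, versus an overly complex model that memorizes every training point and answers with the nearest one.

```python
# Oversimplified vs. overly complex, on roughly linear data (y ~ 2x).
# The constant model has large training error; the memorizer has zero
# training error but a much larger error on held-out points.

def mse(model, data):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train = [(0, 0.1), (1, 1.9), (2, 4.2), (3, 5.8)]
test = [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)]

# Oversimplified: ignore x entirely and predict the training mean.
mean_y = sum(y for _, y in train) / len(train)
constant_model = lambda x: mean_y

# Overly complex: a pure look-up table with nearest-neighbour fallback.
def memorizer(x):
    return min(train, key=lambda p: abs(p[0] - x))[1]

print(mse(constant_model, train), mse(memorizer, train))  # large vs 0.0
print(mse(constant_model, test), mse(memorizer, test))
```

The memorizer's training error is exactly zero, yet its test error is not, which is the overfitting gap the slide describes.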
14. Generative vs. Discriminative (1/2)
- For a learning problem
  - Y: target function (models which generate X)
  - X: observed data
  - We want to find a Y which maximizes P(Y|X)
    - The most probable model Y after we have seen the data set X
- Generative approach
  - Use Bayes rule: P(Y|X) = P(X|Y) P(Y) / P(X)
    - P(X|Y): the probability of observing X, given that it is generated by model Y
    - P(Y): the probability that model Y is true (prior belief)
    - P(X): constant when we maximize w.r.t. Y
- Discriminative approach
  - Find a Y which maximizes P(Y|X) directly (e.g. SVMs find a maximum-margin hyperplane in the feature space)
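The generative recipe above can be sketched as a toy naive Bayes classifier: pick the class maximizing P(X|Y) P(Y), since P(X) is constant. The data set, feature encoding, and smoothing below are invented for illustration.

```python
# Toy generative classification via Bayes rule: argmax_Y P(X|Y) * P(Y).
# Training data: (binary feature tuple, class label), invented values.

from collections import Counter

data = [((1, 0), "a"), ((1, 1), "a"),
        ((0, 1), "b"), ((0, 0), "b"), ((0, 1), "b")]

# P(Y): prior belief from class frequencies
prior = {y: c / len(data) for y, c in Counter(y for _, y in data).items()}

def likelihood(x, y):
    """Naive-Bayes P(X|Y): features assumed independent given the class."""
    rows = [f for f, lab in data if lab == y]
    p = 1.0
    for i, xi in enumerate(x):
        match = sum(1 for f in rows if f[i] == xi)
        p *= (match + 1) / (len(rows) + 2)  # Laplace smoothing
    return p

def classify(x):
    # P(X) is the same for every class, so it can be dropped
    return max(prior, key=lambda y: likelihood(x, y) * prior[y])

print(classify((1, 0)))  # "a"
```

A discriminative method such as an SVM would instead fit a decision boundary directly, never estimating P(X|Y) or P(Y).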
15. Generative vs. Discriminative (2/2)
- Examples of generative learning approaches
  - Naïve Bayes
  - Bayesian belief networks
  - Hidden Markov models
  - Graphical models
- Examples of discriminative approaches
  - Nearest neighbors
  - Artificial neural networks
  - Support vector machines
- The idea of guiding the learning process using prior knowledge is not new; in fact, it is the foundation of many generative learning approaches.
- On the other hand, discriminative approaches usually generalize better (are more accurate), but the link between data and prior knowledge is not clear (an ongoing research area)
16. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Handwriting Recognition
- Structural Health Monitoring
- Proactive Stocking
- Next Steps
17. Example: Handwriting Recognition
- MNIST
  - A standard pattern recognition benchmark problem from AT&T
  - Training set of 60,000 examples
  - Test set of 10,000 examples
- Classification test error
  - Neural network (NN)
    - 1 layer, no hidden units (HU): 12% (LeCun et al., 1998)
    - 2 layers, 800 HU: 1.6% (Simard et al., ICDAR 2003)
  - Support vector machine (SVM), Gaussian kernel: 1.4%
  - Record low: 0.4%
- SVM + prior knowledge
  - Prior knowledge: invariance under small rotations and/or translations
  - Test error: 0.56%
18. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Handwriting Recognition
- Structural Health Monitoring
- Proactive Stocking
- Next Steps
19. Structural Health Monitoring
- Analytical approach
  - Finite element analysis
  - Extensive domain knowledge is required
    - Equations governing the vibration of the structure
    - Dimensions, material, mass, external forces, damping
  - This information is usually not available in real-world problems
- Inductive approach
  - Statistical modeling (e.g. auto-regression models, Y. Lei et al., 2003)
  - Only the acceleration response data is required
  - However, the accuracy of auto-regression models is limited
- Last year we applied SVMs to the problem and achieved better results
- Now, combining support vector machines with prior knowledge, we can achieve much higher accuracy
- Prior knowledge used
- ASCE Benchmark Problem
20. Structural Health Monitoring
- Support vector regression without prior knowledge
  - 20 features
  - a1(t-1), a1(t-2), …, a1(t-20)
- Support vector regression with prior knowledge
  - 5 features
  - a1(t-1), v1(t-1), a2(t-1), v2(t-1), P(t)-P(t-1)
- A smaller error indicates a more realistic model
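The two feature sets above can be sketched as feature-construction functions. The signal arrays below are invented placeholders; in practice they would be the measured acceleration, velocity, and load channels.

```python
# The two feature sets from the slide, built from (invented) signals.
# Without prior knowledge: 20 lagged accelerations. With prior
# knowledge of the structure's dynamics: 5 physically meaningful
# features instead.

def lag_features(a1, t, n_lags=20):
    """Pure autoregressive features: a1(t-1) ... a1(t-20)."""
    return [a1[t - k] for k in range(1, n_lags + 1)]

def physics_features(a1, v1, a2, v2, P, t):
    """Prior-knowledge features: accelerations, velocities, load change."""
    return [a1[t - 1], v1[t - 1], a2[t - 1], v2[t - 1], P[t] - P[t - 1]]

# Placeholder signals standing in for measured sensor channels
a1 = list(range(30)); v1 = a1; a2 = a1; v2 = a1; P = a1
print(len(lag_features(a1, 25)))                      # 20
print(len(physics_features(a1, v1, a2, v2, P, 25)))   # 5
```

Either feature vector would then be fed to a support vector regressor; the slide's point is that the smaller, physics-informed set yields a more realistic model.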
21. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Handwriting Recognition
- Structural Health Monitoring
- Proactive Stocking
- Next Steps
22. Example 3: Proactive Stocking
- Keeping products in stock is one of the most important issues in retail stores
- Traditionally, a store manager / associate replenishes a product when the quantity of the product on the sales floor falls below the desired stock level
- Checking the stock level on the sales floor is a time-consuming manual process and is prone to error
- Often a low stock level is not observed until the product is completely out of stock and the store has been losing sales
23. Example 3: Proactive Stocking
- Based on the store process and data collected from the point of sale, the sales floor, and the backroom, a store model can be established.
- Knowing the daily sales, maximum shelf capacity, and total on-hand product quantity, the model can infer the stock level on the sales floor.
  - Reduce the time-consuming zoning work
  - Replenish the product before it is out of stock
  - Prioritize the replenishment
- Again, a measure-model-control optimization problem
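The inference step above can be illustrated with a minimal sketch. The inputs (on-hand quantity, shelf capacity, sales since the last restock) come from the slide; the bookkeeping rule below is an invented simplification, not the project's actual store model.

```python
# Hedged sketch: estimate sales-floor stock without a manual shelf
# check, assuming (simplification) that the shelf was filled to its
# limit at the last replenishment and all sales since came off the
# floor.

def infer_floor_stock(on_hand, shelf_capacity, sold_since_restock):
    """Estimate current sales-floor quantity from POS and inventory data."""
    # Floor stock at restock time is capped by shelf capacity and by
    # the total stock available at that time.
    floor_at_restock = min(shelf_capacity, on_hand + sold_since_restock)
    return max(0, floor_at_restock - sold_since_restock)

# 40 units on hand, shelf holds 25, 18 sold since the last restock:
print(infer_floor_stock(40, 25, 18))  # 7
```

When the estimate drops near zero, the model would flag the product for replenishment before a customer finds an empty shelf.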
24. Explicit Retail Store Model
- Given the complexity and the human-centric nature
of operations in a chain store, statistical
learning is necessary in addition to the explicit
model.
25. Agenda
- Summary of Our Previous / Ongoing Work
- Focus of the 2006 Project
- Optimization through Measure-Model-Control Loop
- Building Realistic Models
- Examples
- Next Steps
26. Next Steps
- We have successfully applied this methodology to several systems and achieved promising results. How it works in a reservoir, and how it compares to Shell's existing approach, has not yet been studied, and is thus the primary goal for next year.
- Input from Shell
  - Processes involved in the production
  - Production data
- Model building
  - Modeling the smart well system based on process analysis
  - Learning the model / parameters from the production data
  - Running the simulation / prediction on test sets
  - Comparing predicted outputs with Shell's existing system
- Optimization
  - Given an accurate model, we can
    - Evaluate new processes
    - Predict feedback from the system (after influencing the system through certain control mechanisms)
    - Identify critical events
- Data management
27. References
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, Nov. 1998.
- P. Y. Simard, D. Steinkraus, and J. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," International Conference on Document Analysis and Recognition (ICDAR), IEEE Computer Society, Los Alamitos, pp. 958-962, 2003.
- Y. Lei et al., "An Enhanced Statistical Damage Detection Algorithm Using Time Series Analysis," in Proceedings of the 4th International Workshop on Structural Health Monitoring, 2003.
- C. R. Farrar, S. W. Doebling, and D. A. Nix, "Vibration-Based Structural Damage Identification," Philosophical Transactions of the Royal Society: Mathematical, Physical and Engineering Sciences, 2001, 359(1778), pp. 131-149.
- T. Jaakkola and D. Haussler, "Exploiting Generative Models in Discriminative Classifiers," in Advances in Neural Information Processing Systems 11, 1998.
28. Additional Slides
29. Probability Prerequisite: Bayes Rule
- P(A) = |A| / |World|
- P(B) = |B| / |World|
- P(A ∧ B) = |A ∧ B| / |World|
- P(A|B) = |A ∧ B| / |B| = P(A ∧ B) / P(B)
  - the probability that A is true in a world where B is true
- P(B|A) = |A ∧ B| / |A| = P(A ∧ B) / P(A)
- Bayes rule
  - P(B|A) P(A) = P(A ∧ B) = P(A|B) P(B)
  - ⇒ P(B|A) = P(A|B) P(B) / P(A)
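The derivation above can be checked with a worked numeric example; the probability values are invented for illustration.

```python
# Bayes rule with concrete (invented) numbers:
# P(B|A) = P(A|B) * P(B) / P(A)

def bayes(p_a_given_b, p_b, p_a):
    """Recover the posterior P(B|A) from P(A|B), P(B), and P(A)."""
    return p_a_given_b * p_b / p_a

# Example: P(A|B) = 0.9, P(B) = 0.2, P(A) = 0.3
print(bayes(0.9, 0.2, 0.3))  # ≈ 0.6
```

Multiplying back, P(B|A) P(A) = 0.6 × 0.3 = 0.18 = P(A|B) P(B), confirming the identity on the slide.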
30. Generative vs. Discriminative (1/2)
- For a learning problem
  - Y: target function (models which generate X)
  - X: observed data
  - We want to find a Y which maximizes P(Y|X)
    - The most probable model Y after we have seen the data set X
- Generative approach
  - Use Bayes rule: P(Y|X) = P(X|Y) P(Y) / P(X)
    - P(X|Y): the probability of observing X, given that it is generated by model Y
    - P(Y): the probability that model Y is true (prior belief)
    - P(X): constant when we maximize w.r.t. Y
- Discriminative approach
  - Find a Y which maximizes P(Y|X) directly (e.g. SVMs find a maximum-margin hyperplane in the feature space)
31. Generative vs. Discriminative (2/2)
- Both approaches are popular in various fields, and each has its pros and cons
- Generative models
  - Prior knowledge can be added
  - Examples: Naïve Bayes, hidden Markov models, Bayesian networks
  - Pros: prior knowledge, missing values, less data is required, variable attribute length
  - Cons: computationally inefficient
- Discriminative models
  - Make no attempt to model underlying distributions
  - Examples: nearest neighbors, neural networks, support vector machines
  - Pros: more accurate than generative approaches; performance is usually much better on large-scale problems
  - Cons: black-box, relationships between variables are not explicit, more data is needed
32. Bayesian Belief Network
- Bayes rule
- Naïve Bayes classifier
- Bayesian belief network
- Learning Bayesian belief network
- Missing data
- EM algorithm
33. Support Vector Machine (1/2)
- Maximum margin classifier
34. Support Vector Machine (2/2)
- Mapping to a higher dimension
35. Solve / Train the Hybrid Model
- Design issues
  - How to represent the arbitrary dynamics in terms of attribute values (parameters)
  - How to estimate the probabilities required by the classifier
- Combining generative and discriminative learning
  - New dot-product kernels derived from the system dynamics model
  - Fisher kernels / maximum entropy discrimination
  - Not much has been done in this area