Title: A Methodology Using Support Vector Machines for Short-term Load Forecasting
1. A Methodology Using Support Vector Machines for Short-term Load Forecasting
2. Outline
- Introduction: the load forecasting problem
- Existing models and approaches
- Support Vector Machines
- Implementation of Support Vector Methodology
- Comparison with other approaches
- Results and conclusions
3. Objectives of this research
- To investigate the applicability of the Support Vector Machine (SVM) methodology for short-term load forecasting
- To compare the SVM with existing approaches
- To implement advances in model selection in order to improve short-term load forecasts
4. Electric Power Grid
- Complex interactive network
- Vulnerable to cascading failures
- Extremely complex behavior
- Multi-scale time hierarchy
5. Load Forecasting Problem
How can power grid operation be improved?
- Agent-based anticipatory distributed control
- Robust adaptive and reconfigurable management
- Load forecasting for scheduling of generating
capacities, system security assessments, and
planning
High-quality short-term hourly load forecasts can improve the operating efficiency of many electric utilities.
6. Forecasting Methods
- Expert Judgments
- Linear Models
  - Linear Regression
  - Ridge Regression
- Nonlinear Models
  - Artificial Neural Networks
  - Nonlinear Regression
  - Support Vector Machines
7. Existing Models
Used by power utilities (e.g., ComEd, an Exelon Company)
- Classical forecasting scheme (expert judgments)
- ANNSTLF (Artificial Neural Network Short-Term Load Forecaster)
- Others
8. Empirical Risk Minimization (ERM)
[Diagram: a generator supplies samples x, the system produces responses, and the learning machine learns to imitate the system]
- The idea is to minimize the error on the training sample, with the expectation that this will also give the best result on future data
- The empirical risk can be driven to 0 if the set of functions L(z, α) has sufficient capacity
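In Vapnik's notation, for a training sample $z_1, \dots, z_\ell$ drawn from a distribution $F(z)$, ERM minimizes the empirical risk functional in place of the true risk:

$$R_{\mathrm{emp}}(\alpha) = \frac{1}{\ell} \sum_{i=1}^{\ell} L(z_i, \alpha) \quad \text{vs.} \quad R(\alpha) = \int L(z, \alpha)\, dF(z).$$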
9. Consistency of ERM principle
- The empirical risk uniformly converges to the actual (true) risk functional as the sample size ℓ → ∞
- This is the necessary and sufficient condition for the consistency of the ERM principle
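Formally, the condition can be written as one-sided uniform convergence:

$$\lim_{\ell \to \infty} P\!\left\{ \sup_{\alpha} \bigl( R(\alpha) - R_{\mathrm{emp}}(\alpha) \bigr) > \varepsilon \right\} = 0 \quad \text{for all } \varepsilon > 0.$$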
10. Linear Models
- Gives the ordinary least squares solution
- Completely developed theory
- Fails when the data are substantially nonlinear
- When the data have severe collinearity, regularization is required (ridge regression)
- Performs very well in many cases
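As a minimal sketch of this regularized linear baseline (scikit-learn here is an illustrative stand-in, and the synthetic data are hypothetical):

```python
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical design matrix: 24 lagged hourly loads as predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))
y = X @ rng.normal(size=24) + rng.normal(scale=0.1, size=500)

# alpha is the regularization strength; alpha=0 recovers ordinary least squares.
model = Ridge(alpha=1.0).fit(X, y)
y_hat = model.predict(X)
```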
11. Neural Models
- Nonlinear regression/classification
- Supervised training
- Back-propagation
- Multiple minima problem
- Slow rate of convergence
- Variety of heuristic approaches tested
[Figure: activation function]
12. Neural Networks
- Perform nonlinear optimization
- Inherently ill-posed problem
- The set of approximating functions is limited by back-propagation training
- The final solution lacks interpretation
- No unifying theory
- They work!
- Hardware implementation is possible
- Modular structure
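For concreteness, a minimal back-propagation regressor of the kind discussed here, sketched with scikit-learn's MLPRegressor on hypothetical synthetic data (the 32-neuron hidden layer mirrors the ANN forecast slide later in the deck):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 24))                  # hypothetical lagged-load features
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)

# One hidden layer trained by back-propagation; different random
# initializations can converge to different local minima.
net = MLPRegressor(hidden_layer_sizes=(32,), activation="tanh",
                   max_iter=2000, random_state=0).fit(X, y)
```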
13. Vapnik-Chervonenkis dimension (VC-dimension)
- The VC-dimension of a set of indicator functions Q(z, α) is the largest number h of vectors z_1, ..., z_h that can be separated into all 2^h possible ways using this set of functions (i.e., shattered)
- The VC-dimension is a scalar value that measures the capacity of a set of functions
- For certain sets, an upper bound on the VC-dimension can be calculated analytically; for example, the set of hyperplanes in R^n has VC-dimension n + 1
- A bounded VC-dimension is a necessary condition for the ERM principle to be consistent
14. Model Selection
[Figure: data plotted as y vs. x]
How to choose a set of functions properly?
15. Structural Risk Minimization (SRM)
- Selecting a subset of a structure with optimal complexity
- Estimating the parameters of the model from this subset
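Concretely, SRM arranges the admissible functions into a nested structure of subsets of increasing capacity,

$$S_1 \subset S_2 \subset \dots \subset S_n, \qquad h_1 \le h_2 \le \dots \le h_n,$$

and selects the element for which the bound on the true risk (empirical risk plus a confidence term that grows with the VC-dimension h) is smallest.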
16. Separating Hyperplane
- Linear separation is performed by a hyperplane
17. Optimal Separating Hyperplane
- Assuming that a margin exists between the two classes
- The optimal hyperplane maximizes the margin
- Maximizing the margin is equivalent to minimizing the norm of w
18. Optimization problem
- The constrained optimization problem (written out below)
- Reformulated as an unconstrained problem with Lagrange multipliers
- Conditions of the Kuhn-Tucker theorem are applied
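The equations referred to on this slide (lost in extraction) correspond to the standard hard-margin formulation:

$$\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, \ell,$$

with Lagrangian

$$L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2 - \sum_{i=1}^{\ell} \alpha_i \bigl[ y_i (w \cdot x_i + b) - 1 \bigr], \qquad \alpha_i \ge 0.$$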
19. Dual problem
Optimization problem
Separating hyperplane
- This optimization problem can be solved using
standard quadratic programming methods
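Written out in standard form, the two objects named on this slide are:

$$\max_{\alpha} \ \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \quad \text{s.t.} \quad \sum_{i=1}^{\ell} \alpha_i y_i = 0, \quad \alpha_i \ge 0,$$

$$f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{\ell} \alpha_i y_i (x_i \cdot x) + b \right),$$

where only the support vectors enter with α_i > 0.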
20. Nonseparable case
- It is desirable to separate the data with a minimal number of errors
- Positive slack variables ξ_i can be introduced in the defining conditions of the hyperplane
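With the slack variables, the soft-margin problem becomes:

$$\min_{w,\,b,\,\xi} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} \xi_i \quad \text{s.t.} \quad y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0,$$

where C trades off margin width against training errors.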
21. Support Vector Machine
- The input vector x is mapped into a high-dimensional feature space
- After the transformation, the optimal separating hyperplane is built in the high-dimensional feature space Φ
22. Support Vector Machine (continued)
(dual form)
- If the Mercer condition is satisfied, then the inner product in the Hilbert space has a kernel representation (see below)
Optimization problem
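The representation in question, and the resulting kernelized objective, are:

$$K(x, x') = \langle \Phi(x), \Phi(x') \rangle,$$

$$\max_{\alpha} \ \sum_{i=1}^{\ell} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{\ell} \alpha_i \alpha_j y_i y_j K(x_i, x_j).$$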
23. Kernels
- The mapping into Φ can be represented by a kernel function
- With a proper selection of K, the dot product can be calculated in the low-dimensional input space
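Two standard choices (the radial kernel is the one used in the SVM model later in the deck):

$$K(x, x') = (x \cdot x' + 1)^d \ \ \text{(polynomial)}, \qquad K(x, x') = \exp\!\left(-\gamma \|x - x'\|^2\right) \ \ \text{(radial)}.$$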
24. ε-insensitive Loss Function
- Provides the best approximation for the worst possible noise density
- Gives robust regression under the more relaxed assumption of a symmetric convex noise density
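The loss function itself is

$$L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - f(x)| - \varepsilon\bigr),$$

so residuals smaller than ε incur no penalty.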
25. Support Vector Regression
Primal form
Dual form
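Written out, these are the standard ε-SVR formulations. Primal:

$$\min_{w,\,b,\,\xi,\,\xi^*} \ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{\ell} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad
\begin{cases}
y_i - w \cdot \Phi(x_i) - b \le \varepsilon + \xi_i,\\
w \cdot \Phi(x_i) + b - y_i \le \varepsilon + \xi_i^*,\\
\xi_i,\ \xi_i^* \ge 0.
\end{cases}$$

Dual:

$$\max_{\alpha,\,\alpha^*} \ -\varepsilon \sum_{i=1}^{\ell} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{\ell} y_i (\alpha_i - \alpha_i^*) - \frac{1}{2} \sum_{i,j=1}^{\ell} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) K(x_i, x_j)$$

subject to $\sum_i (\alpha_i - \alpha_i^*) = 0$ and $0 \le \alpha_i, \alpha_i^* \le C$.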
26. Data Description
- Real-world data were provided by ComEd (an Exelon Company)
- Hourly loads from January 1, 1999 through September 10, 2000
- Training data set: actual loads from January 1999 to January 2000
27. Software
- SVMTorch: Collobert and Bengio, IDIAP (Dalle Molle Institute for Perceptual Artificial Intelligence), Switzerland; Unix, C
- mySVM version 2.1.1: Stefan Rüping, University of Dortmund; Unix/Windows, C
- MATLAB SVM Toolbox: Steve Gunn, Department of Electronics and Computer Science, University of Southampton, United Kingdom; Unix, Matlab
28. ANN forecast
- An ANN with 32 neurons in the hidden layer was designed and tested
29. SVM model
- Radial Kernel
- Historical load data
- Information about the day of the week
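A minimal modern sketch of this model, assuming scikit-learn in place of the SVM packages listed earlier, with a hypothetical input file hourly_loads.csv and an illustrative feature construction:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

# Hypothetical input file: one row per hour with "timestamp" and "load" columns.
df = pd.read_csv("hourly_loads.csv", parse_dates=["timestamp"])

# Features: the previous 24 hourly loads plus a day-of-week indicator.
lags = np.column_stack([df["load"].shift(k) for k in range(1, 25)])
dow = df["timestamp"].dt.dayofweek.to_numpy()[:, None]
X = np.hstack([lags, dow])[24:]
y = df["load"].to_numpy()[24:]

# Radial (RBF) kernel, as on this slide; C and epsilon are discussed next.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
next_hours = model.predict(X[-24:])   # rough sketch of a next-day forecast
```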
30. SVM forecast
31. Parameter ε
- A rigorous choice of ε is still an open issue
32. Parameter C
- The parameter C controls the VC-dimension of the learning machine
- SRM can be employed on the set of functions defined by the parameter C
33. SRM principle implementation
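A hedged sketch of how such a capacity search over C (and ε) can be implemented; the grid values, the synthetic data, and the use of scikit-learn's GridSearchCV are assumptions standing in for the procedure used in the original work:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 25))                  # hypothetical load features
y = X[:, 0] + 0.1 * rng.normal(size=500)

# Larger C allows higher effective capacity; the search keeps the value
# that generalizes best on held-out (later-in-time) data.
search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 1.0, 10.0, 100.0], "epsilon": [0.01, 0.1, 0.5]},
    cv=TimeSeriesSplit(n_splits=5),             # respects temporal order
)
search.fit(X, y)
print(search.best_params_)
```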
34. Comparison of model performance
35. Comparison of model performance (continued)
36. Results
- Quadratic optimization is at the core of the method: the final result is unique
- Tractable solution: the support vectors are the crucial training points
- All useful information contained in the data set is summarized by the support vectors
- Solid and general theoretical foundation
- Built-in capabilities for model selection
37. SVM shortcomings
- Computationally slower than neural networks
- Require the choice of regularization parameters
- Uncertainty in the choice of kernel
38. Findings
- The SVM load forecasting model was designed and tested
- Structural risk minimization was used in model selection
- Approaches to the choice of regularization parameters were proposed and tested
39. Challenging Issues
- Design of an SVM model for a higher embedding dimension
- The choice of a kernel function
- Computational expense
- Design of an SVM load forecasting model for the full set of input parameters
40. Conclusion
- The method of structural risk minimization provides a powerful procedure for learning machine design
- The SVM is a promising nonlinear regression technique
- The notion of VC-dimension is elegant, theoretically solid, and constructive
- Application of the SV method gives promising results