Predicting Electricity Distribution Feeder Failures using Machine Learning
1
Predicting Electricity Distribution Feeder
Failures using Machine Learning
  • Marta Arias (1), Hila Becker (1,2)
  • (1) Center for Computational Learning Systems
  • (2) Computer Science
  • Columbia University
  • LEARNING 06

2
Overview of the Talk
  • Introduction to the Electricity Distribution
    Network of New York City
  • What are we doing and why?
  • Early solution using MartiRank, a boosting-like
    algorithm for ranking
  • Current solution using Online learning
  • Related projects

3
Overview of the Talk
  • Introduction to the Electricity Distribution
    Network of New York City
  • What are we doing and why?
  • Early solution using MartiRank, a boosting-like
    algorithm for ranking
  • Current solution using Online learning
  • Related projects

4
The Electrical System
5
Electricity Distribution Feeders
6
Problem
  • Distribution feeder failures result in automatic
    feeder shutdown
  • called "Open Autos" or O/As
  • O/As stress networks, control centers, and field
    crews
  • O/As are expensive ($ millions annually)
  • Proactive replacement is much cheaper and safer
    than reactive repair

7
Our Solution: Machine Learning
  • Leverage Con Edison's domain knowledge and
    resources
  • Learn to rank feeders based on susceptibility to
    failure
  • How?
  • Assemble data
  • Train model based on past data
  • Re-rank frequently using model on current data

8
New York City
9
Some facts about feeders and failures
  • About 950 feeders
  • 568 in Manhattan
  • 164 in Brooklyn
  • 115 in Queens
  • 94 in the Bronx

10
Some facts about feeders and failures
  • About 60% of feeders failed at least once
  • On average, feeders failed 4.4 times
  • (between June 2005 and August 2006)

11
Some facts about feeders and failures
  • mostly 0-5 failures per day
  • more in the summer
  • strong seasonality effects

12
Feeder data
  • Static data
  • Compositional/structural
  • Electrical
  • Dynamic data
  • Outage history (updated daily)
  • Load measurements (updated every 5 minutes)
  • Roughly 200 attributes for each feeder
  • New ones are still being added.

13
Feeder Ranking Application
  • Goal: rank feeders according to likelihood of
    failure (high-risk feeders placed near the top)
  • Application needs to integrate all types of data
  • Application needs to react and adapt to incoming
    dynamic data
  • Hence, update feeder ranking every 15 min.

14
Application Structure
15
Goal: rank feeders according to likelihood of
failure
16
Overview of the Talk
  • Introduction to the Electricity Distribution
    Network of New York City
  • What are we doing and why?
  • Early solution using MartiRank, a boosting-like
    algorithm for ranking
  • Pseudo ROC and pseudo AUC
  • MartiRank
  • Performance metric
  • Early results
  • Current solution using Online learning
  • Related projects

17
(pseudo) ROC
[figure: feeders sorted by score along the x-axis; outages along the y-axis]
18
(pseudo) ROC
[figure: pseudo-ROC over 941 feeders (x-axis) and 210 outages (y-axis)]
19
(pseudo) ROC
[figure: normalized pseudo-ROC; x-axis: fraction of feeders (0 to 1), y-axis: fraction of outages (0 to 1); shaded region: area under the ROC curve]
20
Some observations about the (p)ROC
  • Adapted to positive labels (not just 0/1)
  • Best pAUC is not always 1 (actually, it almost
    never is)
  • E.g., pAUC = 11/15 ≈ 0.73
  • Best pAUC with this data is 14/15 ≈ 0.93,
    corresponding to the ranking 2, 1, 0, 0, 0 (see the
    sketch below)

[figure: example ranking with per-feeder outage counts]
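The pAUC values above can be reproduced with a simple staircase computation: walk down the ranking, accumulate the fraction of outages captured at each position, and average over positions. The sketch below is an assumption that matches the 11/15 and 14/15 figures, not necessarily the exact implementation used in the deployed system.

    # Plausible staircase pseudo-AUC (assumption): labels are non-negative
    # outage counts, feeders are listed best-ranked first.
    def pseudo_auc(outages_in_ranked_order):
        total = sum(outages_in_ranked_order)
        n = len(outages_in_ranked_order)
        if total == 0 or n == 0:
            return 0.0
        captured, area = 0, 0.0
        for count in outages_in_ranked_order:
            captured += count                 # outages seen so far
            area += captured / total          # fraction of outages captured
        return area / n                       # averaged over feeder positions

    print(pseudo_auc([1, 0, 2, 0, 0]))  # 11/15 = 0.733...
    print(pseudo_auc([2, 1, 0, 0, 0]))  # 14/15 = 0.933... (best for these labels)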
21
MartiRank
  • Boosting-like algorithm by Long & Servedio,
    2005
  • Greedy, maximizes pAUC at each round
  • Adapted to ranking
  • Weak learners are sorting rules
  • Each attribute is a sorting rule
  • Attributes are numerical only
  • If categorical, then convert to indicator vector
    of 0/1

22
MartiRank
[figure: schematic of MartiRank rounds]
  • Round 1: the feeder list begins in random order;
    sort the whole list by the best variable
  • Round 2: divide the list in two, splitting the
    outages evenly; choose a separate best variable for
    each part and sort it
  • Round 3: divide the list in three, splitting the
    outages evenly; choose a separate best variable for
    each part and sort it
  • Continue in the same way for later rounds (a
    simplified sketch follows below)
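A greatly simplified sketch of the rounds pictured above, reusing pseudo_auc from the earlier sketch: in round k the current list is split into k contiguous segments holding roughly equal outage mass, and each segment is re-sorted by whichever single attribute maximizes that segment's pAUC. Sort direction, ties, and missing values, which the real algorithm handles, are ignored here; the names and data layout are illustrative assumptions.

    def split_by_outages(order, outages, k):
        """Split `order` (feeder indices) into k contiguous segments of ~equal outage mass."""
        total = sum(outages[i] for i in order) or 1
        segments, seg, mass = [], [], 0.0
        for i in order:
            seg.append(i)
            mass += outages[i]
            if len(segments) < k - 1 and mass >= total * (len(segments) + 1) / k:
                segments.append(seg)
                seg = []
        segments.append(seg)
        return segments

    def martirank(features, outages, n_rounds=4):
        """features: {attribute_name: list of values per feeder}; outages: list of counts."""
        order = list(range(len(outages)))          # initial order (arbitrary here)
        model = []                                 # chosen attribute per (round, segment)
        for k in range(1, n_rounds + 1):
            new_order, chosen = [], []
            for seg in split_by_outages(order, outages, k):
                best_attr, best_seg, best_score = None, seg, -1.0
                for attr, values in features.items():
                    candidate = sorted(seg, key=lambda i: values[i], reverse=True)
                    score = pseudo_auc([outages[i] for i in candidate])
                    if score > best_score:
                        best_attr, best_seg, best_score = attr, candidate, score
                chosen.append(best_attr)
                new_order.extend(best_seg)
            model.append(chosen)
            order = new_order
        return order, model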
23
MartiRank
  • Advantages
  • Fast, easy to implement
  • Interpretable
  • Only 1 tuning parameter: the number of rounds
  • Disadvantages
  • 1 tuning parameter: the number of rounds
  • Was set to 4 manually

24
Using MartiRank for real-time ranking of feeders
  • MartiRank is a batch algorithm, hence it must deal
    with the changing system by:
  • Continually generate new datasets with latest
    data
  • Use data within a window, aggregate dynamic data
    within that period in various ways (quantiles,
    counts, sums, averages, etc.)
  • Re-train new model, throw out old model
  • Seasonality effects not taken into account
  • Use newest model to generate ranking
  • Must implement training strategies
  • Re-train daily, or weekly, or every 2 weeks, or
    monthly, ... (a windowed-aggregation sketch follows
    below)
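A minimal sketch of the windowed aggregation step mentioned above, assuming load readings arrive as per-feeder lists of values within the training window; the helper and feature names are illustrative, not the production pipeline.

    # Aggregate dynamic measurements within a window into per-feeder features
    # (counts, sums, averages, quantiles); assumes >= 2 readings per feeder.
    from statistics import mean, quantiles

    def window_features(load_readings):
        """load_readings: dict feeder_id -> list of load values in the window."""
        features = {}
        for feeder, values in load_readings.items():
            q1, median, q3 = quantiles(values, n=4)   # quartile cut points
            features[feeder] = {
                "count": len(values),
                "sum": sum(values),
                "mean": mean(values),
                "q1": q1, "median": median, "q3": q3,
            }
        return features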

25
Performance Metric
  • Normalized average rank of failed feeders
  • Closely related to (pseudo) Area-Under-ROC-Curve
    when labels are 0/1
  • avgRank ≈ pAUC + 1/#examples
  • Essentially, the difference comes from 0-based pAUC
    vs. 1-based ranks

26
Performance Metric Example
[figure: example ranking of feeders with their outage counts; pAUC = 17/24 ≈ 0.7]
27
How to measure performance over time
  • Every 15 minutes, generate new ranking based on
    current model and latest data
  • Whenever there is a failure, look up its rank in
    the latest ranking before the failure
  • After a whole day, compute the normalized average
    rank (a sketch follows below)
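A minimal sketch of the daily computation described above, under one plausible reading of the metric: each failure contributes the 1-based position of its feeder in the latest ranking before the failure, normalized by the number of feeders, and the day's value is the average over failures (lower is better).

    def normalized_avg_rank(ranking, failed_feeders):
        """ranking: feeder ids, most at risk first; failed_feeders: ids that failed."""
        n = len(ranking)
        position = {feeder: idx + 1 for idx, feeder in enumerate(ranking)}  # 1-based
        ranks = [position[f] / n for f in failed_feeders]
        return sum(ranks) / len(ranks) if ranks else 0.0

    # e.g. 3 failures found at positions 5, 12 and 40 of a 100-feeder ranking
    ranking = [f"feeder{i}" for i in range(1, 101)]
    print(normalized_avg_rank(ranking, ["feeder5", "feeder12", "feeder40"]))  # 0.19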

28
MartiRank Comparison: training every 2 weeks
29
Using MartiRank for real-time ranking of feeders
  • MartiRank seems to work well, but..
  • User decides when to re-train
  • User decides how much data to use for re-training
  • ... and other things, like setting parameters,
    selecting algorithms, etc.
  • Want to make the system 100% automatic!
  • Idea
  • Still use MartiRank since it works well with this
    data, but keep/re-use all models

30
Overview of the Talk
  • Introduction to the Electricity Distribution
    Network of New York City
  • What are we doing and why?
  • Early solution using MartiRank, a boosting-like
    algorithm for ranking
  • Current solution using Online learning
  • Overview of learning from expert advice and the
    Weighted Majority Algorithm
  • New challenges in our setting and our solution
  • Results
  • Related projects

31
Learning from expert advice
  • Consider each model as an expert
  • Each expert has associated weight (or score)
  • Reward/penalize experts with good/bad predictions
  • Weight is a measure of confidence in the expert's
    prediction
  • Predict using weighted average of top-scoring
    experts

32
Learning from expert advice
  • Advantages
  • Fully automatic
  • No human intervention needed
  • Adaptive
  • Changes in system are learned as it runs
  • Can use many types of underlying learning
    algorithms
  • Good performance guarantees from learning theory:
    performance is never too far off from the best
    expert in hindsight
  • Disadvantages
  • Computational cost: need to track many models in
    parallel
  • Models are harder to interpret

33
Weighted Majority Algorithm [Littlestone &
Warmuth '88]
  • Introduced for binary classification
  • Experts make predictions in {0,1}
  • Obtain losses in [0,1]
  • Pseudocode (a sketch in Python follows below):
  • Learning rate ß in (0,1] as the main parameter
  • There are N experts; initially the weight is 1 for
    all
  • For t = 1, 2, 3, ...
  • Predict using a weighted average of each expert's
    prediction
  • Obtain the true label; each expert i incurs loss li
  • Update the experts' weights using
    wi,t+1 = wi,t · ß^li

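A minimal Weighted Majority sketch matching the pseudocode above, for the binary-classification setting it was introduced in; the data layout (one prediction per expert per round) is an assumption for illustration.

    def weighted_majority(expert_predictions, true_labels, beta=0.5):
        """expert_predictions[i][t] in {0, 1}; true_labels[t] in {0, 1}; beta in (0, 1]."""
        n_experts = len(expert_predictions)
        weights = [1.0] * n_experts                      # initially 1 for every expert
        predictions = []
        for t, y in enumerate(true_labels):
            total = sum(weights)
            avg = sum(w * preds[t] for w, preds in zip(weights, expert_predictions)) / total
            predictions.append(1 if avg >= 0.5 else 0)   # weighted-average prediction
            for i, preds in enumerate(expert_predictions):
                loss = abs(preds[t] - y)                 # loss in [0, 1]
                weights[i] *= beta ** loss               # w_i <- w_i * beta^loss
        return predictions, weights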
34
In our case, we can't use WM directly
  • Use ranking as opposed to binary classification
  • More importantly, do not have a fixed set of
    experts

35
Dealing with ranking vs. binary classification
  • Ranking loss is the normalized average rank of
    failures, as seen before; the loss is in [0,1]
  • To combine rankings, use a weighted average of the
    feeders' ranks (see the sketch below)

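A small sketch of the combination step, assuming each expert outputs a full ranking of feeder ids (best first): a feeder's combined score is the weighted average of its rank across experts, and the final ranking sorts feeders by that score.

    def combine_rankings(rankings, weights):
        """rankings: list of feeder-id lists (best first); weights: one per expert."""
        total_weight = sum(weights)
        scores = {}
        for ranking, w in zip(rankings, weights):
            for pos, feeder in enumerate(ranking, start=1):
                scores[feeder] = scores.get(feeder, 0.0) + w * pos
        return sorted(scores, key=lambda f: scores[f] / total_weight)  # low avg rank first

    print(combine_rankings([["A", "B", "C"], ["B", "A", "C"]], [2.0, 1.0]))
    # ['A', 'B', 'C']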
36
Dealing with a moving set of experts
  • Introduce new parameters:
  • B: budget (max number of models), set to 100
  • p: the new model's weight percentile, in [0,100]
  • α: age penalty, in (0,1]
  • When training new models, add them to the set of
    models with weight corresponding to the p-th
    percentile (among current weights)
  • If there are too many models (more than B), drop
    the models with a poor q-score, where
  • qi = wi · α^agei
  • I.e., α is the rate of exponential decay (a sketch
    of the pool update follows below)

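A sketch of the pool update described above. The parameter names B and p follow the slide; the slide's symbol for the age penalty did not survive extraction and is written here as alpha; the dict-based pool layout is an illustrative assumption.

    def add_model(pool, model, p=50, B=100, alpha=0.99):
        """pool: list of dicts with keys 'model', 'weight', 'age' (age in training periods)."""
        if pool:
            current = sorted(m["weight"] for m in pool)
            idx = min(len(current) - 1, round(p / 100 * (len(current) - 1)))
            start_weight = current[idx]          # p-th percentile of current weights
        else:
            start_weight = 1.0
        pool.append({"model": model, "weight": start_weight, "age": 0})
        if len(pool) > B:                        # keep the B best age-decayed scores
            pool.sort(key=lambda m: m["weight"] * alpha ** m["age"], reverse=True)
            del pool[B:]
        return pool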
37
Other parameters
  • How often do we train and add new models?
  • Hand-tuned over the course of the summer
  • Every 7 days
  • Seems to achieve a balance: generating new models
    to adapt to changing conditions without
    overflowing the system
  • Alternatively, one could train when observed
    performance drops ... not used yet
  • How much data do we use to train models?
  • Based on observed performance and early
    experiments
  • 1 week's worth of data, and
  • 2 weeks' worth of data

38
Performance
39
Failures rank distribution
[figure: per-season distributions for Summer 2005, Autumn 2005, Winter 2005-06, Spring 2006, and Summer 2006]
40
Daily average rank of failures
[figure: daily values from Summer 2005 through Summer 2006, with seasons marked]
41
Other things that I have not talked about but
took a significant amount of time
  • DATA
  • Data is spread over many repositories.
  • Difficult to identify useful data
  • Difficult to arrange access to data
  • Volume of data.
  • Gigabytes of data accumulated on a daily basis.
  • Required optimized database layout and the
    addition of a preprocessing stage
  • Had to gain understanding of data semantics
  • Software Engineering (this is a deployed
    application)

42
Current Status
  • Summer 2006: the system has been debugged,
    fine-tuned, tested, and deployed
  • Now fully operational
  • Ready to be used next summer (in test mode)
  • After this summer, we're going to do systematic
    studies of:
  • Parameter sensitivity
  • Comparisons to other approaches

43
Related work-in-progress
  • Online learning
  • Fancier weight updates with better guaranteed
    performance in changing environments
  • Explore direct online ranking strategies (e.g.
    the ranking perceptron)
  • Datamining project
  • Aims to exploit seasonality
  • Learn a mapping from environmental conditions to
    the characteristics of well-performing experts
  • When same conditions arise in the future,
    increase weights of experts that have those
    characteristics
  • Hope to learn it as system runs, continually
    updating mappings
  • MartiRank
  • In the presence of repeated/missing values, sorting
    is non-deterministic and the pAUC takes different
    values depending on the permutation of the data
  • Use statistics of the pAUC to improve the basic
    learning algorithm
  • Instead of a fixed input number of rounds, stop
    when the pAUC increase is not significant
  • Use better estimators of pAUC that are not
    sensitive to permutations of the data

44
Other related projects within the collaboration with
Con Edison
  • Finer-grained component analysis
  • Ranking of transformers
  • Ranking of cable sections
  • Ranking of cable joints
  • Merging of all systems into one
  • Mixing ML and Survival Analysis

45
Acknowledgments
  • Columbia
  • CCLS
  • Wei Chu
  • Martin Jansche
  • Ansaf Salleb
  • Albert Boulanger
  • David Waltz
  • Philip M. Long (now at Google)
  • Roger Anderson
  • Computer Science
  • Philip Gross
  • Rocco Servedio
  • Gail Kaiser
  • Samit Jain
  • John Ioannidis
  • Sergey Sigelman
  • Luis Alonso
  • Joey Fortuna
  • Chris Murphy
  • Con Edison
  • Matthew Koenig
  • Mark Mastrocinque
  • William Fairechio
  • John A. Johnson
  • Serena Lee
  • Charles Lawson
  • Frank Doherty
  • Arthur Kressner
  • Matt Sniffen
  • Elie Chebli
  • George Murray
  • Bill McGarrigle
  • Van Nest team