Mastering Data Mining: Chapter 10 - PowerPoint PPT Presentation

Slides: 18
Provided by: manch1
Transcript and Presenter's Notes

1
Waste Not, Want Not: Improving Manufacturing
Processes
2004. 3. 23. 2000-12008 ???, 2004-20971 ???
2
Introduction
  • CRM is the killer application for data mining
  • Another fruitful application for data mining is
    cost reduction through industrial process
    improvement
  • The efficiency of many manufacturing processes
    depends on hundreds or thousands of variables
    whose interactions are not well understood
  • Data mining can reveal how these variables are
    tied together throughout the process

3
Data Mining to Reduce Cost at R.R.Donnelley
  • R. R. Donnelley and Sons
  • The largest printing company in the United States
  • Annual sales of around $5 billion
  • 49 manufacturing plants
  • One of these plants is the setting for this
    chapter's first case study
  • The company has three principal strategies for
    enhancing customer and shareholder value
  • The first strategy is to drive continuous cost
    reduction and productivity improvement by
    creating process improvements, optimizing
    manufacturing operations, and enhancing
    supply-chain management

4
The Problems of R.R.Donnelley
  • The technical problem
  • High-speed printing is done on rotogravure
    presses
  • There has been a problem called cylinder
    banding
  • The symptom of cylinder banding is a streak of
    ink running across the printed image, ruining the
    print job
  • The business problem
  • The business problem was learning how to avoid
    costly interruptions to print runs by isolating
    the conditions under which certain defects appear

5
The Data of R.R.Donnelley
  • Data mining depends on data
  • In order to build a decision tree capable of
    explaining anything, there must be a training set
    consisting of observations of relevant variables
    for both positive and negative outcomes
  • At the start, no such data existed
  • They needed to choose a data mining approach that
    would help explore the data and identify a
    handful of the most important variables
  • The team set out to collect data on humidity, ink
    temperature, ink viscosity, acidity, voltage
    level, blade pressure, paper type, etc.

6
The Data of R.R.Donnelley
  • Deciding on the right attributes
  • The final set of input variables included
  • ESA anode distance in millimeters, chrome
    solution ratio, ESA current density in amperes
    per square decimeter, plating tank, viscosity,
    etc.
  • Total 17 classes
  • Data preparation for continuous inputs
  • What was needed was a way to pick splits that
    captured the notion that there is one range that
    affects the outcome in one direction, a second
    range that affects the outcome in the opposite
    direction, and a neutral range separating the
    first two
  • Defining the target class
  • How long a run is long enough to be considered
    free of banding?
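
The three-range idea for continuous inputs described above can be sketched as a simple binning function. This is a minimal illustration, not the method used in the study: the function name `three_range_bin`, the cut points, and the sample readings are all hypothetical.

```python
def three_range_bin(value, low_cut, high_cut):
    """Map a continuous reading to one of three ranges.

    low_cut and high_cut are hypothetical thresholds chosen for
    illustration; they are not values from the Donnelley study.
    """
    if low_cut >= high_cut:
        raise ValueError("low_cut must be below high_cut")
    if value < low_cut:
        return "low"        # range pushing the outcome one way
    if value > high_cut:
        return "high"       # range pushing the outcome the other way
    return "neutral"        # band separating the two


# Invented readings for some continuous input, e.g. an ink temperature.
readings = [14.0, 16.5, 18.0, 21.5]
labels = [three_range_bin(r, low_cut=15.0, high_cut=20.0) for r in readings]
print(labels)
```

Turning each continuous input into three categories lets a tree algorithm that expects categorical attributes split on it directly.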

7
Inducing Rules for Cylinder Bands
  • The goals of this project were different from
    those of most data mining projects
  • In this case, there was no plan to use the
    decision trees as predictive models
  • Instead, the trees were used to generate a set of
    practical, prescriptive rules that could be
    applied on the shop floor
  • We start out with a set of heuristics
  • Example: higher values of ink temperature and
    lower values of humidity increase the likelihood
    of banding
  • Making splits with the ID3 decision tree
    algorithm
  • A set of operating guidelines to try on the shop
    floor
  • Keep the chrome solution ratio high
  • Keep the ink temperature low
  • Keep the ink viscosity high
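
ID3 chooses each split by information gain, the drop in entropy of the target after partitioning on an attribute. The sketch below shows that calculation on a tiny invented data set. The attribute names and records are hypothetical and are not taken from the Donnelley data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attribute], []).append(r[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


# Invented print runs: "band" records whether cylinder banding occurred.
runs = [
    {"ink_temp": "high", "humidity": "low", "band": "yes"},
    {"ink_temp": "high", "humidity": "high", "band": "yes"},
    {"ink_temp": "low", "humidity": "low", "band": "no"},
    {"ink_temp": "low", "humidity": "high", "band": "no"},
]

# ID3 splits on the attribute with the largest gain.
best = max(["ink_temp", "humidity"], key=lambda a: information_gain(runs, a, "band"))
print(best)
```

In this toy data ink temperature separates the outcomes perfectly and humidity not at all, so the first split falls on ink temperature.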

8
Change on the Shop Floor
  • Figure 14.3 shows the incidence of banding over
    time
  • It is not hard to pinpoint the time when the new
    guidelines came into effect!
  • It took some time for confidence in the new
    guidelines to spread through the print crews
  • Long-term impact
  • Before the data mining, 538 banding incidents
    caused more than 800 hours of downtime
  • After the data mining, there were only 21 such
    incidents, resulting in 30 hours of downtime
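
The size of the improvement can be worked out from the figures above. This is a rough calculation that assumes the before and after figures cover comparable periods, which the slide does not state. Because the slide says "more than 800 hours", 800 is treated as a lower bound.

```python
incidents_before, incidents_after = 538, 21
hours_before, hours_after = 800, 30  # 800 is a lower bound ("more than 800")

# Fractional reduction in incidents and in downtime hours.
incident_drop = 1 - incidents_after / incidents_before
hours_drop = 1 - hours_after / hours_before

print(f"incidents down by about {incident_drop:.0%}")
print(f"downtime down by at least {hours_drop:.0%}")
```

Both measures fall by roughly 96 percent under these assumptions.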

9
Business Problem of Time-Warner
  • Decreasing expenses
  • One way to control paper costs is to buy when the
    price is right
  • Another way to control paper costs is to use less
    paper
  • Relationship of Time Inc. to the Printing Plants
  • Printing plants have no direct incentive to be
    frugal with paper
  • There are contractual limits on the amount of
    waste
  • Performance Variation between plants
  • Reducing waste is valuable

10
The Data of Time-Warner
  • Invoice
  • Useful information for a shipment of paper
  • Stock Status
  • Information on paper in inventory including
    transit damage and in-plant damage to paper rolls
  • Transfer
  • Sometimes paper is transferred between printing
    plants
  • Usage
  • To track the magazine or department that gets
    charged for a press run

11
The Data of Time-Warner
  • Press Run
  • This was the most important table. It includes
    the press start time, the linear feet of paper
    used after makeready, the press stop time, etc.
  • Spoilage Detail
  • More information about waste, including
    calculated allowances for blanket-wash waste and
    bindery waste and the weight of paper printed
    after the correct number of pages has been
    reached.
  • Shipment
  • Information about how the paper got to the plant,
    from the carriers that transported it down to
    which door of a two-door boxcar it was unloaded
    from

12
Approach to the Problem of Time-Warner
  • Hypothesis Testing
  • Waste as a function of press type
  • Waste as a function of paper age
  • Waste as a function of basis weight
  • Waste as a function of time of day
  • Waste as a function of presses equipped with
    automatic blanket washers
  • Waste as a function of number of rolls in press
    run
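
Each hypothesis above amounts to comparing waste across the levels of one factor. A first look could be a grouped mean, sketched here. The field names (`press_type`, `auto_blanket_wash`, `waste_pct`) and the records are hypothetical, not taken from the Time Inc. tables.

```python
from collections import defaultdict
from statistics import mean


def mean_waste_by(runs, factor):
    """Average waste percentage for each level of the given factor."""
    groups = defaultdict(list)
    for run in runs:
        groups[run[factor]].append(run["waste_pct"])
    return {level: mean(values) for level, values in groups.items()}


# Invented press runs for illustration only.
press_runs = [
    {"press_type": "A", "auto_blanket_wash": True, "waste_pct": 4.0},
    {"press_type": "A", "auto_blanket_wash": False, "waste_pct": 6.0},
    {"press_type": "B", "auto_blanket_wash": True, "waste_pct": 5.0},
    {"press_type": "B", "auto_blanket_wash": False, "waste_pct": 9.0},
]

by_press = mean_waste_by(press_runs, "press_type")
by_wash = mean_waste_by(press_runs, "auto_blanket_wash")
print(by_press, by_wash)
```

A difference in group means is only a starting point. Deciding whether it is real would need a significance test and enough runs per group.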

13
Types of Waste
  • Types of Waste
  • Wrapper waste
  • Strip waste
  • Make-ready waste
  • Running waste
  • Core waste
  • Bindery waste
  • Trim Waste
  • Overrun
  • Addressable Waste
  • This is avoidable waste
  • Inducing rules for addressable waste

14
Data Transformation
  • Transformation on the raw data
  • Changing the format
  • Changing the type
  • Convenience Fields
  • Several convenience fields were added to the data
  • Two types of derived fields
  • Continuous variables → categorical variables
  • Creating derived fields that contain information
    from two or more other fields
  • Classification Target
  • Data Characterization and Profiling
  • Statviz produces a small graph for each variable
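
The kinds of derived fields listed above might look like the following sketch: one field combining two dates, one combining two weights, and a categorical target cut from a continuous value. All field names and the 5 percent threshold are assumptions made for illustration.

```python
from datetime import date


def add_derived_fields(run, high_waste_pct=5.0):
    """Return a copy of the record with derived fields added.

    high_waste_pct is a hypothetical cutoff for the classification target.
    """
    derived = dict(run)
    # Field built from two other fields: age of the paper at press time.
    derived["paper_age_days"] = (run["press_date"] - run["mill_date"]).days
    # Another combined field: waste as a percentage of paper used.
    derived["running_waste_pct"] = 100.0 * run["waste_lbs"] / run["paper_lbs"]
    # Continuous value turned into a categorical target.
    derived["high_waste"] = derived["running_waste_pct"] > high_waste_pct
    return derived


# Invented press-run record.
run = {
    "press_date": date(2001, 3, 1),
    "mill_date": date(2001, 1, 30),
    "waste_lbs": 60.0,
    "paper_lbs": 1000.0,
}
print(add_derived_fields(run))
```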

15
Decision Trees
  • Classification versus Explanation
  • The goal is not classification
  • What are the most important factors affecting
    waste?
  • Extracting Rules for Addressable Waste
  • Increase the minimum number of records allowed
    in a node
  • Look only at nodes classified as high waste,
    ignoring those that describe more usual runs
  • Other rules
  • Association Rules
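
The rule-extraction steps above can be sketched as a walk over a decision tree that keeps only high-waste leaves with enough records. The nested-dict tree format, the test strings, and the counts are invented for this example. Here the minimum record count is applied as a filter after the tree is built, whereas the slide describes it as a setting used while growing the tree.

```python
def extract_rules(node, min_records, wanted_label, path=()):
    """Collect (conditions, record_count) for qualifying leaves."""
    if "label" in node:  # leaf node
        if node["label"] == wanted_label and node["count"] >= min_records:
            return [(list(path), node["count"])]
        return []
    rules = []
    rules += extract_rules(node["yes"], min_records, wanted_label,
                           path + (node["test"],))
    rules += extract_rules(node["no"], min_records, wanted_label,
                           path + ("not " + node["test"],))
    return rules


# Hypothetical tree: internal nodes hold a test, leaves hold a label and count.
tree = {
    "test": "paper_age = old",
    "yes": {
        "test": "mills > 1",
        "yes": {"label": "high", "count": 40},
        "no": {"label": "high", "count": 3},
    },
    "no": {"label": "normal", "count": 200},
}

rules = extract_rules(tree, min_records=10, wanted_label="high")
for conditions, count in rules:
    print(" AND ".join(conditions), "-> high waste", f"({count} runs)")
```

The three-record leaf is dropped as too small to trust, and the large "normal" leaf is ignored because it describes usual runs.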

16
Putting It All Together
  • Actionable information that they could
  • There is a correlation between paper age and
    running waste
  • Print runs using paper from multiple mills had
    slightly higher running waste percentages
  • A fifth color in addition to the usual four,
    something that happens fairly often with the
    cover of Time magazine, leads to increased
    running waste

17
Lessons Learned
  • The projects used data mining to find
    prescriptive rules that could be used to improve
    the production process
  • The projects succeeded due to the constant
    involvement of subject matter experts who
    understood paper and printing inside out and were
    willing to provide guidance to the data miners
  • Implementation of the new policies suggested by
    data mining requires the active cooperation of
    the people on the plant floor
  • Data mining does not always require huge volumes
    of data; the Donnelley study used only a few
    hundred records
  • Where huge volumes of data are available, as at
    Time Inc., data mining can help make sense of it.