Data Set Balancing

About This Presentation

Title:

Data Set Balancing

Slides: 17

Provided by: CBA478

Transcript and Presenter's Notes

Title: Data Set Balancing

1
Data Set Balancing

David L. Olson
Department of Management
University of Nebraska

2
Skewed Data Sets

Many interesting applications involve data with many cases in one category and few in another
Insurance claims: binary (fraudulent or not)
Cancer cases
Loan defaults: binary or other
Poor-performing employees: binary or other
Skewed data sets cause modeling problems
Can cause model degeneracy
e.g., the model calls all claims non-fraudulent

3
Test Domain

Models
Decision tree
Regression
Neural network
Data
Categorical or Continuous
Binary or Four-outcome

4
Data Sets

All generated for pedagogical purposes
Loan Application Data
650 observations (400 training, 250 test)
Binary (0 = not on time, 1 = on time)
Proportion late or default: 0.1125
Insurance Fraud Data
5000 observations (4000 training, 1000 test)
Binary (OK, Fraudulent)
Proportion fraudulent: 0.0150
Job Application Data
500 observations (250 training, 250 test)
Four outcomes (unacceptable, minimal, adequate, excellent)
Proportion excellent: 0.028

5
Loan Application Data

6
Insurance Fraud Data

7
Job Application Data

8
Experiments

High degree of imbalance in each data set
Tested both categorical and continuous data
Categorical
Decision tree: See5
Logistic regression: Clementine
Neural network: Clementine
Continuous
Regression tree: See5
Discriminant analysis: Clementine
Neural network: Clementine

9
Procedure

Full model run
Training set reduced
Deleted cases from the most common outcome
Correct classification rate
Correct / total
Also identified type of error (coincidence matrix)
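
The procedure above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the helper names (`undersample`, `coincidence_matrix`, `correct_classification_rate`), the random choice of which majority cases to delete, and the toy data are all assumptions.

```python
import random
from collections import Counter


def undersample(rows, label_of, keep_majority, seed=0):
    """Delete cases from the most common outcome until keep_majority remain.

    rows: list of records; label_of: function returning a record's outcome.
    Which majority cases are deleted is chosen at random (an assumption).
    """
    counts = Counter(label_of(r) for r in rows)
    majority = counts.most_common(1)[0][0]
    majority_rows = [r for r in rows if label_of(r) == majority]
    other_rows = [r for r in rows if label_of(r) != majority]
    rng = random.Random(seed)
    kept = rng.sample(majority_rows, min(keep_majority, len(majority_rows)))
    return other_rows + kept


def coincidence_matrix(actual, predicted):
    """Count each (actual, predicted) pair; off-diagonal pairs are the errors."""
    return Counter(zip(actual, predicted))


def correct_classification_rate(actual, predicted):
    """Correct / total."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Toy training set: 95 "OK" claims and 5 "Fraud" claims (made-up data).
training = [("OK", i) for i in range(95)] + [("Fraud", i) for i in range(5)]
reduced = undersample(training, lambda r: r[0], keep_majority=20)
print(Counter(label for label, _ in reduced))

# Toy test results (made-up data).
actual = ["OK", "OK", "Fraud", "OK"]
predicted = ["OK", "Fraud", "Fraud", "OK"]
print(coincidence_matrix(actual, predicted))
print(correct_classification_rate(actual, predicted))
```

The coincidence matrix separates the two kinds of error (an OK claim called fraudulent versus a fraudulent claim called OK), which the single correct classification rate hides.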

10
Loan Application Data Set

11
Insurance Fraud Data Set

12
Job Application Data Set

13
Degeneracy

Model classifies all samples into the dominant category
The greater the data set skew, the greater the correct classification rate
BUT THE MODEL DOESN'T HELP
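
A small worked example makes the point concrete. It assumes, for illustration only, that the 0.0150 fraud proportion quoted for the insurance data also holds in the 1000-case test set (15 fraudulent claims); the slides do not state the test-set breakdown, and `degenerate_report` is a hypothetical helper.

```python
def degenerate_report(n_cases, n_rare):
    """Score a degenerate model that assigns every case to the dominant category.

    Returns (correct classification rate, number of rare cases identified).
    """
    actual = ["rare"] * n_rare + ["common"] * (n_cases - n_rare)
    predicted = ["common"] * n_cases  # the degenerate model's output
    correct = sum(a == p for a, p in zip(actual, predicted))
    rare_found = sum(a == "rare" and p == "rare" for a, p in zip(actual, predicted))
    return correct / n_cases, rare_found


# Assumed test set: 1000 claims, 15 of them fraudulent.
rate, frauds_found = degenerate_report(1000, 15)
print(f"correct classification rate: {rate:.3f}")  # 0.985
print(f"fraudulent claims identified: {frauds_found}")  # 0
```

Under that assumption the degenerate model is right 98.5% of the time and still finds none of the fraudulent claims, which is why the coincidence matrix matters more than the overall rate.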

14
Comparison

15
Advanced Solutions

BAGGING
Combine several classifiers by majority vote
BOOSTING
Sequentially learn several classifiers
Each classifier focuses on data poorly classified by the previous classifier
Combine by weighted vote
STACKING
Combine outputs of multiple classifiers obtained by different learning algorithms
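
The two combining rules named above (majority vote for bagging, weighted vote for boosting) can be sketched as follows. This shows only the voting step, not how the individual classifiers are trained; the function names and the example weights are assumptions for illustration, and stacking is not sketched here.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Bagging-style combination: the label predicted most often wins.

    Ties go to the label that appears first in predictions.
    """
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Boosting-style combination: each classifier's vote counts by its weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Three classifiers score the same claim (made-up outputs and weights).
votes = ["OK", "Fraud", "OK"]
print(majority_vote(votes))                   # OK
print(weighted_vote(votes, [0.2, 0.9, 0.3]))  # Fraud
```

In the example the single heavily weighted classifier overrides the other two, which is how a weighted vote can keep a rare outcome from being outvoted.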

16
Conclusions

If data are highly unbalanced
Algorithms tend to degenerate
If data are balanced
Training set size is reduced
Can lead to degeneracy by eliminating rare cases
Accuracy rates tend to decline
Decision tree algorithms are the most robust
