CHAPTER14: INTRODUCTION TO DATA ANALYSIS - PowerPoint PPT Presentation

Description:

Title: CHAPTER 6: LINEAR PROGRAMMING Author: chen_zhimin Last modified by: user Created Date: 10/20/2004 1:00:08 PM Document presentation format – PowerPoint PPT presentation

Slides: 52
Provided by: chen206
1
CHAPTER 14: INTRODUCTION TO DATA ANALYSIS
2
14.1 INTRODUCTION
  • There are many situations in business where data
    is collected and analysed.
  • The key ideas of data analysis are important in
    the modern business environment.
  • Data analysis aims to summarise and understand
    the main features of the variables contained
    within the data, and to investigate the nature
    of any linkages that may exist between the
    variables.

3
14.2 WHAT IS DATA?
  • Example 1
  • Population: the set of all people/objects of
    interest in the study being undertaken.
  • Very large
  • Enumerated precisely
  • Cannot be enumerated physically

Population member
4
  • The information for each member of the
    population
  • Age
  • Gender
  • Parish
  • Will you vote in the by-election?
  • Will you vote for me?
  • Variable: one piece of information
  • Five variables

5
  • To investigate the connection between the
    following pairs of variables:
  • 'Will you vote for me' and 'Age'
  • 'Will you vote for me' and 'Gender'
  • 'Will you vote for me' and 'Parish'
  • Population data is used → the outcomes of the
    analysis are precise → 'perfect information'
    results.

6
  • Example 2
  • Population: the set of all customers

7
  • A sensible initial set of questions is:
  • Do you understand exactly what each variable is
    measuring/recording?
  • Do you understand the problem under investigation,
    and are the objectives of the investigation
    clear?

8
14.3 DESCRIBING VARIABLES
  • Classification of variable types
  • Attribute variables
  • Measured variables

9
  • Attribute Variables
  • An attribute variable has its outcomes described
    in terms of its characteristics or attributes.
  • Example 1 'By-Election Data'

10
  • Example 2 'Credit Data'
  • Does the customer own their own house?
  • 0 = Yes, 1 = No
  • The Region in which the customer is resident?
  • 1 = South West
  • 2 = South East
  • 3 = London
  • 4 = Midlands
  • 5 = North
  • The usual way of handling attribute data is to
    give each attribute a numerical code: 0, 1, 2, ...
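The coding step can be sketched in a few lines of Python. This is only an illustration: the code table follows the Region coding on the slide, while the function name `encode` and the sample responses are made up for the example.

```python
# Code table following the Region coding shown on the slide.
REGION_CODES = {
    "South West": 1,
    "South East": 2,
    "London": 3,
    "Midlands": 4,
    "North": 5,
}


def encode(values, codes):
    """Replace each attribute label with its numerical code."""
    return [codes[v] for v in values]


# Made-up responses, for illustration only.
regions = ["London", "North", "South West", "London"]
coded = encode(regions, REGION_CODES)
print(coded)  # [3, 5, 1, 3]
```

The codes are labels only: arithmetic on them (for example, an "average region") has no meaning.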

11
  • Measured Variable
  • A measured variable is a variable that has its
    outcomes measured; the resulting outcome is
    expressed in numerical terms.
  • Two types of measured variables
  • Continuous variable: measured on a continuous
    scale (e.g. a person's weight)
  • Discrete variable: takes only distinct values,
    typically counts (e.g. the number of passengers
    on a flight)
  • Example 1 'By-Election Data'
  • The measured variable in this data set is 'Age'

12
  • Example 2 'Credit Data'
  • Measured variables as follows


13
14.4 THE CONCEPT OF A STATISTICAL DISTRIBUTION
  • Attribute Variable
  • Gender of constituents (Example 1)

DISTRIBUTION OF GENDER IN THE CONSTITUENCY
14
  • REGION (Example 2)


DISTRIBUTION OF REGION IN WHICH CUSTOMER IS
RESIDENT
15
  • Measured Variable
  • Customer's Age (Example 2)


DISTRIBUTION OF AGE OF CUSTOMER
16
  • Household Income (Example 2)

DISTRIBUTION OF HOUSEHOLD INCOME
17
  • What does the distribution show?
  • The area under the curve from one income value to
    another measures the relative proportion of the
    population having household incomes in that
    range.
  • Household incomes lower than 10,000 are
    relatively rare
  • A large proportion of the population have
    household incomes between 20,000 and 50,000
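For a finite data set, the "area under the curve" between two values corresponds to the proportion of data values falling in that range. A minimal sketch, using made-up income figures (not the Credit Data) and a hypothetical helper `proportion_in_range`:

```python
def proportion_in_range(values, low, high):
    """Proportion of values v with low <= v < high."""
    inside = [v for v in values if low <= v < high]
    return len(inside) / len(values)


# Made-up household incomes, for illustration only.
incomes = [8000, 21000, 24000, 30000, 35000,
           42000, 48000, 55000, 70000, 120000]

print(proportion_in_range(incomes, 0, 10000))      # 0.1
print(proportion_in_range(incomes, 20000, 50000))  # 0.6
```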

18
  • The Descriptive Statistics for Distribution of a
    Measured Variable
  • Distribution of the height of adults in Great
    Britain.

19
  • The height of children under 11 years of age

children's heights
adult's heights


20
  • Heights in two different countries, country A
    and country B

DISTRIBUTION OF HEIGHTS: COUNTRY A AND COUNTRY B
21
  • A statistical distribution for a measured
    variable can be summarised using three key
    descriptions
  • Centre of the distribution
  • Width of the distribution
  • Symmetry of the distribution

22
  • Measuring the Centre of a Distribution
  • The Mean
  • Average value = $\sum X / n$
  • Average Household Income
  • Symbol for the population mean: $\mu$
  • Formally the population mean of a variable is
    defined to be
  • $\mu = \sum X / n$
  • The Median
  • The median value of the variable is defined to be
    the particular value of the variable such that
    half the data values are less than the median
    value and half are greater.
  • When all the data values are sorted in ascending
    order, the median is the middle value in the
    list (or the average of the two middle values
    when the number of values is even).
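The two measures of centre can be computed directly from their definitions. The sketch below uses made-up income figures; the function names are chosen for the example.

```python
def mean(xs):
    """Sum of the values divided by the number of values."""
    return sum(xs) / len(xs)


def median(xs):
    """Middle value of the sorted data (average of the two middle
    values when the number of values is even)."""
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


# Made-up household incomes, for illustration only.
incomes = [18000, 22000, 25000, 31000, 94000]
print(mean(incomes))    # 38000.0
print(median(incomes))  # 25000
```

Here a single large income pulls the mean well above the median, which is why the two measures are usually reported together.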

23
  • Measuring the Width of a Distribution
  • The Standard Deviation
  • The Standard Deviation is the square root of the
    average squared deviation from the mean.
  • Symbol for the Standard Deviation: $\sigma$
  • $\sigma$ is usually defined in terms of the
    variance $\sigma^2$, as
  • $\sigma^2 = \sum (X - \mu)^2 / n$
  • Standard deviation is the square root of the
    variance
  • Calculating the standard deviation for the
    variable Household Income
  • Standard deviation is a relative measure of
    spread (width): the larger the standard
    deviation, the wider the distribution.
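A short sketch of the population variance and standard deviation as defined above (divisor n), applied to two made-up data sets with the same mean but different widths:

```python
import math


def population_variance(xs):
    """Average squared deviation from the mean (divisor n)."""
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)


def population_std(xs):
    """Square root of the population variance."""
    return math.sqrt(population_variance(xs))


# Made-up data: both sets have mean 50, the second is more spread out.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

print(population_variance(narrow))  # 2.0
print(population_variance(wide))    # 200.0
print(population_std(narrow) < population_std(wide))  # True
```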

24
  • Inter-quartile Range
  • The inter-quartile range is the range over which
    the middle 50% of the data values vary
  • To define the quartiles
  • Q1: the value of the variable that divides the
    distribution 25% to the left and 75% to the
    right.
  • Q2: the value of the variable that divides the
    distribution 50% to the left and 50% to the
    right.
  • Q3: the value of the variable that divides the
    distribution 75% to the left and 25% to the
    right.
  • The inter-quartile range is the value Q3 - Q1
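The quartiles and the inter-quartile range can be sketched with Python's standard `statistics` module. Several conventions exist for locating quartiles in a finite data set; the `method="inclusive"` option used here is an assumption for this example, and other packages may give slightly different values. The data are made up.

```python
import statistics

# Made-up data, for illustration only.
data = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# n=4 asks for the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 30.0 50.0 70.0
print(iqr)         # 40.0
```

Q2 is the median, so the middle 50% of these values lie between 30 and 70.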


25
  • Calculating the Q1, Q2, Q3 for the variable
    'Household Income'
  • Conventionally the mean and standard deviation
    are used as one pair of measures of location and
    spread, and the median and inter-quartile range
    as another pair.

26
  • Measuring the Symmetry (skewness) of a
    Distribution
  • Pearson's coefficient of Skewness
  • Pearson's coefficient of Skewness = 3(mean -
    median)/standard deviation
  • Quartile Measure of Skewness
  • Quartile Measure of Skewness = ((Q3 - Q2) -
    (Q2 - Q1))/(Q3 - Q1)
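Both skewness measures can be sketched with the standard `statistics` module. The quartile measure is taken here in its usual form, ((Q3 - Q2) - (Q2 - Q1))/(Q3 - Q1); the quartile convention (`method="inclusive"`), the function names and the data are assumptions made for this example.

```python
import statistics


def pearson_skewness(xs):
    """3 * (mean - median) / population standard deviation."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.pstdev(xs)


def quartile_skewness(xs):
    """((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1)."""
    q1, q2, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


# Made-up data: one symmetric set and one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 3, 4, 5, 7, 10, 15, 30]

print(pearson_skewness(symmetric))      # 0.0
print(quartile_skewness(symmetric))     # 0.0
print(pearson_skewness(right_skewed) > 0)   # True
print(quartile_skewness(right_skewed) > 0)  # True
```

Both measures are zero for the symmetric data and positive for the data with the long right tail.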

27
14.5 SUMMARY
  • What is Data
  • Variables
  • Two types of variable
  • an attribute variable
  • a measured variable
  • The concept of a Statistical Distribution
  • As applied to an attribute variable
  • As applied to a measured variable

28
  • Descriptive Statistics for a measured variable
  • Measures of Centre
  • Mean
  • Median
  • Measures of Width
  • Standard Deviation
  • Inter-Quartile Range
  • Measures of Symmetry (Skewness)
  • Pearson's coefficient of Skewness
  • Quartile Measure of Skewness

29
14.6 THE NATURE OF A SAMPLE
  • POPULATION
  • Perfect Information
  • In practice it is often impossible to enumerate
    the whole population.
  • A sample is drawn from the population and used
    to make judgements (inferences) about the
    population.

30
  • SAMPLE
  • Imperfect Information
  • Random sample
  • Each item in the population has an equal chance
    of being included in the sample.
  • The KEY PROBLEM is to use the sample data to
    draw valid conclusions about the population,
    while recognising and taking into account the
    'error due to sampling'
  • Unrepresentative sample
  • How to Lie with Statistics
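A simple random sample, and the resulting "error due to sampling", can be sketched as follows. The population of 1,000 numbered items, the sample size of 50 and the fixed seed are all assumptions made so that the example is repeatable.

```python
import random

# Hypothetical population: items numbered 1 to 1000.
population = list(range(1, 1001))

# A fixed seed makes the example repeatable; a real study would not fix it.
rng = random.Random(42)

# Every item has an equal chance of being included; no item is drawn twice.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)  # 500.5
sample_mean = sum(sample) / len(sample)
sampling_error = sample_mean - population_mean

print(population_mean)
print(sample_mean, sampling_error)
```

The sample mean will generally differ from the population mean; that difference is the sampling error the analyst has to allow for.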

31
  • A Credit Scenario
  • Population: the set of all customers who used the
    credit facilities between 1st January 2000 and
    31st December 2001.
  • Sample size: 654 customers
  • Data file: BDMCREDIT.MTW

32
14.8 DESCRIBING SAMPLE DATA
  • Attribute variable: the number of occurrences of
    each attribute is obtained
  • Measured variable: sample descriptive statistics
    describing the centre, width and symmetry of the
    distribution are calculated.

33
  • Attribute Data
  • C5: Does the customer own their own house? Coded
    0 = Yes, 1 = No
  • C6: The Region in which the customer is
    resident? Coded
  • 1 = South West
  • 2 = South East
  • 3 = London
  • 4 = Midlands
  • 5 = North
  • Command: STAT-TABLE-TALLY

34

35
  • Summary Statistics for Discrete Variables
  • Counts (OWN-OCC)
  • Percent(OWN-OCC)
  • Distribution graph(OWN-OCC)

Do you Own your own house?
36
  • Summary Statistics for Discrete Variables
  • Count (REGION)
  • The information in written form:
  • 74 or 11.31% of the respondents are from the
    South West
  • 132 or 20.18% of the respondents are from the
    South East
  • 165 or 25.23% of the respondents are from the
    London area
  • 161 or 24.62% of the respondents are from the
    Midlands
  • 122 or 18.65% of the respondents are from the
    North
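The percentages quoted above follow directly from the counts. A minimal sketch, using the REGION counts from the slide (the variable names are chosen for the example):

```python
# REGION counts as reported on the slide.
counts = {
    "South West": 74,
    "South East": 132,
    "London": 165,
    "Midlands": 161,
    "North": 122,
}

total = sum(counts.values())  # sample size

# Percentage of respondents in each region, to two decimal places.
percent = {region: round(100 * c / total, 2) for region, c in counts.items()}

print(total)  # 654
for region, p in percent.items():
    print(f"{region}: {counts[region]} ({p}%)")
```

The counts add up to the sample size of 654 customers.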


37
  • Measured Variables
  • For the 'Credit Data':
  • C2: Customer's Age (AGE)
  • C3: Household Income (per annum) (SALARY)
  • C4 Estimated monthly outgoing on
    mortgage/rent/rates/utilities/credit card
    payments etc. (PAYOUT)
  • C7 The Amount borrowed on credit (CREDIT)
  • HISTOGRAM


38
  • BOXPLOT
  • The BOXPLOT will prove to be a more useful way of
    picturing a sample distribution when later
    chapters examine the connection between two
    sample variables.



39
14.7 DATA ANALYSIS USING SAMPLE DATA
  • Before attempting to analyse any data, the
    analyst should make sure that:
  • The problem under investigation is clearly
    understood and the objectives of the
    investigation have been clearly specified. Keep
    asking questions until satisfactory answers have
    been obtained.
  • The individual variables making up the dataset
    are clearly understood.

40

41
  • Descriptive Statistics
  • Measures of Centre
  • Mean
  • Sample Mean
  • Median
  • Measures of Width
  • Standard Deviation
  • Sample Standard Deviation: $S$
  • Sample Variance: $S^2$
  • Inter-Quartile Range: IQR
  • Symmetry
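The sample versions of these statistics can be sketched with the standard `statistics` module. One assumption to flag: the slides do not give the formula for S, and this example uses the common convention of dividing by n - 1 for a sample (`statistics.variance` and `statistics.stdev`), in contrast to the divisor n used for a population (`statistics.pvariance` and `statistics.pstdev`). The data are made up.

```python
import statistics

# Made-up sample, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = statistics.mean(sample)
sample_median = statistics.median(sample)

# Sample variance and standard deviation: divisor n - 1.
s_squared = statistics.variance(sample)
s = statistics.stdev(sample)

# Population versions for comparison: divisor n.
sigma_squared = statistics.pvariance(sample)

print(sample_mean, sample_median)  # 5 4.5
print(s_squared, s)
print(sigma_squared)               # 4
```

With the n - 1 divisor the sample variance is slightly larger than the divisor-n value computed from the same data.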

42
  • Symmetry (Skewness)
  • A distribution is skewed if one tail extends
    farther than the other.
  • A value close to 0 indicates symmetric data.
  • Negative values indicate negative/left skew.
  • Positive values indicate positive/right skew.
  • Example of a negative or left-skewed distribution
    (skewness -1.44096)

43
44
  • The Relationship between the descriptive
    statistics and the Boxplot
  • The asterisks to the right of the median
    indicate sample values that are, in some sense,
    extreme
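One common way of deciding which values are "in some sense extreme" is to flag anything more than 1.5 times the inter-quartile range beyond the quartiles. The slides do not state the rule behind the asterisks, so the 1.5 multiplier, the quartile convention and the function name below are assumptions for illustration; the data are made up.

```python
import statistics


def extreme_values(xs, k=1.5):
    """Values lying more than k * IQR below Q1 or above Q3.

    The multiplier k = 1.5 is a common convention, assumed here.
    """
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper]


# Made-up data with one unusually large value.
data = [10, 20, 30, 40, 50, 60, 70, 80, 200]
print(extreme_values(data))  # [200]
```

Here Q1 = 30 and Q3 = 70, so the upper limit is 70 + 1.5 × 40 = 130 and only the value 200 is flagged.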

45
14.9 INVESTIGATING RELATIONSHIPS BETWEEN
VARIABLES
  • To investigate the relationship between
    variables.
  • Response variable
  • a variable that measures either directly or
    indirectly the objectives of the analysis
  • Explanatory variable
  • a variable that may influence the response
    variable

46
  • Example 1
  • A university wishes to investigate the salary of
    its graduates five years after graduating
  • The questionnaire
  • 'Current Salary'
  • 'Starting Salary'
  • 'Class of Degree' Coded 1 = First, 2 = Upper
    Second, 3 = Lower Second, 4 = Third, 5 = Pass.
  • 'Graduate's Gender' Coded 1 = Male, 2 = Female.
  • Response variable
  • Current Salary (measured variable)

47
  • Explanatory Variable
  • Starting Salary (measured variable)
  • Class of Degree (attribute variable)
  • Graduate's Gender (attribute variable)

48
  • Example 2 CREDIT scenario
  • Objectives of the analysis
  • To investigate the nature of credit transactions
  • The variable 'The Amount borrowed on credit'
  • The problem is to investigate the relationship
    between 'The Amount borrowed on credit' and the
    other variables.
  • Summary

49
  • Combinations of Response Variable and
    Explanatory Variable

EXPLANATORY VARIABLE
50
  • The method for investigating the connection
    between a response variable and an explanatory
    variable depends on the type of each variable.
  • Investigating the connection between a measured
    response and a measured explanatory variable
  • Investigating the connection between a measured
    response and an attribute explanatory variable

51
Homework
  • Find or collect some data from your life or
    business practice, and answer the following
    questions:
  • Draw the statistical distribution of the data
  • Calculate the Mean and Standard Deviation
  • Calculate the Median and Inter-Quartile Range
  • Calculate Pearson's Coefficient of Skewness
    and the Quartile Measure of Skewness