CHAPTER 14 INTRODUCTION TO DATA ANALYSIS

14.1 INTRODUCTION

- There are many situations in business where data is collected and analysed.
- The key ideas of data analysis are important in the modern business environment.
- Data analysis means summarising and understanding the main features of the variables contained within the data, and investigating the nature of any linkages that may exist between the variables.

14.2 WHAT IS DATA

- Example 1
- Population: the set of all people/objects of interest in the study being undertaken.
- Very large
- Enumerated precisely
- Cannot be enumerated physically

Population member

- The information for each member of the population:
- Age
- Gender
- Parish
- Will you vote in the by-election?
- Will you vote for me?
- Variable: one piece of information
- Five variables

- To investigate the connection between pairs of variables:
- 'Will you vote for me' and 'Age'
- 'Will you vote for me' and 'Gender'
- 'Will you vote for me' and 'Parish'
- When population data is used, the outcomes of the analysis are precise: the analysis gives 'perfect information' results.

- Example 2

- Population: the set of all customers
- A sensible initial set of questions is:
- Do you understand exactly what each variable is measuring/recording?
- Do you understand the problem under investigation, and are the objectives of the investigation clear?

14.3 DESCRIBING VARIABLES

- Classification of variable types
- Attribute variables
- Measured variables

- Attribute Variables
- An attribute variable has its outcomes described in terms of its characteristics or attributes.
- Example 1 'By-Election Data'
- Example 2 'Credit Data'
- Does the customer own their own house?
- 0 = Yes, 1 = No
- The Region in which the customer is resident?
- 1 = South West
- 2 = South East
- 3 = London
- 4 = Midlands
- 5 = North
- A common way of handling attribute data is to give each attribute a numerical code: 0, 1, 2, ...
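The coding scheme above can be sketched in Python as a pair of lookup tables. This is only an illustration: the dictionary and function names are hypothetical, and the five coded responses are made up rather than taken from the 'Credit Data' file.

```python
# Hypothetical code books for the two attribute variables in the 'Credit Data'
# example. The names are illustrative; only the code-to-label pairs come from
# the slides.
OWN_HOUSE_CODES = {0: "Yes", 1: "No"}
REGION_CODES = {
    1: "South West",
    2: "South East",
    3: "London",
    4: "Midlands",
    5: "North",
}


def decode(codes, code_book):
    """Translate a list of numerical codes back into attribute labels."""
    return [code_book[c] for c in codes]


# Made-up coded responses for five customers (not real data).
region_column = [3, 1, 5, 3, 2]
decoded_regions = decode(region_column, REGION_CODES)
print(decoded_regions)
```

The codes are labels only: although they are stored as numbers, arithmetic on them (for example, an "average region") has no meaning.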

- Measured Variable
- A measured variable is a variable that has its outcomes measured; the resulting outcome is expressed in numerical terms.
- Two types of measured variables
- Continuous variable: measured on a continuous scale (e.g. a person's weight)
- Discrete variable: takes only distinct values (e.g. the number of passengers on a flight)
- Example 1 'By-Election Data'
- The measured variable in this data set is 'Age'

- Example 2 'Credit Data'
- Measured variables as follows

14.4 THE CONCEPT OF A STATISTICAL DISTRIBUTION

- Attribute Variable
- Gender of constituents (Example 1)

DISTRIBUTION OF GENDER IN THE CONSTITUENCY

- REGION (Example 2)

DISTRIBUTION OF REGION IN WHICH CUSTOMER IS RESIDENT

- Measured Variable
- Customer's Age (Example 2)

DISTRIBUTION OF AGE OF CUSTOMER

- Household Income (Example 2)

DISTRIBUTION OF HOUSEHOLD INCOME

- What does the distribution show?
- The area under the curve from one income value to another measures the relative proportion of the population having household incomes in that range.
- Household incomes lower than 10,000 are relatively rare
- A large proportion of the population have household incomes between 20,000 and 50,000
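For a finite data set, the counterpart of "area under the curve between two values" is simply the fraction of data values that fall in that range. The sketch below illustrates this with ten made-up household incomes; the figures and the function name are assumptions for illustration and are not taken from the slides' data.

```python
def proportion_in_range(values, low, high):
    """Fraction of values v satisfying low <= v <= high."""
    inside = sum(1 for v in values if low <= v <= high)
    return inside / len(values)


# Ten made-up household incomes (illustrative only).
incomes = [8_000, 15_000, 22_000, 27_000, 31_000,
           36_000, 42_000, 48_000, 55_000, 70_000]

share_20_to_50 = proportion_in_range(incomes, 20_000, 50_000)
share_below_10 = proportion_in_range(incomes, 0, 9_999)
print(share_20_to_50, share_below_10)
```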

- The Descriptive Statistics for the Distribution of a Measured Variable
- Distribution of the height of adults in Great Britain.

- The height of children under 11 years of age

children's heights

adults' heights

- Heights in two different countries, country A and country B

DISTRIBUTION OF HEIGHTS: COUNTRIES A AND B

- A statistical distribution for a measured variable can be summarised using three key descriptions:
- Centre of the distribution
- Width of the distribution
- Symmetry of the distribution

- Measuring the Centre of a Distribution
- The Mean
- average value = $\sum X / n$
- Average Household Income
- symbol for the population mean: $\mu$
- Formally, the population mean of a variable is defined to be
- $\mu = \sum X / n$
- The Median
- The median value of the variable is defined to be the particular value of the variable such that half the data values are less than the median value and half are greater.
- After sorting all the data in ascending order, the median value is the middle value in the list
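As a minimal sketch of the two measures of centre, the Python below computes the mean from its definition and the median with the standard library. The five income values are made up for illustration.

```python
import statistics

# Five made-up household incomes; one large value pulls the mean upwards.
incomes = [18_000, 22_000, 25_000, 31_000, 90_000]

# Mean: the sum of the values divided by the number of values.
mean_income = sum(incomes) / len(incomes)

# Median: the middle value once the data are sorted in ascending order.
median_income = statistics.median(incomes)

print(mean_income, median_income)
```

With an even number of values there is no single middle value; `statistics.median` then returns the average of the two middle values.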

- Measuring the Width of a Distribution
- The Standard Deviation
- The Standard Deviation is the square root of the average squared deviation from the mean.
- Symbol for the Standard Deviation: $\sigma$
- $\sigma$ is usually defined in terms of the variance $\sigma^2$ as
- $\sigma^2 = \sum (X - \mu)^2 / n$
- Standard deviation is the square root of the variance
- Calculating the standard deviation for the variable Household Income
- Standard deviation is a relative measure of spread (width): the larger the standard deviation, the wider the distribution.
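The definition above (average squared deviation from the mean, then square root) can be sketched directly in Python. The two small data sets are made up; they share the same mean but differ in width.

```python
import math
import statistics


def population_std(values):
    """Population standard deviation: sqrt of the mean squared deviation."""
    mu = sum(values) / len(values)
    variance = sum((x - mu) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Two made-up data sets with the same mean (50) but different spread.
narrow = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

sd_narrow = population_std(narrow)
sd_wide = population_std(wide)
print(sd_narrow, sd_wide)
```

The standard library function `statistics.pstdev` computes the same population quantity and can be used as a cross-check.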

- Inter-quartile Range
- The inter-quartile range is the range over which the middle 50% of the data values vary
- To define the quartiles:
- Q1: the value of the variable that divides the distribution 25% to the left and 75% to the right.
- Q2: the value of the variable that divides the distribution 50% to the left and 50% to the right.
- Q3: the value of the variable that divides the distribution 75% to the left and 25% to the right.
- The inter-quartile range is the value Q3 - Q1

- Calculating Q1, Q2 and Q3 for the variable 'Household Income'
- Conventionally, the mean and standard deviation are used as one pair of measures of location and spread, and the median and inter-quartile range as another pair.
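A short sketch of the quartile calculation, using Python's `statistics.quantiles` on nine made-up income values (in thousands). Software packages differ in how they interpolate quartiles for small data sets; the `"inclusive"` method used here is one convention among several, so other tools may report slightly different values.

```python
import statistics

# Nine made-up household incomes, in thousands (illustrative only).
incomes = [12, 18, 21, 25, 30, 34, 41, 47, 60]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(incomes, n=4, method="inclusive")

# Inter-quartile range: the width of the middle 50% of the data.
iqr = q3 - q1
print(q1, q2, q3, iqr)
```

Q2 is the median, so `q2` agrees with `statistics.median(incomes)` here.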

- Measuring the Symmetry (skewness) of a Distribution
- Pearson's coefficient of Skewness
- Pearson's coefficient of Skewness = 3(mean - median)/standard deviation
- Quartile Measure of Skewness
- Quartile Measure of Skewness = [(Q3 - Q2) - (Q2 - Q1)]/(Q3 - Q1)
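Both skewness measures can be sketched in a few lines of Python. The quartile measure is written in its standard form, [(Q3 - Q2) - (Q2 - Q1)]/(Q3 - Q1). The function names and data are illustrative, and the use of the population standard deviation (`pstdev`) is an assumption, since the slides do not say which version is intended.

```python
import statistics


def pearson_skewness(values):
    """Pearson's coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return 3 * (mean - median) / sd


def quartile_skewness(q1, q2, q3):
    """Quartile measure: compares the upper and lower halves of the IQR."""
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)


symmetric = [1, 2, 3, 4, 5]      # made-up symmetric data
right_skewed = [1, 2, 3, 4, 10]  # made-up data with a long right tail

print(pearson_skewness(symmetric), pearson_skewness(right_skewed))
print(quartile_skewness(21, 30, 41))
```

Both measures are 0 for a symmetric distribution and positive when the right tail is the longer one.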

14.5 SUMMARY

- What is Data
- Variables
- Two types of variable
- an attribute variable
- a measured variable
- The concept of a Statistical Distribution
- As applied to an attribute variable
- As applied to a measured variable

- Descriptive Statistics for a measured variable
- Measures of Centre
- Mean
- Median
- Measures of Width
- Standard Deviation
- Inter-Quartile Range
- Measures of Symmetry (Skewness)
- Pearson's coefficient of Skewness
- Quartile Measure of Skewness

14.6 THE NATURE OF A SAMPLE

- POPULATION
- Perfect Information
- In practice it is often impossible to enumerate the whole population.
- A sample is drawn from the population to make judgements (inferences) about the population.

- SAMPLE
- Imperfect Information
- Random sample
- Each item in the population has an equal chance of being included in the sample.
- The KEY PROBLEM is to use this sample data to draw valid conclusions about the population, knowing about and taking into account the 'error due to sampling'
- Unrepresentative sample
- How to Lie with Statistics
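The idea of a random sample and of 'error due to sampling' can be sketched with Python's `random` module. The population here is a hypothetical list of 10,000 numbered members standing in for real records; the function name and the seed are illustrative, and only the sample size of 654 echoes the credit scenario in these slides.

```python
import random


def draw_simple_random_sample(population, size, seed=None):
    """Draw `size` distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, size)


# Hypothetical population: 10,000 numbered members (stand-in values).
population_ids = list(range(1, 10_001))
sample_ids = draw_simple_random_sample(population_ids, 654, seed=42)

# The sample mean will generally differ from the population mean;
# that difference is the error due to sampling.
population_mean = sum(population_ids) / len(population_ids)
sample_mean = sum(sample_ids) / len(sample_ids)
sampling_error = sample_mean - population_mean
print(len(sample_ids), sampling_error)
```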

- A Credit Scenario
- Population: the set of all customers who used the credit facilities between 1st January 2000 and 31st December 2001.
- Sample size: 654 customers
- Data file: BDMCREDIT.MTW

14.8 DESCRIBING SAMPLE DATA

- Attribute variable: the number of occurrences of each attribute is obtained
- Measured variable: sample descriptive statistics describing the centre, width and symmetry of the distribution are calculated.

- Attribute Data
- C5 Does the customer own their own house? Coded 0 = Yes, 1 = No
- C6 The Region in which the customer is resident? Coded:
- 1 = South West
- 2 = South East
- 3 = London
- 4 = Midlands
- 5 = North
- Command: STAT-TABLE-TALLY

- Summary Statistics for Discrete Variables
- Counts (OWN-OCC)
- Percent (OWN-OCC)
- Distribution graph (OWN-OCC)

Do you Own your own house?

- Summary Statistics for Discrete Variables
- Count (REGION)
- The information in form
- 74 or 11.31% of the respondents are from the South West
- 132 or 20.18% of the respondents are from the South East
- 165 or 25.23% of the respondents are from the London area
- 161 or 24.62% of the respondents are from the Midlands
- 122 or 18.65% of the respondents are from the North
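The tally above can be reproduced with a short Python sketch. The coded column is rebuilt from the counts quoted in the slides (it is not read from the original data file), and the function name `tally` is illustrative.

```python
from collections import Counter


def tally(values):
    """Return {code: (count, percent)} for an attribute variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {code: (n, round(100 * n / total, 2))
            for code, n in sorted(counts.items())}


# REGION column rebuilt from the quoted counts
# (1 = South West, 2 = South East, 3 = London, 4 = Midlands, 5 = North).
region = [1] * 74 + [2] * 132 + [3] * 165 + [4] * 161 + [5] * 122

region_tally = tally(region)
for code, (count, percent) in region_tally.items():
    print(code, count, percent)
```

The counts sum to 654, the sample size, and the percentages match the figures quoted above.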

- Measured Variables
- For the 'Credit Data'
- C2 Customer's Age (AGE)
- C3 Household Income (per annum) (SALARY)
- C4 Estimated monthly outgoing on mortgage/rent/rates/utilities/credit card payments etc. (PAYOUT)
- C7 The Amount borrowed on credit (CREDIT)
- HISTOGRAM

- BOXPLOT
- The BOXPLOT will prove to be a more useful way of picturing a sample distribution when later chapters examine the connection between two sample variables.

14.7 DATA ANALYSIS USING SAMPLE DATA

- Before attempting to analyse any data, the analyst should ensure that:
- The problem under investigation is clearly understood and the objectives of the investigation have been clearly specified. Keep asking questions until satisfactory answers have been obtained.
- The individual variables making up the data set are clearly understood.

- Descriptive Statistics
- Measures of Centre
- Mean
- Sample Mean
- Median
- Measures of Width
- Standard Deviation
- Sample Standard Deviation: $S$
- Sample Variance: $S^2$
- Inter-Quartile Range IQR
- Symmetry
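The slides do not spell out the sample formulas. The sketch below assumes the usual convention, in which the sample variance divides by n - 1 rather than n, and compares it with the population version using Python's standard library on a small made-up sample.

```python
import statistics

# Small made-up sample (illustrative only); its mean is 5.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_mean = statistics.mean(sample)

# Sample variance and standard deviation: divide by n - 1 (assumed convention).
sample_variance = statistics.variance(sample)
sample_sd = statistics.stdev(sample)

# Population versions divide by n, so they are slightly smaller.
population_variance = statistics.pvariance(sample)
population_sd = statistics.pstdev(sample)

print(sample_mean, sample_variance, sample_sd, population_sd)
```

For large samples such as the 654 customers in the credit scenario, the difference between dividing by n and by n - 1 is small.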

- Symmetry (Skewness)
- A distribution is skewed if one tail extends farther than the other.
- A value close to 0 indicates symmetric data.
- Negative values indicate negative/left skew.
- Positive values indicate positive/right skew.
- Example of a negative or left-skewed distribution (skewness = -1.44096)

- The Relationship between the descriptive statistics and the Boxplot
- The asterisks to the right of the median indicate sample values that are in some sense extreme
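The slides do not say how "extreme" is decided. A widely used boxplot convention, assumed here, flags any value lying more than 1.5 times the inter-quartile range beyond Q1 or Q3. The function name and data below are illustrative, and the quartile method is one of several in use.

```python
import statistics


def flag_extreme_values(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up amounts with one unusually large value.
amounts = [10, 12, 13, 15, 16, 18, 20, 21, 95]
extreme = flag_extreme_values(amounts)
print(extreme)
```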

14.9 INVESTIGATING RELATIONSHIPS BETWEEN VARIABLES

- To investigate the relationship between variables:
- Response variable
- a variable that measures, either directly or indirectly, the objectives of the analysis
- Explanatory variable
- a variable that may influence the response variable

- Example 1
- A university wishes to investigate the salary of its graduates five years after graduating
- The questionnaire
- 'Current Salary'
- 'Starting Salary'
- 'Class of Degree' Coded 1 = First, 2 = Upper Second, 3 = Lower Second, 4 = Third, 5 = Pass.
- 'Graduate's Gender' Coded 1 = Male, 2 = Female.
- Response variable
- Current Salary (measured variable)

- Explanatory Variable
- Starting Salary (measured variable)
- Class of Degree (attribute variable)
- Graduate's Gender (attribute variable)

- Example 2: CREDIT scenario
- Objectives of the analysis
- To investigate the nature of credit transactions
- The variable 'The Amount borrowed on credit'
- The problem is to investigate the relationship between 'The Amount borrowed on credit' and the other variables.
- Summary

- Combinations of Response Variable and Explanatory Variable

EXPLANATORY VARIABLE

- The method for investigating the connection between a response variable and an explanatory variable depends on the type of each variable.
- Investigating the connection between a measured response variable and a measured explanatory variable
- Investigating the connection between a measured response variable and an attribute explanatory variable

Homework

- Find or collect some data from your life or business practice, then complete the following tasks:
- Draw the statistical distribution of the data
- Calculate the Mean and Standard Deviation
- Calculate the Median and Inter-Quartile Range
- Calculate Pearson's Coefficient of Skewness and the Quartile Measure of Skewness