Descriptive vs. Inferential Statistics - PowerPoint PPT Presentation

About This Presentation
Title:

Descriptive vs. Inferential Statistics

Description:

Descriptive vs. Inferential Statistics Descriptive Methods for summarizing data Summaries usually consist of graphs and numerical summaries of the data – PowerPoint PPT presentation

Slides: 135
Provided by: businessA4
Transcript and Presenter's Notes

Title: Descriptive vs. Inferential Statistics


1
Descriptive vs. Inferential Statistics
  • Descriptive
  • Methods for summarizing data
  • Summaries usually consist of graphs and numerical
    summaries of the data
  • Inferential
  • Methods of making decisions or predictions about
    a population based on sample information.

2
Data Vocabulary
  • We will refer to data as plural and to a data set as a
    particular collection of data as a whole.
  • Observation: each data value.
  • Subject (or individual): an item for study
    (e.g., an employee in your company).
  • Variable: a characteristic of the subject or
    individual (e.g., an employee's income).

3
Data Vocabulary
Consider the multivariate data set with
5 variables
8 subjects
5 x 8 = 40 observations
4
Data Vocabulary Data Types
  • A data set may have a mixture of data types.

5
Data Vocabulary Attribute Data
  • Also called categorical, nominal or qualitative
    data.
  • Values are described by words rather than
    numbers.
  • For example, automobile style (e.g., X = full, midsize,
    compact, subcompact).

6
Data Vocabulary Data Coding
  • Coding refers to using numbers to represent
    categories to facilitate statistical analysis.
  • Coding an attribute as a number does not make the
    data numerical.
  • For example, 1 = Bachelor's, 2 = Master's, 3 =
    Doctorate
  • 1 = Liberal, 2 = Moderate, 3 = Conservative

7
Data Vocabulary Binary Data
  • A binary variable has only two values: 1 =
    presence, 0 = absence of a characteristic of
    interest (the codes themselves are arbitrary).
  • For example, 1 = employed, 0 = not employed;
    1 = married, 0 = not married; 1 = male, 0 =
    female; or 1 = female, 0 = male.
  • The coding itself has no numerical value, so
    binary variables are attribute data.

8
Data Vocabulary Numerical Data
  • Numerical or quantitative data arise from
    counting or some kind of mathematical operation.
  • For example:
    - Number of auto insurance claims filed in March
      (e.g., X = 114 claims).
    - Ratio of profit to sales for last quarter
      (e.g., X = 0.0447).
  • Can be broken down into two types: discrete or
    continuous data.

9
Data Vocabulary Discrete Data
  • A numerical variable with a countable number of
    values that can be represented by an integer (no
    fractional values).
  • For example:
    - Number of Medicaid patients (e.g., X = 2).
    - Number of takeoffs at O'Hare (e.g., X = 37).

10
Data Vocabulary Continuous Data
  • A numerical variable that can have any value
    within an interval (e.g., length, weight, time,
    sales, price/earnings ratios).
  • Any continuous interval contains infinitely many
    possible values (e.g., 426 < X < 428).

11
Data Vocabulary - Rounding
  • Ambiguity is introduced when continuous data are
    rounded to whole numbers.
  • Underlying measurement scale is continuous.
  • Precision of measurement depends on instrument.
  • Sometimes discrete data are treated as
    continuous when the range is very large (e.g.,
    SAT scores) and small differences (e.g., 604 or
    605) aren't of much importance.

12
Four Levels of Measurement

13
Nominal Level of Measurement
  • Nominal data merely identify a category.
  • Nominal data are qualitative, attribute,
    categorical or classification data (e.g., Apple,
    Compaq, Dell, HP).
  • Nominal data are usually coded numerically, but the
    codes are arbitrary (e.g., 1 = Apple, 2 = Compaq,
    3 = Dell, 4 = HP).
  • The only mathematical operations are counting (e.g.,
    frequencies) and simple statistics.

14
Ordinal Level of Measurement
  • Ordinal data codes can be ranked (e.g., 1 =
    Frequently, 2 = Sometimes, 3 = Rarely, 4 =
    Never).
  • Distance between codes is not meaningful (e.g.,
    distance between 1 and 2, or between 2 and 3, or
    between 3 and 4 lacks meaning).
  • Many useful statistical tests exist for ordinal data.
  • Especially useful in social science, marketing
    and human resource research.

15
Interval Level of Measurement
  • Data can not only be ranked, but also have
    meaningful intervals between scale points (e.g.,
    the difference between 60°F and 70°F is the same as
    the difference between 20°F and 30°F).
  • Since intervals between numbers represent
    distances, mathematical operations can be
    performed (e.g., average).
  • Zero point of interval scales is arbitrary, so
    ratios are not meaningful (e.g., 60°F is not
    twice as warm as 30°F).

16
Level of Measurement Likert Scales
  • A special case of interval data frequently used
    in survey research.
  • The coarseness of a Likert scale refers to the
    number of scale points (typically 5 or 7).

17
Likert Scales
  • Careful choice of verbal anchors results in
    measurable intervals (e.g., the distance from 1
    to 2 is the same as the interval, say, from 3
    to 4).
  • Ratios are not meaningful (e.g., here 4 is not
    twice 2).
  • Many statistical calculations can be performed
    (e.g., averages, correlations, etc.).

18
Time Series vs. Cross-sectional Data Time Series
  • Each observation in the sample represents a
    different equally spaced point in time (e.g.,
    years, months, days).
  • Periodicity may be annual, quarterly, monthly,
    weekly, daily, hourly, etc.
  • We are interested in trends and patterns over
    time (e.g., annual growth in consumer debit card
    use from 1999 to 2008).

19
Time Series vs. Cross-sectional Data
Cross-sectional
  • Each observation represents a different
    individual unit (e.g., person) at the same point
    in time (e.g., monthly VISA balances).
  • We are interested in - variation among
    observations or in - relationships.
  • We can combine the two data types to get pooled
    cross-sectional and time series data.

20
Population and Sample
  • Population: all subjects of interest
  • Sample: subset of the population for whom we have
    data

21
Populations and Samples

Population
22
Example: The Sample and the Population for an
Exit Poll
  • In California in 2003, a special election was
    held to consider whether Governor Gray Davis
    should be recalled from office.
  • An exit poll sampled 3160 of the 8 million people
    who voted.

23

Example: The Sample and the Population for an
Exit Poll
  • What's the sample and the population for this
    exit poll?
  • The population was the 8 million people who voted
    in the election.
  • The sample was the 3160 voters who were
    interviewed in the exit poll.

24
Parameter and Statistic
  • A parameter is a numerical summary of the
    population
  • A statistic is a numerical summary of a sample
    taken from the population

25
Sampling Methods
26
Sampling Methods
27
Simple Random Sample
  • Every item in the population of N items has the
    same chance of being chosen in the sample of n
    items.
  • We rely on random numbers to select a name.

28
Graphical Summaries
  • Describe the main features of a variable
  • For quantitative variables, key features are
    center (Where are the data values concentrated?
    What seem to be typical or middle data values?),
    spread (How much variation is there in the
    data? How spread out are the data values? Are
    there unusual values?) and shape (Are the data
    values distributed symmetrically? Skewed?
    Sharply peaked? Flat? Bimodal?).
  • For categorical variables, the key feature is the
    percentage in each of the categories.

29
Frequency Table
  • A method of organizing data
  • Lists all possible values for a variable along
    with the number of observations for each value
  • Natural categories exist for qualitative
    variables
  • For quantitative variables artificial bins are
    created

30
Example Shark Attacks
31
Example Shark Attacks
  • What is the variable?
  • Is it categorical or quantitative?
  • How is the proportion for Florida calculated?
  • How is the percentage for Florida calculated?

32
Example Shark Attacks
  • Insights what the data tells us about shark
    attacks

33
Graphs for Categorical Data
  • Pie chart: a circle having a slice of pie for
    each category. The center angle of a slice represents
    relative frequency/percentage.
  • Bar graph: a graph that displays a vertical bar
    for each category. The length of each bar represents
    frequency.

34
Example Sources of Electricity Use in the U.S.
and Canada
35
Pie Chart
  • A pie chart can only convey a general idea of the
    data.
  • Pie charts should be used to portray data which
    sum to a total (e.g., percent market shares).
  • A pie chart should only have a few (i.e., 3 to
    5) slices.
  • Each slice should be labeled with data values or
    percents.

36
Pie Chart
37
Bar Chart
38
Pie Charts Are Often Abused
  • Consider the following charts used to illustrate
    an article from the Wall Street Journal. Which
    type is better? Why?

39
Ill-Advised Pie Chart Options
  • Exploded and 3-D pie charts add strong visual
    impact but slices are hard to assess.

40
Summarizing Quantitative Data
  • Example: Price/Earnings Ratios
  • P/E ratios are current stock price divided by
    earnings per share in the last 12 months. For
    example

41
Graphs for Quantitative Data
  • Dot plot: shows a dot for each observation.
  • Histogram: uses bars to portray the data.
  • Which is best?
  • Dot plot:
    - More useful for small data sets
    - Data values are retained
  • Histogram:
    - More useful for large data sets
    - Most compact display
    - More flexibility in defining intervals

42
Dot Plot
  • A dot plot is the simplest graphical display of n
    individual values of numerical data. It is easy to
    understand but not good for large samples (e.g.,
    > 5,000).
  • Make a scale that covers the data range
  • Mark the axes and label them
  • Plot each data value as a dot above the scale at
    its approximate location
  • If more than one data value lies at about the
    same axis location, the dots are piled up
    vertically.

43
Dot Plot
  • Range of data shows dispersion.
  • Clustering shows central tendency.
  • Dot plots do not tell much of shape of
    distribution.
  • Can add annotations (text boxes) to call
    attention to specific features.

44
Frequency Distributions and Histograms
  • A frequency distribution is a table formed by
    classifying n data values into k classes (bins).
  • Bin limits define the values to be included in
    each bin. Widths must all be the same.
  • Frequencies are the number of observations within
    each bin.
  • Express as relative frequencies (frequency
    divided by the total) or percentages (relative
    frequency times 100).

45
Constructing a Frequency Distribution
  • Sort data in ascending order (e.g., P/E ratios)
  • Choose the number of bins (k)
  • - k should be much smaller than n.
  • Too many bins results in sparsely populated bins,
    too few and dissimilar data values are lumped
    together.

46
Constructing a Frequency Distribution: Sturges'
Rule
47
Constructing a Frequency Distribution
  • Set the bin limits according to k from Sturges'
    Rule.
  • For example, for k = 7 bins, the approximate bin
    width is
  • To obtain nice limits, round the width to 10
    and start the first bin at 0 to yield 0, 10, 20,
    30, 40, 50, 60, 70.
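
The formula on the Sturges' Rule slide did not survive in this transcript. The rule is commonly stated as k = 1 + 3.3 log10(n), and the sketch below assumes that form. The helper names and the minimum and maximum passed to `bin_width` are hypothetical and chosen only for illustration; n = 68 is the number of P/E ratios mentioned later in the deck.

```python
import math

def sturges_bins(n: int) -> int:
    """Suggested number of bins k for n observations (assumed form: 1 + 3.3*log10(n))."""
    return round(1 + 3.3 * math.log10(n))

def bin_width(x_min: float, x_max: float, k: int) -> float:
    """Approximate bin width before rounding to a 'nice' value."""
    return (x_max - x_min) / k

k = sturges_bins(68)            # 68 P/E ratios -> about 7.05, rounded to 7
width = bin_width(5, 75, k)     # hypothetical data range, for illustration only
print(k, width)
```

The raw width is only a starting point; as the slide notes, it is usually rounded so the bin limits come out as convenient numbers.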

48
Constructing a Frequency Distribution
  • Put the data values in the appropriate bin.
  • In general, the lower limit is included in the
    bin while the upper limit is excluded.
  • Create the table. You can include:
  • Frequencies: counts for each bin.
  • Relative frequencies: absolute frequency divided
    by the total number of data values.
  • Cumulative frequencies: accumulated relative
    frequency values as bin limits increase.
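
The table described above can be sketched in a few lines. This is a minimal illustration, not the deck's own worksheet: the data values are hypothetical (the P/E data set is not reproduced in the transcript), and `frequency_table` is a name introduced here.

```python
from bisect import bisect_right

def frequency_table(values, limits):
    """Bin values into [limits[i], limits[i+1]): lower limit included, upper excluded.

    Returns rows of (lower, upper, frequency, relative frequency, cumulative relative frequency).
    """
    counts = [0] * (len(limits) - 1)
    for v in values:
        i = bisect_right(limits, v) - 1      # index of the bin whose lower limit is <= v
        if 0 <= i < len(counts):             # values outside the limits are ignored
            counts[i] += 1
    n = len(values)
    rows, cumulative = [], 0.0
    for i, c in enumerate(counts):
        relative = c / n
        cumulative += relative
        rows.append((limits[i], limits[i + 1], c, relative, cumulative))
    return rows

data = [7, 10, 12, 19, 20, 25, 31, 38, 44, 69]     # hypothetical values
limits = [0, 10, 20, 30, 40, 50, 60, 70]           # bin limits from the slide
table = frequency_table(data, limits)
for row in table:
    print(row)
```

Note how the value 10 lands in the bin 10 to 20 and 20 lands in the bin 20 to 30, which is the "lower limit included, upper limit excluded" convention.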

49
Bin Limits for the P/E Ratio Data
50
Frequency Distributions and Histograms
  • A histogram is a graphical representation of a
    frequency distribution.
  • Y-axis shows frequency within each bin.
  • A histogram is a bar chart with no gaps between
    bars
  • X-axis ticks show end points of each bin.



51
Frequency Distributions and Histograms
  • Consider 3 histograms for the P/E ratio data with
    different bin widths. What do they tell you?



52
Frequency Distributions and Histograms: Modal
Class
  • A histogram bar that is higher than those on
    either side is called the modal class.
  • Monomodal: a single modal class.
  • Bimodal: two modal classes.
  • Multimodal: more than two modal classes.
  • Modal classes may be artifacts of the way bin
    limits are chosen.



53
Shape of Histograms
  • A histogram suggests the shape of the population.
  • It is influenced by number of bins and bin
    limits.
  • Skewness is indicated by the direction of the
    longer tail of the histogram.
  • Left-skewed (negatively skewed): a longer left
    tail.
  • Right-skewed (positively skewed): a longer right
    tail.
  • Symmetric: both tail areas approximately the
    same.



54
(No Transcript)
55
Line Charts
  • Used to display a time series or spot trends, or
    to compare time periods.
  • Can display several variables at once.

56
Scatter Plots for Bi-variate Data
  • A scatter plot shows n pairs of observations as
    dots (or some other symbol) on an XY graph.
  • A starting point for bivariate data analysis.
  • Allows observations about the relationship
    between two variables.
  • Answers the question Is there an association
    between the two variables and if so, what kind of
    association?

57
Scatter Plot Example Birth Rates vs. Life
Expectancy
58
Scatter Plot Example Birth Rates vs. Life
Expectancy
  • Here is a scatter plot with life expectancy on
    the X-axis and birth rates on the Y-axis.
  • Is there an association between the two variables?
  • Is there a cause-and-effect relationship?

59
Scatter Plot Example Aircraft Fuel Consumption
  • Consider five observations on flight time and
    fuel consumption for a twin-engine Piper Cheyenne
    aircraft.
  • A causal relationship is assumed since a longer
    flight would consume more fuel.

60
Scatter Plot Example Aircraft Fuel Consumption
  • Here is the scatter plot with flight time
    (explanatory) on the X-axis and fuel use
    (response) on the Y-axis. Is there an association
    between the variables?

61
Scatter Plots for Bi-variate Data
62
Scatter Plots and Policy Making
  • Scatter plots can be helpful when policy
    decisions need to be made.
  • For example, compare traffic fatalities resulting
    from crashes per million vehicles sold between
    1995 and 1999.
  • Do SUVs create a greater risk to the drivers of
    both cars?

63
Numerical Descriptive Statistics
  • How can we describe the center of quantitative
    data?

64
Measures of Central Tendency
65
Measures of Central Tendency
66
Measures of Central Tendency
67
Measures of Central Tendency - Mean
  • A familiar measure of central tendency.
  • In Excel, use function AVERAGE(Data) where Data
    is an array of data values.

68
Characteristics of the Mean
  • Arithmetic mean is the most familiar average.
  • Affected by every sample item.
  • The balancing point or fulcrum for the data.

69
Characteristics of the Median
70
Characteristics of the Median
71
Comparison Among Mean, Median, and Mode
  • Consider the following quiz scores for four students:

Lee's scores: 60, 70, 70, 70, 80. Mean = 70, Median = 70, Mode = 70.
Pat's scores: 45, 45, 70, 90, 100. Mean = 70, Median = 70, Mode = 45.
Sam's scores: 50, 60, 70, 80, 90. Mean = 70, Median = 70, Mode = none.
Xiao's scores: 50, 50, 70, 90, 90. Mean = 70, Median = 70, Modes = 50, 90.
  • What does the mode for each student tell you?
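
The quiz-score comparison can be reproduced with Python's standard `statistics` module. This is a sketch rather than part of the original deck; the `modes` helper is introduced here to match the slide's convention that a data set in which every value occurs once has no mode.

```python
from statistics import mean, median, multimode

scores = {
    "Lee":  [60, 70, 70, 70, 80],
    "Pat":  [45, 45, 70, 90, 100],
    "Sam":  [50, 60, 70, 80, 90],
    "Xiao": [50, 50, 70, 90, 90],
}

def modes(values):
    """Most frequent value(s); an empty list when every value occurs exactly once."""
    if len(set(values)) == len(values):
        return []
    return multimode(values)

summary = {name: (mean(v), median(v), modes(v)) for name, v in scores.items()}
for name, (m, med, mo) in summary.items():
    print(f"{name}: mean={m}, median={med}, modes={mo or 'none'}")
```

All four students share the same mean and median, so only the mode (and the spread) distinguishes them.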

72
Relationships Among Mean, Median and Mode
73
Measures of Variation
  • Variation is the spread of data points about
    the center of the distribution in a sample.
    Consider the following measures of dispersion

74
Measures of Variation
75
Measures of Variation
76
The Range
Range = largest measurement - smallest measurement
Example: Internists' Salaries (in thousands of
dollars): 127 132 138 141 144 146 152 154 165 171
177 192 241
Range = 241 - 127 = 114 ($114,000)
77
The Variance
Population: $X_1, X_2, \ldots, X_N$
Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
78
The Standard Deviation
79
Example Population Variance/Standard Deviation
Population of annual returns for five junk bond
mutual funds: 10.0, 9.4, 9.1, 8.3, 7.8
$\mu = \frac{10.0 + 9.4 + 9.1 + 8.3 + 7.8}{5} = \frac{44.6}{5} = 8.92$
$\sigma^2 = \frac{1.1664 + 0.2304 + 0.0324 + 0.3844 + 1.2544}{5} = \frac{3.068}{5} = 0.6136$
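
As a check on the arithmetic, the same population figures can be computed with the standard library. This is a minimal sketch; the population standard deviation is simply the square root of the variance.

```python
from statistics import fmean, pstdev, pvariance

# Annual returns (percent) for the five junk bond mutual funds in the example.
returns = [10.0, 9.4, 9.1, 8.3, 7.8]

mu = fmean(returns)             # population mean
sigma2 = pvariance(returns)     # divides the sum of squared deviations by N
sigma = pstdev(returns)         # square root of the population variance

print(round(mu, 2), round(sigma2, 4), round(sigma, 4))
```

Because these five funds are treated as the whole population, the divisor is N = 5 rather than n - 1.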
80
Sample Variance Example
Sample: 2, 3, 5, 6. Here n = 4 and $\bar{x} = 4$.

$x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
2       4           -2                  4
3       4           -1                  1
5       4           1                   1
6       4           2                   4

Sum of squared deviations = 10
$s^2 = 10 / (4 - 1) = 3.33$
81
Example Sample Variance/Standard Deviation
Sample of five car mileages: 30.8, 31.7, 30.1,
31.6, 32.1
$s^2 = 2.572 \div 4 = 0.643$
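
Both sample-variance examples can be verified with a short sketch. The function name `sample_variance` is introduced here for illustration; it spells out the n - 1 divisor and is compared against `statistics.variance`, which uses the same definition.

```python
from statistics import fmean, stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    x_bar = fmean(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

small_sample = [2, 3, 5, 6]
mileages = [30.8, 31.7, 30.1, 31.6, 32.1]

s2_small = sample_variance(small_sample)    # 10 / 3
s2_miles = sample_variance(mileages)        # 2.572 / 4
s_miles = stdev(mileages)                   # sample standard deviation

print(round(s2_small, 2), round(s2_miles, 3), round(s_miles, 3))
```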
82
Coefficient of Variation
  • Useful for comparing variables measured in
    different units or with different means.
  • A unit-free measure of dispersion
  • Expressed as a percent of the mean.
  • Only appropriate for nonnegative data. It is
    undefined if the mean is zero or negative.
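
The slide does not show the formula, so the sketch below assumes the usual sample definition, CV = 100 x s / mean. The function name is hypothetical, and the data are the five car mileages from the sample variance example.

```python
from statistics import fmean, stdev

def coefficient_of_variation(values):
    """Sample standard deviation as a percent of the mean (assumed definition)."""
    m = fmean(values)
    if m <= 0:
        # Matches the slide's caveat: undefined for a zero or negative mean.
        raise ValueError("coefficient of variation requires a positive mean")
    return 100 * stdev(values) / m

mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
cv = coefficient_of_variation(mileages)
print(round(cv, 2))   # dispersion as a percent of the mean
```

Because the result is unit-free, it can be set beside the CV of a variable measured in entirely different units.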

83
Coefficient of Variation Examples
84
Mean Absolute Deviation
  • The Mean Absolute Deviation (MAD) reveals the
    average distance from an individual data point to
    the mean (center of the distribution).
  • Uses absolute values of the deviations around the
    mean.
  • Excel's function is AVEDEV(Array)
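
Outside Excel, the same quantity takes only a few lines. This is a sketch under the definition given above (average absolute deviation from the mean); the function name is introduced here, and the data are the car mileages from the earlier sample variance example.

```python
from statistics import fmean

def mean_absolute_deviation(values):
    """Average absolute distance of each value from the mean."""
    center = fmean(values)
    return fmean([abs(x - center) for x in values])

mileages = [30.8, 31.7, 30.1, 31.6, 32.1]
mad = mean_absolute_deviation(mileages)
print(round(mad, 3))
```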

85
Central Tendency vs. Dispersion
  • Consider the histograms of hole diameters drilled
    in a steel plate during manufacturing.
  • The desired distribution is outlined in red.

86
Central Tendency vs. Dispersion
Acceptable variation but mean is less than 5 mm.
Desired mean (5mm) but too much variation.
  • Take frequent samples to monitor quality.

87
Central Tendency vs. Dispersion Job Performance
  • A high mean (better rating) and low standard
    deviation (more consistency) is preferred. Which
    professor do you think is best?

88
Sections 2.6 and 2.7
  • Interpreting Standard Deviation and Measures of
    Relative Standing

89
Empirical Rule
  • For bell-shaped data sets
  • Approximately 68% of the observations fall within
    1 standard deviation of the mean
  • Approximately 95% of the observations fall within
    2 standard deviations of the mean
  • Approximately 100% of the observations fall
    within 3 standard deviations of the mean

90
Scale in std. dev. units
91
μ = 9.12, σ = 0.15
92
Empirical Rule Detecting Unusual Observations
  • The P/E ratio data contains several large data
    values. Are they unusual or outliers?

93
Empirical Rule Detecting Unusual Observations
  • If the sample came from a normal distribution,
    then the Empirical rule states

22.72 ± 1(14.08) gives (8.6, 36.8)
22.72 ± 2(14.08) gives (-5.4, 50.9)
22.72 ± 3(14.08) gives (-19.5, 65.0)
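
The three intervals follow directly from the sample mean 22.72 and standard deviation 14.08. The sketch below recomputes them and screens the largest P/E values shown on the next slide (48, 55, 68, 91); the variable names are introduced here for illustration.

```python
x_bar, s = 22.72, 14.08          # P/E sample mean and standard deviation from the slide

# Interval x_bar +/- k*s for k = 1, 2, 3 standard deviations.
intervals = {k: (x_bar - k * s, x_bar + k * s) for k in (1, 2, 3)}
for k, (low, high) in intervals.items():
    print(f"within {k} sd: ({low:.2f}, {high:.2f})")

largest = [48, 55, 68, 91]       # largest P/E values shown on the following slide
beyond_2sd = [x for x in largest if x > intervals[2][1]]
beyond_3sd = [x for x in largest if x > intervals[3][1]]
print(beyond_2sd, beyond_3sd)
```

Under the Empirical Rule, values beyond 3 standard deviations would be very rare if the data were bell-shaped, so they deserve a closer look.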
94
Empirical Rule Detecting Unusual Observations
  • Are there any unusual values or outliers?

(Figure: sorted P/E values 7, 8, ..., 48, 55, 68, 91, with the mean 22.72 marked)
95
Defining a Standardized Variable or Z-Score
  • A standardized variable (Z) redefines each
    observation in terms of the number of standard
    deviations from the mean.

Standardization formula for a population: $z_i = \frac{x_i - \mu}{\sigma}$
Standardization formula for a sample: $z_i = \frac{x_i - \bar{x}}{s}$
96
Z-Score Example
  • zi tells how far away the observation is from the
    mean. A negative z value indicates the
    observation is below the mean while positive z
    value indicates the observation is above the
    mean.
  • For example, for the P/E data, the first value is x1
    = 7. The associated z value is
    z1 = (7 - 22.72) / 14.08 ≈ -1.12.
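
The standardization can be sketched as a one-line function. It assumes the sample form, z = (x - mean) / s, with the P/E mean 22.72 and standard deviation 14.08 quoted on the Empirical Rule slides; `z_score` is a name introduced here.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

x_bar, s = 22.72, 14.08
z_first = z_score(7, x_bar, s)     # smallest P/E value: below the mean, so negative
z_last = z_score(91, x_bar, s)     # largest P/E value: far above the mean
print(round(z_first, 2), round(z_last, 2))
```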

97
Percentiles, Deciles and Quartiles
  • Percentiles are data that have been divided into
    100 groups.
  • For example, you score in the 83rd percentile on
    a standardized test. That means that 83% of the
    test-takers scored below you.
  • Deciles are data that have been divided into 10
    groups.
  • Quintiles are data that have been divided into 5
    groups.
  • Quartiles are data that have been divided into 4
    groups.

98
Use of Percentiles and Quartiles
  • Percentiles are used to establish benchmarks for
    comparison purposes (e.g., health care,
    manufacturing and banking industries use the 5th,
    25th, 50th, 75th and 90th percentiles).
  • Percentiles are used in employee merit evaluation
    and salary benchmarking.
  • Quartiles (25, 50, and 75 percent) are commonly
    used to assess financial performance and stock
    portfolios.

99
Quartiles
  • Quartiles are scale points that divide the sorted
    data into four groups of approximately equal size.
  • The three values that separate the four groups
    are called Q1, Q2, and Q3, respectively.

100
Quartiles
  • The second quartile Q2 is the median, an
    important indicator of central tendency.
  • Q1 and Q3 measure dispersion since the
    interquartile range Q3 - Q1 measures the degree
    of spread in the middle 50 percent of data values.

101
Calculating Quartiles
  • For small data sets, find quartiles using method
    of medians

Step 1. Sort the observations.
Step 2. Find the median Q2.
Step 3. Find the median of the data values that
lie below Q2.
Step 4. Find the median of the data values that
lie above Q2.
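
The four steps above can be sketched as follows. One assumption is made explicit: the lower and upper halves are split by position in the sorted list, leaving out the middle value when n is odd, which matches "values below and above Q2" whenever the median is not tied with other values. The data are the internists' salaries from the range example.

```python
from statistics import median

def quartiles_by_medians(values):
    """Return (Q1, Q2, Q3) using the method of medians."""
    xs = sorted(values)                 # Step 1: sort the observations
    n = len(xs)
    q2 = median(xs)                     # Step 2: the median is Q2
    lower = xs[: n // 2]                # values below Q2 (middle value excluded if n is odd)
    upper = xs[(n + 1) // 2 :]          # values above Q2
    return median(lower), q2, median(upper)   # Steps 3 and 4

salaries = [127, 132, 138, 141, 144, 146, 152, 154, 165, 171, 177, 192, 241]
q1, q2, q3 = quartiles_by_medians(salaries)
print(q1, q2, q3)
```

Software packages often interpolate quartile positions instead, so their results can differ slightly from this method on small data sets.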
102
Calculating Quartiles
  • Use Excel function QUARTILE(Array, k) to return
    the kth quartile.
  • Excel treats quartiles as a special case of
    percentiles. For example, to calculate Q3:
  • QUARTILE(Array, 3)
  • PERCENTILE(Array, 0.75)
  • Excel calculates the quartile positions as

103
Central Tendency Using Quartiles
104
Dispersion Using Quartiles
105
Box Plots
  • A useful tool of exploratory data analysis (EDA).
  • Also called a box-and-whisker plot.
  • Based on a five-number summary
  • Consider the five-number summary for the 68 P/E
    ratios

Xmin, Q1, Q2, Q3, Xmax
106
Box Plots
107
Detecting Unusual Observations and Potential
Outliers
  • IQR = Q3 - Q1
  • An observation is considered unusual if it falls
    more than 1.5 x IQR below the first quartile or
    more than 1.5 x IQR above the third quartile
  • An observation is a potential outlier if it falls
    more than 3 x IQR below the first quartile or
    more than 3 x IQR above the third quartile
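
The two rules can be sketched as a small classifier. The function name and labels are introduced here; the quartiles used in the example (Q1 = 139.5, Q3 = 174) are an assumption, obtained by applying the method of medians to the internists' salary data from the range example rather than taken from the deck.

```python
def classify(x: float, q1: float, q3: float) -> str:
    """Label x using the 1.5 x IQR and 3 x IQR rules."""
    iqr = q3 - q1
    if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
        return "potential outlier"
    if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
        return "unusual"
    return "ordinary"

q1, q3 = 139.5, 174.0            # assumed quartiles for the salary data (in $1000s)
for salary in (192, 241, 300):   # 300 is a hypothetical value for illustration
    print(salary, classify(salary, q1, q3))
```

Here the IQR is 34.5, so anything above 225.75 is unusual and anything above 277.5 is a potential outlier.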

108
Box - Whiskers Plots
109
Box Plots
  • Fences and Unusual Data Values
  • Truncate the whisker at the fences and display
    unusual values and outliers as dots.
  • Based on these fences, there are three unusual
    P/E values and two outliers.

110
(No Transcript)
111
(No Transcript)
112
(No Transcript)
113
Probability Concepts
An experiment is any process of observation with
an uncertain outcome. The possible outcomes for
an experiment are called the experimental
outcomes. Probability is a measure of the chance
that an experimental outcome will occur when an
experiment is carried out
114
Probability
If E is an experimental outcome, then P(E)
denotes the probability that E will occur.
Conditions:
  • If E can never occur, then P(E) = 0.
  • If E is certain to occur, then P(E) = 1.
  • The probabilities of all the experimental outcomes
    must sum to 1.
115
Assigning Probabilities to Experimental Outcomes
  • Classical Method
  • For equally likely outcomes
  • Relative frequency or Empirical Approach
  • In the long run
  • Subjective
  • Assessment based on experience, expertise, or
    intuition

116
The Sample Space
The sample space of an experiment is the set of
all experimental outcomes. Example Genders of
Two Children
117
Computing Probabilities of Events
An event is a set (or collection) of experimental
outcomes. The probability of an event is the sum
of the probabilities of the experimental outcomes
that belong to the event.
118
Probabilities Equally Likely Outcomes
If the sample space outcomes (or experimental
outcomes) are all equally likely, then the
probability that an event will occur is equal to
the ratio
119
Example Computing Probabilities
Events:
P(one boy and one girl) = P(BG) + P(GB) = ¼ + ¼ = ½
P(at least one girl) = P(BG) + P(GB) + P(GG) = ¼ + ¼ + ¼ = ¾
Note: Experimental outcomes are BB, BG, GB, GG. All
outcomes are equally likely: P(BB) = P(BG) = P(GB) = P(GG) = ¼
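
With equally likely outcomes, an event's probability is the number of outcomes in the event divided by the size of the sample space. The sketch below enumerates the two-children sample space and recomputes both probabilities exactly; the helper `prob` is a name introduced here.

```python
from fractions import Fraction
from itertools import product

# Sample space for the genders of two children: BB, BG, GB, GG.
sample_space = ["".join(pair) for pair in product("BG", repeat=2)]

def prob(event) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))

p_one_each = prob(lambda o: set(o) == {"B", "G"})     # one boy and one girl
p_at_least_one_girl = prob(lambda o: "G" in o)        # at least one girl
print(sample_space, p_one_each, p_at_least_one_girl)
```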
120
Event Relations
121
The Addition Rule for Unions
The probability that A or B (the union of A and
B) will occur is P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
122
Conditional Probability
The probability of an event A, given that the
event B has occurred, is called the conditional
probability of A given B and is denoted as
P(A | B). Further, P(A | B) = P(A ∩ B) / P(B), provided P(B) > 0.
123
Independence of Events
Two events A and B are said to be independent if
and only if P(A | B) = P(A) or,
equivalently, P(B | A) = P(B)
124
Multiplication Rule for Intersections
The probability that A and B (the intersection of
A and B) will occur is P(A ∩ B) = P(A) P(B | A) = P(B) P(A | B).
If A and B are independent, then the probability
that A and B (the intersection of A and B) will
occur is P(A ∩ B) = P(A) P(B).
125
Applications of Independence
  • To illustrate system reliability, suppose a Web
    site has 2 independent file servers. Each server
    has 99% reliability. What is the total system
    reliability? Let
  • F1 be the event that server 1 fails
  • F2 be the event that server 2 fails
  • P(F1 ∩ F2) = P(F1) P(F2) = (.01)(.01) =
    .0001. So, the probability that both servers are
    down is .0001.
  • The probability that at least one server is up
    is 1 - .0001 = .9999 or 99.99%.
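
The same reasoning generalizes to any number of independent servers: the system is down only if every server is down. The sketch below is illustrative, with function names introduced here; the 99.999% target in the second helper is the usual meaning of "five nines" and is stated as an assumption because the following slide's content is not in the transcript.

```python
def system_reliability(server_reliability: float, n_servers: int) -> float:
    """Probability that at least one of n independent servers is up."""
    p_all_fail = (1 - server_reliability) ** n_servers
    return 1 - p_all_fail

def servers_needed(server_reliability: float, target: float) -> int:
    """Smallest number of independent servers that meets the target reliability."""
    n = 1
    while system_reliability(server_reliability, n) < target:
        n += 1
    return n

two_servers = system_reliability(0.99, 2)
needed = servers_needed(0.99, 0.99999)    # assumed "five nines" target
print(round(two_servers, 6), needed)
```

The multiplication is valid only because the servers are assumed to fail independently; a shared power supply or network link would break that assumption.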

126
Applications of Independence the Five Nines Rule
127
Contingency Tables
128
Contingency Tables Example: Salary Gains and MBA
Tuition
129
Contingency Tables Example: Salary Gains and MBA
Tuition
  • Are large salary gains more likely to accrue to
    graduates of high-tuition MBA programs?
  • For example, find the marginal probability of a
    small salary gain (P(S1)).
  • The marginal probability of a single event is
    found by dividing a row or column total by the
    total sample size.
  • P(S1) = 17/67 = 0.2537
  • Conclude that about 25% of salary gains at the
    top-tier schools were under $50,000.

130
Contingency Tables Example: Salary Gains and MBA
Tuition
  • Find the marginal probability of a low tuition
    P(T1).

P(T1) = 16/67 = 0.2388. There is a 24% chance that
a top-tier school's MBA tuition is under
$40,000.
131
Contingency Tables Example: Salary Gains and MBA
Tuition
  • Find the joint probability of a low tuition and
    large salary gains P(T1 ∩ S3)
  • P(T1 ∩ S3) = 1/67 = 0.0149
  • There is less than a 2% chance that a top-tier
    school has both low tuition and large salary
    gains.

132
Contingency Tables Example: Salary Gains and MBA
Tuition
  • Find the conditional probability that the salary
    gains are small (S1) given that the MBA tuition
    is large (T3).
  • P(S1 | T3) = 5/32 = 0.1563
  • There is about a 16% chance that a top-tier school
    has small salary gains given the tuition is
    large.

133
Salary Gains and MBA Tuition - Independence
  • To check for independent events in a contingency
    table, compare the conditional to the marginal
    probabilities.
  • For example, if small salary gains (S1) were
    independent of high tuition (T3), then P(S1 | T3)
    = P(S1).
  • What do you conclude about events S1 and T3?
  • Since P(S1 | T3) = 0.1563 differs from P(S1) =
    0.2537, they are dependent (not independent).
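
The comparison uses only counts already quoted on the preceding slides: 17 of 67 schools had small salary gains, and 5 of the 32 high-tuition schools did. The sketch below keeps the values as exact fractions; the variable names are introduced here.

```python
from fractions import Fraction

p_s1 = Fraction(17, 67)             # marginal P(S1): small salary gain
p_s1_given_t3 = Fraction(5, 32)     # conditional P(S1 | T3): small gain given high tuition

# Independence would require the conditional to equal the marginal.
independent = p_s1_given_t3 == p_s1
difference = float(p_s1 - p_s1_given_t3)

print(f"P(S1) = {float(p_s1):.4f}, P(S1 | T3) = {float(p_s1_given_t3):.4f}")
print("independent" if independent else "dependent", f"(difference {difference:.4f})")
```

With sample data the two probabilities will rarely match exactly, so in practice the size of the gap matters; a formal test of independence is outside the scope of this slide.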

134
Contingency Tables Relative Frequencies
  • Calculate the relative frequencies below for each
    cell of the cross-tabulation table to facilitate
    probability calculations.
  • Symbolic notation for relative frequencies