QMS 6351 Statistics and Research Methods: Analyzing the Relationship Between Two and More Variables. Chapter 2.4, Chapter 3.5, Chapter 14 (14.1-14.3, 14.6), Chapter 15 (15.1-15.3, 15.7)

Title: QMS6351 Chapter 14a Author: Vera Adamchik Last modified by: VADAMCHIK Created Date: 1/16/2002 12:22:10 AM Document presentation format: On-screen Show – PowerPoint PPT presentation

1
QMS 6351 Statistics and Research Methods
Analyzing the Relationship Between Two and More Variables
Chapter 2.4, Chapter 3.5, Chapter 14 (14.1-14.3, 14.6), Chapter 15 (15.1-15.3, 15.7)
  • Prof. Vera Adamchik

2
  • Chapter 2
  • Section 2.4
  • Crosstabulations and Scatter Diagrams

3
Crosstabulations
  • Crosstabulation is a method that can be used to
    summarize the data for two variables
    simultaneously.
  • Typically, the table's left and top margin labels
    define the classes for the two variables.
  • Crosstabulation can provide insight about the
    relationship between the variables.

4
Crosstabulations
  • Crosstabulation of Enrollment by Gender and
    Degree Level at a University
    (counts, with column percentages in parentheses)

                              Degree Level
    Gender    Undergraduate    Graduate         Doctorate       Total
    Male       7341  (47.0)    1937  (53.4)     172  (59.1)      9450  (48.3)
    Female     8294  (53.0)    1688  (46.6)     119  (40.9)     10101  (51.7)
    Total     15635 (100.0)    3625 (100.0)     291 (100.0)     19551 (100.0)
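
The percentages in parentheses can be reproduced from the counts. The sketch below assumes they are column percentages (each cell divided by its degree-level total), which is what the figures in the table are consistent with; the variable names are illustrative only.

```python
# Counts copied from the enrollment crosstabulation.
counts = {
    "Male":   {"Undergraduate": 7341, "Graduate": 1937, "Doctorate": 172},
    "Female": {"Undergraduate": 8294, "Graduate": 1688, "Doctorate": 119},
}
levels = ["Undergraduate", "Graduate", "Doctorate"]

# Column totals: total enrollment at each degree level.
col_totals = {lvl: sum(row[lvl] for row in counts.values()) for lvl in levels}

# Column percentages: each cell as a share of its column total.
col_pct = {
    gender: {lvl: round(100 * row[lvl] / col_totals[lvl], 1) for lvl in levels}
    for gender, row in counts.items()
}

for gender, row in col_pct.items():
    print(gender, row)
```

Reading down a column shows how the gender mix shifts across degree levels, which is the kind of insight a crosstabulation is meant to provide.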

5
Scatter Diagram
  • A scatter diagram is a graphical presentation of
    the relationship between two quantitative
    variables.

6
Scatter Diagram
  • Scatter Diagram for Engine Size and Gas Mileage
    of Eight Automobiles
    [Figure: in-city gas mileage (mpg) plotted against engine size (number of cylinders)]

7
Example Reed Auto Sales
  • Reed Auto periodically has a special week-long
    sale. As part of the advertising campaign Reed
    runs one or more television commercials during
    the weekend preceding the sale. Data from a
    sample of 5 previous sales showing the number of
    TV ads run and the number of cars sold in each
    sale are shown below. Develop a scatter diagram.

8
Example (cont.)
    Number of TV Ads    Number of Cars Sold
           1                    14
           3                    24
           2                    18
           1                    17
           3                    27

9
(No Transcript)
10
  • Chapter 3
  • Section 3.5
  • Measures of Association Between Two Variables
  • Covariance
  • Correlation Coefficient

11
  • Covariance is a descriptive measure of the linear
    association between two variables.
  • The value of covariance depends upon units of
    measurement.
  • A measure of the relationship between two
    variables that avoids this difficulty is the
    correlation coefficient.

12
Covariance
  • If the data sets are samples, the covariance is
    denoted by \(s_{xy}\).
  • If the data sets are populations, the covariance
    is denoted by \(\sigma_{xy}\).

13
Example Reed Auto Sales
  • Sample covariance
  • \(s_{xy} = 20/4 = 5\) (autos × TV ads)
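
A minimal sketch of where 20/4 comes from, assuming the usual sample covariance definition (sum of cross-deviations divided by n - 1). The function name `sample_covariance` is illustrative, not from the slides.

```python
# Reed Auto Sales sample: TV ads run and cars sold in five sales.
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

def sample_covariance(x, y):
    """Sum of cross-deviations from the means, divided by n - 1."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1)

s_xy = sample_covariance(tv_ads, cars_sold)
print(s_xy)  # 5.0
```

The cross-deviations sum to 20 and n - 1 = 4, which gives the value on the slide.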

14
Correlation Coefficient
  • If the data sets are samples, the correlation
    coefficient is denoted by \(r_{xy}\).
  • If the data sets are populations, the correlation
    coefficient is denoted by \(\rho_{xy}\).

15
Correlation Coefficient
  • The coefficient can take on values between -1 and
    1.
  • If \(r\) or \(\rho\) is near -1, it indicates a strong
    negative linear relationship.
  • If \(r\) or \(\rho\) is near 1, it indicates a strong
    positive linear relationship.

16
Example Reed Auto Sales
  • \(s_x^2 = 4/4 = 1\), so \(s_x = 1\).
  • \(s_y^2 = 114/4 = 28.5\), so \(s_y = 5.3385\).
  • Correlation coefficient
  • \(r_{xy} = 5/(1 \times 5.3385) = 0.936586\).
  • A strong positive linear relationship.
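
The same calculation as a short sketch, assuming the sample correlation is the sample covariance divided by the product of the two sample standard deviations. Helper names are illustrative.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

def mean(values):
    return sum(values) / len(values)

def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def sample_correlation(x, y):
    """Sample covariance scaled by both sample standard deviations."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)
    return s_xy / (sample_std(x) * sample_std(y))

r_xy = sample_correlation(tv_ads, cars_sold)
print(round(r_xy, 6))  # 0.936586
```

Unlike the covariance, this value does not change if the units of either variable are rescaled.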

17
  • If \(r\) or \(\rho\) = 1, it is a case of perfect positive
    linear correlation (all points are on a
    positively sloped straight line).
  • If \(r\) or \(\rho\) = -1, it is a case of perfect
    negative linear correlation (all points are on a
    negatively sloped straight line).
  • If \(r\) or \(\rho\) = 0, there is no linear correlation
    between the two variables (the points are
    scattered all over the diagram).

18
  • We would like to find an analytical/mathematical
    expression (a formula) for the relationship
    between TV ads and auto sales.
  • Both a scatter diagram and correlation
    coefficient suggest that there is a linear
    relationship between TV ads and auto sales.

19
(No Transcript)
20
Chapter 14 Outline
  • The simple linear regression model
  • The Least Squares Method
  • The coefficient of determination

21
Regression analysis
  • Regression analysis is a description or the study
    of the nature of the relationship between
    variables (for example, linear regression,
    non-linear regression, simple regression,
    multiple regression).

22
Functional vs. stochastic relationship
  • Functional (deterministic) relationship: the
    variables are perfectly related, and the relationship
    holds for every observation. Examples: the area of
    a square in mathematics, total revenue in economics.
  • Statistical (stochastic) relationship: the
    variables are not perfectly related; the
    relationship holds on average, not for each
    observation. Example: the marginal propensity to
    consume (MPC) in economics.

23
The simple linear regression
  • The simple linear regression model is a
    mathematical way of stating the linear
    statistical relationship between two variables.
  • The variable being predicted is called the
    dependent variable.
  • The variable being used to predict the value of
    the dependent variable is called the independent
    variable.

24
Regression equation
  • Regression equation: the equation that describes
    how the mean value (that is, on average) of the
    dependent variable (\(y\)) is related to the
    independent variable(s) (\(x\)).
  • Simple Linear Regression Equation
  • \(E(y) = \beta_0 + \beta_1 x\)
  • \(\beta_0\) and \(\beta_1\) are referred to as the parameters of
    the model.

25
Regression model
  • Regression model: the equation that describes
    how the dependent variable is related to the
    independent variable(s) and an error term.
  • Simple Linear Regression Model
  • \(y = \beta_0 + \beta_1 x + \varepsilon\)
  • \(\varepsilon\) (the Greek letter epsilon) is a random
    variable referred to as the error term. It
    absorbs the impact of all other variables on \(y\).

26
Estimated regression equation
  • We will use a sample to estimate the population
    parameters \(\beta_0\) and \(\beta_1\). Sample statistics (denoted
    \(b_0\) and \(b_1\)) serve as estimates of \(\beta_0\) and \(\beta_1\).
    Substituting the values of \(b_0\) and \(b_1\) in the
    regression equation, we obtain the estimated
    regression equation.
  • Estimated Simple Linear Regression Equation
  • \(\hat{y} = b_0 + b_1 x\)
  • \(\hat{y}\) is the estimated mean value of \(y\) for a given value of
    \(x\).



27
The Least Squares Method
  • Least Squares Criterion
  • \(\min \sum (y_i - \hat{y}_i)^2\)
  • where
  • \(y_i\) = observed value of the dependent variable
    for the \(i\)th observation
  • \(\hat{y}_i\) = estimated value of the dependent variable
    for the \(i\)th observation



28
The Least Squares Method
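  • Slope for the Estimated Regression Equation
  • \(b_1 = \dfrac{\sum x_i y_i - (\sum x_i \sum y_i)/n}{\sum x_i^2 - (\sum x_i)^2/n}\)
  • This formula appears in the footnote on p. 568
  • y-Intercept for the Estimated Regression
    Equation
  • \(b_0 = \bar{y} - b_1 \bar{x}\)
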
29
Example Reed Auto Sales
  • Slope for the Estimated Regression Equation
  • \(b_1 = \dfrac{220 - (10)(100)/5}{24 - (10)^2/5} = \dfrac{20}{4} = 5\)
  • y-Intercept for the Estimated Regression
    Equation
  • \(b_0 = 20 - 5(2) = 10\)
  • Estimated Regression Equation
  • \(\hat{y} = 10 + 5x\)
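
The arithmetic above can be checked with a short sketch. It assumes the computational form of the least squares slope implied by the numbers on this slide (sums of x, y, xy and x squared); the function name `least_squares` is illustrative.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]

def least_squares(x, y):
    """Return (b0, b1) for the line y_hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (sum xy - sum x * sum y / n) / (sum x^2 - (sum x)^2 / n)
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: y_bar - b1 * x_bar
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

b0, b1 = least_squares(tv_ads, cars_sold)
print(b0, b1)  # 10.0 5.0
```

With the sums 220, 10, 100 and 24 from the sample, the slope is 20/4 = 5 and the intercept is 20 - 5(2) = 10.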


30
Interpretation
  • \(b_0\) is the expected value of \(y\) when \(x = 0\) (this may
    be meaningless in practice). In our example, when the number of
    TV ads is zero, the expected number of cars sold
    is 10.
  • \(b_1\) is the change in the expected value of \(y\) when
    \(x\) changes by 1 unit of its measurement, ceteris
    paribus. In our example, when the number of TV
    ads increases by 1, the number of cars sold is
    expected to increase by 5 cars.

31
(No Transcript)
32
SST, SSR, SSE
  • Relationship Among SST, SSR, SSE
  • SST = SSR + SSE
  • SST: total variation in Y
  • SSR: variation in Y due to X
  • SSE: variation in Y due to all other factors
33
x    y    yhat   y-ybar   (y-ybar)^2   yhat-ybar   (yhat-ybar)^2   y-yhat   (y-yhat)^2
1    14   15     -6       36           -5          25              -1       1
3    24   25      4       16            5          25              -1       1
2    18   20     -2        4            0           0              -2       4
1    17   15     -3        9           -5          25               2       4
3    27   25      7       49            5          25               2       4
Sum                      114                      100                      14
xbar = 2, ybar = 20; SST = 114, SSR = 100, SSE = 14
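
The column totals in this table can be recomputed directly. The sketch below takes the fitted line y-hat = 10 + 5x from the Reed Auto example as given; variable names are illustrative.

```python
tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
b0, b1 = 10, 5  # estimated regression equation from the example

y_bar = sum(cars_sold) / len(cars_sold)
y_hat = [b0 + b1 * x for x in tv_ads]

# Total variation in y around its mean.
sst = sum((y - y_bar) ** 2 for y in cars_sold)
# Variation explained by the regression line.
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
# Variation left in the residuals.
sse = sum((y - yh) ** 2 for y, yh in zip(cars_sold, y_hat))

print(sst, ssr, sse)  # 114.0 100.0 14.0
```

The identity SST = SSR + SSE holds here because the line was fitted by least squares with an intercept.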
34
Coefficient of Determination
  • Coefficient of determination represents the
    proportion of SST that is explained by the use of
    the regression model.
  • Coefficient of Determination
  • \(r^2 = \mathrm{SSR}/\mathrm{SST}\)
  • \(0 \le r^2 \le 1\)

35
Example Reed Auto Sales
  • Coefficient of Determination
  • \(r^2 = \mathrm{SSR}/\mathrm{SST} = 100/114 = 0.877193\)
  • The regression relationship is very strong: about
    87.7% of the variation in number of cars sold can
    be explained by the linear relationship between
    the number of TV ads and the number of cars sold.

36
The Correlation Coefficient
  • The correlation coefficient measures the strength
    of the linear association between two variables.
  • The sample correlation coefficient is plus or
    minus the square root of the coefficient of
    determination, taking the sign of \(b_1\).
  • Sample Correlation coefficient
  • \(r_{xy} = (\text{sign of } b_1)\sqrt{r^2} = +\sqrt{0.877193} = 0.936586\)
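
A small sketch of the sign rule, using the SSR, SST and slope values from the Reed Auto example. `math.copysign` is used here only as a convenient way to attach the sign of the slope to the square root.

```python
import math

ssr, sst = 100.0, 114.0  # sums of squares from the Reed Auto example
b1 = 5.0                 # estimated slope

r_squared = ssr / sst
# The correlation takes the sign of the slope b1.
r_xy = math.copysign(math.sqrt(r_squared), b1)

print(round(r_squared, 6), round(r_xy, 6))  # 0.877193 0.936586
```

This matches the correlation computed earlier from the covariance and the standard deviations.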
37
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.936586
R Square             0.877193
Adjusted R Square    0.836257
Standard Error       2.160247
Observations         5

ANOVA
              df    SS     MS          F           Significance F
Regression    1     100    100         21.42857    0.018986
Residual      3     14     4.666667
Total         4     114

                Coefficients   Standard Error   t Stat      P-value     Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept       10             2.366432         4.225771    0.024236    2.468958    17.53104    2.468958      17.53104
X Variable 1    5              1.080123         4.6291      0.018986    1.562565    8.437435    1.562565      8.437435
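
Several entries in this output can be recomputed by hand. The sketch below assumes the standard simple-regression formulas for the standard error of the estimate, the F statistic, and the coefficient standard errors; it leaves out the p-values and confidence limits, which need the t and F distributions.

```python
import math

tv_ads = [1, 3, 2, 1, 3]
cars_sold = [14, 24, 18, 17, 27]
n = len(tv_ads)

x_bar = sum(tv_ads) / n
y_bar = sum(cars_sold) / n
s_xx = sum((x - x_bar) ** 2 for x in tv_ads)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(tv_ads, cars_sold)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(tv_ads, cars_sold))
ssr = sum((b0 + b1 * x - y_bar) ** 2 for x in tv_ads)

mse = sse / (n - 2)            # residual mean square, df = n - 2
msr = ssr / 1                  # regression mean square, df = 1
std_error = math.sqrt(mse)     # "Standard Error" under Regression Statistics
f_stat = msr / mse

se_b1 = std_error / math.sqrt(s_xx)
se_b0 = std_error * math.sqrt(1 / n + x_bar ** 2 / s_xx)
t_b1 = b1 / se_b1
t_b0 = b0 / se_b0

print(round(std_error, 6), round(f_stat, 5))
print(round(se_b0, 6), round(t_b0, 6))
print(round(se_b1, 6), round(t_b1, 4))
```

In simple regression the F statistic equals the square of the slope's t statistic, which is why both rows share the same p-value of 0.018986.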
38
Chapter 15 Outline
  • The multiple linear regression model
  • The Least Squares Method
  • The multiple coefficient of determination
  • Categorical independent variables

39
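  • Multiple Regression Equation
  • \(E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\)
  • Multiple Regression Model
  • \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon\)
  • Estimated Multiple Regression Equation
  • \(\hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p\)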

40
Multiple coefficient of determination
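  • \(R^2 = \mathrm{SSR}/\mathrm{SST}\)
  • Adjusted multiple coefficient of determination
  • \(R_a^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - p - 1}\)
  • where \(n\) is the number of observations and \(p\) is the number of independent variables.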
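
A minimal sketch of the adjustment, assuming the usual formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of independent variables. The two checks use the R Square values reported in the Excel outputs in this deck; the function name is illustrative.

```python
def adjusted_r_squared(r_squared, n, p):
    """Penalize R^2 for the number of independent variables p."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Reed Auto Sales: n = 5 observations, p = 1 independent variable.
simple = adjusted_r_squared(0.877193, n=5, p=1)
# Programmer Salary Survey: n = 20 observations, p = 2 independent variables.
multiple = adjusted_r_squared(0.834179, n=20, p=2)

print(round(simple, 6), round(multiple, 6))
```

Both results agree with the Adjusted R Square lines in the corresponding Excel outputs to about five decimal places.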

41
Example Programmer Salary Survey
  • A software firm collected data for a sample of 20
    computer programmers. A suggestion was made that
    regression analysis could be used to determine if
    salary was related to the years of experience and
    the score on the firm's programmer aptitude test.
  • The years of experience, score on the aptitude
    test, and corresponding annual salary
    ($1000s) for a sample of 20 programmers are shown
    on the next slide.

42
Exper. (Yrs.)   Test Score   Salary ($1000s)      Exper. (Yrs.)   Test Score   Salary ($1000s)
      4             78           24.0                   9             88           38.0
      7            100           43.0                   2             73           26.6
      1             86           23.7                  10             75           36.2
      5             82           34.3                   5             81           31.6
      8             86           35.8                   6             74           29.0
     10             84           38.0                   8             87           34.0
      0             75           22.2                   4             79           30.1
      1             80           23.1                   6             94           33.9
      6             83           30.0                   3             70           28.2
      6             91           33.0                   3             89           30.0
43
  • Suppose we believe that salary (\(y\)) is related to
    the years of experience (\(x_1\)) and the score on the
    programmer aptitude test (\(x_2\)) by the following
    regression model:
  • \(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon\)
  • where
  • \(y\) = annual salary ($1000s),
  • \(x_1\) = years of experience,
  • \(x_2\) = score on programmer aptitude test.

44
Solving for the Estimates of \(\beta_0\), \(\beta_1\), \(\beta_2\)
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
45
Estimated Regression Equation
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
Note: Predicted salary will be in thousands of
dollars.
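
The estimated equation can be reproduced without Excel. The sketch below solves the least squares normal equations in plain Python; it is one possible approach, not necessarily how Excel computes the fit. The pairing of salaries with experience and test score is an assumption taken from a reconstruction of the data slide, and the helper names `solve` and `fit` are illustrative.

```python
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(columns, y):
    """Least squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]  # leading 1 for the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

b0, b1, b2 = fit([exper, score], salary)

y_bar = sum(salary) / len(salary)
fitted = [b0 + b1 * e + b2 * s for e, s in zip(exper, score)]
sst = sum((y - y_bar) ** 2 for y in salary)
ssr = sum((f - y_bar) ** 2 for f in fitted)
r_squared = ssr / sst

print(round(b0, 3), round(b1, 3), round(b2, 3), round(r_squared, 5))
```

With the data paired as above, the fit should agree with the coefficients 3.174, 1.404 and 0.251 reported on this slide and with the sums of squares quoted later in the deck.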
46
Interpreting the Coefficients
In multiple regression analysis, we interpret
each regression coefficient as follows:
\(b_i\) represents an estimate of the change in \(y\)
corresponding to a 1-unit increase in \(x_i\) when
all other independent variables are held
constant.
47
Interpreting the Coefficients
\(b_1 = 1.404\)
Salary is expected to increase by $1,404 for
each additional year of experience (when the
variable score on programmer aptitude test is
held constant).
48
Interpreting the Coefficients
\(b_2 = 0.251\)
Salary is expected to increase by $251 for
each additional point scored on the programmer
aptitude test (when the variable years of
experience is held constant).
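
The "held constant" interpretation can be illustrated by plugging values into the estimated equation. The sketch below uses the rounded coefficients from the slides; the function name `predict_salary` and the sample inputs are illustrative.

```python
def predict_salary(exper_years, test_score):
    """Predicted annual salary in thousands of dollars."""
    return 3.174 + 1.404 * exper_years + 0.251 * test_score

base = predict_salary(4, 78)        # 4 years of experience, test score of 78
plus_year = predict_salary(5, 78)   # one more year, same test score
plus_point = predict_salary(4, 79)  # one more test point, same experience

print(round(base, 3))               # 28.368
print(round(plus_year - base, 3))   # 1.404
print(round(plus_point - base, 3))  # 0.251
```

Changing one input while leaving the other fixed moves the prediction by exactly that input's coefficient, which is the meaning of each slope in the multiple regression equation.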
49
Multiple Coefficient of Determination
  • Excel's ANOVA Output
    [Callouts on the output mark SSR and SST.]
50
Multiple Coefficient of Determination
\(R^2 = \mathrm{SSR}/\mathrm{SST}\)
\(R^2 = 500.3285/599.7855 = 0.83418\)
51
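Adjusted Multiple Coefficient of Determination
\(R_a^2 = 1 - (1 - R^2)\dfrac{n - 1}{n - p - 1} = 1 - (1 - 0.834179)\dfrac{20 - 1}{20 - 2 - 1} = 0.814671\)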
52
  • Excel's Regression Statistics

Regression Statistics
Multiple R           0.913334
R Square             0.834179
Adjusted R Square    0.814671
Standard Error       2.418762
Observations         20
53
Categorical independent variables
  • To include categorical independent variables in
    a regression equation, we use dummy (0,1)
    variables. Dummy variables assume the value of 1
    if a specified characteristic is present and the
    value of 0 otherwise. For example, man = 1 and
    woman = 0.

54
Example (cont.) Programmer Salary Survey
  • As an extension of the problem involving the
    computer programmer salary survey, suppose that
    management also believes that the annual salary
    is related to whether the individual has a
    graduate degree in computer science or
    information systems.
  • The years of experience, the score on the
    programmer aptitude test, whether the individual
    has a relevant graduate degree, and the annual
    salary ($1000s) for each of the 20 sampled
    programmers are shown on the next slide.

55
Exper. (Yrs.)   Test Score   Degr.   Salary ($1000s)      Exper. (Yrs.)   Test Score   Degr.   Salary ($1000s)
      4             78        No         24.0                   9             88        Yes        38.0
      7            100        Yes        43.0                   2             73        No         26.6
      1             86        No         23.7                  10             75        Yes        36.2
      5             82        Yes        34.3                   5             81        No         31.6
      8             86        Yes        35.8                   6             74        No         29.0
     10             84        Yes        38.0                   8             87        Yes        34.0
      0             75        No         22.2                   4             79        No         30.1
      1             80        No         23.1                   6             94        Yes        33.9
      6             83        No         30.0                   3             70        No         28.2
      6             91        Yes        33.0                   3             89        No         30.0
56
Multiple Regression Model
\(y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon\)
where
\(y\) = annual salary ($1000s)
\(x_1\) = years of experience
\(x_2\) = score on programmer aptitude test
\(x_3\) = 0 if individual does not have a graduate degree,
          1 if individual does have a graduate degree
\(x_3\) is a dummy variable
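
A sketch of how the dummy variable enters the fit: the Yes/No degree column is recoded to 1/0 and treated as a third independent variable. The row pairing is an assumption taken from a reconstruction of the data slide, the solver is a plain normal-equations approach rather than Excel's method, and all helper names are illustrative. No expected coefficients are quoted, since the Excel output for this model is not reproduced in the transcript.

```python
exper = [4, 7, 1, 5, 8, 10, 0, 1, 6, 6, 9, 2, 10, 5, 6, 8, 4, 6, 3, 3]
score = [78, 100, 86, 82, 86, 84, 75, 80, 83, 91,
         88, 73, 75, 81, 74, 87, 79, 94, 70, 89]
degree = ["No", "Yes", "No", "Yes", "Yes", "Yes", "No", "No", "No", "Yes",
          "Yes", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No", "No"]
salary = [24.0, 43.0, 23.7, 34.3, 35.8, 38.0, 22.2, 23.1, 30.0, 33.0,
          38.0, 26.6, 36.2, 31.6, 29.0, 34.0, 30.1, 33.9, 28.2, 30.0]

# Dummy coding: 1 if the programmer has a graduate degree, 0 otherwise.
x3 = [1 if d == "Yes" else 0 for d in degree]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(columns, y):
    """Least squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0, *vals] for vals in zip(*columns)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

columns = [exper, score, x3]
coefs = fit(columns, salary)  # [b0, b1, b2, b3]

y_bar = sum(salary) / len(salary)
fitted = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], vals))
          for vals in zip(*columns)]
sse = sum((y - f) ** 2 for y, f in zip(salary, fitted))
sst = sum((y - y_bar) ** 2 for y in salary)
r_squared = 1 - sse / sst

print([round(c, 3) for c in coefs], round(r_squared, 4))
```

The coefficient on the dummy, the last entry of `coefs`, estimates the salary difference in thousands of dollars between programmers with and without a graduate degree, holding experience and test score constant.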
57
  • Excel's Regression Equation Output

Note: Columns F-I are not shown.
58
  • Excel's ANOVA Output

59
  • Excel's Regression Statistics

60
More Complex Categorical Variables
  • If a categorical variable has k levels, k - 1
    dummy variables are required, with each dummy
    variable being coded as 0 or 1.
  • For example, a variable with levels A, B, and C
    could be represented by x1 and x2 values of (0,
    0) for A, (1, 0) for B, and (0,1) for C.
  • Care must be taken in defining and interpreting
    the dummy variables.
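
The A, B, C coding described above can be sketched as a small helper. Treating the first level as the baseline (all zeros) is an assumption that matches the example; the function name `dummy_code` is illustrative.

```python
def dummy_code(value, levels):
    """Return k - 1 indicators (0 or 1); levels[0] is the all-zero baseline."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return tuple(int(value == level) for level in levels[1:])

levels = ["A", "B", "C"]
codes = {level: dummy_code(level, levels) for level in levels}
print(codes)  # {'A': (0, 0), 'B': (1, 0), 'C': (0, 1)}
```

Each dummy's coefficient is then read as the difference from the baseline level, which is why the choice of baseline matters when interpreting the results.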

61
  • For example, a variable indicating level of
    education could be represented by x1 and x2
    values as follows