Chapter 3

- Association Between Two Variables

True Interest of Statistics

- Examine the relationship between two or more variables.
- Does an association exist? Does the value of one variable relate to another?

Terms

- Dependent (or response) variable
- Independent (or explanatory) variable

I) Two Categorical Variables

- Describe relationship using Contingency Table
- Each cell records the number of observations meeting the criteria set by the variables.

Example Contingency Table

- Question: Are pesticides less likely in organic food (p. 95)?
- 1) Identify explanatory and response variables.
- 2) Make a table.
- What do cell values represent?
- 3) Sum totals of each explanatory variable. WHY?

Example Contingency Table

- 4) Conditional upon being a certain type of produce (Type X), what is the chance of finding pesticides?
- 5) Conclude
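
Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration: the counts below are invented for the sketch and are not the textbook's data, and the names `observations`, `cells`, `row_totals`, and `conditional` are hypothetical.

```python
from collections import Counter

# HYPOTHETICAL observations (food type, pesticide status). These counts are
# invented for illustration and are not the textbook's data.
observations = (
    [("organic", "present")] * 3
    + [("organic", "absent")] * 7
    + [("conventional", "present")] * 8
    + [("conventional", "absent")] * 2
)

# Step 2: each cell counts the observations meeting both criteria.
cells = Counter(observations)

# Step 3: total for each category of the explanatory variable (food type).
row_totals = Counter(food_type for food_type, _ in observations)

# Step 4: proportion with pesticides present, conditional on food type.
conditional = {
    food_type: cells[(food_type, "present")] / row_totals[food_type]
    for food_type in row_totals
}

for food_type, share in conditional.items():
    print(f"P(pesticides present | {food_type}) = {share:.2f}")
```

Dividing by the row totals from step 3 is what makes the two food types comparable even when they have different numbers of samples.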

Excel Example Contingency Table

Other Examples of Two Categorical Variables

- Condition of Home vs. Home Ownership
- Gender vs. Smoking
- Others?

Identifying Explanatory and Response Variables

- THINK of cause and effect: the response variable is the effect.
- Direction of arrow
- Income → Political ID
- Greenhouse Gas → Global Temperature

But

- CORRELATION (ASSOCIATION) DOES NOT IMPLY CAUSATION.
- Reverse Causality (Feedback Loop)
- Omitted Variables
- This course: Association/Correlation
- Advanced course: Causation
- Be careful about language
- Side note: A time component helps establish causation

II) Comparing a Categorical and a Quantitative Variable

- Easier to analyze if the categorical variable is the explanatory variable.
- Conditional upon being in group X, what are the characteristics of Y?
- Gender and height
- Race and income

Options for Comparing

- 1) Compare summary statistics for two groups.
- 2) Recode categorical data as zeros and ones, then draw a scatterplot.

Excel Example Ch3-1.xls

- 1) Sort data
- 2) Create table
- 3) Since women's heights are in Column B, Rows 2-263, type =average(B2:B263) in the table. For men, type =average(B264:B382).

Excel Example Ch3-1.xls

- 4) Standard deviation (of a sample) is =stdev(B2:B263). To find this formula:
- Excel 2007: Formulas → More Functions → Statistical → stdev
- ALL Excel: Go to Function Wizard → More Functions → Standard Deviation, then select one of the possibilities.
- 5) Coefficient of Variation?
- Advantage of CV over s?
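
The group summaries in steps 3 through 5 can also be sketched outside Excel. The Python below is a minimal illustration under stated assumptions: the heights are hypothetical (not the contents of Ch3-1.xls), `summarize` is a helper invented for this sketch, and the coefficient of variation is taken to be s divided by the mean.

```python
from statistics import mean, stdev

# HYPOTHETICAL heights in inches; not the contents of Ch3-1.xls.
heights = {
    "Female": [62.0, 64.0, 65.0, 66.0, 68.0],
    "Male": [66.0, 69.0, 70.0, 71.0, 74.0],
}

def summarize(values):
    """Return the mean, sample standard deviation, and coefficient of variation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"mean": m, "s": s, "cv": s / m}

summary = {group: summarize(values) for group, values in heights.items()}
for group, stats in summary.items():
    print(f"{group}: mean={stats['mean']:.2f}, s={stats['s']:.2f}, CV={stats['cv']:.3f}")

# Converting inches to centimeters rescales s but leaves the CV unchanged.
female_cm = summarize([2.54 * h for h in heights["Female"]])
print(f"Female in cm: s={female_cm['s']:.2f}, CV={female_cm['cv']:.3f}")
```

The last two lines hint at one possible answer to the CV question: because the CV divides s by the mean, the units cancel.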

Excel Example Ch3-1.xls

- 6) Scatterplot
- i) Type =if(A2="Female",1,0) in the Female column. Why?
- ii) Double-click the crosshair on the lower-right corner of the cell. Why?
- iii) Make Chart
- Excel 2007: Highlight Female and Height Data, go to Insert → Chart → Scatter
- Old Excel: Chart Wizard → XY(Scatter) → Finish
- What is true of all observations at X = 0?
- What can we learn from this chart?
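
The recoding in step 6 can be sketched in Python as well. The rows below are hypothetical (not the contents of Ch3-1.xls), and the variable names are invented for the sketch; the conditional expression mirrors the logic of the spreadsheet's IF formula.

```python
# HYPOTHETICAL rows of (gender, height in inches); not the contents of Ch3-1.xls.
rows = [("Female", 64.0), ("Male", 70.0), ("Female", 66.0), ("Male", 72.0)]

# Same logic as the spreadsheet's IF formula: 1 if the label is "Female", else 0.
points = [(1 if gender == "Female" else 0, height) for gender, height in rows]

# Split the scatterplot points by their X value.
heights_at_zero = [height for female, height in points if female == 0]
heights_at_one = [height for female, height in points if female == 1]

print(points)
print("X = 0:", heights_at_zero)
print("X = 1:", heights_at_one)
```

In this sketch every point plotted at X = 0 is a male observation, so the chart shows two vertical strips of heights, one per group.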

III) Comparing Two Quantitative Variables

- Examples?

Options for Comparing

- Correlation, r
- A single number (numerical summary) describing the strength of a linear relationship between X and Y.
- Range of values: -1 ≤ r ≤ 1
- Strong Positive Relationship: r near +1
- Strong Negative Relationship: r near -1
- Insensitive to which of the two variables is X and which is Y.
- Unit-Free (Insensitive to units of measurement)

Options for Comparing

- Correlation, r
- Where ZX = deviation of X from its mean, measured in standard deviation units (and likewise ZY for Y)
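
The formula on this slide did not survive extraction. Assuming it is the usual sample correlation, r = (sum of ZX × ZY) / (n - 1), the calculation can be sketched in Python as follows. The data and the helper names `z_scores` and `correlation` are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize: deviation from the mean, in sample standard deviations."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# HYPOTHETICAL data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_swapped = correlation(y, x)                       # swap the roles of X and Y
r_rescaled = correlation([100 * v for v in x], y)   # change the units of X

print(f"r = {r:.4f}")
print(f"swapped = {r_swapped:.4f}, rescaled = {r_rescaled:.4f}")
```

The swapped and rescaled values match r, which illustrates the two properties listed earlier: r does not depend on which variable is called X, and it is unit-free.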

Options for Comparing

- Correlation, r
- Correlation does not imply causation
- Reverse Causality
- Omitted (Lurking) Variables
- Scatterplot
- Does a relationship exist?
- Is the relationship linear?
- Is the relationship positive or negative?

Options for Comparing

- Scatterplot
- Examples: scatterplots shown with their r values (figures not reproduced)

Options for Comparing

- Scatterplot
- Example: Most of the time, when X is above its mean, Y is above its mean, so ZX × ZY > 0 most of the time.
- This implies that r > 0.
- See the Correlation by Eye applet for more correlation examples.

Options for Comparing

- REGRESSION (Line of Best Fit)
- Suppose we believe a linear relationship between X and Y exists such that
- Y = α + βX + error
- We can develop statistics (a and b) to estimate the population parameters (α and β) that predict Y values given particular X values.

Excel Example Ch3-3.xls

- 3rd Party Votes in Florida, 1996 vs. 2000.
- What is X? What is Y?
- Why is this interesting?
- 1) Correlation
- i) =correl(B2:B68, C2:C68)
- ii) Tools → Data Analysis → Correlation

Excel Example Ch3-3.xls

- 2) Scatterplot
- i) Highlight columns
- ii) Choose Insert → Chart → Scatter (or Chart Wizard → XY(Scatter))
- 3) Regression / Line of Best Fit (all XLS versions)
- i) Right-click scatterplot data
- ii) Choose Add Trendline, then select the "Display equation on chart" option.
- iii) Interpretation of coefficients?
- iv) Much more on this later

Regression Intuition

- We observe Y, but develop statistics to predict Ŷ for given X values.
- The difference (Y - Ŷ) is called a residual.
- A regression line of best fit minimizes the sum of squared residuals. That is, it makes the errors associated with our predictions as small as possible.

Regression Intuition

- What do residuals look like?
- Least squares regressions find the a and b values that minimize the sum of squared residuals.
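
To make the least-squares idea concrete, here is a minimal Python sketch. It uses the standard closed-form solution for a single X variable; the data and the function names `least_squares` and `sum_squared_residuals` are hypothetical and chosen only for illustration.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals for y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def sum_squared_residuals(a, b, xs, ys):
    """Sum of (observed Y minus predicted Y) squared."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# HYPOTHETICAL data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(x, y)
best = sum_squared_residuals(a, b, x, y)
print(f"a = {a:.2f}, b = {b:.2f}, sum of squared residuals = {best:.2f}")

# Nudging either coefficient away from the fit increases the sum.
worse_a = sum_squared_residuals(a + 0.1, b, x, y)
worse_b = sum_squared_residuals(a, b + 0.1, x, y)
print(worse_a > best, worse_b > best)
```

The two comparisons at the end are a spot check, not a proof: any other line through these points gives a larger sum of squared residuals than the fitted one.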

Regression Notes

- Size of b says nothing about the strength of the statistical relationship. If you change the units of X, you will change the size of b, but not r.
- Although b = r (SY/SX), you can have r = 1 but still find b to be meaningless.
- A regression of Annual Wages on Number of Children for men earning $20,000 to $40,000 reveals
- Wage = 29600 + 178 Children
- What does this mean?
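
The first two notes above can be checked numerically. The Python sketch below uses hypothetical data, with X measured first in years and then in months, to show that rescaling X changes b but not r, and that b equals r (SY/SX). The function names are invented for the sketch.

```python
from statistics import mean, stdev

def slope(xs, ys):
    """Least-squares slope b for the line y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def correlation(xs, ys):
    """Sample correlation r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# HYPOTHETICAL data: the same X values expressed in years and in months.
x_years = [1.0, 2.0, 3.0, 4.0, 5.0]
x_months = [12 * v for v in x_years]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b_years, r_years = slope(x_years, y), correlation(x_years, y)
b_months, r_months = slope(x_months, y), correlation(x_months, y)

print(f"years:  b = {b_years:.4f}, r = {r_years:.4f}")
print(f"months: b = {b_months:.4f}, r = {r_months:.4f}")

# b can be recovered from r and the two sample standard deviations.
b_from_r = r_years * stdev(y) / stdev(x_years)
print(f"r * (SY/SX) = {b_from_r:.4f}")
```

Switching from years to months divides b by 12 while r stays the same, which is why the size of b alone says nothing about the strength of the relationship.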

Regression Potential Problems

- Extrapolation
- Previous example: What about men who earn more than $100,000 per year?
- Time Series: ESPN's Sports Figures episode starring Marion Jones. Let the Y-variable represent Ms. Jones's fastest time in the 100m Dash. Let X = Years Since 1989, the year she started competing. What will happen 120 years after her first race?

Regression Potential Problems

- Correlation does not imply causation.
- Reverse Causality
- Omitted Variables
- Book Example: A study found that, over a three-year period, the proportion of smokers who died was less than the proportion of non-smokers who died. Is smoking good for you?