# Association Between Two Variables


Slides: 34
Provided by: chadsp
Transcript and Presenter's Notes

Title: Association Between Two Variables

1
Chapter 3
• Association Between Two Variables

2
True Interest of Statistics
• Examine relationship between two or more
variables.
• Does an association exist? Does the value of one
variable relate to another?

3
Terms
• Dependent (or response) variable
• Independent (or explanatory) variable

4
I) Two Categorical Variables
• Describe relationship using Contingency Table
• Each cell records the number of observations
meeting criteria set by variables.

5
Example Contingency Table
• Question: Are pesticides less likely in organic
food (p. 95)?
• 1) Identify explanatory and response variables.
• 2) Make a Table
• What do cell values represent?
• 3) Sum totals of each explanatory variable. WHY?

6
Example Contingency Table
• 4) Conditional upon being a certain type of
produce (Type X), what is the chance of finding
pesticides?
• 5) Conclude
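
Steps 3 to 5 can be sketched in Python. The counts below are made-up placeholders, not the figures from p. 95, and the names `table` and `conditional_proportions` are illustrative only.

```python
# Hypothetical counts (NOT the textbook's data): rows are the explanatory
# variable (food type), columns are the response (pesticide status).
table = {
    "Organic":      {"Present": 30, "Not present": 90},
    "Conventional": {"Present": 80, "Not present": 40},
}


def conditional_proportions(table):
    """Divide each cell by its row total, giving P(response | explanatory)."""
    result = {}
    for row_name, cells in table.items():
        row_total = sum(cells.values())  # step 3: total for each explanatory category
        result[row_name] = {k: v / row_total for k, v in cells.items()}
    return result


props = conditional_proportions(table)
for food_type, row in props.items():
    print(f"{food_type}: P(pesticide present) = {row['Present']:.2f}")
```

Dividing by the row totals is one answer to the "WHY?" in step 3: the two groups may differ in size, so raw counts are not directly comparable but conditional proportions are.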

7
Excel Example Contingency Table

8
Other Examples of Two Categorical Variables
• Condition of Home vs. Home Ownership
• Gender vs. Smoking
• Others?

9
Identifying Explanatory and Response Variables
• Think of cause and effect: the response variable is
the effect
• Direction of arrow
• Income → Political ID
• Greenhouse Gas → Global Temperature

10
But
• CORRELATION (ASSOCIATION) DOES NOT IMPLY
CAUSATION.
• Reverse Causality (Feedback Loop)
• Omitted Variables
• This course: Association/Correlation
• Advanced course: Causation
• Be careful about language
• Side note: A time component helps establish causation

11
II) Comparing a Categorical and a Quantitative
Variable
• Easier to analyze if the categorical variable is
the explanatory variable.
• Conditional upon being in group X, what are the
characteristics of Y?
• Gender and height
• Race and income

12
Options for Comparing
• 1) Compare summary statistics for two groups.
• 2) Recode categorical data as zeros and ones,
then draw a scatterplot.

13
Excel Example Ch3-1.xls
• 1) Sort data
• 2) Create table
• 3) Since women's heights are in Column B, Rows
2-263, type =average(B2:B263) in the table.
For men, type =average(B264:B382).

14
Excel Example Ch3-1.xls
• 4) Standard deviation (of a sample) is
=stdev(B2:B263). To find this formula:
• Excel 2007: Formulas → More Functions →
Statistical → stdev
• All Excel versions: Go to Function Wizard → More Functions
→ Standard Deviation, then select one of the
possibilities.
• 5) Coefficient of Variation?
• Advantage of CV over s?
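
The same group summaries can be sketched outside Excel. The heights below are invented for illustration and are not the values in Ch3-1.xls; `summarize` is a hypothetical helper name.

```python
from statistics import mean, stdev

# Hypothetical heights in inches (NOT the values in Ch3-1.xls).
heights = {
    "Female": [62.0, 64.0, 65.0, 66.0, 68.0],
    "Male":   [67.0, 69.0, 70.0, 71.0, 73.0],
}


def summarize(values):
    """Return the mean, sample standard deviation, and coefficient of variation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return {"mean": m, "sd": s, "cv": s / m}


summary = {group: summarize(values) for group, values in heights.items()}
for group, stats in summary.items():
    print(f"{group}: mean={stats['mean']:.2f}, sd={stats['sd']:.2f}, cv={stats['cv']:.3f}")
```

In this made-up data both groups have the same standard deviation but different means, so the CV differs. Because the CV divides s by the mean, it is unit-free and lets you compare spread across groups measured on different scales.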

15
Excel Example Ch3-1.xls
• 6) Scatterplot
• i) Type =if(A2="Female",1,0) in Female
Column. Why?
• ii) Double click crosshair on lower-right corner
of the cell. Why?
• iii) Make Chart
• Excel 2007: Highlight Female and Height Data, Go
to Insert → Chart → Scatter
• Old Excel: Chart Wizard → XY(Scatter) → Finish
• What is true of all observations at X = 0?
• What can we learn from this chart?
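
The recoding step can be sketched as follows. The records are invented, not taken from Ch3-1.xls, and the variable names are illustrative.

```python
from statistics import mean

# Hypothetical (gender, height in inches) records, NOT the Ch3-1.xls data.
records = [("Female", 63.0), ("Male", 70.0), ("Female", 66.0), ("Male", 72.0)]

# Same idea as the Excel IF formula: 1 if Female, otherwise 0.
points = [(1 if gender == "Female" else 0, height) for gender, height in records]

# With this coding, every observation at X = 0 is a man
# and every observation at X = 1 is a woman.
mean_at_0 = mean(h for x, h in points if x == 0)
mean_at_1 = mean(h for x, h in points if x == 1)

print(points)
print("mean height at X=0:", mean_at_0, "and at X=1:", mean_at_1)
```

Plotting `points` as a scatterplot gives two vertical columns of dots, one per group, so the chart shows both the difference in centers and the spread within each group.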

16
III) Comparing Two Quantitative Variables
• Examples?

17
Options for Comparing
• Correlation, r
• A single number (numerical summary) describing
the strength of a linear relationship between X
and Y.
• Range of values: −1 ≤ r ≤ 1
• Strong Positive Relationship: r near +1
• Strong Negative Relationship: r near −1
• Insensitive to which of the two variables is X
and which is Y.
• Unit-Free (Insensitive to units of measurement)

18
Options for Comparing
• Correlation, r
• Where ZX = deviations from the mean
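
The slide's equation did not survive in this transcript. A common textbook form, assumed here, is r = Σ(ZX × ZY) / (n − 1), where ZX and ZY are deviations from the mean measured in standard deviations. A minimal sketch with invented data:

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize: deviation from the mean divided by the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Assumed textbook form: sum of products of z-scores, divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Invented data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = correlation(x, y)
r_swapped = correlation(y, x)                      # swap the roles of X and Y
r_rescaled = correlation([v * 2.54 for v in x], y)  # change the units of X

print(round(r, 4), round(r_swapped, 4), round(r_rescaled, 4))
```

The three printed values agree, which illustrates two properties listed earlier: r does not depend on which variable is called X, and it does not depend on the units of measurement.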

19
Options for Comparing
• Correlation, r
• Correlation does not imply causation
• Reverse Causality
• Omitted (Lurking) Variables
• Scatterplot
• Does a relationship exist?
• Is the relationship linear?
• Is the relationship positive or negative?

20
Options for Comparing
• Scatterplot
• Example

21
Options for Comparing
• Scatterplot
• Example r

22
Options for Comparing
• Scatterplot
• Example r

23
Options for Comparing
• Scatterplot
• Example r

24
Options for Comparing
• Scatterplot
• Example: Most of the time, when X is above its
mean, Y is above its mean, so ZX × ZY > 0 most of
the time.
• This implies that r > 0
• See Correlation by Eye applet for more
correlation examples.

25
Options for Comparing
• REGRESSION (Line of Best Fit)
• Suppose we believe a linear relationship between
X and Y exists such that
• Y = α + βX + error
• We can develop statistics (a, b) to estimate
population parameters (α, β) that predict Y
values given particular X values.

26
Options for Comparing
• REGRESSION (Line of Best Fit)

27
Excel Example Ch3-3.xls
• 3rd Party Votes in Florida, 1996 vs. 2000.
• What is X? What is Y?
• Why is this interesting?
• 1) Correlation
• i) =correl(B2:B68, C2:C68)
• ii) Tools → Data Analysis → Correlation

28
Excel Example Ch3-3.xls
• 2) Scatterplot
• i) Highlight columns
• ii) Choose Insert → Chart → Scatter (or Chart
Wizard → XY(Scatter))
• 3) Regression / Line of Best Fit (all XLS
versions)
• i) Right-Click scatterplot data
• ii) Choose "Add Trendline", then select the "Display
equation on chart" option.
• iii) Interpretation of coefficients?
• iv) Much more on this later

29
Regression Intuition
• We observe Y, but develop statistics to predict Ŷ
for given X values.
• The difference (Y − Ŷ) is called a residual
• A regression line of best fit minimizes the sum
of squared residuals. That is, it makes the
errors associated with our predictions as small
as possible.

30
Regression Intuition
• What do residuals look like?
• Least squares regressions find a and b values that
minimize the sum of squared residuals.
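
A minimal sketch of the least squares idea, using the standard closed-form solution for one explanatory variable. The data are invented and the function names are illustrative.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals for y_hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Add up (Y - Y_hat)^2 over all observations."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Invented data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

a, b = least_squares(x, y)
best = sum_squared_residuals(x, y, a, b)

# Nudging either coefficient away from the fitted values increases the total.
worse_intercept = sum_squared_residuals(x, y, a + 0.1, b)
worse_slope = sum_squared_residuals(x, y, a, b - 0.05)

print(f"a={a:.2f}, b={b:.2f}, SSR={best:.2f}")
```

Comparing `best` with the two perturbed totals is a quick way to see what "line of best fit" means: no other choice of a and b gives a smaller sum of squared residuals.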

31
Regression Notes
• Size of b says nothing about the strength of the
statistical relationship. If you change the units
of X, you will change the size of b, but not r.
• Although b = r (SY/SX), you can have r = 1 but
still find b to be meaningless.
• A regression of Annual Wages on Number of
Children for men earning $20,000 to $40,000
reveals:
• Wage = 29600 + 178 × Children
• What does this mean?
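
The first note, that changing the units of X changes b but not r, can be checked numerically. This sketch uses invented data (heights in metres and an arbitrary response) and the relation b = r (SY/SX); `r_and_b` is a hypothetical helper name.

```python
from statistics import mean, stdev


def r_and_b(xs, ys):
    """Return the correlation r and the least squares slope b = r * (SY / SX)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    r = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys)) / (n - 1)
    return r, r * sy / sx


# Invented data for illustration only.
x_metres = [1.5, 1.6, 1.7, 1.8, 1.9]
y = [50.0, 58.0, 61.0, 70.0, 74.0]
x_centimetres = [v * 100 for v in x_metres]  # same heights, different units

r_m, b_m = r_and_b(x_metres, y)
r_cm, b_cm = r_and_b(x_centimetres, y)

print(f"metres:      r={r_m:.4f}, b={b_m:.4f}")
print(f"centimetres: r={r_cm:.4f}, b={b_cm:.4f}")
```

The slope shrinks by a factor of 100 when X is expressed in centimetres, while r is unchanged, so the size of b alone says nothing about how strong the relationship is.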

32
Regression Potential Problems
• Extrapolation
• Previous example: What about men who earn more
than $100,000 per year?
• Time series: ESPN's Sports Figures episode
starring Marion Jones. Let the Y-variable
represent Ms. Jones's fastest time in the 100m
Dash. Let X = years since 1989, the year she
started competing. What will happen 120 years
after her first race?

33
Regression Potential Problems
• Correlation does not imply causation.
• Reverse Causality
• Omitted Variables
• Book example: A study found that, over a three-
year period, the proportion of smokers who died
was less than the proportion of non-smokers who
died. Is smoking good for you?
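
One way an omitted variable can produce such a result is sketched below. The counts are invented purely for illustration and are not the book's data; age is assumed here as the lurking variable.

```python
# Invented counts chosen only to illustrate a lurking variable (age).
# Each entry is (deaths, group size).
data = {
    "Young": {"Smoker": (5, 100), "Non-smoker": (1, 50)},
    "Old":   {"Smoker": (10, 20), "Non-smoker": (40, 100)},
}


def death_rate(deaths, total):
    return deaths / total


def overall_rate(status):
    """Pool both age groups, ignoring age entirely."""
    deaths = sum(data[age][status][0] for age in data)
    total = sum(data[age][status][1] for age in data)
    return death_rate(deaths, total)


overall = {s: overall_rate(s) for s in ("Smoker", "Non-smoker")}
by_age = {age: {s: death_rate(*data[age][s]) for s in data[age]} for age in data}

print("overall:", overall)
print("by age: ", by_age)
```

In these made-up numbers smokers have the lower death rate overall, yet the higher death rate within each age group, because most of the smokers are young. Ignoring age reverses the apparent direction of the association.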