Multivariate Analysis

- Overview

Introduction

- Multivariate thinking
- Body of thought processes that illuminate the interrelatedness between and within sets of variables.
- The essence of multivariate thinking is to expose the inherent structure and meaning revealed within these sets of variables through application and interpretation of various statistical methods

Why the multivariate approach?

- Big idea: multiple response outcomes
- With univariate analyses we have just one dependent variable of interest
- Although any analysis of data involving more than one variable could be seen as multivariate, we typically reserve the term for multiple dependent variables
- So MV analysis is an extension of UV ones, or conversely, many of the UV analyses are special cases of MV ones

Why MV over the univariate approach?

- Complexity
- The subject/data studied may be more complex than what univariate methods can offer in terms of analysis
- Reality
- In some cases it would be inappropriate to conduct univariate analysis as the data/research demand a multivariate analysis

Why MV over the univariate approach?

- Experimental data
- Although experimental research can be and often is multivariate, typically subjects are assigned to groups and the manipulations regard corresponding changes to a single outcome
- Different doses of caffeine → test performance
- Causality is more easily deduced
- Non-experimental data
- Likewise survey/inventory data might be analyzed in univariate fashion, but typically it will require the multivariate approach to solve the questions stemming from it
- Correlational

Why not MV?

- In the past the computations were overwhelming even with smaller datasets, and so MV analyses were typically avoided
- Now this is not a problem, but there are still reasons not to do an MV analysis

Why not MV?

- Ambiguity
- MV analysis may result in a less clear understanding of the data
- E.g. group differences on a linear combination of DVs (MANOVA)
- Differences are easily interpreted in a univariate sense
- Ambiguity because of ignorance of the technique is not a valid reason, however
- Unnecessary complexity
- Just because SEM looks neat/is popular doesn't mean you have to do one, or that it is the best way to answer your research question
- No free lunch
- MV analyses come with their own rules and assumptions that may make analysis difficult or not as strong

Multivariate Pros and Cons Summary

- Advantages of using a multivariate statistic
- Richer realistic design
- Looks at phenomena in an overarching way (provides multiple levels of analysis)
- Each method differs in amount or type of Independent Variables (IVs) and DVs
- Can help control for Type I Error
- Disadvantages
- Larger Ns are often required
- More difficult to interpret
- Less known about the robustness of assumptions

Primary purposes of MV analysis

- Prediction and explanation
- Determining structure

Prediction

- The goal in most research situations is to be able to predict outcomes based on prior information
- E.g. given a person's gender and region, what will their attitude be on some social issue?
- Given a number of variables, how well can we predict group membership?
- Explanation
- Which variables are most important in the prediction of some outcome?
- In many cases this is the end goal of an analysis, though a very problematic one

A caveat regarding explanation

- Determining variable importance can be a suspect endeavor
- Something that might be deemed a statistically significant variable may not make the cut had the study been conducted again
- Depending on a number of factors, results may be sample specific
- i.e. you may not see the same ordering next time
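
One way to see how sample specific an importance ordering can be is to resample the cases and refit. The sketch below is only an illustration under stated assumptions: the data are simulated, both hypothetical predictors are given equal true weights, and "importance" is taken to be the absolute size of the least-squares slope. It counts how often the first predictor comes out on top across bootstrap resamples.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50
X = rng.normal(size=(n, 2))                                # two hypothetical predictors
y = 0.5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)     # equal true weights (assumed)

def slopes(X, y):
    """Least-squares slopes; the intercept is fitted and then dropped."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

n_boot = 500
first_wins = 0
for _ in range(n_boot):
    idx = rng.integers(0, n, size=n)                       # resample cases with replacement
    b = slopes(X[idx], y[idx])
    first_wins += int(abs(b[0]) > abs(b[1]))

share_first = first_wins / n_boot                          # share of resamples where predictor 1 ranks first
```

A share well away from 0 or 1 means the ordering changes from one resample to the next, which is the situation the caveat warns about.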

Structure

- A different goal in MV analysis is to determine the structure of the data
- Is there an underlying dimension that can describe the data in a simpler fashion?
- Methods involve classification and/or data reduction
- Latent variables (constructs)
- Example
- Observed variables Giddiness, Silliness, Irrationality, Possessiveness and Misunderstanding reduced to the underlying construct of Love
- Interest may be in reducing variables (Factor analysis), emphasis on group membership (Cluster analysis), stimulus structure (MDS) etc.

Prediction and Structure

- Both prediction and structure may be the goal of analysis
- SEM and path analysis
- How well does the model fit the data?

Multivariate Themes

Things to consider

- Initial variable choice
- Comes down to
- Familiarity with previous research
- Instrument used
- Expertise with field of study
- Common sense
- Much of the hard work consists of developing a plan of attack and deciding on how to study the problem

Initial Examination of Data

- Preliminary analysis
- A thorough initial examination of the data is not only required but also necessary for a full understanding of any research
- Such initial analyses provide a better grasp of what is happening in the data and may inform the MV analysis to a certain extent
- However, in the MV case, if the actual goal is interpretation of the UV analyses (as one often sees in MANOVA), the MV analysis is unwarranted

More to consider

- Intro now, more details as we discuss each method
- Assumptions important for inferences beyond the sample
- Normality: basic assumption of the General Linear Model, concerned with an elliptical pattern of residuals for the data
- Skewness: distribution of scores is tilted (asymmetrical)
- Direction established by tail
- Greater skewness, less normality
- Kurtosis: degree of peakedness of data
- 3 types: leptokurtic (thin), mesokurtic (normal), platykurtic (flattened)
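
Skewness and kurtosis can be computed directly from the moments of a sample. The sketch below uses the simple moment-based definitions (other software may apply small-sample corrections, so values can differ slightly), and the two example lists are made up for illustration.

```python
def skewness(xs):
    """Moment-based skewness: m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3.0

symmetric = [1, 2, 3, 4, 5]        # hypothetical scores, no tilt
right_tailed = [1, 1, 1, 2, 10]    # hypothetical scores with a long right tail

skew_sym = skewness(symmetric)            # zero: no asymmetry
skew_right = skewness(right_tailed)       # positive: tail points right
kurt_sym = excess_kurtosis(symmetric)     # negative: flatter than normal (platykurtic)
```

The sign of the skewness follows the direction of the tail, and negative, zero and positive excess kurtosis correspond to platykurtic, mesokurtic and leptokurtic shapes.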

More to consider

- Linearity
- Data form a relatively straight, oval-shaped pattern when plotted
- Homoscedasticity
- Variance of one variable is equal at all levels of other variables
- Understood through standard deviations across variables and scatter plots
- Referred to as homogeneity of variance in ANOVA methods
- Homogeneity of regression
- Regression slopes between covariate and DV are equal across groups of IV
- Do not want this statistic (F) to be significant; if it is, the assumption for (M)ANCOVA is violated

More to consider

- Multicollinearity
- Correlation coefficient (r) between predictors is noticeably large
- Causes instability in the statistical procedure
- Can't differentiate which variables are contributing to outcome
- Singularity
- Redundant variables bring the determinant of the matrix to zero
- Orthogonality
- Allows no association among variables
- Not realistic in real world data
- May allow greater interpretability versus data that are too related
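
The difference between unrelated, highly collinear and fully redundant predictors shows up in the determinant of their correlation matrix. The following is a minimal sketch on simulated data; the variable names and the amount of noise are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2                           # exact linear combination: a redundant variable
x4 = x1 + 0.05 * rng.normal(size=n)    # nearly a copy of x1: strong multicollinearity

R_ok = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)
R_collinear = np.corrcoef(np.column_stack([x1, x2, x4]), rowvar=False)
R_singular = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)

det_ok = np.linalg.det(R_ok)                  # close to 1: predictors nearly unrelated
det_collinear = np.linalg.det(R_collinear)    # close to 0: unstable estimates
det_singular = np.linalg.det(R_singular)      # zero up to rounding: matrix cannot be inverted
```

A determinant near zero is a warning that the procedure will have trouble separating the contributions of the predictors.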

More to consider

- Outliers
- Affect the mean (inflate/deflate), disguising the true relationship
- Distort data, create noise (error), lose power
- Transformations (log or square root) may be helpful with outliers
- Reshapes distribution, creating a more normal distribution
- However you now have a scale with which you are unfamiliar and which you cannot generalize back to the original

Some distinctions

- Types of data
- Nominal/Categorical
- Ordinal
- Continuous
- Interval or Ratio
- The types of variables involved will say much about what analyses are going to be appropriate and/or how one might proceed with a particular analysis

Types of data

- One thing to keep in mind is that these distinctions are largely arbitrary
- One can dichotomize a continuous measure into categories
- A bad idea most of the time
- An ordinal measure (e.g. Likert question) has a mean/construct that actually falls along a continuum
- How the data are to be considered is largely left to the researcher

Sample vs. Population

- In typical research we are rarely dealing with a population
- The goal in research is not to simply describe our data but to generalize to the real world
- Many analyses and data collections are, for a variety of (not good) reasons, sample-specific and of little use to the scientific community
- Take care in the initial phase of research planning to help guard against such a situation

The linear combination of variables

- Whether of IVs or DVs, a linear combination of variables is often necessary to interpret the data
- This idea is essential to thinking multivariately
- Multiple regression
- Finding the linear combination of IVs that best predicts the DV
- MANOVA
- What linear combination of DVs maximizes the distinction between groups
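
For the multiple regression case, the linear combination can be found with ordinary least squares. This is a small sketch on simulated data: the weights 1.0, 2.0 and -0.5 used to generate the DV are assumptions made up for the example, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))                                          # two hypothetical IVs
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=n)   # simulated DV

design = np.column_stack([np.ones(n), X])          # add an intercept column
weights, *_ = np.linalg.lstsq(design, y, rcond=None)

y_hat = design @ weights                           # the fitted linear combination of IVs
multiple_R = np.corrcoef(y, y_hat)[0, 1]           # correlation between the DV and that combination
```

Here `weights` recovers values close to the ones used in the simulation, and `multiple_R` is the correlation between the DV and the best-predicting combination.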

How many variables

- Considerations
- Cost
- Availability
- Meaningfulness
- Theory
- For ease of understanding and efficiency we typically want the fewest number of variables that will explain the most
- Ockham's razor

Statistical power and effect size

- A problem that has plagued the social sciences is the lack of power to find subtle effects
- Some multivariate procedures will require relatively large amounts of data (e.g. SEM)
- Power and sample size are a required consideration before any attempt at research, multivariate or otherwise
- After the fact, emphasis should be placed on effect size and model fit, rather than p-values
- More later

The matrices of interest

- Data matrix
- What you see in SPSS or whatever program you're using
- Includes the cases and their corresponding values for the variables of interest
- Correlation matrix - R
- Contains information about the linear relationship between variables
- Standardized covariance
- Symmetrical
- Square
- Typically only the bottom portion is shown as the top portion is its mirror image and the diagonal contains all ones (each variable is perfectly correlated with itself)

The matrices of interest

- Variance/Covariance matrix - S
- Square and symmetrical
- Variance of each variable is on the diagonal, covariances with other variables on the off-diagonals
- In some cases you will have the option to use correlations or covariances as the unit of analysis, with some debate about which is better under what circumstances

The matrices of interest

- Sum of Squares and cross-products matrix - S
- Precursor to the Variance/Covariance matrix (the values before division by N-1)
- On the diagonal is a variable's sum of the squared deviations from its mean
- Off-diagonal elements are the sum of the products of the deviation scores for the two variables
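
The three matrices build on one another, which is easy to check numerically. The sketch below starts from a small made-up data matrix (5 cases by 3 variables) and derives the sum of squares and cross-products matrix, then the variance/covariance matrix, then the correlation matrix.

```python
import numpy as np

# Hypothetical data matrix: 5 cases (rows) by 3 variables (columns).
data = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 3.0],
    [5.0, 4.0, 2.0],
    [6.0, 8.0, 5.0],
    [9.0, 9.0, 4.0],
])
n = data.shape[0]

deviations = data - data.mean(axis=0)   # deviation scores for each variable
sscp = deviations.T @ deviations        # sums of squares (diagonal) and cross-products
cov = sscp / (n - 1)                    # variance/covariance matrix
sd = np.sqrt(np.diag(cov))              # standard deviations from the diagonal
corr = cov / np.outer(sd, sd)           # correlation matrix: standardized covariance
```

The results agree with `np.cov(data, rowvar=False)` and `np.corrcoef(data, rowvar=False)`, and the diagonal of `corr` is all ones.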

Methods of analysis

- A host of methods are available to the researcher
- The kind of question asked will help guide one in choosing the appropriate analysis; however, the data may be amenable to multiple methods, and almost always are

Degree of relationship

- Bivariate r
- The degree of linear relationship between two variables
- Partial and semi-partial
- Multiple R
- The relationship of a set of variables to another (dependent) variable
- Canonical R
- The granddaddy
- Relationship between sets of variables
- Methods are also available to assess the relationship among non-continuous variables
- E.g. Chi-square, Multiway Frequency Analysis

Group Differences

- Very popular research question in social sciences (too popular really)
- Is group A different from B?
- The answer is always yes, and with a large enough sample, statistically significantly so
- ANOVA and related
- MANOVA, the multivariate counterpart
- Repeated measures

Predicting group membership

- Turning the group difference question the other way around
- Discriminant function analysis
- Logistic regression

Structure

- Data reduction and classification
- Cluster analysis
- Seeks to identify homogeneous subgroups of cases or variables based on some measure of distance
- Identify a set of groups in which within-group variation is minimized and between-group variation is maximized
- Principal components and Factor analysis
- Reduce a large number of variables to a smaller set
- Often used in psych for the development of inventories
- Structural equation modeling
- Where factor analysis and regression meet
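
The data-reduction idea behind principal components can be sketched with an eigen-decomposition of the correlation matrix. The example below is an illustration on simulated data: four observed variables are generated as noisy copies of one assumed latent dimension, and the first component is expected to capture most of their shared variance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
latent = rng.normal(size=n)                        # one simulated underlying dimension
# Four observed variables, each a noisy reflection of the latent one.
observed = np.column_stack([latent + 0.5 * rng.normal(size=n) for _ in range(4)])

R = np.corrcoef(observed, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)      # eigh returns eigenvalues in ascending order
order = np.argsort(eigenvalues)[::-1]              # reorder from largest to smallest
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()        # proportion of variance per component
z = (observed - observed.mean(axis=0)) / observed.std(axis=0, ddof=1)
scores = z @ eigenvectors[:, 0]                    # scores on the first principal component
```

With data built this way, `explained[0]` is large and `scores` tracks the simulated latent variable closely (up to sign), so four variables reduce to one component with little loss.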

Time course of events

- How long is it before some event occurs?
- How does a DV change over the course of time?
- The former question can be answered with survival/failure analysis
- Survival rates for disease
- Time before failure for a particular electronic part
- The latter is often examined with time-series analysis
- Many time periods are available for analysis
- E.g. monthly stock prices over the past five years
- Popular in the economics realm

Decision tree

- Although such guides may be useful, as mentioned before, multiple analyses may be appropriate for the data under consideration
- The best plan of attack is to have a well-defined research question, and collect data appropriate to the analysis that will best answer that question

Multivariate Methods Quick Glance

Summary of Methods

- The multivariate methods we will look at are a set of tools for analyzing multiple variables in an integrated and powerful way.
- They allow the examination of richer and perhaps more realistic designs than can be assessed with traditional univariate methods that only analyze one outcome variable and usually just one or two independent variables (IVs)
- Compared to univariate methods, multivariate methods allow us to analyze a complex array of variables, providing greater assurance that we can come to some synthesizing conclusions with less error and more validity than if we were to analyze variables in isolation.