Multivariate Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Multivariate Analysis

Description:

The essence of multivariate thinking is to expose the inherent structure and ... Giddiness, Silliness, Irrationality, Possessiveness and Misunderstanding ... – PowerPoint PPT presentation

Slides: 42
Provided by: mik4
Learn more at: http://www.unt.edu
Transcript and Presenter's Notes

Title: Multivariate Analysis


1
Multivariate Analysis
  • Overview

2
Introduction
  • Multivariate thinking
  • Body of thought processes that illuminate the
    interrelatedness between and within sets of
    variables.
  • The essence of multivariate thinking is to expose
    the inherent structure and meaning revealed
    within these sets of variables through
    application and interpretation of various
    statistical methods

3
Why the multivariate approach?
  • Big idea: multiple response outcomes
  • With univariate analyses we have just one
    dependent variable of interest
  • Although any analysis of data involving more than
    one variable could be seen as multivariate, we
    typically reserve the term for multiple dependent
    variables
  • So MV analysis is an extension of UV ones, or
    conversely, many of the UV analyses are special
    cases of MV ones

4
Why MV over the univariate approach?
  • Complexity
  • The subject/data studied may be more complex than
    univariate methods can handle
  • Reality
  • In some cases it would be inappropriate to
    conduct univariate analysis as the data/research
    demand a multivariate analysis

5
Why MV over the univariate approach?
  • Experimental data
  • Although experimental research can be and often
    is multivariate, typically subjects are assigned
    to groups and the manipulations are examined for
    their effect on a single outcome
  • Different doses of caffeine → test performance
  • Causality is more easily deduced
  • Non-experimental data
  • Likewise survey/inventory data might be analyzed
    in univariate fashion, but typically it will
    require the multivariate approach to solve the
    questions stemming from it
  • Correlational

6
Why not MV?
  • In the past the computations were overwhelming
    even with smaller datasets, and so MV analyses
    were typically avoided
  • Now this is not a problem but there are still
    reasons to not do a MV analysis

7
Why not MV?
  • Ambiguity
  • MV analysis may result in a less clear
    understanding of the data
  • E.g. group differences on a linear combination of
    DVs (Manova)
  • Differences on a single DV, by contrast, are
    easily interpreted
  • Ambiguity that stems from ignorance of the
    technique is not a valid reason, however
  • Unnecessary complexity
  • Just because SEM looks neat/is popular doesn't
    mean you have to do one, or that it is the best
    way to answer your research question
  • No free lunch
  • MV analyses come with their own rules and
    assumptions that may make analysis difficult or
    not as strong

8
Multivariate Pros and Cons Summary
  • Advantages of using a multivariate statistic
  • Richer realistic design
  • Looks at phenomena in an overarching way
    (provides multiple levels of analysis)
  • Each method differs in amount or type of
    Independent Variables (IVs) and DVs
  • Can help control for Type I Error
  • Disadvantages
  • Larger Ns are often required
  • More difficult to interpret
  • Less is known about robustness to violations of
    assumptions

9
Primary purposes of MV analysis
  • Prediction and explanation
  • Determining structure

10
Prediction
  • The goal in most research situations is to be
    able to predict outcomes based on prior
    information
  • E.g. given a person's gender and region, what
    will their attitude be on some social issue?
  • Given a number of variables how well can we
    predict group membership?
  • Explanation
  • Which variables are most important in the
    prediction of some outcome?
  • In many cases this is the end goal of an analysis,
    though a very problematic one

11
A caveat regarding explanation
  • Determining variable importance can be a suspect
    endeavor
  • Something that might be deemed a statistically
    significant variable may not make the cut had the
    study been conducted again
  • Depending on a number of factors, results may be
    sample specific
  • i.e. you may not see the same ordering next time

12
Structure
  • A different goal in MV analysis is to determine
    the structure of the data
  • Is there an underlying dimension that can
    describe the data in a simpler fashion?
  • Methods involve classification and/or data
    reduction
  • Latent variables (constructs)
  • Example
  • Observed variables: Giddiness, Silliness,
    Irrationality, Possessiveness and
    Misunderstanding, reduced to the underlying
    construct of Love
  • Interest may be in reducing variables (Factor
    analysis), emphasis on group membership (Cluster
    analysis), stimulus structure (MDS) etc.

13
Prediction and Structure
  • Both prediction and structure may be the goal of
    analysis
  • SEM and path analysis
  • How well does the model fit the data?

14
Multivariate Themes

15
Multivariate Themes

16
Things to consider
  • Initial variable choice
  • Comes down to
  • Familiarity with previous research
  • Instrument used
  • Expertise with field of study
  • Common sense
  • Much of the hard work consists of developing a
    plan of attack and deciding on how to study the
    problem

17
Initial Examination of Data
  • Preliminary analysis
  • A thorough initial examination of the data is
    required for a full understanding of any
    research
  • Such initial analyses provide a better grasp of
    what is happening in the data and may inform the
    MV analysis to a certain extent
  • However, in the MV case, if the actual goal is
    interpretation of the UV analyses (as one often
    sees in MANOVA), the MV analysis is unwarranted

18
More to consider
  • Intro now, more details as we discuss each method
  • Assumptions important for inferences beyond the
    sample
  • Normality: basic assumption of the General Linear
    Model, concerned with an elliptical pattern of
    residuals for the data
  • Skewness: distribution of scores is tilted
    (asymmetrical)
  • Direction established by the tail
  • Greater skewness, less normality
  • Kurtosis: degree of peakedness of the data
  • 3 types: leptokurtic (thin), mesokurtic (normal),
    platykurtic (flattened)
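
A minimal sketch of how skewness and kurtosis might be computed from standardized scores. The function names and the example scores are illustrative assumptions, not part of the presentation, and the moment-based formulas used here are only one of several common definitions.

```python
import numpy as np

def skewness(scores):
    """Third standardized moment: 0 for a symmetric distribution."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

def excess_kurtosis(scores):
    """Fourth standardized moment minus 3, so a normal distribution scores 0."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)

symmetric = [1, 2, 3, 4, 5]          # hypothetical scores
right_tailed = [1, 1, 2, 2, 3, 10]   # hypothetical scores, long tail to the right

skew_symmetric = skewness(symmetric)
skew_right = skewness(right_tailed)
kurt_symmetric = excess_kurtosis(symmetric)
```

A positive skewness means the tail points right and a negative value means it points left. Excess kurtosis above zero is described as leptokurtic, near zero as mesokurtic, and below zero as platykurtic.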

19
More to consider
  • Linearity
  • Data form a relatively straight, oval-shaped band
    when plotted
  • Homoscedasticity
  • Variance of one variable is equal at all levels
    of the other variables
  • Understood through standard deviations across
    variables and scatter plots
  • Referred to as homogeneity of variance in ANOVA
    methods
  • Homogeneity of regression
  • Regression slopes between covariate and DV are
    equal across groups of IV
  • Do not want this statistic (F) to be
    significant; if so, the assumption for
    (M)ANCOVA is violated

20
More to consider
  • Multicollinearity
  • Correlation coefficient (r) between predictors is
    noticeably large
  • Causes instability in the statistical procedure
  • Can't differentiate which variables are
    contributing to the outcome
  • Singularity
  • Redundant variables bring the determinant of
    the matrix to zero
  • Orthogonality
  • Allows no association among variables
  • Not realistic in real world data
  • May allow greater interpretability versus data
    that are too related
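
As a rough illustration of singularity, the sketch below builds a hypothetical third variable that is an exact sum of two others and compares the determinants of the resulting correlation matrices. The simulated data, seed, and variable names are assumptions made for this example only.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed, for reproducibility
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + x2                     # redundant: an exact linear combination

independent = np.column_stack([x1, x2])
redundant = np.column_stack([x1, x2, x3])

# Determinant of each correlation matrix.
det_independent = np.linalg.det(np.corrcoef(independent, rowvar=False))
det_redundant = np.linalg.det(np.corrcoef(redundant, rowvar=False))
```

With the redundant variable included, the determinant is zero up to floating-point error, so the matrix cannot be inverted. Strong but imperfect correlations (multicollinearity) push the determinant toward zero without reaching it.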

21
More to consider
  • Outliers
  • Affect the mean (inflate/deflate), disguising the
    true relationship
  • Distort data: create noise (error), lose power
  • Transformations (log or square root) may be
    helpful with outliers
  • Reshapes distribution creating a more normal
    distribution
  • However you now have a scale with which you are
    unfamiliar and which you cannot generalize back
    to the original
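
A small sketch of what a log transformation does to a right-skewed variable, and why results on the new scale do not map straight back to the original. The scores and names below are hypothetical and chosen only to make the effect visible.

```python
import numpy as np

def skewness(scores):
    """Third standardized moment of the scores."""
    x = np.asarray(scores, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

# Hypothetical scores with one extreme value.
raw = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 1000.0])
logged = np.log(raw)

skew_raw = skewness(raw)
skew_logged = skewness(logged)

# Back-transforming the mean of the logs gives the geometric mean,
# which is not the mean of the original scores.
arithmetic_mean = float(raw.mean())
back_transformed_mean = float(np.exp(logged.mean()))
```

The logged scores are less skewed, but the back-transformed mean is far below the arithmetic mean of the raw scores, which is one concrete form of the interpretation problem noted above.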

22
Some distinctions
  • Types of data
  • Nominal/Categorical
  • Ordinal
  • Continuous
  • Interval or Ratio
  • The types of variables involved will say much
    about what analyses are going to be appropriate
    and/or how one might proceed with a particular
    analysis

23
Types of data
  • One thing to keep in mind is that these
    distinctions are largely arbitrary
  • One can dichotomize a continuous measure into
    categories
  • A bad idea most of the time
  • An ordinal measure (e.g. Likert question) has a
    mean/construct that actually falls along a
    continuum
  • How the data is to be considered is largely left
    to the researcher
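
One way to see why dichotomizing is usually a bad idea is to compare a correlation before and after a median split. The simulation below is only a sketch; the seed, sample size, and effect size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)   # arbitrary seed
n = 5000
x = rng.normal(size=n)
y = 0.6 * x + 0.8 * rng.normal(size=n)   # population correlation of about 0.6

# Correlation using the continuous measure.
r_continuous = np.corrcoef(x, y)[0, 1]

# Median split: turn x into a two-category variable (0 = low, 1 = high).
x_split = (x > np.median(x)).astype(float)
r_dichotomized = np.corrcoef(x_split, y)[0, 1]
```

The split version discards the information within each half, so the observed relationship is weaker than with the continuous measure.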

24
Sample vs. Population
  • In typical research we are rarely dealing with a
    population
  • The goal in research is not to simply describe
    our data but to generalize to the real world
  • Many analyses and data collections are, for a
    variety of (not good) reasons, sample-specific
    and not of much use to the scientific community
  • Take care in the initial phase of research
    planning to help guard against such a situation

25
The linear combination of variables
  • Whether of IVs or DVs, a linear combination of
    variables is often necessary to interpret the
    data
  • This idea is essential to thinking multivariately
  • MultReg
  • Finding the linear combination of IVs that best
    predicts the DV
  • Manova
  • What linear combination of DVs maximizes the
    distinction between groups
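
For the multiple regression case, the idea of a best linear combination can be sketched with ordinary least squares. The simulated IVs, the true weights, and all names below are assumptions for illustration, not the presenter's example.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed
n = 200
ivs = rng.normal(size=(n, 2))    # two hypothetical IVs
dv = 1.0 + 2.0 * ivs[:, 0] - 0.5 * ivs[:, 1] + rng.normal(scale=0.5, size=n)

# Design matrix: a column of ones for the intercept plus the IVs.
design = np.column_stack([np.ones(n), ivs])

# Least squares finds the weights of the linear combination of IVs
# that minimizes the squared prediction error for the DV.
weights, *_ = np.linalg.lstsq(design, dv, rcond=None)
predicted = design @ weights

# Multiple R: correlation between the linear combination and the DV.
multiple_r = np.corrcoef(predicted, dv)[0, 1]
```

No other weighting of these IVs correlates more highly with the DV in this sample, which is why multiple R is at least as large as the absolute correlation of any single IV with the DV.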

26
How many variables
  • Considerations
  • Cost
  • Availability
  • Meaningfulness
  • Theory
  • For ease of understanding and efficiency we
    typically want the fewest number of variables
    that will explain the most
  • Ockham's razor

27
Statistical power and effect size
  • A problem that has plagued the social sciences is
    the lack of power to find subtle effects
  • Some multivariate procedures will require
    relatively large amounts of data (e.g. SEM)
  • Power and sample size are a required
    consideration before any attempt at research,
    multivariate or otherwise
  • After the fact, emphasis should be placed on
    effect size and model fit, rather than p-values
  • More later

28
The matrices of interest
  • Data matrix
  • What you see in SPSS or whatever program you're
    using
  • Includes the cases and their corresponding values
    for the variables of interest
  • Correlation matrix - R
  • Contains information about the linear
    relationship between variables
  • Standardized covariance
  • Symmetrical
  • Square
  • Typically only the bottom portion is shown as the
    top portion is its mirror image and the diagonal
    contains all ones (each variable is perfectly
    correlated with itself)

29
The matrices of interest
  • Variance/Covariance matrix - S
  • Square and symmetrical
  • Variance of each variable is on the diagonal,
    covariances with other variables on the
    off-diagonals
  • In some cases you will have the option to use
    correlations or covariances as the unit of
    analysis, with some debate about which is better
    under what circumstances

30
The matrices of interest
  • Sum of Squares and cross-products matrix - S
  • Precursor to the Variance/Covariance matrix (the
    values before division by N-1)
  • On the diagonal is a variable's sum of the
    squared deviations from its mean
  • Off-diagonal elements are the sum of the products
    of the deviation scores for the two variables
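
The relationship among these matrices can be sketched in a few lines. The small data matrix and the variable names are hypothetical; the steps follow the definitions given on the slides (deviation scores, then SSCP, then division by N-1, then standardization).

```python
import numpy as np

# Hypothetical data matrix: 5 cases (rows) by 3 variables (columns).
X = np.array([
    [2.0, 4.0, 1.0],
    [3.0, 5.0, 3.0],
    [5.0, 4.0, 2.0],
    [6.0, 8.0, 5.0],
    [9.0, 9.0, 4.0],
])
n_cases = X.shape[0]

# Deviation scores: each value minus its variable's mean.
deviations = X - X.mean(axis=0)

# Sum of squares and cross-products (SSCP) matrix.
sscp = deviations.T @ deviations

# Variance/covariance matrix: SSCP divided by N - 1.
cov = sscp / (n_cases - 1)

# Correlation matrix: covariances standardized by the standard deviations.
sd = np.sqrt(np.diag(cov))
corr = cov / np.outer(sd, sd)
```

As a check, `np.cov(X, rowvar=False)` and `np.corrcoef(X, rowvar=False)` should reproduce `cov` and `corr`.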

31
Methods of analysis
  • A host of methods are available to the researcher
  • The kind of question asked will help guide one in
    choosing the appropriate analysis; however, the
    data may be amenable to multiple methods, and
    almost always are

32
Degree of relationship
  • Bivariate r
  • The degree of linear relationship between two
    variables
  • Partial and semi-partial
  • Multiple R
  • The relationship of a set of variables to another
    (dependent) variable
  • Canonical R
  • The granddaddy
  • Relationship between sets of variables
  • Methods are also available to assess the
    relationship among non-continuous variables
  • E.g. Chi-square, Multiway Frequency Analysis
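
The partial and semi-partial correlations listed above can be sketched with the standard first-order formulas, which need only the three bivariate correlations. The function names and the example values are illustrative assumptions.

```python
import math

def partial_r(r12, r13, r23):
    """Correlation of variables 1 and 2 with variable 3 removed from both."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

def semipartial_r(r12, r13, r23):
    """Correlation of variable 1 with variable 2 after removing 3 from 2 only."""
    return (r12 - r13 * r23) / math.sqrt(1 - r23 ** 2)

# Hypothetical bivariate correlations.
r12, r13, r23 = 0.5, 0.7, 0.7

pr = partial_r(r12, r13, r23)
sr = semipartial_r(r12, r13, r23)
```

In this made-up case most of the bivariate relationship between variables 1 and 2 is shared with variable 3, so both values fall well below the original 0.5.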

33
Group Differences
  • Very popular research question in social sciences
    (too popular really)
  • Is group A different from B?
  • The answer is always yes, and with a large enough
    sample, statistically significantly so
  • Anova and related
  • Manova: the multivariate counterpart
  • Repeated measures
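
The point about large samples can be made concrete with a little arithmetic. The sketch below assumes two groups of equal size and an observed standardized mean difference d that stays fixed as the sample grows; the function name and the values are hypothetical, and 1.96 is used as an approximate large-sample critical value for a two-tailed test at the .05 level.

```python
import math

def t_for_effect(d, n_per_group):
    """Independent-samples t for a standardized mean difference d
    with equal group sizes: t = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)

tiny_effect = 0.02   # hypothetical, trivially small difference

t_small_sample = t_for_effect(tiny_effect, 100)
t_huge_sample = t_for_effect(tiny_effect, 1_000_000)

significant_small = t_small_sample > 1.96
significant_huge = t_huge_sample > 1.96
```

The same trivial difference is nowhere near significant with 100 cases per group and clearly significant with a million, which is why effect size matters more than the p-value here.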

34
Predicting group membership
  • Turning the group difference question the other
    way around
  • Discriminant function analysis
  • Logistic regression

35
Structure
  • Data reduction and classification
  • Cluster analysis
  • Seeks to identify homogeneous subgroups of cases
    or variables based on some measure of distance
  • Identify a set of groups in which within-group
    variation is minimized and between-group
    variation is maximized
  • Principal components and Factor analysis
  • Reduce a large number of variables to a smaller
    set
  • Often used in psych for the development of
    inventories
  • Structural equation modeling
  • Where factor analysis and regression meet

36
Time course of events
  • How long is it before some event occurs?
  • How does a DV change over the course of time?
  • The former question can be answered with
    survival/failure analysis
  • Survival rates for disease
  • Time before failure for a particular electronic
    part
  • The latter is often examined with time-series
    analysis
  • Many time periods are available for analysis
  • E.g. monthly stock prices over the past five
    years
  • Popular in the economics realm

37
Decision tree

38
Decision tree

39
Decision tree
  • Although such guides may be useful, as mentioned
    before, multiple analyses may be appropriate for
    the data under consideration
  • The best plan of attack is to have a well-defined
    research question, and collect data appropriate
    to the analysis that will best answer that
    question

40
Multivariate Methods Quick Glance

41
Summary of Methods
  • The multivariate methods we will look at are a
    set of tools for analyzing multiple variables in
    an integrated and powerful way.
  • They allow the examination of richer and perhaps
    more realistic designs than can be assessed with
    traditional univariate methods that only analyze
    one outcome variable and usually just one or two
    independent variables (IVs)
  • Compared to univariate methods, multivariate
    methods allow us to analyze a complex array of
    variables, providing greater assurance that we
    can come to some synthesizing conclusions with
    less error and more validity than if we were to
    analyze variables in isolation.