ESRC Workshop, Researcher Development Initiative, Department of Education, University of Oxford, 2 June 2008


Title: Meta-analysis

  • ESRC Workshop
  • Researcher Development Initiative

Department of Education, University of Oxford
Today's content
  • What is meta-analysis,
  • When and why we use meta-analysis,
  • Examples of meta-analyses,
  • Benefits and pitfalls of using meta-analysis,
  • Defining a population of studies and finding
    studies,
  • Coding materials,
  • Inter-rater reliability,
  • Computing effect sizes,
  • Structuring a database,
  • A conceptual introduction to analysis and
    interpretation of results based on fixed effects,
    random effects, and multilevel models, and
  • Supplementary analyses

Primary versus secondary data analysis
  • Traditionally, education researchers collect and
    analyse their own data (referred to as primary
    data). Secondary data analysis is based on data
    collected by someone else (or, perhaps,
    reanalysis of your own published data). There are
    at least four logical perspectives on this issue:
  • 1. Meta-analysis -- systematic, quantitative
    review of published research in a particular
    field, the focus of this presentation.
  • 2. Systematic review -- systematic, qualitative
    review of published research in a particular
    field.
  • 3. Secondary Data Analyses -- using large
    (typically public) databases
  • 4. Reanalyses of published studies -- (often in
    ways critical of the original study).

Why meta-analysis?
  • Wilson & Lipsey (2001) synthesised 319
    meta-analyses of intervention studies. Across the
    studies, roughly equal amounts of variance were
    due to:
  • substantive features of the intervention (true
    treatment effects),
  • method effects (idiosyncratic study features and
    potential biases, particularly research design
    and operationalisation of outcome measures), and
  • sampling error.
  • They concluded: "These results underscore the
    difficulty of detecting treatment outcomes, the
    importance of cautiously interpreting findings
    from a single study, and the importance of
    meta-analysis in summarizing results across
    studies" (p. 413).

Why a course on meta-analysis?
  • Meta-analysis is an increasingly popular tool for
    summarising research findings
  • Cited extensively in research literature
  • Relied upon by policymakers
  • Important that we understand the method, whether
    we conduct or simply consume meta-analytic
    research
  • Should be one of the topics covered in all
    introductory research methodology courses

  • What is meta-analysis?
  • When and why do we use meta-analysis?

What is meta-analysis?
  • Systematic synthesis of various studies on a
    particular research question
  • Do boys or girls have higher self-concepts?
  • Collect all studies relevant to a topic
  • Find all published journal articles on the topic
  • An effect size is calculated for each outcome
  • Determine the size/direction of gender difference
    for each study
  • Content analysis
  • Code characteristics of the study: age, setting,
    ethnicity, self-concept domain (math, physical,
    social), etc.
  • Effect sizes with similar features are grouped
    together and compared (tests of moderator
    variables)
  • Do gender differences vary with age, setting,
    ethnicity, self-concept domain, etc.?

A blend of qualitative and quantitative approaches
  • Coding: the process of extracting the information
    from the literature included in the
    meta-analysis. Involves noting the
    characteristics of the studies in relation to a
    priori variables of interest (qualitative)
  • Effect size: the numerical outcome to be analysed
    in a meta-analysis; a summary statistic of the
    data in each study included in the meta-analysis
    (quantitative)
  • Summarise effect sizes: central tendency,
    variability, relations to study characteristics
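The "summarise effect sizes" step can be sketched in code. A minimal illustration of the inverse-variance weighted mean (the standard fixed-effect summary) with a 95% confidence interval; the effect sizes and variances below are invented for illustration:

```python
import math

def fixed_effect_summary(effect_sizes, variances):
    """Inverse-variance weighted mean effect size with a 95% CI.

    effect_sizes: per-study effect sizes (e.g., d values).
    variances: the sampling variance of each effect size.
    """
    weights = [1.0 / v for v in variances]  # precise studies count more
    mean = sum(w * es for w, es in zip(weights, effect_sizes)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))      # standard error of the summary
    return mean, (mean - 1.96 * se, mean + 1.96 * se)

# Three hypothetical studies
mean, ci = fixed_effect_summary([0.30, 0.50, 0.40], [0.020, 0.030, 0.025])
```

Weighting by inverse variance is what distinguishes a meta-analytic summary from a simple average of study results.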

Abridged history
Karl Pearson (1904)
Karl Pearson conducted what is reputed to be the
first meta-analysis (although not called this),
comparing the effects of typhoid inoculation
across different settings.
Classic meta-analysis: Smith, M. L., & Glass, G. V.
(1977). Meta-analysis of psychotherapy outcome
studies. American Psychologist, 32, 752-760.
  • Gene Glass coined the phrase "meta-analysis" in a
    classic study of the effects of psychotherapy.
    Because most individual studies had small sample
    sizes, the effects typically were not
    statistically significant.
  • Results of 375 controlled evaluations of
    psychotherapy and counselling were coded and
    integrated statistically. The findings provide
    convincing evidence of the efficacy of
    psychotherapy.
  • On average, the typical therapy client is
    better off than 75% of untreated individuals.
  • Few important differences in effectiveness could
    be established among many quite different types
    of psychotherapy (e.g., behavioral and
    non-behavioral).

ESRC RDI One Day Meta-analysis workshop (Marsh,
O'Mara, Malmberg)
Why is meta-analysis important? Generalisability
  • The essence of good science is replicable and
    generalisable results.
  • Do we get the same answer to important research
    questions when we run the study again?
  • A primary aim of meta-analysis is to test the
    generalisability of results across a set of
    studies designed to answer the same research
    question.
  • Are the results consistent? If not, what are the
    differences in the studies that explain the lack
    of consistency?

When and why we use meta-analysis
  • A primary aim is to reach a conclusion to a
    research question from a sample of studies that
    is generalisable to the population of all such
    studies.
  • Meta-analysis tests whether study-to-study
    variation in outcomes is more than can be
    explained by random chance.
  • When there is systematic variation in outcomes
    from different studies, meta-analysis tries to
    explain these differences in terms of study
    characteristics, e.g. measures used, study
    design, participant characteristics, and controls
    for potential bias.
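The test of whether study-to-study variation exceeds chance is conventionally Cochran's Q, often reported alongside Higgins and Thompson's I-squared. A minimal sketch with invented data:

```python
def cochran_q(effect_sizes, variances):
    """Cochran's Q test for between-study heterogeneity.

    Under the null hypothesis that all studies share one true effect,
    Q follows a chi-square distribution with k - 1 degrees of freedom.
    Also returns I^2, the percentage of variation beyond chance.
    """
    w = [1.0 / v for v in variances]
    mean = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)
    q = sum(wi * (es - mean) ** 2 for wi, es in zip(w, effect_sizes))
    df = len(effect_sizes) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i2
```

A significant Q (or large I-squared) is the signal to look for moderators that explain the differences between studies.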

When is meta-analysis appropriate?
  • There exists a critical mass of comparable
    studies designed to address a common research
    question.
  • Data are presented in a form that allows the
    meta-analyst to compute an effect size for each
    study.
  • Characteristics of each study are described in
    sufficient detail to allow meta-analysts to
    compare characteristics of different studies and
    to judge the quality of each study.

Schulze, R. (2007). The state and the art of
meta-analysis. Zeitschrift für
Psychologie/Journal of Psychology, 215, 87-89.
The number of meta-analyses is increasing at a
rapid rate.
Where are meta-analyses done? All over the world.
Which disciplines publish meta-analyses? ISI, 10
Feb 2008. Topic: meta-analysis. Results found:
21,286.
All disciplines do meta-analyses, but they are
very popular in medicine.
ISI, 10 Feb 2008. Topic: meta-analysis, Education
disciplines. Results found: 612; Sum of Times
Cited: 12,294; Average Citations per Item: 20.09;
h-index: 54.
The number and frequency of citations are
increasing in Education.
ISI, 10 Feb 2008. Topic: meta-analysis, Psychology
disciplines. Results found: 2,345; Sum of Times
Cited: 68,477; Average Citations per Item: 29.20;
h-index: 125.
The number and frequency of citations are
increasing in Psychology.
Meta-analysis examples
Psychology Where it all began
  • Amato, P. R., & Keith, B. (1991). Parental
    divorce and the well-being of children: A
    meta-analysis. Psychological Bulletin, 110,
    26-46. Times Cited: 471
  • Linn, M. C., & Petersen, A. C. (1985). Emergence
    and characterization of sex differences in
    spatial ability: A meta-analysis. Child
    Development, 56, 1479-1498. Times Cited: 570
  • Johnson, D. W., et al. (1981). Effects of
    cooperative, competitive, and individualistic
    goal structures on achievement: A meta-analysis.
    Psychological Bulletin, 89, 47-62.
  • Tett, R. P., Jackson, D. N., & Rothstein, M.
    (1991). Personality measures as predictors of job
    performance: A meta-analytic review. Personnel
    Psychology, 44, 703-742. Times Cited: 387
  • Hyde, J. S., & Linn, M. C. (1988). Gender
    differences in verbal ability: A meta-analysis.
    Psychological Bulletin, 104, 53-69.
  • Iaffaldano, M. T., & Muchinsky, P. M. (1985). Job
    satisfaction and job performance: A
    meta-analysis. Psychological Bulletin, 97,
    251-273. Times Cited: 263.

Education Widely Cited Meta-analyses
  • De Wolff, M., & van IJzendoorn, M. H. (1997).
    Sensitivity and attachment: A meta-analysis on
    parental antecedents of infant attachment. Child
    Development, 68, 571-591. Times Cited: 340
  • Wellman, H. M., Cross, D., & Watson, J. (2001).
    Meta-analysis of theory-of-mind development: The
    truth about false belief. Child Development, 72,
    655-684. Times Cited: 276
  • Cohen, E. G. (1994). Restructuring the classroom:
    Conditions for productive small groups. Review
    of Educational Research, 64, 1-35.
  • Hansen, W. B. (1992). School-based substance
    abuse prevention: A review of the state of the
    art in curriculum, 1980-1990. Health Education
    Research, 7, 403-430. Times Cited: 207
  • Kulik, J. A., Kulik, C.-L., & Cohen, P. A.
    (1980). Effectiveness of computer-based college
    teaching: A meta-analysis of findings. Review of
    Educational Research, 50, 525-544.
Business/Management Widely Cited Meta-analyses
  • Sheppard, B. H., Hartwick, J., & Warshaw, P. R.
    (1988). The theory of reasoned action: A
    meta-analysis of past research with
    recommendations for modifications and future
    research. Journal of Consumer Research, 15,
    325-343. Times Cited: 515
  • Jackson, S. E., & Schuler, R. S. (1985). A
    meta-analysis and conceptual critique of research
    on role ambiguity and role conflict in work
    settings. Organizational Behavior and Human
    Decision Processes, 36, 16-78. Times Cited: 401
  • Tornatzky, L. G., & Klein, K. J. (1982).
    Innovation characteristics and innovation
    adoption-implementation: A meta-analysis of
    findings. IEEE Transactions on Engineering
    Management, 29, 28-45. Times Cited: 269.
  • Lowe, K. B., Kroeck, K. G., & Sivasubramaniam, N.
    (1996). Effectiveness correlates of
    transformational and transactional leadership: A
    meta-analytic review of the MLQ literature.
    Leadership Quarterly, 7, 385-425. Times Cited:
    203.
  • Churchill, G. A., Ford, N. M., Hartley, S. W., et
    al. (1985). The determinants of salesperson
    performance: A meta-analysis. Journal of
    Marketing Research, 22, 103-118.
Most Widely Cited Meta-analyses are in Medicine
  • Jadad, A. R., Moore, R. A., Carroll, D., et al.
    (1996). Assessing the quality of reports of
    randomized clinical trials: Is blinding
    necessary? Controlled Clinical Trials, 17, 1-12.
    Times Cited: 2,008
  • Boushey, C. J., Beresford, S. A. A., Omenn, G.
    S., et al. (1995). A quantitative assessment of
    plasma homocysteine as a risk factor for vascular
    disease: Probable benefits of increasing folic
    acid intakes. JAMA: Journal of the American
    Medical Association, 274, 1049-1057. Times
    Cited: 2,128
  • Alberti, W., Anderson, G., Bartolucci, A., et al.
    (1995). Chemotherapy in non-small-cell lung
    cancer: A meta-analysis using updated data on
    individual patients from 52 randomized clinical
    trials. British Medical Journal, 311, 899-909.
    Times Cited: 1,591
  • Block, G., Patterson, B., & Subar, A. (1992).
    Fruit, vegetables, and cancer prevention: A
    review of the epidemiologic evidence. Nutrition
    and Cancer, 18, 1-29. Times Cited: 1,422

Cohen, P. A. (1980). Effectiveness of
student-rating feedback for improving college
instruction: A meta-analysis. Research in Higher
Education, 13, 321-341.
  • Question: Does feedback from university students'
    evaluations of teaching lead to improved
    teaching?
  • Teachers are randomly assigned to experimental
    (feedback) and control (no feedback) groups
  • Feedback group gets ratings, augmented, perhaps,
    with personal consultation
  • Groups are compared on subsequent ratings and,
    perhaps, other variables
  • Feedback teachers improved their teaching
    effectiveness by .3 standard deviations compared
    to control teachers on the Overall Rating item;
    even larger differences for ratings of Instructor
    Skill, Attitude Toward Subject, Student Feedback
  • Studies that augmented feedback with consultation
    produced substantially larger differences, but
    other methodological variations had little effect

Hattie, J., & Marsh, H. W. (1996). The
relationship between research and teaching: A
meta-analysis. Review of Educational Research,
66, 507-542.
  • Question: What is the correlation between
    university teaching effectiveness and research
    productivity?
  • Based on 58 studies and 498 correlations
  • The mean correlation between teaching
    effectiveness (mostly based on students'
    evaluations of teaching) and research
    productivity was almost exactly zero
  • This near-zero correlation was consistent across
    different disciplines, types of university,
    indicators of research, and components of
    teaching effectiveness.
  • This meta-analysis was followed by a Marsh &
    Hattie (2002) primary data study to more fully
    evaluate the theoretical model

O'Mara, A. J., Marsh, H. W., Craven, R. G., &
Debus, R. (2006). Do self-concept interventions
make a difference? A synergistic blend of
construct validation and meta-analysis.
Educational Psychologist, 41, 181-206.
  • Contention about global self-esteem versus
    multidimensional, domain-specific self-concept
  • Traditional reviews and previous meta-analyses of
    self-concept interventions have underestimated
    effect sizes by using an implicitly
    unidimensional perspective that emphasizes global
    self-esteem
  • We used meta-analysis and a multidimensional
    construct validation approach to evaluate the
    impact of self-concept interventions for children
    in 145 primary studies (200 interventions).
  • Overall, interventions were significantly
    effective (d = .51, 460 effect sizes).
  • However, in support of the multidimensional
    perspective, interventions targeting a specific
    self-concept domain and subsequently measuring
    that domain were much more effective (d = 1.16).
  • This supports a multidimensional perspective of
    self-concept
Hanson, R. K., & Morton-Bourgon, K. E. (2005). The
characteristics of persistent sexual offenders: A
meta-analysis of recidivism studies. Journal of
Consulting and Clinical Psychology, 73, 1154-1163.
  • Examined predictors of sexual, nonsexual violent,
    and general (any) recidivism
  • 82 recidivism studies
  • Identified deviant sexual preferences and
    antisocial orientation as the major predictors of
    sexual recidivism for both adult and adolescent
    sexual offenders. Antisocial orientation was the
    major predictor of violent recidivism and general
    (any) recidivism
  • Concluded that many of the variables commonly
    addressed in sex offender treatment programs
    (e.g., psychological distress, denial of sex
    crime, victim empathy, stated motivation for
    treatment) had little or no relationship with
    sexual or violent recidivism

Bazzano, L. A., Reynolds, K., Holder, K. N., &
He, J. (2006). Effect of folic acid
supplementation on risk of cardiovascular
diseases: A meta-analysis of randomized
controlled trials. JAMA, 296, 2720-2726.
  • Epidemiologic studies have suggested that folate
    intake decreases risk of cardiovascular diseases.
    However, the results of randomized controlled
    trials on dietary supplementation with folic acid
    to date have been inconsistent.
  • Included 12 randomised controlled trials
  • The overall relative risks of outcomes for
    patients treated with folic acid supplementation
    compared with controls were non-significant for
    cardiovascular diseases, coronary heart disease,
    stroke, and for all-cause mortality.
  • Concluded folic acid supplementation does not
    reduce risk of cardiovascular diseases or
    all-cause mortality among participants with prior
    history of vascular disease.

Fiske, P., Rintamaki, P. T., & Karvonen, E. (1998).
Mating success in lekking males: A meta-analysis.
Behavioral Ecology, 9, 328-338.
  • In lekking species (those that gather for
    competitive mating), a male's mating success can
    be estimated as the number of females that he
    copulates with.
  • Aim of the study was to find predictors of
    lekking species' mating success through analysis
    of 48 studies.
  • Behavioural traits such as male display activity,
    aggression rate, and lek attendance were
    positively correlated with male mating success.
    The size of "extravagant" traits, such as birds'
    tails and ungulate antlers, and age were also
    positively correlated with male mating success.
  • Territory position was negatively correlated with
    male mating success, such that males with
    territories close to the geometric centre of the
    leks had higher mating success than other males.
  • Male morphology (measure of body size) and
    territory size showed small effects on male
    mating success.

Benefits and pitfalls of using meta-analysis
Benefits of meta-analysis
  • Compared to traditional literature reviews
  • (1) there is a definite methodology employed in
    the research analysis (more like that used in
    primary research) and 
  • (2) the results of the included studies are
    quantified to a standard metric thus allowing for
    statistical techniques for further analysis.
  • Therefore the process of reviewing the research
    literature is more objective, transparent, and
    replicable, and less biased and idiosyncratic to
    the whims of a particular researcher

Battle between different camps: do extrinsic
rewards increase intrinsic enjoyment?
  • Cameron, J., & Pierce, W. D. (1994).
    Reinforcement, reward, and intrinsic motivation:
    A meta-analysis. Review of Educational Research,
    64, 363-423.
  • Ryan, R., & Deci, E. L. (1996). When paradigms
    clash: Comments on Cameron and Pierce's claim
    that rewards do not undermine intrinsic
    motivation. Review of Educational Research, 66.
  • Cameron, J., & Pierce, W. D. (1996). The debate
    about rewards and intrinsic motivation: Protests
    and accusations do not alter the results. Review
    of Educational Research, 66, 39-51.
  • Deci, E. L., Koestner, R., & Ryan, R. (2001).
    Extrinsic rewards and intrinsic motivation in
    education: Reconsidered once again. Review of
    Educational Research, 71, 1-27.
  • Cameron, J. (2001). Negative effects of reward on
    intrinsic motivation: A limited phenomenon.
    Comment on Deci, Koestner, and Ryan. Review of
    Educational Research, 71, 29-42.

Benefits of meta-analysis
  • Increased power: by combining information from
    many individual studies, the meta-analyst is able
    to detect systematic trends not obvious in the
    individual studies.
  • Conclusions based on the set of studies are
    likely to be more accurate than any one study.
  • Improved precision: based on information from
    many studies, the meta-analyst can provide a more
    precise estimate of the population effect size
    (and a confidence interval).
  • Provides potential corrections for potential
    biases, measurement error, and other possible
    problems
  • Identifies directions for further primary studies
    to address unresolved issues.

Benefits of meta-analysis
  • Able to establish generalisability across many
    studies (and study characteristics).
  • Typically there is study-to-study variation in
    results. When this is the case, the meta-analyst
    can explore what characteristics of the studies
    explain these differences (e.g., study design) in
    ways not easy to do in individual studies.
  • Easy to interpret summary statistics (useful if
    communicating findings to a non-academic
    audience)

Publication bias
  • Studies that are published are more likely to
    report statistically significant findings. This
    is a source of potential bias.
  • The debate about using only published studies:
  • peer-reviewed studies are presumably of a higher
    quality
  • significant findings are more likely to be
    published than non-significant findings
  • There is no agreed-upon solution. However, one
    should retrieve all studies that meet the
    eligibility criteria, and be explicit about how
    publication bias was dealt with. Some methods
    for dealing with publication bias have been
    developed (e.g., Fail-safe N, Trim and Fill).
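Of the methods just mentioned, Rosenthal's Fail-safe N is the simplest: it estimates how many unretrieved null-result studies would be needed to overturn a significant combined result. A sketch (1.645 is the one-tailed z cutoff for p = .05; the z-values are invented):

```python
def failsafe_n(z_values, z_alpha=1.645):
    """Rosenthal's Fail-safe N.

    Returns the number of unretrieved studies averaging z = 0 that
    would be needed to raise the combined one-tailed p above .05.
    """
    k = len(z_values)
    z_sum = sum(z_values)
    return max(0, int(z_sum ** 2 / z_alpha ** 2 - k))
```

A Fail-safe N that is large relative to the number of included studies suggests the summary effect is robust to the file-drawer problem.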

English language bias
  • Meta-analyses are mostly limited to studies
    published in English.
  • Juni et al. (2002) evaluated the implications of
    excluding non-English publications in
    meta-analyses of randomised clinical trials in 50
    reviews:
  • treatment effects were modestly larger in
    non-English publications (16%).
  • However, study quality was also lower in
    non-English publications.
  • Effects were sufficiently small not to have much
    influence on treatment effect estimates, but may
    make a difference in some reviews.

Study quality
  • Increasingly, meta-analysts evaluate the quality
    of each study included in a meta-analysis.
  • Sometimes this is a global holistic (subjective)
    rating. In this case it is important to have
    multiple raters to establish inter-rater
    agreement (more on this later).
  • Sometimes study quality is quantified in relation
    to objective criteria of a good study, e.g.
  • larger sample sizes
  • more representative samples
  • better measures
  • use of random assignment
  • appropriate control for potential bias
  • double blinding, and
  • low attrition rates (particularly for
    longitudinal studies)

Study quality in the social sciences
  • In a meta-analysis of Social Science
    meta-analyses, Wilson & Lipsey (1993) found an
    average effect size of .50. They evaluated how
    this was related to study quality:
  • For meta-analyses providing a global (subjective)
    rating of the quality of each study, there was no
    significant difference between high and low
    quality studies; the average correlation between
    effect size and quality was almost exactly zero.
  • Almost no difference between effect sizes based
    on random- and non-random assignment (effect
    sizes slightly larger for random assignment).
  • The only study quality characteristic to make a
    difference was a positively biased effect due to
    one-group pre/post designs with no control group
    at all.

Study quality in the social sciences
  • Goldring (1990) evaluated the effects of gifted
    education programs on achievement. She found a
    positive effect, but emphasised that findings
    were questionable because of weak studies:
  • 21 of the 24 studies were unpublished and only
    one used random assignment.
  • Effects varied with matching procedures
  • largest effects for achievement outcomes were for
    studies in which non-equivalent groups'
    differences were controlled by only one pretest,
  • Effect sizes reduced as the number of control
    variables increased, and
  • disappeared altogether with random assignment.
  • Goldring (1990, p. 324) concluded that policy
    makers need to be aware of the limitations of the
    GAT research

Study quality in medicine
  • Schulz (1995) evaluated study quality in 250
    randomized clinical trials (RCTs) from 33
    meta-analyses. Poor quality studies led to
    positively biased estimates:
  • lack of concealment (30-41%),
  • lack of double-blinding (17%),
  • participants excluded after randomization (NS).
  • Moher et al. (1998) reanalysed 127 randomized
    clinical trials (RCTs) from 11 meta-analyses for
    study quality.
  • Low quality trials resulted in significantly
    larger effect sizes, a 30-50% exaggeration in
    estimates of treatment efficacy.
  • Wood et al. (2008) evaluated study quality (1,346
    RCTs from 146 meta-analyses):
  • subjective outcomes: inadequate/unclear
    concealment and lack of blinding resulted in
    substantial biases.
  • objective outcomes: no significant effects.
  • conclusion: systematic reviewers should assess
    risk of bias.

Study quality Does it make a difference?
  • Meta-analyses should always include subjective
    and/or objective indicators of study quality.
  • In the Social Sciences there is some evidence
    that studies with highly inadequate control for
    pre-existing differences lead to inflated effect
    sizes. However, it is surprising that other
    indicators of study quality make so little
    difference.
  • In medical research, studies are largely limited
    to RCTs, where there is MUCH more control than in
    social science research. Here there is evidence
    that inadequate concealment of assignment and
    lack of double-blinding inflate effect sizes, but
    perhaps only for subjective outcomes.
  • These issues are likely to be idiosyncratic to
    individual discipline areas and research
    questions.

Conducting a meta-analysis
  • Defining a population of studies and finding
    studies
  • Coding materials
  • Inter-rater reliability
  • Computing effect sizes
  • Structuring a database

Steps in a meta-analysis
Establish research question
  • Comparison of treatment and control groups?
  • What is the effectiveness of a reading skills
    program for a treatment group compared to an
    inactive control group?
  • Pretest-posttest differences?
  • Is there a change in motivation over time?
  • What is the correlation between two variables?
  • What is the relation between teaching
    effectiveness and research productivity?
  • Moderators of an outcome?
  • Does gender moderate the effect of a
    peer-tutoring program on academic achievement?

Establish research question
  • Do you wish to generalise your findings to other
    studies not in the sample?
  • Do you have multiple outcomes per study? e.g.
  • achievement in different school subjects
  • 5 different personality scales
  • multiple criteria of success
  • Such questions determine the choice of
    meta-analytic model
  • fixed effects
  • random effects
  • multilevel
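The fixed- versus random-effects choice turns on the between-study variance, usually written tau-squared: if it is zero, the fixed-effects model suffices. A minimal sketch of the DerSimonian-Laird estimator commonly used in random-effects models (data invented for illustration):

```python
def dersimonian_laird_tau2(effect_sizes, variances):
    """DerSimonian-Laird estimate of between-study variance (tau^2).

    A random-effects model adds tau^2 to each study's within-study
    variance; tau^2 = 0 reduces it to the fixed-effects model.
    """
    w = [1.0 / v for v in variances]
    mean = sum(wi * es for wi, es in zip(w, effect_sizes)) / sum(w)
    q = sum(wi * (es - mean) ** 2 for wi, es in zip(w, effect_sizes))
    df = len(effect_sizes) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)  # truncated at zero by convention
```

Multilevel models extend this idea further, allowing multiple effect sizes nested within studies.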

Defining a population of studies and finding
studies
  • Need to have explicit inclusion and exclusion
    criteria
  • The broader the research domain, the more
    detailed they tend to become
  • Refine criteria as you interact with the
    literature
  • Components of detailed search criteria:
  • distinguishing features
  • research respondents
  • key variables
  • research methods
  • cultural and linguistic range
  • time frame
  • publication types

Locate and collate studies
  • Search electronic databases (e.g., ISI,
    Psychological Abstracts, Expanded Academic ASAP,
    Social Sciences Index, PsycINFO, and ERIC)
  • Examine the reference lists of included studies
    to find other relevant studies
  • If including unpublished data, email researchers
    in your discipline, take advantage of Listservs,
    and search Dissertation Abstracts International

Reporting the search procedures
  • The following is one possible way to write up the
    search procedure (see LeBlanc & Ritchie, 2001):
  • Electronic search strategy (e.g., PsycINFO,
    Dissertation Abstracts). Provide years included
    in the database
  • Keywords and limitations of the search
  • Additional search methods (e.g., mailing lists)
  • Exclusion criteria (e.g., must contain a control
    group)
  • Yield of the search: number of studies found.
    Ideally should also mention how many were
    excluded from the meta-analysis and why

Search procedures
Locate and collate studies
  • Inclusion process usually requires several steps
    to cull inappropriate studies
  • Example from Bazzano, L. A., Reynolds, K.,
    Holder, K. N., & He, J. (2006). Effect of folic
    acid supplementation on risk of cardiovascular
    diseases: A meta-analysis of randomized
    controlled trials. JAMA, 296, 2720-2726

You can report the inclusion/exclusion process
using text rather than a flow chart, but it is not
as easy to follow if the process is elaborate.
You should report the original sample and final
yield as a minimum (in this case, original 139,
final 22).
Develop code materials
Code Sheet
Code Book/manual
  • __ Study ID
  • _ _ Year of publication
  • __ Publication type (1-5)
  • __ Geographical region (1-7)
  • _ _ _ _ Total sample size
  • _ _ _ Total number of males
  • _ _ _ Total number of females

Mode of therapy, Duration of therapy, Participant
characteristics, Publication characteristics,
Design characteristics
Coding characteristics should be mentioned in the
paper. If the editor allows, a copy of the actual
coding materials can be included as an appendix
Pilot coding
  • Random selection of papers coded by both coders
    (e.g., 30% of publications are double-coded)
  • Meet to compare code sheets
  • Where there is discrepancy, discuss to reach
    consensus
  • Amend code materials/definitions in code book if
    necessary
  • May need to do several rounds of piloting, each
    time using different papers

Interrater reliability
  • Percent agreement: common but not recommended
  • Cohen's kappa coefficient
  • Kappa is the proportion of the optimum
    improvement over chance attained by the coders,
    where a value of 1 indicates perfect agreement
    and a value of 0 indicates that agreement is no
    better than that expected by chance
  • Kappas over .40 are considered to be a moderate
    level of agreement (but there is no clear basis
    for this convention)
  • Correlation between different raters
  • Intraclass correlation. Agreement among multiple
    raters corrected for number of raters using
    Spearman-Brown formula (r)
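Cohen's kappa as defined above can be computed directly from two coders' category assignments. A minimal sketch (the codes are hypothetical):

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance from each
    rater's marginal distribution of codes.
    """
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    p_e = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
              for c in categories)
    return (p_o - p_e) / (1 - p_e)
```

Unlike raw percent agreement, kappa discounts the agreement two coders would reach by chance from their own base rates, which is why it is preferred.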

Exercise 1a
  • The purpose of this exercise is to explore
    various issues of meta-analytic methodology
  • Discuss in groups of 3-4 people the following
    issues in relation to the gender differences in
    smiling study (LaFrance et al., 2003)
  • Did the aims of the study justify conducting a
    meta-analysis?
  • Were the selection criteria and the search
    process appropriate?
  • How did they deal with interrater (coder)
    reliability?
Ex. 1a discussion points
  • Extends previous meta-analyses; includes
    previously untested moderators based on
    theory/empirical research
  • Search process detailed: databases and 5 other
    sources of studies, search terms. Selection
    criteria: justification provided (e.g., for
    excluding those under the age of 13). However, it
    is not clear how many studies were retrieved and
    then eventually included (compare with the flow
    chart on slide 51)
  • Multiple coders (group of coders consisted of
    four people with two raters of each sex coding
    each moderator). Interrater reliability was
    calculated by taking the aggregate reliability of
    the four coders at each time using the
    Spearman-Brown formula

Effect size calculation
  • The effect size makes meta-analysis possible
  • It is based on the dependent variable (i.e.,
    the outcome)
  • It standardizes findings across studies such that
    they can be directly compared
  • Any standardized index can be an effect size
    (e.g., standardized mean difference, correlation
    coefficient, odds-ratio), but must
  • be comparable across studies (standardization)
  • represent magnitude and direction of the relation
  • be independent of sample size
  • Different studies in the same meta-analysis can
    be based on different statistics, but each must
    be transformed into a standardized effect size
    that is comparable across different studies

Sample size, significance, effect size
ESRC RDI One Day Meta-analysis workshop (Marsh,
O'Mara, Malmberg)
Scatter plot of effect size and sample size
  • O'Mara (2004)

Effect sizes
  • Within the one meta-analysis, you can include
    studies based on any combination of statistical
    analyses (e.g., t-tests, ANOVA, multiple
    regression, correlation, odds-ratio, chi-square,
    etc.). However, you have to convert each of
    these to a common effect size metric.
  • Lipsey & Wilson (2001) present many formulae for
    calculating effect sizes from different
    information. The art of meta-analysis is how to
    compute effect sizes based on non-standard
    designs and studies that do not supply complete
    information.
  • All effect sizes must be converted into a common
    metric, typically the natural metric given
    research in the area (e.g., standardized mean
    difference, odds-ratio, correlation).

Effect size calculation
  • Standardized mean difference (d)
  • Group contrast research (treatment groups or
    naturally occurring groups)
  • Inherently continuous construct
  • Odds-ratio
  • Group contrast research (treatment groups or
    naturally occurring groups)
  • Inherently dichotomous construct
  • Correlation coefficient (r)
  • Association-between-variables research

Effect size calculation
  • Represents a standardized group contrast on an
    inherently continuous measure
  • Uses the pooled standard deviation (in some
    situations the control group standard deviation
    is used instead)
  • Commonly called d

In an intervention study with experimental and
control groups, the effect size might be
d = (M_experimental − M_control) / SD_pooled
In a gender difference study, the effect size
might be
d = (M_males − M_females) / SD_pooled
Effect size calculation
Almost all test statistics can be transformed
into a standardized effect size d, either from
means and standard deviations or from other test
statistics (e.g., d = t √(1/n1 + 1/n2) from an
independent-samples t-test)
Effect size calculation using Excel
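The same spreadsheet-style calculation can be sketched in Python; the numbers below are hypothetical, and the function names are mine.

```python
# Sketch: standardized mean difference (d) from summary statistics.
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2)
                     / (n1 + n2 - 2))

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)

def d_from_t(t, n1, n2):
    """d recovered from an independent-samples t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)

# Hypothetical intervention study: experimental vs control group
print(round(cohens_d(105.0, 15.0, 50, 100.0, 15.0, 50), 3))  # 0.333
```

With equal SDs of 15 and a 5-point mean difference, d = 5/15 ≈ .33, a small-to-moderate effect in Cohen's terms.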
Effect size calculation
  • Represents the strength of association between
    two inherently continuous measures
  • Generally reported directly as r (the Pearson
    product moment coefficient)

Effect size calculation
  • The odds-ratio is based on a 2 by 2 contingency
    table
  • The odds-ratio is the odds of success in the
    treatment group relative to the odds of success
    in the control group
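A minimal sketch of this calculation, with a made-up 2 by 2 table; the cell labels follow the usual a/b/c/d convention, which is my assumption rather than the slide's notation.

```python
# Sketch: odds ratio from a 2x2 contingency table.
import math

def odds_ratio(a, b, c, d):
    """a = treatment successes, b = treatment failures,
    c = control successes,  d = control failures."""
    return (a / b) / (c / d)

def log_or_variance(a, b, c, d):
    """Sampling variance of the log odds ratio
    (sum of reciprocal cell counts)."""
    return 1 / a + 1 / b + 1 / c + 1 / d

# Hypothetical trial: 30 successes / 10 failures in treatment,
# 20 successes / 20 failures in control
or_ = odds_ratio(30, 10, 20, 20)
print(round(or_, 2))           # 3.0
print(round(math.log(or_), 3)) # analyses usually pool the log OR
```

Analyses are typically carried out on the log odds ratio (which is roughly normal) and converted back at the end.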

Effect size calculation
r to d, d to r
  • d = 2r / √(1 − r²), and r = d / √(d² + 4)
  • Alternatively, transform rs into Fisher's
    Zr-transformed rs, Zr = ½ ln[(1 + r) / (1 − r)],
    which are more normally distributed
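These standard conversions can be sketched as follows (illustrative helper names, not from the workshop materials):

```python
# Sketch: converting between r, d, and Fisher's Zr.
import math

def r_to_d(r):
    return 2 * r / math.sqrt(1 - r**2)

def d_to_r(d):
    return d / math.sqrt(d**2 + 4)

def fisher_z(r):
    """Fisher's Zr transformation; approximately normal
    with sampling variance 1 / (n - 3)."""
    return 0.5 * math.log((1 + r) / (1 - r))

print(round(r_to_d(0.3), 3))   # 0.629
print(round(fisher_z(0.3), 3))
```

Note that the two conversions are inverses: d_to_r(r_to_d(r)) returns the original r.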
Correction for bias
  • Hedges proposed a correction for small sample
    size bias (n < 20)
  • Must be applied before analysis
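The correction (often written as Hedges' g) can be sketched as follows, using the common approximation for the correction factor; treat this as an illustration rather than the workshop's exact formula.

```python
# Sketch: Hedges' small-sample bias correction for d.
def hedges_g(d, n1, n2):
    """Multiply d by the correction factor
    J = 1 - 3 / (4*(n1 + n2) - 9), which shrinks d
    slightly; the shrinkage matters most for small n."""
    return d * (1 - 3 / (4 * (n1 + n2) - 9))

print(round(hedges_g(0.5, 10, 10), 3))  # 0.479
```

With two groups of 10, an uncorrected d of .50 shrinks to about .48; with large samples the correction is negligible.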

  • The effect sizes are weighted by the inverse of
    the variance to give more weight to effects based
    on large sample sizes
  • Variance is calculated as
    vi = (n1 + n2) / (n1 n2) + d² / (2 (n1 + n2))
  • The standard error of each effect size is given
    by the square root of the sampling variance
  • SE = √vi
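A sketch of inverse-variance weighting for d, under the usual large-sample variance formula for the standardized mean difference (illustrative numbers):

```python
# Sketch: sampling variance, SE, and inverse-variance weight for d.
import math

def d_variance(d, n1, n2):
    """Large-sample sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))

def inverse_variance_weight(d, n1, n2):
    """wi = 1 / vi: effects from larger samples get more weight."""
    return 1 / d_variance(d, n1, n2)

vi = d_variance(0.5, 50, 50)
print(vi, math.sqrt(vi))  # variance and its square root (the SE)
```

Because vi shrinks as n1 and n2 grow, the weight 1/vi gives large studies proportionally more influence on the pooled estimate.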

Population and sample
  • n = sample size, m = mean, d = effect size
Structuring a database
Constructing a database
Analytical Methods
  • Fixed effects model
  • Random effects model
  • Multilevel model

Fixed effects assumptions
  • Includes the entire population of studies to be
    considered; we do not want to generalise to
    other studies not included (e.g., future
    studies).
  • All of the variability between effect sizes is
    due to sampling error alone. Thus, the effect
    sizes are only weighted by the within-study
    variance.
  • Effect sizes are independent.

Conducting fixed effects meta-analysis
  • There are 2 general ways of conducting a fixed
    effects meta-analysis: the ANOVA analogue and
    multiple regression
  • The analogue to the ANOVA homogeneity analysis is
    appropriate for categorical variables
  • Looks for systematic differences between groups
    of responses within a variable
  • Multiple regression homogeneity analysis is more
    appropriate for continuous variables and/or when
    there are multiple variables to be analysed
  • Tests the ability of groups within each variable
    to predict the effect size
  • Can include categorical variables in multiple
    regression as dummy variables. (ANOVA is a
    special case of multiple regression)

Q-test of the homogeneity of variance
The homogeneity (Q) test asks whether the
different effect sizes are likely to have all
come from the same population (an assumption of
the fixed effects model). Are the differences
among the effect sizes no bigger than might be
expected by chance?
Q = Σ wi (di − d̄)², where di = the effect size for
each study (i = 1 to k), d̄ = the mean effect
size, and wi = a weight for each study based on
the sample size. However, this
(chi-square) test is heavily dependent on sample
size. It is almost always significant unless the
numbers (studies and people in each study) are
VERY small. This means that the fixed effect
model will almost always be rejected in favour of
a random effects model.
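The fixed-effects mean and the Q test can be sketched together, using made-up effect sizes and variances (the helper name is mine):

```python
# Sketch: fixed-effects pooling and the Q homogeneity test.
from math import sqrt

def fixed_effects_summary(effects, variances):
    weights = [1 / v for v in variances]
    # Inverse-variance weighted mean effect size and its SE
    mean_es = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))
    # Q: weighted sum of squared deviations from the mean;
    # compare against a chi-square with k - 1 df
    q = sum(w * (d - mean_es) ** 2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    return mean_es, se, q, df

es = [0.10, 0.30, 0.35, 0.60]
vs = [0.02, 0.03, 0.025, 0.04]
mean_es, se, q, df = fixed_effects_summary(es, vs)
print(round(mean_es, 3), round(se, 3), round(q, 2), df)
```

A significant Q (relative to chi-square with k − 1 df) signals more between-study variation than sampling error alone can explain.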

Fixed effects mean effect size
Run MATRIX procedure: Meta-Analytic Results
------- Distribution Description ---------------
       N    Min ES   Max ES   Wghtd SD
  15.000      .050    1.200       .315
------- Fixed & Random Effects Model -----------
        Mean ES   -95%CI   +95%CI      SE        Z       P
Fixed     .4312    .3383    .5241   .0474   9.0980   .0000
Random    .3963    .2218    .5709   .0890   4.4506   .0000
------- Random Effects Variance Component ------
v = .074895
------- Homogeneity Analysis -------------------
        Q         df        p
  44.1469    14.0000    .0001
Random effects v estimated via noniterative
method of moments.
------ END MATRIX -----
Modelling moderators
  • Model moderators by grouping effect sizes that
    are similar on a specific characteristic
  • For example, group all effect size outcomes that
    come from studies using a placebo control group
    design and compare with effect sizes from studies
    using a waitlist control group design
  • So in this example, Design is a dichotomous
    variable with the values 0 = placebo control and
    1 = waitlist control

Example fixed effects study
  • On the next slide, we will look at the outcomes
    of a study to show the importance of various
    moderator variables
  • Do Psychosocial and Study Skill Factors Predict
    College Outcomes? A Meta-Analysis
  • Robbins, Lauver, Le, Davis, Langley, & Carlstrom
    (2004). Psychological Bulletin, 130, 261–288
  • Aim
  • To examine the relationship between psychosocial
    and study skill factors (PSFs) and college
    retention by meta-analyzing 109 studies

Fixed effects output
  • N = sample size for that variable
  • k = number of correlation coefficients on which
    each distribution was based
  • r = mean observed correlation
  • CIr 10 = lower bound of the confidence interval
    for observed r
  • CIr 90 = upper bound of the confidence interval
    for observed r
Regression output example
  • Target self-concept domains are those that are
    directly relevant to the intervention
  • Target-related are those that are logically
    relevant to the intervention, but not focal
  • Non-target are domains that are not expected to
    be enhanced by the intervention

Regression coefficients and their standard errors:
                     B      SE    Sig?
  Target          .4892   .0552   yes
  Target-related  .1097   .0587   no
  Non-target      .0805   .0489   no
From O'Mara, Marsh, Craven, & Debus (2006)
Random effects assumptions
  • Includes only a sample of studies from the
    entire population of studies to be considered.
    As a result, we do want to generalise to other
    studies not included in the sample (e.g., future
    studies).
  • Variability between effect sizes is due to
    sampling error plus variability in the population
    of effects.
  • Effect sizes are independent.

Random effects models
  • If the homogeneity test is rejected (it almost
    always will be), it suggests that there are
    larger differences than can be explained by
    chance variation (at the individual participant
    level). There is more than one population in
    the set of different studies.
  • Now we turn to the random effects model to
    determine how much of this between-study
    variation can be explained by study
    characteristics that we have coded.
  • The total variance associated with the effect
    sizes has two components, one associated with
    differences within each study (participant level
    variation) and one between study variance

Weighting in random effects models
  • The random error variance component is added to
    the variance calculated earlier
  • This means that the weighting for each effect
    size consists of the within-study variance (vi)
    and the between-study variance (τ²)
  • The new weighting for the random effects model
    (wiRE) is given by the formula
    wiRE = 1 / (vi + τ²)
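The macro output shown earlier notes that the random-effects variance component is "estimated via noniterative method of moments"; that DerSimonian-Laird estimator, and the resulting random-effects weights, can be sketched with made-up numbers (helper names are mine):

```python
# Sketch: DerSimonian-Laird between-study variance and RE weights.
def dersimonian_laird_tau2(effects, variances):
    w = [1 / v for v in variances]
    mean_fe = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - mean_fe) ** 2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments denominator
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)  # truncated at zero

def random_effects_weights(variances, tau2):
    # wiRE = 1 / (vi + tau^2): within- plus between-study variance
    return [1 / (v + tau2) for v in variances]

es = [0.10, 0.30, 0.35, 0.60]
vs = [0.02, 0.03, 0.025, 0.04]
tau2 = dersimonian_laird_tau2(es, vs)
w_re = random_effects_weights(vs, tau2)
print(round(tau2, 5))
```

Because τ² is added to every study's variance, the random-effects weights are more equal across studies than the fixed-effects weights, so small studies gain relative influence.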

Example random effects study
  • Do Self-Concept Interventions Make a Difference?
    A Synergistic Blend of Construct Validation and
    Meta-Analysis
  • O'Mara, Marsh, Craven, & Debus (2006).
    Educational Psychologist, 41, 181–206
  • Aim
  • To examine what factors moderate the
    effectiveness of self-concept interventions by
    meta-analyzing 200 interventions

Example random effects results homogeneity
  • QB = between-group homogeneity. If the QB value
    is significant, then the groups (categories) are
    significantly different from each other
  • QW = within-group homogeneity. If QW is
    significant, then the effect sizes within a group
    (category) differ significantly from each other

Random effects mean effect size
Run MATRIX procedure: Meta-Analytic Results
------- Distribution Description ---------------
       N    Min ES   Max ES   Wghtd SD
  15.000      .050    1.200       .315
------- Fixed & Random Effects Model -----------
        Mean ES   -95%CI   +95%CI      SE        Z       P
Fixed     .4312    .3383    .5241   .0474   9.0980   .0000
Random    .3963    .2218    .5709   .0890   4.4506   .0000
------- Random Effects Variance Component ------
v = .074895
------- Homogeneity Analysis -------------------
        Q         df        p
  44.1469    14.0000    .0001
Random effects v estimated via noniterative
method of moments.
------ END MATRIX -----
Multilevel modelling assumptions
  • Meta-analytic data is inherently hierarchical
    (i.e., effect sizes nested within studies) and
    has random error that must be accounted for
  • Effect sizes are not necessarily independent
  • Allows for multiple effect sizes per study

Multilevel modelling
  • New technique that is still being developed
  • Provides more precise and less biased estimates
    of between-study variance than traditional
    methods

Multilevel model structure example
  • Level 1 outcome-level component
  • Effect sizes
  • Level 2 study component
  • Publications

Conducting multilevel model analyses
  • Intercept-only model, which incorporates both the
    outcome-level and the study-level components
    (similar to a random effects model)
  • Expand model to include predictor variables, to
    explain systematic variance between the study
    effect sizes

Example multilevel model
  • Acute Stressors and Cortisol Responses: A
    Theoretical Integration and Synthesis of
    Laboratory Research
  • Dickerson & Kemeny (2004). Psychological
    Bulletin, 130, 355–391.
  • Aim
  • To examine methodological predictors of cortisol
    responses in a meta-analysis of 208 laboratory
    studies of acute psychological stressors

Example multilevel results
  • Only 2 variables were significant (the quadratic
    term for time between stress onset and
    assessment, and time of day). The quadratic
    component is difficult to interpret as an
    unstandardized regression coefficient, but the
    graph suggests it is meaningfully large

Model selection
  • Fixed, random, or multilevel?
  • Generally, if more than one effect size per study
    is included in the sample, multilevel modelling
    should be used
  • However, if there is little variation at the
    study level, the results of multilevel modelling
    meta-analyses are similar to random effects
    models

Model selection
  • Do you wish to generalise your findings to other
    studies not in the sample?
  • Do you have multiple outcomes per study?

Exercise 1b
  • The purpose of this exercise is to consider
    choice of meta-analytic method
  • Discuss in groups of 3-4 people the question in
    relation to the gender differences in smiling
    study (LaFrance et al., 2003)
  • Is there independence of effect sizes? What are
    the implications for model choice (fixed, random,
    or multilevel)?

Supplementary analyses publication bias
  • Fail-safe N
  • Power analysis
  • Trim-and-fill method

Dealing with publication bias
  • The fail-safe N (Rosenthal, 1991) determines the
    number of studies with an effect size of zero
    needed to lower the observed effect size to a
    specified (criterion) level.
  • For example, assume that you want to test the
    assumption that an effect size is at least .20.
  • If the observed effect size was .26 and the
    fail-safe N was found to be 44, this means that
    44 unpublished studies with a mean effect size of
    zero would need to be included in the sample to
    reduce the observed effect size of .26 to .20.
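The calculation described above (the number of zero-effect studies needed to pull an observed mean effect size down to a criterion level) can be sketched directly; this effect-size-based variant is often attributed to Orwin, and the function name is mine.

```python
# Sketch: fail-safe N based on effect sizes.
def fail_safe_n(k, mean_es, criterion_es):
    """Number of unpublished zero-effect studies needed to
    drag the mean effect size of k studies down to the
    criterion level."""
    return k * (mean_es - criterion_es) / criterion_es

# e.g., 100 studies averaging d = .26, criterion d = .20
print(round(fail_safe_n(100, 0.26, 0.20), 1))  # 30.0
```

If the fail-safe N is large relative to the number of studies retrieved, it is implausible that publication bias alone explains the observed effect.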

Dealing with publication bias
  • Power describes the probability that a
    statistical test avoids a Type II error. That
    is, it indicates the likelihood that the test
    correctly rejects the null hypothesis when there
    really is an effect; a low-powered test is
    likely to miss real effects.
  • Power, sample size, significance level, and
    effect size are inter-related.
  • A lower powered study has to exhibit a much
    larger effect size to produce a significant
    finding. This has ramifications for publication
    bias.
  • Muncer, Craigie, & Holmes (2003) recommend
    conducting a power analysis on all studies
    included in the meta-analysis
  • Compare the observed value (d) against a
    theoretical value (includes information about
    sample size)

Dealing with publication bias
  • The trim and fill procedure (Duval & Tweedie,
    2000a, 2000b) calculates the effect of potential
    data censoring (including publication bias) on
    the outcome of the meta-analyses.
  • A nonparametric, iterative technique that
    examines the symmetry of effect sizes plotted
    against the inverse of the standard error.
    Ideally, the effect sizes should mirror each
    other on either side of the mean.

  • Examining the methods and output of published
    meta-analyses

Exercise 1c
  • Discuss in groups of 3-4 people the following
    question in relation to the gender differences in
    smiling study (LaFrance et al., 2003)
  • How did they deal with publication bias? Does
    this seem appropriate?

Exercise 2
  • The purpose of this exercise is to practice
    reading meta-analytic results tables.
  • This study, by Reger et al. (2004), examines the
    relationship between neuropsychological
    functioning and driving ability in dementia.
  • In Table 3, which variables are homogeneous for
    the on-road tests driving measure in the All
    Studies column? What does this tell you about
    those variables?
  • In Table 4, look at the variables that were
    homogeneous in question (1) for the on-road
    tests using All Studies. Which variables have
    a significant mean ES? Which variable has the
    largest mean ES?

Exercise 2 Answers
  • Homogeneous variables (non-significant Q-values)
    Mental statusgeneral cognition, Visuospatial
    skills, Memory, Executive functions, Language
  • All of the relevant mean effect sizes are
    significant. Memory and language are tied as the
    largest mean ESs among the homogeneous variables.
  • We established what meta-analysis is, when and
    why we use meta-analysis, and the benefits and
    pitfalls of using meta-analysis
  • Summarised how to conduct a meta-analysis
  • Provided a conceptual introduction to analysis
    and interpretation of results based on fixed
    effects, random effects, and multilevel models
  • Applied this information to examining the methods
    of a published meta-analysis

  • Comparing apples and oranges
  • Quality of the studies included in the
    meta-analysis
  • What to do when studies don't report sufficient
    information (e.g., non-significant findings)?
  • Including multiple outcomes in the analysis
    (e.g., different achievement scores)
  • Publication bias

Future directions
  • With meta-analysis now one of the most widely
    published research methods, it is an exciting
    time to be involved in meta-analytic research
  • The hottest topics in meta-analysis are
  • Multilevel modelling to address the issue of
    independence of effect sizes
  • New methods in publication bias assessment
    (Trim-and-fill method, post hoc power analysis)
  • Also receiving attention
  • Establishing guidelines for conducting
    meta-analysis (best practice)
  • Meta-analyses of meta-analyses

  • Purpose-built
  • Comprehensive Meta-analysis (commercial)
  • Schwarzer (free, http//
  • Extensions to standard statistics packages
  • SPSS, Stata and SAS macros, downloadable from
  • Stata add-ons, downloadable from
  • HLM V-known routine
  • MLwiN
  • Mplus
  • Please note that we do not advocate any one
    programme over another, and cannot guarantee the
    quality of all of the products downloadable from
    the internet. This list is not exhaustive.

Key reference books
  • Cooper, H., & Hedges, L. V. (Eds.) (1994). The
    handbook of research synthesis. New York:
    Russell Sage Foundation.
  • Hox, J. (2003). Applied multilevel analysis.
    Amsterdam: TT Publishers.
  • Hunter, J. E., & Schmidt, F. L. (1990). Methods
    of meta-analysis: Correcting error and bias in
    research findings. Newbury Park, CA: Sage.
  • Lipsey, M. W., & Wilson, D. B. (2001). Practical
    meta-analysis. Thousand Oaks, CA: Sage.

More information
  • Pick up a brochure about our intermediate and
    advanced meta-analysis courses
  • Visit our website http//