Help! My mentor gave me data and asked me to analyze it

Provided by: accelerat1

1
Help! My mentor gave me data and asked me to
analyze it.
  • Pathways to Careers in Clinical and Translational
    Research (PACCTR) Curriculum Core

2
Help! My mentor gave me data and asked me to
analyze it.
  • Very common project for a mentor to give to a
    student
  • BUT, may not be an appropriate project if you
    don't have experience in statistical analysis or
    access to a statistician
  • Also known as secondary data analysis

3
Secondary Data Analysis
  • Often there is extra data left over that the
    mentor thinks could be interesting.
  • Example: An RCT has been completed (and
    published) examining the efficacy of a medication
    for some disease. At baseline, subjects were
    asked a lot of questions about quality of life
    (QOL) and sexual function. The researcher wants
    you to analyze these data.

4
Step 1: Define what you have
  • Sometimes this is not clear
  • Get a list of variables, questionnaire/survey,
    data abstraction instrument
  • Read the research protocol
  • Read any papers/posters already published

5
Step 2: Research Q
  • Always start with a research question.
  • Ask your mentor what the research question is.
  • If there isn't a clear research question, proceed
    with caution. There may not be an interesting
    project here.

6
Research Q Example
  • Possible RQs from the RCT example:
  • 1. How does QOL change in the treated vs placebo
    group? (An RCT)
  • 2. What is the QOL and sexual functioning of
    people with this disease at baseline (i.e., before
    treatment)? (A descriptive study, i.e., a
    cross-sectional study)
  • 3. What are the determinants of low QOL in people
    with this disease? (Similar to the above, but you
    determine whether certain groups have lower QOL
    than others, e.g., by race, education, or
    comorbidities. This could be done with
    multivariate analysis.)

7
Step 2: Novel?
  • Is this novel?
  • Again, often there is leftover data that the
    researcher thinks might be interesting.
  • Your job is to figure out if it would be
    interesting!
  • Do a lit search to see what's been done, and talk
    to clinicians to see if it is interesting.
  • If not, consider choosing a different project.

8
Step 3: Initial data analysis
  • If there is a research question and it is
    interesting, proceed with initial data analysis.
  • Type of analysis depends on type of data
  • Continuous outcomes: means, t-tests, linear
    regression
  • Dichotomous or categorical outcomes: proportions
    (%s), chi-square tests, logistic regression

9
Step 3: Initial data analysis
  • Do you have a programmer or statistician?
  • No: If you don't have data analysis experience,
    consider a different project unless you have a
    lot of time to teach yourself or take a class.
    See Hulley's Designing Clinical Research.
  • Yes: You have a programmer or statistician to
    help you.
  • Your job is to communicate with him/her in order
    to get the info you need. Ask for:
  • A list of variables
  • A list of means and proportions for the variables
    you are interested in
  • A compilation of cross-tabs and/or t-tests for
    selected variables to see if there are
    differences between groups (see next slide)

10
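Cross-tabs?
  • Cross-tabs are a shorthand way of saying
    chi-square tests (or Fisher's exact test)
  • If you ask for sex by high vs low QOL, you would
    get:

Each cell shows the count, row %, and column %.

         | Low QOL | High QOL |  Total
---------+---------+----------+--------
Male     |     100 |       44 |    144
         |   69.44 |    30.56 | 100.00
         |   67.11 |    31.88 |  50.17
---------+---------+----------+--------
Female   |      49 |       94 |    143
         |   34.27 |    65.73 | 100.00
         |   32.89 |    68.12 |  49.83
---------+---------+----------+--------
Total    |     149 |      138 |    287
         |   51.92 |    48.08 | 100.00
         |  100.00 |   100.00 | 100.00

Fisher's exact = 0.000
1-sided Fisher's exact = 0.000
Risk ratio = 2.026644 (95% CI 1.575926 to 2.606269)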
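
The percentages in the cross-tab above can be reproduced from the four cell counts. The following is a minimal Python sketch, not the presenters' own method. The function names are hypothetical, and a Pearson chi-square test (1 degree of freedom, no continuity correction) is swapped in for Fisher's exact test because it needs only the standard library.

```python
import math

# Cell counts from the cross-tab: rows are sex, columns are (low QOL, high QOL).
counts = {"Male": (100, 44), "Female": (49, 94)}

def row_percentages(table):
    """Percent of each row that falls in each column."""
    return {row: tuple(100 * c / sum(cells) for c in cells)
            for row, cells in table.items()}

def column_percentages(table):
    """Percent of each column that falls in each row."""
    col_totals = [sum(col) for col in zip(*table.values())]
    return {row: tuple(100 * c / t for c, t in zip(cells, col_totals))
            for row, cells in table.items()}

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df) for a 2x2 table."""
    (a, b), (c, d) = table.values()
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

row_pct = row_percentages(counts)
col_pct = column_percentages(counts)
chi2, p_value = chi_square_2x2(counts)
print(f"Men with low QOL: {row_pct['Male'][0]:.2f}% of men")
print(f"Low-QOL subjects who are male: {col_pct['Male'][0]:.2f}%")
print(f"chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

The row percentages answer "what share of men have low QOL?", while the column percentages answer "what share of low-QOL subjects are men?". For the interpretation that follows, the row percentages are the ones that matter.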
11
How to interpret?
  • There are 287 total with about half male (144)
    and half female (143)

Fisher's exact = 0.001
1-sided Fisher's exact = 0.001
Risk ratio = 2.026644 (95% CI 1.575926 to 2.606269)
12
How to interpret?
  • Men are more likely to have low QOL (100 of 144,
    or 69.44%) than women (49 of 143, or 34.27%)

13
How to interpret?
  • This difference is significant, with p=0.001

14
How to interpret?
Risk ratio = 2.026644 (95% CI 1.575926 to 2.606269)
  • Sometimes the output will instead come to you as
    a risk ratio (relative risk) or an odds ratio
  • Interpretation: Men are 2-fold more likely to
    have low QOL (RR=2.03)
  • This difference is significant because the 95%
    confidence interval does not include 1.0 (i.e.,
    1.58-2.61)

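
The risk ratio and its confidence interval can be checked by hand from the cross-tab counts. The sketch below assumes the common log-scale (Wald) interval with z = 1.96. The slides do not say which method the statistical package used, so treat the function name and the method as assumptions.

```python
import math

def risk_ratio_ci(events_1, total_1, events_2, total_2, z=1.96):
    """Risk ratio of group 1 vs group 2 with a log-scale (Wald) interval."""
    rr = (events_1 / total_1) / (events_2 / total_2)
    # Standard error of ln(RR) for two independent proportions.
    se_log_rr = math.sqrt(1 / events_1 - 1 / total_1
                          + 1 / events_2 - 1 / total_2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

# Low QOL: 100 of 144 men vs 49 of 143 women (counts from the cross-tab).
rr, lower, upper = risk_ratio_ci(100, 144, 49, 143)
# The difference is significant at the 5% level if the interval excludes 1.0.
significant = not (lower <= 1.0 <= upper)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```

Under these assumptions the result should closely match the risk ratio line in the output, which suggests the package used the same kind of interval.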
15
What about t-tests?
  • If you asked for BMI vs high/low QOL, you would
    get this:

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |   Obs       Mean   Std. Err.   Std. Dev.    [95% Conf. Interval]
---------+--------------------------------------------------------------------
 Low QOL |    64   24.74505    .8713092    6.970474    23.00388    26.48622
High QOL |    62   25.43989      1.0174    8.011014    23.40548    27.47431
---------+--------------------------------------------------------------------
combined |   126   25.08696    .6662367    7.478489    23.76839    26.40552
---------+--------------------------------------------------------------------
    diff |         -.6948402    1.336548               -3.340244    1.950563
------------------------------------------------------------------------------
Degrees of freedom: 124

          Ho: mean(Low QOL) - mean(High QOL) = diff = 0

   Ha: diff < 0            Ha: diff != 0            Ha: diff > 0
   t = -0.5199             t = -0.5199              t = -0.5199
   P < t = 0.3020          P > |t| = 0.6041         P > t = 0.6980

16
How to interpret?
  • The low QOL subjects (n=64) have a mean BMI of
    24.7 with a std dev of 7.0 and a 95% CI of 23.0
    to 26.5

17
How to interpret?
  • The high QOL subjects have a mean BMI of 25.4

18
How to interpret?
  • Is this significantly different? No. Look at the
    middle column (the two-sided test): p=0.6041
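
The t statistic in the output can be recomputed from the group sizes, means, and standard deviations alone. This is a rough sketch with a hypothetical function name. Because the Python standard library has no t distribution, the p-value uses a normal approximation, which is only reasonable here because the degrees of freedom are large.

```python
import math

def pooled_t_test(n1, mean1, sd1, n2, mean2, sd2):
    """Equal-variance two-sample t statistic from summary statistics."""
    df = n1 + n2 - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    return t, se_diff, df

# Low QOL group first, high QOL group second (values from the output).
t, se_diff, df = pooled_t_test(64, 24.74505, 6.970474, 62, 25.43989, 8.011014)
# Normal approximation to the two-sided p-value (assumption: large df).
p_approx = math.erfc(abs(t) / math.sqrt(2))
print(f"t = {t:.4f}, SE of difference = {se_diff:.4f}, df = {df}")
print(f"approximate two-sided p = {p_approx:.2f}")
```

The approximate p-value lands close to, but not exactly on, the middle-column value in the output. Either way it is far above 0.05, so the conclusion is the same.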

19
What about multivariate analysis?
  • Predictors of Low QOL (Low QOL is the outcome so
    this is a logistic regression b/c it is a
    dichotomous outcome)
  • Choose variables to place in your model. Choice
    depends on both biologic plausibility and on
    results of the bivariate analysis (the cross-tabs
    and t-tests you did above)

20
Model selection, multivariate analysis
  • You may choose to put all variables in the model
    that were significant in bivariate analysis at
    p<0.10 (usually you choose a cutoff of p=0.10 to
    0.20 b/c if you limit it to p<0.05 you may miss
    some variables that become significant in a
    multivariate model due to confounding by other
    variables)
  • And, even if not significant in the bivariate
    analysis, you may choose to include variables that
    you think are important biologically or b/c
    others have reported an association (e.g.,
    co-morbid conditions)
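
The selection rule described above can be written down in a few lines. This is only an illustrative sketch: the function name is hypothetical, the p-values for sex and BMI are taken from the cross-tab and t-test examples, and the "comorbidity" variable and its p-value are made up to show a variable forced into the model on biological grounds.

```python
def select_predictors(bivariate_p, threshold=0.10, forced=()):
    """Keep variables with bivariate p below threshold, plus forced-in ones."""
    selected = [name for name, p in bivariate_p.items() if p < threshold]
    for name in forced:
        if name not in selected:
            selected.append(name)
    return selected

# "male" and "bmi" p-values come from the earlier examples;
# "comorbidity" and its p-value are invented for illustration.
bivariate_p = {"male": 0.001, "bmi": 0.6041, "comorbidity": 0.35}
model_vars = select_predictors(bivariate_p, threshold=0.10,
                               forced=["comorbidity"])
print(model_vars)
```

Here BMI is dropped because its p-value is well above the cutoff, while the forced-in variable is kept despite not meeting it.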

21
Results Multivariate analysis
You ask for the model to be run and get this:

. xi: logistic lowqol i.trirace i.agecat2 male private q33job lesshs married

Logit Estimates                                   Number of obs =     371
                                                  chi2(16)      =   79.29
                                                  Prob > chi2   =  0.0000
Log Likelihood = -202.4476                        Pseudo R2     =  0.1638

------------------------------------------------------------------------------
  lowqol | Odds Ratio   Std. Err.       z     P>|z|     [95% Conf. Interval]
---------+--------------------------------------------------------------------
Itrira_1 |   .9543597   .3067677    -0.145    0.884     .5082804    1.791929
Itrira_2 |    .404713   .1310575    -2.793    0.005     .2145379     .763467
Iageca_1 |   2.149653    .749715     2.194    0.028     1.085182     4.25828
Iageca_2 |   2.007573   .6533771     2.141    0.032     1.060822     3.79927
    male |   2.227047   .9420758     1.893    0.058     .9719808    5.102711
 private |   1.085656   .8550493     0.104    0.917     .2318977    5.082625
  q33job |   .8852046   .2355718    -0.458    0.647     .5254371    1.491305
  lesshs |   .8078212   .2238751    -0.770    0.441     .4692648    1.390633
 married |   .9584556    .268145    -0.152    0.879     .5539024    1.658482
------------------------------------------------------------------------------

22
Interpretation?
Outcome variable: low QOL

Variables in model: Race (3 categories,
ref=white), Age (3 categories, ref=<30), Male (vs
female), Private insurance (vs Medicaid),
Employed (vs unemployed), Education < high school
(vs more), Married (vs unmarried).
Note that BMI is not in the model b/c it wasn't
significant in bivariate analysis (t-test).
23
Interpretation?
Outcome variable: low QOL

Look at the P column to see which variables are
significantly associated with low QOL after
adjustment for other variables in the model.
Odds ratios >1.0 indicate a higher risk of low
QOL; odds ratios <1.0 indicate a lower risk of
low QOL.
24
Interpretation?

Variables associated with increased risk of low QOL:
1. Age 40-50: 2-fold increased risk
2. Age >50: 2-fold increased risk
3. Male: trend toward significance, with p=0.06
Variables associated with decreased risk of low QOL:
1. Asian (race category 2): 60% decreased risk
All other variables are no longer significantly
associated with the outcome.
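
Each row of the logistic regression output can be sanity-checked from just the odds ratio and its standard error. The sketch below assumes the reported standard error is on the odds-ratio scale, so that SE(ln OR) = SE(OR) / OR, and uses z = 1.96 for the 95% interval. The function name is hypothetical.

```python
import math

def summarize_odds_ratio(odds_ratio, se_odds_ratio, z_crit=1.96):
    """Recover z, the 95% CI, and percent change in odds from an OR and its SE."""
    # Delta-method assumption: SE(OR) = OR * SE(ln OR).
    se_log = se_odds_ratio / odds_ratio
    log_or = math.log(odds_ratio)
    z = log_or / se_log
    lower = math.exp(log_or - z_crit * se_log)
    upper = math.exp(log_or + z_crit * se_log)
    pct_change = 100 * (odds_ratio - 1)
    return z, lower, upper, pct_change

# Race category 2 row (Itrira_2) from the logistic regression output.
z, lower, upper, pct_change = summarize_odds_ratio(0.404713, 0.1310575)
print(f"z = {z:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
print(f"change in odds of low QOL: {pct_change:.0f}%")
```

An odds ratio of about 0.40 corresponds to roughly a 60% decrease, which is where the "60% decreased risk" reading comes from. Strictly speaking this is a decrease in the odds, which approximates the risk only when the outcome is uncommon.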
25
Summary Data analysis
  • Clearly define the research question and ensure
    it is novel
  • Understand the data: get the variable list, read
    the questionnaire, and read the research proposal
    and already published posters/papers
  • Preliminary analysis: bivariate (t-test,
    chi-square)
  • Advanced analysis: multivariate

26
PACCTR Curriculum Core
  • Rebecca Jackson MD, School of Medicine
  • Roberta Oka RN, ANP, DNSc, School of Nursing
  • George Sawaya MD, School of Medicine
  • Susan Hyde DDS, MPH, PhD, School of Dentistry
  • Jennifer Cocohoba PharmD, School of Pharmacy
  • Joel Palefsky MD, School of Medicine

Pathways to Careers in Clinical and
Translational Research