Design and Analysis of Clinical Study 9. Analysis of Cross-sectional Study

1
Design and Analysis of Clinical Study 9.
Analysis of Cross-sectional Study
  • Dr. Tuan V. Nguyen
  • Garvan Institute of Medical Research
  • Sydney, Australia

2
Overview
  • Estimate of prevalence
  • Analysis of difference between two proportions
  • Analysis of difference among proportions:
    chi-square test
  • Analysis of difference between two means
  • Analysis of association I: simple linear
    regression analysis
  • Analysis of association II: multiple regression
    analysis

3
Prevalence of Disease
  • Prevalence is NOT incidence.
  • Prevalence measures the number of people in a population who
    have the disease at a given point in time, usually expressed as
    a proportion of the population.
  • This measure is called point prevalence, in contrast to the
    infrequently used period prevalence, which adds the cases
    existing at the start of a time period to the new cases that
    occur during the period.
  • It is a measure of disease status and disease burden,
    in contrast to incidence, which measures disease
    onset events.

4
[Figure: disease histories of subjects 1 to 5 drawn along a time axis, with a vertical line at time T]
Prevalence: at time T, 2 out of 5 subjects had the disease, so P = 2/5 = 0.4
5
Sampling Variability in Prevalence
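  • Prevalence in the population ($\pi$) is UNKNOWN
  • Sample prevalence ($p$) is an unbiased estimate of
    $\pi$.
  • $x$: number of diseased individuals in the sample
  • $p$: sample prevalence
  • $N$: sample size
  • Estimates:
  • $p = x/N$
  • variance of $p$: $\mathrm{var}(p) = p(1-p)/N$
  • standard error of $p$: $\mathrm{SE}(p) = \sqrt{p(1-p)/N}$
  • 95% CI of $p$: $p \pm 1.96 \times \mathrm{SE}(p)$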

6
An Example of Calculation of Prevalence
  • In a sample of 3584 subjects, 43 have ABO hemolytic
    disease.
  • So, the estimated prevalence is
  • $p = 43/3584 = 0.0120$
  • Standard error of the prevalence: $\mathrm{SE}(p) = \sqrt{0.0120 \times 0.9880 / 3584} = 0.0018$
  • 95% confidence interval:
  • $0.0120 \pm 1.96 \times 0.0018$, i.e. 0.008 to 0.016.
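The calculation on this slide can be checked in R. The following is a minimal sketch using the normal approximation (p ± 1.96 × SE); the variable names `x`, `N`, `p`, `se` and `ci` are illustrative choices, not part of the original slides.

```r
# Prevalence of ABO hemolytic disease: 43 cases among 3584 subjects
x <- 43      # number of diseased individuals in the sample
N <- 3584    # sample size

p  <- x / N                               # sample prevalence
se <- sqrt(p * (1 - p) / N)               # standard error of p
ci <- p + c(-1, 1) * qnorm(0.975) * se    # 95% CI, normal approximation

round(c(prevalence = p, se = se, lower = ci[1], upper = ci[2]), 4)
```

For a rare disease or a small sample, an exact interval from `binom.test(x, N)$conf.int` may be preferable to the normal approximation.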

7
Test for Difference Between Two Proportions
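                 Vietnam     Australia
N                700         1287
Osteoporosis     148         345
Prevalence       0.211       0.268
Variance (s2)    0.000238    0.000152
  • p1: proportion for group 1
  • p2: proportion for group 2
  • N1: sample size for group 1
  • N2: sample size for group 2
  • Difference: $d = p_1 - p_2$
  • Variance of d: $s^2 = p_1(1-p_1)/N_1 + p_2(1-p_2)/N_2$
  • z-test: $z = d / s$

Taking Australia as group 1 and Vietnam as group 2:
d = 0.268 - 0.211 = 0.057
variance of d: s2 = 0.000238 + 0.000152 = 0.000390
z-test: z = d / sqrt(s2) = 2.87 (using unrounded proportions). Since |z| > 1.96, the difference is significant at the 5% level.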
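A short R sketch of the same z-test follows. It uses the unpooled variance of the difference, as on the slide; the object names (`x`, `n`, `var_d`, `p_value`) are assumptions made for illustration.

```r
# Osteoporosis cases and sample sizes in the two countries
x <- c(Australia = 345, Vietnam = 148)
n <- c(Australia = 1287, Vietnam = 700)

p     <- x / n                                # sample proportions
d     <- p[["Australia"]] - p[["Vietnam"]]    # difference in prevalence
var_d <- sum(p * (1 - p) / n)                 # unpooled variance of d
z     <- d / sqrt(var_d)                      # z statistic
p_value <- 2 * pnorm(-abs(z))                 # two-sided p-value

round(c(d = d, var_d = var_d, z = z, p_value = p_value), 5)
```

Base R also offers `prop.test(x, n)`, which tests the same hypothesis with a pooled variance and, by default, a continuity correction, so its statistic will differ slightly from the hand calculation.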
8
Test for Difference Among Proportions
Observed frequencies, by caffeine consumption (columns) and marital status (rows)
Marital status   None    1-150   151-300   300-900   Total
Married           652     1537       598       242    3029
Divorced           36       46        38        21     141
Single            218      327       106        67     718
Total             906     1910       742       330    3888

Row proportions (cell count / row total), e.g. 652/3029 = 0.22 and 1537/3029 = 0.51
Marital status   None    1-150   151-300   300-900   Total
Married          0.22     0.51      0.20      0.08    1.00
Divorced         0.26     0.33      0.27      0.15    1.00
Single           0.30     0.46      0.15      0.09    1.00
Total            0.23     0.49      0.19      0.08    1.00
9
Test for Difference Among Proportions
Expected frequency of each cell if marital status and caffeine consumption are unrelated = row total / grand total × column total (observed frequencies as on the previous slide).
Married:   3029/3888 × 906 = 705.8   3029/3888 × 1910 = 1488    3029/3888 × 742 = 578.1   3029/3888 × 330 = 257.1
Divorced:  141/3888 × 906 = 32.9     141/3888 × 1910 = 69.3     141/3888 × 742 = 26.9     141/3888 × 330 = 12.0
Single:    718/3888 × 906 = 167.3    718/3888 × 1910 = 352.7    718/3888 × 742 = 137.0    718/3888 × 330 = 60.9

Expected freq.   None    1-150   151-300   300-900   Total
Married         705.8     1488     578.1     257.1    3029
Divorced         32.9     69.3      26.9      12.0     141
Single          167.3    352.7     137.0      60.9     718
Total             906     1910       742       330    3888
10
Test for Difference Among Proportions
Observed (O) and expected (E, in parentheses) frequencies
Marital status   None           1-150          151-300        300-900
Married          652 (705.8)    1537 (1488)    598 (578.1)    242 (257.1)
Divorced         36 (32.9)      46 (69.3)      38 (26.9)      21 (12.0)
Single           218 (167.3)    327 (352.7)    106 (137.0)    67 (60.9)

Contribution of each cell: $(O - E)^2 / E$, e.g. (652 - 705.8)² / 705.8 = 4.11 and (1537 - 1488)² / 1488 = 1.61.

(O - E)²/E       None    1-150   151-300   300-900   Total
Married          4.11     1.61      0.69      0.89    7.30
Divorced         0.30     7.82      4.57      6.82   19.51
Single          15.36     1.88      7.02      0.60   24.86
Total           19.77    11.31     12.28      8.31   51.66

Chi-square = 51.66 with df = 3 × 2 = 6. The critical value of chi-square with 6 df at alpha = 0.05 is 12.59, so the difference among proportions is significant.
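The observed-versus-expected calculation can be reproduced in R. This sketch computes the expected counts and the chi-square statistic by hand and then compares them with the built-in `chisq.test`; the object names and the column labels (taken from the table header as transcribed) are illustrative assumptions.

```r
# Observed counts: marital status (rows) by caffeine consumption (columns)
observed <- matrix(
  c(652, 1537, 598, 242,
     36,   46,  38,  21,
    218,  327, 106,  67),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    marital  = c("Married", "Divorced", "Single"),
    caffeine = c("None", "1-150", "151-300", "300-900")
  )
)

# Expected counts under independence: row total x column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

chisq <- sum((observed - expected)^2 / expected)          # chi-square statistic
df    <- (nrow(observed) - 1) * (ncol(observed) - 1)      # degrees of freedom
crit  <- qchisq(0.95, df)                                 # upper 5% critical value
p_val <- pchisq(chisq, df, lower.tail = FALSE)            # p-value

test <- chisq.test(observed)   # built-in Pearson chi-square test
round(c(chisq = chisq, df = df, critical = crit), 2)
```

Note that the critical value is the upper-tail quantile `qchisq(0.95, df)`; `qchisq(0.05, df)` would give the lower-tail value instead.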
11
Normal Distribution
Distribution of height in Vietnamese women, with a mean of
156 cm and a standard deviation of 4.6 cm. The horizontal axis
is height and the vertical axis is the probability of each
height.
12
Application of the Normal Distribution
  • The serum cholesterol levels of Californian
    children have a mean of 175 mg/100ml and a
    standard deviation of 30 mg/100ml. The
    distribution of the cholesterol levels is normal.
  • 95% of the children should have cholesterol
    levels within 175 ± (1.96 × 30), i.e. between 116 and
    234 mg/100ml.
  • If we let X be the cholesterol level of any child,
    then X can be converted to a variable Z with mean 0
    and SD 1:
  • $Z = (X - 175) / 30$

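[Figure: normal curve of cholesterol level (mg/100ml) with mean 175 and limits 116 and 234, which correspond to Z = 0, -1.96 and 1.96; both tails beyond the limits are labelled "Abnormal?"]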
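The limits and the Z conversion can be computed with R's normal distribution functions. In this sketch the mean and SD come from the slide, while the child's cholesterol value `x` is a hypothetical number chosen only to illustrate the conversion.

```r
mu    <- 175   # mean cholesterol, mg/100ml
sigma <- 30    # standard deviation, mg/100ml

# Central 95% range of the distribution
limits <- qnorm(c(0.025, 0.975), mean = mu, sd = sigma)

# Standardise one (hypothetical) child's value and find the upper-tail probability
x <- 240
z <- (x - mu) / sigma
upper_tail <- pnorm(z, lower.tail = FALSE)   # P(Z > z)

round(limits, 1)
round(c(z = z, upper_tail = upper_tail), 4)
```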
13
Two-group comparison unpaired t-test
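              Group 1    Group 2
              x11        x21
              x12        x22
              x13        x23
              x14        x24
              x15        x25
              ...        ...
              x1n        x2n
Sample size   n1         n2
Mean          $\bar{x}_1$      $\bar{x}_2$
SD            s1         s2
Mean difference: $D = \bar{x}_1 - \bar{x}_2$
Variance of D: $\mathrm{var}(D)$, estimated from s1, s2, n1 and n2
T-statistic: $t = D / \sqrt{\mathrm{var}(D)}$
95% confidence interval: $D \pm t_{0.975,\,df} \sqrt{\mathrm{var}(D)}$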
14
Two-group comparison an example
        A        B
        100      122
        108      130
        119      138
        127      142
        132      152
        135      154
        136      176
        164
N       8        7
Mean    127.6    144.9
SD      19.6     17.8
Mean difference: d = 127.6 - 144.9 = -17.3
Variance of D
T-statistic
95% confidence interval
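The variance, t-statistic and confidence interval for this example can be obtained with `t.test`. The transcript does not show which variance formula the slide used, so this sketch runs both the pooled (equal-variance) test and Welch's unequal-variance test; which one matches the original slide is an assumption left open here.

```r
a <- c(100, 108, 119, 127, 132, 135, 136, 164)   # group A, n = 8
b <- c(122, 130, 138, 142, 152, 154, 176)        # group B, n = 7

d <- mean(a) - mean(b)                 # mean difference

pooled <- t.test(a, b, var.equal = TRUE)   # classical unpaired t-test, df = n1 + n2 - 2
welch  <- t.test(a, b)                     # Welch test, R's default

# t-statistic, degrees of freedom and 95% CI from each version
list(
  difference = d,
  pooled = c(pooled$statistic, pooled$parameter, pooled$conf.int),
  welch  = c(welch$statistic, welch$parameter, welch$conf.int)
)
```

Both versions give a 95% confidence interval that includes zero for these data, so the choice of formula does not change the conclusion here.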
15
Analysis of Correlation
ID    Age    Chol (mg/ml)
1     46     3.5
2     20     1.9
3     52     4.0
4     30     2.6
5     57     4.5
6     25     3.0
7     28     2.9
8     36     3.8
9     22     2.1
10    43     3.8
11    57     4.1
12    33     3.0
13    22     2.5
14    63     4.6
15    40     3.2
16    48     4.2
17    28     2.3
18    49     4.0
16
Variance, Covariance and Correlation Theory
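  • Let x and y be two random variables from a sample
    of n observations.
  • Measure of variability of x and y: variance,
    $\mathrm{var}(x) = \sum (x_i - \bar{x})^2 / (n - 1)$ and $\mathrm{var}(y) = \sum (y_i - \bar{y})^2 / (n - 1)$
  • Measure of covariation between x and y: covariance,
    $\mathrm{cov}(x, y) = \sum (x_i - \bar{x})(y_i - \bar{y}) / (n - 1)$
  • Coefficient of correlation: $r = \mathrm{cov}(x, y) / \sqrt{\mathrm{var}(x)\,\mathrm{var}(y)}$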

17
Positive and Negative Correlation
r 0.9
r -0.9
18
Test of Hypothesis of Correlation
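  • Hypothesis: H0: $\rho = 0$ versus H1: $\rho \neq 0$.
  • Step 1: Fisher's z-transformation, $z = \frac{1}{2} \ln \frac{1 + r}{1 - r}$
  • Step 2: calculate the standard error of z, $\mathrm{SE}(z) = 1 / \sqrt{n - 3}$
  • Step 3: calculate the t-statistic, $t = z / \mathrm{SE}(z)$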

19
An Example of Correlation Analysis
  • ID, Age (x), Cholesterol (y, mg/100ml)
  • 1 46 3.5
  • 2 20 1.9
  • 3 52 4.0
  • 4 30 2.6
  • 5 57 4.5
  • 6 25 3.0
  • 7 28 2.9
  • 8 36 3.8
  • 9 22 2.1
  • 10 43 3.8
  • 11 57 4.1
  • 12 33 3.0
  • 13 22 2.5
  • 14 63 4.6
  • 15 40 3.2
  • 16 48 4.2
  • 17 28 2.3
  • 18 49 4.0

Cov(x, y) = 10.68, r = 0.937
Fisher's z = 0.5 × ln(1.937 / 0.063) = 1.71; SE(z) = 1 / sqrt(18 - 3) = 0.26
t-statistic = 1.71 / 0.26 = 6.6
Critical t-value with 17 df and alpha = 5% is 2.11
Conclusion: There is a significant association between age
and cholesterol.
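The covariance, correlation and Fisher z steps can be traced in R with the 18 age and cholesterol pairs from the data slide. This is a sketch with illustrative object names; `cor.test` is shown as the built-in alternative, which uses a t-statistic with n - 2 degrees of freedom rather than Fisher's z.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)
n <- length(age)

cov_xy <- cov(age, chol)    # sample covariance (divides by n - 1)
r      <- cor(age, chol)    # Pearson correlation coefficient

z    <- atanh(r)            # Fisher's z = 0.5 * log((1 + r) / (1 - r))
se_z <- 1 / sqrt(n - 3)     # standard error of z
stat <- z / se_z            # test statistic for H0: rho = 0

ct <- cor.test(age, chol)   # built-in test based on t with n - 2 df

round(c(cov = cov_xy, r = r, z = z, se = se_z, stat = stat), 3)
```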
20
Simple Linear Regression Analysis
  • Only two variables are of interest: one response
    variable and one predictor variable
  • No adjustment is needed for confounders or
    covariates
  • Assessment: quantify the relationship between two variables
  • Prediction: make predictions and validate a test
  • Control: adjust for confounding effects (in the case of
    multiple variables)

21
Linear Regression Model
  • Y: random variable representing a response
  • X: random variable representing a predictor
    variable (predictor, risk factor)
  • Both Y and X can be a categorical variable (e.g.,
    yes / no) or a continuous variable (e.g., age).
  • If Y is categorical, the model is a logistic
    regression model; if Y is continuous, a simple
    linear regression model.
  • Model:
  • $Y = \alpha + \beta X + \varepsilon$
  • $\alpha$: intercept
  • $\beta$: slope / gradient
  • $\varepsilon$: random error (variation between subjects in Y
    even if X is constant, e.g., variation in
    cholesterol for patients of the same age)

22
Linear Regression Assumptions
  • The relationship is linear in terms of the
    parameters
  • X is measured without error
  • The values of Y are independent of each other
    (e.g., Y1 is not correlated with Y2)
  • The random error term ($\varepsilon$) is normally distributed
    with mean 0 and constant variance.
  • If the assumptions are tenable, then
  • The expected value of Y is $E(Y \mid x) = \alpha + \beta x$
  • The variance of Y is $\mathrm{var}(Y) = \mathrm{var}(\varepsilon) = \sigma^2$

23
Estimation of Model Parameters
  • Given two points A(x1, y1) and B(x2, y2) in a
    two-dimensional space, we can derive an equation
    connecting the points

[Figure: straight line through A(x1, y1) and B(x2, y2) on x and y axes; the gradient is m = dy/dx and a is the intercept at x = 0]
Equation: y = mx + a. What happens if we have
more than 2 points?
24
Method of Least Squares
  • For a series of pairs (x1, y1), (x2, y2), (x3,
    y3), ..., (xn, yn)
  • Let a and b be sample estimates for parameters $\alpha$
    and $\beta$.
  • We have a sample equation $\hat{Y} = a + bx$
  • Aim: find the values of a and b so that the difference
    $(Y - \hat{Y})$ is minimal.
  • Let SSE = sum of $(Y_i - a - b x_i)^2$.
  • Values of a and b that minimise SSE are called
    least squares estimates.

25
Criteria of Estimation
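[Figure: scatter plot of Chol (yi) against Age with the fitted regression line]
The goal of the least squares estimator (LSE) is to
find a and b such that the sum of d² is minimal, where d is the
vertical distance between an observed yi and the line.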
26
Least squares Estimates
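  • After some calculus operations, the results can
    be shown to be

$b = S_{xy} / S_{xx}$ and $a = \bar{y} - b\bar{x}$
Where $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$ and $S_{xx} = \sum (x_i - \bar{x})^2$
  • When the regression assumptions are valid, the
    estimators of $\alpha$ and $\beta$ have the following
    properties:
  • Unbiased
  • Uniformly minimal variance (i.e., efficient)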
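As a check on the closed-form result, the standard least squares estimates (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) can be computed by hand and compared with `lm`. The sketch below uses the age and cholesterol data from this deck; `sxy`, `sxx`, `a` and `b` are illustrative names.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)

sxy <- sum((age - mean(age)) * (chol - mean(chol)))   # sum of cross-products
sxx <- sum((age - mean(age))^2)                       # sum of squares of x

b <- sxy / sxx                       # slope estimate
a <- mean(chol) - b * mean(age)      # intercept estimate

fit <- lm(chol ~ age)                # same estimates from R's linear model
rbind(by_hand = c(a, b), lm = coef(fit))
```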

27
Goodness-of-fit
  • Now, we have the equation $\hat{Y} = a + bX$
  • Question: how well does the regression equation
    describe the actual data?
  • Answer: the coefficient of determination ($R^2$), the
    proportion of the variation in Y that is explained by the
    variation in X.

28
Partitioning of variations geometry
[Figure: scatter plot of Chol (Y) against Age (X) with the regression line and the mean of Y, marking SST, SSR and SSE for one observation]
  • SST: sum of squared differences between yi and
    the mean of y.
  • SSR: sum of squared differences between the
    predicted value of y and the mean of y.
  • SSE: sum of squared differences between the
    observed and predicted values of y.
  • SST = SSR + SSE
  • The coefficient of determination is $R^2 = SSR / SST$
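The partition of variation can be verified numerically. This sketch fits the age and cholesterol regression and computes the three sums of squares directly; the lowercase names `sst`, `ssr`, `sse` and `r2` are illustrative.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)

fit <- lm(chol ~ age)

sst <- sum((chol - mean(chol))^2)          # total variation in y
ssr <- sum((fitted(fit) - mean(chol))^2)   # variation explained by the regression
sse <- sum(residuals(fit)^2)               # residual (unexplained) variation

r2 <- ssr / sst                            # coefficient of determination

round(c(SST = sst, SSR = ssr, SSE = sse, R2 = r2), 4)
```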

29
Linear Regression Analysis by R
  • age <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63,
             40, 48, 28, 49)
  • chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
              4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)
  • lipid <- data.frame(age, chol)
  • results <- lm(chol ~ age, data = lipid)
  • summary(results)

Residuals:
     Min       1Q   Median       3Q      Max
-0.40729 -0.24133 -0.04522  0.17939  0.63040

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.089218   0.221466   4.918 0.000154 ***
age         0.057788   0.005399  10.704 1.06e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3027 on 16 degrees of freedom
Multiple R-squared: 0.8775, Adjusted R-squared: 0.8698
F-statistic: 114.6 on 1 and 16 DF, p-value: 1.058e-08
30
Interpretation of Model Estimates
  • Cholesterol = 1.089 + 0.0578(Age)
  •             Estimate Std. Error t value Pr(>|t|)
  • (Intercept) 1.089218   0.221466   4.918 0.000154
  • age         0.057788   0.005399  10.704 1.06e-08
  • Interpretation: cholesterol increases by
    0.0578 mg/ml on average for each one-year increase in age. The
    association between age and cholesterol is
    statistically significant (p = 1.06e-08).

Adjusted R-squared = 0.8698
  • Interpretation: variation in age explains about 87% of the
    variation in cholesterol.

31
Prediction
  • plot(chol ~ age)
  • abline(results)

Regression line: Chol = 1.089 + 0.0578(Age)
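Beyond drawing the fitted line, the model can be used to predict cholesterol at specific ages. The sketch below is an illustration only: the ages 30, 45 and 60 are hypothetical values chosen inside the observed age range, and `interval = "prediction"` requests 95% prediction intervals for individual subjects.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)

fit <- lm(chol ~ age)

# Hypothetical ages at which to predict cholesterol
new_ages <- data.frame(age = c(30, 45, 60))

# Point predictions with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(fit, newdata = new_ages, interval = "prediction", level = 0.95)
cbind(new_ages, round(pred, 2))
```

Predictions far outside the observed ages (20 to 63 years) would be extrapolations and should be treated with caution.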
32
Checking Assumptions
  • par(mfrow = c(2, 2))
  • plot(results)

33
The Importance of Assumption BMI and Sexual
Attractiveness
  • bmi <- c(11.0, 12.0, 12.5, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0,
             14.8, 15.0, 15.0, 15.5, 16.0, 16.5, 17.0, 17.0, 18.0,
             18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.5, 22.0, 23.0,
             23.0, 24.0, 24.5, 25.0, 25.0, 26.0, 26.0, 26.5, 28.0,
             29.0, 31.0, 32.0, 33.0, 34.0, 35.5, 36.0, 36.0)
  • sa <- c(2.0, 2.8, 1.8, 1.8, 2.0, 2.8, 3.2, 3.1, 4.0, 1.5, 3.2,
            3.7, 5.5, 5.2, 5.1, 5.7, 5.6, 4.8, 5.4, 6.3, 6.5, 4.9,
            5.0, 5.3, 5.0, 4.2, 4.1, 4.7, 3.5, 3.7, 3.5, 4.0, 3.7,
            3.6, 3.4, 3.3, 2.9, 2.1, 2.0, 2.1, 2.1, 2.0, 1.8, 1.7)
  • beauty <- data.frame(bmi, sa)
  • results <- lm(sa ~ bmi, data = beauty)
  • summary(results)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.92512    0.64489   7.637 1.81e-09 ***
bmi         -0.05967    0.02862  -2.084   0.0432 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.354 on 42 degrees of freedom
Multiple R-squared: 0.09376, Adjusted R-squared: 0.07218
F-statistic: 4.345 on 1 and 42 DF, p-value: 0.04323
34
Incorrect Functional Form
35
Cubic Regression
results <- lm(sa ~ poly(bmi, 3))
summary(results)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)     3.6500     0.1193  30.587  < 2e-16 ***
poly(bmi, 3)1  -2.8228     0.7915  -3.566 0.000957 ***
poly(bmi, 3)2  -5.9749     0.7915  -7.548 3.27e-09 ***
poly(bmi, 3)3   4.0324     0.7915   5.094 8.76e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.7915 on 40 degrees of freedom
Multiple R-squared: 0.7051, Adjusted R-squared: 0.683
F-statistic: 31.88 on 3 and 40 DF, p-value: 1.077e-10

SA = 3.65 - 2.82 P1(BMI) - 5.97 P2(BMI) + 4.03 P3(BMI), where P1, P2 and P3 are the orthogonal polynomial terms of degree 1, 2 and 3 generated by poly(bmi, 3), not the raw powers of BMI
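One way to see why the functional form matters is to compare the straight-line and cubic fits directly. The sketch below refits both models on the BMI and attractiveness data from this deck, compares them with an F-test via `anova`, and also fits the cubic with raw powers (`raw = TRUE`) to show that it gives the same fitted values with coefficients on BMI, BMI² and BMI³. The object names are illustrative, and the choice of `anova` for the comparison is an addition, not something shown on the slides.

```r
bmi <- c(11.0, 12.0, 12.5, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0,
         14.8, 15.0, 15.0, 15.5, 16.0, 16.5, 17.0, 17.0, 18.0,
         18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.5, 22.0, 23.0,
         23.0, 24.0, 24.5, 25.0, 25.0, 26.0, 26.0, 26.5, 28.0,
         29.0, 31.0, 32.0, 33.0, 34.0, 35.5, 36.0, 36.0)
sa <- c(2.0, 2.8, 1.8, 1.8, 2.0, 2.8, 3.2, 3.1, 4.0, 1.5, 3.2,
        3.7, 5.5, 5.2, 5.1, 5.7, 5.6, 4.8, 5.4, 6.3, 6.5, 4.9,
        5.0, 5.3, 5.0, 4.2, 4.1, 4.7, 3.5, 3.7, 3.5, 4.0, 3.7,
        3.6, 3.4, 3.3, 2.9, 2.1, 2.0, 2.1, 2.1, 2.0, 1.8, 1.7)

linear <- lm(sa ~ bmi)                         # straight-line model
cubic  <- lm(sa ~ poly(bmi, 3))                # orthogonal polynomial terms
raw    <- lm(sa ~ poly(bmi, 3, raw = TRUE))    # raw powers bmi, bmi^2, bmi^3

comparison <- anova(linear, cubic)             # F-test of the added terms
r2 <- c(linear = summary(linear)$r.squared,
        cubic  = summary(cubic)$r.squared)

# Orthogonal and raw parameterisations describe the same fitted curve
same_fit <- isTRUE(all.equal(fitted(cubic), fitted(raw), tolerance = 1e-6))

round(r2, 4)
comparison
coef(raw)
```

With orthogonal terms the intercept equals the mean of `sa`, which is why it reads 3.65 in the output above.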
36
Sexual Attractiveness and BMI Cubic Function
  • bmi.new <- 10:40
  • sa.pred <- predict(results, data.frame(bmi = bmi.new))
  • plot(sa ~ bmi)
  • lines(bmi.new, sa.pred, col = "blue", lwd = 3)