Design and Analysis of Clinical Study 9. Analysis of Cross-sectional Study

1
Design and Analysis of Clinical Study 9.
Analysis of Cross-sectional Study

Dr. Tuan V. Nguyen
Garvan Institute of Medical Research
Sydney, Australia

2
Overview

Estimate of prevalence
Analysis of difference between two proportions
Analysis of difference among proportions: Chi-square
Analysis of difference between two means
Analysis of association I: simple linear regression analysis
Analysis of association II: multiple regression analysis

3
Prevalence of Disease

Prevalence is NOT incidence.
Prevalence measures the number of people in a population who have the disease at a given point in time.
This measure is called point prevalence, in contrast to the infrequently used period prevalence, which adds the cases existing at the start of a time period to the new cases that occur during the period.
Prevalence is a measure of disease status (disease burden), in contrast to incidence, which measures disease onset events.

4
[Figure: disease timelines for subjects 1 to 5 along a time axis, with a vertical line at time T]
Prevalence: at time T, 2 out of 5 subjects had the disease, so P = 2/5 = 0.4
5
Sampling Variability in Prevalence

Prevalence in the population (π) is UNKNOWN.
Sample prevalence (p) is an unbiased estimate of π.
x = number of diseased individuals in the sample
p = sample prevalence
N = sample size
Estimates:
p = x/N
variance of p: var(p) = p(1 - p)/N
standard error of p: SE(p) = sqrt(p(1 - p)/N)
95% CI of p: p ± 1.96 × SE(p)

6
An Example of Calculation of Prevalence

ABO hemolytic disease was found in 43 out of 3584 subjects sampled from a population.
So, the estimated prevalence is
p = 43/3584 = 0.0120
Standard error of the prevalence: SE(p) = sqrt(0.0120 × 0.9880 / 3584) = 0.0018
95% confidence interval:
0.0120 ± (1.96 × 0.0018) = 0.008 to 0.016.
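The calculation can be checked in a few lines of R. This is a minimal sketch using the normal approximation from the previous slide; the variable names are illustrative and only the counts 43 and 3584 come from the slide.

```r
# Counts from the slide; variable names are hypothetical
x <- 43      # diseased individuals in the sample
N <- 3584    # sample size

p  <- x / N                               # sample prevalence
se <- sqrt(p * (1 - p) / N)               # standard error of p
ci <- p + c(-1, 1) * qnorm(0.975) * se    # normal-approximation 95% CI

round(c(prevalence = p, se = se, lower = ci[1], upper = ci[2]), 4)
```

For a prevalence this close to zero, an exact interval such as the one returned by `binom.test(x, N)$conf.int` may be preferable to the normal approximation.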

7
Test for Difference Between Two Proportions
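                 Vietnam    Australia
N                    700         1287
Osteoporosis         148          345
Prevalence         0.211        0.268
Variance (s2)   0.000238     0.000152

p1 = proportion for group 1
p2 = proportion for group 2
N1 = sample size for group 1
N2 = sample size for group 2
d = p1 - p2
variance of d: s2 = p1(1 - p1)/N1 + p2(1 - p2)/N2
z-test: z = d / sqrt(s2)

d = 0.268 - 0.211 = 0.057 (Australia minus Vietnam)
variance of d: s2 = 0.000238 + 0.000152 = 0.000391 (from unrounded values)
z-test: z = 0.057 / sqrt(0.000391) = 2.87 (from unrounded values)
Since 2.87 > 1.96, the difference is significant at the 5% level.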
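A short R sketch of the same z-test, assuming the unpooled variance (the sum of the two binomial variances) as on the slide. The variable names are illustrative.

```r
# Counts from the slide; group 1 = Australia, group 2 = Vietnam
n1 <- 1287; x1 <- 345
n2 <- 700;  x2 <- 148

p1 <- x1 / n1
p2 <- x2 / n2
d  <- p1 - p2                                        # difference in prevalence
var_d <- p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2     # variance of d (unpooled)
z  <- d / sqrt(var_d)                                # z statistic
p_value <- 2 * pnorm(-abs(z))                        # two-sided p-value

round(c(d = d, var_d = var_d, z = z, p_value = p_value), 5)
```

Base R's `prop.test(c(x1, x2), c(n1, n2))` performs a closely related test, but it uses a pooled variance and, by default, a continuity correction, so its statistic will not match this z exactly.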
8
Test for Difference Among Proportions
Observed frequencies, by caffeine consumption:
Marital status    None   1-150   151-300   300-900   Total
Married            652    1537       598       242    3029
Divorced            36      46        38        21     141
Single             218     327       106        67     718
Total              906    1910       742       330    3888

Row proportions (cell count divided by row total, e.g. 652/3029 = 0.22, 36/141 = 0.26, 218/718 = 0.30, 906/3888 = 0.23):
Marital status    None   1-150   151-300   300-900   Total
Married           0.22    0.51      0.20      0.08    1.00
Divorced          0.26    0.33      0.27      0.15    1.00
Single            0.30    0.46      0.15      0.09    1.00
Total             0.23    0.49      0.19      0.08    1.00
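The row proportions can be reproduced with `prop.table()`. This is a sketch only; the object names and the dimension labels are illustrative, with the category labels copied from the slide.

```r
# Observed counts from the slide, entered row by row
caffeine <- matrix(
  c(652, 1537, 598, 242,
     36,   46,  38,  21,
    218,  327, 106,  67),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    status = c("Married", "Divorced", "Single"),
    intake = c("None", "1-150", "151-300", "300-900")
  )
)

row_prop <- prop.table(caffeine, margin = 1)   # each row sums to 1
overall  <- colSums(caffeine) / sum(caffeine)  # proportions for the Total row

round(rbind(row_prop, Total = overall), 2)
```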
9
Test for Difference Among Proportions
Observed frequencies, by caffeine consumption:
Marital status    None   1-150   151-300   300-900   Total
Married            652    1537       598       242    3029
Divorced            36      46        38        21     141
Single             218     327       106        67     718
Total              906    1910       742       330    3888

Expected frequency = row total × column total / grand total:
Married:  3029/3888 × 906 = 705.8   3029/3888 × 1910 = 1488   3029/3888 × 742 = 578.1   3029/3888 × 330 = 257.1
Divorced: 141/3888 × 906 = 32.9     141/3888 × 1910 = 69.3    141/3888 × 742 = 26.9     141/3888 × 330 = 12.0
Single:   718/3888 × 906 = 167.3    718/3888 × 1910 = 352.7   718/3888 × 742 = 137.0    718/3888 × 330 = 60.9

Expected frequencies, by caffeine consumption:
Marital status    None    1-150   151-300   300-900   Total
Married          705.8     1488     578.1     257.1    3029
Divorced          32.9     69.3      26.9      12.0     141
Single           167.3    352.7     137.0      60.9     718
Total              906     1910       742       330    3888
10
Test for Difference Among Proportions
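Observed (O) and expected (E, in parentheses) frequencies, by caffeine consumption:
Marital status    None           1-150          151-300        300-900
Married           652 (705.8)    1537 (1488)    598 (578.1)    242 (257.1)
Divorced           36 (32.9)       46 (69.3)     38 (26.9)      21 (12.0)
Single            218 (167.3)     327 (352.7)   106 (137.0)     67 (60.9)

Contribution of each cell: (O - E)^2 / E, e.g. (652 - 705.8)^2 / 705.8 = 4.11, (1537 - 1488)^2 / 1488 = 1.61, ...

(O - E)^2 / E     None    1-150   151-300   300-900   Total
Married           4.11     1.61      0.69      0.89    7.30
Divorced          0.30     7.82      4.57      6.82   19.51
Single           15.36     1.88      7.02      0.60   24.86
Total            19.77    11.31     12.28      8.31   51.66

Chi-square = 51.66 with df = (3 - 1) × (4 - 1) = 6. The critical value of chi-square with 6 df at alpha = 0.05 is 12.59, so the distribution of caffeine consumption differs significantly among the marital status groups.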
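The whole procedure (expected frequencies, cell contributions, test statistic) is available through `chisq.test()` in base R. A minimal sketch, with illustrative object names:

```r
# Observed counts from the slide, entered row by row
observed <- matrix(
  c(652, 1537, 598, 242,
     36,   46,  38,  21,
    218,  327, 106,  67),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    status = c("Married", "Divorced", "Single"),
    intake = c("None", "1-150", "151-300", "300-900")
  )
)

test <- chisq.test(observed)   # no continuity correction for tables larger than 2 x 2

round(test$expected, 1)                             # expected frequencies
round((observed - test$expected)^2 / test$expected, 2)  # (O - E)^2 / E per cell
test$statistic                                      # chi-square statistic
test$parameter                                      # degrees of freedom
qchisq(0.95, df = test$parameter)                   # critical value at alpha = 0.05
```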
11
Normal Distribution
Distribution of height in Vietnamese women, with mean 156 cm and standard deviation 4.6 cm. The horizontal axis is height and the vertical axis is the probability for each height.
12
Application of the Normal Distribution

The serum cholesterol levels of Californian children have a mean of 175 mg/100ml and a standard deviation of 30 mg/100ml. The distribution of the cholesterol levels is normal.
95% of the children should have cholesterol levels in the range 175 ± (1.96 × 30), i.e. between 116 and 234 mg/100ml.
If we let X be the cholesterol level for any child, then X can be converted to a variable with mean = 0 and SD = 1:
Z = (X - 175) / 30

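[Figure: normal curve of cholesterol (mg/100ml) with mean 175 and limits 116 and 234, corresponding to Z = 0, -1.96 and 1.96; both tails beyond the limits are labelled "Abnormal?"]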
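The limits and the standardisation can be reproduced with R's normal distribution functions. This is a sketch; `z_score` is a hypothetical helper name, and only the mean and SD come from the slide.

```r
mu    <- 175   # mean cholesterol, mg/100ml
sigma <- 30    # standard deviation, mg/100ml

# Central 95% range: mean +/- 1.96 SD
limits <- mu + c(-1, 1) * qnorm(0.975) * sigma

# Convert a cholesterol value to a standard normal deviate
z_score <- function(x) (x - mu) / sigma

# Proportion of children expected between 116 and 234 mg/100ml
coverage <- pnorm(234, mean = mu, sd = sigma) - pnorm(116, mean = mu, sd = sigma)

round(limits, 1)
round(z_score(c(116, 175, 234)), 2)
round(coverage, 3)
```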
13
Two-group comparison unpaired t-test
               Group 1   Group 2
               x11       x21
               x12       x22
               x13       x23
               x14       x24
               x15       x25
               ...       ...
               x1n       x2n
Sample size    n1        n2
Mean           x̄1        x̄2
SD             s1        s2

Mean difference: D = x̄1 - x̄2
Variance of D (unpooled form): s_D^2 = s1^2/n1 + s2^2/n2
T-statistic: t = D / s_D
95% confidence interval: D ± t(0.975, df) × s_D
14
Two-group comparison an example
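          A        B
          100      122
          108      130
          119      138
          127      142
          132      152
          135      154
          136      176
          164
N         8        7
Mean      127.6    144.9
SD        19.6     17.8

Mean difference: d = 127.6 - 144.9 = -17.3
Variance of D (unpooled form): s_D^2 = 19.6^2/8 + 17.8^2/7 = 93.3, so s_D = 9.66
T-statistic: t = -17.3 / 9.66 = -1.79
95% confidence interval (about 13 df, critical t = 2.16): -17.3 ± 2.16 × 9.66 = -38.2 to 3.6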
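The example can be run in R. The sketch below assumes the unpooled (Welch) variance for the hand calculation, which is also the default of `t.test()`; the pooled-variance version is shown alongside for comparison. The vector names `a` and `b` are illustrative.

```r
# Data from the slide
a <- c(100, 108, 119, 127, 132, 135, 136, 164)
b <- c(122, 130, 138, 142, 152, 154, 176)

# Hand calculation with the unpooled variance of the mean difference
d      <- mean(a) - mean(b)
se_d   <- sqrt(var(a) / length(a) + var(b) / length(b))
t_stat <- d / se_d

# Built-in tests
welch  <- t.test(a, b)                    # unequal variances (default)
pooled <- t.test(a, b, var.equal = TRUE)  # classical pooled-variance t-test

round(c(d = d, se = se_d, t = t_stat), 2)
welch$conf.int
pooled$conf.int
```

Both versions give a 95% confidence interval that includes zero, so the difference between the two groups is not significant at the 5% level in this small sample.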
15
Analysis of Correlation
ID   Age   Chol (mg/ml)
1    46    3.5
2    20    1.9
3    52    4.0
4    30    2.6
5    57    4.5
6    25    3.0
7    28    2.9
8    36    3.8
9    22    2.1
10   43    3.8
11   57    4.1
12   33    3.0
13   22    2.5
14   63    4.6
15   40    3.2
16   48    4.2
17   28    2.3
18   49    4.0
16
Variance, Covariance and Correlation Theory

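Let x and y be two random variables from a sample of n observations.
Measure of variability of x and y: variance
var(x) = sum of (xi - x̄)^2 / (n - 1)
var(y) = sum of (yi - ȳ)^2 / (n - 1)

Measure of covariation between x and y: covariance
cov(x, y) = sum of (xi - x̄)(yi - ȳ) / (n - 1)

Coefficient of correlation (r)
r = cov(x, y) / sqrt(var(x) × var(y))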
17
Positive and Negative Correlation
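[Figure: two scatter plots, one showing a positive correlation (r = 0.9) and one showing a negative correlation (r = -0.9)]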
18
Test of Hypothesis of Correlation

Hypothesis: H0: ρ = 0 versus H1: ρ not equal to 0.
Step 1: Fisher's z-transformation
z = 0.5 × ln((1 + r) / (1 - r))

Step 2: calculate the standard error of z
SE(z) = 1 / sqrt(n - 3)

Step 3: calculate the t-statistic
t = z / SE(z)
19
An Example of Correlation Analysis

ID   Age (x)   Cholesterol (y, mg/100ml)
1    46        3.5
2    20        1.9
3    52        4.0
4    30        2.6
5    57        4.5
6    25        3.0
7    28        2.9
8    36        3.8
9    22        2.1
10   43        3.8
11   57        4.1
12   33        3.0
13   22        2.5
14   63        4.6
15   40        3.2
16   48        4.2
17   28        2.3
18   49        4.0

Cov(x, y) = 10.68
r = 0.937
Fisher's z = 0.5 × ln(1.937 / 0.063) = 1.71, with standard error 1 / sqrt(18 - 3) = 0.26
t-statistic = 1.71 / 0.26 = 6.6
Critical t-value with 17 df and alpha = 5% is 2.11
Conclusion: There is a significant association between age and cholesterol.
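The three steps of the test can be traced in R. This sketch uses all 18 observations listed in the earlier data slide and in the regression code later in the deck; the object names are illustrative, and `cor.test()` is shown only as a cross-check because it uses a different (t-based) statistic.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)
n <- length(age)

cov_xy <- cov(age, chol)                  # covariance
r      <- cov_xy / (sd(age) * sd(chol))   # correlation coefficient

z    <- 0.5 * log((1 + r) / (1 - r))      # Step 1: Fisher's z (same as atanh(r))
se_z <- 1 / sqrt(n - 3)                   # Step 2: standard error of z
stat <- z / se_z                          # Step 3: test statistic

round(c(cov = cov_xy, r = r, z = z, se = se_z, statistic = stat), 3)

# Cross-check with the built-in Pearson correlation test
cor.test(age, chol)
```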
20
Simple Linear Regression Analysis

Only two variables are of interest: one response variable and one predictor variable.
No adjustment is needed for confounding or covariates.
Assessment: quantify the relationship between two variables
Prediction: make predictions and validate a test
Control: adjust for confounding effects (in the case of multiple variables)

21
Linear Regression Model

Y: random variable representing a response
X: random variable representing a predictor variable (predictor, risk factor)
Both Y and X can be a categorical variable (e.g., yes / no) or a continuous variable (e.g., age).
If Y is categorical, the model is a logistic regression model; if Y is continuous, a simple linear regression model.
Model:
Y = α + βX + ε
α: intercept
β: slope / gradient
ε: random error (variation between subjects in y even if x is constant, e.g., variation in cholesterol for patients of the same age)

22
Linear Regression Assumptions

The relationship is linear in terms of the parameters.
X is measured without error.
The values of Y are independent of each other (e.g., Y1 is not correlated with Y2).
The random error term (ε) is normally distributed with mean 0 and constant variance.
If the assumptions are tenable, then
The expected value of Y is E(Y | x) = α + βx
The variance of Y is var(Y) = var(ε) = σ^2

23
Estimation of Model Parameters

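Given two points A(x1, y1) and B(x2, y2) in a two-dimensional space, we can derive an equation connecting the points.
Gradient: m = dy / dx = (y2 - y1) / (x2 - x1)
Equation: y = mx + a, where a is the intercept on the y axis.
What happens if we have more than 2 points?
[Figure: straight line through A(x1, y1) and B(x2, y2), showing dx, dy and the intercept a]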
24
Method of Least Squares

For a series of pairs (x1, y1), (x2, y2), (x3, y3), ..., (xn, yn):
Let a and b be sample estimates for the parameters α and β.
We have a sample equation: Ŷ = a + bx
Aim: find the values of a and b so that the difference (Y - Ŷ) is minimal.
Let SSE = sum of (Yi - a - bxi)^2.
The values of a and b that minimise SSE are called least squares estimates.

25
Criteria of Estimation
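[Figure: scatter plot of Chol against Age with the fitted line; d is the vertical distance between an observed yi and the line]
The goal of the least squares estimator (LSE) is to find a and b such that the sum of d^2 is minimal.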
26
Least squares Estimates

After some calculus operations, the results can be shown to be
b = Sxy / Sxx
a = ȳ - b x̄

where
Sxy = sum of (xi - x̄)(yi - ȳ)
Sxx = sum of (xi - x̄)^2

When the regression assumptions are valid, the estimators of α and β have the following properties:
Unbiased
Uniformly minimal variance (i.e. efficient)
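As an illustration, the closed-form least squares estimates (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x) can be computed by hand and compared with `lm()`. The sketch uses the age and cholesterol data from this deck; the names `sxx`, `sxy`, `a_hat` and `b_hat` are illustrative.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)

# Sums of squares and cross-products about the means
sxx <- sum((age - mean(age))^2)
sxy <- sum((age - mean(age)) * (chol - mean(chol)))

b_hat <- sxy / sxx                        # slope
a_hat <- mean(chol) - b_hat * mean(age)   # intercept

# Compare with the estimates from lm()
fit <- lm(chol ~ age)
rbind(by_hand = c(a_hat, b_hat), lm = unname(coef(fit)))
```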

27
Goodness-of-fit

Now, we have the equation Ŷ = a + bX.
Question: how well does the regression equation describe the actual data?
Answer: the coefficient of determination (R^2), the proportion of the variation in Y that is explained by the variation in X.

28
Partitioning of variations geometry
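[Figure: scatter plot of Chol (Y) against Age (X) with the fitted line and the mean of Y, showing how the total deviation (SST) splits into a regression part (SSR) and a residual part (SSE)]

SST = sum of squared differences between yi and the mean of y.
SSR = sum of squared differences between the predicted value of y and the mean of y.
SSE = sum of squared differences between the observed and predicted values of y.
SST = SSR + SSE
The coefficient of determination is R^2 = SSR / SST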
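The partition can be verified numerically. A minimal sketch using the age and cholesterol data from this deck, with illustrative object names:

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)

fit <- lm(chol ~ age)

sst <- sum((chol - mean(chol))^2)          # total variation in y
ssr <- sum((fitted(fit) - mean(chol))^2)   # variation explained by the regression
sse <- sum(residuals(fit)^2)               # residual (unexplained) variation

r2 <- ssr / sst                            # coefficient of determination

round(c(SST = sst, SSR = ssr, SSE = sse, R2 = r2), 4)
```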

29
Linear Regression Analysis by R

age <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)
lipid <- data.frame(age, chol)
# Regress cholesterol on age
results <- lm(chol ~ age, data = lipid)
summary(results)

Residuals:
     Min       1Q   Median       3Q      Max
-0.40729 -0.24133 -0.04522  0.17939  0.63040

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.089218   0.221466   4.918 0.000154 ***
age         0.057788   0.005399  10.704 1.06e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.3027 on 16 degrees of freedom
Multiple R-Squared: 0.8775, Adjusted R-squared: 0.8698
F-statistic: 114.6 on 1 and 16 DF, p-value: 1.058e-08
30
Interpretation of Model Estimates

Cholesterol = 1.089 + 0.0578(Age)
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.089218   0.221466   4.918 0.000154
age         0.057788   0.005399  10.704 1.06e-08

Interpretation: Cholesterol increases by 0.0578 mg/ml for each one-year increase in age. The association between age and cholesterol is statistically significant (p = 1.06e-08).

Multiple R-squared = 0.8775 (adjusted R-squared = 0.8698)

Interpretation: Variation in age explains about 88% of the variation in cholesterol.

31
Prediction

plot(chol ~ age)
abline(results)

Regression line: Chol = 1.089 + 0.0578(Age)
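As a rough illustration of prediction, the fitted line can be evaluated at ages that are not in the sample with `predict()`. The ages 30 and 50 and the names `fit`, `new_ages` and `pred` are arbitrary choices for this sketch.

```r
age  <- c(46, 20, 52, 30, 57, 25, 28, 36, 22, 43, 57, 33, 22, 63, 40, 48, 28, 49)
chol <- c(3.5, 1.9, 4.0, 2.6, 4.5, 3.0, 2.9, 3.8, 2.1, 3.8,
          4.1, 3.0, 2.5, 4.6, 3.2, 4.2, 2.3, 4.0)
lipid <- data.frame(age, chol)

fit <- lm(chol ~ age, data = lipid)

# Predicted cholesterol, with 95% prediction intervals, for two hypothetical ages
new_ages <- data.frame(age = c(30, 50))
pred <- predict(fit, newdata = new_ages, interval = "prediction")

cbind(new_ages, round(pred, 2))
```

Predictions are only trustworthy within the observed age range (20 to 63 years in this sample).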
32
Checking Assumptions

par(mfrow = c(2, 2))
plot(results)

33
The Importance of Assumptions: BMI and Sexual Attractiveness

bmi <- c(11.0, 12.0, 12.5, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0,
         14.8, 15.0, 15.0, 15.5, 16.0, 16.5, 17.0, 17.0, 18.0,
         18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.5, 22.0, 23.0,
         23.0, 24.0, 24.5, 25.0, 25.0, 26.0, 26.0, 26.5, 28.0,
         29.0, 31.0, 32.0, 33.0, 34.0, 35.5, 36.0, 36.0)
sa <- c(2.0, 2.8, 1.8, 1.8, 2.0, 2.8, 3.2, 3.1, 4.0, 1.5, 3.2,
        3.7, 5.5, 5.2, 5.1, 5.7, 5.6, 4.8, 5.4, 6.3, 6.5, 4.9,
        5.0, 5.3, 5.0, 4.2, 4.1, 4.7, 3.5, 3.7, 3.5, 4.0, 3.7,
        3.6, 3.4, 3.3, 2.9, 2.1, 2.0, 2.1, 2.1, 2.0, 1.8, 1.7)
beauty <- data.frame(bmi, sa)
# Regress sexual attractiveness score on BMI (straight-line model)
results <- lm(sa ~ bmi, data = beauty)
summary(results)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  4.92512    0.64489   7.637 1.81e-09 ***
bmi         -0.05967    0.02862  -2.084   0.0432 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.354 on 42 degrees of freedom
Multiple R-Squared: 0.09376, Adjusted R-squared: 0.07218
F-statistic: 4.345 on 1 and 42 DF, p-value: 0.04323
34
Incorrect Functional Form
35
Cubic Regression
results <- lm(sa ~ poly(bmi, 3))
summary(results)
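One way to judge whether the cubic form is worth the extra terms is to compare it with the straight-line model in a nested F-test. This is a sketch, not part of the original slides; the names `linear_fit` and `cubic_fit` are illustrative.

```r
bmi <- c(11.0, 12.0, 12.5, 14.0, 14.0, 14.0, 14.0, 14.0, 14.0,
         14.8, 15.0, 15.0, 15.5, 16.0, 16.5, 17.0, 17.0, 18.0,
         18.0, 19.0, 19.0, 20.0, 20.0, 20.0, 20.5, 22.0, 23.0,
         23.0, 24.0, 24.5, 25.0, 25.0, 26.0, 26.0, 26.5, 28.0,
         29.0, 31.0, 32.0, 33.0, 34.0, 35.5, 36.0, 36.0)
sa <- c(2.0, 2.8, 1.8, 1.8, 2.0, 2.8, 3.2, 3.1, 4.0, 1.5, 3.2,
        3.7, 5.5, 5.2, 5.1, 5.7, 5.6, 4.8, 5.4, 6.3, 6.5, 4.9,
        5.0, 5.3, 5.0, 4.2, 4.1, 4.7, 3.5, 3.7, 3.5, 4.0, 3.7,
        3.6, 3.4, 3.3, 2.9, 2.1, 2.0, 2.1, 2.1, 2.0, 1.8, 1.7)

linear_fit <- lm(sa ~ bmi)            # straight-line model
cubic_fit  <- lm(sa ~ poly(bmi, 3))   # cubic model (orthogonal polynomials)

# Nested-model F-test: do the quadratic and cubic terms improve the fit?
comparison <- anova(linear_fit, cubic_fit)
comparison

# Proportion of variation explained by each model
c(linear = summary(linear_fit)$r.squared,
  cubic  = summary(cubic_fit)$r.squared)
```

The residual plots from `plot(linear_fit)` show the same problem graphically: a curved pattern in the residuals indicates that the straight line is the wrong functional form.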