About This Presentation
Title:

Introduction to Biostatistics Descriptive Statistics and Sample Size Justification

Description:

Introduction to Biostatistics Descriptive Statistics and Sample Size Justification Julie A. Stoner, PhD October 17, 2005 Statistics Seminars Goal: Interpret and ... – PowerPoint PPT presentation

Slides: 74
Provided by: PSMU7
Learn more at: http://webmedia.unmc.edu
Transcript and Presenter's Notes

Title: Introduction to Biostatistics Descriptive Statistics and Sample Size Justification


1
Introduction to Biostatistics Descriptive
Statistics and Sample Size Justification
  • Julie A. Stoner, PhD
  • October 17, 2005

2
Statistics Seminars
  • Goal: Interpret and critically evaluate
    biomedical literature
  • Topics
  • Sample size justification
  • Exploratory data analysis
  • Hypothesis testing

3
Example 1
  • Aim: Compare two antihypertensive strategies for
    lowering blood pressure
  • Double-blind, randomized study
  • 5 mg Enalapril + 5 mg Felodipine ER vs. 10 mg
    Enalapril
  • 6-week treatment period
  • 217 patients
  • AJH 1999;12:691-696

4
Example 2
  • Aim: Demonstrate that D-penicillamine (DPA) is
    effective in prolonging the overall survival of
    patients with primary biliary cirrhosis of the
    liver (PBC)
  • Mayo Clinic
  • Double-blind, placebo controlled, randomized
    trial
  • 312 patients
  • Collect clinical and biochemical data on patients
  • Reference: NEJM. 312:1011-1015. 1985.

5
Example 2
  • Patients enrolled over 10 years, between January
    1974 and May 1984
  • Data were analyzed in July 1986
  • Event: death (X)
  • Censoring: some patients are still alive at end
    of study (o)
  • 1/1974 5/1984 6/1986
  • _____________________________X
  • ___________________________o
  • ________________________o

6
Statistical Inference
  • Goal: describe factors associated with
    particular outcomes in the population at large
  • Not feasible to study entire population
  • Samples of subjects drawn from population
  • Make inferences about population based on sample
    subset

7
Why are descriptive statistics important?
  • Identify signals/patterns from noise
  • Understand relationships among variables
  • Formal hypothesis testing should agree with
    descriptive results

8
Outline
  • Types of data
  • Categorical data
  • Numerical data
  • Descriptive statistics
  • Measures of location
  • Measures of spread
  • Descriptive plots

9
Types of Data
  • Categorical data: provides qualitative
    description
  • Dichotomous or binary data
  • Observations fall into 1 of 2 categories
  • Example: male/female, smoker/non-smoker
  • More than 2 categories
  • Nominal: no obvious ordering of the categories
  • Example: blood types A/B/AB/O
  • Ordinal: there is a natural ordering
  • Example: never-smoker/ex-smoker/light
    smoker/heavy smoker

10
Types of Data
  • Numerical data (interval/ratio data)
  • Provides quantitative description
  • Discrete data
  • Observations can only take certain numeric values
  • Often counts of events
  • Example: number of doctor visits in a year
  • Continuous data
  • Not restricted to take on certain values
  • Often measurements
  • Example: height, weight, age

11
Descriptive Statistics: Numerical Data
  • Measures of location
  • Mean: average value
  • For n data points $x_1, x_2, \ldots, x_n$, the mean is the
    sum of the observations divided by the number of
    observations: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

12
Descriptive Statistics: Numerical Data
  • Measures of location
  • Mean
  • Example: Find the mean triglyceride level (in
    mg/100 ml) of the following patients
  • 159, 121, 130, 164, 148, 148, 152
  • Sum = 1022, Count = 7
  • Mean = 1022/7 = 146
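
The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the variable and function names are made up for this example.

```python
# Triglyceride levels (mg/100 ml) from the worked example.
triglycerides = [159, 121, 130, 164, 148, 148, 152]


def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


print(sum(triglycerides), len(triglycerides), mean(triglycerides))
```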

13
Descriptive Statistics: Numerical Data
  • Measures of location
  • Percentile: value that is greater than a
    particular percentage of the data values
  • Order data
  • Pth percentile has rank r = (n+1)(P/100)
  • Median: the 50th percentile, 50% of the data
    values lie below the median

14
Descriptive Statistics: Numerical Data
  • Measures of location
  • Median
  • Example: Find the median triglyceride level from
    the sample
  • 159, 121, 130, 164, 148, 148, 152
  • Order: 121, 130, 148, 148, 152, 159, 164
  • Median rank = (7+1)(50/100) = 4
  • 4th ordered observation is 148
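
The rank rule r = (n+1)(P/100) can be sketched in Python as follows. The slides only show cases where the rank is a whole number; the linear interpolation used here for fractional ranks is an assumption of this sketch, and the function name is hypothetical.

```python
def percentile_by_rank(values, p):
    """P-th percentile via rank r = (n + 1) * (P / 100) on the ordered data.

    Fractional ranks are handled by linear interpolation between the two
    neighbouring ordered observations (an assumption, not from the slides).
    """
    ordered = sorted(values)
    n = len(ordered)
    r = (n + 1) * p / 100
    if r < 1 or r > n:
        raise ValueError("rank falls outside the observed data")
    lower = int(r)       # whole part of the rank (1-based)
    frac = r - lower     # fractional part of the rank
    if frac == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])


triglycerides = [159, 121, 130, 164, 148, 148, 152]
median = percentile_by_rank(triglycerides, 50)  # rank (7 + 1) * 0.5 = 4
print(median)
```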

15
Descriptive Statistics: Numerical Data
  • Measures of location
  • Mode: most common element of a set
  • Example: Find the mode of the triglyceride
    values
  • 159, 121, 130, 164, 148, 148, 152
  • Mode = 148

16
Descriptive Statistics: Numerical Data
  • Measures of location: comparison of mean and
    median
  • Example: Compare the mean and median from the
    sample of triglyceride levels
  • 159, 141, 130, 230, 148, 148, 152
  • Mean = 1108/7 = 158.29, Median = 148
  • The mean may be influenced by extreme data
    points.
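
A short Python check of this comparison, using the standard-library `statistics` module. The variable names are illustrative only.

```python
import statistics

# Sample from the slide, including the extreme value 230.
sample = [159, 141, 130, 230, 148, 148, 152]

sample_mean = statistics.mean(sample)      # pulled upward by 230
sample_median = statistics.median(sample)  # middle ordered value

print(round(sample_mean, 2), sample_median)
```

Only one observation lies far from the rest, yet the mean sits about 10 mg/100 ml above the median, which is the sensitivity to extreme points that the slide describes.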

17
Skewed Distributions
  • Data that is not symmetric and bell-shaped is
    skewed.
  • Mean may not be a good measure of central
    tendency. Why?

Positive skew, or skewed to the right: mean >
median
Negative skew, or skewed to the left: mean <
median
18
Motivation
  • Example
  • 1) 2, 60, 100: mean = 54
  • 2) 53, 54, 55: mean = 54
  • Both data sets have a mean of 54, but the scores in
    set 1 have a larger range and more variation than the
    scores in set 2.

19
Descriptive Statistics: Numerical Data
  • Measures of spread
  • Variance: average squared deviation from the
    mean
  • For n data points $x_1, x_2, \ldots, x_n$ with mean $\bar{x}$, the
    sample variance is $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
  • Standard deviation: square root of variance, in
    same units as original data

20
Descriptive Statistics: Numerical Data
  • Measures of spread
  • Standard Deviation
  • Example: find the standard deviation of the
    triglyceride values
  • 159, 121, 130, 164, 148, 148, 152
  • Distance from mean: 13, -25, -16, 18, 2, 2, 6
  • Sum of squared differences = 1418
  • Standard deviation = sqrt(1418/6) = 15.37
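
The steps of this example can be reproduced in Python. This is a minimal sketch that divides by n - 1, as the worked example does (1418/6); the function name is illustrative.

```python
import math

triglycerides = [159, 121, 130, 164, 148, 148, 152]


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    centre = sum(values) / n
    squared_diffs = [(x - centre) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / (n - 1))


mean_value = sum(triglycerides) / len(triglycerides)
deviations = [x - mean_value for x in triglycerides]   # distances from the mean
sum_of_squares = sum(d ** 2 for d in deviations)        # sum of squared differences

print(deviations, sum_of_squares, round(sample_sd(triglycerides), 2))
```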

21
Descriptive Statistics: Numerical Data
  • Standard deviation: How much variability can we
    expect among individual responses?
  • Standard error of the mean: How much variability
    can we expect in the mean response among various
    samples?

22
Descriptive Statistics: Numerical Data
  • The standard error of the mean is estimated as
    $\text{s.e.} = \text{s.d.} / \sqrt{n}$,
  • where s.d. is the estimated standard deviation
    and n is the sample size
  • Based on the formula, will the standard error of
    the mean always be smaller or larger than
    the standard deviation of the data?
  • Answer: smaller (for n > 1)
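
As an illustration, the standard error can be computed for the triglyceride sample used in the earlier slides. The slides do not report this value; it is shown here only to make the comparison with the standard deviation concrete.

```python
import math
import statistics

triglycerides = [159, 121, 130, 164, 148, 148, 152]

sd = statistics.stdev(triglycerides)       # sample s.d. (n - 1 divisor)
se = sd / math.sqrt(len(triglycerides))    # standard error of the mean

print(round(sd, 2), round(se, 2))
```

Because the standard deviation is divided by the square root of n, the standard error shrinks as the sample grows, while the standard deviation itself does not systematically change with n.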

23
Descriptive Statistics: Numerical Data
  • Measures of spread
  • Minimum, maximum
  • Range: maximum - minimum
  • Interquartile range: difference between 25th and
    75th percentile, values that encompass middle 50%
    of data

24
Descriptive Statistics: Numerical Data
  • Measures of spread
  • Example: find the range and the interquartile
    range for the triglyceride values
  • 159, 121, 130, 164, 148, 148, 152
  • Range = 164 - 121 = 43
  • Interquartile Range
  • Order: 121, 130, 148, 148, 152, 159, 164
  • IQR = 159 - 130 = 29
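
The same values can be obtained in Python. This sketch relies on `statistics.quantiles` with its default "exclusive" method, which places the quartiles at positions based on n + 1 and so agrees with the rank rule used in these slides for this sample; other software may use different quartile conventions and give slightly different answers.

```python
import statistics

triglycerides = [159, 121, 130, 164, 148, 148, 152]

value_range = max(triglycerides) - min(triglycerides)

# Quartiles: 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(triglycerides, n=4, method="exclusive")
iqr = q3 - q1

print(value_range, q1, q3, iqr)
```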

25
Descriptive Statistics: Numerical Data
  • Helpful to describe both location and spread of
    data
  • Location: mean
  • Spread: standard deviation
  • Location: median
  • Spread: min, max, range,
    interquartile range,
    quartiles

26
Descriptive Statistics: Categorical Data
  • Measures of distribution
  • Proportion = (number of subjects with the
    characteristic) / (total number of subjects)
  • Percentage = proportion × 100

27
Descriptive Statistics: Categorical Data
  • Measures of distribution: example
  • What percentage of vaccinated individuals
    developed the flu?
  • 198/400 = 0.495 = 49.5%
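
A small Python sketch of the proportion and percentage calculation, using the counts from the example above. The function names are illustrative.

```python
def proportion(count, total):
    """Number of subjects with the characteristic over the total number."""
    return count / total


def percentage(count, total):
    """Proportion expressed on a 0-100 scale."""
    return 100 * count / total


# 198 of 400 vaccinated individuals developed the flu.
print(proportion(198, 400), percentage(198, 400))
```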

28
Example
  • Consider the table of descriptive statistics for
    characteristics at baseline
  • What do we conclude about comparability of the
    groups at baseline in terms of gender and age?

29
Descriptive Plots
  • Single variable
  • Bar plot
  • Histogram
  • Box-plot
  • Multiple variables
  • Box-plot
  • Scatter plot
  • Kaplan-Meier survival plots

30
Barplot
  • Goal: Describe the distribution of values for a
    categorical variable
  • Method
  • Determine categories of response
  • For each category, draw a bar with height equal
    to the number or proportion of responses

31
Barplot
32
Histogram
  • Goal: Describe the distribution of values for a
    continuous variable
  • Method
  • Determine intervals of response (bins)
  • For each interval, draw a bar with height equal
    to the number or proportion of responses

33
Histogram
34
Box-plot
  • Goal: Describe the distribution of values for a
    continuous variable
  • Method
  • Determine 25th, 50th, and 75th percentiles of
    distribution
  • Determine outlying and extreme values
  • Draw a box with lower line at the 25th
    percentile, middle line at the median, and upper
    line at the 75th percentile
  • Draw whiskers to represent outlying and extreme
    values

35
Boxplot
[Figure: box plot with the 75th percentile, median, and 25th percentile labeled]
36
Box-plot
37
Scatter Plot
  • Goal: Describe joint distribution of values from
    2 continuous variables
  • Method
  • Create a 2-dimensional grid (horizontal and
    vertical axis)
  • For each subject in the dataset, plot the pair of
    observations from the 2 variables on the grid

38
Scatter Plot
39
Scatter Plot
40
Kaplan-Meier Survival Curves
  • Goal: Summarize the distribution of times to an
    event
  • Method
  • Estimate survival probabilities while accounting
    for censoring
  • Plot the survival probability corresponding to
    each time an event occurred
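
The method can be sketched in Python as a product-limit calculation. The follow-up times and event indicators below are hypothetical, invented only to show the mechanics; they are not data from any study discussed in this talk, and the function name is illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) pairs,
    one per distinct time at which at least one event occurred.

    times  -- follow-up time for each subject
    events -- True if the event (e.g. death) was observed, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths > 0:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored subjects both leave the risk set after t.
        at_risk -= leaving
    return curve


# Hypothetical example: 5 subjects, 3 events, 2 censored observations.
times = [2, 3, 3, 5, 8]
events = [True, False, True, True, False]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored subjects contribute to the number at risk up to their censoring time but never trigger a drop in the curve, which is how the estimate accounts for censoring.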

41
Kaplan-Meier Survival Curves
42
Kaplan-Meier Survival Curves
43
Kaplan-Meier Survival Curves
44
Descriptive Plots: Guidelines
  • Clearly label axes
  • Indicate unit of measurement
  • Note the scale when interpreting graphs

45
Descriptive Statistics
  • Exercises

46
Example
  • Below are some descriptive plots and statistics
    from a study designed to investigate the effect
    of smoking on the pulmonary function of children
  • Tager et al. (1979) American Journal of
    Epidemiology. 110:15-26

47
Example
  • The primary question, for this exercise, is
    whether or not smoking is associated with
    decreased pulmonary function in children, where
    pulmonary function is measured by forced
    expiratory volume (FEV) in liters per second.
  • The data consist of observations on 654 children
    aged 3 to 19.

48
  • Proportion male
  • (336/654) × 100 = 51.4%
  • Proportion smokers
  • (65/654) × 100 = 9.9%
  • Proportion of smokers who are male
  • (26/65) × 100 = 40%

49
Compare the FEV1 distribution between smokers and
non-smokers
  • Answer
  • The smokers appear to have higher FEV values and
    therefore better lung function. Specifically, the
    median FEV for smokers is 3.2 liters/sec.
    (IQR = 3.75 - 3 = 0.75) compared to a median FEV of
    2.5 liters/sec. (IQR = 3 - 2 = 1) for non-smokers.

50
Compare the age distribution between smokers and
non-smokers.
  • Answer
  • The smokers are older than the non-smokers in
    general. Specifically, the median age for the
    smokers is 13 years (IQR = 15 - 12 = 3) compared to
    9 years (IQR = 11 - 8 = 3) for the non-smokers.

51
Can you explain the apparent differences in
pulmonary function between smokers and
non-smokers displayed in Figure 1?
  • The relationship between FEV and smoking status
    is probably confounded by age (smokers are older
    and older children have better lung function). A
    comparison of FEV between smokers and non-smokers
    should account for age.

52
Sample Size Justification
53
Outline
  • Statistical concepts: hypotheses and errors
  • Effect size and variation
  • Influence on sample size and power

54
Sample Size Justification
  • Example: Intensifying Antihypertensive Treatment
  • A sample size calculation indicated that 114
    patients per treatment group would be necessary
    for 90% power to detect a true mean difference in
    change from baseline of 3 mm Hg in sitting DBP
    between the two randomized treatment groups.
    This calculation assumed a two-sided test,
    α = 0.05, and standard deviation in sitting DBP of
    7 mm Hg.
  • Source: AJH 1999;12:691-696
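
The quoted figure can be approximately reproduced with the usual normal-approximation formula for comparing two means with equal group sizes, n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / δ². That this is the formula the authors used is an assumption of this sketch; the source does not say how the calculation was done or how it was rounded. The function name is illustrative.

```python
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return 2 * ((z_alpha + z_beta) * sd / delta) ** 2


# Inputs quoted above: 3 mm Hg difference, s.d. 7 mm Hg, alpha 0.05, 90% power.
n = n_per_group(delta=3, sd=7, alpha=0.05, power=0.90)
print(round(n, 1))
```

With these inputs the formula gives a value a little above 114, in line with the 114 patients per group reported.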

55
Importance of Careful Study Design
  • Goal of sample size calculations
  • Adequate sample size to detect clinically
    meaningful treatment differences
  • Ethical use of resources
  • Important to justify sample size early in
    planning stages
  • Examples of inadequate power
  • NEJM 299:690-694, 1978

56
Type of Response
  • Sample size calculations depend on type of
    response variable and method of analysis
  • Continuous response
  • Example: cholesterol, weight, blood pressure
  • Dichotomous response
  • Example: yes/no, presence/absence,
    success/failure
  • Time to event
  • Example: survival time, time to adverse event

57
Statistical Concepts: Hypotheses
  • Null hypothesis: H0
  • Typically a statement of no treatment effect
  • Assumed true until evidence suggests otherwise
  • Example H0: No difference in DBP between
    treatment groups
  • Alternative hypothesis: HA
  • Reject null hypothesis in favor of alternative
    hypothesis
  • Often two-sided
  • Example HA: DBP differs between treatment
    groups

58
Statistical Concepts: Hypotheses
  • Alternative hypothesis may be one-sided or
    two-sided
  • Example
  • Null hypothesis: Mean DBP is same in patients
    receiving different treatments
  • Alternative hypothesis
  • One-sided: Mean DBP is lower in patients
    receiving treatment A
  • Two-sided: Mean DBP is different in patients
    receiving treatment A relative to treatment B
  • Choice of alternative does affect sample size
    calculations. Typically a two-sided test is
    recommended.

59
Statistical Concepts: Errors
  • Errors associated with hypothesis testing

60
Statistical Concepts: Significance Level
  • Significance level: α
  • Probability of a Type I error
  • Probability of a false positive
  • Example: If the effects of the treatments on DBP
    do not differ, what is the probability of
    incorrectly concluding that there is a difference
    between the treatments?
  • When calculating sample size, we need to specify
    a significance level, meaning the probability
    that we will detect a treatment effect purely by
    chance.
  • Typically chosen to be 5%, or 0.05

61
Statistical Concepts: Power
  • Power: (1 - β)
  • Probability of detecting a true treatment effect
  • (1 - probability of a false negative)
  • (1 - probability of Type II error)
  • (1 - β) = probability of a true positive
  • Example: If the effects of the treatments do
    differ, what is the probability of detecting such
    a difference?
  • Typically chosen to be 80-99%

62
Treatment Effect
  • What is the minimal, clinically significant
    difference in treatments we would like to detect?
  • Pilot studies may indicate magnitude
  • Example: The authors felt that a 3 mm Hg
    difference in DBP between the treatment groups
    was clinically significant

63
Variability in Response
  • To estimate sample size, we need an estimate of
    the variability of the response in the population
  • Estimate variability from pilot or previous,
    related study
  • Example: The authors estimate that the standard
    deviation of DBP is 7 mm Hg.

64
Factors Influencing Sample Size
  • Assuming all other factors fixed,
  • ↑ power → ↑ sample size
  • ↓ significance level → ↑ sample size
  • ↑ variability in response → ↑ sample size
  • ↓ significant difference → ↑ sample size

65
Factors Influencing Power
  • Assuming all other factors fixed,
  • ↑ significance level → ↑ power
  • ↑ significant difference → ↑ power
  • ↑ variability in response → ↓ power
  • ↑ sample size → ↑ power
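
These directions can be illustrated numerically. The sketch below uses a normal-approximation power formula for a two-sided, two-sample comparison of means, with the antihypertensive example's figures (3 mm Hg difference, 7 mm Hg standard deviation, 114 patients per group) as the baseline. The formula and the function name are assumptions made for illustration, not the method of any cited study.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test for a mean
    difference, ignoring the negligible chance of rejecting in the
    wrong direction."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sd * sqrt(2 / n_per_group)  # standard error of the difference in means
    return z.cdf(abs(delta) / se - z_crit)


base = power_two_sample(delta=3, sd=7, n_per_group=114)

# Change one factor at a time, holding the others fixed.
higher_alpha = power_two_sample(delta=3, sd=7, n_per_group=114, alpha=0.10)
bigger_difference = power_two_sample(delta=4, sd=7, n_per_group=114)
more_variability = power_two_sample(delta=3, sd=9, n_per_group=114)
larger_sample = power_two_sample(delta=3, sd=7, n_per_group=150)

print(round(base, 2), round(higher_alpha, 2), round(bigger_difference, 2),
      round(more_variability, 2), round(larger_sample, 2))
```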

66
Summary
  • Sample size calculations are an important
    component of study design
  • Want sufficient statistical power to detect
    clinically significant differences between groups
    when such differences exist
  • Calculated sample sizes are estimates
  • Can manipulate sample size formulas to determine
  • What is the power for detecting a particular
    difference given the sample size employed?
  • What difference can be detected with a certain
    amount of power given the sample size employed?
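
As one example of rearranging the formula, the smallest difference detectable with a given power can be written as δ = (z_{1-α/2} + z_{1-β}) σ sqrt(2/n). The sketch below assumes the normal approximation for a two-sided, two-sample comparison of means and plugs in the antihypertensive example's figures; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def detectable_difference(n_per_group, sd, alpha=0.05, power=0.90):
    """Smallest true mean difference detectable with the stated power
    (normal approximation, two-sided test, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sd * sqrt(2 / n_per_group)


# 114 patients per group, s.d. 7 mm Hg, alpha 0.05, 90% power.
d = detectable_difference(n_per_group=114, sd=7)
print(round(d, 2))
```

The result is close to 3 mm Hg, matching the difference the trial was sized to detect; a smaller sample would only be able to detect a larger difference at the same power.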

67
Factors Influencing Sample Size
  • A double-blind randomized trial was conducted to
    determine how inhaled corticosteroids compare
    with oral corticosteroids in the management of
    severe acute asthma in children. In the study,
    100 children were randomized to receive one dose
    of either 2 mg of inhaled fluticasone or 2 mg of
    oral prednisone per kilogram of body weight. The
    primary outcome was forced expiratory volume (as
    a percentage of the predicted value) 4 hours
    after treatment administration.
  • Schuh et al. (2000) NEJM. 343(10):689-694.

68
Factors Influencing Sample Size
  • The null hypothesis is that the mean FEV, as a
    percentage of predicted value, is the same for
    both treatment groups.
  • The alternative hypothesis is that the mean FEV,
    as a percentage of predicted value, is different
    for the two treatment groups.

69
  • What is a Type I Error in this example?
  • Incorrectly concluding that the treatments differ
  • What is a Type II Error in this example?
  • Failing to detect a true treatment difference

70
  • In the article the authors state: "In order
    to allow detection of a 10 percentage point
    difference between the groups in the degree of
    improvement in FEV (as a percentage of the
    predicted value) from base line to 240 minutes
    and to maintain an α error of 0.05 and a β error
    of 0.10, the required size of the sample was 94
    children."
  • What is the power of the study and what does it
    mean?
  • What is the significance level of the study and
    what does this level mean?

71
  • Power
  • The power is 90%
  • There is a 90% chance of detecting a treatment
    difference of 10 percentage points, given such a
    difference really exists
  • Significance Level
  • The significance level is 0.05
  • There is a 5% chance of concluding the treatments
    differ when in fact there is no difference

72
  • Assuming a 5 percentage point difference between
    the groups, what happens to power?
  • The power of the study, as proposed, would be
    less than 90%
  • Assuming a 0.01 significance level, what happens
    to power?
  • The power of the study, as proposed, would be
    less than 90%

73
References
  • Descriptive Statistics
  • Altman, D. G. Practical Statistics for Medical
    Research. Chapman & Hall/CRC, 1991.
  • Sample Size Justification
  • Freiman, J. A. et al. The importance of beta,
    the type II error and sample size in the design
    and interpretation of the randomized control
    trial: Survey of 71 "negative" trials. N Engl J
    Med. 299:690-694, 1978.
  • Friedman, L. M., Furberg, C. D., DeMets, D. L.
    Fundamentals of Clinical Trials. Springer-Verlag,
    1998, Chapter 7.
  • Lachin, J. M. Introduction to sample size
    determination and power analysis for clinical
    trials. Controlled Clinical Trials. 2:93-113,
    1981.