Transcript and Presenter's Notes

Title: Estimation


1
Estimation & hypothesis testing (F-test,
Chi²-test, t-tests)
Estimation and hypothesis testing
  • Introduction
  • t-tests
  • Outlier tests (k·SD, Grubbs, Dixon's Q)
  • F-test, χ² (Chi²)-test (= 1-sample F-test)
  • Tests and confidence limits
  • Analysis of variance (ANOVA)
  • Introduction
  • Model I ANOVA
  • Performance strategy
  • Testing of outliers
  • Testing of variances (Cochran "C", Bartlett)
  • Model II ANOVA
  • Applications

CINHST CINHST-EXCEL CINHST-Exercise
Grubbs, free download from http://www.graphpad.com/articles/outlier.htm
CochranBartlett ANOVA
Power
2
Introduction
Introduction
  • When we have a set/sets of data ("sample"), we
    often want to know whether a statistical estimate
    thereof (e.g., a difference of 2 means, the deviation
    of an SD from a target) is pure coincidence or
    whether it is "statistically significant". We can
    approach this problem in the following way:
  • The null hypothesis H0 (no difference) is tested
    against the alternative hypothesis H1 (there is a
    difference) on the basis of the collected data. The
    decision on acceptance/rejection of the hypothesis
    is made with a certain probability, most often
    95% (statistical significance).
  • Because, usually, we have a limited set of data
    ("sample"), we extrapolate the estimates from our
    sample to the underlying populations by use of
    the statistical distribution theory and we assume
    random sampling.
  • Hypothesis testing example: Is the difference
    between the means of two data sets real or only
    accidental?
  • Statistical significance in more detail
  • In statistics, the words significant and
    significance have specific meanings. A
    significant difference means a difference that
    is unlikely to have occurred by chance. A
    significance test shows up differences that are
    unlikely to be due to purely random variation.
    Whether one set of results is significantly
    different from another depends not only on the
    magnitude of the difference in the means but also
    on the amount of data available and its spread.

Different?
3
Significance testing Qualitative investigation
Introduction
  • Adapted from Shaun Burke, RHM Technology Ltd,
    High Wycombe, Buckinghamshire, UK: Understanding
    the Structure of Scientific Data,
  • LC GC Europe Online Supplement

probably not different and would 'pass' the
t-test (t_crit > t_calc)
probably different and would 'fail' the t-test
(t_crit < t_calc)
could be different, but not enough data to say for
sure (i.e., would 'pass' the t-test: t_crit > t_calc)
practically identical means, but with so many
data points there is a small but statistically
significant ('real') difference, and so would
'fail' the t-test (t_crit < t_calc)
spreads in the data, as measured by the variances,
are similar and would 'pass' the F-test (F_crit > F_calc)
spreads in the data, as measured by the variances,
are different and would 'fail' the F-test (F_crit < F_calc)
could have a different spread, but not enough data
to say for sure; would 'pass' the F-test (F_crit > F_calc)
4
General remarks
Introduction
  • General requirements for parametric tests
  • Random sampling
  • Normally distributed data
  • Homogeneity of variances, when applicable
  • Note on the testing of means
  • When we test means, the central limit theorem is
    of great importance because it favours the use of
    parametric statistics.
  • Central limit theorem (see also "sampling
    statistics")
  • The means of independent observations tend to be
    normally distributed,
  • irrespective of the primary type of distribution.
  • Implications of the central limit theorem (see
    the simulation sketch below)
  • When dealing with mean values, the type of
    primary distribution is of limited importance,
    e.g. in the t-test for comparisons of means.
  • When dealing with percentiles, e.g. reference
    intervals, the type of distribution is indeed
    important.
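A minimal simulation sketch of this point (the exponential distribution and all numbers are illustrative assumptions, not from the slides): means of even strongly skewed data are approximately normally distributed.

```python
# Hypothetical demonstration of the central limit theorem: the raw data
# are exponential (strongly skewed), but means of N = 30 observations
# cluster normally around the true mean with SD ~ 1/sqrt(30).
import numpy as np

rng = np.random.default_rng(42)

raw = rng.exponential(scale=1.0, size=100_000)                       # skewed primary distribution
means = rng.exponential(scale=1.0, size=(10_000, 30)).mean(axis=1)   # means of N = 30

print(f"raw:   mean={raw.mean():.3f}, SD={raw.std(ddof=1):.3f}")     # SD ~ 1.0
print(f"means: mean={means.mean():.3f}, SD={means.std(ddof=1):.3f}") # SD ~ 0.18
```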

5
Overview of test procedures (parametric)
Introduction
  • Testing levels
  • 1-sample t-test: comparison of a mean value with
    a target or limit
  • t-test: comparison of mean values (unpaired).
    Perform the F-test before:
  • t-test, equal variances
  • t-test, unequal variances
  • paired t-test: comparison of paired measurements
    (x, y)
  • Testing outliers
  • k·SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
  • Dixon's Q (Annex; n = 3 to 25)
  • Testing dispersions
  • F-test for comparison of variances: F = s2^2/s1^2
  • χ² (Chi²)-test or 1-sample F-test
  • Testing variances (several groups)
  • Cochran "C"

6
t-tests
t-tests
  • Difference between a mean and a target
    ("one-sample" t-test)
  • With 95%-CI: xm ± t(0.05;ν)·s/√N (s/√N =
    standard error)
  • → t = (µ0 − xm)/(s/√N)
  • For t:
  • Degrees of freedom ν = N − 1
  • Probability α = 0.05
  • Important → t-distribution (see before: sampling
    statistics)
  • Difference between two means (see the sketch
    below)
  • Perform the F-test before, and decide on its
    outcome whether to use the t-test with equal or
    unequal variances.
  • Given independence, the difference between two
    variables that are normally distributed is also
    normally distributed.
  • The variance of the difference is the sum of the
    individual variances:
  • t = (xm2 − xm1)/[s^2/N1 + s^2/N2]^0.5
  • where s^2 is a common estimate of the variance
    ("pooled variance"):
  • s^2 = [(N1 − 1)·s1^2 + (N2 − 1)·s2^2]/(N1 + N2 − 2)
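A minimal scipy sketch of the two tests above; the data, target, and group sizes are invented for illustration.

```python
# 1-sample and 2-sample (pooled-variance) t-tests with hypothetical data.
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 5.3, 5.0, 5.2, 4.8, 5.1])   # sample measurements
target = 5.0                                         # µ0, the target value

# One-sample t-test: t = (µ0 - xm)/(s/sqrt(N)), df = N - 1.
t1, p1 = stats.ttest_1samp(x, popmean=target)
print(f"1-sample: t={t1:.3f}, P={p1:.3f}")

y = np.array([5.4, 5.6, 5.3, 5.5, 5.7, 5.4])
# Two-sample t-test with the pooled variance s^2 (equal_var=True).
t2, p2 = stats.ttest_ind(x, y, equal_var=True)
print(f"2-sample (pooled): t={t2:.3f}, P={p2:.3f}")
```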

7
t-test different variances
t-tests
  • The difference is still normally distributed;
    given σ1 ≠ σ2, the difference of the means has
    variance σ1^2/N1 + σ2^2/N2, which is estimated as
    s1^2/N1 + s2^2/N2.
  • However, the t value t = (xm2 − xm1)/[s1^2/N1 +
    s2^2/N2]^0.5 does not strictly follow the
    t-distribution. The problem is mainly of academic
    interest, and special tables for t have been
    provided (Behrens, Fisher, Welch).
  • → Perform the F-test before the t-test!
  • Paired t-test: comparison of mean values (paired
    data)
  • Example: measurements before and after treatment
    in patients. When testing for a difference with
    paired measurements, the paired t-test is
    preferable. This is because such measurements are
    correlated, and pairing of the data reduces the
    random variation. Thereby, it increases the
    probability of detecting a difference.
  • Calculations (see the sketch below)
  • The individual paired differences are computed:
  • dif_i = x2i − x1i
  • The mean and standard deviation of the N (= N1 =
    N2) differences are computed:
  • difm = Σ dif_i/N; sdif = [Σ(dif_i −
    difm)^2/(N − 1)]^0.5
  • SEdif = sdif/N^0.5
  • Testing whether the mean paired difference
    deviates from zero:
  • t = (difm − 0)/SEdif (N − 1 degrees of freedom)
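A sketch of both variants with invented before/after data (the values are assumptions for illustration only):

```python
# Welch's t-test (unequal variances) and the paired t-test.
import numpy as np
from scipy import stats

before = np.array([142, 138, 150, 145, 139, 147, 141])  # hypothetical values
after  = np.array([137, 135, 144, 140, 136, 141, 138])

# Welch's variant: equal_var=False uses s1^2/N1 + s2^2/N2 with adjusted df.
t_w, p_w = stats.ttest_ind(before, after, equal_var=False)
print(f"Welch:  t={t_w:.3f}, P={p_w:.3f}")

# Paired t-test: equivalent to a 1-sample t-test of the differences
# dif_i = before_i - after_i against zero, with N - 1 df.
t_p, p_p = stats.ttest_rel(before, after)
d = before - after
assert np.isclose(t_p, d.mean() / (d.std(ddof=1) / np.sqrt(len(d))))
print(f"Paired: t={t_p:.3f}, P={p_p:.3f}")
```

Because pairing removes the between-patient variation, the paired P-value is typically much smaller than the unpaired one for the same data.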

8
Outliers
Outliers
  • Outliers have a great influence on parametric
    statistical tests. Therefore, it is desirable to
    investigate the data for outliers (see the figure,
    for example).
  • Testing for outliers can be done with the
    following techniques (a Grubbs sketch follows
    below):
  • k·SD, Grubbs (http://www.graphpad.com/articles/outlier.htm)
  • Dixon's Q (Annex; n = 3 to 25)
  • All assume normally distributed data.
  • The k·SD method (outlier = point > k·SD away
    from the mean)
  • With this method, it is important to know that
    the statistical chance of finding an outlier
    increases with the number of data points investigated.

The upper point is an outlier according to
Grubbs' test (P < 0.05)
9
F-test Comparing variances
F-test ?2 (Chi2)-test
  • If we have two data sets, we may want to compare
    the dispersions of the distributions.
  • Given normal distributions, the ratio between the
    variances is considered.
  • The variance-ratio test was developed by Fisher;
    therefore the ratio is usually referred to as the
    F-ratio and related to tables of the
    F-distribution.
  • Calculation (see the sketch below)
  • F = s2^2/s1^2
  • Note: The greater value
  • should be in the numerator → F ≥ 1!
  • Example
  • F = s2^2/s1^2
  • = (0.228)^2/(0.182)^2
  • = 1.6 (n.s.)
  • Degrees of freedom
  • df2 (numerator) = 14 − 1 = 13
  • df1 (denominator) = 21 − 1 = 20
  • Critical F(0.05) = 2.25
  • F-test: some notes

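A scipy sketch reproducing the example above (only the two SDs and group sizes come from the slide):

```python
# F-test for two variances: larger variance in the numerator -> F >= 1.
from scipy import stats

s2, n2 = 0.228, 14   # numerator SD and its group size
s1, n1 = 0.182, 21   # denominator SD and its group size

F = s2**2 / s1**2                       # = 1.57, rounded to 1.6 on the slide
df_num, df_den = n2 - 1, n1 - 1         # 13 and 20 degrees of freedom
F_crit = stats.f.ppf(0.95, df_num, df_den)   # ~2.25
p = stats.f.sf(F, df_num, df_den)       # one-sided P of the variance ratio

print(f"F={F:.2f}, Fcrit={F_crit:.2f}, P={p:.3f}")  # F < Fcrit -> n.s.
```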
10
F-test (ctd.)
F-test ?2 (Chi2)-test
  • χ² (Chi²)-test (or 1-sample F-test)
  • Comparing a variance with a target or limit:
  • Chi²exp = s²exp · ν/s²Man (ν = degrees of freedom)
  • Test whether Chi²exp ≥ Chi²critical (1-sided,
    0.05).
  • One-sided, because we test versus a target or a
    limit.
  • The Chi²-test is used in the CLSI EP5 protocol
    (see the sketch below).
  • Relationships between F, t, and Chi²
  • Relationship between Chi² and F:
  • Chi²/ν = F(ν, ∞); ν = degrees of freedom.
  • Relationship between F and t:
  • The one-tailed F-test with 1 and n degrees of
    freedom is equivalent to the t-test with n degrees
    of freedom. The relationship t² = F holds for
    both calculated and tabulated values of these two
    distributions: t(12; 0.05) = 2.1788 → F(1, 12; 0.05)
    = 4.7472.
  • Peculiarities and problems with the EXCEL F-test
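A sketch of this 1-sample variance test with invented numbers (s_claim stands in for s²Man, the claimed/target variance):

```python
# Chi2_exp = s2_exp * nu / s2_claim, compared 1-sided with the critical value.
from scipy import stats

s_exp, n = 0.21, 21    # observed SD from n measurements (hypothetical)
s_claim = 0.17         # claimed/target SD (hypothetical)

nu = n - 1                               # degrees of freedom
chi2_exp = s_exp**2 * nu / s_claim**2
chi2_crit = stats.chi2.ppf(0.95, nu)     # 1-sided, alpha = 0.05
print(f"chi2={chi2_exp:.1f}, crit={chi2_crit:.1f}, "
      f"reject: {chi2_exp >= chi2_crit}")
```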

11
Interpretation of the P-value
P-values
  • A test for statistical significance (at a certain
    probability P) tests whether a hypothesis has to
    be rejected or not, for example, the
    null hypothesis.
  • The null hypothesis of the F-test is that the 2
    variances are not different, or that an
    experimentally found difference arose only by
    chance.
  • The null hypothesis of the F-test will not be
    rejected when the calculated probability Pexp is
    greater than or equal to the chosen probability P
    (P usually chosen as 0.05 = 5%), or when the
    experimental Fexp value is smaller than or equal to
    the critical Fcrit value.
  • Example
  • Fexp (calculated) = 1.554
  • Critical value Fcrit = 2.637
  • Pexp (from experiment) = 0.182
  • Chosen probability
  • P = 0.05
  • Observation
  • The calculated P-value (0.182 = 18%) is greater
    than the chosen P-value (0.05 = 5%). Accordingly,
    the experimental F-value is < the critical F-value.
  • Conclusion
  • The null hypothesis is not rejected; this means
    that the difference of the variances is only by
    chance.
  • NOTE

12
Tests and confidence limits
Tests and confidence limits
  • We have seen for the 1-sample t-test the close
    relationship between confidence intervals and
    significance testing. In many situations, one can
    use either of them for the same purpose.
    Confidence intervals have the advantage that they
    can be shown in graphs and they provide
    information about the spread of an estimate
    (e.g., a mean). A sketch of this duality follows
    below.
  • The tables below give an overview of the
    concordance between CIs and significance testing
    for means and variances (SDs).

- t: 2-sided, or 1-sided for comparison
with claims
- When a stable s is known, z may be
chosen instead of t
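A small sketch of this concordance for a mean versus a target (data and target are invented): the target lies outside the t-based 95% CI exactly when the 1-sample t-test rejects at P < 0.05.

```python
import numpy as np
from scipy import stats

x = np.array([10.4, 10.1, 10.6, 10.3, 10.5, 10.2])  # hypothetical sample
target = 10.0

n, xm, s = len(x), x.mean(), x.std(ddof=1)
half = stats.t.ppf(0.975, n - 1) * s / np.sqrt(n)    # 95% CI half-width
lo, hi = xm - half, xm + half

t, p = stats.ttest_1samp(x, target)
print(f"95% CI = ({lo:.2f}, {hi:.2f}); target inside: {lo <= target <= hi}")
print(f"t-test: P = {p:.4f}; reject: {p < 0.05}")    # the two decisions agree
```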
13
Exercises
CINHST CINHST-EXCEL
  • This tutorial/EXCEL template explains the
    connection between significance tests and
    confidence intervals when the purpose is Null
    Hypothesis Significance Testing (NHST). Indeed,
    for the specific purpose of NHST, P-values as
    well as CIs can be used (check whether the null
    value or target value is inside or outside the
    CI); they are just two sides of the same coin.
  • Examples are the comparison of
  • i) a standard deviation (SD) with a target value,
  • ii) two standard deviations,
  • iii) a mean with a target value,
  • iv) two means, and
  • v) a mean paired difference with a target value.
  • The statistical tests involved are the 1-sample
    F-test, F-test, 1-sample t-test, t-test, and the
    paired t-test, respectively, with the CIs of SD, F,
    mean, mean difference, and mean paired
    difference.
  • Another exercise shows how NHST is influenced by
  • -The magnitude of the difference
  • -The number of data points
  • -The magnitude of the SD
  • Please follow the guidance given in the "Exercise
    Icons" and read the comments.

CINHST-Exercise
Grubbs
14
Notes
Notes
15
Analysis of Variance ANOVA
ANOVA
  • The Three Universal Assumptions of Analysis of
    Variance
  • 1. Independence
  • 2. Normality
  • 3. Homogeneity of Variance
  • Overview of the concepts
  • Model I (Assessing treatment effects)
  • Comparison of mean values of several groups.
  • Model II (Random effects)
  • Study of variances: analysis of components of
    variance
  • Model I and II: identical computations, but
    different purposes and interpretations!
  • Why ANOVA?
  • Model I (Assessing treatment effects)
  • ANOVA is an extension of the commonly used
    t-test for comparing the means of two groups.
  • The aim is a comparison of mean values of
    several groups.
  • The tool is an assessment of variances.

16
Introduction Types of ANOVA
ANOVA
  • One-way: only one type of classification, e.g.
    into various treatment groups
  • Ex.: study of serum cholesterol levels in various
    treatment groups
  • Two-way: subclassification within treatment
    groups, e.g. according to gender
  • Ex.: Do various treatments influence serum
    cholesterol in the same way in men and women?
    (not considered further here)
  • Principle of one-way ANOVA
  • Case 1: null hypothesis valid

Distances within (- - -) and between groups
are squared and summed, and finally compared.
No significant difference between groups: red
distances are small; the main source of
variation is within-groups.
Significant difference between groups: red
distances are large; the main source of
variation is between-groups.
17
Introduction Mathematical model
ANOVA
  • One-way ANOVA
  • Mathematical model (example: treatment)
  • Yij = grand mean
  • + αj (treatment/between-group effect)
  • + εij (within-group error)
  • Null hypothesis: treatment group effects are
    zero
  • Alternative hypothesis: treatment group effects
    are present
  • Avoiding some of the pitfalls of using ANOVA (see
    the sketch below)
  • In ANOVA it is assumed that the data are normally
    distributed. Usually in ANOVA we don't have a
    large amount of data, so it is difficult to prove
    any departure from normality. It has been shown,
    however, that even quite large deviations do not
    affect the decisions made on the basis of the
    F-test.
  • A more important assumption of ANOVA is that
    the variance (spread) between groups is
    homogeneous (homoscedastic). The best way to
    avoid this pitfall is, as ever, to plot the data.
    There also exist a number of tests for
    heteroscedasticity (e.g., Bartlett's test and
    Levene's test). It may be possible to overcome
    this type of problem in the data structure by
    transforming it, such as by taking logs. If the
    variability within a group is correlated with its
    mean value, then ANOVA may not be appropriate
    and/or it may indicate the presence of outliers
    in the data. Cochran's test can be used to test
    for variance outliers.
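A one-way (model I) ANOVA sketch with invented treatment-group data, including a Levene check of the homogeneity assumption:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
g1 = rng.normal(5.0, 0.5, 12)   # hypothetical cholesterol, treatment 1
g2 = rng.normal(5.4, 0.5, 12)   # treatment 2
g3 = rng.normal(5.9, 0.5, 12)   # treatment 3

# Check homogeneity of variances before interpreting the F-test:
W, p_lev = stats.levene(g1, g2, g3)
print(f"Levene: P={p_lev:.3f}")          # large P -> no evidence of heteroscedasticity

# One-way ANOVA: F = between-group mean square / within-group mean square.
F, p = stats.f_oneway(g1, g2, g3)
print(f"ANOVA: F={F:.2f}, P={p:.4f}")    # small P -> group means differ
```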

18
Model I ANOVA Violation of assumptions
ANOVA
19
Model I ANOVA Short summary
ANOVA
  • Plot your data
  • Generally, the procedure is robust towards
    deviations from normality.
  • However, it is indeed sensitive towards
    outliers, i.e. investigate for outliers within
    groups.
  • When the variance within groups is not constant,
    e.g. being proportional to the level, logarithmic
    transformation may be appropriate.
  • Testing for variance homogeneity may be carried
    out by Bartlett's test.
  • Cochran's test can be used to test for variance
    outliers.
  • When F is significant:
  • → Supplementary analyses (will not be addressed
    in more detail!)
  • Maximum against minimum (Student-Newman-Keuls
    procedure)
  • Pairwise comparisons with control of type I
    error (Tukey)
  • Post test for trend (regression analysis)
  • Control versus others (Dunnett)
  • Control group (C) versus treatment groups
  • Often, the focus is on effects in
    treatment groups versus the control group.
20
Model II (random effects) ANOVA
ANOVA
Example: ranges of serum cholesterol in
different subjects.
  • Model II (random effects) ANOVA
  • (analysis of components of variation)
  • Mathematical model
  • Yij = grand mean
  • + between-group variation bj (σB)
  • + within-group variation eij (σW)
  • Reminder

21
Total variance (total standard deviation)
ANOVA
  • The standard deviation (s) of calculated results
    (propagation of s)
  • 1. Sums and differences
  • y = a(sa) + b(sb) + c(sc) → sy = SQRT[sa^2 +
    sb^2 + sc^2] (SQRT = square root)
  • Do not propagate CV!
  • 2. Products and quotients
  • y = a(sa) · b(sb) / c(sc) → sy/y =
    SQRT[(sa/a)^2 + (sb/b)^2 + (sc/c)^2]
  • 3. Exponents (the x in the exponent is
    error-free)
  • y = a(sa)^x → sy/y = x · sa/a
  • Addition of variances: stot = SQRT[s1^2 + s2^2]
  • A large component will dominate
  • Forms the basis for the suggestion by Cotlove et
    al.: SD_A < 0.5 × SD_I (see the worked check below)
  • A = analytical variation
  • I = within-individual biological variation
  • → In a monitoring situation, the total random
    variation of changes is only increased by up to 12%
    as long as this relation holds true.
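A worked check of the 12% figure at the Cotlove limit (pure arithmetic, no assumptions beyond the rule itself):

```python
# If SD_A = 0.5 * SD_I, then SD_total = sqrt(SD_I^2 + SD_A^2) = 1.118 * SD_I,
# i.e. the total random variation is increased by only ~12%.
import math

SD_I = 1.0              # within-individual biological variation (normalized)
SD_A = 0.5 * SD_I       # analytical variation at the Cotlove limit

SD_total = math.sqrt(SD_I**2 + SD_A**2)   # addition of variances
increase = (SD_total / SD_I - 1) * 100
print(f"SD_total = {SD_total:.3f} -> increase = {increase:.1f}%")  # ~11.8%
```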

22
Software output
ANOVA
  • One-way ANOVA: output of statistical programs
  • Variances within and between groups are evaluated
  • Interpretation of model I ANOVA: the F-ratio
  • If the ratio of between- to within-mean square
    exceeds a critical F-value (refer to a table or
    look at the P-value), a significant difference
    between group means has been disclosed.
  • F: Fisher published the ANOVA approach in 1918.
  • Components of variation (see the sketch below)
  • Relation to standard output of statistics
    programs

XGP = group mean; XGM = grand mean
df = degrees of freedom (mean square = variance =
squared SD)
F = MSB/MSW = [n·SDB^2 + SDW^2]/SDW^2
For unequal group sizes, a sort of average n is
calculated according to a special formula: n0 =
[1/(K−1)]·[N − Σni^2/N]
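A sketch of deriving model II variance components from the mean squares, using the relation F = MSB/MSW = (n·SDB² + SDW²)/SDW² (the replicate data are invented):

```python
import numpy as np

groups = [np.array([4.9, 5.1, 5.0]),    # hypothetical replicates per group
          np.array([5.6, 5.4, 5.5]),
          np.array([5.0, 5.2, 5.1])]

k = len(groups)
n = len(groups[0])                      # equal group sizes here
N = k * n
grand = np.concatenate(groups).mean()   # XGM, the grand mean

ms_b = n * sum((g.mean() - grand)**2 for g in groups) / (k - 1)   # between MS
ms_w = sum(((g - g.mean())**2).sum() for g in groups) / (N - k)   # within MS

sd_w = np.sqrt(ms_w)                              # within-group SD
sd_b = np.sqrt(max((ms_b - ms_w) / n, 0.0))       # between component; floor at 0
print(f"F={ms_b/ms_w:.1f}, SD_W={sd_w:.3f}, SD_B={sd_b:.3f}")
```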
23
Conclusion
ANOVA
  • Model I ANOVA
  • A general tool for assessing differences between
    group means
  • Model II ANOVA
  • Useful for assessing components of variation
  • Nonparametric ANOVA
  • Kruskal-Wallis test: a generalization of the
    Mann-Whitney test to deal with > 2 groups.
  • Friedman's test: a generalization of Wilcoxon's
    paired rank test to more than two repeats.
  • The study of components of variation is not
    suitable for nonparametric analysis.
  • Software
  • ANOVA is included in standard statistical
    packages (SPSS, BMDP, StatView, STATA,
    StatGraphics, etc.)
  • Variance components may be given or be derived
    from mean squares as outlined in the tables.

24
Exercises
CochranBartlett
  • Many statistical programs do not include the
    Cochran or Bartlett test. Therefore, they have
    been elaborated in an EXCEL file.
  • The CochranBartlett file contains the formulas
    for the
  • -Cochran test for an outlying variance (including
    the critical values)
  • -Bartlett test for variance homogeneity
  • Both are important for ANOVA (a sketch of both
    tests follows below)
  • -A calculation example
  • More experienced EXCEL users may be able to adapt
    this template to their own applications.
  • This tutorial contains interactive exercises for
    self-education in Analysis of Variance (ANOVA).
  • ANOVA can be used for 2 purposes:
  • -Model I (Assessing treatment effects)
  • Comparison of MEAN values of several
    groups.
  • -Model II (Random effects)
  • Study of components of variation.
ANOVA
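A sketch of both tests (the data are invented; the Cochran critical value uses the standard F-based formula rather than a table):

```python
import numpy as np
from scipy import stats

groups = [np.array([4.9, 5.1, 5.0, 5.2]),    # hypothetical replicate groups
          np.array([5.6, 5.4, 5.5, 5.3]),
          np.array([5.0, 5.9, 4.3, 5.8])]    # visibly larger spread

k = len(groups)
n = len(groups[0])                            # equal group sizes assumed
variances = np.array([g.var(ddof=1) for g in groups])

# Cochran's C: largest variance relative to the sum of all variances.
C = variances.max() / variances.sum()
nu = n - 1
Fc = stats.f.ppf(1 - 0.05 / k, nu, (k - 1) * nu)
C_crit = 1.0 / (1.0 + (k - 1) / Fc)           # upper critical value, alpha=0.05
print(f"Cochran C={C:.3f}, crit={C_crit:.3f}, outlying variance: {C > C_crit}")

# Bartlett's test for overall variance homogeneity:
stat, p = stats.bartlett(*groups)
print(f"Bartlett: chi2={stat:.2f}, P={p:.3f}")
```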
25
Notes
Notes
26
Notes
Notes
27
The statistical Power concept sample size
calculations
Power and sample size
  • When testing statistical hypotheses, we can make
    2 types of errors: the so-called type I (or α)
    error and the type II (or β) error. The power of
    a statistical test is defined as 1 − β. The
    power concept is demonstrated in the figure
    below, denoting the probability of the α-error by
    p and that of the β-error by q. Like
    significance testing, power calculations can be
    done 1- and 2-sided.
  • Purpose of power analysis and sample-size
    calculation
  • Some key decisions in planning any experiment
    are: "How precise will my parameter estimates
    tend to be if I select a particular sample size?"
    and "How big a sample do I need to attain a
    desirable level of precision?"
  • Power analysis and sample-size calculation allow
    you to decide (a) how large a sample is needed to
    enable statistical judgments that are accurate
    and reliable, and (b) how likely your statistical
    test will be to detect effects of a given size in
    a particular situation.

28
The statistical Power concept sample size
calculations
Power and sample size
  • Calculations
  • Definitions
  • zp/2 = z-value for the null hypothesis
  • (usually 95%, 1- or 2-sided; e.g. zp/2 = 1.65 or
    1.96)
  • z1−q = z-value for the alternative hypothesis
  • (usually 90%, always 1-sided; e.g. z1−q = 1.28)
  • N = number of measurements to be performed
  • Mean versus a target value
  • N = [SD/(mean − target)]^2 · (zp/2 + z1−q)^2
  • Detecting a relevant difference (gives the number
    required in each group)
  • N = (SDDelta/Delta)^2 · (zp/2 + z1−q)^2
  • Delta = difference to be detected
  • SDDelta = SQRT(SDx^2 + SDy^2); usually SDx = SDy
    → SDDelta = √2 · SD
  • (requires previous knowledge of the SD)
  • Example: difference between 2 groups (see the
    sketch below)
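A worked sketch of the formula above with hypothetical inputs (alpha 2-sided, beta 1-sided, as in the exercises):

```python
import math
from scipy import stats

alpha, power = 0.05, 0.90
z_a = stats.norm.ppf(1 - alpha / 2)   # zp/2 = 1.96 (2-sided)
z_b = stats.norm.ppf(power)           # z1-q = 1.28 (1-sided)

SD = 0.5                              # assumed common SD of each group
delta = 0.4                           # difference worth detecting
SD_delta = math.sqrt(2) * SD          # SDDelta = sqrt(2)*SD when SDx = SDy

N = (SD_delta / delta)**2 * (z_a + z_b)**2
print(f"N per group = {math.ceil(N)}")   # ~33 per group
```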

29
Exercises
Power
  • This file contains 2 worksheets that explain the
    power concept and allow simple sample-size
    calculations.
  • Please use dedicated software for routine power
    calculations.
  • Concept
  • Use the respective "spinners" to change the
    values (or enter the values directly in the blue
    cells) for:
  • -Mean
  • -SD
  • For comparison of a sample mean versus a
    target, use the sample SD
  • For comparison of 2 sample means with the
    same SD,
  • use SD = SQRT(2)·SD
  • -Sample size
  • -Significance level (only with the spinner!)
  • Limited to the same value for alpha- and
    beta-error!
  • NOTE: alpha 2-sided, beta 1-sided!
  • → Observe the effect on the power.
  • Calculations

30
Notes
Notes