CS521 Software Engineering Hypothesis Testing - PowerPoint PPT Presentation

Slides: 55
Provided by: sky89
Transcript and Presenter's Notes

1
CS521 Software Engineering Hypothesis Testing
2
Random Variable
  • The outcome of an experiment need not be a
    number, for example, the outcome when a coin is
    tossed can be 'heads' or 'tails'. However, we
    often want to represent outcomes as numbers.
  • A random variable is a function that associates a
    unique numerical value with every outcome of an
    experiment. The value of the random variable will
    vary from trial to trial as the experiment is
    repeated.
  • There are two types of random variable - discrete
    and continuous.

3
Discrete Random Variables
  • A discrete random variable is one which may take
    on only a countable number of distinct values
    such as 0, 1, 2, 3, 4, ... Discrete random
    variables are usually (but not necessarily)
    counts.
  • If a random variable can take only a finite
    number of distinct values, then it must be
    discrete.
  • Example: the number of defective light bulbs in a
    box of ten.

4
Continuous Random Variable
  • A continuous random variable is one which can
    take any value in an interval, so it has an
    uncountably infinite number of possible values.
    Continuous random variables are usually
    measurements. Examples include height, weight,
    the amount of sugar in an orange, and the time
    required to run a mile.

5
Expected Value
  • The expected value (or population mean) of a
    random variable indicates its average or central
    value. It is a useful summary value (a number) of
    the variable's distribution.
  • Stating the expected value gives a general
    impression of the behavior of some random
    variable without giving full details of its
    probability distribution (if it is discrete) or
    its probability density function (if it is
    continuous).
  • Two random variables with the same expected value
    can have very different distributions. There are
    other useful descriptive measures which affect
    the shape of the distribution, for example
    variance.

6
Expected Value
  • The expected value of a random variable X is
    symbolised by E(X) or µ.
  • If X is a discrete random variable with possible
    values x1, x2, x3, ..., xn, and p(xi) denotes
    P(X = xi), then the expected value of X is defined
    by
    $E(X) = \mu = \sum_{i=1}^{n} x_i \, p(x_i)$
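
The discrete definition is a probability-weighted sum of the possible values. A minimal sketch in Python follows; the fair six-sided die and the helper name expected_value are illustrative assumptions, not part of the original slides.

```python
from fractions import Fraction

def expected_value(pmf):
    """Return E(X), the sum of x * p(x), for a dict mapping value -> probability."""
    return sum(x * p for x, p in pmf.items())

# Hypothetical example: a fair six-sided die, each face with probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
mean_die = expected_value(die_pmf)
print(mean_die)  # 7/2
```

Exact fractions are used here only so that the result is not blurred by floating-point rounding.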

7
Expected Value
  • If X is a continuous random variable with
    probability density function f(x), then the
    expected value of X is defined by
    $E(X) = \mu = \int_{-\infty}^{\infty} x \, f(x) \, dx$

8
Variance
  • The (population) variance of a random variable is
    a non-negative number which gives an idea of how
    widely spread the values of the random variable
    are likely to be: the larger the variance, the
    more scattered the observations on average.
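
To illustrate how two random variables can share an expected value and still differ in spread, here is a small sketch. It assumes the standard definition of population variance, E[(X - µ)^2], which the slide does not write out, and the two distributions are made up for the example.

```python
def mean(pmf):
    """Expected value of a discrete random variable given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Population variance E[(X - mu)^2] of a discrete random variable."""
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Two hypothetical distributions with the same mean but different spread.
narrow = {4: 0.5, 6: 0.5}
wide = {0: 0.5, 10: 0.5}

print(mean(narrow), mean(wide))          # 5.0 5.0
print(variance(narrow), variance(wide))  # 1.0 25.0
```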

9
Probability Distribution
  • The probability distribution of a discrete random
    variable is a list of probabilities associated
    with each of its possible values. It is also
    sometimes called the probability function or the
    probability mass function.
  • More formally, the probability distribution of a
    discrete random variable X is a function which
    gives the probability p(xi) that the random
    variable equals xi, for each value xi:
  • p(xi) = P(X = xi)
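
As a sketch of a probability distribution for a count, the number of defective bulbs in a box of ten can be modelled with a binomial distribution. The model itself and the 10% defect rate are assumptions made for illustration; the slides give no defect rate.

```python
from math import comb

def binomial_pmf(n, p):
    """Return {k: P(X = k)} for the number of successes in n independent trials."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Assumed for illustration: each of 10 bulbs is defective with probability 0.1.
pmf = binomial_pmf(10, 0.1)
total = sum(pmf.values())

print(round(pmf[0], 4))  # probability of no defective bulbs: 0.3487
print(round(total, 6))   # probabilities over all possible values sum to 1.0
```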

10
An example
  • Suppose that we want to compare the crime rate in
    Portland with the crime rate in the rest of the
    country.
  • Is there more or less crime in Portland than the
    national average?

11
An example
  • First, we start with the hypothesis that the
    crime rate on average in Portland is the same as
    the national average.
  • To test our hypothesis, we ask what sample means
    would occur if many samples of the same size were
    drawn at random from our population if our
    hypothesis is true.

12
An example
  • We can now refer to the sampling distribution of
    the mean, drawn from a population whose mean is
    the same as the national average, and we compare
    our sample mean with those in this sampling
    distribution.
  • If our hypothesis is true, then the distribution
    of sample means will be centered about the
    national average.
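
The idea of drawing many samples of the same size can be sketched with a small simulation. The population values (mean 85, standard deviation 20) are the ones used later in this presentation. The normal population, the sample size of 100, and the number of trials are assumptions chosen for illustration.

```python
import random
from statistics import fmean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

POP_MEAN, POP_SD = 85.0, 20.0  # population under the hypothesis
N, TRIALS = 100, 2000          # assumed sample size and number of samples

# Draw many samples of size N and record each sample mean.
sample_means = [
    fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(TRIALS)
]

center = fmean(sample_means)   # should be close to POP_MEAN
spread = pstdev(sample_means)  # should be close to POP_SD / sqrt(N) = 2
print(round(center, 1), round(spread, 1))
```

Under these assumptions the sample means cluster around the hypothesized mean, and their spread is much smaller than the spread of individual scores.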

13
An example
  • Suppose that the relationship between our sample
    mean and those of the sampling distribution of
    the mean looks like this:

[Figure: sampling distribution of the mean, with our hypothesized value and our obtained value marked.]
14
An example
  • If so, our sample mean is one that could
    reasonably occur if the hypothesis is true, and
    we will retain our hypothesis as one that could
    be true. (The crime rate of Portland is the same
    as the national average.)

15
An example
  • On the other hand, if the relationship between
    our sample mean and those of the sampling
    distribution of the mean looks like this

16
An example
  • Our sample mean is so deviant that it would be
    quite unusual to obtain such a value when our
    hypothesis is true. In this case, we would
    reject our hypothesis and conclude that it is
    more likely that the crime rate of Portland is
    not the same as the national average.
  • The population represented by the sample differs
    significantly from the comparison population.

17
Null Hypothesis
  • The hypothesis that we put to the test is called
    the null hypothesis, symbolized H0.
  • The null hypothesis usually states the situation
    in which there is no difference (the difference
    is null) between populations.

18
Alternative Hypothesis
  • The alternative hypothesis, symbolized HA, is the
    opposite of the null hypothesis.
  • The alternative hypothesis is also identified as
    the research hypothesis, or the hunch that the
    investigator wants to test.

19
Null and Alternative Hypotheses
  • Both H0 and HA are statements about population
    parameters, not sample statistics.
  • A decision to retain the null hypothesis implies
    a lack of support for the alternative hypothesis.
  • A decision to reject the null hypothesis implies
    support for the alternative hypothesis.

20
When do we retain and when do we reject the null
hypothesis?
  • When we draw a random sample from a population,
    our obtained value of the sample mean will almost
    never exactly equal the mean of our population.
  • The decision to reject or retain the null
    hypothesis depends on the selected criterion for
    distinguishing between those sample means that
    would be common and those that would be rare if
    H0 were true.

21
When do we retain and when do we reject the null
hypothesis?
  • If the sample mean is so different from what is
    expected when H0 is true that its appearance
    would be unlikely, H0 should be rejected.
  • But what degree of rarity of occurrence is so
    great that it seems better to reject the null
    hypothesis than to retain it?

22
When do we retain and when do we reject the null
hypothesis?
  • This decision is somewhat arbitrary, but common
    research practice is to reject H0 if the sample
    mean is so deviant that its probability of
    occurrence in random sampling is .05 or less.
  • Such a criterion is called the level of
    significance, symbolized α.

23
Rejection Regions
  • For our purposes, we will adopt the .05 level of
    significance.
  • Therefore, we will reject H0 only if our obtained
    sample mean is so deviant that it falls in the
    upper 2.5% or lower 2.5% of all the possible
    sample means that would occur when H0 is true.
  • The portions of the sampling distribution that
    include the values of the mean that lead to
    rejection of the null hypothesis are called
    rejection regions.
  • If our sample mean falls in the middle 95% of the
    distribution of all possible values of the mean
    that could occur when H0 is true, we will retain
    the null hypothesis.

24
What sample means would occur if H0 is true?
  • If it is true, the sampling distribution of the
    mean would center on the hypothesized population
    mean.
  • If we assume that the sampling distribution of
    the mean approximates a normal curve (and we can,
    if our sample size is large enough for the
    central limit theorem to apply), then:

25
Critical Values
  • We can use the normal curve table to calculate
    the Z values, called critical values, that
    separate the upper 2.5% and lower 2.5% of sample
    means from the remainder.
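
Instead of a printed normal curve table, the critical values can be computed from the standard normal distribution. This sketch uses NormalDist from Python's statistics module; the helper name two_tailed_critical_z is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """Z value that cuts off alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

z_05 = two_tailed_critical_z(0.05)
z_01 = two_tailed_critical_z(0.01)
print(round(z_05, 2), round(z_01, 2))  # 1.96 2.58
```

The rejection regions are then Z at or below the negative critical value and Z at or above the positive one.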

26
An example
  • Suppose our obtained sample mean of the crime
    rate in Portland is a score of 90.
  • Suppose that the national average is known to be
    85, with a standard deviation of 20.
  • Even if the population mean really is a score of
    85, because of random sampling variation we do
    not expect the mean of a sample randomly drawn
    from a population to be exactly 85 (although it
    could be).

27
Using the Sampling Distribution of the Mean to
Determine Probability
  • The important question is what is the relative
    position of the obtained sample mean among all
    those that could have been obtained if the
    hypothesis is true?
  • To determine the position of the obtained sample
    mean, it must be expressed as a Z score.

28
Z score
  • In hypothesis testing, you are finding a Z score
    of your sample's mean on a distribution of means.

29
Z Score Formulas
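  • The method of changing the sample's mean to a Z
    score:
    $Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
    is the standard error of the mean.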
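
A minimal sketch of the conversion: the sample mean's distance from the hypothesized mean is divided by the standard error of the mean, sigma / sqrt(n). The function names and the numbers in the usage line are illustrative assumptions.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)

def z_score(sample_mean, mu0, sigma, n):
    """Position of a sample mean among the means expected when H0 is true."""
    return (sample_mean - mu0) / standard_error(sigma, n)

# Hypothetical numbers: mean 52 from 25 cases, H0 mean 50, population SD 10.
print(z_score(52, 50, 10, 25))  # 1.0
```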

30
An example
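  • In our study, with a sample mean of 90, µ = 85,
    and σ = 20, the Z of 2.5 reported on the next
    slide corresponds to a sample size of n = 100:
    $Z = \frac{90 - 85}{20 / \sqrt{100}} = \frac{5}{2} = 2.5$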

31
An example
  • Our sample mean is 2.5 standard errors of the
    mean greater than expected if the null hypothesis
    were true.
  • The value of 2.5 falls in the rejection region,
    so we reject H0 and retain HA.
  • We can conclude that the mean of the population
    from which the sample came is not 85.

32
An example
  • The crime rate of Portland is, on average,
    different from (greater than) the national
    average.
  • Notice that the conclusion is about the
    population represented by the sample under study
    and not simply the particular sample itself.

33
What if we had used α = .01?
  • Our sample mean and our Z value would still be
    the same, but the critical values of Z that
    separate the regions of rejection would be
    different, ±2.58.
  • This is a more conservative value (it is harder
    to reject the null hypothesis).
  • Your decision depends on your criterion.

Using an alpha level of .01, you would fail to
reject the null hypothesis.
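
The dependence of the decision on the criterion can be checked with a short sketch for the Portland example. The sample size of 100 is an assumption: it does not appear in the transcript, but it is the value that reproduces the stated Z of 2.5.

```python
from math import sqrt
from statistics import NormalDist

sample_mean, mu0, sigma = 90, 85, 20
n = 100  # assumption: chosen because it reproduces the stated Z of 2.5

z = (sample_mean - mu0) / (sigma / sqrt(n))

def reject_two_tailed(z, alpha):
    """True if z falls in either rejection region at the given significance level."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z) >= critical

print(z)                           # 2.5
print(reject_two_tailed(z, 0.05))  # True: 2.5 is beyond 1.96
print(reject_two_tailed(z, 0.01))  # False: 2.5 is inside 2.58
```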
34
If we retain H0, what can we conclude?
  • The decision to retain H0 does not mean that it
    is likely that H0 is true.
  • Rather, this decision reflects the fact that we
    do not have sufficient evidence to reject the
    null hypothesis.
  • Certain other hypotheses would also have been
    retained if tested in the same way.

35
If we retain H0, what can we conclude?
  • Consider our example where the hypothesized
    population mean is 85.
  • If we had obtained a sample mean of 86, the null
    hypothesis would have been retained.
  • But suppose the hypothesized population mean was
    87.
  • If we had obtained a sample mean of 86, the null
    hypothesis would also have been retained.
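
A short sketch of this point: the same sample mean of 86 is compatible with both hypothesized means. The standard deviation of 20 comes from the earlier example. The sample size of 100 and the two-tailed .05 criterion are assumptions carried over for illustration.

```python
from math import sqrt
from statistics import NormalDist

def retains_h0(sample_mean, mu0, sigma, n, alpha=0.05):
    """True if a two-tailed Z test at level alpha retains H0: mu = mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return abs(z) < NormalDist().inv_cdf(1 - alpha / 2)

sigma, n = 20, 100  # n is an assumed sample size
results = {mu0: retains_h0(86, mu0, sigma, n) for mu0 in (85, 87)}
print(results)  # {85: True, 87: True}
```

Because more than one hypothesized value survives the same test, retaining H0 cannot be read as evidence that H0 is true.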

36
Strength of Decision
  • Rejecting the null hypothesis means that H0 is
    probably false, a strong decision.
  • Retaining the null hypothesis is a weak decision.

37
Two-tailed Test
  • The alternative hypothesis states that the
    population parameter may be either less than or
    greater than the value stated in H0.
  • The critical region is divided between both tails
    of the sampling distribution.

38
Two-tailed Test
  • This type of test is desirable in most research
    situations.
  • For example, in most cases in which the
    performance of a group is compared to a known
    standard, it would be of interest to discover
    that the group is superior or inferior.

39
One-tailed Test
  • The alternative hypothesis states that the
    population parameter differs from the value
    stated in H0 in one particular direction.
  • The critical region is located only in one tail
    of the sampling distribution.

40
One-tailed Test
  • Upper-tail Critical
  • Lower-tail Critical

41
One-tailed Test
  • The advantage of a one-tailed test is that it is
    more sensitive in detecting a false null
    hypothesis in the direction of concern than a
    two-tailed test.
  • The major disadvantage of a one-tailed test is
    that it precludes any chance of discovering that
    reality is just the opposite of what the
    alternative hypothesis says.
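
Both the advantage and the disadvantage can be seen by comparing critical values at the same significance level. The Z values of 1.80 and -3.0 below are hypothetical results chosen to fall on either side of the trade-off.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05

one_tailed_critical = std_normal.inv_cdf(1 - alpha)      # about 1.645
two_tailed_critical = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96

# Advantage: a moderate result in the predicted (upper) direction.
z = 1.80
rejects_one_tailed = z > one_tailed_critical       # True
rejects_two_tailed = abs(z) > two_tailed_critical  # False

# Disadvantage: a strong result in the opposite direction.
opposite_z = -3.0
upper_tail_detects_opposite = opposite_z > one_tailed_critical  # False
two_tailed_detects_opposite = abs(opposite_z) > two_tailed_critical  # True

print(rejects_one_tailed, rejects_two_tailed)
print(upper_tail_detects_opposite, two_tailed_detects_opposite)
```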

42
Steps of the Hypothesis Test
  • State the research question.
  • State the statistical hypothesis.
  • Set decision rule.
  • Calculate the test statistic.
  • Decide if result is significant.
  • Interpret result as it relates to your research
    question.

43
An example
  • Robins and John (1997) carried out a study on
    narcissism (self-love), comparing people who had
    scored high versus low on a narcissism
    questionnaire. (An example item was "If I ruled
    the world it would be a better place.") They
    also had other questionnaires, including one that
    had an item about how many times the participant
    looked in the mirror on a typical day. They
    hypothesize that people who scored high on the
    narcissism scale look in the mirror significantly
    more often than people who did not score high on
    the scale. Based on previous research, it is
    known that, on average, a person looks in the
    mirror 4.8 times per day, with a standard
    deviation of 2.6. Taking a sample of 25
    narcissistic individuals, they find a mean of 6.3
    visits to the mirror per day. Using the .05
    level of significance, and assuming the
    distribution approximates a normal curve, what
    should the researchers conclude?

44
An example
  • State the research question:
    Do individuals who score high on a narcissism
    scale look at themselves in the mirror
    significantly more often than individuals who are
    not narcissistic?
  • State the statistical hypotheses:
    H0: µ ≤ 4.8; HA: µ > 4.8

45
Statistical Hypotheses
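  • Two-tailed Test: H0: µ = µ0; HA: µ ≠ µ0
  • One-Tailed Test
  • Lower-tailed: H0: µ ≥ µ0; HA: µ < µ0
  • Upper-tailed: H0: µ ≤ µ0; HA: µ > µ0
  • Here µ0 is the hypothesized population mean.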

46
An example
  • Set decision rule:
    α = .05, upper-tailed test; reject H0 if Z > 1.65.

47
An example
  • Calculate the test statistic:
    $Z = \frac{6.3 - 4.8}{2.6 / \sqrt{25}} = \frac{1.5}{0.52} \approx 2.88$

48
An example
  • Decide if results are significant
  • Reject H0, 2.88 > 1.65.
  • Interpret results as they relate to the
    statistical hypothesis
  • Narcissistic individuals look in the mirror
    significantly more often than individuals who are
    not narcissistic.
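
The narcissism example can be reproduced numerically as an upper-tailed Z test. The numbers come from the problem statement. Note that the computed critical value is about 1.645, which the slides round to the table value 1.65. The p-value is an addition not reported on the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 4.8, 2.6      # known population mean and standard deviation
n, sample_mean = 25, 6.3   # sample of narcissistic individuals
alpha = 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha)  # upper-tailed critical value
p_value = 1 - NormalDist().cdf(z)           # chance of a Z this large if H0 is true
reject = z > critical

print(round(z, 2), round(critical, 3), reject)  # 2.88 1.645 True
```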

49
Another example
  • A psychologist is working with people who have
    had a particular type of major surgery. The
    psychologist proposes that people will recover
    from the operation more quickly if friends and
    family are in the room with them for the first 48
    hours after the operation (based on several other
    studies on social support), but acknowledges that
    the presence of friends and family may also slow
    recovery time, due to the added activity and
    possible stress associated with visitors. It is
    known that time to recover from this kind of
    surgery is normally distributed with a mean of 12
    days and a standard deviation of 5 days. The
    procedure of having friends and family in the
    room for the period after the surgery is done
    with 9 randomly selected patients. The patients
    recover in an average of 8 days. Using the .01
    level of significance, what should the researcher
    conclude?

50
Another example
  • State the research hypothesis:
    Do patients who have friends and family with them
    following surgery recover more or less quickly
    than people who do not?
  • State the statistical hypothesis:
    H0: µ = 12; HA: µ ≠ 12
  • Set decision rule:
    α = .01, two-tailed test; reject H0 if Z < -2.58
    or Z > 2.58.
  • Calculate the test statistic:
    $Z = \frac{8 - 12}{5 / \sqrt{9}} = \frac{-4}{1.67} \approx -2.40$
  • Decide if results are significant:
    Retain H0, -2.40 > -2.58.
  • Interpret results as they relate to the
    statistical hypothesis:
    Patients who have friends and family with them
    following surgery do not recover significantly
    faster, or slower, than patients who do not have
    social support.
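
The surgery example can be checked the same way, this time as a two-tailed test at the .01 level. The numbers come from the problem statement. The two-tailed p-value is an addition not reported on the slides.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma = 12, 5      # known recovery time: mean and standard deviation (days)
n, sample_mean = 9, 8   # patients with friends and family present
alpha = 0.01

z = (sample_mean - mu0) / (sigma / sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.58
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # both tails counted
reject = abs(z) > critical

print(round(z, 2), round(critical, 2), reject)  # -2.4 2.58 False
```

The p-value lands between .01 and .05, so the same data would have led to rejection under the .05 criterion, which again shows that the decision depends on the chosen level of significance.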

51
ASSIGNMENT
  • Due in 2 weeks
  • Select a Software Engineering Research Paper
    (Journal)
  • The paper must include an experiment
  • Create a write up and presentation that outlines
    and analyzes the experiment and results.
  • Cover the following topics (Next Slide)

52
ASSIGNMENT
  • Definition: This is where the study is defined
    in terms of problem, objectives, and goals. The
    following questions need to be asked:
  • What is the object of the study?
  • What is the purpose of the study?
  • Which effect is studied (quality focus)?
  • From whose perspective are you viewing the study?
  • What is the context of the study (e.g. where is
    it conducted)? The context defines which
    personnel are involved in the experiment and
    which objects will be studied.

53
ASSIGNMENT
  • Planning: This is where the details of the
    experiment are defined.
  • Is the experiment off-line or on-line, student or
    industry, toy or real, specific or general?
  • What is the hypothesis and null hypothesis?
  • What are your variables, both independent and
    dependent?
  • How did you select your subjects: simple random
    sampling, systematic sampling, stratified random
    sampling, convenience sampling, or quota
    sampling?
  • Does your design use randomization for subjects
    and objects, or did you employ blocking or
    balancing?
  • How many factors and how many treatments did you
    have? For example, if you were investigating new
    design methods for producing quality software,
    the factor would be the design method and the
    treatments are the new and old designs. How you
    choose these factors and treatments will define
    what statistical analysis can be applied.
  • What instrumentation was used?
  • What is the validity evaluation? There are
    typically four types of threats to validity:
    conclusion, internal, construct, and external
    [Cook79].
  • Conclusion Validity: Is there a statistically
    significant relationship between treatment and
    outcome?
  • Internal Validity: Does the treatment actually
    cause the outcome?
  • Construct Validity: Relationship between theory
    and observation. Does the treatment reflect the
    construct of the cause and does the outcome
    reflect the construct of the effect?
  • External Validity: Here we are concerned with
    generalization of the study. How does it apply
    outside the scope of our study?

54
ASSIGNMENT
  • Operation: How was the experiment carried out?
  • How did you obtain participants?
  • Did you obtain consent, did you protect sensitive
    results, did you offer inducements, and was there
    any deception?
  • Did you need to prepare the instrumentation in
    any way?
  • Explain the execution of the experiment. How did
    you collect data, and what was the environment
    like? What scale did you use?
  • Did you validate the data and did it seem
    reasonable?
  • Analysis and Interpretation: In order to draw
    valid conclusions you must analyze and interpret
    the data.
  • How did you numerically process and present the
    data after the experiment? Did you measure
    central tendency, dispersion, and dependency and
    how did you display your results?
  • Did you apply data set reduction and why?
  • What type of hypothesis testing did you use?
  • What were the results?