Chapter 5 Statistical Methods - PowerPoint PPT Presentation

1
Chapter 5 Statistical Methods

2
Outline
  • 5.1 STATISTICAL INFERENCE
  • 5.2 ASSESSING DIFFERENCES IN DATA SETS
  • 5.3 BAYESIAN INFERENCE
  • 5.4 PREDICTIVE REGRESSION
  • 5.5 ANALYSIS OF VARIANCE
  • 5.6 LOGISTIC REGRESSION
  • 5.7 LOG-LINEAR MODELS
  • 5.8 LINEAR DISCRIMINANT ANALYSIS

3
5.1 STATISTICAL INFERENCE
  • Descriptive statistics vs. statistical inference
  • Population, sample, data set
  • Parameter vs. statistic
  • Inference methods: estimation and tests of hypotheses

4
5.1 STATISTICAL INFERENCE (cont.)
  • Estimation: the goal is to gain information from a data set T in order
    to estimate one or more parameters w belonging to the model of the
    real-world system f(X, w)

5
5.1 STATISTICAL INFERENCE (cont.)
  • Statistical testing: deciding whether a hypothesis concerning the value
    of a population characteristic should be accepted or rejected
  • Null hypothesis vs. alternative hypothesis

6
5.2 ASSESSING DIFFERENCES IN DATA SETS
  • Central tendency: measures that locate the middle of a data set
  • 1. Mean (the arithmetic average)
  • 2. Median (the middle value of the sorted data)
  • 3. Mode (the most frequent value)

7
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
  • Data dispersion: measures of the spread of values around the central
    tendency
  • 1. Variance
  • 2. Standard deviation
  • (see p. 95 of the text)

8
5.2 ASSESSING DIFFERENCES IN DATA SETS (cont.)
  • Boxplot
  • A visualization of the descriptive statistical measures for central
    tendency and dispersion, popularly used in many statistical software
    tools
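The central-tendency and dispersion measures above, along with the quartiles a boxplot displays, can be computed with Python's standard library; the data set below is made up purely for illustration.

```python
import statistics

# A small made-up sample, used only to illustrate the measures.
data = [3, 5, 5, 8, 9, 11, 14, 14, 14, 21]

mean = statistics.mean(data)          # central tendency: arithmetic mean
median = statistics.median(data)      # central tendency: middle value
mode = statistics.mode(data)          # central tendency: most frequent value
variance = statistics.variance(data)  # dispersion: sample variance
stdev = statistics.stdev(data)        # dispersion: standard deviation

# The quartiles summarized by a boxplot (Q1, median, Q3).
q1, q2, q3 = statistics.quantiles(data, n=4)

print(mean, median, mode)   # 10.4 10.0 14
print(q1, q2, q3)           # 5.0 10.0 14.0
```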

9
5.3 BAYESIAN INFERENCE
  • Naïve Bayesian classification process (simple Bayesian classifier)
  • Classifies a new sample into the class with the highest posterior
    probability, computed from the training data
  • Based on Bayes' theorem
  • P(H|X): posterior probability of hypothesis H given data X
  • P(H): prior probability of H

10
5.3 BAYESIAN INFERENCE (cont.)
  • Given an additional data sample X whose class is unknown, the class of X
    can be predicted as the one with the highest conditional probability
    P(Ci|X)
  • Because P(X) is constant for all classes, only the product
    P(X|Ci) · P(Ci) needs to be maximized
  • P(Ci) = |Ci| / m, where m is the total number of training samples and
    |Ci| is the number of samples in class Ci

11
5.3 BAYESIAN INFERENCE - example
Table 5.1 Training data set for classification
using the naïve Bayesian classifier
12
5.3 BAYESIAN INFERENCE example (cont.)
  • Goal: predict the classification of the new sample X = {1, 2, 2},
    class = ?
  • Maximize the product P(X|Ci) · P(Ci) for i = 1, 2
  • Step 1: compute the prior probabilities P(Ci)

13
5.3 BAYESIAN INFERENCE example (cont.)
  • Step 2: compute the conditional probabilities P(xt|Ci) for every
    attribute value given in the new sample X = {1, 2, 2}, C = ?

14
5.3 BAYESIAN INFERENCE example (cont.)
  • Step 3: under the assumption of conditional independence of the
    attributes, compute the conditional probabilities P(X|Ci)

15
5.3 BAYESIAN INFERENCE example (cont.)
  • Finally, multiply these conditional probabilities by the corresponding
    prior probabilities
  • This yields values proportional to P(Ci|X); choose the class with the
    maximum value
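The transcript does not reproduce Table 5.1, so the sketch below reruns the same three-step procedure on a small invented training set (three categorical attributes, two classes); only the procedure, not the data, comes from the slides.

```python
# Hypothetical training set standing in for Table 5.1:
# each row is (x1, x2, x3, class).
train = [
    (1, 2, 1, 1), (0, 0, 1, 1), (0, 1, 2, 1), (2, 2, 2, 1),
    (2, 1, 2, 2), (1, 2, 2, 2), (1, 0, 1, 2),
]

def naive_bayes(sample, train, classes=(1, 2)):
    m = len(train)
    scores = {}
    for c in classes:
        rows = [r for r in train if r[-1] == c]
        prior = len(rows) / m                  # Step 1: P(Ci) = |Ci| / m
        likelihood = 1.0
        for t, x in enumerate(sample):
            count = sum(1 for r in rows if r[t] == x)
            likelihood *= count / len(rows)    # Step 2: P(xt|Ci)
        # Step 3: conditional independence -> multiply the P(xt|Ci),
        # then weight by the prior; the result is proportional to P(Ci|X).
        scores[c] = likelihood * prior
    return max(scores, key=scores.get), scores

label, scores = naive_bayes((1, 2, 2), train)
print(label)   # class 2 wins for this made-up data
```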

16
5.4 PREDICTIVE REGRESSION
  • The prediction of continuous values can be
    modeled by a statistical technique called
    regression.
  • Regression analysis is the process of determining how a variable Y is
    related to one or more other variables X1, X2, …, Xn.

17
  • Modeling this type of relationship is often called linear regression.
  • The relationship that fits a set of data is characterized by a
    prediction model called a regression equation. The most widely used form
    of the regression model is the general linear model, formally written as
  • Y = α + β1X1 + β2X2 + β3X3 + … + βnXn

18
Simple regression
  • Simple regression: Y = α + βX
  • The coefficients minimize the sum of squared errors
    SSE = Σ (yi − α − βxi)²
  • Least-squares estimates: β = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)²,
    α = ȳ − β x̄

19
Multiple regression
  • Multiple regression:
  • Y = α + β1X1 + β2X2 + β3X3 + … + βnXn
  • SSE = (Y − Xβ)^T (Y − Xβ)
  • Setting d(SSE)/dβ = 0
  • → β = (X^T X)^-1 (X^T Y)
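The closed-form solution above is easy to check numerically; the data below are invented so that Y is exactly 1 + 2·X1 + 3·X2, and the normal equations recover those coefficients.

```python
import numpy as np

# Invented inputs; Y is constructed as 1 + 2*X1 + 3*X2 (no noise).
X1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
X2 = np.array([1.0, 0.0, 1.0, 2.0, 1.0])
Y = 1 + 2 * X1 + 3 * X2

# Design matrix with a leading column of ones for the intercept alpha.
X = np.column_stack([np.ones_like(X1), X1, X2])

# d(SSE)/d(beta) = 0  =>  beta = (X^T X)^{-1} (X^T Y)
beta = np.linalg.inv(X.T @ X) @ (X.T @ Y)
print(beta)   # approximately [1. 2. 3.]
```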

20
correlation coefficient
21
correlation coefficient (cont.)
  • A correlation coefficient r = 0.85 indicates a good linear relationship
    between two variables. Additional interpretation is possible: because
    r² = 0.72, approximately 72% of the variation in the values of Y is
    accounted for by a linear relationship with X.
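This interpretation can be checked by hand; the paired values below are invented to lie close to a line, so r should be near 1 and r² near the fraction of Y's variance explained.

```python
import math

# Invented, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mx = sum(x) / n
my = sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
print(round(r, 3), round(r * r, 3))
```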

22
5.5 ANALYSIS OF VARIANCE
  • Often the problem of analyzing the quality of the
    estimated regression line and the influence of
    the independent variables on the final regression
    is handled through an analysis-of-variance
    approach.

23
  • The size of the residuals, over all m samples in a data set, is related
    to the size of the variance S², which can be estimated by
  • S² = Σ (yi − ŷi)² / (m − (n + 1))
  • The numerator is called the residual sum of squares, and the denominator
    is called the residual degrees of freedom.

24
  • These criteria are the basic decision steps in the ANOVA algorithm, in
    which we analyze the influence of the input variables on the final
    model.
  • First, we start with all inputs and compute S² for this model. Then, we
    omit inputs from the model one by one and recompute S².
  • F = S²new / S²old
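A minimal numeric sketch of this pruning criterion, on invented data where Y depends strongly on X1 and barely on X2: dropping X2 leaves S² almost unchanged (F near 1), while dropping X1 inflates it.

```python
import numpy as np

rng = np.random.default_rng(0)
m = 30
X1 = rng.normal(size=m)
X2 = rng.normal(size=m)
Y = 2.0 * X1 + 0.05 * X2 + rng.normal(scale=0.5, size=m)

def s2(inputs, Y):
    # S^2 = SSE / (m - (n + 1)) for a least-squares fit with n inputs.
    X = np.column_stack([np.ones(len(Y))] + inputs)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return (resid @ resid) / (len(Y) - X.shape[1])

s2_full = s2([X1, X2], Y)
F_drop_x2 = s2([X1], Y) / s2_full   # near 1: X2 adds little
F_drop_x1 = s2([X2], Y) / s2_full   # large: X1 matters
print(F_drop_x2, F_drop_x1)
```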

26
Multivariate analysis of variance
  • Multivariate analysis of variance is a generalization of the previously
    explained ANOVA analysis.
  • Yj = α + β1X1j + β2X2j + β3X3j + … + βnXnj + εj
  • j = 1, 2, …, m
  • The corresponding residuals for each dimension will be (Yj − Yj').

27
  • Classical multivariate analysis also includes the
    method of principal component analysis. This
    method has been explained in Chapter 3 when we
    were talking about data reduction and data
    transformation as preprocessing phases for data
    mining.

28
5.6 Logistic Regression
  • Logistic regression models the probability of some event occurring as a
    linear function of a set of predictor variables.
  • It tries to estimate the probability p that the dependent variable will
    have a given value.
  • The output variable of the model is defined as binary categorical:
  • p(y = 1) = p, p(y = 0) = 1 − p

29
Logistic Regression(cont.)
The logit form of the output prevents the predicted probability pj from
going out of the range [0, 1]:
logit(p) = ln(p / (1 − p))
Success probability: p = e^logit(p) / (1 + e^logit(p))
Failure probability: 1 − p = 1 / (1 + e^logit(p))
30
Logistic Regression(ex.)
  • logit(p) = 1.5 − 0.6x1 + 0.4x2 − 0.3x3
  • Input values: (x1, x2, x3) = (1, 0, 1)

What is the predicted probability p?
31
Logistic Regression(ans.)
  • logit(p) = 1.5 − 0.6·1 + 0.4·0 − 0.3·1
  • = 0.6
  • ln(p / (1 − p)) = 0.6
  • p = e^0.6 / (1 + e^0.6) = 0.65
  • y = 1: p = 0.65
  • y = 0: 1 − p = 0.35
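The arithmetic above can be reproduced in a few lines; the coefficients are the ones given in the example.

```python
import math

# logit(p) = 1.5 - 0.6*x1 + 0.4*x2 - 0.3*x3, as in the example.
def predict(x1, x2, x3):
    logit = 1.5 - 0.6 * x1 + 0.4 * x2 - 0.3 * x3
    # Invert ln(p / (1 - p)) = logit to recover the probability.
    return math.exp(logit) / (1 + math.exp(logit))

p = predict(1, 0, 1)
print(round(p, 2))   # 0.65
```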

32
5.7 Log-Linear Models
  • Log-linear modeling is a way of analyzing relationships between
    categorical variables.
  • The log-linear model approximates a discrete, multidimensional
    probability distribution.
  • All given variables are categorical.
  • The data set is defined without output variables.

33
Log-Linear Models(cont.)
  • The aim in log-linear modeling is to identify associations between
    categorical variables.
  • This is a problem of finding out which of the β parameters in the model
    are 0.
  • Correspondence analysis

34
Correspondence analysis
  • Correspondence analysis represents the set of categorical data for
    analysis within incidence matrices, also called contingency tables.
  • The result of an analysis of the contingency table answers the question:
  • "Is there a relationship between the analyzed attributes or not?"

35
Algorithm
  • Transform the given contingency table into a table of expected values.
  • Compare the two tables using the chi-square test as the criterion of
    association for the two categorical variables.

36
Algorithm(cont.)
  • Degrees of freedom: d.f. = (m − 1)(n − 1)
  • Threshold: T(α) = χ²(d.f., α)
  • If χ² > T(α), H0 is rejected
  • Otherwise, H0 is accepted

37
Log-Linear Models(ex.)
  • Are there any differences in the extent of
    support for abortion between the male and the
    female population?

38
Log-Linear Models(ans.)
  • This question may be translated into:
  • "What is the level of dependency between the two given attributes, sex
    and support?"
  • Step 1:

H0: sex and support are independent; H1: sex and support are dependent
E11 = 500 · 628 / 1100 ≈ 285.5
39
Log-Linear Models(ans.)
  • Step 2:
  • χ² = (309 − 285.5)²/285.5 + (191 − 214.5)²/214.5
    + (319 − 342.5)²/342.5 + (281 − 257.5)²/257.5
  • ≈ 8.28

40
Log-Linear Models(ans.)
  • d.f. = (2 − 1)(2 − 1) = 1
  • T(0.05) = χ²(0.05, 1) = 3.84
  • χ² ≈ 8.28 > 3.84 → H0 is rejected

41
5.8 Linear Discriminant Analysis
  • LDA is concerned with classification problems
    where the dependent variable is categorical and
    the independent variables are metric.
  • The objective of LDA is to construct a
    discriminant function that yields different
    scores when computed with data from different
    output classes.

42
Discriminant Function
z = w1x1 + w2x2 + … + wkxk
z: discriminant score, xi: independent variables, wi: weights
  • The discriminant function z is used to predict the class of a new,
    unclassified sample.

43
Cutting score
  • Cutting scores serve as the criteria against which each individual
    discriminant score is judged.
  • The choice of cutting score depends on the distribution of the samples
    across the classes.

44
Cutting score(cont.)
  • When the two classes of samples are of equal size and are distributed
    with uniform variance:
  • zcut-ab = (z̄a + z̄b) / 2
  • z̄a: mean discriminant score of class A
  • z̄b: mean discriminant score of class B
  • A new sample will be classified into one class or the other depending on
    whether its score z > zcut-ab or z < zcut-ab.

45
Cutting score(cont.)
  • A weighted average of the mean discriminant scores is used as the
    optimal cutting score when the sample sets for the classes are of
    unequal size:
  • zcut-ab = (na z̄a + nb z̄b) / (na + nb)
  • z̄a: mean discriminant score of class A
  • z̄b: mean discriminant score of class B
  • na: number of samples in class A
  • nb: number of samples in class B
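Both cutting-score formulas can be wrapped in a small helper; the class means and sizes below are invented numbers used only for illustration.

```python
def cutting_score(za, zb, na=None, nb=None):
    # Equal class sizes: simple average of the mean discriminant scores.
    if na is None or nb is None or na == nb:
        return (za + zb) / 2
    # Unequal class sizes: weighted average.
    return (na * za + nb * zb) / (na + nb)

def classify(z, zcut, za, zb):
    # Assign the sample to the class whose mean score lies on the same
    # side of the cutting score as z.
    if za >= zb:
        return "A" if z > zcut else "B"
    return "A" if z < zcut else "B"

zcut = cutting_score(za=4.0, zb=1.0, na=30, nb=10)   # (30*4 + 10*1)/40 = 3.25
print(zcut, classify(3.5, zcut, 4.0, 1.0))           # 3.25 A
```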

46
Multiple Discriminant Analysis
  • Multiple discriminant analysis is used in situations where a separate
    discriminant function is constructed for each class.
  • Decide in favor of the class whose discriminant score is the highest.

47
Q&A