Transcript and Presenter's Notes

Title: Logistic Regression


1
Logistic Regression
  • Rong Jin

2
Logistic Regression Model
  • In Gaussian generative model
  • Generalize the ratio to a linear model
  • Parameters: w and c

4
Logistic Regression Model
  • The log-ratio of the positive class to the negative
    class
  • Results (written out below)
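In standard notation (a sketch, with weight vector w and offset c as the parameters), the linear log-ratio model is

  \log\frac{p(y=+1 \mid x)}{p(y=-1 \mid x)} = w \cdot x + c,
  \qquad
  p(y=+1 \mid x) = \frac{1}{1 + \exp\big(-(w \cdot x + c)\big)}.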

6
Logistic Regression Model
  • Assume the inputs and outputs are related through a
    log-linear function
  • Estimate weights: MLE approach (sketched below)
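A sketch of the MLE objective referred to here, assuming training data D = {(x_i, y_i)} with labels y_i in {+1, -1}:

  l(D) = \sum_i \log p(y_i \mid x_i)
       = \sum_i \log \sigma\big(y_i (w \cdot x_i + c)\big)
       = -\sum_i \log\Big(1 + \exp\big(-y_i (w \cdot x_i + c)\big)\Big),

where \sigma(z) = 1/(1 + e^{-z}); the weights are chosen to maximize l(D).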

7
Example 1: Heart Disease
  Age group id: 1 = 25-29, 2 = 30-34, 3 = 35-39, 4 = 40-44,
  5 = 45-49, 6 = 50-54, 7 = 55-59, 8 = 60-64
  • Input feature x: age group id
  • Output y: having heart disease or not
  • +1: having heart disease
  • -1: no heart disease

8
Example 1: Heart Disease
  • Logistic regression model
  • Learning w and c: MLE approach
  • Numerical optimization: w = 0.58, c = -3.34

9
Example 1: Heart Disease
  • w = 0.58
  • An older person is more likely to have heart
    disease
  • c = -3.34
  • x·w + c < 0 ⇒ p(+|x) < p(-|x)
  • x·w + c > 0 ⇒ p(+|x) > p(-|x)
  • x·w + c = 0 ⇒ decision boundary
  • x = 5.78 ⇒ about 53 years old (see the check below)
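As a quick check, the boundary follows from setting the linear score to zero; with the rounded estimates above,

  x w + c = 0 \;\Rightarrow\; x = -c/w = 3.34 / 0.58 \approx 5.76,

which agrees with the reported boundary of about 5.78 up to the rounding of w and c.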

10
Naïve Bayes Solution
  • Inaccurate fitting
  • Non-Gaussian distribution
  • i = 5.59 (decision boundary)
  • Close to the estimate from logistic regression
  • Even though naïve Bayes does not fit the input
    patterns well, it still works fine for the
    decision boundary

11
Problems with Using Histogram Data?
12
Uneven Sampling for Different Ages
13
Solution
w = 0.63, c = -3.56 ⇒ i = 5.65
14
Solution
w = 0.63, c = -3.56 ⇒ i = 5.65 < 5.78
16
Example 2: Text Classification
  • Learn to classify text into predefined categories
  • Input x: a document
  • Represented by a vector of words
  • Example: (president, 10), (bush, 2), (election,
    5), ...
  • Output y: whether the document is about politics or
    not
  • +1 for a political document, -1 for a non-political
    document
  • Training data

17
Example 2: Text Classification
  • Logistic regression model
  • Every term ti is assigned a weight wi
  • Learning parameters: MLE approach
  • Need numerical solutions (see the sketch below)
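A minimal sketch of this kind of fit using scikit-learn; the documents, labels, and settings below are hypothetical illustrations, not the data or configuration behind these slides:

# Minimal sketch: logistic regression on bag-of-words counts.
# The documents and labels are hypothetical, not the slides' data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "president bush election campaign vote",    # political
    "stock market earnings quarterly profit",   # not political
    "senate election debate president policy",  # political
    "championship game score team coach",       # not political
]
y = [1, -1, 1, -1]

# Represent every document by its vector of word counts.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# Fit p(y|x); scikit-learn applies L2 regularization by default,
# in the spirit of the regularization slides later on.
clf = LogisticRegression()
clf.fit(X, y)

# Every term t_i gets a weight w_i; positive weights are evidence
# for the political (+1) class.
for term, w in zip(vectorizer.get_feature_names_out(), clf.coef_[0]):
    print(f"{term}: {w:+.3f}")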

19
Example 2: Text Classification
  • Weight wi
  • wi > 0: term ti is positive evidence
  • wi < 0: term ti is negative evidence
  • wi = 0: term ti is irrelevant to the category of the
    document
  • The larger wi is, the more important term ti is in
    determining whether the document is interesting
  • Threshold c

21
Example 2: Text Classification
  • Dataset: Reuters-21578
  • Classification accuracy
  • Naïve Bayes: 77%
  • Logistic regression: 88%

22
Why Does Logistic Regression Work Better for Text
Classification?
  • Optimal linear decision boundary
  • Generative model
  • Weight: log p(w|+) - log p(w|-)
  • Sub-optimal weights
  • Independence assumption
  • Naïve Bayes assumes that each word is generated
    independently
  • Logistic regression is able to take the correlations
    among words into account

23
Discriminative Model
  • The logistic regression model is a discriminative
    model
  • Models the conditional probability p(y|x), i.e.,
    the decision boundary
  • Gaussian generative model
  • Models p(x|y), i.e., the input patterns of different
    classes

24
Comparison
  • Generative Model
  • Models P(x|y)
  • Models the input patterns
  • Usually converges fast
  • Cheap computation
  • Robust to noisy data
  • But usually performs worse
  • Discriminative Model
  • Models P(y|x) directly
  • Models the decision boundary
  • Usually good performance
  • But slow convergence
  • Expensive computation
  • Sensitive to noisy data

26
A Few Words about Optimization
  • Convex objective function
  • The solution could be non-unique (the objective is
    convex, but not necessarily strictly convex)

27
Problems with Logistic Regression?
How about words that appear in only one class?
28
Overfitting Problem with Logistic Regression
  • Consider a word t that appears in only one document
    d, and d is a positive document. Let w be its
    associated weight
  • Consider the derivative of l(Dtrain) with respect
    to w
  • w will be driven to infinity! (see the sketch below)
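A sketch of why the weight diverges, assuming labels y_i in {+1, -1} and writing x_{d,t} for the count of term t in document d:

  \frac{\partial l(D_{train})}{\partial w}
    = \sum_i y_i \, x_{i,t} \Big(1 - \sigma\big(y_i (w \cdot x_i + c)\big)\Big)
    = x_{d,t} \Big(1 - \sigma(w \cdot x_d + c)\Big) > 0,

because every other document has x_{i,t} = 0. The derivative stays positive no matter how large w becomes, so maximizing the likelihood keeps pushing w upward without bound.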

30
Example of Overfitting for LogRes
(Plot: the classification accuracy on the test data
decreases as the number of training iterations grows.)
31
Solution: Regularization
  • Regularized log-likelihood (written out below)
  • The term s‖w‖² is called the regularizer
  • Favors small weights
  • Prevents weights from becoming too large
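Written out (a sketch, with a regularization constant s > 0 as on the following slides), the regularized log-likelihood is

  l_{reg}(D_{train}) = \sum_i \log \sigma\big(y_i (w \cdot x_i + c)\big) - s \, \|w\|^2.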

32
The Rare Word Problem
  • Consider a word t that appears in only one document
    d, and d is a positive document. Let w be its
    associated weight

33
The Rare Word Problem
  • Consider the derivative of l(Dtrain) with respect
    to w
  • When s is small, the derivative is still positive
  • But it becomes negative when w is large

34
The Rare Word Problem
  • Consider the derivative of l(Dtrain) with respect
    to w (written out below)
  • When w is small, the derivative is still positive
  • But it becomes negative when w is large
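In the single-rare-word setting above, the derivative of the regularized objective with respect to that weight has the form (same notation as before)

  \frac{\partial}{\partial w}\Big(l(D_{train}) - s\,w^2\Big)
    = x_{d,t}\Big(1 - \sigma(w \cdot x_d + c)\Big) - 2\,s\,w.

The first term is bounded by x_{d,t}, so the derivative is positive while w is small but turns negative once w grows large; the weight therefore settles at a finite value instead of diverging.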

35
Regularized Logistic Regression
36
Interpretation of the Regularizer
  • Many interpretations of the regularizer
  • Bayesian statistics: model prior
  • Statistical learning: minimize the generalization
    error
  • Robust optimization: min-max solution

37
Regularizer: Robust Optimization
  • Assume each data point is unknown but bounded within
    a sphere of radius s centered at xi
  • Find the classifier w that is able to classify
    the unknown-but-bounded data points with high
    classification confidence (sketched below)
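A hedged sketch of the min-max problem this refers to, with each input perturbed by an unknown \delta_i bounded by \|\delta_i\| \le s:

  \max_{w,\,c} \; \sum_i \; \min_{\|\delta_i\| \le s} \log \sigma\Big(y_i\big(w \cdot (x_i + \delta_i) + c\big)\Big),
  \qquad
  \min_{\|\delta_i\| \le s} y_i\big(w \cdot (x_i + \delta_i) + c\big)
    = y_i (w \cdot x_i + c) - s\,\|w\|.

The worst-case perturbation subtracts s\|w\| from every margin, so large weights are penalized in the same spirit as the regularizer above.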

38
Sparse Solution
  • What does the solution of regularized logistic
    regression look like?
  • A sparse solution
  • Most weights are small and close to zero

40
Why Do We Need a Sparse Solution?
  • Two types of solutions
  • 1. Many non-zero weights, but many of them are small
  • 2. Only a small number of non-zero weights, and many
    of them are large
  • Occam's Razor: the simpler, the better
  • A simpler model that fits the data is unlikely to be
    a coincidence
  • A complicated model that fits the data might be a
    coincidence
  • A smaller number of non-zero weights
  • ⇒ less evidence to consider
  • ⇒ a simpler model
  • ⇒ the second type of solution (case 2) is preferred

41
Occam's Razor
42
Occam's Razor: Power 1
43
Occam's Razor: Power 3
44
Occam's Razor: Power 10
45
Finding Optimal Solutions
  • Concave objective function
  • No local maximum
  • Many standard optimization algorithms work

46
Gradient Ascent
  • Maximize the log-likelihood by iteratively
    adjusting the parameters in small increments
  • In each iteration, we adjust w in the direction
    that increases the log-likelihood (the direction of
    the gradient); see the sketch below
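A minimal NumPy sketch of this procedure for the regularized objective; the step size, regularization constant, and toy data are illustrative assumptions, not values from the slides:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(X, y, s=0.1, eta=0.1, n_iters=1000):
    """Maximize sum_i log sigma(y_i (w.x_i + c)) - s * ||w||^2."""
    n, d = X.shape
    w, c = np.zeros(d), 0.0
    for _ in range(n_iters):
        margins = y * (X @ w + c)
        coef = y * (1.0 - sigmoid(margins))   # y_i * d/dz log sigma(z) at each margin
        grad_w = X.T @ coef - 2.0 * s * w     # gradient of the regularized log-likelihood
        grad_c = coef.sum()
        # Take a small step in the direction of the gradient.
        w += eta * grad_w
        c += eta * grad_c
    return w, c

# Hypothetical toy data: two features, labels in {+1, -1}.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, c = gradient_ascent(X, y)
print("w =", w, "c =", c)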

47
Graphical Illustration
No regularization case
49
When Should We Stop?
  • Log-likelihood will monotonically increase during
    the gradient ascent iterations
  • When should we stop?

51
When Should We Stop?
  • The gradient ascent learning method converges
    when there is no incentive to move the parameters
    in any particular direction, i.e., when the gradient
    of the objective is (close to) zero