Predictive Modeling: Value Prediction - PowerPoint PPT Presentation

Provided by: ahmed8
1
Predictive Modeling: Value Prediction
  • The main traditional technique used for value
    prediction is Linear Regression which attempts to
    fit a straight line through a plot of the data,
    such that the line is the best representation of
    the average of all observations at that point in
    the plot.

2
Disadvantages of Linear Regression
  • 1. This technique works well only if the data is
    linear.

[Figure: true regression line vs. predicted regression line when the data is non-linear]
3
  • 2. The outcome can be influenced by just a few
    outliers.

[Figure: true regression line vs. predicted regression line pulled off course by a few outliers]
4
Regression
  • The relationship between the mean of a random
    variable and the values of one or more
    independent variables on which it depends.
  • Example: we might want to predict the sales of a
    new product in terms of the amount of money spent
    advertising it on TV.

5
  • Example: we might want to predict family
    expenditures on entertainment in terms of family
    income.
  • Example: we might want to predict a college
    student's grade-point average based on the number
    of hours he/she spent studying.

6
  • Example: we can predict the average earnings of
    college graduates ten years after graduation.

7
Curve fitting
  • Non-linear regression solves these two problems
    of linear regression but is still not flexible
    enough to handle all possible shapes of the data
    plot.
  • We shall consider only the linear equation in two
    unknowns, y = a + bx, where b is the slope of the
    line and a is the y-intercept, because it is...
8
  • ...because it is useful and important: not only
    are many relationships actually of this form, but
    the linear equation also often provides a close
    approximation to relationships which would
    otherwise be difficult to describe in
    mathematical form.
  • The values of a and b are estimated from the
    data.

9
Least Square Method
  • Most experimental data will not lie exactly on a
    straight line (even when it should). However,
    there is a mathematical method for determining
    the equation of the best-fit straight line. This
    method is called the least-squares method, or
    linear regression.
  • Using the least-squares method, y is a linear
    function of x,
  •   i.e. y = f(x),
  • or more specifically y = bx + a.

10
  • The easiest way to implement this method by hand
    is to set up a table with columns for the data
    and columns for the products xi·yi and xi²
    (where i = 1, 2, ..., N, and N is the number of
    data points). The columns can then be totaled to
    give the summations used in the least-squares
    formulas.

11
  • Example: this sample of data was obtained in a
    study of the relationship between the number of
    years that applicants for certain foreign-service
    jobs have studied English in high school or
    college and the grades which they received in a
    proficiency test in that language.

12
  • No. of years (x)   Grade in test (y)    x²     x·y
  •        3                  57             9     171
  •        4                  78            16     312
  •        4                  72            16     288
  •        2                  58             4     116
  •        5                  89            25     445
  •        3                  63             9     189
  •        4                  73            16     292
  •        5                  84            25     420
  •        3                  75             9     225
  •        2                  48             4      96
  • Totals:  35              697           133   2,554
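The column totals above can be checked with a few lines of code; this is just the hand-table method from the previous slide, written out:

```python
# The ten data pairs from the table.
x = [3, 4, 4, 2, 5, 3, 4, 5, 3, 2]            # no. of years
y = [57, 78, 72, 58, 89, 63, 73, 84, 75, 48]  # grade in test

# Column totals used by the least-squares formulas.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

print(sum_x, sum_y, sum_x2, sum_xy)  # 35 697 133 2554
```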

13
  • The goal is to find one line which fits the data.
  • Normal equations (n = no. of pairs):
  • Σy = na + bΣx
  • Σxy = aΣx + bΣx²
  • 697 = 10a + 35b
  • 2,554 = 35a + 133b

14
  • Two methods to find a and b:
  • (1)
  • 1st × 7:  4,879 = 70a + 245b
  • 2nd × 2:  5,108 = 70a + 266b
  • (2nd − 1st):  229 = 21b
  • ⇒ b = 10.90, ⇒ a = 31.55
  • ⇒ y = 31.55 + 10.90x
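The elimination can be checked directly. Note that carrying the unrounded b = 229/21 gives a ≈ 31.53; the slide's 31.55 comes from rounding b to 10.90 before computing a.

```python
# Normal equations:  697 = 10a + 35b   and   2554 = 35a + 133b.
# Eliminate a: multiply the 1st by 7 and the 2nd by 2, then subtract.
b = (2 * 2554 - 7 * 697) / (2 * 133 - 7 * 35)  # 229 / 21
a = (697 - 35 * b) / 10                        # back-substitute into the 1st

print(round(b, 2), round(a, 2))  # 10.9 31.53
```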

15
  • (2)
  • 1st: 697 = 10a + 35b
  • ⇒ a = (697 − 35b)/10
  • 2nd: 2,554 = 35(697 − 35b)/10 + 133b
  • ⇒ b = 10.90, ⇒ a = 31.55

16
  • (3) The slope of the line is given by b (Δy over
    Δx, or the rise over the run) and the intercept
    on the y-axis is given by a. For n data pairs,
    the equations used to find the intercept a and
    the slope b are:
  • 1) a = ( Σy·Σx² − Σx·Σxy ) / ( n·Σx² − (Σx)² )
  • 2) b = ( n·Σxy − Σx·Σy ) / ( n·Σx² − (Σx)² )

17
Example: eleven data points are recorded during a
test; each pair consists of an X and a Y value. The
following table can be constructed, with each
column total shown on the last row.
[Table not transcribed; the totals used below are
n = 11, Σx = 760, Σy = 125, Σx² = 78,854,
Σxy = 12,940.]
18
  • Equation 1 gives
  • a = (125·78854 − 760·12940) / (11·78854 − 760·760)
      = 0.0771 lbs
  • Equation 2 gives
  • b = (11·12940 − 760·125) / (11·78854 − 760·760)
      = 0.1634 lbs/me
  • Therefore the best straight-line fit is given by
    the equation
  • y = 0.1634x + 0.0771
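The raw data points are not transcribed, but the column totals implied by the arithmetic above are enough to check both formulas:

```python
# Column totals read off the slide's arithmetic (raw data not shown).
n, sum_x, sum_y = 11, 760, 125
sum_x2, sum_xy = 78854, 12940

den = n * sum_x2 - sum_x ** 2           # common denominator of both formulas
a = (sum_y * sum_x2 - sum_x * sum_xy) / den  # intercept
b = (n * sum_xy - sum_x * sum_y) / den       # slope

print(round(a, 4), round(b, 4))  # 0.0771 0.1634
```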

19
The linear regression plot would look like
20
[Image-only slide: the linear regression plot]
21
[Image-only slide]
22
  • The scatter diagram is obtained by plotting the
    points (70,155), (63,150), ..., (68,152). Using a
    ruler, you can find a straight line which
    apparently suits the relation in question.
    Choosing any two points on the line just drawn,
    you can compute the slope of a fitting line:
  • Y − Y1 = (X − X1)(Y2 − Y1)/(X2 − X1)
  • slope = (170 − 156)/(68 − 66) = 7
  • Y − 156 = 7(X − 66)
  • Y = 7X − 306
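The point-slope construction above is mechanical enough to script; using the slide's two chosen points:

```python
# Two points read off the hand-drawn line.
x1, y1 = 66, 156
x2, y2 = 68, 170

slope = (y2 - y1) / (x2 - x1)  # rise over run
intercept = y1 - slope * x1    # solve Y - y1 = slope*(X - x1) at X = 0

print(slope, intercept)  # 7.0 -306.0
```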

23
  • If X = 63 then Y = 7·63 − 306 = 135, provided
    that the line expresses the relation between
    height and weight among females in the right way.
    We chose the best-fitting line in the diagram, we
    hope.
  • As we shall see below, this method is certainly
    not exact. Instead of the point (66,156) we could
    have chosen (65,139) and got the result
    Y = 10.33X − 316.

24
Example
  • Find the least-squares line for the following
    data: (1,1), (3,2), (4,4), (6,4), (8,5), (9,7),
    (11,8), (14,9)
  • The equation of the line is Y = a0 + a1X. The
    normal equations are
  • ΣY = a0·N + a1·ΣX
  • ΣXY = a0·ΣX + a1·ΣX²
  • The work involved in computing the sums can be
    arranged as in the following table. Although the
    last column is not needed for this part of the
    problem, it has been added to the table for use
    with X as the dependent variable, which gives
    quite a different result. The latter is called
    the regression of X on Y.

25
  •   X     Y     X²     XY     Y²
  •   1     1      1      1      1
  •   3     2      9      6      4
  •   4     4     16     16     16
  •   6     4     36     24     16
  •   8     5     64     40     25
  •   9     7     81     63     49
  •  11     8    121     88     64
  •  14     9    196    126     81
  •  56    40    524    364    256
26
  • Since there are 8 pairs of values of X and Y,
    N = 8 and the normal equations become
  • 8a0 + 56a1 = 40
  • 56a0 + 524a1 = 364
  • Solved simultaneously,
  • a0 = 6/11 or 0.545,
  • a1 = 7/11 or 0.636

27
  • Another method:
  • a0 = [(ΣY)(ΣX²) − (ΣX)(ΣXY)] / [NΣX² − (ΣX)²]
       = [(40)(524) − (56)(364)] / [(8)(524) − (56)²]
       = 6/11 or 0.545
  • a1 = [NΣXY − (ΣX)(ΣY)] / [NΣX² − (ΣX)²]
       = [(8)(364) − (56)(40)] / [(8)(524) − (56)²]
       = 7/11 or 0.636
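Both routes give the same exact fractions, which can be confirmed with exact rational arithmetic:

```python
from fractions import Fraction

# The eight data pairs from the example.
pts = [(1, 1), (3, 2), (4, 4), (6, 4), (8, 5), (9, 7), (11, 8), (14, 9)]
n = len(pts)
sx = sum(x for x, _ in pts)          # 56
sy = sum(y for _, y in pts)          # 40
sx2 = sum(x * x for x, _ in pts)     # 524
sxy = sum(x * y for x, y in pts)     # 364

den = n * sx2 - sx ** 2
a0 = Fraction(sy * sx2 - sx * sxy, den)  # intercept
a1 = Fraction(n * sxy - sx * sy, den)    # slope

print(a0, a1)  # 6/11 7/11
```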

28
Perhaps we should try to estimate the regression
line from the example with heights and weights a
little more exactly.
29
  • The required least-squares line has this
    equation:
  • Y − 154.2 = 3.22(X − 66.8), or
  • Y = 3.22X − 60.9
  • Sometimes, when the raw figures are large, it is
    an advantage to subtract a large round figure
    from one, or perhaps both, of the variables
    before computing. You must then remember to add
    the same figure back to the averages you compute
    to get the right result.

30
Assignment (Deadline 14th Sep 06)
  • The following are data on the IQs of 25
    students, the number of hours they studied for a
    certain achievement test, and their scores on
    the test:

31
[Image-only slide: the IQ / hours-studied / score data table]
32
  • 1. Use the computer to find the least-squares
    line which will enable us to predict a student's
    score on the test in terms of his/her IQ, and
    draw the line.
  • 2. Use the computer to find the least-squares
    line which will enable us to predict a student's
    score on the test in terms of the number of
    hours he/she studied for the test, and draw the
    line.
  • 3. Use the computer to predict how many hours a
    student will study for the test given his/her
    IQ, and draw the line.
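One way the three fits might be set up: the 25-student table is not transcribed, so the short lists below are hypothetical placeholders to be replaced with the real columns.

```python
# Hypothetical stand-ins for the assignment's 25-student columns.
iq    = [100, 105, 110, 115, 120]
hours = [4, 6, 5, 8, 9]
score = [60, 68, 66, 78, 84]

def least_squares(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sx2 = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    a = (sy - b * sx) / n
    return a, b

print(least_squares(iq, score))     # 1. score in terms of IQ
print(least_squares(hours, score))  # 2. score in terms of hours studied
print(least_squares(iq, hours))     # 3. hours in terms of IQ
```

On exactly linear data the helper recovers the line exactly, e.g. `least_squares([1, 2, 3], [3, 5, 7])` returns `(1.0, 2.0)`.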

33
  • http://www.math.csusb.edu/faculty/stanton/m262/regress/regress.html