Getting Started with Regression - PowerPoint PPT Presentation

1
Getting Started with Regression
  • Presented By Larry Zirbel
  • Software Techniques, Inc.
  • Tim Wilmath, MAI
  • Hillsborough County Property Appraisers Office
  • Prepared For 69th Annual IAAO Conference
    Nashville, TN September 17, 2003

2
History of Regression
Francis Galton created Regression Analysis in 1885
when he was attempting to predict a person's
height based on the height of his or her parents.
3
History of Regression
Galton found that children born to tall parents
tended to be shorter than their parents - and
children born to short parents tended to be taller
than their parents. Both groups of children
regressed toward the mean height of all
children.
4
Uses of Regression
Predicting the Weather
5
Uses of Regression
Predicting Election Results
6
Uses of Regression
Predicting Sales Prices
7
What is Regression?
When Regression Analysis is used to predict sales
prices or establish assessments it becomes an
Automated Sales Comparison Approach
8
Steps in Regression
1. Data Exploration and cleanup
2. Specifying the model
3. Calibrating the model
4. Interpreting the results
9
Data Exploration and Cleanup
Is there a pattern suggesting a relationship
between variables?
Note the outliers. These will adversely affect
our final values if we don't deal with them now.
Because of the potential for extreme values to
influence the mean, modelers often remove or
trim extreme values.
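One possible way to trim extreme values is sketched below. The sales data, the use of price per square foot as the screening measure, and the 1.5 x interquartile-range fence are all assumptions chosen for illustration; the presentation does not say which trimming rule was used.

```python
import statistics

# Hypothetical sales: (sale price, building square feet). Not from the presentation.
sales = [
    (120_000, 1_500), (135_000, 1_650), (98_000, 1_200),
    (150_000, 1_900), (110_000, 1_400), (125_000, 1_550),
    (900_000, 1_600),  # price far out of line with size
    (128_000, 1_600),
]

def trim_outliers(records, k=1.5):
    """Keep records whose price per square foot lies within k * IQR of the quartiles."""
    ppsf = [price / sqft for price, sqft in records]
    q1, _, q3 = statistics.quantiles(ppsf, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [rec for rec, value in zip(records, ppsf) if low <= value <= high]

cleaned = trim_outliers(sales)
print(len(sales), "sales before trimming,", len(cleaned), "after")
```

Whether a flagged sale is actually removed is still a judgment call for the modeler; the fence only points out candidates.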
10
Model Specification
Specifying the model means choosing the
appropriate equation and the variables that
will be used.
Models can be
  • Additive - Most common for residential
    properties
  • Multiplicative - Often used for land valuation
  • Hybrid - Most advanced

We are going to use an Additive Model in this
presentation
11
Regression Components
  • Dependent Variable: Sales Price
  • Independent Variables: Size, Age, Location,
    Condition, Lot size, Construction, Quality,
    Amenities

12
Simple Regression
Simple Regression includes one Dependent Variable
(sales price) and only one Independent Variable
- such as Square Footage.
Using this model, a 1,000 sf home would be valued
at $75,000.
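As a rough illustration of how a simple regression line is calibrated, the sketch below fits price against square footage by ordinary least squares. The sales figures are made up so that the fitted line works out to 75 per square foot with no constant, which reproduces the 75,000 value for a 1,000 sf home; the presentation does not give the underlying data.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical sales, constructed to lie exactly on a 75-per-sf line.
sqft = [800, 1_000, 1_200, 1_500, 2_000]
price = [60_000, 75_000, 90_000, 112_500, 150_000]

intercept, slope = fit_line(sqft, price)
predicted = intercept + slope * 1_000
print(f"slope per sf: {slope:.2f}, value of a 1,000 sf home: {predicted:,.0f}")
```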
13
Simple Regression
Simple Regression using only size as the
independent variable will predict sales prices;
however, it will treat all homes of the same
size equally.
1,000 square feet - 75,000?
1,000 square feet - 75,000
14
Multiple Regression
We know square footage is an important
variable but what other variables should we
include and how do we decide?
Roof Type
Swimming Pool
Exterior Wall Type
Heated Area
Effective Age
Quality
Heat/Ac Type
Lot Size
Actual Age
Screen Porch
Garage
Location
View
15
Correlation Analysis
Pearson's Correlation tells you the degree of
relationship between variables.
Notice the high correlation between sales price
and size.
Very little correlation between sales price and
dock.
Correlation Analysis also helps identify
Collinearity, which is a correlation between two
independent variables. For example, the living
area of a home is highly correlated to the number
of bedrooms. It would only be necessary to have
one of these variables in the model.
16
Regression Equations
Y = mX + b
Y = b0 + b1 X1 + b2 X2 + . . . + bK XK
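To show what calibrating the second equation involves, here is a minimal pure-Python sketch that solves the least-squares normal equations for b0 through bK. Statistical packages use more robust numerical methods; the function names `solve` and `fit_ols` and the sample data are assumptions for illustration, with prices generated from known coefficients so the result can be checked.

```python
def solve(a, b):
    """Solve the linear system a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_ols(rows, y):
    """Return [b0, b1, ..., bK] minimizing squared error, via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the constant b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical parcels: (building sf, land sf).
rows = [(1_200, 6_000), (1_500, 8_000), (1_800, 7_000),
        (2_000, 10_000), (1_000, 5_000), (1_600, 9_000)]
# Prices generated from known coefficients so the fit can be verified.
y = [10_000 + 70 * bld + 0.4 * land for bld, land in rows]

b0, b1, b2 = fit_ols(rows, y)
print(f"b0={b0:.1f}  b1={b1:.3f}  b2={b2:.3f}")
```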
17
Running Regression
Statistical Software makes using Regression much
easier, performing the necessary calculations
quickly and accurately.
Let's Run This!
18
Regression Results
Model 1
The Output tells us how well our model is working.
The closer the Adj. R-Square is to 1, the
better.
And - it gives us the coefficients (or
adjustments)
6,838 + (Bldsize x 75.07) = Property Value
The adjusted R2 statistic measures the amount of
total variation explained by the Regression
Model. It ranges from 0.00 to 1.00 with 1.00
being the desired value. A high number, say
0.910, means that approximately 91% of the
variation in value can be explained by the model.
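The slide does not show how the statistic is computed. Under the standard textbook definitions, R2 is one minus the ratio of unexplained to total variation, and the adjusted version penalizes each added variable. The sketch below uses those definitions with hypothetical sale prices and predictions.

```python
def r_squared(actual, predicted):
    """Share of total variation in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_sales, n_variables):
    """Adjust R2 for the number of independent variables in the model."""
    return 1 - (1 - r2) * (n_sales - 1) / (n_sales - n_variables - 1)

# Hypothetical sale prices and model predictions.
actual = [100_000, 120_000, 140_000, 160_000, 180_000]
predicted = [102_000, 118_000, 141_000, 157_000, 182_000]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_sales=len(actual), n_variables=1)
print(f"R2 = {r2:.4f}, adjusted R2 = {adj_r2:.4f}")
```

Because of the penalty term, adding a variable raises the adjusted figure only if the variable explains enough extra variation to justify its inclusion.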
19
Regression Results
The output includes the coefficient and the
Constant
The Constant represents the portion of value that
is not explained by the variables in the model.
20
Running Regression
Let's add another variable to the model - say,
Land Size.
Let's run this model and see if results improve.
21
Regression Results
Model 2
Our Adj. R2 went up from .731 to .801!
We also have new coefficients (or adjustments)
6,119 + (Bldsize x 72.66) + (Landsf x 0.382) =
Property Value
22
Running Regression
Let's add Age to the model.
If Age is significant to value, the model should
improve. Let's run it.
23
Regression Results
Model 3
Our Adj. R2 went up from .801 to .832!
Notice the age coefficient is negative
22,855 + (Bldsize x 67.28) + (Landsf x 0.44) -
(Age x 630.76) = Property Value
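Applying the Model 3 coefficients to a single property is simple arithmetic. The sketch below uses the coefficients from this slide, reading the Age adjustment in parentheses as negative; the example property (1,500 sf building, 8,000 sf lot, 20 years old) is hypothetical.

```python
# Coefficients from Model 3; the Age coefficient is negative.
MODEL_3 = {"constant": 22_855.0, "bldsize": 67.28, "landsf": 0.44, "age": -630.76}

def predict_value(bldsize, landsf, age, coef=MODEL_3):
    """Additive model: constant plus each characteristic times its coefficient."""
    return (coef["constant"]
            + coef["bldsize"] * bldsize
            + coef["landsf"] * landsf
            + coef["age"] * age)

# Hypothetical property, not taken from the presentation.
value = predict_value(bldsize=1_500, landsf=8_000, age=20)
print(f"Predicted value: {value:,.0f}")
```

For this property the pieces are 22,855 + 100,920 + 3,520 - 12,615.20, or about 114,680.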
24
Running Regression
Let's add Building Quality to the model.
We may have a problem. Let's run it and see.
25
Regression Results
Model 4
Our Adj. R2 went up from .832 to .854 after
adding quality, but
Notice the constant is now negative - that's not
good!
What do we do with this quality adjustment?
26
Regression Results
This doesn't make sense because the codes 1, 2, 3,
etc. were not meant to be a rank.
Quality: 1 - Fair, 2 - Average, 3 - Good, 4 -
Excellent, 5 - Superior
Resulting Adjustment
1 x 26,110 = 26,110
2 x 26,110 = 52,220
3 x 26,110 = 78,330
4 x 26,110 = 104,440
5 x 26,110 = 130,550
27
A Note about Data Types
There are 3 primary types of property
characteristics
  • Continuous - Based on a size or measurement.
    Examples: Square Footage or Lot Size
  • Discrete - Specific pre-defined value.
    Examples: Roof Material, Building Quality
  • Binary - Either the item is present or not.
    Examples: Corner Location, Lakefront Location

28
Transformations
To solve the problem we need to convert the
discrete variable Quality into individual
binary variables, which allows Regression to
distinguish each type.
Quality BECOMES
Fair - Yes/No
Average - Yes/No
Good - Yes/No
Excellent - Yes/No
Superior - Yes/No
29
Running Regression
Now that we have transformed the variable
Quality, we can put it back in the model.
Notice we left Average out.
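The conversion of the Quality code into binary variables, with Average held out as the base, might look like the sketch below. The quality codes come from the presentation; the variable names such as `Q_GOOD` are hypothetical.

```python
# Quality codes as listed in the presentation.
QUALITY_LABELS = {1: "Fair", 2: "Average", 3: "Good", 4: "Excellent", 5: "Superior"}
BASE_QUALITY = "Average"  # left out of the model; other adjustments are relative to it

def quality_to_binaries(code):
    """Turn a discrete quality code into 0/1 variables, omitting the base category."""
    label = QUALITY_LABELS[code]
    return {
        f"Q_{name.upper()}": int(name == label)
        for name in QUALITY_LABELS.values()
        if name != BASE_QUALITY
    }

print(quality_to_binaries(3))  # a Good quality home
print(quality_to_binaries(2))  # an Average home: every binary is 0
```

Leaving one category out avoids perfect collinearity with the constant, which is why each remaining coefficient reads as an adjustment relative to an Average home.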
30
Regression Results
Our Adj. R2 went up from .832 to .869.
Model 5
These Quality adjustments are all relative
to Average
31
Running Regression
Let's transform Neighborhood into a binary and
add it to the model.
Notice we left out the Base Neighborhood (the
most typical).
32
Regression Results
Model 6
Our Adj. R2 went up from .869 to .874.
These Neighborhood adjustments are all relative
to our Base Neighborhood
33
Running Regression
Multiplicative Transformations combine two
variables into one: Square Footage x Quality =
SQFT1. This reflects the fact that quality may
contribute greater value in larger homes and less
value in smaller homes. In other words, without
combining these variables, all Good Quality homes
get the same adjustment regardless of their size.
Let's add this new combined variable to the model.
Since we combined SF and Quality, we remove them
as stand-alone variables.
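One reading of this transformation is that square footage is multiplied by each quality binary, giving one size variable per quality level. The sketch below follows that reading; the variable names (`SQFT_GOOD` and so on) and the choice to create a variable for every quality level, including Average, are assumptions, since the slide names only SQFT1.

```python
QUALITY_LABELS = {1: "Fair", 2: "Average", 3: "Good", 4: "Excellent", 5: "Superior"}

def sqft_by_quality(sqft, quality_code):
    """Square footage times each quality binary: only the home's own level is non-zero."""
    return {
        f"SQFT_{label.upper()}": sqft if code == quality_code else 0
        for code, label in QUALITY_LABELS.items()
    }

# Hypothetical homes: same quality, different sizes.
small_good = sqft_by_quality(1_200, 3)
large_good = sqft_by_quality(2_400, 3)
print(small_good)
print(large_good)
```

With these variables each coefficient is a rate per square foot for that quality level, so a larger Good quality home receives a larger total adjustment than a smaller one.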
34
Regression Results
Our Adj. R2 went up from .874 to .879.
Model 7
Notice the adjustments went from fixed
dollar amounts to per square foot
35
Advanced Transformations
Exponential transformations - Raise a variable to
a power: Land Size ^ .75 = LAND75. This reflects
the principle of diminishing returns. The unit
price of land tends to decrease as size increases.
Without this transformation every square foot of
land would get the same adjustment, regardless of
size. Raising land size to the power of .75
produces a curve that flattens as size increases.
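The effect of the .75 exponent can be checked numerically. In the sketch below the lot sizes are hypothetical; the point is that doubling the land area raises LAND75 by a factor of about 1.68 rather than 2, so the contribution per square foot falls as lots get larger.

```python
def land75(land_sf):
    """Transformed land variable: land size raised to the power 0.75."""
    return land_sf ** 0.75

# Hypothetical lot sizes in square feet.
for size in (5_000, 10_000, 20_000, 40_000):
    transformed = land75(size)
    print(f"{size:>6} sf -> LAND75 = {transformed:8.1f}, per sf = {transformed / size:.4f}")

ratio_when_doubled = land75(20_000) / land75(10_000)
print(f"Doubling land size multiplies LAND75 by {ratio_when_doubled:.2f}")
```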
36
Running Regression
Let's add our new transformed land variable to
the model.
37
Regression Results
Our Adj. R2 went up from .879 to .881.
Model 8
38
Running Regression
Let's add garages, pools, and baths just to round
out our model.
39
Regression Results
Our Adj. R2 went up from .881 to .895.
Model 9
40
Regression Results
The Beta value in column 4 indicates the
partial correlation of the variable. It is used
in stepwise regression in deciding which
variable to add next.
41
Regression Results
The significance of each variable to the model
can be determined by looking at the t values.
Rule of Thumb: t scores should be 2.0 or greater
(in absolute value).
NB211002, NB211003, and NB211006 are insignificant.
42
Regression Results
The t-statistic is calculated by dividing the
coefficient of a variable by its standard error.
For example, for the variable BLDSIZE, the
t-statistic is calculated as follows: 58.537 /
1.045 = 56.0
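The same calculation, combined with the 2.0 rule of thumb, can be written as a short check. The BLDSIZE figures are from this slide; the second variable and its numbers are hypothetical, added only to show a coefficient that fails the test.

```python
def t_statistic(coefficient, std_error):
    """t-statistic: the coefficient divided by its standard error."""
    return coefficient / std_error

def is_significant(t_value, threshold=2.0):
    """Rule of thumb: keep variables whose |t| is at least the threshold."""
    return abs(t_value) >= threshold

t_bldsize = t_statistic(58.537, 1.045)   # figures from the slide
t_example = t_statistic(1_200.0, 900.0)  # hypothetical weak variable

print(f"BLDSIZE t = {t_bldsize:.1f}, significant: {is_significant(t_bldsize)}")
print(f"Example t = {t_example:.2f}, significant: {is_significant(t_example)}")
```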
43
Regression Results
The Standard Error of the Estimate in the
regression model tells us how much a sale
estimate will vary from its actual value. This
number alone is meaningless unless related to the
average sales price in the sale sample. Dividing
the Standard Error by the Average Sales Price
produces the Coefficient of Variation
(COV): 15,854 / 134,043 = 11.82% COV
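As a quick check of this arithmetic, the sketch below divides the standard error by the average sale price, using the two figures from the slide. The function name is an assumption.

```python
def coefficient_of_variation(std_error_of_estimate, average_sale_price):
    """COV as a percentage: standard error relative to the average sale price."""
    return 100 * std_error_of_estimate / average_sale_price

cov = coefficient_of_variation(15_854, 134_043)  # figures from the slide
print(f"COV is about {cov:.1f}%")
```

Expressing the error as a percentage of the typical sale price makes it comparable across samples with different price levels.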
44
Regression Options
Enter is the default regression method in most
statistical software programs. This method
includes all variables entered by the modeler.
Stepwise multiple regression automatically
eliminates redundant or insignificant variables.
Notice that Stepwise Regression kicked out the
neighborhoods that had low t-scores.
45
Creating New Assessments
Once you have calibrated your model, the
Regression software allows you to predict the
new values (or assessments) using the
coefficients (or adjustments) you created.
46
Reviewing Ratio Statistics
Once the new assessments are created using our
final model, we can review the accuracy of our
new values using traditional ratio statistics.
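The slide does not list which ratio statistics were reviewed. The sketch below assumes three that are commonly used in assessment ratio studies: the median ratio, the coefficient of dispersion (COD), and the price-related differential (PRD). The assessments and sale prices are hypothetical.

```python
import statistics

def ratio_statistics(assessments, sale_prices):
    """Median ratio, COD (percent), and PRD for assessment-to-sale-price ratios."""
    ratios = [a / s for a, s in zip(assessments, sale_prices)]
    median_ratio = statistics.median(ratios)
    mean_ratio = statistics.fmean(ratios)
    weighted_mean_ratio = sum(assessments) / sum(sale_prices)
    # COD: average absolute deviation from the median, as a percent of the median.
    avg_abs_dev = statistics.fmean([abs(r - median_ratio) for r in ratios])
    cod = 100 * avg_abs_dev / median_ratio
    # PRD: mean ratio divided by the sales-weighted mean ratio.
    prd = mean_ratio / weighted_mean_ratio
    return {"median_ratio": median_ratio, "cod": cod, "prd": prd}

# Hypothetical predicted assessments and actual sale prices.
sale_prices = [100_000, 150_000, 200_000, 120_000, 180_000]
assessments = [95_000, 157_500, 200_000, 108_000, 198_000]

stats = ratio_statistics(assessments, sale_prices)
print({name: round(value, 3) for name, value in stats.items()})
```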
47
Valuing the Population
Valuing the population requires transforming the
same variables you used in the model, then
applying the coefficients to those variables.
This can be done internally within some CAMA
systems, using Microsoft Excel or other
spreadsheet software, or within the regression
software. Valuing the population is one of the
most difficult aspects of regression modeling
because changes in the physical attributes of
any one parcel often require re-running the
entire model and re-calculating values.
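The two steps described here, transforming each parcel's characteristics and then applying the coefficients, can be sketched as follows. Every coefficient, variable name, and parcel below is hypothetical; a real application would use the final model's own coefficients and the same transformations that were used during calibration.

```python
# Hypothetical coefficients keyed by transformed variable name.
COEFFICIENTS = {
    "CONSTANT": 20_000.0,
    "BLDSIZE": 60.0,
    "LAND75": 25.0,
    "AGE": -500.0,
    "Q_GOOD": 15_000.0,
}

def transform(parcel):
    """Apply the same transformations to a parcel that were used in the model."""
    return {
        "CONSTANT": 1.0,
        "BLDSIZE": parcel["bldsize"],
        "LAND75": parcel["landsf"] ** 0.75,
        "AGE": parcel["age"],
        "Q_GOOD": 1.0 if parcel["quality"] == "Good" else 0.0,
    }

def value_parcel(parcel):
    """Sum of each transformed variable times its coefficient."""
    x = transform(parcel)
    return sum(COEFFICIENTS[name] * x[name] for name in COEFFICIENTS)

# Hypothetical population of parcels, sold or unsold.
population = [
    {"parcel_id": "A-001", "bldsize": 1_500, "landsf": 10_000, "age": 10, "quality": "Good"},
    {"parcel_id": "A-002", "bldsize": 1_500, "landsf": 10_000, "age": 10, "quality": "Average"},
]

values = {p["parcel_id"]: round(value_parcel(p)) for p in population}
print(values)
```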
48
Conclusion
Predicting assessments using Regression requires
the appraiser to
  • Explore data to determine relationships and
    clean up outliers
  • Specify which model and variables will be used
  • Transform variables and run regression
  • Review results, modify or add variables
  • Create predicted assessments and review ratio
    statistics
  • Value the population using final coefficients

49
The End