Lecture 12. Bayesian Regression with conjugate and convenient priors
1
Lecture 12. Bayesian Regression with conjugate
and convenient priors
  • Conjugate Prior Analysis
  • Convenient Priors

2
The Bayesian Setup
  • For the normal linear model, we have
  • yi ~ N(μi, σ²) for i = 1, …, n
  • where μi is just shorthand for the linear predictor
  • μi = B0 + B1X1i + … + BkXki
  • The object of statistical inference is the posterior distribution of the
    parameters B0, …, Bk and σ².
  • By Bayes' Rule, we know that this is simply
  • p(B0, …, Bk, σ² | Y, X) ∝ p(B0, …, Bk, σ²) × Πi p(yi | μi, σ²)

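This factorization is easy to check numerically. Below is an illustrative sketch (Python/NumPy, with made-up data, independent normal priors on the coefficients, and, purely for simplicity, a flat prior on σ²): the unnormalized log posterior is just the log prior plus the summed log likelihood.

```python
import numpy as np

def norm_logpdf(x, mean, var):
    """Log density of a normal distribution."""
    return -0.5 * np.log(2 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def log_posterior(beta, sigma2, y, X, prior_mean, prior_var):
    """Unnormalized log posterior: log prior plus summed log likelihood.
    Independent N(prior_mean, prior_var) priors on each coefficient and,
    as a simplification for this sketch, a flat prior on sigma2."""
    mu = X @ beta                                  # mu_i = B0 + B1*X1i + ...
    log_prior = np.sum(norm_logpdf(beta, prior_mean, prior_var))
    log_lik = np.sum(norm_logpdf(y, mu, sigma2))
    return log_prior + log_lik

# made-up data: y = 1 + 2x + noise
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x])
y = 1 + 2 * x + rng.normal(scale=0.5, size=50)

# coefficients near the truth score higher than a bad guess
good = log_posterior(np.array([1.0, 2.0]), 0.25, y, X, 0.0, 100.0)
bad = log_posterior(np.array([-3.0, 0.0]), 0.25, y, X, 0.0, 100.0)
print(good > bad)  # True
```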
3
Conjugate priors and the normal linear model
  • Suppose that instead of an improper prior, we
    decide to use the conjugate prior.
  • For the normal regression model, the conjugate prior distribution for
    p(B0, …, Bk, σ²) is the normal-inverse-gamma distribution.
  • We've seen this distribution before when we studied the normal model with
    unknown mean and variance. We know that this distribution can be factored
    such that
  • p(B0, …, Bk, σ²) = p(B0, …, Bk | σ²) p(σ²)
  • p(B0, …, Bk | σ²) ~ MVN(Bprior, Σprior),
  • where Σprior is the prior covariance matrix for the B's,
  • and p(σ²) ~ Inverse-Gamma(aprior, bprior).

4
Conjugate priors and the normal linear model
  • If we use a conjugate prior, then the posterior distribution will have the
    same form as the prior. Thus, the posterior distribution will also be
    normal-inverse-gamma. If we integrate out σ², the marginal for B will be a
    multivariate t-distribution.
  • The posterior mean of the coefficients is
  • Bpost = (Σprior⁻¹ + XᵀX)⁻¹ (Σprior⁻¹ Bprior + XᵀX B̂),
  • where B̂ = (XᵀX)⁻¹XᵀY is the OLS estimate.
- Notice that the coefficients are essentially a weighted average of the prior
coefficients described by Bprior and the standard OLS estimates B̂.
- The weights are provided by the conditional prior precision Σ⁻¹ and the data
XᵀX. This should make clear that as we increase our prior precision (decrease
our prior variance) for B, we place greater posterior weight on our prior
beliefs relative to the data.
- Note: Zellner (1971) treats Bprior and the conditional prior variance Σ in
the following way: suppose you have two data sets, {Y1, X1} and {Y2, X2}. He
sets Bprior equal to the posterior mean for a regression analysis of Y1, X1
with the improper prior 1/σ², and sets Σ equal to X1ᵀX1.
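The weighted-average interpretation can be verified directly. A minimal NumPy sketch with synthetic data; the prior mean, prior covariance, and sample size are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b_ols = np.linalg.solve(X.T @ X, X.T @ y)       # standard OLS estimate
b_prior = np.zeros(2)                           # illustrative prior mean
Sigma_prior = 10.0 * np.eye(2)                  # illustrative prior covariance

# posterior mean: precision-weighted average of prior mean and OLS estimate
prec_prior = np.linalg.inv(Sigma_prior)
b_post = np.linalg.solve(prec_prior + X.T @ X,
                         prec_prior @ b_prior + X.T @ X @ b_ols)

# shrinking the prior variance pulls the posterior toward the prior mean
prec_tight = np.linalg.inv(0.001 * np.eye(2))
b_tight = np.linalg.solve(prec_tight + X.T @ X,
                          prec_tight @ b_prior + X.T @ X @ b_ols)
# b_tight lies far closer to b_prior than b_post does
```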
5
Conjugate priors and the normal linear model
  • To summarize our uncertainty about the coefficients, the
    variance-covariance matrix for B is given by

Var(B | Y) = [2bprior + (Y - XB̂)ᵀ(Y - XB̂)
             + (Bprior - Bpost)ᵀ Σprior⁻¹ (Bprior - Bpost)
             + (B̂ - Bpost)ᵀ XᵀX (B̂ - Bpost)] / (2aprior + n - 2)
             × (Σprior⁻¹ + XᵀX)⁻¹

The posterior standard deviation can be taken from the square root of the
diagonal of this matrix.
The second term is the residual sum of squares, which is proportional to the
maximum likelihood estimate of the variance. The third term states that our
variance estimates will be greater if our prior values for the regression
coefficients differ from their posterior values, especially if we indicate a
great deal of confidence in our prior beliefs by assigning small variances in
the matrix Σ. The fourth term states that our variance estimates for the
regression coefficients will be greater if the standard OLS estimates differ
from the posterior values, especially if XᵀX is large.
WinBUGS implementation would proceed in a manner akin to the earlier example.
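As a numerical sketch of this calculation, the snippet below computes the standard normal-inverse-gamma posterior covariance for B, with made-up data and illustrative prior settings; the posterior standard deviations come from the square root of its diagonal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 1.5]) + rng.normal(size=n)

# illustrative prior settings
b_prior = np.zeros(2)
Sigma_prior = 25.0 * np.eye(2)
a_prior, b_ig = 2.0, 2.0                        # inverse-gamma shape and scale

prec_prior = np.linalg.inv(Sigma_prior)
Lam_post = prec_prior + X.T @ X                 # posterior precision factor
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
b_post = np.linalg.solve(Lam_post, prec_prior @ b_prior + X.T @ X @ b_ols)

# posterior scale: residual sum of squares plus the two discrepancy terms
# (prior vs. posterior, OLS vs. posterior) described in the text
sse = np.sum((y - X @ b_ols) ** 2)
d_prior = b_prior - b_post
d_ols = b_ols - b_post
scale = (2 * b_ig + sse
         + d_prior @ prec_prior @ d_prior
         + d_ols @ (X.T @ X) @ d_ols)

cov_B = scale / (2 * a_prior + n - 2) * np.linalg.inv(Lam_post)
post_sd = np.sqrt(np.diag(cov_B))               # posterior standard deviations
```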
6
Example One: Party Activists and Partisan Polarization
  • Parties reward their core supporters with policy in exchange for
    activists' assistance during the campaign.
  • Research design: regression analysis by party.
  • Dependent variable: Party-in-Government Ideology
  • The party's median DW-Nominate score for the 93rd through the 107th
    Congress
  • Independent variable: Mean Party Activist Ideology
  • Average response to the NES' 7-point liberal-conservative ideology scale
    among a party's identifiers who were active in the campaign.
  • Independent variable: Mean Party Non-Activist Ideology
  • Average response to the NES' 7-point liberal-conservative ideology scale
    among a party's identifiers who were not active in the campaign.

7
Classical OLS Estimates
* Denotes statistical significance in a one-tailed test at p < .10 (n = 15).
** Denotes statistical significance in a one-tailed test at p < .05.
*** Denotes statistical significance in a one-tailed test at p < .025.
8
Is there a fairer test?
  • With the small sample size, it is difficult to
    say conclusively that the parties are not
    following their activists.
  • To improve the statistical power of the test, we could wait.
  • Someone more clever could come up with a better research design.
  • We could go the Bayesian route and incorporate prior beliefs about the
    data-generating process into the model.

9
The probability model (conjugate prior analysis)
Likelihood: Party Ideologyi ~ iid Normal(μi, σ²)
μi = β0 + β1·Activist Ideology + β2·Non-Activist Ideology
Priors: (β0, β1, β2) | σ² ~ Normal(BParty, Σ), where
BDem = (-100, .099, .234)ᵀ, BRep = (-100, .189, .301)ᵀ, and
Σ = Diag(10000, prior var(B1), 1); σ² ~ Inv-Gamma(1, 1)
We will vary prior var(B1) from 1 down to a small number to see how strong our
prior beliefs have to be to find a statistically significant posterior value
for β1.
* Based on the bivariate regressions with significant Activist coefficients
10
Significant at p < .1, one-tailed test
11
Significant at p < .1, one-tailed test
12
Example 2: Western and Jackman (1994)
  • What explains cross-national variation in union density?
  • Union density is defined as the percentage of the work force that belongs
    to a labor union.
  • Competing theories:
  • Wallerstein: union density depends on the size of the civilian labor
    force.
  • Stephens: union density depends on industrial concentration.
  • Note: these two predictors correlate at -.92.
  • Control variable: presence of a pro-labor government.
  • Sample: n = 20 countries with a continuous history of democracy since
    World War II.

13
Results with non-informative priors
14
Justification for the Bayesian Approach
  • Wallerstein and Stephens reach an empirical
    impasse, where because of the small sample size
    and multicollinear variables, they are not able
    to adjudicate between the two theories.
  • The incorporation of prior information provides
    additional structure to the data, which helps to
    uniquely identify the two coefficients.
  • Priors can be developed as equivalent to prior data sets, inflating the
    de facto n.
  • The data set contains all available observations from a population of
    interest; it is not a random sample. More generally, cross-national data
    sets are not generated by a repeatable data-generating process.
  • Frequentist inference about a statistic (e.g. a regression coef.) is
    obtained through the assumption that the process generating the data could
    be repeated a large number of times.
  • Specifically, frequentist inference is about the proportion of the time
    that, in the long run, realizations of this statistic will fall within
    some interval.
  • If there is no long run, or possibility of repetition, then probabilistic
    summaries are not appropriate.

15
The probability model
  • As best I can tell (since Western and Jackman didn't specify the full
    probability model), we have
  • union densityi ~ N(μi, σ²)
  • μi = β0 + β1·Left Govt + β2·Labor Force + β3·Industrial Concentration
  • βWallerstein ~ MVN((0, .3, -5, 0)ᵀ, Diag(100000, 0.15, 2.5, 100000))
  • So, informative priors are chosen for Left Govt and Labor Force, while
    diffuse priors are chosen for the Intercept and Industrial Concentration.
  • βStephens ~ MVN((0, .3, 0, 10)ᵀ, Diag(100000, 0.15, 100000, 5))
  • So, informative priors are chosen for Left Govt and Industrial
    Concentration, while diffuse priors are chosen for the Intercept and
    Labor Force.
  • I believe that σ² is assumed to be known.
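The identifying role of informative priors under collinearity can be sketched numerically. The snippet below (Python/NumPy; the simulated data, "true" coefficients, and prior settings are all illustrative assumptions, not Western and Jackman's data) computes the posterior of B given a known σ² and shows that adding prior precision shrinks the posterior standard deviations:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20
z = rng.normal(size=n)
labor = z + 0.2 * rng.normal(size=n)            # two highly correlated
conc = -z + 0.2 * rng.normal(size=n)            # predictors, mimicking -.92
left = rng.normal(size=n)
X = np.column_stack([np.ones(n), left, labor, conc])
sigma2 = 1.0                                    # treated as known, per the text
y = X @ np.array([0.0, 0.3, -1.0, 0.0]) + rng.normal(size=n)

def posterior(b0, prior_var):
    """Posterior of B given known sigma^2 under independent normal priors."""
    prec0 = np.diag(1.0 / np.asarray(prior_var))
    prec_n = prec0 + X.T @ X / sigma2
    mean = np.linalg.solve(prec_n, prec0 @ b0 + X.T @ y / sigma2)
    return mean, np.linalg.inv(prec_n)

# diffuse priors on everything vs. informative priors on some coefficients
m_flat, V_flat = posterior(np.zeros(4), [1e5, 1e5, 1e5, 1e5])
m_inf, V_inf = posterior(np.array([0.0, 0.3, -1.0, 0.0]),
                         [1e5, 0.15, 2.5, 5.0])

# adding prior precision reduces every posterior standard deviation
sd_flat = np.sqrt(np.diag(V_flat))
sd_inf = np.sqrt(np.diag(V_inf))
```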

16
Results with Wallerstein's priors
17
Results with Stephens's priors
18
Final comments on Western and Jackman
  • 1) Even with Stephens's priors, Wallerstein's hypothesis appears robust;
    however, the opposite relationship does not hold.
  • 2) Western and Jackman reanalyze the data with the same prior means, but
    inflate the prior variances for the regression coefficients to see how
    sensitive the results are.
  • 3) Western and Jackman report the Bayesian influence statistic, which
    describes the influence of the ith observation on the joint posterior
    distribution of the regression coefficients.
  • This statistic is interesting because it shows how observations tend to
    become more influential with larger prior variances.
  • Like the traditional Cook's distance and other measures of leverage, this
    statistic provides evidence about the effects of outliers on posterior
    inference.

19
Convenient, but non-conjugate priors
  • In WinBUGS, we typically would not trick the program into implementing
    conjugate or improper priors.
  • Instead, we would typically assume that our prior beliefs for the
    regression coefficients and the variance can be factored into separate
    distributions. Thus, p(β, τ) = p(β)p(τ).
  • A common model assumes the following form for the likelihood:
  • yi ~ N(μi, τ), where μi = β0 + β1X1i + … + βnXni for all i
  • (Recall that WinBUGS parameterizes the normal distribution in terms of the
    precision τ = 1/σ².)
  • The non-informative priors would be defined as follows:
  • βj ~ N(0, .00001) for all j
  • and τ ~ Gamma(.00001, .00001)
  • WinBUGS implementation is straightforward, except that we may need to
    write out a list of initial values for our parameters, especially for τ.
    This is because when WinBUGS creates initial values for these priors, it
    is possible that the program will make implausible choices (e.g. τ = -10).
  • See Congdon's code online and the WinBUGS help for examples.
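Under independence priors like these, the full conditionals stay tractable (normal for the coefficients, gamma for the precision), so the same model can also be fit with a hand-rolled Gibbs sampler. A sketch in Python/NumPy with simulated data; all settings (data, prior values, iteration counts, initial values) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = X @ np.array([1.0, -2.0]) + rng.normal(scale=0.5, size=n)

k = X.shape[1]
prior_prec_b = 1e-5 * np.eye(k)       # beta_j ~ N(0, precision 1e-5): diffuse
a0, b0 = 1e-5, 1e-5                   # tau ~ Gamma(1e-5, 1e-5)

beta = np.zeros(k)
tau = 1.0                             # explicit initial value, as the text advises
draws = []
for it in range(2000):
    # full conditional of beta: multivariate normal
    prec = prior_prec_b + tau * X.T @ X
    cov = np.linalg.inv(prec)
    mean = cov @ (tau * X.T @ y)
    beta = rng.multivariate_normal(mean, cov)
    # full conditional of tau: gamma with updated shape and rate
    resid = y - X @ beta
    tau = rng.gamma(a0 + n / 2, 1.0 / (b0 + 0.5 * resid @ resid))
    if it >= 500:                     # discard burn-in draws
        draws.append(np.concatenate([beta, [tau]]))

draws = np.array(draws)
print(draws[:, :2].mean(axis=0))      # posterior means of the coefficients
```

Note that, as in WinBUGS's dnorm, the normal is handled through the precision τ, which is why τ multiplies XᵀX in the coefficient update.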

20
Job Satisfaction: Congdon Example 4.3
  • Theory: job satisfaction is a function of age, autonomy, and income.
  • The Likelihood:
  • Satisfactioni ~ Normal(μi, τ)
  • μi = β0 + β1Agei + β2Autonomyi + β3Incomei
  • The Priors:
  • βj ~ N(0, .001) for all j, and τ ~ Gamma(.01, .01)

21
WinBUGS Code

model {
  # define the likelihood
  for (i in 1:68) {
    satisfaction[i] ~ dnorm(mu[i], tau)
    mu[i] <- b[1]*age[i] + b[2]*autonomy[i] + b[3]*income[i] + b[4]
  }
  # define the priors
  for (j in 1:4) { b[j] ~ dnorm(0, .001) }
  tau ~ dgamma(0.0001, 0.0001)
}

22
Bayesian Path Analysis
  • Path analysis is a method that purports to
    examine causal effects for systems of equations.
    The causal models are assumed to look something
    like

[Path diagram omitted: Variables 1-4 connected by directed arrows into
Variable 5, with Variable 2 affecting Variable 5 both directly and through
Variables 3 and 4]
A regression model of the effects of Variables 2-4 on Variable 5 will often
provide estimates of the regression coefficients with desirable properties,
but may understate, for example, the effect of Variable 2 on Variable 5,
since Variable 2 influences outcomes directly and through its effects on
Variables 3 and 4.
Basic methodological approach: estimate a separate regression for each
variable that is dependent at some point in the system of equations, with
every variable standardized. A path coefficient is the sum of an independent
variable's effects on a particular dependent variable.
23
Path Analysis of Job Satisfaction
  • Theory:
  • Age is an unmoved mover
  • Autonomyi = a0 + a1Agei
  • Incomei = b0 + b1Agei + b2Autonomyi
  • Satisfactioni = c0 + c1Agei + c2Autonomyi + c3Incomei
  • Probability model:
  • Autonomyi ~ Norm(a1 + a2Agei, τautonomy)
  • aj ~ Norm(0, .001) for j = 1, 2, and τautonomy ~ Gamma(.001, .001)
  • Incomei ~ Norm(b1 + b2Agei + b3Autonomyi, τincome)
  • bj ~ Norm(0, .001) for j = 1, 2, 3, and τincome ~ Gamma(.001, .001)
  • Satisfactioni ~ Norm(c1 + c2Agei + c3Autonomyi + c4Incomei, τsatisfaction)
  • cj ~ Norm(0, .001) for j = 1, 2, 3, 4, and τsatisfaction ~ Gamma(.001, .001)

24
WinBUGS Implementation of Path Analysis

model {
  for (i in 1:68) {   # specify the likelihood
    Autonomy[i] ~ dnorm(muaut[i], tauaut)
    muaut[i] <- a[1] + a[2]*Age[i]
    Income[i] ~ dnorm(muinc[i], tauinc)
    muinc[i] <- b[1] + b[2]*Age[i] + b[3]*Autonomy[i]
    Satisfaction[i] ~ dnorm(musat[i], tausat)
    musat[i] <- c[1] + c[2]*Age[i] + c[3]*Autonomy[i] + c[4]*Income[i]
  }
  # specify priors
  for (j in 1:2) { a[j] ~ dnorm(0, .001) }
  for (j in 1:3) { b[j] ~ dnorm(0, .001) }
  for (j in 1:4) { c[j] ~ dnorm(0, .001) }
  tauaut ~ dgamma(0.001, 0.001)
  tauinc ~ dgamma(0.001, 0.001)
  tausat ~ dgamma(0.001, 0.001)
}

25
WinBUGS Implementation of Path Analysis

model {
  for (i in 1:68) {   # specify the likelihood
    Oldness[i] <- (Age[i] - mean(Age[])) / sd(Age[])
    Aut[i] <- (Autonomy[i] - mean(Autonomy[])) / sd(Autonomy[])
    Aut[i] ~ dnorm(muaut[i], tauaut)
    muaut[i] <- a[1] + a[2]*Oldness[i]
    Inc[i] <- (Income[i] - mean(Income[])) / sd(Income[])
    Inc[i] ~ dnorm(muinc[i], tauinc)
    muinc[i] <- b[1] + b[2]*Oldness[i] + b[3]*Aut[i]
    Sat[i] <- (Satisfaction[i] - mean(Satisfaction[])) / sd(Satisfaction[])
    Sat[i] ~ dnorm(musat[i], tausat)
    musat[i] <- c[1] + c[2]*Oldness[i] + c[3]*Aut[i] + c[4]*Inc[i]
  }
  # c[2] is the direct effect of age; c[3]*a[2] is the effect of age through
  # autonomy; c[4]*b[2] is the effect of age through income; and c[4]*b[3]*a[2]
  # is the effect of age through autonomy through income
  TotAge <- c[2] + c[3]*a[2] + c[4]*b[2] + c[4]*b[3]*a[2]
  # (priors for a, b, c and the taus as defined on the previous slide)
}
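The total-effect bookkeeping behind TotAge is easy to verify: with made-up standardized coefficients (the a, b, c values below are arbitrary, not estimates), the four-term decomposition of the effect of age matches what you get by composing the three equations directly.

```python
# made-up standardized path coefficients (intercepts drop out)
a2 = 0.30                      # autonomy     = a2*age
b2, b3 = 0.25, 0.40            # income       = b2*age + b3*autonomy
c2, c3, c4 = 0.10, 0.35, 0.50  # satisfaction = c2*age + c3*autonomy + c4*income

# total effect of age: direct + via autonomy + via income
#                      + via autonomy-then-income
tot_age = c2 + c3 * a2 + c4 * b2 + c4 * b3 * a2

# sanity check: substitute the equations and read off the slope on age
def satisfaction(age):
    autonomy = a2 * age
    income = b2 * age + b3 * autonomy
    return c2 * age + c3 * autonomy + c4 * income

assert abs(tot_age - (satisfaction(1.0) - satisfaction(0.0))) < 1e-12
```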