petri.nokelainen@uta.fi, School of Education, University of Tampere, Finland

1
petri.nokelainen@uta.fi, School of Education
University of Tampere, Finland
Introduction to Discrete Bayesian Methods
Petri Nokelainen
2
Outline
  • Overview
  • Introduction to Bayesian Modeling
  • Bayesian Classification Modeling
  • Bayesian Dependency Modeling
  • Bayesian Unsupervised Model-based Visualization

3
Overview
(Nokelainen, 2008.)
4
Overview
BDM = Bayesian Dependency Modeling
BCM = Bayesian Classification Modeling
BUMV = Bayesian Unsupervised Model-based Visualization
5
Bayesian Classification Modeling
http://b-course.cs.helsinki.fi
The classification accuracy of the best model
found is 83.48% (58.57%).
Common factors: PUB_T, CC_PR, CC_HE, PA, C_SHO, C_FAIL,
CC_AB, CC_ES
6
Bayesian Dependency Modeling
http://b-course.cs.helsinki.fi
7
Bayesian Unsupervised Model-based Visualization
http://www.bayminer.com
8
Outline
  • Overview
  • Introduction to Bayesian Modeling
  • Bayesian Classification Modeling
  • Bayesian Dependency Modeling
  • Bayesian Unsupervised Model-based Visualization

9
Introduction to Bayesian Modeling
  • From a social science researcher's point of view,
    the requirements of traditional frequentist
    statistical analysis are very challenging.
  • For example, normality of both the phenomenon
    under investigation and the data is a
    prerequisite for traditional parametric
    frequentist calculations.

10
Introduction to Bayesian Modeling
  • In situations where
  • a latent construct cannot be appropriately
    represented as a continuous variable,
  • ordinal or discrete indicators do not reflect
    underlying continuous variables, or
  • the latent variables cannot be assumed to be
    normally distributed,
  • traditional Gaussian modeling is clearly not
    appropriate.
  • In addition, analysis based on the normal
    distribution sets minimum requirements for the
    number of observations and requires variables
    measured on a continuous scale.

11
Introduction to Bayesian Modeling
  • Frequentist parametric statistical techniques
    are designed for indicators that are normally
    distributed (both theoretically and empirically)
    and have linear dependencies:
  • Univariate normality
  • Multivariate normality
  • Bivariate linearity

12
(Nokelainen, 2008, p. 119)
13
  • The upper part of the figure contains two
    sections, parametric and non-parametric,
    divided into eight sub-sections
    (D, N, IO, ML, MD, O, C, S).
  • The parametric approach is viable only if
  • 1) both the phenomenon modeled and the sample
    follow a normal distribution,
  • 2) the sample size is large enough (at least 30
    observations),
  • 3) continuous indicators are used, and
  • 4) dependencies between the observed variables
    are linear.
  • Otherwise non-parametric techniques should be
    applied.

D = Design (ce = controlled experiment, co = correlational study)
N = Sample size
IO = Independent observations
ML = Measurement level (c = continuous, d = discrete, n = nominal)
MD = Multivariate distribution (n = normal, similar)
O = Outliers
C = Correlations
S = Statistical dependencies (l = linear, nl = non-linear)
14
Introduction to Bayesian Modeling
N = 11 500
15
Introduction to Bayesian Modeling
  • The Bayesian method
  • (1) is parameter-free and requires no user
    input; instead, the prior distributions of the
    model offer a theoretically justifiable way
    to affect the model construction,
  • (2) works with probabilities and can hence be
    expected to produce robust results with discrete
    data containing nominal and ordinal attributes,
  • (3) has no limit for minimum sample size,
  • (4) is able to analyze both linear and
    non-linear dependencies,
  • (5) assumes no multivariate normal model, and
  • (6) allows prediction.

16
Introduction to Bayesian Modeling
  • Probability is a mathematical construct that
    behaves in accordance with certain rules and can
    be used to represent uncertainty.
  • Classical statistical inference is based on a
    frequency interpretation of probability, whereas
    Bayesian inference is based on a subjective, or
    degree-of-belief, interpretation.
  • Bayesian inference uses conditional probabilities
    to represent uncertainty.
  • P(H | E, I) is the probability of the unknown
    things or hypothesis (H), given the evidence (E)
    and background information (I).

17
Introduction to Bayesian Modeling
  • The essence of Bayesian inference is in the rule,
    known as Bayes' theorem, that tells us how to
    update our initial probabilities P(H) if we see
    evidence E, in order to find out P(H | E).
  • A priori probability: P(H)
  • Conditional probability: P(E | H)
  • Posterior probability: P(H | E)

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)}$$
18
Introduction to Bayesian Modeling
  • The theorem was invented by the English reverend
    Thomas Bayes (1701-1761) and published
    posthumously (1763).

19
Introduction to Bayesian Modeling
  • Bayesian inference comprises the following three
    principal steps:
  • (1) Obtain the initial probabilities P(H) for the
    unknown things. (Prior distribution.)
  • (2) Calculate the probabilities of the evidence E
    (data) given different values for the unknown
    things, i.e., P(E | H). (Likelihood or
    conditional distribution.)
  • (3) Calculate the probability distribution of
    interest, P(H | E), using Bayes' theorem.
    (Posterior distribution.)
  • Bayes' theorem can be used sequentially.

20
Introduction to Bayesian Modeling
  • If we first receive some evidence E (data), and
    calculate the posterior P(H | E), and at some
    later point in time receive more data E', the
    calculated posterior can be used in the role of
    the prior to calculate a new posterior
    P(H | E, E'), and so on.
  • The posterior P(H | E) expresses all the
    information necessary to perform predictions.
  • The more evidence we get, the more certain we
    will become of the unknowns, until all but one
    value combination for the unknowns have
    probabilities so close to zero that they can be
    neglected.
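The sequential use of Bayes' theorem described above can be sketched in a few lines of Python. This is only an illustration: the two hypotheses, the prior and the likelihood values are made-up numbers, and the sketch assumes that E and E' are conditionally independent given the hypothesis, so that updating in two steps gives the same posterior as a single update on both pieces of evidence.

```python
def update(prior, likelihood):
    """One Bayes update: posterior(h) is proportional to prior(h) * P(E | h)."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())  # P(E), the normalizing constant
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical numbers, chosen only for illustration.
prior = {"H": 0.5, "not_H": 0.5}
lik_e1 = {"H": 0.8, "not_H": 0.3}   # P(E  | h)
lik_e2 = {"H": 0.7, "not_H": 0.4}   # P(E' | h)

# Sequential use: the first posterior plays the role of the prior.
post_sequential = update(update(prior, lik_e1), lik_e2)

# Single update on both pieces of evidence (assumes E and E' are
# conditionally independent given h).
lik_joint = {h: lik_e1[h] * lik_e2[h] for h in prior}
post_batch = update(prior, lik_joint)

print(post_sequential)
print(post_batch)
```

Both routes give the same posterior, which is why the posterior can be carried forward as the new prior.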

21
C_Example 1 Applying Bayes Theorem
  • Company A employs workers for short-term
    jobs that are well paid.
  • The job sets certain prerequisites for the
    applicants' linguistic abilities.
  • Earlier, all applicants were interviewed, but
    this has become an impossible task as both
    the number of open vacancies and the number of
    applicants have increased enormously.
  • The personnel department of the company was
    ordered to develop a questionnaire to preselect
    the most suitable applicants for the interview.

22
C_Example 1 Applying Bayes Theorem
  • The psychometrician who developed the instrument
    estimates that it would work correctly for 90 out
    of 100 applicants, if they are honest.
  • We know on the basis of earlier interviews that
    the requirements (linguistic abilities) are met
    by one person in 100 in the target
    population.
  • The question is: if an applicant gets enough
    points to participate in the interview, will he
    or she be hired for the job (after the interview)?

23
C_Example 1 Applying Bayes Theorem
  • The a priori probability P(H) is the proportion
    of people in the target population who really
    are able to meet the requirements of
    the task (1 out of 100 = .01).
  • The complement of the a priori probability is
    P(¬H), which equals 1 − P(H), thus .99.
  • The psychometrician's belief about how well the
    instrument works is the conditional probability
    P(E | H) = .9.
  • The instrument's failure to screen out non-valid
    applicants, i.e., those who are not able to
    succeed in the following interview, is stated as
    P(E | ¬H), which equals .1.
  • These two conditional probabilities need not
    sum to one!

24
C_Example 1 Applying Bayes Theorem
  • A priori probability: P(H) = .01
  • Conditional probability: P(E | H) = .9
  • Posterior probability: P(H | E)

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)} = \frac{(.9)(.01)}{(.9)(.01) + (.1)(.99)} \approx .08$$
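The arithmetic of Example 1 can be checked with a short Python sketch. The function name `posterior` and its argument names are illustrative choices, not part of the original slides; the three input values are the ones given in the example.

```python
def posterior(prior_h, p_e_given_h, p_e_given_not_h):
    """Bayes' theorem for one hypothesis H and its complement."""
    numerator = p_e_given_h * prior_h
    evidence = numerator + p_e_given_not_h * (1.0 - prior_h)
    return numerator / evidence

# Values from Example 1: P(H) = .01, P(E | H) = .9, P(E | not H) = .1
p_hired = posterior(0.01, 0.9, 0.1)
print(round(p_hired, 2))
```

Although the instrument is right nine times out of ten, the low base rate (one suitable person in 100) keeps the posterior below ten per cent.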
25
C_Example 1 Applying Bayes Theorem
26
C_Example 1 Applying Bayes Theorem
  • What if the measurement error of the
    psychometrician's instrument had been 20
    per cent?
  • P(E | H) = 0.8, P(E | ¬H) = 0.2
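As a rough check of this scenario, assuming the prior P(H) = .01 from the earlier slides is unchanged, Bayes' theorem gives:

$$P(H \mid E) = \frac{(.8)(.01)}{(.8)(.01) + (.2)(.99)} = \frac{.008}{.206} \approx .04$$

Under this assumption the noisier instrument roughly halves the posterior compared with the .08 obtained with a 10 per cent error.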

27
C_Example 1 Applying Bayes Theorem
28
C_Example 1 Applying Bayes Theorem
  • What if the measurement error of the
    psychometrician's instrument had been only
    one per cent?
  • P(E | H) = 0.99, P(E | ¬H) = 0.01
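Again assuming the prior P(H) = .01 is unchanged, the same calculation for this scenario is:

$$P(H \mid E) = \frac{(.99)(.01)}{(.99)(.01) + (.01)(.99)} = \frac{.0099}{.0198} = .5$$

Even with an instrument that errs only once in a hundred, an applicant who passes is, under these assumptions, only as likely as not to meet the requirements, because suitable applicants are so rare.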

29
C_Example 1 Applying Bayes Theorem
30
C_Example 1 Applying Bayes Theorem
  • Quite often people tend to estimate the
    probabilities to be too high or low, as they are
    not able to update their beliefs even in simple
    decision making tasks when situations change
    dynamically (Anderson, 1995).

31
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • One of the most important rules that scientific
    journals in educational science apply when
    judging the scientific merits of a submitted
    manuscript is that all reported results should
    be based on the so-called null hypothesis
    significance testing procedure (NHSTP) and its
    featured product, the p-value.
  • Gigerenzer, Krauss and Vitouch (2004, p. 392)
    describe the "null ritual" as follows:
  • 1) Set up a statistical null hypothesis of "no
    mean difference" or "zero correlation." Don't
    specify the predictions of your research or of
    any alternative substantive hypotheses.
  • 2) Use 5 per cent as a convention for rejecting
    the null. If significant, accept your research
    hypothesis.
  • 3) Always perform this procedure.

32
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • A p-value is the probability of the observed data
    (or of more extreme data points), given that the
    null hypothesis H0 is true, P(D | H0) (id.).
  • The first common misunderstanding is that the
    p-value of, say, a t-test would describe how
    probable it is to obtain the same result if the
    study is repeated many times (Thompson, 1994).
  • Gerd Gigerenzer and his colleagues (id., p. 393)
    call this the replication fallacy, as P(D | H0)
    is confused with 1 − P(D).

33
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • The second misunderstanding, shared by both
    applied statistics teachers and students, is
    that the p-value would prove or disprove H0.
    However, a significance test can only provide
    probabilities, not prove or disprove the null
    hypothesis.
  • Gigerenzer (id., p. 393) calls this fallacy an
    illusion of certainty: "Despite wishful thinking,
    p(D | H0) is not the same as p(H0 | D), and a
    significance test does not and cannot provide a
    probability for a hypothesis."
  • Bayesian statistics provide a way of
    calculating the probability of a hypothesis
    (discussed later in this section).

34
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • My statistics course grades (Autumn 2006, n = 12)
    ranged from one to five as follows: 1) n = 3, 2)
    n = 2, 3) n = 4, 4) n = 2, 5) n = 1, showing that
    the frequency of the lowest grade (1) in the
    course is three (25.0%).
  • Previous data from the same course (2000-2005)
    show that only five students out of 107 (4.7%)
    had the lowest grade.
  • Next, I will use the classical statistical
    approach (the likelihood principle) and Bayesian
    statistics to calculate whether the number of
    lowest course grades is exceptionally high in my
    latest course when compared to my earlier stat
    courses.

35
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • There are numerous possible reasons behind such
    a development; for example, I have become more
    critical in my assessment, or the students are
    less motivated to learn quantitative
    techniques.
  • However, I believe that the most important
    difference between the last and preceding courses
    is that the assessment was based on a computer
    exercise with statistical computations.
  • The preceding courses were assessed only with
    essay answers.

36
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • I assume that the 12 students earned their grades
    independently of each other (independent
    observations), as the computer exercise was
    conducted under my or my assistant's supervision.
  • I further assume that the chance of getting the
    lowest grade, θ, is the same for each student.
  • Therefore X, the number of lowest grades (1) on
    the scale from 1 to 5 among the 12 students in
    the latest stat course, has a binomial (12, θ)
    distribution: X ~ Bin(12, θ).
  • For any integer r between 0 and 12,

$$P(r \mid \theta, n) = \binom{n}{r}\,\theta^{r}\,(1-\theta)^{n-r}, \qquad n = 12$$

37
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • The expected number of lowest grades is
    12(5/107) = 0.561.
  • Theta is obtained by dividing the expected number
    of lowest grades by the number of students:
    0.561 / 12 ≈ 0.05.
  • The null hypothesis is formulated as follows:
    H0: θ = 0.05, stating that the rate of the lowest
    grades in the current stat course is nothing
    unusual and is comparable to the rates of the
    previous courses.

38
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • Three alternative hypotheses are formulated to
    address the concern of the increased number of
    lowest grades (6, 7 and 8, respectively):
    H1: θ = 0.06, H2: θ = 0.07, H3: θ = 0.08.
  • H1: 12/(107/6) = .67 → .67/12 = .056 ≈ .06
  • H2: 12/(107/7) = .79 → .79/12 = .065 ≈ .07
  • H3: 12/(107/8) = .90 → .90/12 = .075 ≈ .08

39
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • To compare the hypotheses, we calculate binomial
    distributions for each value of θ.
  • For example, the null hypothesis (H0) calculation
    yields

$$P_{H_0}(3 \mid .05, 12) = \binom{12}{3}(.05)^{3}(.95)^{9} \approx .017$$
40
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • The results for the alternative hypotheses are as
    follows:
  • P_H1(3 | .06, 12) ≈ .027
  • P_H2(3 | .07, 12) ≈ .039
  • P_H3(3 | .08, 12) ≈ .053.
  • The ratio of the hypotheses is roughly 1:2:2:3
    and could be verbally interpreted with statements
    like "the second and third hypotheses explain the
    data about equally well" or "the fourth
    hypothesis explains the data about three times as
    well as the first hypothesis."
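The four likelihoods can be reproduced with a short Python sketch. The helper name `binom_pmf` and the dictionary layout are illustrative choices; the values of θ, r = 3 and n = 12 are taken from the example.

```python
from math import comb

def binom_pmf(r, n, theta):
    """Binomial probability of r lowest grades among n students."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

thetas = {"H0": 0.05, "H1": 0.06, "H2": 0.07, "H3": 0.08}

# Likelihood of the observed data (r = 3 of n = 12) under each hypothesis.
likelihoods = {h: binom_pmf(3, 12, theta) for h, theta in thetas.items()}

# Likelihood ratios relative to H0; rounded they give roughly 1:2:2:3.
ratios = {h: lik / likelihoods["H0"] for h, lik in likelihoods.items()}

for h in thetas:
    print(h, round(likelihoods[h], 3), round(ratios[h], 1))
```

Only the probability of the observed count r = 3 enters the comparison, which is the point the likelihood principle makes.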

41
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • Lavine (1999) reminds that P(r | θ, n), as a
    function of r (3) and θ = .05, .06, .07, .08,
    describes only how well each hypothesis explains
    the data; no value of r other than 3 is relevant.
  • For example, P(4 | .05, 12) is irrelevant as it
    does not describe how well any hypothesis
    explains the data.
  • This likelihood principle, that is, to base
    statistical inference only on the observed data
    and not on data that might have been observed,
    is an essential feature of the Bayesian approach.

42
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • The Fisherian, so-called classical, approach to
    testing the null hypothesis (H0: θ = .05) against
    the alternative hypothesis (H1: θ > .05) is to
    calculate the p-value, which is the
    probability under H0 of observing an outcome at
    least as extreme as the outcome actually
    observed:

$$p = P(r \ge 3 \mid \theta = .05,\ n = 12) = \sum_{r=3}^{12} \binom{12}{r}(.05)^{r}(.95)^{12-r}$$

43
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • As an example, the first part of the formula is
    solved as follows
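The worked formula from this slide did not survive in the transcript. Assuming that the "first part" refers to the r = 3 term of the p-value sum under H0 (θ = .05, n = 12), it can be written out as:

$$P(3 \mid .05, 12) = \frac{12!}{3!\,9!}\,(.05)^{3}(.95)^{9} = 220 \times .000125 \times .6302 \approx .017$$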

44
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • After calculations, the p-value of .02 would
    suggest rejection of H0, if the significance
    level is set at 5 per cent.
  • Calculation of the p-value violates the likelihood
    principle by using P(r | θ, n) for values of r
    other than the observed value r = 3 (Lavine,
    1999).
  • The summands P(4 | .05, 12), P(5 | .05, 12), …,
    P(12 | .05, 12) do not describe how well any
    hypothesis explains the observed data.
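The p-value quoted above can be reproduced with the following Python sketch, which sums the binomial probabilities for r = 3, …, 12 under H0. The function names are illustrative; θ = .05 and n = 12 come from the example.

```python
from math import comb

def binom_pmf(r, n, theta):
    """Binomial probability of r lowest grades among n students."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def upper_tail_p(r_observed, n, theta):
    """P(X >= r_observed) under Bin(n, theta): the one-sided p-value."""
    return sum(binom_pmf(r, n, theta) for r in range(r_observed, n + 1))

p_value = upper_tail_p(3, 12, 0.05)
print(round(p_value, 2))
```

Note that every term with r > 3 refers to data that were not observed, which is exactly the objection raised on the slide.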

45
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • A Bayesian approach continues from the same
    point as the classical approach, namely the
    probabilities given by the binomial
    distributions, but also makes use of other
    relevant sources of a priori information.
  • In this domain, it is plausible to think that the
    computer test (SPSS exam) would make total
    failures more probable than in previous years,
    when the evaluation was based
    solely on essays.
  • On the other hand, the computer test has only a 40
    per cent weight in the equation that defines the
    final stat course grade: .3(Essay_1) +
    .3(Essay_2) + .4(Computer test)/3 = Final grade.

46
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • Another aspect to consider is the nature of the
    aforementioned tasks, as the essays are distance
    work assignments while the computer test is
    performed under observation.
  • Perhaps the course grades of my earlier stat
    courses have a narrower dispersion due to a
    violation of the independent observation
    assumption?
  • For example, some students may have copy-pasted
    text from other sources or collaborated without
    permission.
  • As we see, there are many sources of a priori
    information that I judge to be inconclusive and,
    thus, I define the null hypothesis to be as
    likely to be true as false.

47
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • This a priori judgment is expressed
    mathematically as P(H0) ≈ 1/2 ≈ P(H1) + P(H2) +
    P(H3).
  • I further assume that the alternative hypotheses
    H1, H2 and H3 are equally probable a priori:
    P(H1) ≈ P(H2) ≈ P(H3) ≈ 1/6.
  • These prior distributions summarize the knowledge
    about θ prior to incorporating the information
    from my course grades.

48
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • An application of Bayes' theorem yields

49
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • Similar calculations for the alternative
    hypotheses yield P(H1 | r = 3) ≈ .16,
    P(H2 | r = 3) ≈ .23, P(H3 | r = 3) ≈ .31.
  • These posterior distributions summarize the
    knowledge about θ after incorporating the grade
    information.
  • The four hypotheses seem to be about equally
    likely (.30 vs. .16, .23, .31).
  • The odds are about 2 to 1 (.30 vs. .70) that the
    latest stat course had a higher rate of lowest
    grades than 0.05.
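The posterior probabilities can be recomputed with the Python sketch below, which combines the stated priors (1/2, 1/6, 1/6, 1/6) with the binomial likelihoods for r = 3. The names are illustrative. Under these inputs the four posteriors come out at about .30, .16, .23 and .31, which sum to one.

```python
from math import comb

def binom_pmf(r, n, theta):
    """Binomial probability of r lowest grades among n students."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

thetas = {"H0": 0.05, "H1": 0.06, "H2": 0.07, "H3": 0.08}
priors = {"H0": 1 / 2, "H1": 1 / 6, "H2": 1 / 6, "H3": 1 / 6}

def posteriors(r_observed, n=12):
    """Posterior probability of each hypothesis given r_observed lowest grades."""
    unnormalized = {h: priors[h] * binom_pmf(r_observed, n, thetas[h])
                    for h in thetas}
    total = sum(unnormalized.values())  # P(r), the normalizing constant
    return {h: u / total for h, u in unnormalized.items()}

post = posteriors(3)
for h, p in post.items():
    print(h, round(p, 2))
```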

50
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • The difference between classical and Bayesian
    statistics would be only philosophical
    (probability vs. inverse probability) if they
    always led to similar conclusions.
  • In this case the p-value would suggest rejection
    of H0 (p = .02).
  • Bayesian analysis would also suggest evidence
    against θ = .05 (.30 vs. .70, ratio of .43).

51
C_Example 2 Comparison of Traditional
Frequentistic and Bayesian Approach
  • What if the number of lowest grades in the
    last course had been two?
  • The classical approach would no longer suggest
    rejection of H0 (p = .12).
  • The Bayesian result would still say that there is
    more evidence against than for H0 (.39 vs.
    .61, ratio of .64).
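The comparison of the two approaches for r = 3 and r = 2 can be sketched as follows. The function names and the layout are illustrative; the priors, θ values and n = 12 are the ones used in the example.

```python
from math import comb

N_STUDENTS = 12
THETAS = {"H0": 0.05, "H1": 0.06, "H2": 0.07, "H3": 0.08}
PRIORS = {"H0": 1 / 2, "H1": 1 / 6, "H2": 1 / 6, "H3": 1 / 6}

def binom_pmf(r, n, theta):
    """Binomial probability of r lowest grades among n students."""
    return comb(n, r) * theta**r * (1 - theta)**(n - r)

def p_value(r_observed):
    """Classical one-sided p-value under H0 (theta = .05)."""
    return sum(binom_pmf(r, N_STUDENTS, THETAS["H0"])
               for r in range(r_observed, N_STUDENTS + 1))

def posterior_h0(r_observed):
    """Posterior probability of H0 given the observed count."""
    weighted = {h: PRIORS[h] * binom_pmf(r_observed, N_STUDENTS, THETAS[h])
                for h in THETAS}
    return weighted["H0"] / sum(weighted.values())

for r_observed in (3, 2):
    print(r_observed, round(p_value(r_observed), 2),
          round(posterior_h0(r_observed), 2))
```

With r = 2 the p-value rises above the 5 per cent level, while the posterior probability of H0 moves only from about .30 to about .39.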

52
Outline
  • Overview
  • Introduction to Bayesian Modeling
  • Bayesian Classification Modeling
  • Bayesian Dependency Modeling
  • Bayesian Unsupervised Model-based Visualization

53
BCM = Bayesian Classification Modeling
BDM = Bayesian Dependency Modeling
BUMV = Bayesian Unsupervised Model-based Visualization
B-Course
54
Bayesian Classification Modeling
  • Bayesian Classification Modeling (BCM) is
    implemented in the B-Course software, which is
    based on discrete Bayesian methods.
  • This also applies to Bayesian Dependency Modeling,
    which is discussed later.
  • Quantitative indicators with a high measurement
    level (continuous, interval) lose more
    information in the discretization process than
    qualitative indicators (ordinal, nominal), as
    they are all treated in the analysis as nominal
    (discrete) indicators.

55
Bayesian Classification Modeling
  • For example, the variable gender may include
    numerical values 1 (Female) or 2 (Male), or
    text values Female and Male, in discrete
    Bayesian analysis.
  • Discretization will inevitably lead to a loss of
    power (Cohen, 1988; Murphy & Myors, 1998);
    however, ensuring that the sample size is large
    enough is a simple way to address this problem.

56
Sample size estimation
  • N: Population size.
  • n: Estimated sample size.
  • Sampling error (e): Difference between the true
    (unknown) value and observed values, if the
    survey were repeated (sample collected) numerous
    times.
  • Confidence interval: Spread of the observed
    values that would be seen if the survey were
    repeated numerous times.
  • Confidence level: How often the observed values
    would be within sampling error of the true value
    if the survey were repeated numerous times.

(Murphy & Myors, 1998.)
57
Bayesian Classification Modeling
  • The aim of BCM is to select the variables that
    are the best predictors of different class
    memberships (e.g., gender, job title, level of
    giftedness).
  • In the classification process, the automatic
    search looks for the best set of variables
    to predict the class variable for each data item.

58
Bayesian Classification Modeling
  • The search procedure resembles traditional
    linear discriminant analysis (LDA, see Huberty,
    1994), but the implementation is totally
    different.
  • For example, the variable selection problem that
    is addressed with a forward, backward or stepwise
    selection procedure in LDA is replaced with a
    genetic algorithm approach (e.g., Hilario,
    Kalousis, Prados, & Binz, 2004; Hsu, 2004) in
    Bayesian classification modeling.

59
Bayesian Classification Modeling
  • The genetic algorithm approach means that
    variable selection is not limited to one (or two
    or three) specific approaches; instead, many
    approaches and their combinations are exploited.
  • One possible approach is to begin with the
    presumption that models (i.e., possible
    predictor variable combinations) that resemble
    each other a lot (i.e., have almost the same
    variables and discretizations) are likely to be
    almost equally good.
  • This leads to a search strategy in which models
    that resemble the current best model are selected
    for comparison, instead of picking models
    randomly.

60
Bayesian Classification Modeling
  • Another approach is to abandon the habit of
    always rejecting the weakest model and instead
    collect a set of relatively good models.
  • The next step is to combine the best parts of
    these models so that the resulting combined model
    is better than any of the original models.
  • B-Course is capable of mobilizing many more
    viable approaches, for example, rejecting the
    better model (algorithms like hill climbing,
    simulated annealing) or trying to avoid picking
    a similar model twice (tabu search).
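The idea of comparing models that resemble the current best model, while avoiding models already visited, can be illustrated with a small Python sketch. This is not the B-Course implementation, which the slides do not describe in detail: the function `hill_climb`, the set `USEFUL` and the scoring function `toy_score` are all hypothetical, and a real application would score each variable subset by its estimated classification accuracy.

```python
def hill_climb(n_vars, score):
    """Steepest-ascent search over variable subsets with a simple tabu set.

    A model is a frozenset of variable indices; its neighbours differ
    from it by adding or removing exactly one variable.
    """
    current = frozenset()
    current_score = score(current)
    visited = {current}  # tabu set: never evaluate the same model twice
    while True:
        neighbours = [current ^ {v} for v in range(n_vars)]
        candidates = [m for m in neighbours if m not in visited]
        visited.update(candidates)
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= current_score:
            break  # no neighbour improves on the current model
        current, current_score = best, score(best)
    return current, current_score

# Hypothetical score: variables 0 and 2 are useful, every other
# included variable costs half a point.
USEFUL = {0, 2}

def toy_score(model):
    return len(model & USEFUL) - 0.5 * len(model - USEFUL)

best_model, best_score = hill_climb(5, toy_score)
print(sorted(best_model), best_score)
```

A pure hill climber like this one can get stuck in a local optimum; approaches such as simulated annealing or genetic algorithms address that by occasionally accepting or recombining weaker models.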

61
Bayesian Classification Modeling
Nokelainen, P., Ruohotie, P., & Tirri, H. (1999).
62
For an example of practical use of BCM, see
Nokelainen, Tirri, Campbell and Walberg (2007).
63
The results of Bayesian classification modeling
showed that the estimated classification accuracy
of the best model found was 60%. The left-hand
side of Figure 3 shows that only three variables,
Olympians' Conducive Home Atmosphere (SA),
Olympians' School Shortcomings (C_SHO), and
Computer Literacy Composite (COMP), were
successful predictors of A or C group
membership. All the other variables, which were
not accepted in the model, are to be considered
connective factors between the two groups. The
middle section of Figure 3 shows that the two
strongest predictors were Olympians' Conducive
Home Atmosphere (20.9%) and Olympians' School
Shortcomings (22.6%). The confusion matrix shows
that most of the A (25 correct out of 39) and the
C (29 out of 47) group members were correctly
classified. The matrix also shows that nine
participants of group A were incorrectly
classified into group C and vice versa.
64
(No Transcript)
65
Figure 4 presents predictive modeling of the A
and C groups (A_C, A or C group membership)
by Olympians' Conducive Home Atmosphere (SA),
Olympians' School Shortcomings (C_SHO), and
Computer Literacy Composite (COMP). The
left-hand side of the figure presents the initial
model with no values fixed. The model in the
middle presents a scenario where all the A group
members are selected. When we compare this model
to the one on the right-hand side (i.e.,
presenting a situation where all the C group
members are selected), we notice, for example,
that the conditional distribution of Olympians'
Conducive Home Atmosphere (SA) has changed. It
shows that highly productive Olympians have
reported a more conducive home atmosphere (54.0%)
than the members of the low productivity group C
(23.0%).
66
(No Transcript)
67
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • This paper aims to describe the characteristics
    and predictors that explain air traffic
    controllers' (ATCO) vocational expertise and
    excellence.
  • The study analyzes the role of natural abilities,
    self-regulative abilities and environmental
    conditions in ATCOs' vocational development.

(Pylväs, Nokelainen & Roisko, in press.)
68
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • The target population of the study consisted of
    ATCOs in Finland (N = 300), of which 28,
    representing four different airports, were
    interviewed.
  • The research data also included the interviewees'
    aptitude test scores, study records and employee
    assessments.

69
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • The research questions were examined by using
    theoretical concept analysis.
  • The qualitative data analysis was conducted with
    content analysis and Bayesian classification
    modeling.

70
  • Modeling of Vocational Excellence in Air Traffic
    Control

71
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • (RQ1a)
  • What are the differences in characteristics
    between the air traffic controllers representing
    vocational expertise and vocational excellence?

72
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • "the natural ambition of wanting to be good. Air
    traffic controllers have perhaps generally a
    strong professional pride."
  • "Interesting and rewarding work, that is the
    basis of wanting to stay in this work until
    retiring."

73
  • Modeling of Vocational Excellence in Air Traffic
    Control
  • "I read all the regulations and instructions
    carefully and precisely, and try to think the
    majority wave aside of them. It reflects on
    work." "but still I consider myself more
    precise than the majority a bad air traffic
    controller have delays, good air traffic
    controllers do not have delays which is something
    that also pilots appreciate because of the strict
    time limits."

74
  • Modeling of Vocational Excellence in Air Traffic
    Control

75
  • Modeling of Vocational Excellence in Air Traffic
    Control

76
  • Modeling of Vocational Excellence in Air Traffic
    Control

77
  • Modeling of Vocational Excellence in Air Traffic
    Control

78
  • Modeling of Vocational Excellence in Air Traffic
    Control

79
Classification accuracy 89%.
80
  • Modeling of Vocational Excellence in Air Traffic
    Control

81
  • Modeling of Vocational Excellence in Air Traffic
    Control

82
Outline
  • Research Overview
  • Introduction to Bayesian Modeling
  • Investigating Non-linearities with Bayesian
    Networks
  • Bayesian Classification Modeling
  • Bayesian Dependency Modeling
  • Bayesian Unsupervised Model-based Visualization

83
BCM = Bayesian Classification Modeling
BDM = Bayesian Dependency Modeling
BUMV = Bayesian Unsupervised Model-based Visualization
B-Course
84
Bayesian Dependency Modeling
  • Bayesian dependency modeling (BDM) is applied to
    examine dependencies between variables through
    both their visual representation and the
    probability ratio of each dependency.
  • The graphical visualization of a Bayesian network
    contains two components:
  • 1) Observed variables visualized as ellipses.
  • 2) Dependencies visualized as lines between nodes.

85
C_Example 4 Calculation of Bayesian Score
  • Bayesian score (BS), that is, the probability of
    the model P(M | D), allows the comparison of
    different models.

Figure 9. An Example of Two Competing Bayesian
Network Structures
(Nokelainen, 2008, p. 121.)
86
C_Example 4 Calculation of Bayesian Score
  • Let us assume that we have the following data:

x1  x2
1   1
1   1
2   2
1   2
1   1

  • Model 1 (M1) represents the two variables, x1 and
    x2 respectively, without statistical dependency,
    and model 2 (M2) represents the two variables
    with a dependency (i.e., with a connecting arc).
  • The binary data might be the result of an
    experiment where the five participants have
    drunk a nice cup of tea before (x1) and after
    (x2) a test of geographic knowledge.

87
C_Example 4 Calculation of Bayesian Score
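  • In order to calculate P(M1 | D) and P(M2 | D), we
    need to solve P(D | M1) and P(D | M2) for the two
    models M1 and M2.
  • The probability of the data given the model is
    solved by using the following marginal likelihood
    equation (Congdon, 2001, p. 473; Myllymäki,
    Silander, Tirri, & Uronen, 2001; Myllymäki &
    Tirri, 1998, p. 63):

$$P(D \mid M) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N'/q_i)}{\Gamma(N_{ij} + N'/q_i)} \prod_{k=1}^{r_i} \frac{\Gamma\big(N_{ijk} + N'/(r_i q_i)\big)}{\Gamma\big(N'/(r_i q_i)\big)}$$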

88
C_Example 4 Calculation of Bayesian Score
  • In Equation 4, the following symbols are used:
  • n is the number of variables (i indexes variables
    from 1 to n).
  • r_i is the number of values of the ith variable
    (k indexes these values from 1 to r_i).
  • q_i is the number of possible configurations of
    the parents of the ith variable.
  • N_ij is the number of rows in the data that have
    the jth configuration for the parents of the ith
    variable.
  • N_ijk is the number of rows in the data that have
    the kth value for the ith variable and also the
    jth configuration for the parents of the ith
    variable.
  • N' is the equivalent sample size, set to be the
    average number of values divided by two.
  • The marginal likelihood equation produces a
    Bayesian Dirichlet score that allows model
    comparison (Heckerman et al., 1995; Tirri, 1997;
    Neapolitan & Morris, 2004).

89
C_Example 4 Calculation of Bayesian Score
  • First, P(D | M1) is calculated given the values of
    variable x1, with N'/q_i = (2/2)/1 = 1 and
    N'/(r_i q_i) = (2/2)/(2·1) = 0.5.
90
C_Example 4 Calculation of Bayesian Score
  • Second, the values for x2 are calculated
91
C_Example 4 Calculation of Bayesian Score
  • The BS for the first model, P(D | M1), which is
    proportional to P(M1 | D) when the models share
    the same prior, is 0.027 × 0.012 ≈ 0.000324.
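The worked terms for M1 did not survive in the transcript. Assuming the Bayesian Dirichlet form of the marginal likelihood with N' = 1 (two values per variable, divided by two) and no parents (q_i = 1, r_i = 2), the two factors can be reconstructed from the data, where x1 takes the value 1 four times and 2 once, and x2 takes the value 1 three times and 2 twice:

$$x_1:\ \frac{\Gamma(1)}{\Gamma(5+1)} \cdot \frac{\Gamma(4+0.5)}{\Gamma(0.5)} \cdot \frac{\Gamma(1+0.5)}{\Gamma(0.5)} = \frac{1}{120} \times 6.5625 \times 0.5 \approx 0.027$$

$$x_2:\ \frac{\Gamma(1)}{\Gamma(5+1)} \cdot \frac{\Gamma(3+0.5)}{\Gamma(0.5)} \cdot \frac{\Gamma(2+0.5)}{\Gamma(0.5)} = \frac{1}{120} \times 1.875 \times 0.75 \approx 0.012$$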

92
C_Example 4 Calculation of Bayesian Score
  • Third, P(D | M2) is calculated given the values of
    variable x1

93
C_Example 4 Calculation of Bayesian Score
  • Fourth, the values for the first parent
    configuration (x1 = 1) are calculated

94
C_Example 4 Calculation of Bayesian Score
  • Fifth, the values for the second parent
    configuration (x1 = 2) are calculated

95
C_Example 4 Calculation of Bayesian Score
  • The BS for the second model, P(D | M2), which is
    proportional to P(M2 | D) when the models share
    the same prior, is 0.027 × 0.027 × 0.500 ≈
    0.000365.
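The three factors for M2 can be reconstructed in the same way, again assuming the Bayesian Dirichlet form with N' = 1. The factor for x1 is unchanged (≈ 0.027). For x2 with parent x1 (q_i = 2, r_i = 2), N'/q_i = 0.5 and N'/(r_i q_i) = 0.25. Among the four rows with x1 = 1, x2 is 1 three times and 2 once; in the single row with x1 = 2, x2 is 2:

$$x_2 \mid x_1 = 1:\ \frac{\Gamma(0.5)}{\Gamma(4+0.5)} \cdot \frac{\Gamma(3+0.25)}{\Gamma(0.25)} \cdot \frac{\Gamma(1+0.25)}{\Gamma(0.25)} = \frac{1}{6.5625} \times 0.703125 \times 0.25 \approx 0.027$$

$$x_2 \mid x_1 = 2:\ \frac{\Gamma(0.5)}{\Gamma(1+0.5)} \cdot \frac{\Gamma(0+0.25)}{\Gamma(0.25)} \cdot \frac{\Gamma(1+0.25)}{\Gamma(0.25)} = 2 \times 1 \times 0.25 = 0.5$$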

96
C_Example 4 Calculation of Bayesian Score
  • Bayes' theorem enables the calculation of the
    ratio of the two models, M1 and M2.
  • As both models share the same a priori
    probability, P(M1) = P(M2), both probabilities
    cancel out.
  • The probability of the data, P(D), also cancels
    out in the following equation, as it appears in
    both formulas in the same position
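
The equation referred to here is not preserved in the transcript. A sketch of the ratio, assuming the usual form of Bayes' theorem and plugging in the two scores reported on the preceding slides, would be:

```latex
\frac{P(M_1 \mid D)}{P(M_2 \mid D)}
  = \frac{P(D \mid M_1)\, P(M_1) / P(D)}
         {P(D \mid M_2)\, P(M_2) / P(D)}
  = \frac{P(D \mid M_1)}{P(D \mid M_2)}
  \approx \frac{0.000324}{0.000365}
  \approx 0.89
```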

97
C_Example 4 Calculation of Bayesian Score
  • The model comparison shows that, since the ratio
    is less than 1, M2 is more probable than M1.
  • This result becomes explicit when we investigate
    the sample data more closely.
  • Even a sample this small (n = 5) shows a clear
    association between the values of x1 and x2
    (four out of five value pairs are identical).

x1  x2
1   1
1   1
2   2
1   2
1   1
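
The whole worked example can be checked with a short script. This is a minimal sketch, assuming the standard BDeu form of the score, that M1 has no arc between x1 and x2, and that M2 has the arc x1 -> x2 (as the calculation steps suggest). The names `family_score` and `model_score` are illustrative and do not come from B-Course.

```python
from collections import Counter
from math import exp, lgamma

# Sample data from the slides: (x1, x2) value pairs.
DATA = [(1, 1), (1, 1), (2, 2), (1, 2), (1, 1)]
# Possible values of x1 (index 0) and x2 (index 1).
VALUES = {0: [1, 2], 1: [1, 2]}
# Equivalent sample size: average number of values (2) divided by two.
ESS = 1.0


def family_score(data, child, parents):
    """Score factor of one variable given its parents (assumed BDeu form)."""
    r = len(VALUES[child])
    q = 1
    for p in parents:
        q *= len(VALUES[p])
    a_ij = ESS / q          # N / q_i
    a_ijk = ESS / (r * q)   # N / (r_i * q_i)
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter(
        (tuple(row[p] for p in parents), row[child]) for row in data
    )
    log_score = 0.0
    # Parent configurations that never occur contribute a factor of 1,
    # so iterating over the observed configurations is sufficient.
    for config, count in n_ij.items():
        log_score += lgamma(a_ij) - lgamma(count + a_ij)
        for value in VALUES[child]:
            log_score += lgamma(n_ijk[(config, value)] + a_ijk) - lgamma(a_ijk)
    return exp(log_score)


def model_score(data, structure):
    """Product of the family scores over all variables of a structure."""
    score = 1.0
    for child, parents in structure.items():
        score *= family_score(data, child, parents)
    return score


M1 = {0: (), 1: ()}     # assumed: x1 and x2 independent
M2 = {0: (), 1: (0,)}   # assumed: x1 -> x2

score_m1 = model_score(DATA, M1)
score_m2 = model_score(DATA, M2)
ratio = score_m1 / score_m2
print(f"M1: {score_m1:.6f}  M2: {score_m2:.6f}  ratio: {ratio:.3f}")
```

With unrounded factors the two scores come out near 0.000320 and 0.000366, a ratio of 0.875. The slides multiply factors already rounded to three decimals, which gives 0.000324 and 0.000365; the conclusion that M2 is more probable is the same.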
98
  • How many models are there?
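
The slide's answer is not preserved in the transcript. Assuming the question refers to the number of possible network structures, that is, labeled directed acyclic graphs over n variables, the count can be sketched with Robinson's recurrence. The function name `count_dags` is illustrative.

```python
from math import comb


def count_dags(n):
    """Number of labeled directed acyclic graphs on n nodes.

    Uses Robinson's recurrence:
    a(m) = sum_{k=1..m} (-1)^(k+1) * C(m, k) * 2^(k*(m-k)) * a(m-k),
    with a(0) = 1.
    """
    counts = [1]
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            sign = (-1) ** (k + 1)
            total += sign * comb(m, k) * 2 ** (k * (m - k)) * counts[m - k]
        counts.append(total)
    return counts[n]


for n in range(1, 6):
    print(n, count_dags(n))
```

For two variables this gives three structures (no arc, x1 -> x2, x2 -> x1), and the count grows super-exponentially, which is why structure learning relies on search rather than on scoring every model.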

99
For an example of practical use of BDM, see
Nokelainen and Tirri (2010).
100
Our hypothesis regarding the first research
question was that intrinsic goal orientation
(INT) is positively related to moral judgment
(Batson & Thompson, 2001; Kunda & Schwartz,
1983). It was also hypothesized, based on
Blasi's (1999) argument that emotions cannot
be predictors of moral action, that fear of
failure (affective motivational section) is not
related to moral judgment. Research evidence
showed support for both hypotheses: firstly, only
intrinsic motivation was directly (positively)
related to moral judgment, and secondly, the
affective motivational section was not present in
the predictive model.
(Nokelainen & Tirri, 2010.)
101
Conditioning on the three levels of moral judgment
showed that there is a positive statistical
relationship between moral judgment and intrinsic
goal orientation. The probability of belonging to
the highest intrinsically motivated group three
(M = 3.7-5.0) increases from 15 per cent to 90
per cent along with the moral judgment
abilities. There is also a similar but less steep
increase in extrinsic goal orientation (from 5
to 12 per cent), but we believe that it is mostly
tied to the increase in intrinsic goal orientation.
(Nokelainen & Tirri, 2010.)
102
For an example of practical use of BDM, see
Nokelainen and Tirri (2007).
103
(Nokelainen & Tirri, 2007.)
104
(Nokelainen & Tirri, 2007.)
105
2 vs. 90
21 vs. 78
EL_iv_17_49: In conflict situations, my superior
is able to draw out all parties and understand
the differing perspectives.
EL_ii_09_26: My superior sees other people in
positive rather than in negative light.
EL_ii_09_25: My superior has an optimistic
"glass half full" outlook.
(Nokelainen & Tirri, 2007.)
106
69
66
EL_iv_17_49: In conflict situations, my superior
is able to draw out all parties and understand
the differing perspectives.
EL_ii_09_26: My superior sees other people in
positive rather than in negative light.
EL_ii_09_25: My superior has an optimistic
"glass half full" outlook.
(Nokelainen & Tirri, 2007.)
107
95
85
EL_iv_17_49: In conflict situations, my superior
is able to draw out all parties and understand
the differing perspectives.
EL_ii_09_26: My superior sees other people in
positive rather than in negative light.
EL_ii_09_25: My superior has an optimistic
"glass half full" outlook.
(Nokelainen & Tirri, 2007.)
108
Outline
  • Overview
  • Introduction to Bayesian Modeling
  • Bayesian Classification Modeling
  • Bayesian Dependency Modeling
  • Bayesian Unsupervised Model-based Visualization

109
BCM: Bayesian Classification Modeling
BDM: Bayesian Dependency Modeling
BUMV: Bayesian Unsupervised Model-based Visualization
BayMiner
110
Bayesian Unsupervised Model-based Visualization
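Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis. Visualization techniques: reducing
(projection techniques) and non-reducing.
Projection techniques: linear (PCA, projection
pursuit) and non-linear (neural networks, MDS,
SOM, principal curves, ICA, BUMV).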
111
Bayesian Unsupervised Model-based Visualization
  • Supervised techniques, for example, linear
    discriminant analysis (LDA) and supervised
    Bayesian networks (BSMV, see Kontkanen, Lahtinen,
    Myllymäki, Silander, & Tirri, 2000), assume a
    given structure (Venables & Ripley, 2002, p. 301).
  • Unsupervised techniques, for example, exploratory
    factor analysis (EFA), discover variable structure
    from the evidence of the data matrix.
  • Unsupervised techniques are further divided into
    four subcategories: 1) visualization techniques,
    2) cluster analysis, 3) factor analysis, and 4)
    discrete multivariate analysis.

112
Bayesian Unsupervised Model-based Visualization
Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis.
113
Bayesian Unsupervised Model-based Visualization
  • According to Venables and Ripley (id.),
    visualization techniques are often more effective
    than clustering techniques at discovering
    interesting groupings in the data, and they avoid
    the danger of over-interpreting the results,
    as the researcher is not allowed to input the
    number of expected latent dimensions.
  • In cluster analysis the centroids that represent
    the clusters are still high-dimensional, and some
    additional illustration techniques are needed for
    visualization (Kaski, 1997), for example MDS
    (Kim, Kwon, & Cook, 2000).

114
Bayesian Unsupervised Model-based Visualization
  • Several graphical means have been proposed for
    visualizing high-dimensional data items directly,
    by letting each dimension govern some aspect of
    the visualization and then integrating the
    results into one figure.
  • These techniques can be used to visualize any
    kind of high-dimensional data vectors, either
    the data items themselves or vectors formed from
    some descriptors of the data set, such as the
    five-number summaries (Tukey, 1977).

115
Bayesian Unsupervised Model-based Visualization
  • The simplest technique to visualize a data set is
    to plot a profile of each item, that is, a
    two-dimensional graph in which the dimensions are
    enumerated on the x-axis and the corresponding
    values on the y-axis.
  • Other alternatives are scatter plots and pie
    diagrams.

116
Bayesian Unsupervised Model-based Visualization
  • The major drawback that applies to all these
    techniques is that they do not reduce the amount
    of data.
  • If the data set is large, the display consisting
    of all the data items portrayed separately will
    be incomprehensible. (Kaski, 1997.)
  • Techniques reducing the dimensionality of the
    data items are called projection techniques.

117
Bayesian Unsupervised Model-based Visualization
Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis. Visualization techniques: reducing
(projection techniques) and non-reducing.
118
Bayesian Unsupervised Model-based Visualization
  • The goal of the projection is to represent the
    input data items in a lower-dimensional space in
    such a way that certain properties of the
    structure of the data set are preserved as
    faithfully as possible.
  • The projection can be used to visualize the data
    set if a sufficiently small output dimensionality
    is chosen. (id.)
  • Projection techniques are divided into two major
    groups, linear and non-linear projection
    techniques.

119
Bayesian Unsupervised Model-based Visualization
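Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis. Visualization techniques: reducing
(projection techniques) and non-reducing.
Projection techniques: linear and non-linear.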
120
Bayesian Unsupervised Model-based Visualization
  • Linear projection techniques consist of principal
    component analysis (PCA) and projection pursuit.
  • In exploratory projection pursuit (Friedman,
    1987) the data is projected linearly, but this
    time the projection sought is the one that
    reveals as much of the non-normally distributed
    structure of the data set as possible.
  • This is done by assigning a numerical
    interestingness index to each possible
    projection, and by maximizing the index.
  • The definition of interestingness is based on how
    much the projected data deviates from normally
    distributed data in the main body of its
    distribution.
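
A toy illustration of the idea follows. It is a sketch only: the index used is the absolute excess kurtosis of the projected data, a simple stand-in for non-normality rather than Friedman's (1987) actual index, and the two-dimensional data set and the function names are made up for the example.

```python
from math import cos, radians, sin
from statistics import NormalDist


def excess_kurtosis(values):
    """Sample excess kurtosis; about 0 for normally distributed data."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


def interestingness(points, angle_deg):
    """Stand-in index: |excess kurtosis| of the 1-D projection."""
    a, b = cos(radians(angle_deg)), sin(radians(angle_deg))
    return abs(excess_kurtosis([a * x + b * y for x, y in points]))


def best_projection(points, step_deg=1):
    """Scan projection angles and return the one maximizing the index."""
    return max(
        range(0, 180, step_deg),
        key=lambda angle: interestingness(points, angle),
    )


# Made-up data: x is two-valued (clearly non-normal, two groups),
# y is close to normal and has the larger variance.
n = 200
normal = NormalDist()
points = [
    (2.0 if i % 2 == 0 else -2.0, 3.0 * normal.inv_cdf((i + 0.5) / n))
    for i in range(n)
]

best_angle = best_projection(points)
print("most interesting projection angle (degrees):", best_angle)
```

In this data the direction with the largest variance is the y-axis, which is what PCA would pick, whereas the index favors the x-axis, where the two groups are visible.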

121
Bayesian Unsupervised Model-based Visualization
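Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis. Visualization techniques: reducing
(projection techniques) and non-reducing.
Projection techniques: linear (PCA, projection
pursuit) and non-linear.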
122
Bayesian Unsupervised Model-based Visualization
  • Non-linear unsupervised projection techniques
    consist of multidimensional scaling, principal
    curves and various other techniques including
    SOM, neural networks and Bayesian unsupervised
    networks (Kontkanen, Lahtinen, Myllymäki, & Tirri,
    2000).

123
Bayesian Unsupervised Model-based Visualization
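Figure: taxonomy of techniques. Supervised: LDA,
BSMV. Unsupervised: visualization techniques,
cluster analysis, EFA, discrete multivariate
analysis. Visualization techniques: reducing
(projection techniques) and non-reducing.
Projection techniques: linear (PCA, projection
pursuit) and non-linear (neural networks, MDS,
SOM, principal curves, ICA, BUMV).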
124
Bayesian Unsupervised Model-based Visualization
  • The aforementioned PCA technique, despite its
    popularity, cannot take into account non-linear
    structures, that is, structures consisting of
    arbitrarily shaped clusters or curved manifolds,
    since it describes the data in terms of a linear
    subspace.
  • Projection pursuit tries to express some
    non-linearities, but if the data set is
    high-dimensional and highly non-linear it may be
    difficult to visualize it with linear projections
    onto a low-dimensional display even if the
    projection angle is chosen carefully (Friedman,
    1987).

125
Bayesian Unsupervised Model-based Visualization
  • Several approaches have been proposed for
    reproducing non-linear higher-dimensional
    structures on a lower-dimensional display.
  • The most common techniques allocate a
    representation for each data point in the
    lower-dimensional space and try to optimize these
    representations so that the distances between
    them would be as similar as possible to the
    original distances of the corresponding data
    items.
  • The techniques differ in how the different
    distances are weighted and how the
    representations are optimized. (Kaski, 1997.)

126
Bayesian Unsupervised Model-based Visualization
  • Multidimensional scaling (MDS) is not one
    specific tool; instead, it refers to a group of
    techniques that are widely used, especially in
    the behavioral, econometric, and social sciences,
    to analyze subjective evaluations of pairwise
    similarities of entities.
  • The starting point of MDS is a matrix consisting
    of the pairwise dissimilarities of the entities.
  • The basic idea of the MDS technique is to
    approximate the original set of distances with
    distances corresponding to a configuration of
    points in a Euclidean space.
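
The basic idea can be sketched with plain gradient descent on the raw stress, the sum of squared differences between the configuration's distances and the given dissimilarities. This is one simple variant chosen for illustration, not the algorithm of any particular MDS package; the starting configuration, step size and function names are assumptions.

```python
from math import cos, hypot, sin, sqrt


def stress(points, target):
    """Raw stress: squared error between fitted and target distances."""
    total = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = hypot(points[i][0] - points[j][0],
                         points[i][1] - points[j][1])
            total += (dist - target[i][j]) ** 2
    return total


def initial_configuration(n):
    """Arbitrary deterministic starting points on the unit circle."""
    return [[cos(i), sin(i)] for i in range(n)]


def mds(target, iterations=1000, learning_rate=0.05):
    """Fit 2-D points whose distances approximate the target matrix."""
    n = len(target)
    points = initial_configuration(n)
    for _ in range(iterations):
        grads = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                dx = points[i][0] - points[j][0]
                dy = points[i][1] - points[j][1]
                dist = hypot(dx, dy) or 1e-12  # guard against division by 0
                coeff = 2.0 * (dist - target[i][j]) / dist
                grads[i][0] += coeff * dx
                grads[i][1] += coeff * dy
        for i in range(n):
            points[i][0] -= learning_rate * grads[i][0]
            points[i][1] -= learning_rate * grads[i][1]
    return points


# Dissimilarities of four entities that sit on the corners of a unit
# square: neighbors are 1 apart, diagonal corners sqrt(2) apart.
d = sqrt(2.0)
target = [
    [0.0, 1.0, d, 1.0],
    [1.0, 0.0, 1.0, d],
    [d, 1.0, 0.0, 1.0],
    [1.0, d, 1.0, 0.0],
]

initial_stress = stress(initial_configuration(4), target)
fitted = mds(target)
final_stress = stress(fitted, target)
print(f"stress before: {initial_stress:.3f}  after: {final_stress:.6f}")
```

Only the dissimilarity matrix is given to the procedure; the recovered points are determined up to rotation, reflection and translation.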

127
Bayesian Unsupervised Model-based Visualization
  • MDS can be considered to be an alternative to
    factor analysis.
  • In general, the goal of the analysis is to detect
    meaningful underlying dimensions that allow the
    researcher to explain observed similarities or
    dissimilarities (distances) between the
    investigated objects.
  • In factor analysis, the similarities between
    objects (e.g., variables) are expressed in the
    correlation matrix.

128
Bayesian Unsupervised Model-based Visualization
  • With MDS we may analyze any kind of similarity or
    dissimilarity matrix, in addition to correlation
    matrices, specifying that we want to reproduce
    the distances based on n dimensions.
  • After the matrix has been formed, MDS attempts to
    arrange objects (e.g., factors of growth-oriented
    atmosphere) in a space with a particular number
    of dimensions so as to reproduce the observed
    distances.
  • As a result, the distances are explained in terms
    of underlying dimensions.

129
Bayesian Unsupervised Model-based Visualization
  • MDS techniques based on Euclidean distance
    generally do not properly reflect the properties
    of complex problem domains.
  • In real-world situations the similarity of two
    vectors is not a universal property: from
    different points of view they may, in the end,
    appear quite dissimilar (Kontkanen, Lahtinen,
    Myllymäki, Silander, & Tirri, 2000).
  • Another problem with the MDS techniques is that
    they are computationally very intensive for large
    data sets.

130
Bayesian Unsupervised Model-based Visualization
  • Bayesian unsupervised model-based visualization
    (BUMV) is based on Bayesian Networks (BN).
  • A BN is a representation of a probability
    distribution over a set of random variables,
    consisting of a directed acyclic graph (DAG),
    where the nodes correspond to domain variables,
    and the arcs define a set of independence
    assumptions which allow the joint probability
    distribution for a data vector to be factorized
    as a product of simple conditional probabilities.
    Two vectors are considered similar if they lead
    to similar predictions when given as input to
    the same Bayesian network model. (Kontkanen,
    Lahtinen, Myllymäki, Silander, & Tirri, 2000.)
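
The factorization can be illustrated with a very small network. The structure (A -> B, A -> C) and all probability values below are hypothetical and serve only to show how the joint probability of a data vector is computed as a product of conditional probabilities.

```python
from itertools import product

# Hypothetical DAG: A is a parent of both B and C.
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}

# Hypothetical conditional probability tables:
# CPT[variable][parent configuration][value] = probability.
CPT = {
    "A": {(): {0: 0.6, 1: 0.4}},
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},
    "C": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.5, 1: 0.5}},
}


def joint_probability(assignment):
    """P(A, B, C) as the product of P(variable | its parents)."""
    prob = 1.0
    for var, parents in PARENTS.items():
        config = tuple(assignment[p] for p in parents)
        prob *= CPT[var][config][assignment[var]]
    return prob


# P(A=1, B=1, C=0) = P(A=1) * P(B=1 | A=1) * P(C=0 | A=1)
example = joint_probability({"A": 1, "B": 1, "C": 0})

# The factorized joint distribution sums to one over all data vectors.
total = sum(
    joint_probability(dict(zip("ABC", values)))
    for values in product([0, 1], repeat=3)
)
print(example, total)
```

Three small tables (2 + 4 + 4 entries) stand in for the full joint table of 8 entries; the saving grows quickly with the number of variables when each node has few parents.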

131
Bayesian Unsupervised Model-based Visualization
  • Naturally, there are numerous viable alternatives
    to BUMV, such as the Self-Organizing Map (SOM)
    and Independent Component Analysis (ICA).
  • SOM is a neural network algorithm that has been
    used for a wide variety of applications, mostly
    for engineering problems but also for data
    analysis (Kohonen, 1995).
  • SOM is based on a neighborhood-preserving
    topological map tuned according to the geometric
    properties of sample vectors.
  • ICA minimizes the statistical dependence of the
    components, trying to find a transformation in
    which the components are as statistically
    independent as possible (Hyvärinen & Oja, 2000).
  • The usage of ICA is comparable to PCA where the
    aim is to present the data in a manner that
    facilitates further analysis.

132
Bayesian Unsupervised Model-based Visualization
  • The first major difference between Bayesian and
    neural network approaches for an educational
    science researcher is that the former operates
    with a familiar symmetrical probability range
    from 0 to 1, while the upper limit of the
    asymmetrical probability scale in the latter
    approach is unknown.
  • The second fundamental difference between the two
    types of networks is that a perceptron in the
    hidden layers of neural networks does not in
    itself have an interpretation in the domain of
    the system, whereas all the nodes of a Bayesian
    network represent concepts that are well defined
    with respect to the domain (Jensen, 1995).

133
Bayesian Unsupervised Model-based Visualization
  • The meaning of a node and its probability table
    can be subject to discussion, regardless of their
    function in the network, but it does not make any
    sense to discuss the meaning of the nodes and the
    weights in a neural network. Perceptrons in the
    hidden layers only have a meaning in the context
    of the functionality of the network.
  • Construction of a Bayesian network requires
    detailed knowledge of the domain in question.
  • If such knowledge can only be obtained through a
    series of examples (i.e., a database of cases),
    neural networks seem to be an easier approach.
    This might be true in cases such as the reading
    of handwritten letters, face recognition, and
    other areas where the activity is a
    'craftsman-like' skill based solely on experience.

(Jensen, 1995.)
134
Bayesian Unsupervised Model-based Visualization
  • A common criticism is that in order to construct
    a Bayesian network you have to know too many
    probabilities.
  • However, there is not a considerable difference
    between this number and the number of weights and
    thresholds that have to be known in order to
    build a neural network, and these can only be
    learnt by training.
  • A weakness of neural networks is that you are
    unable to utilize the knowledge you might have in
    advance.
  • Probabilities, on the other hand, can be assessed
    using a combination of theoretical insight,
    empirical studies independent of the constructed
    system, training, and various more or less
    subjective estimates.

(Jensen, 1995.)
135
Bayesian Unsupervised Model-based Visualization
  • In the construction of a neural network, it is
    decided in advance about which relations
    information is gathered, and which relations the
    system is expected to compute (the route of
    inference is fixed).
  • Bayesian networks are much more flexible in that
    respect.

(Jensen, 1995.)
136
For an example of practical use of BUMV, see
Nokelainen and Ruohotie (2009).
137
Results showed that managers and teachers had
higher growth motivation and level of commitment
to work than other personnel, including job
titles such as cleaner, caretaker, accountant and
computer support. Employees across all job
titles in the organization, who have temporary or
part-time contracts, had higher self-reported
growth motivation and commitment to work and
organization than their established colleagues.
139
Links
  • B-Course: http://b-course.cs.helsinki.fi
  • BayMiner: http://www.bayminer.com

140
References
  • Anderson, J. (1995). Cognitive Psychology and its
    Implications. New York: Freeman.
  • Bayes, T. (1763). An essay towards solving a
    problem in the doctrine of chances. Philosophical
    Transactions of the Royal Society, 53, 370-418.
  • Bernardo, J., & Smith, A. (2000). Bayesian
    theory. New York: Wiley.
  • Congdon, P. (2001). Bayesian Statistical
    Modelling. Chichester: John Wiley & Sons.
  • Friedman, J. (1987). Exploratory Projection
    Pursuit. Journal of American Statistical
    Association, 82, 249-266.
  • Gigerenzer, G. (2000). Adaptive thinking. New
    York: Oxford University Press.
  • Gigerenzer, G., Krauss, S., & Vitouch, O. (2004).
    The null ritual: What you always wanted to know
    about significance testing but were afraid to
    ask. In D. Kaplan (Ed.), The SAGE handbook of
    quantitative methodology for the social sciences
    (pp. 391-408). Thousand Oaks: Sage.

141
References
  • Gill, J. (2002). Bayesian methods: A Social and
    Behavioral Sciences Approach. Boca Raton: Chapman
    & Hall/CRC.
  • Heckerman, D., Geiger, D., & Chickering, D.
    (1995). Learning Bayesian networks: The
    combination of knowledge and statistical data.
    Machine Learning, 20(3), 197-243.
  • Hilario, M., Kalousis, A., Prados, J., & Binz,
    P.-A. (2004). Data mining for mass-spectra based
    diagnosis and biomarker discovery. Drug Discovery
    Today: BIOSILICO, 2(5), 214-222.
  • Huberty, C. (1994). Applied Discriminant
    Analysis. New York: John Wiley & Sons.
  • Hyvärinen, A., & Oja, E. (2000). Independent
    Component Analysis: Algorithms and Applications.
    Neural Networks, 13(4-5), 411-430.
  • Jensen, F. V. (1995). Paradigms of Expert
    Systems. HUGIN Lite 7.4 User Manual.

142
References
  • Kaski, S. (1997). Data exploration using
    self-organizing maps. Doctoral dissertation. Acta
    Polytechnica Scandinavica, Mathematics, Computing
    and Management in Engineering Series No. 82.
    Espoo: Finnish Academy of Technology.
  • Kim, S., Kwon, S., & Cook, D. (2000). Interactive
    Visualization of Hierarchical Clusters Using MDS
    and MST. Metrika, 51(1), 39-51.
  • Kohonen, T. (1995). Self-Organizing Maps. Berlin:
    Springer.
  • Kontkanen, P., Lahtinen, J., Myllymäki, P.,
    Silander, T., & Tirri, H. (2000). Supervised
    Model-based Visualization of High-dimensional
    Data. Intelligent Data Analysis, 4, 213-227.
  • Kontkanen, P., Lahtinen, J., Myllymäki, P., &
    Tirri, H. (2000). Unsupervised Bayesian
    Visualization of High-Dimensional Data. In R.
    Ramakrishnan, S. Stolfo, R. Bayardo, & I. Parsa
    (Eds.), Proceedings of the Sixth International
    Conference on Knowledge Discovery and Data Mining
    (pp. 325-329). New York, NY: The Association for
    Computing Machinery.

143
References
  • Lavine, M. L. (1999). What is Bayesian Statistics
    and Why Everything Else is Wrong. The Journal of
    Undergraduate Mathematics and Its Applications,
    20, 165-174.
  • Lindley, D. V. (1971). Making Decisions. London:
    Wiley.
  • Lindley, D. V. (2001). Harold Jeffreys. In C. C.
    Heyde & E. Seneta (Eds.), Statisticians of the
    Centuries (pp. 402-405). New York: Springer.
  • Murphy, K. R., & Myors, B. (1998). Statistical
    Power Analysis: A Simple and General Model for
    Traditional and Modern Hypothesis Tests. Mahwah,
    NJ: Lawrence Erlbaum Associates.
  • Myllymäki, P., Silander, T., Tirri, H., & Uronen,
    P. (2002). B-Course: A Web-Based Tool for
    Bayesian and Causal Data Analysis. International
    Journal on Artificial Intelligence Tools, 11(3),
    369-387.
  • Myllymäki, P., & Tirri, H. (1998).
    Bayes-verkkojen mahdollisuudet [Possibilities of
    Bayesian Networks]. Teknologiakatsaus 58/98.
    Helsinki: TEKES.

144
References
  • Neapolitan, R. E., & Morris, S. (2004).
    Probabilistic Modeling Using Bayesian Networks.
    In D. Kaplan (Ed.), The SAGE handbook of
    quantitative methodology for the social sciences
    (pp. 371-390). Thousand Oaks, CA: Sage.
  • Nokelainen, P. (2008). Modeling of Professional
    Growth and Learning: Bayesian Approach. Tampere:
    Tampere University Press.
  • Nokelainen, P., & Ruohotie, P. (2009).
    Investigating Growth Prerequisites in a Finnish
    Polytechnic for Higher Education. Journal of
    Workplace Learning, 21(1), 36-57.
  • Nokelainen, P., Silander, T., Ruohotie, P., &
    Tirri, H. (2007). Investigating the Number of
    Non-linear and Multi-modal Relationships Between
    Observed Variables Measuring A Growth-oriented
    Atmosphere. Quality & Quantity, 41(6), 869-890.
  • Nokelainen, P., & Tirri, K. (2007). Empirical
    Investigation of Finnish School Principals'
    Emotional Leadership Competencies. In S. Saari &
    T. Varis (Eds.), Professional Growth (pp.
    424-438). Hämeenlinna: RCVE.

145
References
  • Nokelainen, P., Ruohotie, P., & Tirri, H. (1999).
    Professional Growth Determinants-Comparing
    Bayesian and Linear Approaches to Classification.
    In P. Ruohotie, H. Tirri, P. Nokelainen, & T.
    Silander (Eds.), Modern Modeling of Professional
    Growth, vol. 1 (pp. 85-120). Hämeenlinna: RCVE.
  • Nokelainen, P., & Tirri, K. (2010). Role of
    Motivation in the Moral and Religious Judgment of
    Mathematically Gifted Adolescents. High Ability
    Studies, 21(2), 101-116.
  • Nokelainen, P., Tirri, K., Campbell, J. R., &
    Walberg, H. (2004). Cross-cultural Factors that
    Account for Adult Productivity. In J. R.
    Campbell, K. Tirri, P. Ruohotie, & H. Walberg
    (Eds.), Cross-cultural Research: Basic Issues,
    Dilemmas, and Strategies (pp. 119-139).
    Hämeenlinna: RCVE.
  • Nokelainen, P., Tirri, K., & Merenti-Välimäki,
    H.-L. (2007). Investigating the Influence of
    Attribution Styles on the Development of
    Mathematical Talent. Gifted Child Quarterly,
    51(1), 64-81.
  • Pylväs, L., Nokelainen, P., & Roisko, H. (in
    press). Modeling of Vocational Excellence in Air
    Traffic Control. Submitted for review.

146
References
  • Tirri, H. (1997). Plausible Prediction by
    Bayesian Inference. Department of Computer
    Science. Series of Publications A. Report
    A-1997-1. Helsinki: University of Helsinki.
  • Tukey, J. (1977). Exploratory Data Analysis.
    Reading, MA: Addison-Wesley.
  • Venables, W. N., & Ripley, B. D. (2002). Mo