Statistical Relational Learning - PowerPoint PPT Presentation
1
Statistical Relational Learning

2
Acknowledgements
  • Lise Getoor, Nir Friedman, Daphne Koller, Ben
    Taskar, Avi Pfeffer, David Jensen, Pedro
    Domingos, Indrajit Bhattacharya and many others

3
Why SRL?
  • Probabilistic graphical models (particularly
    Bayesian networks) have been shown to be a useful
    way of representing statistical patterns in
    real-world domains
  • Probabilistic relational models (PRMs) are a
    recent development that extends the standard
    attribute-based Bayesian network representation
    to incorporate a much richer relational structure
  • They allow the specification of a probability
    model for classes of objects rather than simple
    attributes
  • They allow properties of an entity to depend
    probabilistically on properties of other related
    entities

4
An example
  • A simple model of the performance of a student in
    a course
  • Random variables:
  • Course difficulty {high, medium, low}
  • Student intelligence {high, low}
  • Understands material {yes, no}
  • Good test taker {yes, no}
  • Homework grade {A, B, C, D, E}
  • Exam grade {A, B, C, D, E}

5
Complete joint probability distribution
  • Must specify a probability for each of the
    exponentially many different instantiations of
    the set of variables
  • P(I,D,G,U,E,H) must consider every possible
    assignment of values to these variables
  • 2 × 2 × 2 × 3 × 5 × 5 = 600
  • The naïve representation of the joint
    distribution is infeasible
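The count above is just the product of the domain sizes; a quick sketch (variable names are ours, matching the student example):

```python
from math import prod

# Domain sizes for the six variables in the student example:
# Intelligence (2), Good test taker (2), Understands (2),
# Difficulty (3), Homework grade (5), Exam grade (5)
domain_sizes = {"I": 2, "T": 2, "U": 2, "D": 3, "H": 5, "E": 5}

n_entries = prod(domain_sizes.values())  # entries in the full joint table
print(n_entries)  # 600
```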

6
Bayesian networks
  • Key insight: each variable is directly influenced
    by only a few others
  • Probabilistic conditional independence: each node
    is conditionally independent of its
    non-descendants given values for its parents
  • Associate with each node a conditional
    probability distribution (CPD), which specifies
    for each node X the probability distribution over
    the values of X given each combination of values
    for its parents, denoted Pa(X)
  • The joint distribution can be factorized into a
    product of the CPDs of all the variables
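As an illustration of this factorization, here is a minimal three-node chain network whose joint is computed as a product of CPD entries (the network and the numbers are invented for illustration, not taken from the slides):

```python
# Minimal Bayesian-network sketch (hypothetical chain A -> B -> C):
# the joint factorizes as P(A) * P(B|A) * P(C|B).
cpds = {
    "A": {(): {0: 0.6, 1: 0.4}},                            # no parents
    "B": {(0,): {0: 0.7, 1: 0.3}, (1,): {0: 0.2, 1: 0.8}},  # parent: A
    "C": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.4, 1: 0.6}},  # parent: B
}
parents = {"A": (), "B": ("A",), "C": ("B",)}

def joint(assignment):
    """P(assignment) as a product of CPD entries, one per node."""
    p = 1.0
    for var, table in cpds.items():
        pa_vals = tuple(assignment[pa] for pa in parents[var])
        p *= table[pa_vals][assignment[var]]
    return p

print(joint({"A": 1, "B": 1, "C": 0}))  # 0.4 * 0.8 * 0.4 = 0.128
```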

7
Bayesian networks
  • Bayesian networks provide a compact
    representation of complex joint distributions

8
However
  • Bayesian networks are often inadequate for
    properly modeling aspects of complex relational
    domains
  • A Bayesian network for a given domain involves a
    pre-specified set of random variables, whose
    relationship to each other is fixed in advance
  • They cannot deal with domains where we may
    encounter several entities in a variety of
    configurations, because they lack the concept
    of an object (or domain entity)

If we treat the circles as random variables, then
how do we handle the AVG dependence?
9
Introduction to PRMs
  • PRMs extend Bayesian networks with the concepts
    of individuals, their properties, and the
    relations between them
  • The relational framework is motivated primarily
    by the concepts of relational databases
  • Relational database vs. PRM: schema and instance

10
PRMs definition Relational schema
  • A schema for a relational model describes a set
    of classes, X1, X2, ..., Xn
  • The domain entities of a class are called objects
  • Each class is associated with a set of
    descriptive attributes and a set of reference
    slots
  • Each class corresponds to a single table
  • Descriptive attributes correspond to standard
    attributes in the table
  • Reference slots correspond to attributes that are
    foreign keys
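Since each class corresponds to a table whose descriptive attributes are plain fields and whose reference slots are foreign keys, the school schema can be sketched in code (a hypothetical rendering for illustration, not part of the PRM literature):

```python
from dataclasses import dataclass

# Hypothetical sketch: descriptive attributes are plain fields;
# reference slots hold objects of the appropriate range class.
@dataclass
class Professor:
    popularity: str          # descriptive attribute

@dataclass
class Course:
    difficulty: str          # descriptive attribute
    instructor: Professor    # reference slot (foreign key)

@dataclass
class Student:
    intelligence: str        # descriptive attribute

@dataclass
class Registration:
    grade: str               # descriptive attribute
    course: Course           # reference slot
    student: Student         # reference slot

prof = Professor(popularity="high")
cs101 = Course(difficulty="low", instructor=prof)
jane = Student(intelligence="high")
reg = Registration(grade="A", course=cs101, student=jane)
print(reg.course.instructor.popularity)  # follows two reference slots
```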

11
A more complex school example
  • The rectangles are classes
  • The underlined attributes are reference slots and
    the others are descriptive attributes

Each single table is a class
Foreign keys are reference slots
Standard attributes are descriptive attributes
12
Another movie example
  • Movies and actors

Each single table is a class
Foreign keys are reference slots
Standard attributes are descriptive attributes
13
Descriptive attributes
  • The set of descriptive attributes of a class X is
    denoted A(X); attribute A of class X is denoted
    X.A, and its set of values is V(X.A)
  • Examples:
  • A(Student) = {Intelligence, Ranking}
  • V(Student.Intelligence) = {high, low}
  • V(Actor.Gender) = {male, female}

14
Reference slots
  • The set of reference slots of a class X is
    denoted R(X); we use X.ρ to denote the reference
    slot ρ of X
  • The domain type: Dom[ρ] = X
  • The range type: Range[ρ] = Y, where Y is some
    class in the schema
  • Examples:
  • R(Registration) = {Course, Student}
  • Range[Course.Instructor] = Professor
  • R(Role) = {Actor, Movie}
  • Range[Role.Actor] = Actor

15
Inverse slots and slot chains
  • For each reference slot ρ, we can define an
    inverse slot ρ⁻¹, interpreted as the inverse
    function of ρ
  • We define a slot chain τ = ρ1, ..., ρk to be a
    sequence of slots such that for all i,
    Range[ρi] = Dom[ρi+1]. The slot chain allows us
    to compose slots, defining functions from objects
    to other indirectly related objects
  • Examples:
  • The inverse slot for the Student slot of
    Registration is called Registered-In
  • Student.Registered-In.Course.Instructor can be
    used to denote a student's set of instructors
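A small sketch of an inverse slot and a slot chain over hypothetical Registration objects (all names invented):

```python
# Each Registration has 'student' and 'course' slots; the inverse slot
# Registered-In maps a student back to the registrations referencing it.
registrations = [
    {"student": "Jane", "course": "CS101"},
    {"student": "Jane", "course": "CS102"},
    {"student": "Bob",  "course": "CS101"},
]
instructors = {"CS101": "Prof-Smith", "CS102": "Prof-Jones"}

def registered_in(student):
    """Inverse of the Student slot of Registration."""
    return [r for r in registrations if r["student"] == student]

def instructors_of(student):
    """Slot chain Student.Registered-In.Course.Instructor (a multi-set)."""
    return {instructors[r["course"]] for r in registered_in(student)}

print(sorted(instructors_of("Jane")))  # ['Prof-Jones', 'Prof-Smith']
```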

16
Schema Instance
  • An instance I of a schema specifies:
  • A set of objects x, partitioned into classes
  • A value for each descriptive attribute x.A
  • A value for each reference slot x.ρ, which is an
    object of the appropriate range type
  • A complete instantiation I is a set of objects
    with no missing values and no dangling references
  • School example:
  • One Professor
  • Two Classes
  • Three Registrations
  • Two Students

17
PRMs definition relational skeleton
  • A relational skeleton σ of a relational schema is
    a partial specification of an instance of the
    schema
  • It specifies the set of objects for each class
    and the relations (dependency structure) that
    hold between objects (similar to a Bayesian
    network structure)
  • It leaves the values of the attributes
    unspecified
  • A PRM will specify a probability distribution
    over all complete instantiations that extend
    the skeleton

18
PRMs definition relational skeleton
  • The dependency structure of a relational skeleton
    is defined by associating with each attribute
    X.A a set of formal parents Pa(X.A)
  • X.A can depend on another probabilistic attribute
    B of X
  • X.A can also depend on attributes of related
    objects X.τ.C, where τ is a slot chain
  • The class-level dependencies are instantiated
    according to the relational skeleton, to define
    the object-level dependencies
  • We use σ(X) to refer to the set of objects of
    class X
  • Let x be some object in σ(X): for a formal parent
    X.B, the actual parent of x.A is x.B
  • For a formal parent X.τ.C, the actual parents of
    x.A are the attributes y.C, where y belongs to
    x.τ

19
Differences to Bayesian networks
  • The PRM defines the dependency model at the class
    level, allowing it to be used for any object in
    the class. The class dependency model is
    universally quantified and instantiated for every
    object in the class domain
  • The PRM explicitly uses the relational structure
    of the model, in that it allows the probabilistic
    model of an attribute of an object to depend also
    on attributes of related objects. The specific
    set of related objects can vary with the
    relational skeleton σ

20
Definition of PRM
  • A probabilistic relational model (PRM) Π for a
    relational schema S defines, for each class X in
    S and each descriptive attribute A ∈ A(X), a set
    of formal parents Pa(X.A) and a conditional
    probability distribution (CPD) that represents
    P(X.A | Pa(X.A))
  • A PRM consists of two components: the qualitative
    dependency structure S, and the set of parameters
    θS associated with it. For the basic PRM with
    attribute uncertainty, we assume that the
    relational skeleton σr is given

21
Qualitative structure
  • The qualitative structure of the network is
    defined via an instance dependency graph Gσ,
    whose nodes correspond to the descriptive
    attributes x.A of objects in the skeleton, and
    whose edges correspond to the direct attribute
    dependences and the slot-chain dependences
  • Note that the slot chain x.τ might be
    multi-valued, so we must specify the
    probabilistic dependence of x.A on the multi-set
    {y.B : y ∈ x.τ}
  • It is impractical to provide a dependency model
    for each of the unboundedly many possible
    multi-set sizes
  • We instead use an aggregate function and define a
    dependence on the computed aggregate value

22
Aggregate function
  • The dependence of x.A on x.τ.B is interpreted as
    a probabilistic dependence of x.A on some
    aggregate property of this multi-set
  • There are many natural and useful notions of
    aggregate: mean, median, maximum, cardinality,
    etc.
  • We allow X.A to have a parent γ(X.τ.B). The
    semantics is that for any x ∈ X, x.A will depend
    on the value of γ(x.τ.B). We define V(γ(X.τ.B))
    to be the set of possible values of this
    aggregate
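A minimal sketch of the aggregates mentioned above, applied to a hypothetical multi-set of related values (here, grades mapped to numbers):

```python
from statistics import mean, median

# Sketch: the parent of x.A is an aggregate of the multi-set of
# related values {y.B : y in x.tau} (values below are invented).
related_grades = [4, 4, 3, 2]

aggregates = {
    "mean": mean(related_grades),          # 3.25
    "median": median(related_grades),      # 3.5
    "maximum": max(related_grades),        # 4
    "cardinality": len(related_grades),    # 4
}
print(aggregates)
```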

23
Parameters
  • A PRM associates a CPD with each attribute of
    each class. As for dependencies, we assume that
    the parameters are shared by every object in the
    class
  • The school example:
  • P(G | D, I)
  • P(R | avg(G)) (an aggregate is used here)

24
PRM semantics
  • Given a skeleton σr, we have a set of random
    variables of interest. The set of random
    variables for σr is the set of attributes of the
    form x.A, where x ∈ σ(Xi) and A ∈ A(Xi) for some
    class Xi
  • The PRM specifies a probability distribution over
    the possible joint assignments of values to these
    random variables. It basically defines a ground
    Bayesian network
  • This ground Bayesian network leads to the
    following chain rule, which defines a
    distribution over the instantiations compatible
    with the skeleton σr

P(I | σr, S, θS) = ∏Xi ∏x∈σ(Xi) ∏A∈A(Xi) P(x.A | Pa(x.A))
(the products run over each class, each object, and
each attribute)
25
Differences to Bayesian networks
  • Our random variables are the attributes of a set
    of objects
  • The set of parents of a random variable can vary
    according to the relational context of the
    object: the set of objects to which it is related
  • The school example:

The attribute Grade of the class Registration
depends on the attribute Intelligence of the
class Student.
For the Registration object 5639, it references
Jane-Doe.Intelligence.
But for some other Registration objects, it
might reference Bob.Intelligence or
Tony.Intelligence
26
Coherent probability distribution
  • We have to ensure that the resulting function
    from instances to numbers does indeed define a
    coherent probability distribution, in which the
    probabilities of all instances sum to 1
  • In a Bayesian network, this requirement is
    satisfied if the dependency graph is acyclic: a
    variable is not an ancestor of itself
  • We need to check whether a dependency structure S
    is acyclic relative to a fixed skeleton σ

27
Dependency graph
  • A stronger guarantee: an acyclic class dependency
    graph
  • The class dependency graph has an edge from Y.B
    to X.A if either X = Y and X.B is a parent of
    X.A, or γ(X.τ.B) is a parent of X.A and
    Range[X.τ] = Y
  • It is clear that if the class dependency graph is
    acyclic, we can never have x.A depending on
    itself. The school example
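Acyclicity of the class dependency graph can be checked with a standard depth-first search; a sketch over a hypothetical edge set (attribute names invented for the school example):

```python
# Edges Y.B -> X.A of a hypothetical class dependency graph.
edges = {
    "Student.Intelligence": ["Registration.Grade"],
    "Course.Difficulty": ["Registration.Grade"],
    "Registration.Grade": ["Student.Ranking"],  # via an aggregate
}

def is_acyclic(edges):
    """DFS three-color cycle check: GRAY node revisited => cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}
    def dfs(u):
        color[u] = GRAY
        for v in edges.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and not dfs(v)):
                return False
        color[u] = BLACK
        return True
    nodes = set(edges) | {v for vs in edges.values() for v in vs}
    return all(dfs(n) for n in nodes if color.get(n, WHITE) == WHITE)

print(is_acyclic(edges))  # True: no attribute is its own ancestor
```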

28
However: another example
  • Blood test
  • A cycle in the class dependency graph does not
    imply that all skeletons induce cyclic instance
    dependencies
  • Although the model appears to be cyclic at the
    class level, we know that the cyclicity is always
    resolved at the level of individual objects

29
Learning PRMs
  • Input:
  • A relational schema, which specifies the basic
    vocabulary in the domain: the set of classes,
    the attributes associated with different classes,
    and the possible types of relations between
    objects in the different classes
  • The training data consist of a fully specified
    instance of that schema
  • Learning tasks:
  • Parameter estimation
  • Structure learning

30
Parameter estimation
  • We assume that the qualitative dependency
    structure S is known
  • The key ingredient in parameter estimation is the
    likelihood function: the probability of the data
    given the model
  • We then perform maximum likelihood estimation
    (MLE) to find the parameter setting θS that
    maximizes the likelihood L(θS | I, σ, S) for the
    given I, σ, and S. The maximum likelihood model
    is the model that best predicts the training data
  • We can also take a Bayesian approach to parameter
    estimation by incorporating parameter priors
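Because parameters are shared across all objects of a class, MLE reduces to pooled counting; a sketch for the CPD P(Grade | Difficulty, Intelligence) over invented data:

```python
from collections import Counter

# Hypothetical training tuples (difficulty, intelligence, grade),
# pooled over every Registration object in the class.
data = [
    ("low", "high", "A"), ("low", "high", "A"), ("low", "low", "B"),
    ("high", "high", "B"), ("high", "low", "C"), ("high", "low", "C"),
]
joint = Counter(((d, i), g) for d, i, g in data)      # counts N[d,i,g]
context = Counter((d, i) for d, i, _ in data)          # counts N[d,i]

def p_grade(g, d, i):
    """Maximum-likelihood estimate of P(G=g | D=d, I=i)."""
    return joint[((d, i), g)] / context[(d, i)]

print(p_grade("A", "low", "high"))  # 2/2 = 1.0
```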

31
Structure learning
  • Hypothesis space:
  • Specify which structures are legal candidate
    hypotheses: acyclic dependency graphs
  • Scoring structures:
  • Bayesian score: this score is composed of the
    prior probability of the structure and the
    posterior probability of the structure given the
    data
  • Structure search:
  • Hill-climbing search
  • A heuristic search algorithm
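A minimal sketch of greedy hill-climbing over edge additions (the score function here is an invented toy stand-in for the Bayesian score):

```python
def hill_climb(candidates, score):
    """Greedily add the first edge that improves the score; repeat
    until no single addition helps."""
    current = frozenset()
    best = score(current)
    improved = True
    while improved:
        improved = False
        for edge in candidates - current:
            trial = current | {edge}
            s = score(trial)
            if s > best:
                current, best, improved = trial, s, True
                break
    return current

# Toy score rewarding two specific edges (stands in for the data score).
good = {"P.A->R.M", "R.M->A.W"}
score = lambda struct: len(struct & good) - 0.1 * len(struct - good)
print(sorted(hill_climb({"P.A->R.M", "R.M->A.W", "X->Y"}, score)))
# ['P.A->R.M', 'R.M->A.W']
```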

32
A heuristic search algorithm
Phase 0: consider only dependencies within a class
(classes: Author, Review, Paper)
33
A heuristic search algorithm
Phase 1: consider dependencies from neighboring
classes, via schema relations
(classes: Author, Review, Paper; e.g. add the edge
P.A → R.M and evaluate the change in score)
34
A heuristic search algorithm
Phase 2: consider dependencies from further
classes, via relation chains
(classes: Author, Review, Paper; e.g. add the edge
R.M → A.W and evaluate the change in score)
35
Limitations of BNs
  • In a BN, each instance has its own dependency
    model, so the model cannot generalize over
    instances
  • If John tends to like sitcoms, he will probably
    like next season's offerings
  • Whether a person enjoys sitcom reruns depends on
    whether they watch primetime sitcoms
  • A BN can only model relationships between at most
    one class of instances at a time
  • In the previous model, we cannot model
    relationships between people
  • If my roommate watches Seinfeld, I am more likely
    to join in

36
PRM Summary
  • PRMs inherit key advantages of probabilistic
    graphical models:
  • Coherent probabilistic semantics
  • Exploit the structure of local interactions
  • Relational models are inherently more expressive
  • Web of influence: use multiple sources of
    information to reach conclusions
  • Exploit both relational information and the power
    of probabilistic reasoning

37
Probabilistic Relational Models (PRMs)
  • Developed by Daphne Koller's group at Stanford
  • Representation: Avi Pfeffer
  • Builds on work in KBMC (knowledge-based model
    construction) by Haddawy, Poole, Wellman and
    others
  • Object Oriented Bayesian Networks
  • Relational Probability Models
  • Learning: Lise Getoor, Nir Friedman, Avi
  • Attribute Uncertainty
  • Structural Uncertainty
  • Class Uncertainty
  • Identity Uncertainty
  • Undirected models: Ben Taskar
  • Reference:
  • Learning Probabilistic Models of Link Structure.
    Lise Getoor, Nir Friedman, Daphne Koller,
    Benjamin Taskar. Journal of Machine Learning
    Research, Volume 3, pages 679-707, 2002

38
Families of SRL Approaches
  • Frame-based Probabilistic Models
  • Probabilistic Relational Models (PRMs),
  • Probabilistic Entity Relation Models (PERs),
  • Object Oriented Bayesian Networks (OOBNs)
  • First Order Probabilistic Logic (FOPL)
  • BLOGs
  • Relational Markov Logic (RML)
  • Stochastic Functional Programs
  • PRISM
  • Stochastic Logic Programs (SLPs)
  • IBAL

39
Conclusion
  • Statistical Relational Learning:
  • Supports multi-relational, heterogeneous domains
  • Supports noisy, uncertain, non-IID data
  • a.k.a. real-world data!
  • Different approaches:
  • rule-based vs. frame-based
  • directed vs. undirected
  • Many common issues:
  • Need for collective classification and
    consolidation
  • Need for aggregation and combining rules
  • Need to handle labeled and unlabeled data
  • Need to handle structural uncertainty
  • etc.
  • A great opportunity for combining machine
    learning for hierarchical statistical models with
    probabilistic databases, which can efficiently
    store, query, and update models

40
Recent SRL Activities
  • Dagstuhl Workshop on Probabilistic, Logical and
    Relational Learning - Towards a Synthesis
    http://www.dagstuhl.de/05051/
  • ICML 2004 workshop on Statistical Relational
    Learning and its Connections to Other Fields
    http://www.cs.umd.edu/projects/srl2004/
  • IJCAI 2003 workshop on Statistical Relational
    Learning
    http://kdl.cs.umass.edu/srl2003/
  • AAAI 2000 workshop on Statistical Relational
    Learning
    http://robotics.stanford.edu/srl
  • Several related workshops:
  • KDD MRDM workshops
  • http://www-ai.ijs.si/SasoDzeroski/MRDM2004/
  • http://www-ai.ijs.si/SasoDzeroski/MRDM2003/
  • http://www-ai.ijs.si/SasoDzeroski/MRDM2002/