1
Machine Learning For the Web: A Unified View
  • Pedro Domingos
  • Dept. of Computer Science & Eng.
  • University of Washington
  • Includes joint work with Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner, and Jue Wang

2
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

3
Web Learning Problems
  • Hypertext classification
  • Search ranking
  • Personalization
  • Recommender systems
  • Wrapper induction
  • Information extraction
  • Information integration
  • Deep Web
  • Semantic Web
  • Ad placement
  • Content selection
  • Auctions
  • Social networks
  • Mass collaboration
  • Spam filtering
  • Reputation systems
  • Performance optimization
  • Etc.

4
Machine Learning Solutions
  • Naïve Bayes
  • Logistic regression
  • Max. entropy models
  • Bayesian networks
  • Markov random fields
  • Log-linear models
  • Exponential models
  • Gibbs distributions
  • Boltzmann machines
  • ERGMs
  • Hidden Markov models
  • Cond. random fields
  • SVMs
  • Neural networks
  • Decision trees
  • K-nearest neighbor
  • K-means clustering
  • Mixture models
  • LSI
  • Etc.

5
How Do We Make Sense of This?
  • Does a practitioner have to learn all the
    algorithms?
  • And figure out which one to use each time?
  • And which variations to try?
  • And how to frame the problem as ML?
  • And how to incorporate his/her knowledge?
  • And how to glue the pieces together?
  • And start from scratch each time?
  • There must be a better way

6
Characteristics of Web Problems
  • Samples are not i.i.d. (objects depend on each other)
  • Objects have lots of structure (or none at all)
  • Multiple problems are tied together
  • Massive amounts of data (but unlabeled)
  • Rapid change
  • Too many opportunities . . . and not enough experts

7
We Need a Language
  • That allows us to easily define standard models
  • That provides a common framework
  • That is automatically compiled into learning and
    inference code that executes efficiently
  • That makes it easy to encode practitioners' knowledge
  • That allows models to be composed and reused

8
Markov Logic
  • Syntax: Weighted first-order formulas
  • Semantics: Templates for Markov nets
  • Inference: Lifted belief propagation, etc.
  • Learning: Voted perceptron, pseudo-likelihood, inductive logic programming
  • Software: Alchemy
  • Applications: Information extraction, text mining, social networks, etc.

9
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

10
Markov Networks
  • Undirected graphical models

Cancer
Smoking
Cough
Asthma
  • Potential functions defined over cliques

Smoking  Cancer  Φ(S,C)
False    False   4.5
False    True    4.5
True     False   2.7
True     True    4.5
11
Markov Networks
  • Undirected graphical models

Cancer
Smoking
Cough
Asthma
  • Log-linear model:

    P(x) = (1/Z) exp(Σi wi fi(x))

    where wi is the weight of feature i and fi(x) is feature i
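To make the log-linear form concrete, here is a minimal Python sketch (not from the slides) that uses one indicator feature per state of the Smoking/Cancer clique above, with weight log Φ(S,C); restricting to this single two-variable clique is an illustrative assumption.

    import math
    from itertools import product

    # Clique potential from the table above: Phi(Smoking, Cancer)
    phi = {(False, False): 4.5, (False, True): 4.5,
           (True, False): 2.7, (True, True): 4.5}

    # Log-linear form: the weight of the active indicator feature is log of the potential
    def unnormalized_prob(smoking, cancer):
        w = math.log(phi[(smoking, cancer)])
        return math.exp(w)            # equals phi[(smoking, cancer)]

    # Normalize over all worlds of this tiny two-variable network
    Z = sum(unnormalized_prob(s, c) for s, c in product([False, True], repeat=2))
    for s, c in product([False, True], repeat=2):
        print(f"P(Smoking={s}, Cancer={c}) = {unnormalized_prob(s, c) / Z:.3f}")
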
12
First-Order Logic
  • Symbols: Constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x, y)
  • Logical connectives: Conjunction, disjunction, negation, implication, quantification, etc.
  • Grounding: Replace all variables by constants. E.g.: Friends(Anna, Bob)
  • World: Assignment of truth values to all ground atoms
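Grounding is simply substitution of constants for variables; a minimal Python sketch (the two-constant domain is an assumption borrowed from the later example):

    from itertools import product

    constants = ["Anna", "Bob"]           # assumed domain
    # Ground Friends(x, y): substitute every pair of constants for (x, y)
    ground_atoms = [f"Friends({x},{y})" for x, y in product(constants, repeat=2)]
    print(ground_atoms)
    # ['Friends(Anna,Anna)', 'Friends(Anna,Bob)', 'Friends(Bob,Anna)', 'Friends(Bob,Bob)']
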

13
Example: Friends & Smokers
16
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

17
Markov Logic
  • A logical KB is a set of hard constraints on the set of possible worlds
  • Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
  • Give each formula a weight (Higher weight → Stronger constraint)

18
Definition
  • A Markov Logic Network (MLN) is a set of pairs
    (F, w) where
  • F is a formula in first-order logic
  • w is a real number
  • Together with a set of constants, it defines a Markov network with
  • One node for each grounding of each predicate in
    the MLN
  • One feature for each grounding of each formula F
    in the MLN, with the corresponding weight w

19
Example: Friends & Smokers
20
Example: Friends & Smokers
Two constants: Anna (A) and Bob (B)
21
Example: Friends & Smokers
Two constants: Anna (A) and Bob (B)
Smokes(A)
Smokes(B)
Cancer(A)
Cancer(B)
22
Example: Friends & Smokers
Two constants: Anna (A) and Bob (B)
Friends(A,B)
Smokes(A)
Friends(A,A)
Smokes(B)
Friends(B,B)
Cancer(A)
Cancer(B)
Friends(B,A)
25
Markov Logic Networks
  • An MLN is a template for ground Markov networks
  • Probability of a world x:
  • Typed variables and constants greatly reduce size
    of ground Markov net
  • Functions, existential quantifiers, etc.
  • Infinite and continuous domains

    P(x) = (1/Z) exp(Σi wi ni(x))
    where wi is the weight of formula i and ni(x) is the no. of true groundings of formula i in x
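As a worked illustration of this formula, here is a small Python sketch for the Friends/Smokers example with constants A and B; the two formulas and especially their weights (1.5 and 1.1) are illustrative assumptions, since the original slide figures are not reproduced in this transcript.

    import math
    from itertools import product

    consts = ["A", "B"]   # Anna and Bob

    def score(world, w_smoke_cancer=1.5, w_friends=1.1):
        """Unnormalized probability exp(sum_i wi * ni(x)) for two example formulas:
        Smokes(x) => Cancer(x) and Friends(x,y) => (Smokes(x) <=> Smokes(y)).
        The weights are illustrative assumptions."""
        n1 = sum((not world[("Smokes", x)]) or world[("Cancer", x)] for x in consts)
        n2 = sum((not world[("Friends", x, y)]) or
                 (world[("Smokes", x)] == world[("Smokes", y)])
                 for x, y in product(consts, repeat=2))
        return math.exp(w_smoke_cancer * n1 + w_friends * n2)

    # Example world: both smoke, only Anna has cancer, everyone is friends
    world = {("Smokes", "A"): True, ("Smokes", "B"): True,
             ("Cancer", "A"): True, ("Cancer", "B"): False,
             **{("Friends", x, y): True for x, y in product(consts, repeat=2)}}
    print(score(world))   # exp(1.5*1 + 1.1*4); divide by Z (sum over all worlds) to get P(x)
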
26
Relation to Statistical Models
  • Special cases
  • Markov networks
  • Markov random fields
  • Bayesian networks
  • Log-linear models
  • Exponential models
  • Max. entropy models
  • Gibbs distributions
  • Boltzmann machines
  • Logistic regression
  • Hidden Markov models
  • Conditional random fields
  • Markov logic allows objects to be interdependent
    (non-i.i.d.)
  • Markov logic makes it easy to combine and reuse
    these models

27
Relation to First-Order Logic
  • Infinite weights → First-order logic
  • Satisfiable KB, positive weights → Satisfying assignments = Modes of distribution
  • Markov logic allows contradictions between formulas

28
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

29
Inference
  • MAP/MPE state
  • MaxWalkSAT
  • LazySAT
  • Marginal and conditional probabilities
  • MCMC: Gibbs, MC-SAT, etc.
  • Knowledge-based model construction
  • Lifted belief propagation

31
Lifted Inference
  • We can do inference in first-order logic without
    grounding the KB (e.g. resolution)
  • Let's do the same for inference in MLNs
  • Group atoms and clauses into indistinguishable
    sets
  • Do inference over those
  • First approach: Lifted variable elimination (not practical)
  • Here: Lifted belief propagation

32
Belief Propagation
Features (f)
Nodes (x)
33
Lifted Belief Propagation
Features (f)
Nodes (x)
35
Lifted Belief Propagation
[Figure: lifted BP message equations; the factors introduced are functions of edge counts]
Features (f)
Nodes (x)
36
Lifted Belief Propagation
  • Form lifted network composed of supernodes and superfeatures
  • Supernode: Set of ground atoms that all send and receive the same messages throughout BP
  • Superfeature: Set of ground clauses that all send and receive the same messages throughout BP
  • Run belief propagation on lifted network
  • Guaranteed to produce same results as ground BP
  • Time and memory savings can be huge

37
Forming the Lifted Network
  • 1. Form initial supernodes: One per predicate and truth value (true, false, unknown)
  • 2. Form superfeatures by doing joins of their supernodes
  • 3. Form supernodes by projecting superfeatures down to their predicates. Supernode: Groundings of a predicate with the same number of projections from each superfeature
  • 4. Repeat until convergence (see the sketch below)
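A highly simplified Python sketch of the grouping idea behind steps 1-4 (not Alchemy's actual algorithm; the data structures and the convergence test are illustrative assumptions). Ground atoms are repeatedly re-grouped by a signature summarizing the clause groundings they participate in:

    from collections import defaultdict

    def form_supernodes(ground_atoms, ground_clauses, evidence, max_iters=10):
        """ground_atoms: list of atom tuples, e.g. ("Smokes", "A").
        ground_clauses: list of (clause_id, tuple_of_atoms).
        evidence: dict atom -> "True"/"False" for known atoms."""
        # 1. Initial supernodes: one per predicate and truth value (true/false/unknown)
        color = {a: (a[0], evidence.get(a, "unknown")) for a in ground_atoms}
        for _ in range(max_iters):
            new_color = {}
            for a in ground_atoms:
                # 2./3. Signature: old color plus, for every clause grounding containing
                # the atom, the clause id, the atom's position, and the colors of the
                # atoms in that grounding (a stand-in for the join/project steps).
                feats = sorted((cid, pos, tuple(color[b] for b in atoms))
                               for cid, atoms in ground_clauses
                               for pos, b in enumerate(atoms) if b == a)
                new_color[a] = (color[a], tuple(feats))
            # 4. Repeat until the grouping stops getting finer
            if len(set(new_color.values())) == len(set(color.values())):
                color = new_color
                break
            color = new_color
        supernodes = defaultdict(list)
        for a, c in color.items():
            supernodes[c].append(a)
        return list(supernodes.values())
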

38
Theorem
  • There exists a unique minimal lifted network
  • The lifted network construction algorithm finds it
  • BP on the lifted network gives the same result as on the ground network

39
Representing Supernodes and Superfeatures
  • List of tuples: Simple but inefficient
  • Resolution-like: Use equality and inequality
  • Form clusters (in progress)

40
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

41
Learning
  • Data is a relational database
  • Closed world assumption (if not EM)
  • Learning parameters (weights)
  • Generatively
  • Discriminatively
  • Learning structure (formulas)

42
Generative Weight Learning
  • Maximize likelihood
  • Use gradient ascent or L-BFGS
  • No local maxima
  • Requires inference at each step (slow!)

    ∂/∂wi log Pw(x) = ni(x) − Ew[ni(x)]
    where ni(x) is the no. of true groundings of clause i in the data and Ew[ni(x)] is the expected no. of true groundings according to the model
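A one-line sketch of the resulting gradient-ascent update (the learning rate and list representation are assumptions; the expected counts must come from inference, which is the slow step):

    def gradient_ascent_step(weights, data_counts, expected_counts, lr=0.1):
        # Gradient of the log-likelihood w.r.t. wi is ni(data) - Ew[ni]
        return [w + lr * (n_data - n_model)
                for w, n_data, n_model in zip(weights, data_counts, expected_counts)]
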
43
Pseudo-Likelihood
  • Likelihood of each variable given its neighbors in the data [Besag, 1975]
  • Does not require inference at each step
  • Consistent estimator
  • Widely used in vision, spatial statistics, etc.
  • But PL parameters may not work well for long inference chains
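A small Python sketch of the pseudo-log-likelihood objective (an illustration only: count_fn, which returns the per-clause true-grounding counts for a world, is a hypothetical helper, and a real implementation would recount only the clauses containing the flipped atom):

    import math

    def pseudo_log_likelihood(world, atoms, weights, count_fn):
        """Sum over ground atoms l of log P(X_l = x_l | its Markov blanket)."""
        total = 0.0
        for atom in atoms:
            scores = {}
            for value in (True, False):
                flipped = dict(world)
                flipped[atom] = value
                scores[value] = sum(w * n for w, n in zip(weights, count_fn(flipped)))
            log_z = math.log(math.exp(scores[True]) + math.exp(scores[False]))
            total += scores[world[atom]] - log_z
        return total
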

44
Discriminative Weight Learning
  • Maximize conditional likelihood of query (y)
    given evidence (x)
  • Approximate expected counts by counts in MAP
    state of y given x

    ∂/∂wi log Pw(y|x) = ni(x, y) − Ew[ni(x, y)]
    where ni(x, y) is the no. of true groundings of clause i in the data and Ew[ni(x, y)] is the expected no. of true groundings according to the model
45
Voted Perceptron
  • Originally proposed for training HMMs discriminatively [Collins, 2002]
  • Assumes the network is a linear chain

wi ← 0
for t ← 1 to T do
  yMAP ← Viterbi(x)
  wi ← wi + counti(yData) − counti(yMAP)
return Σt wi / T
46
Voted Perceptron for MLNs
  • HMMs are a special case of MLNs
  • Replace Viterbi by MaxWalkSAT
  • Network can now be an arbitrary graph

wi ← 0
for t ← 1 to T do
  yMAP ← MaxWalkSAT(x)
  wi ← wi + counti(yData) − counti(yMAP)
return Σt wi / T
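The same procedure as a runnable Python sketch (count_fn and map_inference, e.g. a MaxWalkSAT wrapper, are hypothetical callables supplied by the caller; the learning rate is an assumption):

    def voted_perceptron(count_fn, map_inference, x, y_data, num_clauses, T=100, lr=1.0):
        """count_fn(x, y) -> per-clause true-grounding counts;
        map_inference(x, weights) -> MAP assignment of the query atoms y."""
        weights = [0.0] * num_clauses
        weight_sum = [0.0] * num_clauses
        for _ in range(T):
            y_map = map_inference(x, weights)        # MaxWalkSAT in the MLN case
            data_counts = count_fn(x, y_data)
            map_counts = count_fn(x, y_map)
            weights = [w + lr * (nd - nm)
                       for w, nd, nm in zip(weights, data_counts, map_counts)]
            weight_sum = [s + w for s, w in zip(weight_sum, weights)]
        return [s / T for s in weight_sum]           # averaged ("voted") weights
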
47
Structure Learning
  • Generalizes feature induction in Markov nets
  • Any inductive logic programming approach can be
    used, but . . .
  • Goal is to induce any clauses, not just Horn
  • Evaluation function should be likelihood
  • Requires learning weights for each candidate
  • Turns out not to be bottleneck
  • Bottleneck is counting clause groundings
  • Solution: Subsampling

48
Structure Learning
  • Initial state: Unit clauses or hand-coded KB
  • Operators: Add/remove literal, flip sign
  • Evaluation function: Pseudo-likelihood + structure prior
  • Search:
  • Beam [Kok & Domingos, 2005]
  • Shortest-first [Kok & Domingos, 2005]
  • Bottom-up [Mihalkova & Mooney, 2007]
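For the beam-search variant, a very small Python sketch (refinements and score are hypothetical callables: refinements applies the operators above to a set of clauses, and score performs weight learning internally and returns pseudo-likelihood plus a structure prior):

    def beam_search_structure(initial_theory, refinements, score, beam_width=5, max_steps=10):
        beam = [initial_theory]
        best = initial_theory
        for _ in range(max_steps):
            candidates = [t for theory in beam for t in refinements(theory)]
            if not candidates:
                break
            candidates.sort(key=score, reverse=True)
            beam = candidates[:beam_width]
            if score(beam[0]) > score(best):
                best = beam[0]
            else:
                break          # stop when no candidate improves the best theory
        return best
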

49
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

50
Alchemy
  • Open-source software including
  • Full first-order logic syntax
  • MAP and marginal/conditional inference
  • Generative & discriminative weight learning
  • Structure learning
  • Programming language features

alchemy.cs.washington.edu
51
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

52
Applications
  • Information extraction
  • Entity resolution
  • Link prediction
  • Collective classification
  • Web mining
  • Natural language processing
  • Social network analysis
  • Ontology refinement
  • Activity recognition
  • Intelligent assistants
  • Etc.

53
Information Extraction
Parag Singla and Pedro Domingos,
Memory-Efficient Inference in Relational
Domains (AAAI-06). Singla, P., Domingos, P.
(2006). Memory-efficent inference in relatonal
domains. In Proceedings of the Twenty-First
National Conference on Artificial
Intelligence (pp. 500-505). Boston, MA AAAI
Press. H. Poon P. Domingos, Sound and
Efficient Inference with Probabilistic and
Deterministic Dependencies, in Proc. AAAI-06,
Boston, MA, 2006. P. Hoifung (2006). Efficent
inference. In Proceedings of the Twenty-First
National Conference on Artificial Intelligence.
54
Segmentation
Author
Title
Venue
Parag Singla and Pedro Domingos,
Memory-Efficient Inference in Relational
Domains (AAAI-06). Singla, P., Domingos, P.
(2006). Memory-efficent inference in relatonal
domains. In Proceedings of the Twenty-First
National Conference on Artificial
Intelligence (pp. 500-505). Boston, MA AAAI
Press. H. Poon P. Domingos, Sound and
Efficient Inference with Probabilistic and
Deterministic Dependencies, in Proc. AAAI-06,
Boston, MA, 2006. P. Hoifung (2006). Efficent
inference. In Proceedings of the Twenty-First
National Conference on Artificial Intelligence.
55
Entity Resolution
Parag Singla and Pedro Domingos,
Memory-Efficient Inference in Relational
Domains (AAAI-06). Singla, P., Domingos, P.
(2006). Memory-efficent inference in relatonal
domains. In Proceedings of the Twenty-First
National Conference on Artificial
Intelligence (pp. 500-505). Boston, MA AAAI
Press. H. Poon P. Domingos, Sound and
Efficient Inference with Probabilistic and
Deterministic Dependencies, in Proc. AAAI-06,
Boston, MA, 2006. P. Hoifung (2006). Efficent
inference. In Proceedings of the Twenty-First
National Conference on Artificial Intelligence.
57
State of the Art
  • Segmentation
  • HMM (or CRF) to assign each token to a field
  • Entity resolution
  • Logistic regression to predict same
    field/citation
  • Transitive closure
  • Alchemy implementation: Seven formulas

58
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
59
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue, ...}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Optional
60
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Evidence
61
Types and Predicates
token = {Parag, Singla, and, Pedro, ...}
field = {Author, Title, Venue}
citation = {C1, C2, ...}
position = {0, 1, 2, ...}

Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)
Query
62
Formulas
Token(t, i, c) => InField(i, f, c)
InField(i, f, c) <=> InField(i+1, f, c)
f != f' => (!InField(i, f, c) v !InField(i, f', c))
Token(t, i, c) ^ InField(i, f, c) ^ Token(t, i', c') ^ InField(i', f, c') => SameField(f, c, c')
SameField(f, c, c') <=> SameCit(c, c')
SameField(f, c, c') ^ SameField(f, c', c'') => SameField(f, c, c'')
SameCit(c, c') ^ SameCit(c', c'') => SameCit(c, c'')
69
Formulas
Token(t, i, c) => InField(i, f, c)
InField(i, f, c) ^ !Token(".", i, c) <=> InField(i+1, f, c)
f != f' => (!InField(i, f, c) v !InField(i, f', c))
Token(t, i, c) ^ InField(i, f, c) ^ Token(t, i', c') ^ InField(i', f, c') => SameField(f, c, c')
SameField(f, c, c') <=> SameCit(c, c')
SameField(f, c, c') ^ SameField(f, c', c'') => SameField(f, c, c'')
SameCit(c, c') ^ SameCit(c', c'') => SameCit(c, c'')
70
Results: Segmentation on Cora
71
Results: Matching Venues on Cora
72
Overview
  • Motivation
  • Background
  • Markov logic
  • Inference
  • Learning
  • Software
  • Applications
  • Discussion

73
Conclusion
  • Web provides plethora of learning problems
  • Machine learning provides plethora of solutions
  • We need a unifying language
  • Markov logic: Use weighted first-order logic to define statistical models
  • Efficient inference and learning algorithms (but Web scale still requires manual coding)
  • Many successful applications (e.g., information extraction)
  • Open-source software / Web site: Alchemy

alchemy.cs.washington.edu