Feature Construction, Aggregation, and Propositionalization: Session Intro

1
Feature Construction, Aggregation, and
Propositionalization: Session Intro
  • Pedro Domingos
  • University of Washington

2
SRL via Propositionalization
  • Question: How to learn from relational data?
  • Popular answer:
    • Propositionalize data
    • Apply propositional learner
  • Question: How to propositionalize?

3
How to Propositionalize?
  • How to search for relevant objects, predicates,
    features, relations?
  • How to combine them into a tractable number of
    propositional features?
  • How to aggregate a variable number of
    objects/features/... into a fixed-length vector?

4
Feature Search
  • Popular approaches:
    • Breadth-first: not scalable
    • Greedy: myopic
  • Better approaches?
    • Declarative bias
    • Relational pathfinding
    • Relational clichés
    • Reasoning?
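Relational pathfinding, one of the better approaches listed above, avoids blind enumeration. It looks for chains of relations that connect the objects in an example. The sketch below shows that core step in minimal form. It assumes ground facts are stored as hypothetical (relation, subject, object) triples and runs plain breadth-first search over the fact graph. It is not a full pathfinding learner.

```python
from collections import deque
from typing import Optional

# Hypothetical representation: a ground fact is (relation, subject, object).
Fact = tuple[str, str, str]

def relational_path(facts: list[Fact], start: str, goal: str,
                    max_len: int = 4) -> Optional[list[Fact]]:
    """Return the shortest chain of facts linking start to goal, or None.

    Facts are usable in both directions; a reversed step is tagged with '~'
    (e.g. '~Author' leads from a paper to one of its authors).
    Search stops at max_len steps.
    """
    adj: dict[str, list[tuple[str, str]]] = {}
    for rel, a, b in facts:
        adj.setdefault(a, []).append((rel, b))
        adj.setdefault(b, []).append(("~" + rel, a))

    queue = deque([(start, [])])  # (current constant, path so far)
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) >= max_len:
            continue
        for rel, nxt in adj.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [(rel, node, nxt)]))
    return None
```

Take the facts Advises(ann, bob), Author(bob, p1) and Author(carl, p1). The path found from ann to carl is Advises, then Author, then Author traversed in reverse. A path like this can then be turned into a candidate relational feature.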

5
Feature Construction
  • Languages for relational features
  • Taxonomies of relational features
  • Domain knowledge (part of input?)
  • Bypass with high-dimensional learner

6
Aggregation
  • Traditional ILP: quantification
  • Traditional DBs: SUM, COUNT, MAX, etc.
  • More general: sufficient statistics to order n
  • Most general: any function (a full-blown learning
    problem!)
    • Input: set of tuples
    • Output: any fixed-size model of the joint distribution
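The first three levels of this hierarchy can be illustrated on one numeric attribute of a variable-size set of related objects. The function below is a hypothetical sketch, not a standard API. It interprets "sufficient statistics to order n" as the power sums up to order n. Together with COUNT, these determine the sample moments up to that order. The "most general" level, learning an arbitrary function, is not covered.

```python
def aggregate_features(values: list[float], order: int = 2,
                       threshold: float = 0.0) -> dict[str, float]:
    """Map a variable-size set of related values to a fixed set of features."""
    feats = {
        # ILP-style quantification (all() over an empty set is vacuously true)
        "exists_gt": float(any(v > threshold for v in values)),
        "forall_gt": float(all(v > threshold for v in values)),
        # Database-style aggregates; MAX of an empty set is reported as NaN
        "count": float(len(values)),
        "sum": float(sum(values)),
        "max": float(max(values, default=float("nan"))),
    }
    # Power sums up to `order`; with count they determine moments up to `order`
    for k in range(1, order + 1):
        feats[f"power_sum_{k}"] = float(sum(v ** k for v in values))
    return feats

row = aggregate_features([1.0, 2.0, 3.0])
# Variance recovered from the order-2 sufficient statistics
variance = row["power_sum_2"] / row["count"] - (row["power_sum_1"] / row["count"]) ** 2
```

Every input list yields the same set of keys, so the output is a fixed-length vector however many related objects there are.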

7
Scaling Up Statistical Relational Learning
  • Pedro Domingos
  • University of Washington
  • Joint work with Geoff Hulten and Yeuhi Abe

8
Why Scaling Matters
  • Killer apps of SRL will be in very large domains
  • Small domains can be propositionalized by hand
  • Scaling is a bigger problem in SRL
  • Search space is larger
  • Databases don't fit in RAM
  • We may want to decide on the fly what sources to
    read
  • Dimensionality Attributes Examples
    (!)

9
Standard Setting
  • Propositionalization:
    • Assemble flat table for propositional learner
    • By recursively following relations from objects
      of interest
  • Problems:
    • Number of features explodes with depth of recursion
    • In-memory aggregation may not be feasible
    • Massive duplication of work
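The standard setting can be sketched on a toy in-memory database. The DB contents, the link structure and the choice of COUNT/AVG columns are all assumptions made for illustration. Each relation path from the object of interest becomes a group of columns. That is where both the feature explosion and the duplicated work come from.

```python
from statistics import mean

# Hypothetical toy database: numeric attributes plus named links per object.
DB = {
    "p1": {"attrs": {"len": 10.0}, "links": {"cites": ["p2", "p3"]}},
    "p2": {"attrs": {"len": 4.0}, "links": {"cites": ["p3"]}},
    "p3": {"attrs": {"len": 6.0}, "links": {"cites": []}},
}

def propositionalize(db: dict, root: str, depth: int) -> dict[str, float]:
    """Build one flat row for `root` by following links up to `depth` hops.

    Every relation path (e.g. 'cites.cites') contributes a COUNT column and
    an AVG column per attribute, computed over the objects reached that way.
    """
    row = {f"self.{a}": v for a, v in db[root]["attrs"].items()}
    frontier: dict[str, list[str]] = {"": [root]}  # path -> reached object ids
    for _ in range(depth):
        next_frontier: dict[str, list[str]] = {}
        for path, ids in frontier.items():
            for oid in ids:
                for rel, targets in db[oid]["links"].items():
                    new_path = f"{path}.{rel}" if path else rel
                    next_frontier.setdefault(new_path, []).extend(targets)
        for path, ids in next_frontier.items():
            row[f"{path}.count"] = float(len(ids))
            for a in sorted({a for oid in ids for a in db[oid]["attrs"]}):
                vals = [db[oid]["attrs"][a] for oid in ids if a in db[oid]["attrs"]]
                row[f"{path}.avg_{a}"] = mean(vals)
        frontier = next_frontier
    return row

row = propositionalize(DB, "p1", depth=2)
```

In this toy run, p3 is read twice: once via cites and once via cites.cites. With r relations per class, the number of relation paths of length d grows as r^d.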

10
Feature Selection with Subsampling
  • Find n features with highest info gain
  • Start with the object's own features (or max …)
  • Read only enough data to select the best features
    with high confidence (Hoeffding/normal bounds +
    union bound)
  • Add features of related objects and repeat
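The high-confidence selection step can be sketched with a two-sided Hoeffding bound plus a union bound over the F candidate features. The sketch assumes each feature's score is the mean of i.i.d. terms with range R. Information gain is only approximately of that form, with R = log2 of the number of classes. It also uses Hoeffding rather than the tighter normal approximation mentioned above. The function names are hypothetical.

```python
import math
from typing import Optional

def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """Two-sided Hoeffding radius: P(|sample mean - true mean| > eps) <= delta."""
    return value_range * math.sqrt(math.log(2.0 / delta) / (2.0 * n))

def confident_top_k(scores: dict[str, float], k: int, value_range: float,
                    delta: float, n: int) -> Optional[list[str]]:
    """Return the k best features if n examples suffice to pick them; else None.

    Union bound: each of the F estimates gets failure probability delta / F,
    so with probability >= 1 - delta every estimate is within eps of its true
    value. Then a gap > 2 * eps between the k-th and (k+1)-th estimates means
    the chosen set is the true top k.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ranked = sorted(scores, key=scores.get, reverse=True)
    if k >= len(ranked):
        return ranked
    eps = hoeffding_epsilon(value_range, delta / len(scores), n)
    gap = scores[ranked[k - 1]] - scores[ranked[k]]
    return ranked[:k] if gap > 2 * eps else None
```

If the call returns None, the learner reads more data and tries again. The selection is accepted only once the gap between the k-th and (k+1)-th scores exceeds 2ε.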

11
Aggregation with Subsampling
  • Read only enough objects to compute probably
    approx. correct aggregates
  • Bounds can be combined with the feature-selection
    bounds for global confidence
  • Good for some aggregates (e.g., AVG), not for
    others (e.g., MAX)
  • Active area of research
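For AVG, Hoeffding's inequality directly gives the number of sampled objects needed for an (ε, δ) guarantee. For values in a range of width R, that is n ≥ R² ln(2/δ) / (2ε²). Below is a minimal sketch with hypothetical function names.

```python
import math
import random

def pac_sample_size(value_range: float, eps: float, delta: float) -> int:
    """Smallest n for which Hoeffding gives P(|sample AVG - true AVG| > eps) <= delta."""
    return math.ceil(value_range ** 2 * math.log(2.0 / delta) / (2.0 * eps ** 2))

def approx_avg(values: list[float], value_range: float, eps: float,
               delta: float, rng: random.Random) -> tuple[float, int]:
    """Estimate AVG from a random subsample; returns (estimate, objects read)."""
    if not values:
        raise ValueError("AVG of an empty set is undefined")
    n = pac_sample_size(value_range, eps, delta)
    if n >= len(values):  # small set: reading everything is exact and no dearer
        return sum(values) / len(values), len(values)
    sample = rng.sample(values, n)  # without replacement; Hoeffding still holds
    return sum(sample) / n, n
```

Take values in [0, 1], with ε = 0.05 and δ = 0.05. Here pac_sample_size returns 738, no matter how many related objects exist. Without further assumptions there is no comparable bound for MAX. A single unread object can change the maximum arbitrarily.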

12
Caching and Graph Traversal
  • Form traversal DAG
    • Node: class (with feature list)
    • Edge: relation
    • N-ary relations binarized
    • O(schema size)
  • Do not reload previously-visited objects
  • DAG pruned as selection progresses
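Traversal with caching can be sketched under a few assumptions. The schema is given as a map from each class to its outgoing (binarized) relations. Hypothetical `load` and `neighbors` callbacks stand in for database reads.

The schema graph has one node per class and one edge per relation, so its size depends only on the schema, not on the data. A depth limit unrolls cycles into a DAG. The cache, keyed by (class, id), ensures no object is loaded twice. The `pruned` set models dropping relations whose features were not selected.

```python
from collections import deque
from typing import Callable

Key = tuple[str, str]  # (class name, object id)

def traverse(schema: dict[str, list[tuple[str, str]]], root_class: str,
             root_ids: list[str], load: Callable[[str, str], dict],
             neighbors: Callable[[str, str], list[str]],
             max_depth: int, pruned: frozenset = frozenset()) -> dict[Key, dict]:
    """Breadth-first traversal of the schema graph with an object cache.

    schema maps each class to its (relation, target_class) edges; the depth
    limit unrolls cycles such as Page -links_to-> Page into a DAG. load and
    neighbors are hypothetical callbacks standing in for database reads.
    Relations in `pruned` are skipped, and each (class, id) is loaded once.
    """
    cache: dict[Key, dict] = {}
    queue = deque((root_class, oid, 0) for oid in root_ids)
    while queue:
        cls, oid, depth = queue.popleft()
        if (cls, oid) in cache:  # already loaded via another path
            continue
        cache[(cls, oid)] = load(cls, oid)
        if depth == max_depth:
            continue
        for rel, target_cls in schema.get(cls, []):
            if rel in pruned:
                continue
            for nid in neighbors(rel, oid):
                if (target_cls, nid) not in cache:
                    queue.append((target_cls, nid, depth + 1))
    return cache
```

Consider a small web graph where every page belongs to one site. The site object is loaded once, even though several pages point to it.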

13
Preliminary Experiments
  • Predicting evolution of WWW
  • Propositional learner: decision trees
  • Subsampling: order-of-magnitude speedup
  • Caching: another order of magnitude
  • From depth 2 to depth 5:
    • More accurate model
    • Properties of sibling pages are predictive

14
Open Questions
  • Bounds assume the data is i.i.d., but it's not
  • Exploiting info from data-generating process
  • Avoiding redundancy in joint distr. learning
  • Interleaving learning and feature collection
  • Summarizing data in RAM
  • Applying to time-changing data
  • Etc.