ULDBs: Databases with Uncertainty and Lineage (Omar Benjelloun et al.)

1
ULDBs: Databases with Uncertainty and Lineage
Omar Benjelloun et al.
  • Te Li
  • 4/20/2006
  • ASU

2
Issues
  • Motivation
  • ULDB Data Model
  • DL-Monotonic Queries
  • ULDB Query Evaluation
  • ULDB Minimization
  • Probabilistic ULDBs
  • Pros and Cons

3
Motivation
  • Many applications require both lineage and
    uncertainty.
  • Data integration
  • Deduplication
  • Scientific data management
  • Information extraction
  • Lineage is useful because it
  • Offers helpful additional information to users
  • Correlates uncertainty in results with base data
  • Offers representational and computational benefits

4
ULDB Data Model (1)
  • Database with Lineage (LDB)

5
ULDB Data Model (2)
  • x-relation (uncertain database)
  • x-tuples: sets of tuple alternatives
  • Maybe x-tuples: marked with ?

6
ULDB Data Model (3)
  • ULDB: Combining Lineage and Uncertainty

7
ULDB Data Model (4)
  • An uncertainty model is complete if
  • It is possible to represent any set of possible
    instances within the model
  • This model is complete
  • Given any set of possible LDBs, there exists a
    ULDB whose possible LDBs are the same as the
    given set of LDBs.

8
ULDB Data Model (5)
  • We consider only a subset of the model, called
    well-behaved ULDBs
  • A ULDB is well-behaved if the lineages of all its
    x-tuples satisfy:
  • No cycles
  • All alternatives of an x-tuple have distinct
    lineages
  • The lineages of all alternatives of an x-tuple point
    to alternatives of the exact same set of x-tuples
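The three conditions can be checked mechanically. The following is a minimal sketch, not the authors' code: the encoding of an alternative as a hypothetical `(x_tuple_id, alternative_index)` pair and of lineage as a dict of frozensets is an assumption, and base x-tuples (all lineages empty) are exempted from the distinct-lineage check as a simplification of this sketch.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical encoding (not from the slides): an alternative ID is
# (x_tuple_id, alternative_index); lineage maps every alternative ID to the
# set of alternative IDs it was derived from (empty for base data).
AltId = Tuple[int, int]
Lineage = Dict[AltId, FrozenSet[AltId]]


def is_acyclic(lineage: Lineage) -> bool:
    """Depth-first search that fails on a back edge (a lineage cycle)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[AltId, int] = {}

    def visit(a: AltId) -> bool:
        color[a] = GRAY
        for b in lineage.get(a, frozenset()):
            state = color.get(b, WHITE)
            if state == GRAY:
                return False  # back edge: the lineage contains a cycle
            if state == WHITE and not visit(b):
                return False
        color[a] = BLACK
        return True

    return all(color.get(a, WHITE) != WHITE or visit(a) for a in lineage)


def is_well_behaved(lineage: Lineage) -> bool:
    """Check the three well-behavedness conditions listed on the slide."""
    if not is_acyclic(lineage):  # condition 1: no cycles
        return False
    by_xtuple: Dict[int, List[AltId]] = {}
    for alt in lineage:
        by_xtuple.setdefault(alt[0], []).append(alt)
    for alts in by_xtuple.values():
        lins = [lineage[a] for a in alts]
        if all(not lin for lin in lins):
            continue  # base x-tuple: exempt in this sketch
        # Condition 2: alternatives of one x-tuple have distinct lineages.
        if len(set(lins)) != len(lins):
            return False
        # Condition 3: all lineages point into the same set of x-tuples.
        if len({frozenset(x for x, _ in lin) for lin in lins}) > 1:
            return False
    return True
```

A derived x-tuple whose first alternative comes from x-tuple 1 and whose second comes from x-tuple 2 fails condition 3, even though each lineage on its own is fine.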

9
DL-Monotonic Queries (1)
  • We consider only a subset of relational queries,
    called DL-monotonic queries
  • A DL-monotonic query is a function Q from LDBs to
    LDBs that satisfies:
  • The lineage of a result tuple points to the
    minimal subset of the database that produces exactly
    that tuple
  • Monotonicity
  • Any operation that produces its results in a
    tuple-by-tuple fashion is DL-monotonic,
  • e.g., multiset selection, projection, join, and
    union
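To illustrate "tuple-by-tuple" evaluation, here is a minimal sketch of a nested-loop equi-join that records, for each result, the two input tuple IDs it came from. The relation encoding, the `join_with_lineage` name, and the generated `j0`, `j1`, ... IDs are hypothetical choices for this example.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical representation: a relation maps a tuple ID to a tuple of values.
Relation = Dict[str, Tuple[str, ...]]


def join_with_lineage(r: Relation, s: Relation, r_col: int, s_col: int):
    """Nested-loop equi-join that emits result tuples one at a time.

    Each result's lineage is the pair of input tuple IDs that produced it,
    i.e. the minimal set of input tuples from which that result follows.
    """
    result: Dict[str, Tuple[str, ...]] = {}
    lineage: Dict[str, FrozenSet[str]] = {}
    for rid, rt in r.items():
        for sid, st in s.items():
            if rt[r_col] == st[s_col]:
                out_id = f"j{len(result)}"
                result[out_id] = rt + st
                lineage[out_id] = frozenset({rid, sid})
    return result, lineage
```

For example, joining Saw = {s1: (Amy, Acura), s2: (Betty, Mazda)} with a hypothetical Owns = {o1: (Hank, Acura)} on the car column yields one result whose lineage is {s1, o1}. Adding a tuple to either input can only add results, never retract one, which is the monotonicity property.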

10
DL-Monotonic Queries (2)

11
ULDB Query Evaluation (1)
  • The result of query evaluation includes the original
    database and the answer relation
  • The algorithm:
  • Performs standard evaluation on all the possible
    instances
  • Constructs the answer x-relation
  • If the database is well-behaved, so is the query result
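The first step of the algorithm, running a standard query on every possible instance, can be sketched by brute force. This is only an illustration of the possible-instance semantics under a hypothetical encoding: it ignores lineage, does not build the answer x-relation, and is exponential in the number of x-tuples.

```python
from itertools import product
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Row = Tuple[str, ...]
# Hypothetical encoding: an x-tuple is (alternatives, is_maybe).
XTuple = Tuple[Sequence[Row], bool]


def possible_instances(xrel: List[XTuple]) -> Iterator[List[Row]]:
    """Yield every possible instance: one alternative per x-tuple, or none for '?'."""
    options: List[List[Optional[Row]]] = []
    for alts, maybe in xrel:
        options.append(list(alts) + ([None] if maybe else []))
    for pick in product(*options):
        yield [row for row in pick if row is not None]


def eval_on_all_instances(xrel: List[XTuple],
                          query: Callable[[List[Row]], List[Row]]) -> List[List[Row]]:
    """Run an ordinary (certain-data) query on each possible instance."""
    return [query(inst) for inst in possible_instances(xrel)]
```

With a hypothetical Saw x-relation holding one x-tuple (Amy, Acura) || (Amy, Mazda) and one maybe x-tuple (Betty, Mazda) ?, there are four possible instances, and a selection on car = Mazda is evaluated once on each.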

12
ULDB Query Evaluation (2)
  • Semantics of Queries on ULDBs

13
ULDB Query Evaluation (3): Extraction
  • The extracted set of x-relations is a
    well-behaved ULDB which preserves the possible
    instances w.r.t. data and internal lineage
  • Here, preserving means retaining the x-tuples that
    the lineages point to
  • The resulting ULDB is D-minimal
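Retaining "the x-tuples that are pointed to by the lineages" amounts to a reachability search over lineage. A minimal sketch, assuming the same hypothetical `(x_tuple_id, alternative_index)` encoding; it computes only which x-tuples to keep, not the D-minimal result.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

AltId = Tuple[int, int]  # hypothetical: (x_tuple_id, alternative_index)


def extract(answer: Iterable[int], lineage: Dict[AltId, FrozenSet[AltId]]) -> Set[int]:
    """Return the answer x-tuple IDs plus every x-tuple reachable through lineage."""
    alts_of: Dict[int, List[AltId]] = {}
    for alt in lineage:
        alts_of.setdefault(alt[0], []).append(alt)
    keep: Set[int] = set()
    queue = deque(answer)
    while queue:
        x = queue.popleft()
        if x in keep:
            continue
        keep.add(x)
        # Follow the lineage of every alternative of x to the x-tuples it names.
        for alt in alts_of.get(x, []):
            queue.extend(y for y, _ in lineage[alt] if y not in keep)
    return keep
```

Starting from an answer x-tuple 4 derived from 3, which is derived from 1 and 2, the sketch keeps {1, 2, 3, 4} and drops any unrelated x-tuple.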

14
ULDB Minimization (1)
  • Why do we need minimization?
  • Extraneous ?s
  • Extraneous alternatives

15
ULDB Minimization (2)
  • A ULDB is D-minimal if it includes no extraneous
    ?s or alternatives
  • What about lineage?
  • Internal lineage vs. external lineage
  • Internal: points to IDs in the database
  • External: points to IDs outside the database
  • A ULDB is L-minimal if its internal lineage is
    the smallest possible
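The D-minimality definition can be checked directly once the possible instances are known. The following is a minimal sketch, not the minimization procedure from the ULDB work: it assumes the possible instances are already available (enumerating them is exponential in general) and uses a hypothetical encoding. An alternative is extraneous if it appears in no possible instance; a ? is extraneous if its x-tuple is present in every possible instance.

```python
from typing import Dict, List, Set, Tuple

AltId = Tuple[int, int]  # hypothetical: (x_tuple_id, alternative_index)


def find_extraneous(declared: Dict[int, Tuple[List[AltId], bool]],
                    instances: List[Set[AltId]]) -> Tuple[List[AltId], List[int]]:
    """declared maps x-tuple ID -> (its alternative IDs, is_maybe).

    instances lists the possible instances, each as a set of alternative IDs.
    """
    present = set().union(*instances)
    # An alternative is extraneous if it appears in no possible instance.
    extra_alts = sorted(a for alts, _ in declared.values() for a in alts if a not in present)
    # A '?' is extraneous if the x-tuple is present in every possible instance.
    extra_qs = sorted(x for x, (alts, maybe) in declared.items()
                      if maybe and all(any(a in inst for a in alts) for inst in instances))
    return extra_alts, extra_qs


def is_d_minimal(declared, instances) -> bool:
    extra_alts, extra_qs = find_extraneous(declared, instances)
    return not extra_alts and not extra_qs
```

If a maybe x-tuple 2 shows up in both possible instances and only ever via its first alternative, the sketch reports both its second alternative and its ? as extraneous.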

16
ULDB Minimization (3): Membership Queries
  • A side effect of minimization
  • Tuple membership/certainty
  • Polynomial-time problems (for well-behaved ULDBs)
  • Instance membership/certainty
  • NP-hard problems (for all complete uncertainty
    models)

17
Probabilistic ULDBs (1): The Model
  • Well-behaved and D-minimal ULDB
  • Alternatives of an x-tuple are disjoint
  • Different x-tuples are independent
  • Each base alternative has a confidence value
  • For each base x-tuple, the sum s of its
    alternatives' confidence values is at most 1, and
    exactly 1 if it is not a maybe x-tuple
  • The confidence value of ? is (1 - s)
  • The probability of a possible instance is the
    product of the confidences of the base
    alternatives and ?s chosen in it

18
Probabilistic ULDBs (2)
  • Suppose a possible instance where
  • Amy saw an Acura
  • Betty saw a Mazda
  • Hank doesn't own an Acura
  • Confidence $c = 0.8 \times 0.6 \times (1 - 0.6) = 0.192$
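The model on slide 17 and the example on slide 18 can be sketched as follows. The encoding is hypothetical, and because the slides do not list Amy's and Betty's other alternatives, the `"other"` alternatives below are placeholders that only fill each x-tuple's confidence up to 1.

```python
from itertools import product
from math import prod
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical encoding: a base x-tuple maps each alternative to its confidence.
XTuple = Dict[str, float]


def question_conf(xt: XTuple) -> float:
    """Confidence of '?' (the x-tuple being absent): 1 - s."""
    s = sum(xt.values())
    if s > 1 + 1e-9:
        raise ValueError("confidences of one x-tuple must sum to at most 1")
    return max(0.0, 1.0 - s)


def instance_probability(xtuples: List[XTuple], choice: List[Optional[str]]) -> float:
    """choice[i] names the alternative chosen from xtuples[i], or None for '?'."""
    return prod(question_conf(xt) if c is None else xt[c]
                for xt, c in zip(xtuples, choice))


def all_instances(xtuples: List[XTuple]) -> Iterator[Tuple[Tuple[Optional[str], ...], float]]:
    """Yield (choice, probability) for every possible instance."""
    options = [list(xt) + ([None] if question_conf(xt) > 1e-12 else []) for xt in xtuples]
    for choice in product(*options):
        yield choice, instance_probability(xtuples, list(choice))
```

With `xts = [{"Acura": 0.8, "other": 0.2}, {"Mazda": 0.6, "other": 0.4}, {"Acura": 0.6}]` for Amy, Betty, and Hank, `instance_probability(xts, ["Acura", "Mazda", None])` gives 0.192 up to floating-point rounding. The probabilities of all 8 possible instances sum to 1, as independence and disjointness require.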

19
Probabilistic ULDBs (3): Query Evaluation
  • Two-phase evaluation
  • Data computation
  • Confidence computation
  • Why compute confidence afterward?
  • Feasibility: the confidence value of every
    result alternative a is a function of the
    confidence values of the base alternatives
    reachable through a's transitive lineage
  • Benefit: ensures correctness
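The feasibility argument suggests a second phase that walks each result alternative's transitive lineage down to base alternatives. The following is a minimal sketch under stated assumptions: conjunctive lineage (the DL-monotonic case), disjoint alternatives within an x-tuple, independent x-tuples, and a hypothetical encoding in which alternatives with no lineage entry count as base.

```python
from typing import Dict, FrozenSet, Set, Tuple

AltId = Tuple[int, int]  # hypothetical: (x_tuple_id, alternative_index)


def base_alternatives(alt: AltId, lineage: Dict[AltId, FrozenSet[AltId]]) -> Set[AltId]:
    """Follow lineage transitively; alternatives with no lineage count as base."""
    seen: Set[AltId] = set()
    stack = [alt]
    base: Set[AltId] = set()
    while stack:
        a = stack.pop()
        if a in seen:
            continue
        seen.add(a)
        parents = lineage.get(a, frozenset())
        if parents:
            stack.extend(parents)
        else:
            base.add(a)
    return base


def confidence(alt: AltId, lineage: Dict[AltId, FrozenSet[AltId]],
               base_conf: Dict[AltId, float]) -> float:
    """P(all base alternatives in alt's transitive lineage are present).

    Assumes conjunctive lineage, disjoint alternatives within an x-tuple,
    and independent x-tuples.
    """
    chosen: Dict[int, int] = {}
    for x, i in base_alternatives(alt, lineage):
        if chosen.get(x, i) != i:
            return 0.0  # two different alternatives of one x-tuple cannot co-occur
        chosen[x] = i
    p = 1.0
    for x, i in chosen.items():
        p *= base_conf[(x, i)]
    return p
```

In the test data, base alternative (2,0) is reached along two lineage paths but counted once, giving 0.8 × 0.6 = 0.48. Multiplying confidences operator by operator would count it twice. This illustrates one way confidences computed during evaluation can depend on the plan.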

20
Probabilistic ULDBs (4)
  • Query plan 1 (0.528)
  • Query plan 2 (0.6048)
  • Query plan 3 (0.528)
  • First compute the result (Hank) with ID (71,1)
  • Because duplicate elimination is not DL-monotonic,
    the lineage function becomes a Boolean formula:
    $\lambda((71,1)) = (51,1) \wedge ((11,1) \vee (12,1))$
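Once lineage is a Boolean formula, a confidence can be computed by summing over the choices of the x-tuples the formula mentions. The following is a minimal brute-force sketch under a hypothetical encoding. It treats (51,1), (11,1), and (12,1) as independent base alternatives. The slides do not give their confidences, so the values in the example are hypothetical and are not meant to reproduce the plan results above.

```python
from itertools import product
from typing import Dict, Optional, Set, Tuple, Union

AltId = Tuple[int, int]  # (x_tuple_id, alternative_index)
# Hypothetical formula encoding: an AltId atom, or ("and", f, ...) / ("or", f, ...).
Formula = Union[AltId, tuple]


def atoms(f: Formula) -> Set[AltId]:
    if isinstance(f[0], str):
        return set().union(*(atoms(g) for g in f[1:]))
    return {f}


def holds(f: Formula, world: Dict[int, Optional[int]]) -> bool:
    """world maps each x-tuple ID to its chosen alternative (None = none listed)."""
    if isinstance(f[0], str):
        parts = (holds(g, world) for g in f[1:])
        return all(parts) if f[0] == "and" else any(parts)
    x, i = f
    return world.get(x) == i


def formula_confidence(f: Formula, conf: Dict[AltId, float]) -> float:
    """Sum the probabilities of all choices of the involved x-tuples that satisfy f.

    Assumes independent x-tuples and disjoint alternatives; the catch-all
    None outcome covers '?' and any alternatives not listed in conf.
    """
    xs = sorted({x for x, _ in atoms(f)})
    options = []
    for x in xs:
        alts = [(i, p) for (y, i), p in conf.items() if y == x]
        options.append(alts + [(None, 1.0 - sum(p for _, p in alts))])
    total = 0.0
    for combo in product(*options):
        world = {x: i for x, (i, _) in zip(xs, combo)}
        if holds(f, world):
            p = 1.0
            for _, q in combo:
                p *= q
            total += p
    return total
```

With hypothetical confidences 0.9 for (51,1), 0.5 for (11,1), and 0.4 for (12,1), the formula `("and", (51, 1), ("or", (11, 1), (12, 1)))` evaluates to 0.9 × (1 − 0.5 × 0.6) = 0.63.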

21
Pros and Cons
  • The model is simple yet very expressive; in fact,
    it is complete
  • The biggest cost is maintaining history; for
    ULDBs, this means maintaining lineage
  • The idea of lineage is similar to that of chain
    probability, i.e., recording history
  • Computing confidence after the fact is very
    similar to the approach in PEPX

22
Survey on Stanford InfoLab
  • 2006

23
People
  • 21 members
  • 5 key figures (in database area): Hector
    Garcia-Molina, Rajeev Motwani, Jeff Ullman,
    Jennifer Widom, Gio Wiederhold

24
11 Current Projects

25
Trio Project (2004-)
  • People: Jennifer Widom, 1 post-doc, 4 grads, 2
    alums
  • Papers: CIDR'05, ICDE'06, DEB'06, 2 TRs
  • Vision: data, uncertainty, and lineage
  • Goals:
  • To combine and distill previous work into a
    simple and usable extension to the relational
    model
  • To design a query language as a well-defined,
    intuitive extension to SQL
  • To build a working system that augments
    conventional data management with both
    uncertainty and lineage as an integral part of
    the data

26
WebBase Project (1998-)
  • People: Hector Garcia-Molina, Andreas Paepcke, 7
    students, 4 alums
  • Papers: 16 conference papers (2 SIGMOD, 1 VLDB),
    1 journal paper, 10 TRs
  • Vision: crawling, storage, indexing, and querying
    of large collections of Web pages
  • Goals:
  • Provide a storage infrastructure for Web-like
    content
  • Store a sizeable portion of the Web
  • Enable researchers to easily build indexes of
    page features across large sets of pages
  • Distribute WebBase content via multicast channels
  • Support structure and content-based querying over
    the stored collection

27
STREAM Project (2000-2006)
  • People: Jennifer Widom, Rajeev Motwani, 5
    students, 8 alums
  • Papers: 32 conference/journal papers (5 SIGMODR,
    6 VLDBJ), 1 TR
  • Vision: multiple, continuous, rapid, time-varying
    data streams