Transcript and Presenter's Notes

Title: Query Processing and Networking Infrastructures


1
Query Processing and Networking Infrastructures
  • Day 1 of 2
  • Joe Hellerstein
  • UC Berkeley
  • September 20, 2002

2
Two Goals
  • Day 1: Primer on query processing
  • Targeted to networking/OS folk
  • Bias: systems issues
  • Day 2: Seed some cross-fertilized research
  • Especially with networking
  • Thesis: dataflow convergence
  • query processing and routing
  • Clearly other resonances here
  • Dataflow HW architectures
  • Event-based systems designs
  • ML and Control Theory
  • Online Algorithms

3
(Sub)Space of Possible Topics
Distributed & Federated QP
Transactional Storage & Networking
Active DBs (Trigger Systems)
Traditional Relational QP: Optimization & Execution
Data Model & Query Language Design
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams & Continuous Queries
Visual Querying & Data Visualization
Media Queries, Feature Extraction & Similarity Search
Compression
Statistical Data Analysis (Mining)
Boolean Text Search
Traditional Text Ranking
Hypertext Ranking
4
Likely Topics Here
Transactional Storage & Networking
Distributed & Federated QP
Active DBs (Trigger Systems)
Traditional Relational QP: Optimization & Execution
Data Model & Query Language Design
Parallel QP
NFNF Data Models (OO, XML, Semistructured)
Adaptive QP
Online and Approximate QP
Indexing
Data Reduction
Data Streams & Continuous Queries
Visual Querying & Data Visualization
Media Queries, Feature Extraction & Similarity Search
Compression
Statistical Data Analysis (Mining)
Boolean Text Search
Traditional Text Ranking
Hypertext Ranking
5
Plus Some Speculative Ones
Distributed & Federated QP
Traditional Relational QP: Optimization & Execution
Content Routing
Indirection Architectures
Parallel QP
Adaptive QP
Online and Approximate QP
Indexing
Peer-to-Peer QP
Data Streams & Continuous Queries
Sensornet QP
Network Monitoring
Boolean Text Search
Traditional Text Ranking
6
Outline
  • Day 1: Query Processing Crash Course
  • Intro
  • Queries as indirection
  • How do relational databases run queries?
  • How do search engines run queries?
  • Scaling up: cluster parallelism and distribution
  • Day 2: Research Synergies w/Networking
  • Queries as indirection, revisited
  • Useful (?) analogies to networking research
  • Some of our recent research at the seams
  • Some of your research?
  • Directions and collective discussion

7
Getting Off on the Right Foot
8
Roots: database and IR research
  • Top-down traditions (applications)
  • Usually begins with semantics and models.
  • Common Misconceptions
  • Query processing = Oracle or Google.
  • Need not be so heavyweight or monolithic! Many
    reusable lessons within
  • IR search and DB querying are fundamentally
    different
  • Very similar from a query processing perspective
  • Many similarities in other data models as well
  • Querying is a synchronous, interactive process.
  • Triggers, rules and "continuous queries" not so
    different from plain old queries.

9
So we'll go bottom-up
  • Focus on reusable building blocks
  • Attempt to be language- and model-agnostic
  • illustrate with various querying scenarios

10
Confession: Two Biases
  • Relational query engines
  • Most mature and general query technology
  • Best documented in the literature
  • Conceptually general enough to capture most all
    other models/schemes
  • Everybody does web searches
  • So it's both an important app, and an inescapable
    usage bias we carry around
  • It will inform our discussion. Shouldn't skew it
  • Lots of other query systems/languages you can
    keep in mind as we go
  • LDAP, DNS, XSL/Xpath/XQuery, Datalog

11
What Are Queries For? I
  • Obvious answer: search and analysis over big data
    sets
  • Search: select data of interest
  • Boolean expressions over content
  • sometimes with an implicit ordering on results
  • Analysis: construct new information from base
    data.
  • compute functions over each datum
  • concatenate related records (join)
  • partition into groups, summarize (aggregates)
  • Aside: Mining vs. Querying? As a rule of
    thumb, think of mining as WYGIWIGY.
  • Not the most general, powerful answer

12
What Are Queries For? II
  • Queries bridge a (large!) level of indirection
  • Declarative programming: what you want, not how
    to get it
  • Easy(er) to express
  • Allows the how to change under the covers
  • A critical issue!
  • Not just for querying
  • Method invocation, data update, etc

13
Motivation for this Indirection
  • Critical when rates of change differ across
    layers
  • In particular, when d(app)/dt << d(environment)/dt
  • E.g. DB apps are used for years, decades (!!)
  • E.g. networked env: high rates of change (??)
  • DB lit calls this data independence

14
Data Independence Background
  • Bad Old Days
  • Hierarchical and Network (yep!) data models
  • Nesting pointers mean that apps explicitly
    traverse data, become brittle when data layouts
    change
  • Apps with persistent data have slow dapp/dt
  • And the database environments change faster!
  • Logical changes to representation (schema)
  • Physical changes in storage (indexes, layouts,
    HW)
  • DBs often shared by multiple apps!
  • In B.O.D., all apps had to be rewritten on change

15
It's a SW Engineering Thing
  • Analogy: imagine if your C structs were to
    survive for decades
  • you'd keep them very simple
  • encapsulation to allow future mods
  • Similar Analogy to NWs
  • protocol simplicity is good
  • soft state is good (discourages hardcoded refs to
    transient resources)
  • But the fun systems part follows directly
  • Achieve the goal w/respectable performance over a
    dynamic execution environment

16
Codd's Data Independence
  • Ted Codd, IBM c. 1969 and forward
  • Turing award 1981
  • Two layers of indirection

Applications
Spanned by views and query rewriting
Logical Independence
Logical Representation (schema)
Spanned by query optimization and execution
Physical Independence
Physical Representation (storage)
17
A More Architectural Picture
Bridges logical independence
Declarative query over views
Query Rewriter
Bridges physical independence
Query Processor
Declarative query over base tables
Optimizer
Query Plan (Procedural)
Executor
N.B. This classical QP architecture raises some
problems. To be revisited!
Iterator API
Access Methods
18
Access Methods & Indexing
19
Access Methods
  • Base data access layer
  • Model: Data stored in unordered collections
  • Relations, tables, one type per collection
  • Interface: iterators (see the sketch after this
    list)
  • Open(predicate) -> cursor
  • Usually simple predicates: attribute op constant
  • op usually arithmetic (<, >, ...), though we'll see
    extensions (e.g. multi-d ops)
  • Next(cursor) -> datum (of known type)
  • Close(cursor)
  • Insert(datum of correct type)
  • Delete(cursor)
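A minimal Python sketch of this access-method surface, treating an in-memory list as the heap file and supporting a single attribute-op-constant predicate. The HeapFile class, its dict-shaped records, and the OPS table are illustrative assumptions, not anything from the slides.

# Minimal sketch of the access-method iterator API (illustrative only).
import operator

OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}

class HeapFile:
    """Unordered collection of records (dicts), all of one type."""
    def __init__(self, records):
        self.records = list(records)

    def open(self, attr=None, op=None, const=None):
        """Open(predicate) -> cursor; predicate is 'attribute op constant'."""
        def matches(rec):
            return attr is None or OPS[op](rec[attr], const)
        return iter(r for r in self.records if matches(r))

    def next(self, cursor):
        """Next(cursor) -> datum, or None when exhausted."""
        return next(cursor, None)

    def close(self, cursor):
        pass  # nothing to release for an in-memory file

    def insert(self, datum):
        self.records.append(datum)

# Usage: scan all employees with sal > 50
emps = HeapFile([{"name": "a", "sal": 40}, {"name": "b", "sal": 60}])
cur = emps.open("sal", ">", 50)
while (rec := emps.next(cur)) is not None:
    print(rec)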

20
Typical Access Methods
  • Heap files
  • unordered array of records
  • usually sequential on disk
  • predicates just save cross-layer costs
  • Traditional Index AMs
  • B-trees
  • actually, B+-trees: all data at leaves
  • Can scan across leaves for range search
  • predicates (<, >, =, between) result in fewer I/Os
  • random I/Os (at least to find beginning of range)
  • Linear Hash index
  • Litwin 78. Supports equality predicates only.
  • This is it for IR and standard relational DBs
  • Though when IR folks say indexing, they
    sometimes mean all of query processing

21
Primary & Secondary Indexes
[Diagram: a primary index, whose directory pages sit over the data records
themselves, vs. a secondary index, whose directory sits over (key, ptr) pairs
that point into the data]
22
Primary & Secondary Indexes
[Diagram, continued: the secondary index directory over (key, ptr) pairs]
23
An Exotic Forest of Search Trees
  • Multi-dimensional indexes
  • For geodata, multimedia search, etc.
  • Dozens! E.g. R-tree family, disk-based
    Quad-Trees, kdB-trees
  • And of course linearizations with B-trees
  • Path indexes
  • For XML and OO path queries
  • E.g. Xfilter
  • Etc.
  • Lots of one-off indexes, often many per workload
  • No clear winners here
  • Extensible indexing scheme would be nice

24
Generalized Search Trees (GiST)
Hellerstein et al., VLDB 95
  • What is a (tree-based) DB index? Typically
  • A clustering of data into leaf blocks
  • Hierarchical summaries (subtree predicates --
    SPs) for pointers in directory blocks

25
Generalized Search Trees (GiST)
  • Can realize that abstraction with simple
    interface
  • User registers opaque SP objects with a few
    methods (sketched below)
  • Consistent(q, p): should query q traverse this
    subtree?
  • Penalty(d, p): how bad is it to insert d below p?
  • Union(p1, p2): form SP that includes p1, p2
  • PickSplit(p1, ..., pn): partition SPs into 2
  • Tree maintenance, concurrency, recovery all
    doable under the covers
  • Covers many popular multi-dimensional indexes
  • Most of which had no concurrency/recovery story
  • http://gist.cs.berkeley.edu
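As a rough illustration, here is what one registered SP class might look like if the subtree predicates were 1-D intervals (so the tree behaves like a simple dynamic range index). The Interval class and its internals are invented for this sketch and are not the actual GiST API.

# Sketch: a GiST-style subtree predicate (SP) for 1-D intervals, assuming
# queries are "find all keys in [lo, hi]".  Method names follow the slide;
# the class and its internals are illustrative.
class Interval:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @staticmethod
    def consistent(q, p):
        """Should query interval q traverse the subtree summarized by p?"""
        return not (q.hi < p.lo or q.lo > p.hi)

    @staticmethod
    def penalty(d, p):
        """How much would p have to grow to absorb datum d?"""
        grown_lo, grown_hi = min(p.lo, d.lo), max(p.hi, d.hi)
        return (grown_hi - grown_lo) - (p.hi - p.lo)

    @staticmethod
    def union(p1, p2):
        """Form an SP that covers both p1 and p2."""
        return Interval(min(p1.lo, p2.lo), max(p1.hi, p2.hi))

    @staticmethod
    def pick_split(preds):
        """Partition SPs into two groups (here: naively, by sorted midpoint)."""
        preds = sorted(preds, key=lambda p: p.lo)
        mid = len(preds) // 2
        return preds[:mid], preds[mid:]

# Usage: does a query interval overlap a subtree's SP?
print(Interval.consistent(Interval(5, 7), Interval(0, 6)))   # True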

26
Some Additional Indexing Tricks
O'Neil/Quass, SIGMOD 97
  • Bitmap indexing
  • Many matches per value in (secondary) index?
    Rather than storing pointers to heap file in
    leaves, store a bitmap of matches in a (sorted)
    heap file.
  • Only works if file reorg is infrequent
  • Can make intersection, COUNT, etc. quicker during
    query processing
  • Can mix/match bitmaps and lists in a single index
  • Works with any (secondary) index with duplicate
    matches
  • Vertical Partitioning / Columnar storage
  • Again, for sorted, relatively static files
  • Bit-slice indexes

27
Query Processing Dataflow Infrastructures
28
Dataflow Infrastructure
  • Dataflow abstraction is very simple
  • box-and-arrow diagrams
  • (typed) collections of objects flow along edges
  • Details can be tricky
  • Push or Pull?
  • More to it than that
  • How do control-flow and dataflow interact?
  • Where does the data live?
  • Don't want to copy data
  • If passing pointers, where does the real data
    live?


29
Iterators
  • Most uniprocessor DB engines use iterators
  • Open() -> cursor
  • Next(cursor) -> typed record
  • Close(cursor)
  • Simple and elegant
  • Control-flow and dataflow coupled
  • Familiar single-threaded, procedure-call API
  • Data refs passed on stack, no buffering
  • Blocking-agnostic
  • Works w/blocking ops -- e.g. Sort
  • Works w/pipelined ops
  • Note: well-behaved iterators come up for air
    in inner loops
  • E.g. for interrupt handling

[Diagram: an iterator plan, with operators g and f stacked over base inputs R and S; a code sketch follows]
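A toy pull-based pipeline in Python, showing how the iterator API couples control flow and dataflow: the consumer's Next() calls drive everything. Scan and Select are illustrative stand-ins for real operators.

# Sketch of pull-based (iterator) dataflow: each operator implements
# open/next/close and pulls from its child on demand.  Names are illustrative.
class Scan:
    def __init__(self, records):
        self.records = records
    def open(self):
        self.it = iter(self.records)
    def next(self):
        return next(self.it, None)          # None signals end-of-stream
    def close(self):
        pass

class Select:
    """Filter operator: control flow and data flow ride the same call stack."""
    def __init__(self, child, pred):
        self.child, self.pred = child, pred
    def open(self):
        self.child.open()
    def next(self):
        while (t := self.child.next()) is not None:
            if self.pred(t):
                return t
        return None
    def close(self):
        self.child.close()

# Usage: SELECT * FROM R WHERE a > 2, driven entirely by the consumer's pulls.
plan = Select(Scan([{"a": 1}, {"a": 3}, {"a": 5}]), lambda t: t["a"] > 2)
plan.open()
while (t := plan.next()) is not None:
    print(t)
plan.close()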
30
Where is the In-Flight Data?
  • In standard DBMS, raw data lives in disk format,
    in shared Buffer Pool
  • Iterators pass references to BufPool
  • A tuple slot per iterator input
  • Never copy along edges of dataflow
  • Join results are arrays of refs to base tables
  • Operators may pin pages in BufPool
  • BufPool never replaces pinned pages
  • Ops should release pins ASAP (esp. across Next()
    calls!!)
  • Some operators copy data into their internal
    state
  • Can spill this state to private disk space

31
Weaknesses of Simple Iterators
  • Evolution of uniprocessor archs to parallel archs
  • esp. shared-nothing clusters
  • Opportunity for pipelined parallelism
  • Opportunity for partition parallelism
  • Take a single box in the dataflow, and split it
    across multiple machines
  • Problems with iterators in this environment
  • Spoils pipelined parallelism opportunity
  • Polling (Next()) across the network is
    inefficient
  • Nodes sit idle until polled, and during comm
  • A blocking producer blocks its consumer
  • But would like to keep iterator abstraction
  • Especially to save legacy query processor code
  • And simplify debugging (single-threaded,
    synchronous)

32
Exchange
Graefe, SIGMOD 90
  • Encapsulate partition parallelism & asynchrony
  • Keep the iterator API between ops
  • Exchange operator partitions input data by
    content
  • E.g. join or sort keys
  • Note: basic architectural idea!
  • Encapsulate dataflow tricks in operators,
    leaving infrastructure untouched
  • We'll see this again next week, e.g. in Eddies

33
Exchange Internals
  • Really 2 operators, XIN and XOUT
  • XIN is top of a plan, and pulls, pushing
    results to XOUT queue
  • XOUT spins on its local queue
  • One thread becomes two
  • Producer graph + XIN
  • Consumer graph + XOUT
  • Routing table/fn in XIN supports partition
    parallelism
  • E.g. for sort, join, etc.
  • Producer and consumer see iterator API
  • Queue + thread barrier turns NW-based push into
    iterator-style pull (sketched below)

[Diagram: the Exchange operator, with XIN routing the producer's output into XOUT's queue across a thread boundary]
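A rough Python sketch of the Exchange idea under simplifying assumptions: a single queue instead of a routing table, and Python threads standing in for processes on separate nodes. XIn pulls from the producer plan and pushes; XOut gives the consumer an ordinary pull interface.

# Sketch of Exchange as two half-operators: XIn pulls from the producer
# sub-plan in its own thread and pushes into a queue; XOut presents the usual
# iterator API to the consumer.  Names and details are illustrative.
import threading, queue

END = object()  # end-of-stream marker

class XIn(threading.Thread):
    def __init__(self, child_iter, out_queue):
        super().__init__(daemon=True)
        self.child_iter, self.out_queue = child_iter, out_queue
    def run(self):
        for t in self.child_iter:          # pull from the producer plan
            self.out_queue.put(t)          # push across the thread boundary
        self.out_queue.put(END)

class XOut:
    """Consumer-side iterator: turns the pushed stream back into pulls."""
    def __init__(self, producer_iter):
        self.q = queue.Queue(maxsize=128)
        XIn(producer_iter, self.q).start()
    def next(self):
        t = self.q.get()                   # blocks if the producer is behind
        return None if t is END else t

# Usage: the consumer still just calls next(), unaware of the producer thread.
xo = XOut(iter(range(5)))
while (t := xo.next()) is not None:
    print(t)

For partition parallelism there would be one queue per consumer and a routing function in XIn (e.g. hash of the join or sort key) choosing among them.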
34
Exchange Benefits?
  • Remember Iterator limitations?
  • Spoils pipelined parallelism opportunity
  • solved by Exchange thread boundary
  • Polling (Next()) across the network is
    inefficient
  • Solved by XIN pushing to XOUT queue
  • A blocking producer blocks its consumer
  • Still a problem!

35
Exchange Limitations
  • Doesn't allow consumer work to overlap w/blocking
    producers
  • E.g. streaming data sources, events
  • E.g. sort, some join algs
  • Entire consumer graph blocks if XOUT queue empty
  • Control flow coupled to dataflow, so XOUT won't
    return without data
  • Queue is encapsulated from consumer
  • But
  • Note that exchange model is fine for most
    traditional DB Query Processing
  • May need to be extended for new settings

36
Fjords
Madden/Franklin, ICDE 01
  • Thread of control per operator
  • Queues between each operator
  • Asynch or synch calls
  • Can do asynch poll-and-yield iteration in each
    operator (for both consumer and producer)
  • Or can do synchronous get_next iteration
  • Can get traditional behavior if you want
  • Synch polls + queue of size 1 -> Iterators
  • Synch consumer, asynch producer -> Exchange
  • Asynch calls solve the blocking problem of
    Exchange

[Diagram: Fjord queue configurations that mimic the iterator and Exchange behaviors]
37
Fjords
  • Disadvantages
  • Lots of threads
  • Best done in an event-programming style, not OS
    threads
  • Operators really have to come up for air
    (yield)
  • Need to write your own scheduler
  • Harder to debug
  • But
  • Maximizes flexibility for operators at the
    endpoints
  • Still provides a fairly simple interface for
    operator-writers

38
Basic Relational Operators and Implementation
39
Relational Algebra Semantics
  • Selection: σp(R)
  • Returns all rows in R that satisfy p
  • Projection: πC(R)
  • Returns all rows in R projected to columns in C
  • In strict relational model, remove duplicate rows
  • In SQL, preserve duplicates (multiset semantics)
  • Cartesian Product: R × S
  • Union: R ∪ S; Difference: R − S
  • Note: R, S must have matching schemata
  • Join: R ⋈p S = σp(R × S)
  • Missing: Grouping & Aggregation, Sorting

40
Operator Overview Basics
  • Selection
  • Typically free, so pushed down
  • Often omitted from diagrams
  • Projection
  • In SQL, typically free, so pushed down
  • No duplicate elimination
  • Always pass the minimal set of columns downstream
  • Typically omitted from diagrams
  • Cartesian Product
  • Unavoidable nested loop to generate output
  • Union
  • Concat, or concat followed by dup. elim.

41
Operator Overview, Cont.
  • Unary operators: Grouping & Sorting
  • Grouping can be done with hash or sort schemes
    (as we'll see)
  • Binary matching: Joins/Intersections
  • Alternative algorithms
  • Nested loops
  • Loop with index lookup (Index N.L.)
  • Sort-merge
  • Hash Join
  • Don't forget: have to write as iterators
  • Every time you get called with Next(), you adjust
    your state and produce an output record

42
Unary External Hashing
Bratbergsengen, VLDB 84
  • E.g. GROUP BY, DISTINCT
  • Two hash functions, hc (coarse) and hf (fine)
  • Two phases
  • Phase 1: for each tuple of input, hash via hc
    into a spill partition to be put on disk
  • B-1 blocks of memory used to hold output buffers
    for writing a block at a time per partition

43
Unary External Hashing
  • Phase 2: for each partition, read off disk and
    hash into a main-memory hashtable via hf
  • For DISTINCT, when you find a value already in
    the hashtable, discard the copy
  • For GROUP BY, associate some agg state (e.g.
    running SUM) with each group in the hash table,
    and maintain it (see the sketch below)
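A small Python sketch of the two-phase scheme for DISTINCT; lists stand in for spill files, and the hc/hf functions are just salted uses of Python's hash. For GROUP BY, the bucket entries would carry running aggregate state instead of bare values.

def external_distinct(tuples, num_partitions=4):
    """Two-phase external hashing for DISTINCT (illustrative sketch)."""
    hc = lambda v: hash(("coarse", v)) % num_partitions   # spill partition
    hf = lambda v: hash(("fine", v))                       # in-memory bucket

    # Phase 1: one pass over the input, writing each tuple to a partition.
    # (Real systems buffer one output block per partition; lists stand in
    # for spill files here.)
    partitions = [[] for _ in range(num_partitions)]
    for t in tuples:
        partitions[hc(t)].append(t)

    # Phase 2: each partition is read back and hashed into a memory table;
    # a value already present in its bucket is a duplicate and is dropped.
    for part in partitions:
        table = {}                     # hf(value) -> list of distinct values
        for t in part:
            bucket = table.setdefault(hf(t), [])
            if t not in bucket:
                bucket.append(t)
                yield t

# Usage:
print(list(external_distinct([3, 1, 3, 2, 1, 3])))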

44
External Hashing Analysis
  • To utilize memory well in Phase 2, would like
    each partition to be B blocks big
  • Hence works in two phases when B > √|R|
  • Same req as external sorting!
  • Else can recursively partition the partitions in
    Phase 2
  • Can be made to pipeline, to adapt nicely to small
    data sets, etc.

45
Hash Join (GRACE)
Fushimi, et al., VLDB 84
  • Phase 1: partition each relation on the join key
    with hc, spilling to disk (see the sketch below)
  • Phase 2
  • build each partition of smaller relation into a
    hashtable via hf
  • scan matching partition of bigger relation, and
    for each tuple probe the hashtable via hf for
    matches
  • Would like each partition of smaller relation to
    fit in memory
  • So works well if B > √|smaller|
  • Size of bigger is irrelevant!! (Vs. sort-merge
    join)
  • Popular optimization: Hybrid hash join
  • Partition 0 doesn't spill -- it builds and
    probes immediately
  • Partitions 1 through n use rest of memory for
    output buffers
  • DeWitt/Katz/Olken/Shapiro/Stonebraker/Wood,
    SIGMOD 84
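An illustrative Python sketch of basic GRACE hash join (no hybrid optimization), assuming dict-shaped tuples and that each partition of the smaller input fits in memory; lists again stand in for spill files.

def grace_hash_join(R, S, key, num_partitions=4):
    hc = lambda v: hash(("coarse", v)) % num_partitions  # partitioning hash
    hf = lambda v: hash(("fine", v))                     # build/probe hash

    # Phase 1: partition both relations on the join key with hc.
    R_parts = [[] for _ in range(num_partitions)]
    S_parts = [[] for _ in range(num_partitions)]
    for r in R:
        R_parts[hc(r[key])].append(r)
    for s in S:
        S_parts[hc(s[key])].append(s)

    # Phase 2: for each partition, build a hash table on the smaller side
    # (here: R) via hf, then probe it with the matching partition of S.
    for Rp, Sp in zip(R_parts, S_parts):
        table = {}
        for r in Rp:
            table.setdefault(hf(r[key]), []).append(r)
        for s in Sp:
            for r in table.get(hf(s[key]), []):
                if r[key] == s[key]:          # re-check: hf can collide
                    yield {**r, **s}

# Usage:
R = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
S = [{"id": 1, "y": "c"}, {"id": 3, "y": "d"}]
print(list(grace_hash_join(R, S, "id")))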

46
Hash-Join
[Diagram: GRACE hash join. Phase 1: the INPUT relations are scanned and hashed
by hc into B-1 output buffers (B main memory buffers total), producing
partitions of R and S on disk. Phase 2: for each partition Ri (k < B-1 pages),
build an in-memory hash table via hf, then probe it with tuples of the
matching Si read through an input buffer, writing the join result to an
output buffer.]
47
Symmetric Hash Join
Mikkilineni & Su, TOSE 88; Wilschut & Apers,
PDIS 91
  • Pipelining, in-core variant
  • Build and probe symmetrically (sketched below)
  • Correctness: Each output tuple generated when
    its last-arriving component appears
  • Can be extended to out-of-core case
  • Tukwila: Ives & Halevy, SIGMOD 99
  • XJoin: Spill and read partitions multiple times
  • Correctness guaranteed by timestamping tuples and
    partitions
  • Urhan & Franklin, DEBull 00
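A sketch of the in-core symmetric variant in Python: tuples arrive tagged with their side, each is built into its own hash table and immediately probes the other. The stream format and names are illustrative.

def symmetric_hash_join(stream, key):
    """stream yields ('R', tuple) or ('S', tuple); results are yielded ASAP."""
    tables = {"R": {}, "S": {}}
    for side, t in stream:
        other = "S" if side == "R" else "R"
        k = t[key]
        tables[side].setdefault(k, []).append(t)      # build into own table
        for match in tables[other].get(k, []):        # probe the other table
            r, s = (t, match) if side == "R" else (match, t)
            yield {**r, **s}

# Usage: tuples may arrive in any interleaving; each output appears when its
# last-arriving component shows up.
arrivals = [("R", {"id": 1, "x": "a"}), ("S", {"id": 1, "y": "b"}),
            ("S", {"id": 2, "y": "c"}), ("R", {"id": 2, "x": "d"})]
print(list(symmetric_hash_join(iter(arrivals), "id")))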

48
Relational Query Engines
49
A Basic SQL primer
SELECT DISTINCT <output expressions>
FROM <tables>
WHERE <predicates>
GROUP BY <gb-expression>
HAVING <h-predicates>
ORDER BY <expression>
  • Join tables in FROM clause
  • applying predicates in WHERE clause
  • If GROUP BY, partition results by GROUP
  • And maintain aggregate output expressions per
    group
  • Delete groups that don't satisfy HAVING clause
  • If ORDER BY, sort output accordingly

50
Examples
  • Single-table S-F-W
  • DISTINCT, ORDER BY
  • Multi-table S-F-W
  • And self-join
  • Scalar output expressions
  • Aggregate output expressions
  • With and without DISTINCT
  • Group By
  • Having
  • Nested queries
  • Uncorrelated and correlated

51
A Dopey Query Optimizer
  • For each S-F-W query block
  • Create a plan that
  • Forms the cartesian product of the FROM clause
  • Applies the WHERE clause
  • Incredibly inefficient
  • Huge intermediate results!
  • Then, as needed
  • Apply the GROUP BY clause
  • Apply the HAVING clause
  • Apply any projections and output expressions
  • Apply duplicate elimination and/or ORDER BY

[Plan diagram: σpredicates applied over a Cartesian product (×) of the FROM-clause tables]
52
An Oracular Query Optimizer
  • For each possible correct plan
  • Run the plan (infinitely fast)
  • Measure its performance in reality
  • Pick the best plan, and run it in reality

53
A Standard Query Optimizer
  • Three aspects to the problem
  • Legal plan space (transformation rules)
  • Cost model
  • Search Strategy

54
Plan Space
  • Many legal algebraic transformations, e.g.
  • Cartesian product followed by selection can be
    rewritten as join
  • Join is commutative and associative
  • Can reorder the join tree arbitrarily
  • NP-hard to find best join tree in general
  • Selections should (usually) be pushed down
  • Projections can be pushed down
  • And physical choices
  • Choice of Access Methods
  • Choice of Join algorithms
  • Taking advantage of sorted nature of some streams
  • Complicates Dynamic Programming, as we'll see

55
Cost Model & Selectivity Estimation
  • Cost of a physical operator can be modeled fairly
    accurately
  • E.g. number of random and sequential I/Os
  • Requires metadata about input tables
  • Number of rows (cardinality)
  • Bytes per tuple (physical schema)
  • In a query pipeline, metadata on intermediate
    tables is trickier
  • Cardinality?
  • Requires selectivity (COUNT) estimation
  • Wet-finger estimates
  • Histograms, joint distributions and other
    summaries
  • Sampling

56
Search Strategy
  • Dynamic Programming
  • Used in most commercial systems
  • IBM's System R: Selinger et al., SIGMOD 79
  • Top-Down
  • Branch and bound with memoization
  • Exodus, Volcano & Cascades: Graefe, SIGMOD 87,
    ICDE 93, DEBull 95
  • Used in a few commercial systems (Microsoft SQL
    Server, especially)
  • Randomized
  • Simulated Annealing, etc.: Ioannidis & Kang,
    SIGMOD 90

57
Dynamic Programming
  • Use principle of optimality
  • Any subtree of the optimal plan is itself optimal
    for its sub-expression
  • Plans enumerated in N passes (if N relations
    joined)
  • Pass 1: Find best 1-relation plan for each
    relation.
  • Pass 2: Find best way to join result of each
    1-relation plan (as outer) to another relation.
    (All 2-relation plans.)
  • Pass N: Find best way to join result of an
    (N-1)-relation plan (as outer) to the Nth
    relation. (All N-relation plans.)
  • This gives all left-deep plans (sketched below).
    Generalization is easy
  • A wrinkle: physical properties (e.g. sort orders)
    violate principle of optimality!
  • Use partial-order dynamic programming
  • I.e. keep undominated plans at each step --
    optimal for each setting of the physical
    properties (each interesting order)
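A toy Python sketch of the left-deep dynamic program, with a made-up cost model (sum of estimated intermediate result sizes) and no physical properties or interesting orders; the cardinalities and join selectivities are invented inputs.

from itertools import combinations

def best_left_deep_plan(card, sel):
    """card: {relation: rows}; sel: {frozenset({A,B}): join selectivity}."""
    rels = list(card)

    def join_size(rows, rel, joined):
        s = 1.0
        for other in joined:
            s *= sel.get(frozenset({rel, other}), 1.0)
        return rows * card[rel] * s

    # Pass 1: best "plan" for each single relation is just its scan.
    best = {frozenset({r}): (0.0, [r], card[r]) for r in rels}  # cost, order, rows

    # Pass k: extend each best (k-1)-relation plan by one more relation.
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            cands = []
            for r in subset:
                sub = subset - {r}
                cost, order, rows = best[sub]
                out_rows = join_size(rows, r, sub)
                cands.append((cost + out_rows, order + [r], out_rows))
            best[subset] = min(cands)          # principle of optimality
    return best[frozenset(rels)]

# Usage:
card = {"R": 1000, "S": 100, "T": 10}
sel = {frozenset({"R", "S"}): 0.01, frozenset({"S", "T"}): 0.1}
print(best_left_deep_plan(card, sel))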

58
Relational Architecture Review
Query Parsing and Optimization
Query Executor
Access Methods
Buffer Management
Disk Space Management
DB
59
Text Search
60
Information Retrieval
  • A research field traditionally separate from
    Databases
  • Goes back to IBM, Rand and Lockheed in the 50s
  • G. Salton at Cornell in the 60s
  • Lots of research since then
  • Products traditionally separate
  • Originally, document management systems for
    libraries, government, law, etc.
  • Gained prominence in recent years due to web
    search
  • Today: simple IR techniques
  • Show similarities to DBMS techniques you already
    know

61
IR vs. DBMS
  • Seem like very different beasts
  • Under the hood, not as different as they might
    seem
  • But in practice, you have to choose between the 2

IR                                   | DBMS
Imprecise Semantics                  | Precise Semantics
Keyword search                       | SQL
Unstructured data format             | Structured data
Read-Mostly. Add docs occasionally   | Expect reasonable number of updates
Page through top k results           | Generate full answer
62
IR's Bag of Words Model
  • Typical IR data model
  • Each document is just a bag of words (terms)
  • Detail 1: Stop Words
  • Certain words are considered irrelevant and not
    placed in the bag
  • e.g. "the"
  • e.g. HTML tags like <H1>
  • Detail 2: Stemming
  • Using English-specific rules, convert words to
    their basic form
  • e.g. surfing, surfed -> surf
  • Detail 3: we may decorate the words with other
    attributes
  • E.g. position, font info, etc.
  • Not exactly bag of words after all

63
Boolean Text Search
  • Find all documents that match a Boolean
    containment expression
  • Windows AND (Glass OR Door) AND NOT
    Microsoft
  • Note: query terms are also filtered via stemming
    and stop words
  • When web search engines say 10,000 documents
    found, that's the Boolean search result size.

64
Text Indexes
  • When IR folks say index or indexing
  • Usually mean more than what DB people mean
  • In our terms, both tables and indexes
  • Really a logical schema (i.e. tables)
  • With a physical schema (i.e. indexes)
  • Usually not stored in a DBMS
  • Tables implemented as files in a file system

65
A Simple Relational Text Index
  • Create and populate a table
  • InvertedFile(term string, docID int64)
  • Build a B-tree or Hash index on
    InvertedFile.term
  • May be lots of duplicate docIDs per term
  • Secondary index: list compression per term
    possible
  • This is often called an inverted file or
    inverted index
  • Maps from words -> docs
  • whereas normal files map docs to the words in the
    doc (?!)
  • Can now do single-word text search queries

66
Handling Boolean Logic
  • How to do term1 OR term2?
  • Union of two docID sets
  • How to do term1 AND term2?
  • Intersection (ID join) of two DocID sets!
  • How to do term1 AND NOT term2?
  • Set subtraction
  • Also a join algorithm
  • How to do term1 OR NOT term2
  • Union of term1 and NOT term2.
  • NOT term2 = all docs not containing term2.
    Yuck!
  • Usually forbidden at UI/parser
  • Refinement: what order to handle terms if you
    have many ANDs/NOTs?
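A small Python sketch of these set operations over sorted docID posting lists (the term-ordering refinement is not shown); the inverted_file contents are made up.

inverted_file = {                       # term -> sorted list of docIDs
    "windows": [1, 3, 5, 8],
    "glass":   [2, 3, 9],
    "door":    [3, 5],
    "microsoft": [5, 8],
}

def postings(term):
    return inverted_file.get(term, [])

def and_merge(a, b):
    """Intersect two sorted docID lists with a single merge pass."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

def or_union(a, b):
    return sorted(set(a) | set(b))

def and_not(a, b):
    return [d for d in a if d not in set(b)]

# "windows AND (glass OR door) AND NOT microsoft"
result = and_not(and_merge(postings("windows"),
                           or_union(postings("glass"), postings("door"))),
                 postings("microsoft"))
print(result)   # -> [3]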

67
Boolean Search in SQL
Windows AND (Glass OR Door) AND NOT
Microsoft
  • (SELECT docID FROM InvertedFile WHERE word = 'windows'
     INTERSECT
     SELECT docID FROM InvertedFile WHERE word = 'glass' OR word = 'door')
    EXCEPT
    SELECT docID FROM InvertedFile WHERE word = 'microsoft'
    ORDER BY magic_rank()
  • Really there's only one query (template) in IR
  • Single-table selects, UNION, INTERSECT, EXCEPT
  • Note that INTERSECT is a shorthand for equijoin
    on a key
  • Often there's only one query plan in the system,
    too!
  • magic_rank() is the secret sauce in the search
    engines

68
Fancier Phrases and Near
  • Suppose you want a phrase
  • E.g. Happy Days
  • Add a position attribute to the schema
  • InvertedFile (term string, docID int64, position
    int)
  • Index on term
  • Enhance join condition in query
  • Can't use INTERSECT syntax, but query is nearly
    the same
  • SELECT I1.docID FROM InvertedFile I1,
    InvertedFile I2
    WHERE I1.word = 'HAPPY' AND I2.word = 'DAYS'
    AND I1.docID = I2.docID AND I2.position -
    I1.position = 1
    ORDER BY magic_rank()
  • Can relax to term1 NEAR term2
  • Positions < k apart

69
Classical Document Ranking
  • TF × IDF (Term Freq. × Inverse Doc Freq.)
  • For each term t in the query:
    QueryTermRank = (occurrences of t in q)                   [TF]
                    × log((total docs)/(docs with this term))  [IDF]
                    × normalization-factor
  • For each doc d in the boolean result:
    DocTermRank = (occurrences of t in d)                     [TF]
                  × log((total docs)/(docs with this term))    [IDF]
                  × normalization-factor
  • Rank = Σt (DocTermRank × QueryTermRank)
  • Requires more to our schema
  • InvertedFile (term string, docID int64, position
    int, DocTermRank float)
  • TermInfo(term string, numDocs int)
  • Can compress DocTermRank non-relationally
  • This basically works fine for raw text
  • There are other schemes, but this is the standard
    (a rough sketch follows)
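A rough Python sketch of TF×IDF scoring over a Boolean candidate set, with simple length normalization; the tiny corpus and helper names are invented, and a real engine would precompute DocTermRank in the inverted file rather than recomputing it per query.

import math
from collections import Counter

docs = {1: "happy days are happy", 2: "days of glass doors", 3: "happy glass"}
N = len(docs)

def term_weights(bag):
    """bag of words -> {term: TF * IDF}, length-normalized."""
    tf = Counter(bag)
    df = lambda t: sum(1 for d in docs.values() if t in d.split())
    w = {t: tf[t] * math.log(N / df(t)) for t in tf if df(t) > 0}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    return {t: v / norm for t, v in w.items()}

def rank(query, candidate_doc_ids):
    q = term_weights(query.split())
    scored = []
    for d in candidate_doc_ids:
        dw = term_weights(docs[d].split())
        score = sum(dw.get(t, 0.0) * q[t] for t in q)  # Σ DocTermRank × QueryTermRank
        scored.append((score, d))
    return sorted(scored, reverse=True)

print(rank("happy days", [1, 2, 3]))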

70
Some Additional Ranking Tricks
  • Phrases/Proximity
  • Ranking function can incorporate position
  • Query expansion, suggestions
  • Can keep a similarity matrix on terms, and
    expand/modify people's queries
  • Document expansion
  • Can add terms to a doc
  • E.g. in anchor text of refs to the doc
  • Not all occurrences are created equal
  • Mess with DocTermRank based on
  • Fonts, position in doc (title, etc.)

71
Hypertext Ranking
  • Also factor in graph structure
  • Social Network Theory (Citation Analysis)
  • Hubs and Authorities (Clever), PageRank
    (Google)
  • Intuition: recursively weighted in-degrees,
    out-degrees
  • Math: eigenvector computation
  • PageRank sure seems to help
  • Though word on the street is that other factors
    matter as much
  • Anchor text, title/bold text, etc.

72
Updates and Text Search
  • Text search engines are designed to be
    query-mostly
  • Deletes and modifications are rare
  • Can postpone updates (nobody notices, no
    transactions!)
  • Updates done in batch (rebuild the index)
  • Can't afford to go offline for an update?
  • Create a 2nd index on a separate machine
  • Replace the 1st index with the 2nd
  • Can do this incrementally with a level of
    indirection
  • So no concurrency control problems
  • Can compress to search-friendly,
    update-unfriendly format
  • For these reasons, text search engines and DBMSs
    are usually separate products
  • Also, text-search engines tune that one SQL query
    to death!
  • The benefits of a special-case workload.

73
Architectural Comparison
[Diagram: the search-engine stack next to the DBMS stack. Search engine:
Search String Modifier and Ranking Algorithm ("The Query") over a simple
DBMS-like layer ("The Access Method"), with Buffer Management and Disk Space
Management left to the OS. DBMS: Query Parsing and Optimization, Query
Executor, Access Methods, Buffer Management, Disk Space Management over the
DB, where Concurrency and Recovery are needed.]
74
Revisiting Our IR/DBMS Distinctions
  • Data Modeling & Query Complexity
  • DBMS supports any schema & queries
  • Requires you to define schema
  • Complex query language (hard for folks to learn)
  • Multiple applications at output
  • RowSet API (cursors)
  • IR supports only one schema & query
  • No schema design required (unstructured text)
  • Trivial query language
  • Single application behavior
  • Page through output in rank order, ignore most of
    output
  • Storage Semantics
  • DBMS: online, transactional storage
  • IR: batch, unreliable storage

75
Distribution & Parallelism
76
Roots
  • Distributed QP vs. Parallel QP
  • Distributed QP: envisioned as a k-node intranet
    for k ≈ 10
  • Sound old-fashioned? Think of multiple hosting
    sites (e.g. one per continent)
  • Parallel QP grew out of DB Machine research
  • All in one room, one administrator
  • Parallel DBMS architecture options
  • Shared-Nothing
  • Shared-Everything
  • Shared-Disk
  • Shared-nothing is most general, most scalable

77
Distributed QP: Semi-Joins
Bernstein/Goodman 79
  • Main query processing issue in distributed DB
    lit: use semi-joins
  • R ⋉ S = πR(R ⋈ S)
  • Observe that R ⋈ S = (R ⋉ π(S)) ⋈ S
  • Assume each table lives at one site, R is bigger.
    To reduce communication:
  • Ship S's join columns to R's site, do the semijoin
    there, ship the result to S's site for the join
    (sketched below)
  • Notes
  • I'm sloppy about dups in my defns above
  • Semi-joins aren't always a win
  • Extra cost estimation task for a distributed
    optimizer
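A toy Python sketch of the semi-join reduction for a two-site join, with plain Python collections standing in for the tables at each site and for the network shipments.

def semi_join_distributed(R, S, key):
    # Step 1 (site 2 -> site 1): ship only S's join-column values.
    s_keys = {s[key] for s in S}               # π_key(S), duplicates removed

    # Step 2 (at site 1): semi-join -- keep only R tuples that can match.
    r_reduced = [r for r in R if r[key] in s_keys]   # R ⋉ π_key(S)

    # Step 3 (site 1 -> site 2): ship the reduced R and finish the join at S.
    out = []
    for r in r_reduced:
        for s in S:
            if s[key] == r[key]:
                out.append({**r, **s})
    return out

# Usage: only matching R tuples (and S's key column) ever cross the network.
R = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
S = [{"id": 2, "b": "p"}]
print(semi_join_distributed(R, S, "id"))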

78
Bloom Joins
Babb, TODS 79
  • A constant optimization on semi-joins
  • Idea: (R ⋉ π(S)) ⋈ S is redundant
  • Semi-join can safely return false hits from R
  • Rather than shipping π(S), ship a superset
  • A particular kind of lossy set compression
    allowed
  • Bloom Filter (B. Bloom, 1970)
  • Hash each value in a set via k independent hash
    functions onto an array of n bits
  • Check membership correspondingly
  • By tuning k and n, can control false hit rate
  • Rediscovered recently in web lit, some new
    wrinkles (Mitzenmacher's compressed B.F.s,
    Rhea's attenuated B.F.s); a sketch of the filter
    itself follows
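A small Python sketch of a Bloom filter, using salted SHA-256 digests to simulate k independent hash functions over an n-bit array; the parameters and usage are illustrative.

import hashlib

class BloomFilter:
    def __init__(self, n_bits=1024, k=3):
        self.n, self.k = n_bits, k
        self.bits = bytearray(n_bits)       # one byte per bit, for simplicity

    def _positions(self, value):
        for i in range(self.k):             # k "independent" hashes via salting
            h = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(h, 16) % self.n

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        """False means definitely absent; True may be a false hit."""
        return all(self.bits[pos] for pos in self._positions(value))

# Usage in a Bloom join: site 2 ships the filter over S's join keys instead of
# the keys themselves; site 1 keeps only R tuples the filter says might match.
bf = BloomFilter()
for key in [2, 7, 11]:
    bf.add(key)
print([k for k in range(15) if bf.might_contain(k)])   # superset of {2, 7, 11}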

79
Sideways Information Passing
  • These ideas generalize more broadly

[Diagram: a costly set generator feeding a "set o' stuff" sideways into
another part of the dataflow]
80
Sideways Information Passing
  • These ideas generalize more broadly
  • E.g. magic sets rewriting in datalog & SQL
  • Tricky to do optimally in those settings, but
    wins can be very big

81
Parallelism 101
See DeWitt & Gray, CACM 92
  • Pipelined vs. Partitioned
  • Pipelined: typically inter-operator
  • Nominal benefits in a dataflow
  • Partitioned: typically intra-operator
  • E.g. hash join or sort using k nodes
  • Speedup & Scaleup
  • Speedup = old_time / new_time
  • Ideal: linear
  • Scaleup = small_sys_elapsed_small_problem /
    big_sys_elapsed_big_problem
  • Ideal: 1
  • Transaction scaleup: N times as many TPC-Cs for
    N machines
  • Batch scaleup: N times as big a DB for a query on
    N machines

82
Impediments to Good Parallelism
  • Startup overheads
  • Amortized for big queries
  • Interference
  • usually the result of unpredictable communication
    delays (comm cost, empty pipelines)
  • Skew
  • Of these, skew is the real issue in DBs
  • Embarrassingly parallel
  • I.e. it works

83
Data Layout
  • Horizontal Partitioning
  • For each table, assign rows to machines by some
    key
  • Or assign arbitrarily (round-robin)
  • Vertical Partitioning
  • Sort table, and slice off columns
  • Usually not a parallelism trick
  • But nice for processing queries on read-mostly
    data (projection is free!)

84
Intra-Operator Parallelism
  • E.g. for Hash Join
  • Every site with a horizontal partition of either
    R or S fires off a scan thread
  • Every storage site reroutes its data among join
    nodes based on hash of the join column(s)
  • Upon receipt, each site does local hash join

85
Recall Exchange!
86
Skew Handling
  • Skew happens
  • Even when hashing? Yep.
  • Can pre-sample and/or pre-summarize data to
    partition better
  • Solving skew on the fly is harder
  • Need to migrate accumulated dataflow state
  • FLuX: Fault-Tolerant, Load-balancing eXchange

87
In Current Architectures
  • All DBMSs can run on shared memory, many on
    shared-nothing
  • The high end belongs to clusters
  • The biggest web-search engines run on clusters
    (Google, Inktomi)
  • And use pretty textbook DB stuff for Boolean
    search
  • Fun tradeoffs between answer quality and
    availability/management here (the Inktomi story)

88
Precomputations
89
Views and Materialization
  • A view is a logical table
  • A query with a name
  • In general, not updatable
  • If to be used often, could be materialized
  • Pre-compute and/or cache result
  • Could even choose to do this for common query
    sub-expressions
  • Needn't require a DBA to say this is a view

90
Challenges in Materialized Views
  • Three main issues
  • Given a workload, which views should be
    materialized?
  • Given a query, how can mat-views be incorporated
    into query optimizer
  • As base tables are updated, how can views be
    incrementally maintained?
  • See readings book, Gupta & Mumick

91
Precomputation in IR
  • Often want to save results of common queries
  • E.g. no point re-running Britney Spears or
    Harry Potter as Boolean queries
  • Can also use as subquery results
  • E.g. the query Harry Potter Loves Britney
    Spears can use the Harry Potter and Britney
    Spears results
  • Constrained version of mat-views
  • No surprise -- constrained relational workload
  • And consistency of matview with raw tables is not
    critical, so maintenance not such an issue.

92
Precomputed Aggregates
  • Aggregation queries work on numerical sets of
    data
  • Math tricks apply here
  • Some trivial, some fancy
  • Theme: replace the raw data with small
    statistical summaries, get approximate results
  • Histograms, wavelets, samples, dependency-based
    models, random projections, etc.
  • Heavily used in query optimizers for selectivity
    estimation (a COUNT aggregate)
  • Spate of recent work on approximate query
    processing for AVG, SUM, etc.
  • Garofalakis, Gehrke & Rastogi tutorial, SIGMOD
    02

93
A Taste of Next Week
94
Query Dataflows Meet NWs
  • Some more presentation
  • Indirection in space and time
  • Thematic similarities, differences in NW/QP
  • Revisit a NW classic through a DB lens
  • Adaptive QP in Telegraph
  • Eddies, SteMs, FLuX
  • A taste of QP in Sensor Networks
  • TinyDB (TAG), Directed Diffusion
  • A taste of QP in p2p
  • PIER project at Berkeley: parallel QP over DHTs
  • Presentations from MITers?
  • Open Discussion

95
Contact
  • jmh@cs.berkeley.edu
  • http://www.cs.berkeley.edu/~jmh