CS 541 Hashing - PowerPoint PPT Presentation

1 / 65

About This Presentation

Title:

CS 541 Hashing

Description:

and cut directory if possible (Reverse insert procedure) Fall 2002. Chris Clifton - CS541 ... (Not bad if directory in memory) Directory doubles in size (Now it ... – PowerPoint PPT presentation

Number of Views:82

Avg rating:3.0/5.0

Slides: 66

Provided by: clif8

Category:

more less

Transcript and Presenter's Notes

Title: CS 541 Hashing

1
CS 541Hashing

October 9, 2002

2
Hashing

key ? h(key)

Buckets (typically 1 disk block)
. . .
3

Two alternatives

. . .
records
(1) key ? h(key)
. . .
4
Two alternatives
record
(2) key ? h(key)
key 1
Index

Alt (2) for secondary search key

5
Example hash function

Key x1 x2 xn n byte character string
Have b buckets
h add x1 x2 .. xn
compute sum modulo b

? This may not be best function
? Read Knuth Vol. 3 if you really need to
select a good function.

Good hash ? Expected number of
function keys/bucket is the same for all
buckets
7
Within a bucket

Do we keep keys sorted?

Yes, if CPU time critical
Inserts/Deletes not too frequent

8
Next example to illustrate inserts,
overflows, deletes

h(K)

9
EXAMPLE 2 records/bucket

INSERT
h(a) 1
h(b) 2
h(c) 1
h(d) 0

0 1 2 3
h(e) 1
10
EXAMPLE deletion
Deleteef
0 1 2 3
a
b
d
c
c
e
f
g
11
Rule of thumb

Try to keep space utilization
between 50 and 80
Utilization keys used
total keys that fit

If
If 80, overflows significant depends on how
good hash function is on keys/bucket

12
Indiana Center for Database SystemsSeminar
Wednesdays 1130, CS111Today Defining Privacy
for Data Mining

Chris Clifton
clifton_at_cs.purdue.edu
This is joint work with Murat Kantarcioglu and
Jaideep Vaidya

13
CS 541Hashing

October 16, 2002

14
Hashing

key ? h(key)

Buckets (typically 1 disk block)
. . .
15
How do we cope with growth?

Overflows and reorganizations
Dynamic hashing

16
Extensible hashing two ideas

(a) Use i of b bits output by hash function
b
h(K) ?
use i ? grows over time.

00110101
17

(b) Use directory
h(K)i to bucket

. . .
. . .
18
Example h(k) is 4 bits 2 keys/bucket
1

0001
1
1001
1100
Insert 1010
19
Example continued
i
2
00 01 10 11
1
0001
0111
1001
1010
Insert 0111 0000
1100
20
Example continued
2
0000
0001
i
2
00 01 10 11
2
0111
Insert 1001
21
Extensible hashing deletion

No merging of blocks
Merge blocks and cut directory if possible
(Reverse insert procedure)

22
Deletion example

Run thru insert example in reverse!

23
Extensible hashing
Summary

Can handle growing files
- with less wasted space
- with no full reorganizations

24
Linear hashing

Another dynamic hashing scheme

25
Example b4 bits, i 2, 2 keys/bucket

insert 0101

Future growth buckets
0101
0000
1111
1010

00 01 10 11

m 01 (max used block)
26
Example b4 bits, i 2, 2 keys/bucket
Future growth buckets
0101
0000
1111
1010

00 01 10 11

m 01 (max used block)
27
Example Continued How to grow beyond this?
i 2
1111
1010
0101
0000
0101

00 01 10 11

. . .
m 11 (max used block)
28
? When do we expand file?

Keep track of used slots
total of slots

If U threshold then increase m
(and maybe i )

29
Linear Hashing
Summary

Can handle growing files
- with less wasted space
- with no full reorganizations
No indirection like extensible hashing

30
Example BAD CASE

Very full
Very empty Need to move
m here
Would waste
space...

31
Summary

Hashing
- How it works
- Dynamic hashing
- Extensible
- Linear

32
CS 541Hashing

October 18, 2002

33
Next

Indexing vs Hashing
Index definition in SQL
Multiple key access

34
Indexing vs Hashing

Hashing good for probes given key
e.g., SELECT
FROM R
WHERE R.A 5

35
Indexing vs Hashing

INDEXING (Including B Trees) good for
Range Searches
e.g., SELECT
FROM R
WHERE R.A 5

36
Index definition in SQL

Create index name on rel (attr)
Create unique index name on rel (attr)

defines candidate key

Drop INDEX name

CANNOT SPECIFY TYPE OF INDEX
(e.g. B-tree, Hashing, )
OR PARAMETERS
(e.g. Load Factor, Size of Hash,...)
... at least in SQL...

Note
38

ATTRIBUTE LIST ? MULTIKEY INDEX
(next)
e.g., CREATE INDEX foo ON R(A,B,C)

Note
39

Multi-key Index

Motivation Find records where
DEPT Toy AND SAL 50k

40
Strategy I

Use one index, say Dept.
Get all Dept Toy records and
check their salary

I1
41
Strategy II

Use 2 Indexes Manipulate Pointers
Toy Sal
50k

42
Strategy III

Multiple Key Index
One idea

I2
I3
I1
43
Example

Example
Record
Dept
Index
Salary
Index

10k
15k
17k
21k
NameJoe DEPTSales SAL15k
12k
15k
15k
19k
44
For which queries is this index good?

Find RECs Dept Sales SAL20k
Find RECs Dept Sales SAL 20k
Find RECs Dept Sales
Find RECs SAL 20k

45
Interesting application

Geographic Data
DATA

y
x
. . .
46
Queries

What city is at ?
What is within 5 miles from ?
Which is closest point to ?

47
Example

48
Queries

Find points with Yi 20
Find points with Xi
Find points close to i
Find points close to b

Many types of geographic index structures have
been suggested
Quad Trees
R Trees

50
Two more types of multi key indexes

Grid
Partitioned hash

51
Grid Index

Key 2
X1 X2 Xn
V1
V2
Key 1
Vn

To records with key1V3, key2X2
52
CLAIM

Can quickly find records with
key 1 Vi ? Key 2 Xj
key 1 Vi
key 2 Xj

And also ranges.
E.g., key 1 ? Vi ? key 2

? But there is a catch with Grid Indexes!

How is Grid Index stored on disk?

Problem
Need regularity so we can compute position of
entry

54
Solution Use Indirection

Buckets
V1
V2
V3 Grid only
V4 contains
pointers to
buckets
Buckets

X1 X2 X3
-- -- --
-- -- --
-- -- --
-- -- --
-- -- --
55
With indirection

Grid can be regular without wasting space
We do have price of indirection

56
Can also index grid on value ranges

Salary Grid

0-20K
1
20K-50K
2
50K-
3
8
Linear Scale
1
2
3
Toy
Sales
Personnel
57
Grid files

Good for multiple-key search
Space, management overhead (nothing is
free)
Need partitioning ranges that evenly split keys

-
-
58
Partitioned hash function

Idea
Key1 Key2

010110 1110010
h1
h2
59
EX

h1(toy) 0 000
h1(sales) 1 001
h1(art) 1 010
. 011
.
h2(10k) 01 100
h2(20k) 11 101
h2(30k) 01 110
h2(40k) 00 111
.
.
,

Insert
60

h1(toy) 0 000
h1(sales) 1 001
h1(art) 1 010
. 011
.
h2(10k) 01 100
h2(20k) 11 101
h2(30k) 01 110
h2(40k) 00 111
.
.
Find Emp. with Dept. Sales ? Sal40k

h1(toy) 0 000
h1(sales) 1 001
h1(art) 1 010
. 011
.
h2(10k) 01 100
h2(20k) 11 101
h2(30k) 01 110
h2(40k) 00 111
.
.
Find Emp. with Sal30k

h1(toy) 0 000
h1(sales) 1 001
h1(art) 1 010
. 011
.
h2(10k) 01 100
h2(20k) 11 101
h2(30k) 01 110
h2(40k) 00 111
.
.
Find Emp. with Dept. Sales

63
Summary

Post hashing discussion
- Indexing vs. Hashing
- SQL Index Definition
- Multiple Key Access
- Multi Key Index
Variations Grid, Geo Data
- Partitioned Hash

64
Reading Chapter 5

Skim the following sections
5.3.6, 5.3.7, 5.3.8
5.4.2, 5.4.3, 5.4.4
Read the rest

65
The BIG picture.

Chapters 2 3 Storage, records, blocks...
Chapter 4 5 Access Mechanisms - Indexes
- B trees
- Hashing
- Multi key
Chapter 6 7 Query Processing

Write a Comment

User Comments (0)

About PowerShow.com

Recommended Relevance Latest Highest Rated Most Viewed

Sort by:

Related More from user

CrystalGraphics Presentations

World's Best PowerPoint Templates PowerPoint PPT Presentation

World's Best PowerPoint Templates - CrystalGraphics offers more PowerPoint templates than anyone else in the world, with over 4 million to choose from. Winner of the Standing Ovation Award for “Best PowerPoint Templates” from Presentations Magazine. They'll give your presentations a professional, memorable appearance - the kind of sophisticated look that today's audiences expect. Boasting an impressive range of designs, they will support your presentations with inspiring background photos or videos that support your themes, set the right mood, enhance your credibility and inspire your audiences.

CrystalGraphics 3D Character Slides for PowerPoint PowerPoint PPT Presentation

CrystalGraphics 3D Character Slides for PowerPoint - CrystalGraphics 3D Character Slides for PowerPoint

Chart and Diagram Slides for PowerPoint PowerPoint PPT Presentation

Chart and Diagram Slides for PowerPoint - Beautifully designed chart and diagram s for PowerPoint with visually stunning graphics and animation effects. Our new CrystalGraphics Chart and Diagram Slides for PowerPoint is a collection of over 1000 impressively designed data-driven chart and editable diagram s guaranteed to impress any audience. They are all artistically enhanced with visually stunning color, shadow and lighting effects. Many of them are also animated. And they’re ready for you to use in your PowerPoint presentations the moment you need them. – PowerPoint PPT presentation

Related Presentations

Multi-dimensional Indexes PowerPoint PPT Presentation

Multi-dimensional Indexes - Multi-dimensional Indexes 198:541 | PowerPoint PPT presentation | free to view

541: Database Systems - 541: Database Systems | PowerPoint PPT presentation | free to view

CS 541 Review PowerPoint PPT Presentation

CS 541 Review - Hashing / Bitmap Indexes. Query Processing. Query Optimization. Handling Failure ... Attribute = property of entities in an entity set, similar to fields of a struct. ... | PowerPoint PPT presentation | free to view

CS 541 Review for Midterm PowerPoint PPT Presentation

CS 541 Review for Midterm - Attribute = property of entities in an entity set, similar to ... of the standard nomenclature (species is a weak E.S. supported by membership in a genus) ... | PowerPoint PPT presentation | free to view

DBMS Storage and Indexing PowerPoint PPT Presentation

DBMS Storage and Indexing - In first alternative, moving records for ... Alternative File Organizations ... So, better than Alternative 1 with large data records, especially if search keys ... | PowerPoint PPT presentation | free to view

D-vitamin receptor aktiv PowerPoint PPT Presentation

D-vitamin receptor aktiv - Title: A kr nikus veseel gtelens g Author: Dr. Zakar G bor Last modified by: gzakar Created Date: 5/17/2001 4:51:14 PM Document presentation format | PowerPoint PPT presentation | free to view

GATE 2022 Exam: An Overview PowerPoint PPT Presentation

GATE 2022 Exam: An Overview - GATE or Graduate Aptitude Test Engineering is a national-level engineering entrance exam that will be conducted by IIT Kharagpur for the Year 2022 session. GATE 2022 will be having two more new papers, Marine Engineering & Naval Architecture as one of the papers, and Geomatics Engineering. Hence, there will be 29 papers for the GATE 2022 Examination. | PowerPoint PPT presentation | free to view

q - Title: CHAPTER 06 - RSA cryptosystem Author: Radek Kantor Last modified by: gruska Created Date: 11/2/2003 5:02:41 PM Document presentation format | PowerPoint PPT presentation | free to view

Advances in Pattern Databases PowerPoint PPT Presentation

Advances in Pattern Databases - 1. Advances in Pattern Databases. Ariel Felner, Ben-Gurion University. Israel ... sorts all generated nodes in an OPEN-LIST and chooses the node with the best ... | PowerPoint PPT presentation | free to view

Origin of Heuristic Functions PowerPoint PPT Presentation

Origin of Heuristic Functions - Heuristic from Relaxed Models Example: Road navigation STRIPS - Eight Puzzle Example Now in order to construct a simplified or relaxed problem we only have to remove ... | PowerPoint PPT presentation | free to view

Risk Behaviors Reported by Young Adults with Chronic Hepatitis C in Florida, March PowerPoint PPT Presentation

Risk Behaviors Reported by Young Adults with Chronic Hepatitis C in Florida, March - Oxycodone/oxycontin (including roxycodone) was the most commonly injected drug. Of those reporting recent IDU (within 6 months), 97% reported sharing needles. | PowerPoint PPT presentation | free to view

Case Studies: DNS, X'500, and NARS EEE465 1999 Lecture 38 PowerPoint PPT Presentation

Case Studies: DNS, X'500, and NARS EEE465 1999 Lecture 38 - We have looked at the design of a Simple Name Service, discussed some of its ... the tactical environment, where it is unwise to rely on the availability of any ... | PowerPoint PPT presentation | free to view

Relational Query Optimization PowerPoint PPT Presentation

Relational Query Optimization - Term col=value has RF 1/NKeys(I), given index I on col ... Term col value has RF (High(I)-value)/(High(I)-Low(I)) SELECT attribute list. FROM relation list ... | PowerPoint PPT presentation | free to view

Recruiting Hiring Onboarding PowerPoint PPT Presentation

Recruiting Hiring Onboarding - Recruiting Hiring Onboarding Presenter: Terri Higgins, PHR Associate Director, Henricopolis SWCD Associate Director, Loudoun SWCD | PowerPoint PPT presentation | free to view

Towards a Science of Knowledge Base Performance Analysis PowerPoint PPT Presentation

Towards a Science of Knowledge Base Performance Analysis - http://www.cs.vu.nl/~pmika/swc/btc.html. Analysis ... Degree of interconnectedness (percentage of non-literal statements, with/without ... | PowerPoint PPT presentation | free to view

Evaluation of Relational Operators PowerPoint PPT Presentation

Evaluation of Relational Operators - ... unsorted: Must essentially scan the whole relation; cost ... If space for just 90 pages of R, we would scan S 12 times. ... Then resume scanning R and S. ... | PowerPoint PPT presentation | free to view

RSA Public Key Encryption Algorithm PowerPoint PPT Presentation

RSA Public Key Encryption Algorithm - If Bob wants to respond to Alice, he will encrypt his message using her ... Alice. Bob. Trudy/Eve/Mallory. INTERNET. 11. Public-Key Cryptography Requirements ... | PowerPoint PPT presentation | free to view

Welcome to my Nightmare PowerPoint PPT Presentation

Welcome to my Nightmare - This causes shared pool thrashing. Can result in 4031 errors. Flushing is not the answer ... Look at the code in the shared pool. if you see code that looks like so: ... | PowerPoint PPT presentation | free to view

Chapter 3: Topk Query Processing and Indexing PowerPoint PPT Presentation

Chapter 3: Topk Query Processing and Indexing - metasearch engines: aggregation over ranked results from multiple. web search engines ... Best-score. Worst-score. Doc. Rank. t1. d78. 0.9. d1. 0.7. d88. 0.2 ... | PowerPoint PPT presentation | free to view

Security "Monsters:" Current Security Threats and What You Should Be Doing to Address Them IT Security: A Call to Action for the Education Community 10:45-12:00, November 2nd, 2006, Ramada Plaza, Fargo ND Joe St Sauver, Ph.D. 235 Computing - Given the potential impact of lost laptops or other mobile devices, how should ... Alternative approach: treat laptops as nothing more than a display device and ... | PowerPoint PPT presentation | free to view

E-mail: nfhuang@cs.nthu.edu.tw - / E-mail: nfhuang@cs.nthu.edu.tw | PowerPoint PPT presentation | free to view

Probabilistic%20Calling%20Context PowerPoint PPT Presentation

Probabilistic%20Calling%20Context - Probabilistic Calling Context Michael D. Bond Kathryn S. McKinley University of Texas at Austin | PowerPoint PPT presentation | free to view

Cached Murphi PowerPoint PPT Presentation

Cached Murphi - Cached Murphi Enrico Tronci Dipartimento di Informatica Universit di Roma La Sapienza , Via Salaria 113, 00198 Roma Tronci@dsi.uniroma1.it http://www.dsi ... | PowerPoint PPT presentation | free to view

CS232 roadmap PowerPoint PPT Presentation

CS232 roadmap - Two GNU tools integrated into the GCC C compiler. Gprof: The GNU profiler ... Two example tools: Gprof: periodically interrupts program. Gcov: inserts counters ... | PowerPoint PPT presentation | free to view

OCR - ii 10 15 12 15 | PowerPoint PPT presentation | free to view

High Performance Embedded Computing with Massively Parallel Processors PowerPoint PPT Presentation

High Performance Embedded Computing with Massively Parallel Processors - High Performance Embedded Computing with Massively Parallel Processors Yangdong Steve Deng dengyd@tsinghua.edu.cn Tsinghua University The routing table ... | PowerPoint PPT presentation | free to view

VISTA IMAGING: SO THATS HOW IT WORKS PowerPoint PPT Presentation

VISTA IMAGING: SO THATS HOW IT WORKS - IEN=INTERNAL ENTRY NUMBER OF THE RECORD. PIECES=DATA STORED BETWEEN UP ARROW CHARACTERS CALLED THE DELIMITERS ... DENTAL|LONG|Intra-Oral Xray Unit, Rm, D-153 ... | PowerPoint PPT presentation | free to view