# Data Structures Course 2: Searching - PowerPoint PPT Presentation

Title:

## Data Structures Course 2: Searching

Description:

### Title: PowerPoint Presentation Author: Valued Gateway Client Last modified by: xz Created Date: 1/15/2000 4:50:39 AM Document presentation format – PowerPoint PPT presentation

Slides: 67
Provided by: ValuedGate2465
Transcript and Presenter's Notes

Title: Data Structures Course 2: Searching

1
Data Structures Course 2: Searching
2
Vocabulary
• sequential search
• element
• order
• binary search
• target
• algorithm
• array
• location
• object
• parameter
• index
• sentinel
• probability
• key
• hash
• collision
• cluster
• synonym
• probe

3
Searching
• One of the most common and time-consuming
operations in computer science.
• To find the location of a target among a list of
objects.

4
• List searching (two basic search algorithms)
• Sequential search (including three variations)
• Binary search
• Hashed list searching: the key, through an algorithmic function, determines the location of the data
• Collision resolution
• The list search algorithms are discussed using an array structure

5
2-1 List searches (work with arrays)
• The algorithm used to search a list depends on the structure of the list
• Sequential search (works on any array) is used when
• The list is not ordered
• The list is small
• The list is not searched often

6
Locating data in an unordered list
Figure: array elements A[0] to A[11] hold 4 21 36 14 62 91 8 22 7 81 77 10
Target given: 14. Location wanted: 3.
7
Search Concept
Target given: 14. Location wanted: 3.
8
Search Concept
9
Sequential search algorithms
• The search needs to tell the calling algorithm two things
• Did it find the data it was looking for?
• If it did, at what index were the target data found?
• It requires four parameters
• The list we are searching
• An index to the last element in the list
• The target
• The address where the found element's index location is to be stored
• (It returns a Boolean)

10
Sequential search algorithm
Locate the target in an unordered list.
Pre: list must contain at least one element; last is index to last element in the list; target contains the data to be located; locn is address of index in calling algorithm.
Post: if found, matching index stored in locn and found is true; if not found, last stored in locn and found is false.
Return: found <boolean>
• algorithm seqsearch(val list <array>,
•                     val last <index>,
•                     val target <keytype>,
•                     ref locn <index>)
•   looker = 0
•   loop (looker < last and
•         target not equal list[looker])
•     looker = looker + 1
•   end loop
•   locn = looker
•   if (target equal list[looker])
•     found = true
•   else
•     found = false
•   end if
•   return found
• end seqsearch
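
The pseudocode above can be sketched in Python roughly as follows. This is an illustrative translation, not the slide author's code: the function name `seq_search` is an assumption, and because Python has no reference parameters the sketch returns the pair `(found, locn)` instead of writing through `locn`.

```python
from typing import List, Tuple


def seq_search(items: List[int], target: int) -> Tuple[bool, int]:
    """Sequential search over an unordered, non-empty list.

    Returns (found, locn): locn is the matching index when found,
    otherwise the index of the last element.
    """
    last = len(items) - 1
    looker = 0
    # Stop either at the target or at the last element.
    while looker < last and items[looker] != target:
        looker += 1
    # One final comparison decides whether the stop was a match.
    return items[looker] == target, looker


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(seq_search(data, 14))
```

With the list from the earlier slide, the target 14 is reported at index 3.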

11
Variations on sequential searches
• Sentinel search
• Probability search
• Ordered list search

12
Sentinel search
Locate the target in an unordered list.
Pre: list must contain at least one element and have room for one more after last; last is index to last element in the list; target contains the data to be located; locn is address of index in calling algorithm.
Post: if found, matching index stored in locn and found is true; if not found, last stored in locn and found is false.
Return: found <boolean>
• algorithm sentinelsearch(val list <array>,
•                          val last <index>,
•                          val target <keytype>,
•                          ref locn <index>)
•   list[last + 1] = target
•   looker = 0
•   loop (target not equal list[looker])
•     looker = looker + 1
•   end loop
•   if (looker <= last)
•     found = true
•     locn = looker
•   else
•     found = false
•     locn = last
•   end if
•   return found
• end sentinel search
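
A minimal Python sketch of the sentinel idea follows. The name `sentinel_search` and the tuple return value are assumptions made for illustration; the sketch appends the sentinel temporarily and removes it again, whereas the pseudocode assumes the array already has a spare slot at `last + 1`.

```python
from typing import List, Tuple


def sentinel_search(items: List[int], target: int) -> Tuple[bool, int]:
    """Sequential search with a sentinel, so the loop tests one condition."""
    last = len(items) - 1
    items.append(target)          # sentinel stored at index last + 1
    looker = 0
    while items[looker] != target:
        looker += 1               # guaranteed to stop at the sentinel
    items.pop()                   # restore the caller's list
    if looker <= last:
        return True, looker       # stopped on a real element
    return False, last            # stopped on the sentinel


print(sentinel_search([4, 21, 36, 14], 36))
```

The loop body no longer checks the index bound, which is the whole point of the variation.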

13
probability search
Locate the target in an unordered list.
Pre: the same as above.
Post: if found, matching index stored in locn, found is true, and the element is moved up one position in the list; if not found, last stored in locn and found is false.
Return: found <boolean>
• looker = 0
• loop (looker < last and target not equal list[looker])
•   looker = looker + 1
• end loop
• if (target equal list[looker])
•   found = true
•   if (looker > 0)
•     temp = list[looker - 1]
•     list[looker - 1] = list[looker]
•     list[looker] = temp
•     looker = looker - 1
•   end if
• else
•   found = false
• end if
• locn = looker
• return found
• end probability search
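
The probability search can be sketched in Python as below. The function name and return convention are illustrative assumptions; the logic follows the slide: search sequentially and, on a hit that is not already first, swap the element with its predecessor so frequently requested items drift toward the front.

```python
from typing import List, Tuple


def probability_search(items: List[int], target: int) -> Tuple[bool, int]:
    """Sequential search that moves a found element up one position."""
    last = len(items) - 1
    looker = 0
    while looker < last and items[looker] != target:
        looker += 1
    found = items[looker] == target
    if found and looker > 0:
        # Exchange the found element with the one just before it.
        items[looker - 1], items[looker] = items[looker], items[looker - 1]
        looker -= 1
    return found, looker


data = [4, 21, 36, 14]
print(probability_search(data, 36), data)
```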
14
Ordered list search
• Locate target in a list ordered on target
• Note
• It is not necessary to search to the end of the list
• It is only for small lists
• It incorporates the sentinel idea
• Pre: the same as sequential search
• Post
• if found: the same as above
• if not found: locn is index of first element greater than target, or locn equals last; found is false
• Return: found <boolean>
• if (target < list[last])
•   looker = 0
•   loop (target > list[looker])
•     looker = looker + 1
•   end loop
• else
•   looker = last
• end if
• if (target equal list[looker])
•   found = true
• else
•   found = false
• end if
• locn = looker
• return found
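
A short Python sketch of the ordered list search follows, assuming an ascending, non-empty list; the name `ordered_search` is hypothetical. The last element plays the role of the sentinel: the loop can only run when the target is smaller than it, so the scan always stops inside the list.

```python
from typing import List, Tuple


def ordered_search(items: List[int], target: int) -> Tuple[bool, int]:
    """Sequential search in an ascending list that stops early."""
    last = len(items) - 1
    if target < items[last]:
        looker = 0
        # Stops at the first element that is >= target.
        while target > items[looker]:
            looker += 1
    else:
        looker = last
    return items[looker] == target, looker


print(ordered_search([4, 7, 8, 10, 14, 21], 9))
```

For a missing target the returned index is the position of the first larger element, or the last index when the target is larger than everything in the list.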

15
Binary search
• The sequential search algorithm is very slow
• But it is the only solution if the array is not sorted
• Binary search (ordered list)
• For large lists
• First sort
• Then search

16
Binary search method
• Suppose
• L is a sorted list
• we are searching for a value X
• 1. Compare X to the middle value (M) in L.
• 2. If X = M, we are done.
• 3. If X < M, we continue our search, but we can confine our search to the first half of L and entirely ignore the second half of L.
• 4. If X > M, we continue, but confine ourselves to the second half of L.

17
Figure: the first, mid, and last indexes of the list being searched
18
19
Binary search (ordered list)
Pre: list is ordered and must contain at least one element; end is index to the largest element in the list; target is the value of the element being sought; locn is address of index in calling algorithm.
Post: if found, locn is assigned the index of the target element and found is set true; if not found, locn is the index of the element below or above the target and found is set false.
Return: found <boolean>
• algorithm binary_search(
•   val list <array>,
•   val end <index>,
•   val target <keytype>,
•   ref locn <index>)
•   first = 0
•   last = end
•   loop (first <= last)
•     mid = (first + last) / 2
•     if (target > list[mid])
•       look in upper half
•       first = mid + 1
•     else if (target < list[mid])
•       look in lower half
•       last = mid - 1
•     else
•       found equal: force exit
•       first = last + 1
•     end if
•   end loop
•   locn = mid
•   if (target equal list[mid])
•     found = true
•   else
•     found = false
•   end if
•   return found
• end binary search
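
As a rough Python counterpart to the binary search pseudocode, consider the sketch below. The name `binary_search` and the `(found, locn)` return value are illustrative assumptions, and the sketch uses `break` where the pseudocode forces the loop to exit by setting `first` past `last`.

```python
from typing import List, Tuple


def binary_search(items: List[int], target: int) -> Tuple[bool, int]:
    """Binary search over an ascending, non-empty list."""
    first, last = 0, len(items) - 1
    mid = 0
    while first <= last:
        mid = (first + last) // 2
        if target > items[mid]:
            first = mid + 1       # look in upper half
        elif target < items[mid]:
            last = mid - 1        # look in lower half
        else:
            break                 # found equal: leave the loop
    return items[mid] == target, mid


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(data, 22))
```

When the target is absent, the returned index is the last position examined, which holds a neighbour just below or just above the target.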

20
Analyzing (the efficiency)
• Sequential search, sentinel search, ordered list search: O(n)
• Binary search: O(log₂ n)
• Comparison of binary and sequential searches

| Size | Binary | Sequential (average) | Sequential (worst case) |
|---|---|---|---|
| 16 | 4 | 8 | 16 |
| 10,000 | 14 | 5,000 | 10,000 |
| 1,000,000 | 20 | 500,000 | 1,000,000 |
21
2-3 Hashed list searches
Ideal search: we would know exactly where the data are and go directly there.
Goal of a hashed search: to find the data with only one test.
Uses an array of data.
22
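Hash function
Figure 2-6 Hash concept: the keys 102002, 107095, and 111060 pass through the hash function, which turns each key into an address in the list (the figure shows addresses 2, 5, and 100)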
23
Basic Concepts
Hash search: a search in which the key, through an algorithmic function, determines the location of the data. We use a hashing algorithm to transform the key into the index of the array element that contains the data we need.
24
Problem
Synonyms: a set of keys that hash to the same location.
Collision: occurs when the list contains two or more synonyms.
Prime area: the memory that contains all of the home addresses.
Collision resolution: when two keys collide at a home address, place one of the keys and its data in another location.
25
Figure 2-7 The collision resolution concept
1. hash(A): A is stored at its home address, 8
2. hash(B): B and A collide at 8; collision resolution moves B to 16
3. hash(C): C and B collide at 16; collision resolution moves C to another free address
26
To locate an element in a hashed list, use the same algorithm that was used to insert it into the list. First hash the key and check the home address. If it contains the desired element, the search is complete. If not, use the collision resolution algorithm to determine the next location and continue until you find the element or determine that it is not in the list. Each calculation of an address and test for success is called a probe.
27
Hashing methods
• direct
• subtraction
• modulo division
• digit extraction
• midsquare
• folding
• rotation
• pseudorandom generation
Figure 2-8 Basic hashing techniques
28
Direct method
• The key is the address (one element per key, no synonyms)
• Example 1: total monthly sales by the days of the month
• Create an array of 31 accumulators
• The accumulation code is

dailySales[sale.day] = dailySales[sale.day] + sale.amount
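
The accumulation statement can be tried out with a small Python sketch. The `Sale` record and the function name are hypothetical, and the sketch allocates 32 slots and leaves index 0 unused so that the day number (1 to 31) is the address with no arithmetic at all.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sale:
    day: int        # day of the month, 1..31; used directly as the address
    amount: float


def accumulate_daily_sales(sales: List[Sale]) -> List[float]:
    """Direct hashing: the key (day) is the array index."""
    daily_sales = [0.0] * 32            # slot 0 is deliberately unused
    for sale in sales:
        daily_sales[sale.day] = daily_sales[sale.day] + sale.amount
    return daily_sales


totals = accumulate_daily_sales([Sale(1, 10.0), Sale(15, 5.0), Sale(1, 2.5)])
print(totals[1], totals[15])
```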
29
Example 2: a small company has fewer than 100 employees; the employee number is between 1 and 100
Figure 2-9 Direct hashing of employee numbers: the keys 005, 100, and 002 hash directly to addresses 5, 100, and 2
000 (not used)
001 Harry Lee
002 Sarah Trapp
003
004
005 Vu Nguyen
006
007
008
...
099
100
30
Subtraction method
• Keys are consecutive, but do not start from 1
• An example is a student ID number
• The hashing function is very simple
• No collisions
• Only for small lists
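
A one-line Python sketch shows how little the subtraction method does. The starting number 1001 and the function name are made-up values for illustration only; the slides do not give concrete ID numbers.

```python
def subtraction_hash(key: int, first_key: int) -> int:
    """Subtract a fixed number from the key to obtain the address."""
    return key - first_key


# Hypothetical IDs 1001..1100 map to addresses 0..99.
print(subtraction_hash(1005, 1001))
```

Because the keys are consecutive, every key gets its own address and no two keys can collide.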

31
Note
1. Generally speaking, hashed lists require some empty elements to reduce the number of collisions.
2. The two methods above are ideal, but their use is very limited, for example to keys such as ID card numbers.
32
Modulo-division method (division remainder)
This method divides the key by the array size and uses the remainder for the address. The hashing algorithm is
address = key modulo listSize
Note: a prime number listSize produces fewer collisions.
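
The modulo-division method can be checked with a few lines of Python. The function name is an assumption; the list size 307 and the keys are the ones used in Figure 2-10.

```python
def modulo_division_hash(key: int, list_size: int = 307) -> int:
    """Address is the remainder of the key divided by the list size."""
    return key % list_size


for key in (121267, 45128, 379452):
    print(key, modulo_division_hash(key))
```

The three keys land on addresses 2, 306, and 0, the first and last slots of the 307-element list included.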
33
Figure 2-10 Modulo-division hashing (listSize = 307): the keys 121267, 045128, and 379452 hash to addresses 002, 306, and 000
000 379452 Marry Dodd
001
002 121267 Bryan Devaux
...
007 378845 John Carver
...
305 160252 Tuan Ngo
306 045128 Shouli Feldman
34
Digit extraction method
Selected digits are extracted from the key and used as the address.
Example: 6-digit employee number; select the first, third, and fourth digits.

| Key | Address |
|---|---|
| 379452 | 394 |
| 121267 | 112 |
| 378845 | 388 |
| 160252 | 102 |
| 045128 | 051 |
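
Digit extraction is easy to sketch in Python by treating the key as a zero-padded string. The function name and the 0-based `positions` tuple are assumptions of this sketch; positions 0, 2, and 3 correspond to the first, third, and fourth digits.

```python
from typing import Tuple


def digit_extraction_hash(key: int,
                          positions: Tuple[int, ...] = (0, 2, 3),
                          width: int = 6) -> int:
    """Build the address from selected digits of the zero-padded key."""
    digits = str(key).zfill(width)
    return int("".join(digits[p] for p in positions))


for key in (379452, 121267, 378845, 160252, 45128):
    print(key, digit_extraction_hash(key))
```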
35
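Midsquare method
The key is squared and the address is selected from the middle of the squared number.
Limitation: the size of the key. Example: 4-digit keys.
Variation: select a portion of the key. Below, digits 1-3 of the key are selected and squared, the square is filled with leading zeros to 6 digits, and the middle digits (third to fifth) are taken as the address.

| Key | Digits 1-3 | Squared (6 digits) | Address |
|---|---|---|---|
| 379452 | 379 | 143641 | 364 |
| 121267 | 121 | 014641 | 464 |
| 378845 | 378 | 142884 | 288 |
| 160252 | 160 | 025600 | 560 |
| 045128 | 045 | 002025 | 202 |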
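
The midsquare variation can be reproduced with the Python sketch below. The function name is hypothetical, and the choice of the third to fifth digits of the zero-filled square is inferred from the example values on the slide rather than stated there.

```python
def midsquare_hash(key: int, width: int = 6) -> int:
    """Square the first three digits of the key and take middle digits."""
    portion = int(str(key).zfill(width)[:3])     # digits 1-3 of the key
    squared = str(portion * portion).zfill(6)    # fill with 0 to 6 digits
    return int(squared[2:5])                     # third to fifth digit


for key in (379452, 121267, 378845, 160252, 45128):
    print(key, midsquare_hash(key))
```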
36
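Folding methods: fold shift and fold boundary
Key: 123456789, divided into the parts 123, 456, and 789
(a) Fold shift: 123 + 456 + 789 = 1368; the leading 1 is discarded, giving 368
(b) Fold boundary: the digits of the left and right parts are reversed, so 321 + 456 + 987 = 1764; the leading 1 is discarded, giving 764
Figure 2-11 Hash fold examples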
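
Both folding variants from Figure 2-11 can be sketched in Python as follows. The function names and the helper `_split` are assumptions; the carry is discarded by taking the sum modulo 10 to the power of the part size.

```python
from typing import List


def _split(key: int, part_size: int) -> List[str]:
    """Cut the decimal digits of the key into parts of part_size digits."""
    digits = str(key)
    return [digits[i:i + part_size] for i in range(0, len(digits), part_size)]


def fold_shift(key: int, part_size: int = 3) -> int:
    """Add the parts as they are and discard the carry digit."""
    parts = _split(key, part_size)
    return sum(int(p) for p in parts) % 10 ** part_size


def fold_boundary(key: int, part_size: int = 3) -> int:
    """Reverse the left and right parts, add, and discard the carry digit."""
    parts = _split(key, part_size)
    parts[0] = parts[0][::-1]
    parts[-1] = parts[-1][::-1]
    return sum(int(p) for p in parts) % 10 ** part_size


print(fold_shift(123456789), fold_boundary(123456789))
```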
37
Rotation method
Used in combination with other methods. Useful when keys are assigned serially.

| Original key | Rotated key |
|---|---|
| 600101 | 160010 |
| 600102 | 260010 |
| 600103 | 360010 |
| 600104 | 460010 |
| 600105 | 560010 |

Figure 2-12 Rotation hashing
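
Rotation moves the rightmost digit of the key to the front, which spreads serially assigned keys apart before another hashing method is applied. A minimal Python sketch, with an assumed function name and a fixed key width of six digits:

```python
def rotate_key(key: int, width: int = 6) -> int:
    """Move the rightmost digit of the zero-padded key to the front."""
    digits = str(key).zfill(width)
    return int(digits[-1] + digits[:-1])


for key in range(600101, 600106):
    print(key, rotate_key(key))
```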
38
Pseudorandom method
In this method, the key is used as the seed in a pseudorandom number generator, and the resulting random number is scaled into the possible address range using modulo division.
A common random generator is y = ax + c. For efficiency, factors a and c should be prime numbers, for example a = 17, c = 7.
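
The generator y = ax + c with a = 17 and c = 7, scaled by modulo division with a list size of 307, can be sketched in Python like this. The function name is an assumption; the constants and keys come from the slides.

```python
def pseudorandom_hash(key: int, list_size: int = 307,
                      a: int = 17, c: int = 7) -> int:
    """y = a*x + c with the key as seed, scaled by modulo division."""
    return (a * key + c) % list_size


for key in (121267, 45128, 379452):
    print(key, pseudorandom_hash(key))
```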
39
(17 × 121267 + 7) modulo 307 = 41
(17 × 045128 + 7) modulo 307 = 297
(17 × 379452 + 7) modulo 307 = 7
Figure: pseudorandom hashing of the keys 121267, 045128, and 379452 to addresses 041, 297, and 007 in a list with addresses 000 to 306. The list holds 379452 Marry Dodd, 121267 Bryan Devaux, 378845 John Carver, 045128 Shouli Feldman, and 160252 Tuan Ngo.
40
Hash Algorithm
• Convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange (ASCII) value of each character to an accumulator.
• Rotate the bits in the address to maximize the distribution of the values.
• Take the absolute value of the address and map it into the address range.

41
Hash Algorithm
This algorithm converts an alphanumeric key of size characters into an integral address.
Pre: key is a key to be hashed; size is the number of characters in the key.
• algorithm hash(
•   val key <array>,
•   val size <integer>,
•   looper = 0
•   loop (looper < size)
•     if (key[looper] not space)
•       rotate addr 12 bits right
•     end if
•   end loop
42
2-4 Collision resolution
• Except for the direct and subtraction methods, none of the hashing methods is a one-to-one mapping
• Collisions cannot be avoided
• There are several methods for handling collisions

Figure 2-13 Collision resolution methods
• linear probe
• pseudorandom
• key offset
• buckets
43
Several concepts
• Clustering: the tendency of data to group within the list (to build up unevenly across a hashed list).
• A high degree of clustering increases the number of probes needed to locate an element and reduces the processing efficiency of the list. There are two types:
• Primary clustering: when data cluster around a home address
• Secondary clustering: when data become grouped along a collision path throughout a list
• Hashing algorithms need to be designed to minimize clustering

44
• Open addressing resolves collisions in the prime area (the area that contains all of the home addresses)
• Linear probe
• Double hashing
• Pseudorandom
• Key offset

45
Linear Probe
Figure 2-14 Linear probe collision resolution: the keys 070918 and 166702 both hash to address 001. The first insert (070918) causes no collision; 166702 is then placed in the next free address.
000 379452 Marry Dodd
001 070918 Sarah Trapp
002 121267 Bryan Devaux
003 166702 Harry eagle
...
007 378845 John Carver
...
305 160252 Tuan Ngo
306 045128 Shouli Feldman
46
Linear probe
Disadvantages: first, it tends to produce primary clustering. Second, it tends to make the search algorithm more complex.
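
A minimal Python sketch of insertion with a linear probe is given below. It assumes modulo-division hashing and a table represented as a list with `None` marking free slots; the function name and the `OverflowError` for a full table are choices made for this sketch.

```python
from typing import List, Optional


def linear_probe_insert(table: List[Optional[int]], key: int) -> int:
    """Insert key at its home address or the next free address after it."""
    size = len(table)
    home = key % size
    for probe in range(size):
        address = (home + probe) % size   # wrap around at the end
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("hashed list is full")


table: List[Optional[int]] = [None] * 307
for key in (70918, 121267, 166702):
    print(key, linear_probe_insert(table, key))
```

With the keys of Figure 2-14, 070918 takes address 1, 121267 takes address 2, and 166702 (a synonym of 070918) is pushed along to address 3, which illustrates how data start to cluster around a home address.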
47
Quadratic probe
• Used to eliminate primary clustering
• The increment is the collision probe number squared
• The new address is taken modulo the list size
• Disadvantages
• 1. The time required to square the probe number.
• 2. It is not possible to generate a new address for every element in the list.
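
The following Python sketch lists the addresses visited when the increment is the probe number squared. It assumes the increment is added to the previous address on each probe, which is one common reading of the rule; the function name is hypothetical.

```python
from typing import List


def quadratic_probe_sequence(home: int, list_size: int, probes: int) -> List[int]:
    """Addresses tried after a collision at home, increment = probe squared."""
    address = home
    sequence = []
    for probe in range(1, probes + 1):
        address = (address + probe * probe) % list_size
        sequence.append(address)
    return sequence


print(quadratic_probe_sequence(1, 307, 4))
```

The growing step size moves later probes away from the home address, which is what breaks up primary clustering.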

48
Pseudorandom collision resolution
• A form of double hashing: the address is rehashed
• Uses a pseudorandom number to resolve the collision
• Uses the collision address as a factor in the random number calculation, such as y = ax + c

Figure 2-15 shows this method resolving the collision of Figure 2-14.
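
As a small illustration, the rehash y = 3x + 5 used in Figure 2-15 can be written in Python as below, with x being the collision address and the result scaled by the list size. The function name is an assumption.

```python
def pseudorandom_rehash(address: int, list_size: int = 307,
                        a: int = 3, c: int = 5) -> int:
    """Rehash a collision address with y = a*x + c, modulo the list size."""
    return (a * address + c) % list_size


# 166702 collides with 070918 at address 1.
print(pseudorandom_rehash(1))
```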
49
Pseudorandom probe
Figure 2-15 Pseudorandom collision resolution: the keys 070918 and 166702 both hash to address 001. The first insert causes no collision; the second insert collides and is rehashed with the pseudorandom formula y = 3x + 5.
000 379452 Marry Dodd
001 070918 Sarah Trapp
002 121267 Bryan Devaux
...
007 378845 John Carver
008 166702 Harry eagle
...
305 160252 Tuan Ngo
306 045128 Shouli Feldman
50
Key offset
• Another form of double hashing
• Produces different collision paths for different keys
• In its simplest version, key offset calculates the new address as
offset = ⌊key / listSize⌋
address = ((offset + old address) modulo listSize)

51
Example: the key is 166702 and the list size is 307; modulo division generates an address of 1. This synonym of 070918 produces a collision at 1. Using key offset to calculate the next address:
offset = ⌊166702 / 307⌋ = 543
address = ((543 + 001) modulo 307) = 237
If 237 were also a collision, repeat the process:
offset = ⌊166702 / 307⌋ = 543
address = ((543 + 237) modulo 307) = 166
52
To really see the effect of key offset, we need to calculate several different keys, all hashing to the same home address. Table 2-3 shows three keys that collide at address 001, together with their next two probe addresses.

| Key | Home address | Key offset | Probe 1 | Probe 2 |
|---|---|---|---|---|
| 166702 | 1 | 543 | 237 | 166 |
| 572556 | 1 | 1865 | 024 | 047 |
| 067234 | 1 | 219 | 220 | 132 |

Table 2-3 Key offset
Note: each key resolves its collision at a different address for both the first and second probes.
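
The key-offset calculation can be sketched in Python as follows. The function name and the `probes` parameter are assumptions; the arithmetic is the offset and address rule from the worked example, with a list size of 307.

```python
from typing import List


def key_offset_probes(key: int, list_size: int = 307, probes: int = 2) -> List[int]:
    """Addresses produced by key-offset rehashing after a home collision."""
    address = key % list_size       # home address (modulo division)
    offset = key // list_size       # quotient of key / listSize
    result = []
    for _ in range(probes):
        address = (offset + address) % list_size
        result.append(address)
    return result


for key in (166702, 572556, 67234):
    print(key, key_offset_probes(key))
```

All three keys share home address 1, yet each follows its own collision path because the offset depends on the key.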
53
Linked list resolution
• A disadvantage of open addressing is that each collision resolution increases the probability of future collisions
• A linked list is an ordered collection of data in which each element contains the location of the next element

54
Figure 2-16 Linked list collision resolution: elements in the prime area hold a pointer to their synonyms, which are chained together in the overflow area
000 379452 Marry Dodd
001 070918 Sarah Trapp, with a pointer to 166702 Harry eagle, which points to 572556 Chris Wallj
002 121267 Bryan Devaux
...
305 160252 Tuan Ngo
306 045128 Shouli Feldman
55
• Linked list resolution uses a separate area to store collisions and chains all synonyms together
• It uses two storage areas, the prime area and the overflow area
• Each element in the prime area contains an additional field, a pointer to the chain of its synonyms in the overflow area
• The linked list data can be stored in any order, but the most common is key sequence
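
The sketch below illustrates linked list resolution in Python. It is a simplification under stated assumptions: the class name is hypothetical, modulo-division hashing is assumed, and each overflow chain is modeled as a Python list kept in key sequence rather than as explicit linked nodes.

```python
import bisect
from typing import List, Optional, Tuple


class ChainedHashTable:
    """Prime area plus an overflow chain of synonyms per home address."""

    def __init__(self, size: int = 307) -> None:
        self.size = size
        self.prime: List[Optional[Tuple[int, str]]] = [None] * size
        self.overflow: List[List[Tuple[int, str]]] = [[] for _ in range(size)]

    def insert(self, key: int, data: str) -> None:
        address = key % self.size
        if self.prime[address] is None:
            self.prime[address] = (key, data)       # home address is free
        else:
            # Synonym: chain it in the overflow area, kept in key order.
            bisect.insort(self.overflow[address], (key, data))

    def search(self, key: int) -> Optional[str]:
        address = key % self.size
        home = self.prime[address]
        if home is None:
            return None
        if home[0] == key:
            return home[1]
        for chained_key, data in self.overflow[address]:
            if chained_key == key:
                return data
        return None


table = ChainedHashTable()
table.insert(70918, "Sarah Trapp")
table.insert(572556, "Chris Wallj")
table.insert(166702, "Harry Eagle")
print(table.search(166702))
```

Collisions never consume other home addresses here, so resolving one collision does not make later collisions more likely.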

56
Bucket hashing
Buckets are nodes that accommodate multiple data occurrences; collisions are postponed until the bucket is full.
Figure 2-17 Bucket hashing (each bucket holds up to three elements)
Bucket 0: 379452 Marry Dodd
Bucket 1: 070918 Sarah Trapp, 166702 Harry eagle, 367173 Ann georgis
Bucket 2: 121267 Bryan Devaux, 572556 Chris wallj (bucket 1 is full, so the linear probe places this key here)
...
Bucket 307: 045128 Shouli Feldman
57
Two problems; combination approaches
• First, bucket hashing uses significantly more space, because many of the buckets will be empty or partially empty
• Second, it does not completely resolve the collision problem
• One way to resolve a collision on a full bucket is to use the linear probe
• There are several approaches to resolving collisions, and a complex implementation often uses multiple steps
• Example: one large database hashes to a bucket; if the bucket is full, it uses a linear probe, and then a linked list overflow area
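
A bucket scheme combined with a linear probe can be sketched in Python as follows. The function name, the capacity of three elements per bucket, and modulo-division hashing are assumptions of this sketch; the keys are taken from the bucket hashing figure.

```python
from typing import List


def bucket_insert(buckets: List[List[int]], key: int, capacity: int = 3) -> int:
    """Store key in its home bucket, or probe linearly when it is full."""
    size = len(buckets)
    home = key % size
    for probe in range(size):
        index = (home + probe) % size
        if len(buckets[index]) < capacity:
            buckets[index].append(key)
            return index
    raise OverflowError("all buckets are full")


buckets: List[List[int]] = [[] for _ in range(307)]
for key in (70918, 166702, 367173, 572556):
    print(key, bucket_insert(buckets, key))
```

The first three keys all fit in bucket 1; only the fourth synonym causes a real collision and is moved to bucket 2 by the linear probe.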

58
Summary
• Searching is the process of finding the location of a target among a list of objects
• There are two basic searching methods for arrays: sequential search and binary search
• The sequential search is normally used when a list is not sorted. It starts at the beginning of the list and searches until it finds the data or hits the end of the list
• One variation of the sequential search is the sentinel search. In this method, the conditions ending the search are reduced to only one by artificially inserting the target at the end of the list
• The second variation of the sequential search is called the probability search. In this method, the list is ordered with the most probable elements at the beginning of the list and the least probable at the end

59
2-5 Summary (continued)
• The sequential search can also be used to search a sorted list. In this case, we can terminate the search when the target is less than the current element
• If an array is sorted, we can use a more efficient algorithm called the binary search
• The binary search algorithm searches the list by first checking the middle element. If the target is not in the middle element, the algorithm eliminates the upper half or the lower half of the list, depending on the value of the middle element. The process continues until the target is found or the reduced list length becomes zero
• The efficiency of a sequential search is O(n)
• The efficiency of a binary search is O(log₂ n)

60
Summary (continued)
• In a hashed search, the key, through an algorithmic transformation, determines the location of the data. It is a key-to-address transformation
• We discussed several hashing functions: direct, subtraction, modulo division, digit extraction, midsquare, folding, rotation, and pseudorandom generation

61
Summary (continued)
• In direct hashing, the key is the address without any algorithmic manipulation
• In subtraction hashing, the key is transformed to an address by subtracting a fixed number from it
• In modulo-division hashing, the key is divided by the list size (recommended to be a prime number) and the remainder is used as the address
• In digit-extraction hashing, selected digits are extracted from the key and used as an address
• In midsquare hashing, the key is squared and the address is selected from the middle of the result
• In fold shift hashing, the key is divided into parts whose sizes match the size of the required address. The parts are then added to obtain the address

62
Summary (continued)
• In fold boundary hashing, the key is divided into parts whose sizes match the size of the required address. Then the left and right parts are reversed and added to the middle part to obtain the address
• In rotation hashing, the rightmost digit of the key is rotated to the left to determine an address. However, this method is usually used in combination with other methods
• In pseudorandom generation hashing, the key is used as the seed to generate a pseudorandom number. The result is then scaled to obtain the address
• Except in the direct and subtraction methods, collisions are unavoidable in hashing. Collisions occur when a new key is hashed to an address that is already occupied

63
Summary (continued)
• Clustering is the tendency of data to build up unevenly across a hashed list.
• Primary clustering occurs when data build up around a home address
• Secondary clustering occurs when data build up along a collision path in the list
• To solve a collision, a collision resolution method is used
• Three general methods are used to resolve collisions: open addressing, linked lists, and buckets
• The open addressing method can be subdivided into linear probe, quadratic probe, pseudorandom rehashing, and key-offset rehashing

64
Summary (continued)
• In the linear probe method, when a collision occurs, the new data are stored in the next available address
• In the quadratic method, the increment is the collision probe number squared.
• In the pseudorandom rehashing method, we use a random number generator to rehash the address
• In the key-offset rehashing method, we use an offset calculated from the key to rehash the address

65
Summary (continued)
• In the linked list technique, we use separate areas to store collisions and chain all synonyms together
• In bucket hashing, we use a bucket that can accommodate multiple data occurrences

66
Homework
• Using the modulo-division method and linear probing, store the keys shown below in an array with 19 elements. How many collisions occurred? What is the load factor of the list after all keys have been inserted?
• 224562, 137456, 214562, 140145, 214567, 162145, 144467, 199645, 234534
• Repeat above problem using the digit-extraction
method (first, third and fifth digits) and