Data Structures Course 2: Searching - PowerPoint PPT Presentation

Title: PowerPoint Presentation. Author: Valued Gateway Client. Last modified by: xz. Created: 1/15/2000 4:50:39 AM. Format: PowerPoint presentation.

Slides: 67
Provided by: ValuedGate2465

1
Data Structures Course 2: Searching
2
Vocabulary
  • sequential search
  • element
  • order
  • binary search
  • target
  • algorithm
  • array
  • location
  • object
  • parameter
  • index
  • sentinel
  • probability
  • key
  • hash
  • collision
  • cluster
  • synonym
  • probe
  • load factor

3
Searching
  • One of the most common and time-consuming
    operations in computer science.
  • To find the location of a target among a list of
    objects.

4
Main contents (Chapter 2)
  • List searching, including two basic search algorithms:
  • Sequential search (including three variations)
  • Binary search
  • Hashed list searching: the key, through an algorithmic function, determines the location of the data
  • Collision resolution
  • The list search algorithms are discussed using an array structure

5
2-1 List searches (working with arrays)
  • The algorithm used to search a list depends on the structure of the list
  • Sequential search (works on any array) is used when the list is:
  • Not ordered
  • Small
  • Not searched often

6
Locating data in an unordered list
Elements a[0] through a[11]: 4 21 36 14 62 91 8 22 7 81 77 10
Target given: 14
Location wanted: 3
7
Search Concept
Target given: 14; location wanted: 3
8
Search Concept
9
Sequential search algorithms
  • Needs to tell the calling algorithm two things:
  • Did it find the data it was looking for?
  • If it did, at what index were the target data found?
  • Requires four parameters:
  • The list we are searching
  • An index to the last element in the list
  • The target
  • The address where the found element's index is to be stored
  • Returns a Boolean (found or not found)

10
Sequential search algorithm
Locate the target in an unordered list.
Pre:  list must contain at least one element
      last is the index to the last element in the list
      target contains the data to be located
      locn is the address of an index in the calling algorithm
Post: if found, the matching index is stored in locn and found is true
      if not found, last is stored in locn and found is false
Return: found <boolean>

algorithm seqSearch (val list   <array>,
                     val last   <index>,
                     val target <keyType>,
                     ref locn   <index>)
    looker = 0
    loop (looker < last AND target not equal list[looker])
        looker = looker + 1
    end loop
    locn = looker
    if (target equal list[looker])
        found = true
    else
        found = false
    end if
    return found
end seqSearch
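A minimal Python sketch of seqSearch, offered as an illustration rather than the course's reference code. Python has no reference parameters, so this version returns the pair (found, locn) instead of writing locn through a reference, and the list parameter is named items (an assumption) to avoid shadowing the built-in list. The sample data are the unordered list from slide 6.

```python
def seq_search(items, last, target):
    """Sequential search; returns (found, locn) like the slide's seqSearch."""
    looker = 0
    # Advance until we reach the last element or match the target.
    while looker < last and target != items[looker]:
        looker += 1
    # locn is the matching index, or last when the target is absent.
    return target == items[looker], looker


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(seq_search(data, len(data) - 1, 14))  # (True, 3)
print(seq_search(data, len(data) - 1, 99))  # (False, 11)
```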

11
Variations on sequential searches
  • Sentinel search
  • Probability search
  • Ordered list search

12
Sentinel search
Locate the target in an unordered list.
Pre:  list must contain at least one element, with room for one more element at index last + 1 (the sentinel)
      last is the index to the last element in the list
      target contains the data to be located
      locn is the address of an index in the calling algorithm
Post: if found, the matching index is stored in locn and found is true
      if not found, last is stored in locn and found is false
Return: found <boolean>

algorithm sentinelSearch (val list   <array>,
                          val last   <index>,
                          val target <keyType>,
                          ref locn   <index>)
    list[last + 1] = target
    looker = 0
    loop (target not equal list[looker])
        looker = looker + 1
    end loop
    if (looker <= last)
        found = true
        locn = looker
    else
        found = false
        locn = last
    end if
    return found
end sentinelSearch
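The sentinel search can be sketched in Python as follows, under one stated assumption: the pseudocode needs a spare slot at index last + 1, so this hypothetical helper appends that slot when it is missing and restores the caller's list before returning. It again returns (found, locn) in place of a reference parameter.

```python
def sentinel_search(items, last, target):
    """Sentinel search; returns (found, locn).

    Assumption (not in the slides): a missing slot at last + 1 is created
    temporarily, and the list is restored before returning.
    """
    appended = len(items) == last + 1
    if appended:
        items.append(None)
    saved = items[last + 1]
    items[last + 1] = target          # plant the sentinel
    looker = 0
    # Only one test per iteration: the sentinel guarantees the loop stops.
    while target != items[looker]:
        looker += 1
    # Undo the sentinel so the caller's list is unchanged.
    if appended:
        items.pop()
    else:
        items[last + 1] = saved
    if looker <= last:
        return True, looker
    return False, last


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(sentinel_search(data, len(data) - 1, 14))  # (True, 3)
print(sentinel_search(data, len(data) - 1, 99))  # (False, 11)
```

The loop body performs a single comparison per element, which is the whole point of the sentinel; the bounds check moves to one test after the loop.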

13
Probability search
Locate the target in an unordered list.
Pre:  the same as above
Post: if found, the matching index is stored in locn, found is true, and the element moves up one position in priority
      if not found, the same as above
Return: found <boolean>

algorithm probabilitySearch (val list   <array>,
                             val last   <index>,
                             val target <keyType>,
                             ref locn   <index>)
    looker = 0
    loop (looker < last AND target not equal list[looker])
        looker = looker + 1
    end loop
    if (target equal list[looker])
        found = true
        if (looker > 0)
            temp = list[looker - 1]
            list[looker - 1] = list[looker]
            list[looker] = temp
            looker = looker - 1
        end if
    else
        found = false
    end if
    locn = looker
    return found
end probabilitySearch
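A short Python sketch of the probability search, assuming the same (found, locn) return convention as the earlier sketches. Each successful search swaps the found element with its predecessor, so repeatedly requested elements drift toward the front of the list.

```python
def probability_search(items, last, target):
    """Probability search; returns (found, locn) and reorders on a hit."""
    looker = 0
    while looker < last and target != items[looker]:
        looker += 1
    if target != items[looker]:
        return False, looker
    if looker > 0:
        # Move the found element up one position in priority.
        items[looker - 1], items[looker] = items[looker], items[looker - 1]
        looker -= 1
    return True, looker


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(probability_search(data, len(data) - 1, 14))  # (True, 2)
print(data[:4])                                     # [4, 21, 14, 36]
```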
14
Ordered list search
  • Locates the target in a list ordered on the target key
  • Notes:
  • It is not necessary to search to the end of the list
  • It is suitable only for small lists
  • It incorporates the sentinel idea: because the list is ordered, list[last] stops the loop
Pre:  the same as the sequential search
Post: if found, the same as above
      if not found, locn is the index of the first element > target, or locn = last; found is false
Return: found <boolean>

    if (target < list[last])
        looker = 0
        loop (target > list[looker])
            looker = looker + 1
        end loop
    else
        looker = last
    end if
    if (target equal list[looker])
        found = true
    else
        found = false
    end if
    locn = looker
    return found
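The ordered list search translates to Python roughly as below. The sample list is simply the slide 6 data sorted into ascending order (an illustrative choice); the function assumes ascending order and returns (found, locn).

```python
def ordered_list_search(items, last, target):
    """Ordered list search; items must be sorted in ascending order."""
    if target < items[last]:
        looker = 0
        # items[last] > target acts as a sentinel, so no bounds test is needed.
        while target > items[looker]:
            looker += 1
    else:
        looker = last
    # If not found, looker is the first element > target, or last.
    return target == items[looker], looker


ordered = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(ordered_list_search(ordered, len(ordered) - 1, 22))  # (True, 6)
print(ordered_list_search(ordered, len(ordered) - 1, 11))  # (False, 4)
```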

15
Binary search
  • The sequential search algorithm is very slow
  • But it is the only solution if the array is not sorted
  • Binary search (ordered list):
  • For large lists
  • First sort the list
  • Then search it

16
Binary search method
  • Suppose:
  • L is a sorted list
  • We are searching for a value X
  • Compare X to the middle value M in L.
  • If X = M, we are done.
  • If X < M, we continue our search, but we can confine it to the first half of L and entirely ignore the second half of L.
  • If X > M, we continue, but confine ourselves to the second half of L.

17
[Figure: binary search example, showing the first, mid, and last indexes]
18
Target not found: target 11 is not in the list
19
Binary search (ordered list)
Pre:  list is ordered; it must contain at least one element
      end is the index to the largest element in the list
      target is the value of the element being sought
      locn is the address of an index in the calling algorithm
Post: found: locn is assigned the index of the target element; found is set true
      not found: locn is the index of the element below or above the target; found is set false
Return: found <boolean>

algorithm binarySearch (val list   <array>,
                        val end    <index>,
                        val target <keyType>,
                        ref locn   <index>)
    first = 0
    last = end
    loop (first <= last)
        mid = (first + last) / 2      (integer division)
        if (target > list[mid])
            look in upper half
            first = mid + 1
        else if (target < list[mid])
            look in lower half
            last = mid - 1
        else
            found equal: force exit
            first = last + 1
        end if
    end loop
    locn = mid
    if (target equal list[mid])
        found = true
    else
        found = false
    end if
    return found
end binarySearch
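A Python sketch of binarySearch follows, as an illustration under the same (found, locn) convention. Here `break` plays the role of the pseudocode's forced exit (first = last + 1), and the sample list is the slide 6 data sorted; target 11 reproduces the "target not found" case from slide 18.

```python
def binary_search(items, end, target):
    """Binary search on a sorted list; returns (found, locn)."""
    first, last = 0, end
    mid = 0
    while first <= last:
        mid = (first + last) // 2          # integer division
        if target > items[mid]:
            first = mid + 1                # look in upper half
        elif target < items[mid]:
            last = mid - 1                 # look in lower half
        else:
            break                          # found: force exit
    # When the target is absent, mid indexes an element next to where it would be.
    return target == items[mid], mid


ordered = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(ordered, len(ordered) - 1, 22))  # (True, 6)
print(binary_search(ordered, len(ordered) - 1, 11))  # (False, 4)
```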

20
Analyzing the efficiency
  • Sequential search, sentinel search, and ordered list search: O(n)
  • Binary search: O(log₂ n)
  • Comparison of binary and sequential searches (approximate number of iterations):

Size        Binary   Sequential (average)   Sequential (worst case)
16          4        8                      16
10,000      14       5,000                  10,000
1,000,000   20       500,000                1,000,000
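The table's figures can be reproduced with a little arithmetic, taking the binary column as log₂ n rounded up, the sequential average as n/2, and the sequential worst case as n. This is only a check of the table's numbers, not a timing measurement.

```python
def binary_iterations(n):
    """ceil(log2 n) for n >= 1, computed exactly with integer arithmetic."""
    return (n - 1).bit_length()


print(f"{'Size':>10} {'Binary':>7} {'Seq (avg)':>10} {'Seq (worst)':>12}")
for size in (16, 10_000, 1_000_000):
    print(f"{size:>10,} {binary_iterations(size):>7} {size // 2:>10,} {size:>12,}")
```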
21
2-3 Hashed list searches
Ideal search: we would know exactly where the data are and go directly there.
Goal of a hashed search: find the data with only one test.
Uses an array of data.
22
Hash function
[Figure 2-6 Hash concept: keys such as 102002, 107095, and 111060 are passed through a hash function that produces addresses such as 5, 100, and 2.]
23
Basic concepts
Hash search: a search in which the key, through an algorithmic function, determines the location of the data. We use a hashing algorithm to transform the key into the index that contains the data we need to locate (a key-to-address transformation).
24
Problem
Synonyms: a set of keys that hash to the same location.
Collision: occurs when a list contains two or more synonyms, that is, when a key hashes to a home address that is already occupied.
Home address: the address produced by the hashing algorithm.
Prime area: the memory that contains all of the home addresses.
Collision resolution: when two keys collide at a home address, place one of the keys and its data in another location.
25
[Figure 2-7 The collision resolution concept: 1. hash(A) produces address 8; 2. hash(B): B and A collide at 8, and collision resolution places B at 16; 3. hash(C): C and B collide at 16, and collision resolution places C elsewhere.]
26
To locate an element in a hashed list, use the same algorithm that was used to insert it into the list. First hash the key and check the home address to see whether it contains the element. If it does, the search is complete. If not, use the collision resolution algorithm to determine the next location, and continue until we find the element or determine that it is not in the list. Each calculation of an address and test for success is called a probe.
27
Hashing methods (Figure 2-8 Basic hashing techniques)
  • Direct
  • Subtraction
  • Modulo division
  • Digit extraction
  • Midsquare
  • Folding
  • Rotation
  • Pseudorandom generation
28
Direct method
  • The key is the address (one element per key, so there are no synonyms)
  • Example 1: total monthly sales by the days of the month
  • Create an array of 31 accumulators
  • The accumulation code is:

    dailySales[sale.day] = dailySales[sale.day] + sale.amount
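A minimal Python sketch of Example 1, where the day number is used directly as the array address. The Sale record type is hypothetical, and the array has 32 slots with index 0 unused (an assumption) so that days 1 to 31 map to indexes 1 to 31 without subtracting anything.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    """Hypothetical sale record: day of the month (1-31) and amount."""
    day: int
    amount: float


def accumulate_daily_sales(sales):
    # Direct hashing: the day number itself is the address.
    # Index 0 is left unused so days 1-31 map straight to indexes 1-31.
    daily_sales = [0.0] * 32
    for sale in sales:
        daily_sales[sale.day] += sale.amount
    return daily_sales


totals = accumulate_daily_sales([Sale(1, 100.0), Sale(15, 42.5), Sale(1, 50.0)])
print(totals[1], totals[15])  # 150.0 42.5
```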
29
Example 2: a small company has fewer than 100 employees, and each employee number is between 1 and 100.
[Figure 2-9 Direct hashing of employee numbers: keys 005, 100, and 002 hash directly to addresses 5, 100, and 2.]
Address   Employee
000       (not used)
001       Harry Lee
002       Sarah Trapp
003
004
005       Vu Nguyen
006
007
008
...
099
100       John Adams
30
Subtraction method
  • Keys are consecutive but do not start from 1
  • For example, student ID numbers
  • Advantages:
  • The hashing function is very simple
  • There are no collisions
  • Disadvantage:
  • Suitable only for small lists
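A small Python sketch of subtraction hashing. The ID range 1001 to 1100 is a hypothetical example, not from the slides; subtracting the first ID maps the consecutive keys onto addresses 0 to 99, and the loop checks that no two keys land on the same address.

```python
FIRST_ID = 1001   # hypothetical: IDs run consecutively from 1001 to 1100


def subtraction_hash(key, first_key=FIRST_ID):
    # Subtract a fixed number so the smallest key maps to address 0.
    return key - first_key


table = [None] * 100
for student_id in range(1001, 1101):
    addr = subtraction_hash(student_id)
    assert table[addr] is None   # consecutive keys never collide
    table[addr] = student_id

print(subtraction_hash(1001), subtraction_hash(1100))  # 0 99
```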

31
Note:
1. Generally speaking, hashed lists require some empty elements to reduce the number of collisions.
2. The two methods above (direct and subtraction) are ideal, but their applications are very limited (e.g., ID card numbers).
32
Modulo-division method (division remainder)
This method divides the key by the array size and uses the remainder as the address. The hashing algorithm is:
    address = key modulo listSize
Note: a list size that is a prime number produces fewer collisions.
33
[Figure 2-10 Modulo-division hashing, list size = 307: for example, 121267 hashes to 2, 045128 to 306, and 379452 to 0.]
Address   Key      Name
000       379452   Marry Dodd
001
002       121267   Bryan Devaux
003
004
005
006
007       378845   John Carver
008
...
305       160252   Tuan Ngo
306       045128   Shouli Feldman
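Modulo division in Python is a single `%` operation; the sketch below reruns the keys from Figure 2-10 with the figure's list size of 307. Note that 045128 has to be written as 45128, because Python integer literals cannot have leading zeros.

```python
def modulo_division_hash(key, list_size=307):
    # address = key modulo listSize
    return key % list_size


# 045128 is written as 45128: Python int literals cannot have leading zeros.
for key in (379452, 121267, 378845, 160252, 45128):
    print(f"{key:06d} -> {modulo_division_hash(key):03d}")
# 379452 -> 000
# 121267 -> 002
# 378845 -> 007
# 160252 -> 305
# 045128 -> 306
```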
34
Digit extraction method
Selected digits are extracted from the key and used as the address. Example: a 6-digit employee number is converted into a 3-digit address by selecting the first, third, and fourth digits.
Key      Address
379452   394
121267   112
378845   388
160252   102
045128   051
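A Python sketch of digit extraction using the slide's choice of the first, third, and fourth digits of a 6-digit key. The key is zero-padded to 6 digits first so that keys like 045128 keep their leading zero; the positions and width are parameters chosen for this illustration.

```python
def digit_extraction_hash(key, positions=(1, 3, 4), width=6):
    """Build an address from the digits at the given 1-based positions."""
    digits = f"{key:0{width}d}"          # keep leading zeros, e.g. 045128
    return int("".join(digits[p - 1] for p in positions))


for key in (379452, 121267, 378845, 160252, 45128):
    print(f"{key:06d} -> {digit_extraction_hash(key):03d}")
```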
35
Midsquare method
The key is squared, and the address is selected from the middle of the squared number. Limitation: the size of the key. Example with 4-digit keys: 9452 × 9452 = 89340304, so the address is 3403.
Variation: select a portion of the key. Take digits 1-3 of the key, square them, fill with 0s to 6 digits, and select digits 3-5 as the address.
Key      Digits 1-3   Squared (6 digits)   Address
379452   379          143641               364
121267   121          014641               464
378845   378          142884               288
160252   160          025600               560
045128   045          002025               202
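Both midsquare examples can be checked with the Python sketch below. For the basic method it assumes that "middle" means the slice centred in the squared number's digits (biased left when the split is uneven), which gives 3403 for 9452; the variation follows the slide's steps exactly.

```python
def midsquare_hash(key, addr_digits=4):
    """Square the key and take addr_digits from the middle of the result."""
    squared = str(key * key)
    start = (len(squared) - addr_digits) // 2
    return int(squared[start:start + addr_digits])


def midsquare_variation(key):
    """Square digits 1-3 of a 6-digit key, pad to 6 digits, keep digits 3-5."""
    part = int(f"{key:06d}"[:3])
    squared = f"{part * part:06d}"
    return int(squared[2:5])


print(midsquare_hash(9452))  # 3403
print([midsquare_variation(k) for k in (379452, 121267, 378845, 160252, 45128)])
# [364, 464, 288, 560, 202]
```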
36
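Folding methods: fold shift and fold boundary
[Figure 2-11 Hash fold examples, key 123456789 split into the parts 123, 456, and 789:
(a) Fold shift: 123 + 456 + 789 = 1368; the leading 1 is discarded, giving the address 368.
(b) Fold boundary: the digits of the left and right parts are reversed: 321 + 456 + 987 = 1764; the leading 1 is discarded, giving the address 764.]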
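The two folding variants from Figure 2-11 can be sketched in Python as follows. The sketch assumes the address has the same number of digits as each part (3 here) and that the key has no leading zeros; taking the sum modulo 10³ is what discards the carry digit.

```python
def split_key(key, part_size=3):
    digits = str(key)
    return [digits[i:i + part_size] for i in range(0, len(digits), part_size)]


def fold_shift(key, part_size=3):
    # Add the parts; keeping only part_size digits discards the carry.
    return sum(int(p) for p in split_key(key, part_size)) % 10 ** part_size


def fold_boundary(key, part_size=3):
    parts = split_key(key, part_size)
    # Reverse the left and right parts, as if folded at the boundaries.
    parts[0], parts[-1] = parts[0][::-1], parts[-1][::-1]
    return sum(int(p) for p in parts) % 10 ** part_size


print(fold_shift(123456789), fold_boundary(123456789))  # 368 764
```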
37
Rotation method
Used in combination with other methods; useful when keys are assigned serially. The rightmost digit of the key is rotated to the left.
Original key   Rotated key
600101         160010
600102         260010
600103         360010
600104         460010
600105         560010
Figure 2-12 Rotation hashing
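A one-function Python sketch of the rotation in Figure 2-12: the rightmost digit moves to the front. As the slide says, the rotated key would normally be fed to another method such as modulo division; that second step is left out here.

```python
def rotate_key(key, width=6):
    """Move the rightmost digit of the key to the leftmost position."""
    digits = f"{key:0{width}d}"
    return int(digits[-1] + digits[:-1])


print([rotate_key(k) for k in range(600101, 600106)])
# [160010, 260010, 360010, 460010, 560010]
```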
38
Pseudorandom method
In this method, the key is used as the seed in a pseudorandom number generator, and the resulting random number is scaled into the possible address range using modulo division.
A common random number generator is y = ax + c. For efficiency, the factors a and c should be prime numbers, for example, a = 17 and c = 7.
39
Pseudorandom hashing example (list size 307):
    (17 × 045128 + 7) modulo 307 = 297
    (17 × 121267 + 7) modulo 307 = 41
    (17 × 379452 + 7) modulo 307 = 7
[Figure: pseudorandom hashing of the records 379452 Marry Dodd, 121267 Bryan Devaux, 378845 John Carver, 045128 Shouli Feldman, and 160252 Tuan Ngo into addresses 000-306]
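The worked pseudorandom examples can be reproduced with this Python sketch, which uses the slide's generator y = ax + c with a = 17 and c = 7 and scales the result by modulo division with list size 307.

```python
def pseudorandom_hash(key, a=17, c=7, list_size=307):
    # y = a*x + c, then scaled into the address range by modulo division
    return (a * key + c) % list_size


for key in (45128, 121267, 379452):
    print(f"{key:06d} -> {pseudorandom_hash(key):03d}")
# 045128 -> 297
# 121267 -> 041
# 379452 -> 007
```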
40
Hash algorithm
  • Convert the alphanumeric key into a number by adding the American Standard Code for Information Interchange (ASCII) value of each character to an accumulator.
  • Rotate the bits in the address to maximize the distribution of the values.
  • Take the absolute value of the address and map it into the address range.

41
Hash algorithm
This algorithm converts an alphanumeric key of size characters into an integral address.
Pre:  key is the key to be hashed
      size is the number of characters in the key
      maxAddr is the maximum possible address for the list
Post: addr contains the hashed address

algorithm hash (val key     <array>,
                val size    <integer>,
                val maxAddr <integer>,
                ref addr    <integer>)
    looper = 0
    addr = 0
    hash key
    loop (looper < size)
        if (key[looper] not space)
            addr = addr + key[looper]
            rotate addr 12 bits right
        end if
        looper = looper + 1
    end loop
    test for negative address
    if (addr < 0)
        addr = absolute(addr)
    end if
    addr = addr modulo maxAddr
    return
end hash
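A Python sketch of the alphanumeric hash algorithm follows. The slides do not give a word size, so this sketch assumes a 32-bit word: additions wrap at 32 bits, the rotation is a 32-bit rotate, and the result is reinterpreted as a signed value so the pseudocode's negative-address step has something to do. The helper names are hypothetical.

```python
def rotate_right_32(value, bits):
    """Rotate a 32-bit unsigned value right by `bits` positions."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def to_signed_32(value):
    """Reinterpret a 32-bit unsigned value as two's-complement signed."""
    return value - (1 << 32) if value & 0x80000000 else value


def string_hash(key, max_addr):
    addr = 0
    for ch in key:
        if ch != " ":                       # spaces are skipped
            # Add the character code, then rotate 12 bits right.
            addr = rotate_right_32(addr + ord(ch), 12)
    addr = to_signed_32(addr)
    if addr < 0:                            # test for negative address
        addr = abs(addr)
    return addr % max_addr


print(string_hash("Harry Eagle", 307))
```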
42
2-4 Collision resolution
  • Except for the direct and subtraction methods, none of the hashing methods is a one-to-one mapping
  • Collisions therefore cannot be avoided
  • There are several methods for resolving collisions

[Figure 2-13 Collision resolution methods: open addressing (linear probe, quadratic probe, pseudorandom, key offset), linked lists, and buckets]
43
Several concepts
  • Load factor: the number of elements in the list divided by the number of elements allocated for the list, usually expressed as a percentage
  • Clustering: the tendency of data to group within the list (unevenly across a hashed list)
  • A high degree of clustering increases the number of probes needed to locate an element and reduces the processing efficiency of the list. There are two types:
  • Primary clustering: when data cluster around a home address
  • Secondary clustering: when data become grouped along a collision path throughout the list
  • Hashing algorithms need to be designed to minimize clustering

44
Open addressing
  • Resolves collisions in the prime area (the area that contains all of the home addresses)
  • Linear probe
  • Quadratic probe
  • Double hashing:
  • Pseudorandom
  • Key offset

45
Linear probe
Address   Key      Name
000       379452   Marry Dodd
001       070918   Sarah Trapp
002       121267   Bryan Devaux
003       166702   Harry Eagle
004
005
006
007       378845   John Carver
008
...
305       160252   Tuan Ngo
306       045128   Shouli Feldman
Figure 2-14 Linear probe collision resolution: the first insert (070918, home address 1) causes no collision; the second insert (166702, also home address 1) collides, and the probe adds 1 until it reaches an empty address.
46
Linear probe
Variation: add 1, subtract 2, add 3, subtract 4, and so on.
Advantage: simple to implement.
Disadvantages: first, it tends to produce primary clustering; second, it tends to make the search algorithm more complex.
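A minimal Python sketch of open addressing with modulo-division hashing and a linear probe, loaded with the records from Figure 2-14. The class name is hypothetical, and the sketch has no delete operation: its search stops at the first empty slot, which is only safe when nothing has been removed (one reason the slide notes that linear probing complicates the search algorithm).

```python
class LinearProbeTable:
    """Open addressing with modulo-division hashing and a linear probe."""

    def __init__(self, size=307):
        self.size = size
        self.slots = [None] * size          # each slot: (key, data) or None

    def home(self, key):
        return key % self.size

    def insert(self, key, data):
        addr = self.home(key)
        for _ in range(self.size):
            if self.slots[addr] is None:
                self.slots[addr] = (key, data)
                return addr
            addr = (addr + 1) % self.size   # collision: add 1, wrap at the end
        raise OverflowError("list is full")

    def search(self, key):
        addr = self.home(key)
        for _ in range(self.size):          # each iteration is one probe
            slot = self.slots[addr]
            if slot is None:
                return None                 # an empty slot ends the search
            if slot[0] == key:
                return addr
            addr = (addr + 1) % self.size
        return None


table = LinearProbeTable()
for key, name in [(379452, "Marry Dodd"), (121267, "Bryan Devaux"),
                  (378845, "John Carver"), (160252, "Tuan Ngo"),
                  (45128, "Shouli Feldman"), (70918, "Sarah Trapp")]:
    table.insert(key, name)
print(table.insert(166702, "Harry Eagle"))  # 3: addresses 1 and 2 are occupied
print(table.search(166702))                 # 3
```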
47
Quadratic probe
  • Eliminates primary clustering
  • The increment is the collision probe number squared: for the first probe, add 1²; for the second probe, add 2²; for the third probe, add 3²; and so on
  • The new address is taken modulo the list size
  • Disadvantages:
  • 1. The time required to square the probe number
  • 2. It is not possible to generate a new address for every element in the list
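The sketch below reads the slide's rule cumulatively: probe i adds i² to the previous address, modulo the list size. Some textbooks instead compute home + i² directly; that alternative is not what is shown here. The second call uses a hypothetical 11-element list to show the second disadvantage: 11 probes from address 0 reach only 7 distinct addresses.

```python
def quadratic_probe_addresses(home, list_size, probes):
    """Addresses visited after a collision at `home`.

    Probe i adds i*i to the previous address, modulo the list size.
    """
    addresses = []
    addr = home
    for i in range(1, probes + 1):
        addr = (addr + i * i) % list_size
        addresses.append(addr)
    return addresses


print(quadratic_probe_addresses(1, 307, 4))  # [2, 6, 15, 31]
# With 11 slots, 11 probes reach only some of the addresses.
print(sorted(set(quadratic_probe_addresses(0, 11, 11))))  # [0, 1, 3, 5, 6, 8, 10]
```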

48
Pseudorandom collision resolution
  • A form of double hashing: the address is rehashed
  • Uses a pseudorandom number to resolve the collision
  • Uses the collision address as a factor in the random number calculation, for example:

    new address = (3 × collision address + 5) modulo listSize

Figure 2-15 shows this method resolving the collision from Figure 2-14.
49
Pseudorandom probe
Address   Key      Name
000       379452   Marry Dodd
001       070918   Sarah Trapp
002       121267   Bryan Devaux
003
004
005
006
007       378845   John Carver
008       166702   Harry Eagle
...
305       160252   Tuan Ngo
306       045128   Shouli Feldman
Figure 2-15 Pseudorandom collision resolution: the first insert (070918, home address 1) causes no collision; the second insert (166702) collides at 1 and is rehashed with y = 3x + 5, giving address 8.
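A Python sketch of pseudorandom collision resolution with the slide's generator y = 3x + 5, scaled by the list size (307, as in the figures). Because such a rehash sequence can cycle without visiting every slot, the sketch gives up after as many attempts as there are slots; that cut-off is an assumption of this illustration.

```python
def pseudorandom_rehash(collision_addr, list_size, a=3, c=5):
    # new address = (a * collision address + c) modulo listSize
    return (a * collision_addr + c) % list_size


def insert_pseudorandom(slots, key, data):
    """Insert with modulo-division hashing and pseudorandom rehashing."""
    size = len(slots)
    addr = key % size
    for _ in range(size):
        if slots[addr] is None:
            slots[addr] = (key, data)
            return addr
        addr = pseudorandom_rehash(addr, size)
    # The rehash sequence may cycle without visiting every slot.
    raise OverflowError("no empty slot found on the collision path")


slots = [None] * 307
for key, name in [(379452, "Marry Dodd"), (70918, "Sarah Trapp"),
                  (121267, "Bryan Devaux"), (378845, "John Carver"),
                  (160252, "Tuan Ngo"), (45128, "Shouli Feldman")]:
    insert_pseudorandom(slots, key, name)
print(insert_pseudorandom(slots, 166702, "Harry Eagle"))  # 8
```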
50
Key offset
  • Another form of double hashing
  • Produces different collision paths for different keys
  • Key offset calculates the new address as follows (the simplest version):

    offset = ⌊key / listSize⌋
    address = (offset + old address) modulo listSize
51
Example: the key is 166702 and the list size is 307. Modulo division generates an address of 1. This key is a synonym of 070918, so it produces a collision at address 1. Using key offset to calculate the next address:
    offset = ⌊166702 / 307⌋ = 543
    address = (543 + 001) modulo 307 = 237
If 237 were also a collision, repeat the process:
    offset = ⌊166702 / 307⌋ = 543
    address = (543 + 237) modulo 307 = 166
52
To really see the effect of key offset, we need to calculate several different keys, all hashing to the same home address. Table 2-3 shows three keys that collide at address 001 and their next two collision probe addresses.

Key      Home address   Key offset   Probe 1   Probe 2
166702   1              543          237       166
572556   1              1865         024       047
067234   1              219          220       132
Table 2-3 Key offset

Note: each key resolves its collision at a different address for both the first and second probes.
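The rows of Table 2-3 can be recomputed with this Python sketch of the simple key-offset formula (offset = ⌊key / listSize⌋, next address = (offset + old address) modulo listSize). Key 067234 is written as 67234 because Python integer literals cannot have leading zeros.

```python
def key_offset_path(key, list_size=307, probes=2):
    """Return (home address, offset, next `probes` addresses) for a key."""
    home = key % list_size
    offset = key // list_size              # floor(key / listSize)
    addr, path = home, []
    for _ in range(probes):
        addr = (offset + addr) % list_size
        path.append(addr)
    return home, offset, path


for key in (166702, 572556, 67234):
    print(key_offset_path(key))
# (1, 543, [237, 166])
# (1, 1865, [24, 47])
# (1, 219, [220, 132])
```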
53
Linked list resolution
  • Eliminates a disadvantage of open addressing: each collision resolution increases the probability of future collisions
  • A linked list is an ordered collection of data in which each element contains the location of the next element

54
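[Figure 2-16 Linked list collision resolution: the prime area (addresses 000-306) holds 379452 Marry Dodd at 000, 070918 Sarah Trapp at 001, 121267 Bryan Devaux at 002, 160252 Tuan Ngo at 305, and 045128 Shouli Feldman at 306. The synonyms of address 001, 166702 Harry Eagle and 572556 Chris Wallj, are stored in the overflow area and linked by pointers from address 001.]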
55
Linked list resolution
  • Linked list resolution uses a separate area to
    store collisions and chains all synonyms together
    in a linked list
  • It uses two storage areas, the prime area and the
    overflow area
  • Each element in the prime area contains an
    additional field, a link head pointer
  • The linked list data can be stored in any order,
    but the most common is key sequence
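A minimal Python sketch of linked list resolution, with hypothetical class names. Each prime-area entry is a node whose `next` field serves as the link head pointer; synonyms are chained from it in key sequence. Python objects stand in for the separate overflow area, which this sketch does not model as its own array.

```python
class Node:
    def __init__(self, key, data):
        self.key, self.data, self.next = key, data, None


class ChainedHashList:
    """Prime area of home addresses; synonyms chained in key sequence."""

    def __init__(self, size=307):
        self.size = size
        self.prime = [None] * size   # each entry's .next is the link head pointer

    def insert(self, key, data):
        addr = key % self.size
        if self.prime[addr] is None:
            self.prime[addr] = Node(key, data)
            return
        # Collision: add to the overflow chain, kept in key sequence.
        prev, cur = self.prime[addr], self.prime[addr].next
        while cur is not None and cur.key < key:
            prev, cur = cur, cur.next
        node = Node(key, data)
        node.next, prev.next = cur, node

    def search(self, key):
        node = self.prime[key % self.size]
        while node is not None:
            if node.key == key:
                return node.data
            node = node.next
        return None

    def chain(self, addr):
        """Keys stored at addr: the prime-area entry, then its overflow chain."""
        keys, node = [], self.prime[addr]
        while node is not None:
            keys.append(node.key)
            node = node.next
        return keys


hashed = ChainedHashList()
for key, name in [(379452, "Marry Dodd"), (70918, "Sarah Trapp"),
                  (121267, "Bryan Devaux"), (572556, "Chris Wallj"),
                  (166702, "Harry Eagle")]:
    hashed.insert(key, name)
print(hashed.chain(1))          # [70918, 166702, 572556]
print(hashed.search(572556))    # Chris Wallj
```

Even though 572556 is inserted before 166702 here, the overflow chain ends up in key sequence, matching the order shown in Figure 2-16.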

56
Bucket hashing
Bucket 0:    379452 Marry Dodd | (empty) | (empty)
Bucket 1:    070918 Sarah Trapp | 166702 Harry Eagle | 367173 Ann Georgis
Bucket 2:    121267 Bryan Devaux | 572556 Chris Wallj (placed here by linear probe) | (empty)
...
Bucket 306:  045128 Shouli Feldman | (empty) | (empty)
Figure 2-17 Bucket hashing: buckets are nodes that accommodate multiple data occurrences, so collisions are postponed until a bucket is full.
57
Two problems with bucket hashing
  • First, it uses significantly more space, because many of the buckets will be empty or partially empty
  • Second, it does not completely resolve the collision problem
  • One way to resolve a collision when a bucket is full is to use the linear probe
Combination approaches
  • There are several approaches to resolving collisions, and a combination approach often uses multiple steps
  • Example: one large database hashes to a bucket; if the bucket is full, it uses a linear probe, and then a linked list overflow area
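A Python sketch of bucket hashing with a linear probe to the next bucket when the home bucket is full, loaded with the records from Figure 2-17 (307 buckets of 3 slots each). The class name is hypothetical, and the search assumes no deletions: a bucket that is not full ends the probe, because a key would have been stored there had it reached it.

```python
class BucketHashList:
    """Bucket hashing with a linear probe when the home bucket is full."""

    def __init__(self, buckets=307, bucket_size=3):
        self.buckets = [[] for _ in range(buckets)]
        self.bucket_size = bucket_size

    def insert(self, key, data):
        count = len(self.buckets)
        addr = key % count
        for _ in range(count):
            if len(self.buckets[addr]) < self.bucket_size:
                self.buckets[addr].append((key, data))
                return addr
            addr = (addr + 1) % count   # bucket full: probe the next bucket
        raise OverflowError("all buckets are full")

    def search(self, key):
        count = len(self.buckets)
        addr = key % count
        for _ in range(count):
            bucket = self.buckets[addr]
            for stored_key, data in bucket:
                if stored_key == key:
                    return data
            if len(bucket) < self.bucket_size:
                return None             # a non-full bucket ends the probe
            addr = (addr + 1) % count
        return None


table = BucketHashList()
for key, name in [(379452, "Marry Dodd"), (70918, "Sarah Trapp"),
                  (166702, "Harry Eagle"), (367173, "Ann Georgis"),
                  (121267, "Bryan Devaux"), (45128, "Shouli Feldman")]:
    table.insert(key, name)
print(table.insert(572556, "Chris Wallj"))   # 2: bucket 1 is already full
print(table.search(572556))                  # Chris Wallj
```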

58
Summary
  • Searching is the process of finding the location of a target among a list of objects
  • There are two basic searching methods for arrays: sequential search and binary search
  • The sequential search is normally used when a list is not sorted. It starts at the beginning of the list and searches until it finds the data or hits the end of the list
  • One variation of the sequential search is the sentinel search. In this method, the conditions ending the search are reduced to only one by artificially inserting the target at the end of the list
  • The second variation of the sequential search is the probability search. In this method, the list is ordered with the most probable elements at the beginning of the list and the least probable at the end

59
2-5 Summary (continued)
  • The sequential search can also be used to search a sorted list. In this case, we can terminate the search when the target is less than the current element
  • If an array is sorted, we can use a more efficient algorithm called the binary search
  • The binary search algorithm searches the list by first checking the middle element. If the target is not in the middle element, the algorithm eliminates the upper half or the lower half of the list, depending on the value of the middle element. The process continues until the target is found or the reduced list length becomes zero
  • The efficiency of a sequential search is O(n)
  • The efficiency of a binary search is O(log₂ n)

60
Summary (continued)
  • In a hashed search, the key, through an algorithmic transformation, determines the location of the data. It is a key-to-address transformation
  • We discussed several hashing functions: direct, subtraction, modulo division, digit extraction, midsquare, folding, rotation, and pseudorandom generation

61
Summary (continued)
  • In direct hashing, the key is the address without any algorithmic manipulation
  • In subtraction hashing, the key is transformed to an address by subtracting a fixed number from it
  • In modulo-division hashing, the key is divided by the list size (recommended to be a prime number), and the remainder is used as the address
  • In digit-extraction hashing, selected digits are extracted from the key and used as an address
  • In midsquare hashing, the key is squared and the address is selected from the middle of the result
  • In fold shift hashing, the key is divided into parts whose sizes match the size of the required address. Then the parts are added to obtain the address

62
Summary (continued)
  • In fold boundary hashing, the key is divided into parts whose sizes match the size of the required address. Then the left and right parts are reversed and added to the middle part to obtain the address
  • In rotation hashing, the rightmost digit of the key is rotated to the left to determine an address. However, this method is usually used in combination with other methods
  • In pseudorandom generation hashing, the key is used as the seed to generate a pseudorandom number. The result is then scaled to obtain the address
  • Except in the direct and subtraction methods, collisions are unavoidable in hashing. Collisions occur when a new key is hashed to an address that is already occupied

63
Summary (continued)
  • Clustering is the tendency of data to build up unevenly across a hashed list
  • Primary clustering occurs when data build up around a home address
  • Secondary clustering occurs when data build up along a collision path in the list
  • To resolve a collision, a collision resolution method is used
  • Three general methods are used to resolve collisions: open addressing, linked lists, and buckets
  • The open addressing method can be subdivided into linear probe, quadratic probe, pseudorandom rehashing, and key-offset rehashing

64
Summary (continued)
  • In the linear probe method, when a collision occurs, the new data are stored in the next available address
  • In the quadratic probe method, the increment is the collision probe number squared
  • In the pseudorandom rehashing method, we use a random number generator to rehash the address
  • In the key-offset rehashing method, we use an offset to rehash the address

65
Summary (continued)
  • In the linked list technique, we use separate areas to store collisions and chain all synonyms together in a linked list
  • In bucket hashing, we use a bucket that can accommodate multiple data occurrences

66
Homework
  • Using the modulo-division method and linear probing, store the keys shown below in an array with 19 elements. How many collisions occurred? What is the load factor of the list after all keys have been inserted?
  • 224562, 137456, 214562, 140145, 214567, 162145, 144467, 199645, 234534
  • Repeat the above problem using the digit-extraction method (first, third, and fifth digits) and quadratic probing.