Ceng-212 Data Structures-1 - PowerPoint PPT Presentation

About This Presentation
Title:

Ceng-212 Data Structures-1

Description:

Searching Chapter 2 – PowerPoint PPT presentation

Slides: 58
Provided by: Sera167

Transcript and Presenter's Notes



1
Searching
Chapter 2
2
Outline
  • Linear List Searches
    • Sequential Search
      • The sentinel search
      • The probability search
      • The ordered search
    • Binary Search
  • Hashed List Searches
    • Collision Resolution

3
Linear List Searches
  • We study searches that work with arrays.

Figure 2-1
4
Linear List Searches
  • There are two basic searches for arrays:
    • The sequential search, which can be used to locate an item in any array.
    • The binary search, which requires an ordered list.

5
Linear List Searches: Sequential Search
  • The list is not ordered!
  • We will use this technique only for small arrays.
  • We start searching at the beginning of the list
    and continue until either we find the target
    or we reach the end of the list!

6
Locating data in an unordered list.
Figure 2-2
7
Linear List Searches: Sequential Search Algorithm
  • RETURN: The algorithm must tell the calling algorithm two things:
    • Did it find the data?
    • If it did, what is the index (address)?

8
Linear List Searches: Sequential Search Algorithm
  • The searching algorithm requires five parameters:
    • The list.
    • An index to the last element in the list.
    • The target.
    • The address where the found element's index
      location is to be stored.
    • The address where the found or not found boolean
      is to be stored.

9
Sequential Search Algorithm
  • algorithm SeqSearch (val list <array>, val last <index>,
    val target <keyType>, ref locn <index>)
  • Locate the target in an unordered list of size elements.
  • PRE: list must contain at least one element.
    • last is index to last element in the list.
    • target contains the data to be located.
    • locn is address of index in calling algorithm.
  • POST: if found, matching index stored in locn and found is TRUE;
    if not found, last stored in locn and found is FALSE.
  • RETURN: found <boolean>

10
Sequential Search Algorithm
  • looker = 1
  • loop (looker < last AND target not equal list[looker])
    • looker = looker + 1
  • locn = looker
  • if (target equal list[looker])
    • found = true
  • else
    • found = false
  • return found
  • end SeqSearch

Big-O(n)
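The SeqSearch pseudocode can be sketched in Python as follows. This is a minimal illustration, not the course's reference solution: it assumes 0-based indexes (the slides count from 1) and returns the pair (found, locn) instead of writing locn through a reference parameter. The name seq_search is chosen here for illustration.

```python
def seq_search(items, target):
    """Sequential search over a non-empty, unordered list.

    Returns (found, locn) with a 0-based locn. When the target is
    absent, locn is the index of the last element, as in the slides.
    """
    last = len(items) - 1
    looker = 0
    # Advance while we are not on the last element and have no match yet.
    while looker < last and items[looker] != target:
        looker += 1
    # looker now rests on the match, or on the last element.
    return items[looker] == target, looker


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(seq_search(data, 62))   # target present
print(seq_search(data, 72))   # target absent
```

The loop performs two tests per element (end of list and equality), which is the overhead the sentinel variation removes.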
11
Variations On Sequential Search
  • There are three variations of sequential search
    algorithm
  • The sentinel search,
  • The probability search,
  • The ordered search.

12
Sequential Search Algorithm: The Sentinel Search
  • If we know the target will be found in the list, we can
    eliminate the test for the end of list. The sentinel search
    guarantees this by storing the target in an extra element
    at the end of the list.
  • algorithm SentinelSearch (val list <array>, val last <index>,
    val target <keyType>, ref locn <index>)
  • Locate the target in an unordered list of size elements.
  • PRE: list must contain an extra element at the end for the sentinel.
    • last is index to last element in the list.
    • target contains the data to be located.
    • locn is address of index in calling algorithm.
  • POST: if found, matching index stored in locn and found is TRUE;
    if not found, last stored in locn and found is FALSE.
  • RETURN: found <boolean>

13
Sequential Search Algorithm: The Sentinel Search
  • list[last + 1] = target
  • looker = 1
  • loop (target not equal list[looker])
    • looker = looker + 1
  • if (looker <= last)
    • found = true
    • locn = looker
  • else
    • found = false
    • locn = last
  • return found
  • end SentinelSearch

Big-O(n)
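A possible Python rendering of the sentinel search is shown below. It is a sketch under a few assumptions: indexes are 0-based, the sentinel is appended to a Python list (which grows on demand, so no spare element has to be reserved), and the sentinel is removed again before returning. The name sentinel_search is illustrative.

```python
def sentinel_search(items, target):
    """Sentinel search over an unordered list.

    Returns (found, locn) with a 0-based locn; locn is the index of
    the last real element when the target is absent.
    """
    last = len(items) - 1
    items.append(target)            # place the sentinel after the data
    looker = 0
    # Only one test per element: the sentinel guarantees termination.
    while items[looker] != target:
        looker += 1
    items.pop()                     # restore the caller's list
    if looker <= last:
        return True, looker         # matched a real element
    return False, last              # matched only the sentinel


data = [4, 21, 36, 14, 62, 91, 8, 22, 7, 81, 77, 10]
print(sentinel_search(data, 10))    # last real element
print(sentinel_search(data, 72))    # absent: only the sentinel matches
```

A match at index last is a real hit, which is why the final test uses "less than or equal" rather than "less than".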
14
Sequential Search Algorithm: The Probability Search
  • algorithm ProbabilitySearch (val list <array>, val last <index>,
    val target <keyType>, ref locn <index>)
  • Locate the target in a list ordered by the
    probability of each element being the target:
    most probable first, least probable last.
  • PRE: list must contain at least one element.
    • last is index to last element in the list.
    • target contains the data to be located.
    • locn is address of index in calling algorithm.
  • POST: if found, matching index stored in locn, found is TRUE,
    and the element is moved up in priority;
    if not found, last stored in locn and found is FALSE.
  • RETURN: found <boolean>

15
Sequential Search Algorithm: The Probability Search
  • looker = 1
  • loop (looker < last AND target not equal list[looker])
    • looker = looker + 1
  • if (target equal list[looker])
    • found = true
    • if (looker > 1)
      • temp = list[looker - 1]
      • list[looker - 1] = list[looker]
      • list[looker] = temp
      • looker = looker - 1
  • else
    • found = false
  • locn = looker
  • return found
  • end ProbabilitySearch

Big-O(n)
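The probability search might look like this in Python. The sketch assumes 0-based indexes and a non-empty list, and returns (found, locn) rather than using a reference parameter; the function name is illustrative. Each successful search swaps the found element one position toward the front, so frequently requested items drift to the start of the list over time.

```python
def probability_search(items, target):
    """Sequential search that promotes a found element by one position.

    Returns (found, locn), where locn is the element's 0-based index
    after any swap, or the last index when the target is absent.
    """
    last = len(items) - 1
    looker = 0
    while looker < last and items[looker] != target:
        looker += 1
    if items[looker] != target:
        return False, looker
    if looker > 0:
        # Exchange with the predecessor so this key is found sooner next time.
        items[looker - 1], items[looker] = items[looker], items[looker - 1]
        looker -= 1
    return True, looker


data = [10, 20, 30, 40]
print(probability_search(data, 30), data)   # 30 moves up one place
print(probability_search(data, 30), data)   # 30 moves to the front
```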
16
Sequential Search Algorithm: The Ordered List Search
  • If the list is small, it can be more efficient to
    use a sequential search even though the list is ordered.
  • We can stop the search loop when the target becomes
    less than or equal to the element being tested.
  • algorithm OrderedListSearch (val list <array>, val last <index>,
    val target <keyType>, ref locn <index>)
  • Locate the target in a list ordered on target.
  • PRE: list must contain at least one element.
    • last is index to last element in the list.
    • target contains the data to be located.
    • locn is address of index in calling algorithm.
  • POST: if found, matching index stored in locn and found is TRUE;
    if not found, locn holds the index where the search stopped
    and found is FALSE.
  • RETURN: found <boolean>

17
Sequential Search Algorithm: The Ordered List Search
  • if (target < list[last])
    • looker = 1
    • loop (target > list[looker])
      • looker = looker + 1
  • else
    • looker = last
  • if (target equal list[looker])
    • found = true
  • else
    • found = false
  • locn = looker
  • return found
  • end OrderedListSearch

Big-O(n)
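A Python sketch of the ordered list search follows, assuming 0-based indexes, a non-empty list sorted in ascending order, and a (found, locn) return value; the function name is illustrative. The first test against the last element guarantees that the inner loop stops without a separate end-of-list check.

```python
def ordered_list_search(items, target):
    """Sequential search over a non-empty list sorted in ascending order.

    Returns (found, locn) with a 0-based locn. The loop stops as soon
    as the tested element is greater than or equal to the target.
    """
    last = len(items) - 1
    if target < items[last]:
        looker = 0
        # Safe without an end test: some element is larger than target.
        while target > items[looker]:
            looker += 1
    else:
        # Target is at or beyond the largest element.
        looker = last
    return items[looker] == target, looker


data = [4, 7, 8, 10, 14, 21]
print(ordered_list_search(data, 10))   # present
print(ordered_list_search(data, 9))    # absent: stops at the first larger element
```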
18
Sequential Search
  • The sequential search algorithm is very slow for
    large lists.
  • Big-O(n)
  • If the list is ordered, we can use a more
    efficient algorithm called the binary search.

19
Binary Search
  • Test the data in the element at the middle of the array.
  • If the target is in the first half, repeat the test on the first half.
  • If it is in the second half, repeat the test on the second half.
  • Continue halving until the target is found or no elements remain.
20
mid = (first + last) / 2
If target > element at mid: first = mid + 1
If target < element at mid: last = mid - 1
Figure 2-4
21
Target not found: first becomes larger than last!
Figure 2-5
22
Binary Search Algorithm
  • algorithm BinarySearch (val list <array>, val end <index>,
    val target <keyType>, ref locn <index>)
  • Search an ordered list using binary search.
  • PRE: list is ordered; it must contain at least one element.
    • end is index to the largest element in the list.
    • target is the value of element being sought.
    • locn is address of index in calling algorithm.
  • POST: Found: locn assigned index to target element; found set true.
    Not found: locn is the element below or above target; found set false.
  • RETURN: found <boolean>

23
Binary Search Algorithm
  • first = 1
  • last = end
  • loop (first <= last)
    • mid = (first + last) / 2
    • if (target > list[mid])
      • first = mid + 1 (Look in upper half.)
    • else if (target < list[mid])
      • last = mid - 1 (Look in lower half.)
    • else
      • first = last + 1 (Found equal: force exit.)
  • locn = mid
  • if (target equal list[mid])
    • found = true
  • else
    • found = false
  • return found
  • end BinarySearch

Big-O(log₂ n)
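The binary search could be written in Python roughly as below. This sketch assumes 0-based indexes and an ascending list, and it returns as soon as a match is found instead of forcing the loop to exit as the pseudocode does; the observable result (found, locn) is intended to be the same. The function name is illustrative.

```python
def binary_search(items, target):
    """Binary search over a list sorted in ascending order.

    Returns (found, locn) with a 0-based locn. When the target is
    absent, locn is the last midpoint examined, which is an element
    just below or just above where the target would be.
    """
    first, last = 0, len(items) - 1
    mid = 0
    while first <= last:
        mid = (first + last) // 2
        if target > items[mid]:
            first = mid + 1          # look in the upper half
        elif target < items[mid]:
            last = mid - 1           # look in the lower half
        else:
            return True, mid         # found
    return False, mid


data = [4, 7, 8, 10, 14, 21, 22, 36, 62, 77, 81, 91]
print(binary_search(data, 22))   # present
print(binary_search(data, 11))   # absent: first passes last
```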
24
Comparison of binary and sequential searches
Size         Binary   Sequential (Average)   Sequential (Worst case)
16           4        8                      16
50           6        25                     50
256          8        128                    256
1,000        10       500                    1,000
10,000       14       5,000                  10,000
100,000      17       50,000                 100,000
1,000,000    20       500,000                1,000,000
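The figures in this comparison can be reproduced with a few lines of Python. The sketch rests on an observation rather than on anything the slides state: the binary column matches the base-2 logarithm of the size rounded up, the average sequential column matches half the size, and the worst case is the size itself. The helper names are illustrative.

```python
def binary_probes(n):
    """Ceiling of log2(n) for n >= 1, computed with integer arithmetic."""
    return (n - 1).bit_length()


def sequential_average(n):
    """Average number of tests for a successful sequential search (about n/2)."""
    return n // 2


sizes = [16, 50, 256, 1_000, 10_000, 100_000, 1_000_000]
for n in sizes:
    # Columns: size, binary, sequential average, sequential worst case.
    print(f"{n:>9,} {binary_probes(n):>3} {sequential_average(n):>8,} {n:>9,}")
```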
25
Hashed List Searches
  • In an ideal search, we would know exactly where
    the data are and go directly there.
  • We use a hashing algorithm to transform the key
    into the index of the array element that contains
    the data we need to locate.

26
  • It is a key-to-address transformation!

Figure 2-6
27
  • We call the set of keys that hash to the same
    location in our list synonyms.
  • A collision is the event that occurs when a
    hashing algorithm produces an address for an
    insertion key and that address is already
    occupied.
  • Each calculation of an address and test for
    success is known as a probe.

Figure 2-7
28
Hashing Methods
Figure 2-8
29
Direct Hashing Method
  • The key is the address without any algorithmic
    manipulation.
  • The data structure must contain an element for
    every possible key.
  • It guarantees that there are no synonyms.
  • Direct hashing has very limited application!

30
Direct Hashing Method
Direct hashing of employee numbers.
Figure 2-9
31
Subtraction Hashing Method
  • The keys are consecutive but do not start from one.
  • Example:
    • A company has 100 employees.
    • Employee numbers run from 1001 to 1100.
  • address = x - 1000, where x is the employee number:
    • x = 1001 → address 1 (Ali Esin)
    • x = 1002 → address 2 (Sema Metin)
    • ...
    • x = 1100 → address 100 (Filiz Yilmaz)
32
Modulo Division Hashing Method
  • The modulo-division method divides the key by the
    array size and uses the remainder plus one for the
    address.
  • address = (key mod listSize) + 1
  • A list size that is a prime number produces
    fewer collisions than other list sizes.

33
Modulo Division Hashing Method
121267 / 307 = 395 with remainder 2,
so hash(121267) = 2 + 1 = 3.
We have 300 employees, and the first prime
greater than 300 is 307!
Figure 2-10
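The modulo-division example can be checked with a short Python sketch. The function names are illustrative, and the trial-division prime test is just one simple way to pick a list size; it is not taken from the slides.

```python
def modulo_division_hash(key, list_size=307):
    """address = (key mod listSize) + 1, giving addresses 1..list_size."""
    return key % list_size + 1


def is_prime(n):
    """Trial division; adequate for list sizes of this magnitude."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def first_prime_above(n):
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


print(first_prime_above(300))          # list size for 300 employees
print(modulo_division_hash(121267))    # address for key 121267
```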
34
Digit Extraction Method
  • Selected digits are extracted from the key and
    used as the address.
  • Example (first, third, and fourth digits):
    • 379452 → 394
    • 121267 → 112
    • 378845 → 388
    • 526842 → 568
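The four examples are consistent with taking the first, third, and fourth digits of each six-digit key. A small Python sketch under that reading (the function name and the positions parameter are illustrative):

```python
def digit_extraction_hash(key, positions=(0, 2, 3)):
    """Build the address from the key digits at the given 0-based positions.

    The default selects the first, third, and fourth digits.
    """
    digits = str(key)
    return int("".join(digits[p] for p in positions))


for key in (379452, 121267, 378845, 526842):
    print(key, digit_extraction_hash(key))
```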

35
Midsquare Hashing Method
  • The key is squared and the address is selected from
    the middle of the squared number.
  • The most obvious limitation of this method is the
    size of the key.
  • Example:
    • 9452 × 9452 = 89340304 → 3403 is the address.
  • Or square only part of the key:
    • 379452 → 379; 379 × 379 = 143641 → 364
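Both midsquare examples can be reproduced with the Python sketch below. One detail is an assumption: when the middle of the squared number is ambiguous (three digits out of six), the sketch picks the window one place to the right, because that is what yields 364 in the second example. The function name is illustrative.

```python
def midsquare_hash(key, width):
    """Square the key and return the middle `width` digits as the address.

    When the middle cannot be centred exactly, the window sits one
    digit to the right (an assumption made to match the example).
    """
    squared = str(key * key)
    start = (len(squared) - width + 1) // 2
    return int(squared[start:start + width])


print(midsquare_hash(9452, 4))               # middle four digits of 89340304
part = int(str(379452)[:3])                  # use only the first three digits
print(part, midsquare_hash(part, 3))         # middle three digits of 143641
```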

36
Folding Hashing Method
Figure 2-11
37
Pseudorandom Hashing Method
  • The key is used as the seed in a pseudorandom
    number generator, and the resulting random number is then
    scaled into a possible address range using
    modulo division.
  • Use a function such as y = ((ax + b) mod m) + 1
    • x is the key value,
    • a is a coefficient,
    • b is a constant,
    • m is the count of the elements in the list,
    • y is the address.

38
Pseudorandom Hashing Method
  • y = ((ax + b) mod m) + 1 → y = ((17x + 7) mod 307) + 1
  • x = 121267 is the key value,
  • a = 17
  • b = 7
  • m = 307
  • y = (((17 × 121267) + 7) mod 307) + 1
  • y = ((2061539 + 7) mod 307) + 1
  • y = (2061546 mod 307) + 1
  • y = 41 + 1
  • y = 42
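The same calculation as a Python sketch, with the coefficient, constant, and list size from this example as default arguments (the function name is illustrative):

```python
def pseudorandom_hash(key, a=17, b=7, m=307):
    """y = ((a * key + b) mod m) + 1, giving addresses 1..m."""
    return (a * key + b) % m + 1


print(17 * 121267 + 7)              # intermediate value before the modulo
print(pseudorandom_hash(121267))    # resulting address
```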

39
Rotation Hashing Method
Rotation is often used in combination with
folding and pseudorandom hashing.
Figure 2-12
40
Collision Resolution Methods
All of these methods of handling collisions are
independent of the hashing algorithm.
Figure 2-13
41
Collision Resolution Concepts: Load Factor
  • We define a full list as a list in which all
    elements except one contain data.
  • Rule: A hashed list should not be allowed to
    become more than 75% full!
  • Load factor = (number of filled elements in the list /
    total number of elements in the list) × 100
  • α = (k / n) × 100, where k is the number of filled
    elements and n is the total number of elements.
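As a quick illustration of the load factor and the 75% rule, here is a Python sketch. The function names and the sample counts are made up for the example; only the formula and the 75% threshold come from the slide.

```python
def load_factor(filled, total):
    """Load factor as a percentage: (filled / total) * 100."""
    return 100 * filled / total


def is_too_full(filled, total, limit=75.0):
    """True when the list has passed the recommended maximum load."""
    return load_factor(filled, total) > limit


# Hypothetical counts for a list of 307 elements.
print(round(load_factor(230, 307), 1), is_too_full(230, 307))
print(round(load_factor(231, 307), 1), is_too_full(231, 307))
```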

42
Collision Resolution Concepts: Clustering
  • Some hashing algorithms tend to cause data to
    group within the list. This is known as
    clustering.
  • Clustering is created by collisions.
  • If the list contains a high degree of clustering,
    then the number of probes to locate an element
    grows and the processing efficiency of the list
    is reduced.

43
Collision Resolution Concepts: Clustering
  • Clustering types are:
    • Primary clustering: clustering around a home
      address in our list.
    • Secondary clustering: the data are widely
      distributed across the whole list so that the
      list appears to be well distributed; however, the
      time to locate a requested element of data can
      become large.

44
Collision Resolution Methods: Open Addressing
  • When a collision occurs, the home area addresses
    are searched for an open or unoccupied element
    where the new data can be placed.
  • There are four different methods:
    • Linear probe
    • Quadratic probe
    • Double hashing
    • Key offset

45
Open Addressing: Linear Probe
  • When data cannot be stored in the home address,
    we resolve the collision by adding one to the
    current address.
  • Advantages:
    • Simple implementation!
    • Data tend to remain near their home address.
  • Disadvantages:
    • It tends to produce primary clustering.
    • The search algorithm may become more complex,
      especially after data have been deleted!

46
Open Addressing: Linear Probe
15352 / 307 = 50 with remainder 2, so hash(15352) = 2 + 1 = 3
New address = 3 + 1 = 4
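Insertion with a linear probe can be sketched in Python as follows. Several details are assumptions made for illustration: the home address comes from modulo division with a list size of 307, addresses are 1-based (slot 0 of the Python list is left unused), the probe wraps from the last address back to 1, and the second key (121574) was chosen only because it collides with 121267 at address 3.

```python
LIST_SIZE = 307


def home_address(key, list_size=LIST_SIZE):
    """Modulo-division hash giving addresses 1..list_size."""
    return key % list_size + 1


def linear_probe_insert(table, key):
    """Store key in table (slot 0 unused) and return the address used."""
    list_size = len(table) - 1
    address = home_address(key, list_size)
    for _ in range(list_size):
        if table[address] is None:
            table[address] = key
            return address
        # Collision: add one to the current address, wrapping to 1.
        address = address % list_size + 1
    raise OverflowError("hashed list is full")


table = [None] * (LIST_SIZE + 1)
print(linear_probe_insert(table, 121267))   # lands on its home address
print(linear_probe_insert(table, 121574))   # collides, moves to the next address
```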
47
Open Addressing: Linear Probe
Figure 2-14
48
Open Addressing: Quadratic Probe
  • Primary clustering can be eliminated by adding a value
    other than one to the current address.
  • The increment is the collision probe number squared:
    • For the first probe, 1²
    • For the second probe, 2²
    • For the third probe, 3² ...
  • We continue until we either find an empty element or
    exhaust the possible elements.
  • We use the modulo of the quadratic sum for the
    new address.

49
Open Addressing: Quadratic Probe
The increment factor increases by two for each probe!
Probe Number   Collision Location   Probe × Probe = Increment   New Address   Increment Factor   Next Increment
1              1                    1 × 1 = 1                   2             1                  1
2              2                    2 × 2 = 4                   6             3                  4
3              6                    3 × 3 = 9                   15            5                  9
4              15                   4 × 4 = 16                  31            7                  16
5              31                   5 × 5 = 25                  56            9                  25
6              56                   6 × 6 = 36                  92            11                 36
7              92                   7 × 7 = 49                  41            13                 49
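The sequence of new addresses in this table can be generated with a short Python sketch. The list size of 100 is an inference from the last row (92 + 49 = 141, which becomes 41), not something the slide states, and the sketch applies a plain modulo to the sum. The function name is illustrative.

```python
def quadratic_probe_sequence(start, list_size, probes):
    """Addresses visited by a quadratic probe starting from `start`.

    Each probe adds the square of the probe number to the current
    address and takes the result modulo list_size.
    """
    addresses = []
    address = start
    for probe in range(1, probes + 1):
        address = (address + probe * probe) % list_size
        addresses.append(address)
    return addresses


print(quadratic_probe_sequence(start=1, list_size=100, probes=7))
```

The increments 1, 4, 9, 16, ... differ by 3, 5, 7, ..., which is why the table can also build each increment by adding a factor that grows by two.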



50
Open Addressing: Double Hashing: Pseudorandom Collision Resolution
In this method, rather than using an arithmetic
probe function, the address is rehashed.
y = ((ax + c) mod listSize) + 1
y = ((3 × 2 + (-1)) mod 307) + 1
y = 6
Figure 2-15
51
Open Addressing: Double Hashing: Key Offset Collision Resolution
  • Key offset is another double hashing method; it
    produces different collision paths for different
    keys.
  • Key offset calculates the new address as a
    function of the old address and the key.

52
Open Addressing: Double Hashing: Key Offset Collision Resolution
  • offSet = key / listSize (integer quotient)
  • address = ((offSet + old address) mod listSize) + 1
  • offSet = 166702 / 307 = 543
  • Probe 1: address = ((543 + 2) mod 307) + 1 = 239
  • Probe 2: address = ((543 + 239) mod 307) + 1 = 169

Key      Home Address   Key Offset   Probe 1   Probe 2
166702   2              543          239       169
572556   2              1865         26        50
67234    2              219          222       135
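The key offset table can be recomputed with the Python sketch below. It assumes the home address comes from modulo division with a list size of 307, which gives address 2 for all three keys; the function name is illustrative.

```python
def key_offset_probes(key, list_size=307, probes=2):
    """Return (offset, [probe addresses]) for key offset resolution.

    offset  = key // list_size
    address = ((offset + old address) mod list_size) + 1
    """
    offset = key // list_size
    address = key % list_size + 1        # home address (assumed hash)
    path = []
    for _ in range(probes):
        address = (offset + address) % list_size + 1
        path.append(address)
    return offset, path


for key in (166702, 572556, 67234):
    print(key, key % 307 + 1, *key_offset_probes(key))
```

Although the three keys share a home address, their offsets differ, so each follows its own collision path.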
53
Collision Resolution: Open Addressing Resolution
  • A major disadvantage to open addressing is that
    each collision resolution increases the
    probability of future collisions!

54
Collision Resolution: Linked List Resolution
Link head pointer.
A linked list is an ordered collection of data in
which each element contains the location of the
next element.
Figure 2-16
55
Collision Resolution: Bucket Hashing Resolution
Figure 2-17
56
Hw 2
  1. Create an array of 100 random integers
    between 0 and 150.
  2. This should be an unordered list.
  3. Use the sentinel search algorithm to find the
    target value in the array.
  4. Use the probability search algorithm to find the
    target value in the array.
  5. Create an ordered list of 100
    numbers between 0 and 150.
  6. Use the ordered list search algorithm to find the
    target value in the array.
  7. Use the binary search algorithm to find the target
    value in the array.

Upload your HW-2 to the FTP site by 15 Mar. 06 at
17:00.
57
Hw 2
  • Run each search algorithm 10 times and report
    these performance values for each of them.
  • Write your comments about the result table.

                                     Sentinel Search   Probability Search   Ordered Search   Binary Search
Number of Completed Searches
Number of Successful Searches
Average number of tests per search