Cosequential Processing - PowerPoint PPT Presentation

Provided by: jims67

Transcript and Presenter's Notes

Title: Cosequential Processing


1
Cosequential Processing
  • Chapter 8

2
Cosequential Processing
  • Coordinated processing of two or more sequential
    lists
  • Goals
  • To merge lists into a single sorted list (union)
  • Make a single sorted list from many
  • To match records with the same keys
    (intersection)
  • Apply transactions to a master file
  • Find entries which exist in multiple lists
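
As an illustration of the matching (intersection) goal, here is a minimal sketch in Python. The function name `match` and the sample names are made up for this example; in-memory lists stand in for sequential files, and both inputs are assumed to be sorted and free of duplicates.

```python
def match(list1, list2):
    """Return the items present in both sorted lists (intersection)."""
    result = []
    i = j = 0
    while i < len(list1) and j < len(list2):
        if list1[i] < list2[j]:
            i += 1                      # list1 is behind: advance it
        elif list1[i] > list2[j]:
            j += 1                      # list2 is behind: advance it
        else:
            result.append(list1[i])     # keys match: output and advance both
            i += 1
            j += 1
    return result


common = match(["Bill", "Gray", "Jenny"], ["Cathy", "Gray", "Jenny", "Zeke"])
print(common)
```

Each list is read once, front to back, which is what makes the approach suitable for sequential files.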

3
Cosequential Processing
  • Keys
  • Matching/merging may be by a single key or
    several.
  • Number of keys only affects compare operator, not
    sort strategy
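
A small Python sketch of this point: the merge routine stays the same and only the key function changes. The record layout (last name, first name, amount) and the helper `by_name` are assumptions made for the example.

```python
import heapq

# Hypothetical records: (last name, first name, amount), each list sorted by name.
list_a = [("Gray", "Ann", 10), ("Gray", "Zed", 5), ("Pete", "Al", 7)]
list_b = [("Gray", "Bob", 3), ("Kenny", "Cy", 8)]


def by_name(record):
    # Compound key: compare on last name first, then on first name.
    return (record[0], record[1])


# heapq.merge is the same routine whatever the key; only `key=` differs.
merged = list(heapq.merge(list_a, list_b, key=by_name))
print([r[:2] for r in merged])
```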

4
Master Transaction File Processing
  • Common processing strategy on sequential files.
  • Common since historically sequential processing
    was the rule (tapes, cards)
  • Companies stored data in sequential files
  • Lists of transactions are posted against these
    records periodically.

5
Master Transaction File Processing
  • Consider a grocery store
  • Record of inventory for each type of item stored
    in a large sequential file (master file)
  • As items are sold, the item number and quantity
    sold are posted (written) as records to a
    transaction file
  • As trucks deliver new items, item numbers and
    quantities are entered into the transaction file.
  • As new types of items are added to inventory, or
    old items are discontinued, entries about this
    are placed in the transaction file.

6
Master Transaction File Processing
  • grocery store example

Master File
Item   Item Name        Type  Quan
20231  Shoe Shine (br)  6     4
20231  Shoe Shine (bl)  6     1
20177  Cottage Cheese   5     392
20179  Chicken Soup     6     32
20231  T-bone           2     43
....

Transaction File
Item   Trans  Quan  Item Name
20231  U      -2
20231  U      50
20379  U      -5
20443  U      -4
20445  A      40    Corn Chips
20532  A      300   Butter
20534  D
20558  U      200
....

U - Update, A - Add, D - Delete

7
Master Transaction File Processing
  • Periodically update master from transaction

[Diagram: the Old Master File and the Transaction File feed the Update Operation, which produces the New Master File and Update Messages]

8
Master Transaction File Processing
  • Transactions are applied against master.
  • New master is created
  • Invalid Transactions result in Message
  • Important changes in Messages - audit trail
  • Transaction and master must be in sorted order.

9
Master Transaction File Processing
  • Processing Scheme
  • Read record Mast from old master and Trans from
    transaction file
  • While more records in both files
    • If Trans.ID < Mast.ID then
      • If Add then write the new record to new
        master, else transaction error
      • Read next Trans
    • Else if Trans.ID = Mast.ID then
      • If Update then update Mast (it is written once
        all transactions for that ID are applied)
      • If Delete then read next Mast (no write)
      • Else transaction error
      • Read next Trans
    • Else (Trans.ID > Mast.ID) write Mast to new
      master and read next Mast
  • If more records in old master, write them to new
    master
  • If more records in transaction, apply the Adds
    and give errors for the rest
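
The processing scheme can be sketched in Python as follows. This is one possible reading, not the presenter's code: the names `Master`, `Trans` and `apply_transactions` are invented, the Type column is omitted, lists stand in for files, and the sample data is only loosely based on the grocery example. Two assumptions go beyond the slide: an updated record is held until all transactions for its item have been applied, and an Add whose item number is larger than every master record is accepted rather than reported as an error.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Master:
    item: int
    name: str
    quantity: int


@dataclass(frozen=True)
class Trans:
    item: int
    kind: str            # "U" update, "A" add, "D" delete
    quantity: int = 0
    name: str = ""


def apply_transactions(masters, transactions):
    """Both inputs must be sorted by item number."""
    new_master, messages = [], []
    master_iter = iter(masters)
    mast = next(master_iter, None)
    for trans in transactions:
        # Copy master records that no remaining transaction can touch.
        while mast is not None and mast.item < trans.item:
            new_master.append(mast)
            mast = next(master_iter, None)
        matched = mast is not None and mast.item == trans.item
        if trans.kind == "A" and not matched:
            new_master.append(Master(trans.item, trans.name, trans.quantity))
        elif trans.kind == "U" and matched:
            # Hold the updated record; later transactions may hit it again.
            mast = replace(mast, quantity=mast.quantity + trans.quantity)
        elif trans.kind == "D" and matched:
            mast = next(master_iter, None)      # drop the record: no write
        else:
            messages.append(f"invalid {trans.kind} for item {trans.item}")
    while mast is not None:                     # copy the rest of old master
        new_master.append(mast)
        mast = next(master_iter, None)
    return new_master, messages


old_master = [
    Master(20177, "Cottage Cheese", 392),
    Master(20179, "Chicken Soup", 32),
    Master(20231, "T-bone", 43),
]
trans_file = [
    Trans(20179, "D"),
    Trans(20231, "U", -2),
    Trans(20231, "U", 50),
    Trans(20379, "U", -5),                      # no such item: error message
    Trans(20445, "A", 40, "Corn Chips"),
]
new_master, messages = apply_transactions(old_master, trans_file)
```

Each file is read exactly once, and the message list plays the role of the audit trail.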

10
Merging
  • Merge two (or more) sorted lists into a single
    sorted list
  • May remove duplicates (union) or keep them

List 1: Bill Gray Hillery Jenny Linda Mary Randy
List 2: Cathy Fran Kenny Pete Sally Zeke
Merged: Bill Cathy Fran Gray Hillery Jenny Kenny Linda Mary Pete Randy Sally Zeke

11
Merging
  • Merge(List1, Max1, List2, Max2, Result)
  • int next1 = 0, next2 = 0, out = 0;
  • while (next1 < Max1 && next2 < Max2)
  • if (List1[next1] > List2[next2])
  • Result[out++] = List2[next2++];
  • else
  • Result[out++] = List1[next1++];
  • while (next1 < Max1) Result[out++] = List1[next1++];
  • while (next2 < Max2) Result[out++] = List2[next2++];
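
For comparison, a runnable Python sketch of the same two-way merge, applied to the two name lists from the previous slide. The `drop_duplicates` flag is an addition for this example and covers the "union" variant; it is not part of the slide's routine.

```python
def merge(list1, list2, drop_duplicates=False):
    """Merge two sorted lists into one sorted list."""
    result = []
    next1 = next2 = 0

    def emit(item):
        # With drop_duplicates, skip an item equal to the last one written.
        if not (drop_duplicates and result and result[-1] == item):
            result.append(item)

    while next1 < len(list1) and next2 < len(list2):
        if list1[next1] > list2[next2]:
            emit(list2[next2])
            next2 += 1
        else:
            emit(list1[next1])
            next1 += 1
    for item in list1[next1:]:          # at most one of these two loops
        emit(item)                      # has anything left to copy
    for item in list2[next2:]:
        emit(item)
    return result


list1 = ["Bill", "Gray", "Hillery", "Jenny", "Linda", "Mary", "Randy"]
list2 = ["Cathy", "Fran", "Kenny", "Pete", "Sally", "Zeke"]
merged = merge(list1, list2)
print(" ".join(merged))
```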

12
Sorting
  • Small files
  • sort completely in memory
  • Called internal sorting.

13
Sorting
  • Larger files
  • may be too large to fit in memory simultaneously
  • require "external sorting"
  • Sorting using secondary devices

14
External Sorting
  • Criteria for evaluating external sorting
    algorithms
  • Different from internal sorts
  • Internal sort comparison criteria
  • Number of comparisons required
  • Number of swaps made
  • Memory needs
  • External sort comparison criteria
  • Dominated by I/O time
  • Minimize transfers between secondary storage and
    main memory

15
External Sorting
  • Two major external sorting methods
  • in situ - sort the file in place
  • use additional storage space

16
External Sorting
  • Characteristics of in situ sorting
  • uses less file space, thus larger files may be
    sorted.
  • if crash occurs during sort, file may be left in
    corrupt state
  • in situ sorts may be done on direct-access files
    using standard internal type sorts.
  • direct-access required (may not be available)
  • performance of such algorithms tends to be data
    sensitive

17
External Sorting
  • Consider a file with 1000 records, 120 bytes each
  • We have 25,000 bytes available for a buffer.
  • Solution?
  • read in 200 records at a time, sort internally
  • This results in 5 sorted files
  • merge the resulting sorted files into 1 sorted file
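
The arithmetic and the two steps can be sketched in Python. Lists stand in for files here, the keys are random integers generated for the example, and `sort_merge` is a made-up name; a real external sort would write each sorted run to disk and stream the merge.

```python
import heapq
import random

RECORD_SIZE = 120        # bytes per record (from the slide)
BUFFER_SIZE = 25_000     # bytes available (from the slide)
RUN_LENGTH = 200         # records read at a time (from the slide)

# 25,000 // 120 = 208 records fit in the buffer, so 200 at a time is safe.
records_per_buffer = BUFFER_SIZE // RECORD_SIZE


def sort_merge(records, run_length):
    # Sort stage: sort one buffer-load at a time into a run ("sorted file").
    runs = [sorted(records[i:i + run_length])
            for i in range(0, len(records), run_length)]
    # Merge stage: a single k-way merge of all runs.
    return runs, list(heapq.merge(*runs))


random.seed(0)
records = random.sample(range(100_000), 1000)    # 1000 unsorted keys
runs, result = sort_merge(records, RUN_LENGTH)
print(records_per_buffer, len(runs))
```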

18
Sort/Merge
  • A common non-in situ method is an algorithm
    called "sort-merge"
  • "safe" sorting technique
  • performance is guaranteed
  • requires only serial file access

19
Sort/Merge
[Diagram: the file is partitioned, each partition is sorted, and the sorted partitions are merged]

20
Sort/Merge
  • Sort/Merge techniques have two stages
  • sort stage - sorted partitions are generated
  • Size depends on available memory
  • merge stage - sorted partitions are merged
    (repetitively if necessary)
  • Why might more than one merge phase be needed?

21
Basic Sort/Merge
  • initial partition size is 1
  • Merge begins immediately (no sort)
  • Smallest main memory use
  • requires only 2 buffers in memory.
  • File starts as N "sorted" partitions of size 1
  • Similar to internal merge sort
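
A minimal Python sketch of the basic scheme, assuming 2-way merges and using lists in place of files. The function name `basic_sort_merge` is invented, and the input is the 15-key sample used later in the deck. It also counts merge passes, which is the cost the later slides try to reduce.

```python
import heapq


def basic_sort_merge(records):
    """Start from partitions of size 1 and merge pairs until one remains."""
    runs = [[r] for r in records]            # N "sorted" partitions of size 1
    passes = 0
    while len(runs) > 1:
        # One pass: merge neighbouring partitions two at a time.
        runs = [list(heapq.merge(*runs[i:i + 2]))
                for i in range(0, len(runs), 2)]
        passes += 1
    return (runs[0] if runs else []), passes


keys = [29, 42, 3, 7, 9, 101, 99, 87, 89, 100, 16, 8, 12, 2, 15]
ordered, passes = basic_sort_merge(keys)
print(passes)    # 15 -> 8 -> 4 -> 2 -> 1 partitions, i.e. 4 passes
```

Every pass reads and writes the whole file, so starting from larger sorted partitions directly reduces the number of passes.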

22
Improving Sort/Merge
  • Increase buffer size
  • Partitions sorted (in memory) with little I/O
  • Larger partitions mean fewer (I/O intensive)
    merges needed
  • Take advantage of already sorted runs of data
  • Consider the "unsortedness" of the data

23
Sort/Merge
  • Producing sorted partitions
  • internal sorting
  • natural selection - (use already sorted runs)
  • replacement selection

24
Internal sorting
  • read M records (M determined by available memory)
  • sort them using internal sorting techniques
  • write back out, creating a partition of size M

25
Sort/Merge
  • Replacement selection (snowshovel)
  • files usually not totally out of order
  • take advantage of partial ordering in file
  • partition size varies with already existing
    ordering

26
Replacement selection (snowshovel)
  • Start with primary buffer of size N (snowshovel)
  • 1. Read in N records into buffer
  • 2. Output record with smallest key
  • 3. Replace with next record in file
  • 4. if this new record is smaller than the last
    record written, "freeze" it (it must wait for the
    next partition)
  • 5. if unfrozen records remain, go to 2
  • 6. If all records frozen, unfreeze them all,
    start new partition, go to 2

27
Replacement selection (snowshovel)
  • if file is sorted or almost sorted, one pass may
    suffice for complete sort!
  • average partition length is 2N
  • Consider this file, with N = 4
  • 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 EOF
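
The six steps can be sketched in Python and run on the example file. This is an illustrative version, not the presenter's code: the name `replacement_selection` is made up, and "freezing" is implemented by tagging each record with the partition number it belongs to and keeping everything in one heap.

```python
import heapq
from itertools import islice

_EOF = object()     # sentinel marking end of input


def replacement_selection(records, buffer_size):
    it = iter(records)
    # Heap entries are (partition number, key); frozen records carry the
    # next partition number, so they sort after all unfrozen records.
    heap = [(0, key) for key in islice(it, buffer_size)]
    heapq.heapify(heap)
    partitions = []
    while heap:
        part, key = heapq.heappop(heap)       # smallest unfrozen key
        if part == len(partitions):           # all were frozen: new partition
            partitions.append([])
        partitions[part].append(key)
        nxt = next(it, _EOF)                  # replace with next record
        if nxt is not _EOF:
            # Smaller than the record just written: freeze for next partition.
            heapq.heappush(heap, (part if nxt >= key else part + 1, nxt))
    return partitions


data = [29, 42, 3, 7, 9, 101, 99, 87, 89, 100, 16, 8, 12, 2, 15]
partitions = replacement_selection(data, 4)
for p in partitions:
    print(p)
```

With a buffer of 4 the example yields two partitions, of 10 and 5 records, instead of the four partitions that sorting 4 records at a time would give.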

28
Natural Selection
  • Frozen records in the replacement scheme take up
    space and search time.
  • Natural selection, rather than freezing them,
    writes these unusable records to a fixed-length
    secondary file (called the reservoir)
  • partition creation terminates when the reservoir
    is full.
  • Next, the buffer is refilled first with records
    from the reservoir, then with records from the
    file (if more are needed)
  • expected partition length is 2.718N if reservoir
    and buffer same size - (about 30)

29
Natural Selection
  • Redo example with reservoir size 4
  • 29 42 3 7 9 101 99 87 89 100 16 8 12 2 15 EOF
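
One way to work the example is the Python sketch below. It follows one reading of the description of natural selection (input stops once the reservoir is full, the buffer is then drained, and the next partition starts from the reservoir contents); the name `natural_selection` is invented, lists stand in for the files, and the reservoir is assumed to be no larger than the buffer.

```python
import heapq
from itertools import islice

_EOF = object()     # sentinel marking end of input


def natural_selection(records, buffer_size, reservoir_size):
    it = iter(records)
    buffer = list(islice(it, buffer_size))
    partitions = []
    while buffer:
        heapq.heapify(buffer)
        partition, reservoir = [], []
        while buffer:
            key = heapq.heappop(buffer)       # output the smallest key
            partition.append(key)
            # Refill the freed slot; once the reservoir is full, stop
            # reading so the buffer simply drains into this partition.
            while len(reservoir) < reservoir_size:
                nxt = next(it, _EOF)
                if nxt is _EOF:
                    break
                if nxt >= key:
                    heapq.heappush(buffer, nxt)
                    break
                reservoir.append(nxt)         # unusable now: set aside
        partitions.append(partition)
        # Next partition: reservoir records first, then records from the file.
        buffer = reservoir + list(islice(it, buffer_size - len(reservoir)))
    return partitions


data = [29, 42, 3, 7, 9, 101, 99, 87, 89, 100, 16, 8, 12, 2, 15]
partitions = natural_selection(data, buffer_size=4, reservoir_size=4)
for p in partitions:
    print(p)
```

On this short file the result is the same two partitions that replacement selection produces; the difference shows up on longer inputs, where the buffer is not cluttered with frozen records.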

30
Distribution and Merging
  • Merging
  • required to bring the sorted partitions together
    into a sorted whole
  • may require a series of merge phases, where
    shorter partitions are merged into larger
    partitions
  • More than one partition per file
  • Not all partitions can be opened at once

31
Merging: Single phase

32
Merging: Multiple phase

33
Merging: Multiple Partitions / File
[Diagram labels: P5-8, P1-8, P9-12, P1-12]

34
Merging
  • Major issues - minimizing overall I/O
  • Different length partitions
  • Spend time simply reading and writing from one
    file
  • Left over partitions
  • Spend time simply copying partitions

35
Distribution and Merging
  • Distribution
  • In order to merge, partitions must be
    distributed to files in a manner facilitating
    the merge process.
  • If 1 partition per file, distribution is trivial
  • If more than 1 partition per file, distribution
    should minimize I/O
  • Several partitions may be placed in each file

36
Balanced N-way merge
  • use as many files (or tapes) as the system can
    open at once
  • Distribute the partitions evenly among F/2 files
  • repetitively merge back and forth between one set
    of F/2 files and the other
  • Distribute the generated partitions evenly among
    the F/2 output files

37
Balanced 2-way merge
[Diagram: balanced 2-way merge alternating between Files 1-2 and Files 3-4; partition labels shown: P5-8, P1-8, P9-12, P1-12]

38
Balanced 2-way merge
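  • Example: 4 files, 700 records; 100 records at a
    time can be sorted in memory

Stage                 First file of pair             Second file of pair
Initial distribution  1-100 201-300 401-500 601-700  101-200 301-400 501-600
After merge pass 1    1-200 401-600                  201-400 601-700
After merge pass 2    1-400                          401-700
After merge pass 3    1-700
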
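
The 700-record example can be reproduced with a short Python simulation. This is a sketch under assumptions: lists of lists stand in for the files, partitions are distributed round-robin, and the name `balanced_merge_passes` is invented for the example.

```python
import heapq


def balanced_merge_passes(partitions, ways=2):
    """Return the contents of the active set of files after each stage."""
    # Initial distribution: deal the partitions evenly over `ways` files.
    files = [partitions[i::ways] for i in range(ways)]
    history = [files]
    while sum(len(f) for f in files) > 1:
        outputs = [[] for _ in range(ways)]
        longest = max(len(f) for f in files)
        for i in range(longest):
            # Merge the i-th partition of every input file that has one.
            group = [f[i] for f in files if i < len(f)]
            outputs[i % ways].append(list(heapq.merge(*group)))
        files = outputs                 # the output set becomes the input set
        history.append(files)
    return history


# 700 records sorted 100 at a time: partitions 1-100, 101-200, ..., 601-700.
initial = [list(range(start, start + 100)) for start in range(1, 701, 100)]
history = balanced_merge_passes(initial)
for stage in history:
    print([[f"{p[0]}-{p[-1]}" for p in f] for f in stage])
```

In the first pass partition 601-700 has no partner and is simply copied.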
39
Balanced N-way merge
  • advantage
  • simple
  • disadvantage
  • wastes time if partition size different
  • spend time reading and writing records without
    actually merging

40
Polyphase merging
  • Strategically distribute the partitions onto F
    files based on the Fibonacci Sequence
  • Algorithm
  • During each phase merge the F smallest files
    until the end of one file is reached.
  • After each phase at least one file will be
    empty - this file becomes the new place to merge
    into
  • Continue to merge until only one file exists

41
Polyphase merging
  • Consider Initially generate three files
  • 24 partitions, 20 partitions , and 13 partitions
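
The phases for this distribution can be traced with a small Python sketch that tracks only how many partitions each file holds. It assumes a fourth, initially empty file that receives the first merge (the slide lists only the three loaded files) and a Fibonacci-style distribution such as 24, 20, 13; the name `polyphase_phases` is invented for the example.

```python
def polyphase_phases(counts):
    """counts: partitions per file, with exactly one file empty at the start."""
    counts = list(counts)
    history = [tuple(counts)]
    while sum(counts) > 1:
        out = counts.index(0)                   # the empty file is the output
        inputs = [i for i, c in enumerate(counts) if c > 0]
        if len(inputs) < 2:
            raise ValueError("distribution is not suitable for polyphase merging")
        # Merge one partition from every input file at a time until the
        # shortest input file runs out.
        step = min(counts[i] for i in inputs)
        for i in inputs:
            counts[i] -= step
        counts[out] += step
        history.append(tuple(counts))
    return history


history = polyphase_phases([24, 20, 13, 0])
for counts in history:
    print(counts)
```

The trace is (24, 20, 13, 0), (11, 7, 0, 13), (4, 0, 7, 6), (0, 4, 3, 2), (2, 2, 1, 0), (1, 1, 0, 1), (0, 0, 1, 0): six phases, and in each phase every partition read takes part in a real merge.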

42
Polyphase merging
  • advantages
  • No overhead from merging partitions of different
    sizes
  • disadvantages
  • complex management of files
  • must know partition sizes
  • still not completely optimal - partition sizes
    not always maximal.