External Sorting Tool - PowerPoint PPT Presentation


Title: External Sorting Tool


1
External Sorting Tool
  • Presenter: Xinwei Li
  • Course: COSC 6421
  • Instructor: Parke Godfrey

2
Outline
  • Introduction
  • System Architecture
  • Software Demo
  • Lessons Learned
  • Future Work

3
Introduction
  • A classic problem in computer science!
  • ORDER BY clause
  • Bulk loading a B tree index
  • Eliminating duplicate copies
  • Sort-merge join
  • Motivation
  • A prototype for Performance Evaluation
  • Flexible and extensible

4
Introduction (CONT.)
  • Partitions: internal sort
  • Merge sort
  • Block I/O, double buffering
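The merge phase named above can be sketched as a k-way merge. This is a minimal illustration, not the presenter's code: the `Run` struct and `kway_merge` are hypothetical names, the runs are plain in-memory integer arrays rather than blocks streamed from disk, and the smallest head is found by a linear scan instead of a heap.

```c
#include <stddef.h>

/* One sorted run held in memory. In a real external sort a run would be
   streamed from disk block by block (simplification assumed here). */
typedef struct {
    const int *data;  /* sorted keys */
    size_t len;       /* number of keys in the run */
    size_t pos;       /* index of the next unread key */
} Run;

/* Merge k sorted runs into out and return the number of keys written.
   out must have room for the sum of all run lengths. */
size_t kway_merge(Run *runs, size_t k, int *out)
{
    size_t written = 0;
    for (;;) {
        size_t best = k;  /* k means "no run has keys left" */
        for (size_t i = 0; i < k; i++) {
            if (runs[i].pos == runs[i].len)
                continue;  /* this run is exhausted */
            if (best == k ||
                runs[i].data[runs[i].pos] < runs[best].data[runs[best].pos])
                best = i;
        }
        if (best == k)
            break;  /* all runs exhausted */
        out[written++] = runs[best].data[runs[best].pos++];
    }
    return written;
}
```

For a large k, a min-heap over the run heads would likely replace the linear scan; the scan keeps the sketch short.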

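[Figure: double-buffered k-way merge with B main memory buffers. Input buffers INPUT 1, INPUT 2, ..., INPUT k and the OUTPUT buffer each have a second copy (INPUT 1', INPUT 2', ..., INPUT k', OUTPUT'); b = block size; data flows from disk through the buffers and back to disk.]
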
5
  • Architecture

6
[Figure: system architecture. Components: External Sort, Internal Sort, Record Manager, Run Manager, Memory Pool, Record Parser, Run File Manager, Text File Manager, Disk. Data flow labels: Record, Block, Block or Line.]

7
System Architecture
  • Implementation
  • OS: Linux
  • Language: C
  • Some techniques
  • Multithreading: semaphores, mutexes, etc.
  • File I/O: read/write one block per call
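A rough sketch of "one block per call" file I/O follows. The names `read_block`, `write_block`, and `copy_by_blocks` are hypothetical, the 4096-byte page size is an assumption, and stdio is used for portability; stdio adds its own buffering, so a Linux tool might well call `read(2)` and `write(2)` directly instead.

```c
#include <stdio.h>
#include <stddef.h>

#define PAGE_SIZE 4096              /* assumed page size */
#define BLOCK_SIZE (4 * PAGE_SIZE)  /* assumed: one block = 4 pages */

/* Read one whole block with a single call.
   A short count means end of file or an error. */
size_t read_block(FILE *fp, unsigned char *block)
{
    return fread(block, 1, BLOCK_SIZE, fp);
}

/* Write nbytes of a block with a single call. */
size_t write_block(FILE *fp, const unsigned char *block, size_t nbytes)
{
    return fwrite(block, 1, nbytes, fp);
}

/* Copy src to dst one block at a time; returns the bytes copied. */
size_t copy_by_blocks(FILE *src, FILE *dst)
{
    static unsigned char block[BLOCK_SIZE];
    size_t total = 0, n;
    while ((n = read_block(src, block)) > 0) {
        if (write_block(dst, block, n) != n)
            break;  /* write error: stop early */
        total += n;
    }
    return total;
}
```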

8
Software Demo
  • name char 10
  • id char 10
  • age integer
  • memo char 2
  • integer
  • Schema file: table definition
  • Table file
  • Record: one line
  • Fields: separated by tab
  • Benchmark: 1M records, 30 B/record
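The table file layout above (one record per line, fields separated by tabs) can be parsed roughly as follows. `split_record` and `MAX_FIELDS` are hypothetical names for illustration; the actual Record Parser in the tool is not shown in the slides.

```c
#include <string.h>
#include <stddef.h>

#define MAX_FIELDS 8  /* assumed upper bound on fields per record */

/* Split one record line in place at tab characters.
   Stores pointers to the fields in fields[] and returns the field count.
   A trailing newline, if present, is removed first. */
size_t split_record(char *line, char *fields[MAX_FIELDS])
{
    size_t n = 0;
    char *p = line;

    line[strcspn(line, "\n")] = '\0';
    while (n < MAX_FIELDS) {
        char *tab;
        fields[n++] = p;
        tab = strchr(p, '\t');
        if (tab == NULL)
            break;      /* last field reached */
        *tab = '\0';    /* terminate this field */
        p = tab + 1;
    }
    return n;
}
```

Splitting in place avoids copying field data; the caller would then convert each field according to the schema file, for example with `strtol` for an integer column.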

9
Performance
10
Lessons Learned
  • Block (4 pages) I/O vs. single-page I/O
  • 9.60s vs. 11.58s, 17.10% improvement
  • Double buffering vs. single buffering
  • 9.60s vs. 11.78s, 18.51% improvement
  • More buffers?
  • 3 buffers: 10.48s
  • More blocks?
  • 11 blocks: 2 passes, 9.60s
  • 22 blocks: 2 passes, 10.62s
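One way to see why doubling the number of blocks did not help: the pass count can stay the same. The sketch below uses the standard textbook model (one run-generation pass, then merge passes with fan-in equal to the buffer count minus one). `sort_passes` is a hypothetical helper, and the tool's own accounting, especially with double buffering, may differ.

```c
#include <stddef.h>

/* Number of passes needed to sort n_blocks blocks of data using n_buffers
   buffer blocks (n_buffers >= 3): one pass to produce sorted runs of
   n_buffers blocks each, then merge passes with fan-in n_buffers - 1
   until a single run remains. */
unsigned sort_passes(size_t n_blocks, size_t n_buffers)
{
    size_t runs = (n_blocks + n_buffers - 1) / n_buffers;  /* ceiling */
    size_t fan_in = n_buffers - 1;
    unsigned passes = 1;  /* the run-generation pass */

    while (runs > 1) {
        runs = (runs + fan_in - 1) / fan_in;  /* one merge pass */
        passes++;
    }
    return passes;
}
```

Under this model, an input of 100 blocks takes 2 passes with either 11 or 22 buffers, so the extra memory buys no reduction in I/O volume.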

11
Lessons Learned
  • Internal sort (CPU-bound)
  • Comparisons
  • Record moves
  • Reducing record moves
  • Build an index (pointers)
  • Pointers move during the sort
  • Records move during output
  • 15.75s vs. 3.25s, 79.36% improvement
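The pointer-index idea above can be sketched with `qsort`. This is an illustration under assumptions, not the presenter's code: `sort_by_pointers` is a hypothetical name, records are fixed-size 30-byte blobs, and the key is assumed to be the first 10 bytes.

```c
#include <stdlib.h>
#include <string.h>

#define RECORD_SIZE 30  /* bytes per record, as in the benchmark */
#define KEY_SIZE 10     /* assumed: the key is the first 10 bytes */

/* qsort passes pointers to array elements; the elements are themselves
   pointers to records, hence the double indirection. */
static int cmp_record_ptr(const void *a, const void *b)
{
    const char *ra = *(const char *const *)a;
    const char *rb = *(const char *const *)b;
    return memcmp(ra, rb, KEY_SIZE);
}

/* Sort n records stored contiguously in records. Only the pointer index is
   permuted during the sort; each record is copied exactly once, into out.
   Returns 0 on success, -1 if the index cannot be allocated. */
int sort_by_pointers(const char *records, size_t n, char *out)
{
    const char **index;

    if (n == 0)
        return 0;
    index = malloc(n * sizeof *index);
    if (index == NULL)
        return -1;
    for (size_t i = 0; i < n; i++)
        index[i] = records + i * RECORD_SIZE;
    qsort(index, n, sizeof *index, cmp_record_ptr);
    for (size_t i = 0; i < n; i++)
        memcpy(out + i * RECORD_SIZE, index[i], RECORD_SIZE);
    free(index);
    return 0;
}
```

Each swap during the sort moves one pointer instead of a whole record, which is the saving the slide attributes to reduced record moves.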

12
Lessons Learned
  • External sort
  • File handles > system capacity
  • Creating a file: up to 0.08s
  • Avoid one file per run
  • Two files in total: 1 read, 1 write
  • Switch them for each new pass
  • Bookkeeping for run information in memory
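The two-file scheme above might be bookkept along these lines. `RunInfo`, `plan_merge_pass`, and `swap_files` are hypothetical names; the sketch only tracks where each run lives, on the assumption that the runs of a pass are stored back to back in a single file.

```c
#include <stdio.h>
#include <stddef.h>

/* In-memory descriptor of one sorted run inside a run file. */
typedef struct {
    long offset;  /* byte offset of the run in the file */
    long length;  /* run length in bytes */
} RunInfo;

/* Plan one merge pass: each group of up to fan_in consecutive input runs
   (fan_in >= 1) becomes one output run, laid out back to back in the
   other file. Returns the number of output runs stored in out[]. */
size_t plan_merge_pass(const RunInfo *in, size_t n_in, size_t fan_in,
                       RunInfo *out)
{
    size_t n_out = 0;
    long next_offset = 0;

    for (size_t i = 0; i < n_in; i += fan_in) {
        long total = 0;
        for (size_t j = i; j < n_in && j < i + fan_in; j++)
            total += in[j].length;
        out[n_out].offset = next_offset;
        out[n_out].length = total;
        next_offset += total;
        n_out++;
    }
    return n_out;
}

/* Swap the roles of the read file and the write file for the next pass. */
void swap_files(FILE **src, FILE **dst)
{
    FILE *tmp = *src;
    *src = *dst;
    *dst = tmp;
}
```

With this layout only two file handles are open at any time, regardless of how many runs a pass produces.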

13
Future Work
  • Waiting time
  • Urgent tasks are blocked by non-urgent tasks
  • Uneven density of I/O tasks
  • Suggestions
  • Task priority: dynamic adjustment
  • > 1 worker threads: multiple disk drives
  • Task order: some tasks must be done earlier

14
Future Work
  • World record (1M, 100-byte records): 3.5s
  • My program: 25s
  • Main thread waits: 20s
  • I/O thread waits: < 1s

15
  • Thank you!