1
Two Case Studies in Predictable Application
Scheduling Using Rialto/NT
  • Michael B. Jones, Microsoft Research
  • John Regehr, University of Virginia
  • Stefan Saroiu, University of Washington

2
Application Case Studies
  • Two applications needing predictable execution on
    Windows 2000
  • Soft Modem Driver
  • Digital Audio Player
  • The case studies
  • analyze behavior on normal Windows 2000
  • study improvements possible using Rialto/NT CPU
    Reservation mechanism

3
Consumer Real-Time
  • General-purpose Operating Systems, such as
    Windows 2000
  • maximize aggregate throughput
  • approximate fair sharing of the resources
  • Increasing use of time-dependent tasks
  • signal processing, audio, video
  • Need support for
  • predictable scheduling for independently
    developed applications
  • low latency responses
  • explicit resource allocation mechanisms

4
Rialto/NT Abstractions
  • Two real-time software abstractions
  • CPU Reservations: ongoing reservation for at
    least X time units out of every Y units for a
    thread
  • Time Constraints: one-shot time reservation for
    specified amount of work between start time and
    deadline
  • Case studies use only CPU Reservations
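
The two abstractions can be pictured as small records. This is only a sketch: the class and field names are made up for illustration and are not the Rialto/NT API, and the feasibility check is a necessary condition, not a scheduler admission test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuReservation:
    """Ongoing guarantee: at least `amount_ms` of CPU in every `period_ms`."""
    amount_ms: int
    period_ms: int

    def cpu_share(self) -> float:
        # Fraction of the processor set aside for the thread.
        return self.amount_ms / self.period_ms


@dataclass(frozen=True)
class TimeConstraint:
    """One-shot request: `estimate_ms` of work between start and deadline."""
    start_ms: int
    deadline_ms: int
    estimate_ms: int

    def can_possibly_fit(self) -> bool:
        # Necessary (not sufficient) condition: the work fits in the window.
        return 0 < self.estimate_ms <= self.deadline_ms - self.start_ms


modem = CpuReservation(amount_ms=2, period_ms=8)
print(modem.cpu_share())  # 0.25
```

The 2 ms out of every 8 ms value is the reservation used for the modem later in the talk.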

5
Rialto/NT Implementation
  • Rialto/NT developed on top of Windows 2000
    priority scheduler
  • Limitations
  • CPU Reservations must be integer multiples of
    milliseconds
  • Frequency of reservations must be power-of-two
    multiple of 1ms
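
A minimal sketch of checking a request against these two limitations. It assumes "power-of-two multiple of 1ms" means a period of 1, 2, 4, 8, ... ms; the function names are hypothetical and not part of Rialto/NT.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ... (n must be a positive integer)."""
    return n > 0 and (n & (n - 1)) == 0


def reservation_problems(amount_ms, period_ms):
    """Return a list of reasons the request violates the stated limits."""
    problems = []
    if not isinstance(amount_ms, int) or amount_ms <= 0:
        problems.append("amount must be a positive whole number of milliseconds")
    if not isinstance(period_ms, int) or not is_power_of_two(period_ms):
        problems.append("period must be 1 ms times a power of two")
    elif isinstance(amount_ms, int) and amount_ms > period_ms:
        problems.append("amount cannot exceed the period")
    return problems


print(reservation_problems(2, 8))    # []
print(reservation_problems(1, 16))   # []
print(reservation_problems(2, 12))   # ['period must be 1 ms times a power of two']
```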

6
First Case Study
  • Predictable Scheduling for a Soft Modem

7
Why Study Soft Modems?
  • Signal Processing done on host CPU
  • requires predictable scheduling
  • requires low latency responses
  • While coexisting with other system activities
  • Soft Modem is a background real-time task
  • Successful in home computer market
  • Low cost
  • Easy to update: software upgrade

8
Methodology
  • Instrumented Windows 2000 performance kernel
  • Logs predefined and custom events
  • Writes them to a memory buffer
  • Dumps buffers to disk at end of trace
  • Driver Software
  • No source for signal processing code
  • Measurement Environment
  • All experiments run with normal-priority spinning
    competitor thread
  • System
  • Windows 2000 Professional
  • Pentium II 450 MHz (uniprocessor)
  • 384 MB ECC SDRAM - 100 MB allocated to logging
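
The logging scheme (record events in a preallocated memory buffer, write to disk only after the trace ends) can be sketched in user-level Python. This is an illustration of the idea, not the instrumented kernel; the class, the event names, and the JSON dump format are all assumptions.

```python
import json
import time


class TraceBuffer:
    """Fixed-capacity in-memory event log, dumped to disk after the run."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = []
        self.dropped = 0

    def log(self, event_id, payload=None):
        # Never touch the disk while tracing; count events that do not fit.
        if len(self.events) >= self.capacity:
            self.dropped += 1
            return
        self.events.append((time.perf_counter_ns(), event_id, payload))

    def dump(self, path):
        # Called once, at the end of the trace.
        with open(path, "w") as f:
            json.dump({"dropped": self.dropped, "events": self.events}, f)


buf = TraceBuffer(capacity=2)
for event in ("isr_enter", "isr_exit", "dpc_enter"):  # hypothetical event names
    buf.log(event)
print(len(buf.events), buf.dropped)  # 2 1
```

Deferring the disk write keeps the measurement from adding I/O activity to the behavior being measured, which is presumably why a large share of memory was set aside for logging.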

9
Vendor Driver Version - Processing in Interrupt
(INT)
  • Operation of the modem
  • 1. DMA transfers between A/D and D/A and physical
    memory
  • 2. When enough data samples, the modem raises an
    interrupt
  • 3. Inside ISR, process incoming data and provide
    outgoing samples, before buffers exhausted
  • Uses input and output data buffers holding 512
    16-bit samples (1024 bytes/buffer)

10
Three Additional Versions
  • DPC Version (DPC)
  • The ISR queues a DPC
  • DPC performs signal processing
  • Thread Version (THR)
  • The ISR queues a DPC that signals a thread via a
    semaphore
  • Thread performs signal processing
  • Experimented with several different priorities
  • Rialto/NT Version (RES)
  • Same as THR, but thread scheduled using Rialto/NT
    real-time periodic CPU Reservation

11
Interrupt Rate
3 different phases, interrupts very regular
Falls within PC 99 recommended interrupt rates of
3-16ms
12
Elapsed Times in ISR (INT)
1.8 ms with repeatable worst case of 3.3 ms
  • PC 99 recommends maximum time during which a
    driver-based modem disables interrupts should not
    exceed 100 µs

13
CPU Utilization
14.7% sustained load on 450 MHz Pentium II
14
Elapsed Times in ISR (DPC)
ISR times now small, typically < 6 µs
15
Elapsed Times in Queued DPC
But now long DPC times: 1.8 ms avg., 3.3 ms max
(same as elapsed times in ISR for INT)
  • PC 99 recommends that the total execution time
    required for all queued DPCs should not exceed
    500 µs
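
As a rough worked comparison, the measured 1.8 ms typical and 3.3 ms worst-case times can be set against the two PC 99 figures quoted in the talk. The sketch treats time spent in the ISR as time with interrupts disabled, which is an assumption.

```python
PC99_MAX_INTERRUPTS_DISABLED_US = 100  # recommendation quoted for the INT version
PC99_MAX_QUEUED_DPC_US = 500           # recommendation quoted for the DPC version

measured_us = {"typical": 1800, "worst": 3300}  # 1.8 ms and 3.3 ms


def times_over(measured, limit):
    """How many times larger the measurement is than the recommended limit."""
    return measured / limit


for label, value in measured_us.items():
    print(label,
          "INT:", times_over(value, PC99_MAX_INTERRUPTS_DISABLED_US),
          "DPC:", times_over(value, PC99_MAX_QUEUED_DPC_US))
```

Moving the work from the ISR to a DPC shrinks the overshoot but does not remove it, since the same processing time is still charged against a 500 µs budget.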

16
Samples Pending to be Processed (INT & THR 24)
Small relative to 512 sample buffer size
17
Samples Pending to be Processed (THR 8)
Unsurprisingly, contention kills modem
18
Latency Results
  • Set the multimedia timers to fire once every
    millisecond
  • Register a routine to be called every millisecond
  • Routine does very little work
  • Stores cycle counter value and sleeps again
  • Histograms show differences between recorded
    times and ideal times

19
Coexisting Thread Latencies (Control Case - No
Modem)
Maximum 1978µs between wakeups
20
Coexisting Thread Latencies (INT)
Maximum 5313µs between wakeups
21
Coexisting Thread Latencies (DPC)
Maximum 4396µs between wakeups
22
Coexisting Thread Latencies (THR 24)
Maximum 2239µs between wakeups
23
What Have We Learned So Far?
  • Signal processing in the context of the interrupt
    handler is
  • unnecessary
  • detrimental to the latencies and predictability
    of coexisting activities
  • Vendor choice understandable
  • For any priority there is a potentially unbounded
    delay between the interrupt and the thread
    running
  • In practice
  • Delays are reasonable for well-configured systems
    [Intel, OSDI '99]
  • Using interrupts: an extreme form of priority
    inflation

24
Two Possible Solutions
  • Rate Monotonic Analysis: determine the right
    priority assignments among all threads. Two
    problems:
  • Assumes cooperative priority assignment among all
    threads - unrealistic
  • Working priority assignment dependent upon timing
    requirements of all threads
  • Changes in application mix may require changes in
    priority assignments
  • Use a time-based real-time scheduler
  • Such as Rialto/NT

25
Samples Pending to be Processed (RES 2ms/8ms = 25%)
Fits well within 512-sample buffer size
26
Coexisting Thread Latencies (RES 2ms/8ms = 25%)
Maximum 1971µs between wakeups
27
File Transfer Times
Results for 10 copies of 200,000 bytes each
For 1/8, 2/15, 3/17, 4/17, 7/20 no test passed
28
Modem Reservation Ranges
Sensitivity to both percentage and gaps
If period < 12.5 ms, must get 14.7% to work. If
period > 12.5 ms, (period - amount) > 12.5 ms must
also hold
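
The two quantities this slide is sensitive to can be computed for any amount/period pair. The sketch below assumes "gap" means the part of each period not covered by the reservation (period minus amount); it tabulates the 2 ms / 8 ms reservation used in the RES experiments and the reservations listed in the file transfer results as passing no test.

```python
def share_and_gap(amount_ms, period_ms):
    """Return (CPU share in percent, unreserved gap per period in ms)."""
    share_pct = 100.0 * amount_ms / period_ms
    gap_ms = period_ms - amount_ms
    return share_pct, gap_ms


reservations = [(2, 8), (1, 8), (2, 15), (3, 17), (4, 17), (7, 20)]
for amount, period in reservations:
    share, gap = share_and_gap(amount, period)
    print(f"{amount}/{period}: {share:.1f}% of CPU, gap {gap} ms")
```
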
29
Soft Modem Conclusions
  • Signal Processing in interrupt context is
  • Unnecessary
  • Detrimental to the predictability and latencies
    of the coexisting activities
  • The DPC version has similar problems
  • Threads help alleviate these problems
  • Modem runs well with real-time priorities and
    non-real-time competition
  • However modem threads may interfere with other
    threads
  • Real-time scheduler allows
  • Control over modem's degree of interference with
    other time-sensitive activities
  • Performance isolation for threads using
    reservations

30
Industry Perspective
  • Vendor did try their own THR version
  • Worked fine during normal load
  • However, modem was starved when
  • Copying data between two IDE devices
  • Using USB scanner (Intel 440BX chipset) that
    turned off interrupts for 30-50 ms
  • Therefore they shipped the INT version
  • Vendor is willing to be a good citizen only if
    ensured that others would be as well
  • Systematic latency timing verification of
    components is needed to enforce good behavior

31
Soft DSL is Coming
  • More demanding than soft modems
  • 4ms processing period
  • G.lite
  • 1.531Mbps downstream and 512Kbps upstream
  • 25% of a 600 MHz Pentium III
  • Full rate DSL
  • 3.062Mbps downstream and 512Kbps upstream
  • Nearly 50% of a 600 MHz Pentium III
  • Soft Bluetooth period 312.5µs

32
Further Soft Modem Studies
  • Software-based Digital Subscriber Line (SoftDSL)
    studies
  • Multiple Soft Modems within the same machine
  • Similar studies on multiprocessors

33
Second Case Study
  • Predictable Scheduling for Digital Audio

34
Methodology
  • Empirically reverse-engineer thread requirements
    in a complex, legacy soft real-time application
  • without use of source code
  • Assign CPU reservations to threads
  • without modifying the application
  • Measure application behavior during contention

35
Windows Media Player
  • Default player for mp3, wav, avi, mpeg
  • Experimental method
  • Modelled contention using spinning thread at
    various priorities
  • Gave CPU Reservations to media player threads
  • Played an mp3 song
  • Listened for glitches
  • Used instrumented kernel to detect buffer
    under-runs

36
Media Player Thread Structure (Simplified)
(*) Received CPU Reservations in some experiments.
37
MP3 Playback w/o Contention
  • Kmixer thread (top) runs every 10ms
  • MP3 decoder (4th line) runs every 100ms
  • Works fine

38
Starvation Caused by Competing Thread at Priority
10
  • Media Player runs only when NT priority inversion
    avoidance logic kicks in

39
Media Player Reservation
  • 1ms every 16ms reserved for decoder thread
  • Competing with priority 10 thread
  • Works fine
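
A toy simulation of the difference a reservation makes, in 1 ms ticks. It assumes a strict priority scheduler in which the spinning competitor always outranks the decoder, a decoder that always has work to do, and a reservation served at the start of each period; it ignores the priority inversion avoidance logic mentioned on the previous slide.

```python
def simulate(ticks, reservation=None):
    """Return ms of CPU received by (decoder, competitor) over `ticks` ms.

    `reservation` is an optional (amount_ms, period_ms) pair for the decoder.
    """
    decoder = competitor = 0
    for t in range(ticks):
        reserved = reservation is not None and (t % reservation[1]) < reservation[0]
        if reserved:
            decoder += 1      # reserved time goes to the decoder
        else:
            competitor += 1   # otherwise the higher-priority spinner wins
    return decoder, competitor


print(simulate(160))           # (0, 160)
print(simulate(160, (1, 16)))  # (10, 150)
```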

40
Priority Inversion Caused by Competing Thread
  • Competitor thread (priority 9) preempts MP3
    decoder while holding Kmixer buffer lock
  • Kmixer misses next two time slots (x)
  • Starves, causes audio glitch
  • Fix: raise decoder priority before grabbing lock

41
Media Player Deadlock
  • Circular wait among Media Player threads
  • Deadlock broken by a timeout
  • Fix: file a bug report

42
Media Player Results
  • Expected
  • In the presence of contention, the Windows
    priority scheduler allows real-time apps to
    starve
  • This can be fixed by giving real-time threads CPU
    Reservation
  • Unexpected
  • Competitor thread changes sequencing, exposes
    races in Media Player
  • Hard to write correct programs with many threads
    and mutexes
  • Fixed using priority ceiling emulation
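
Priority ceiling emulation means a thread is raised to the lock's ceiling priority before it takes the lock and restored after it releases it. The sketch below models only that bookkeeping: Python threads have no priorities, so `Task.priority` is a plain number, and the class names and priority values are hypothetical.

```python
import threading
from contextlib import contextmanager


class Task:
    def __init__(self, name, priority):
        self.name = name
        self.priority = priority


class CeilingLock:
    """Lock with a ceiling at least as high as any task that may take it."""

    def __init__(self, ceiling):
        self.ceiling = ceiling
        self._lock = threading.Lock()

    @contextmanager
    def held_by(self, task):
        saved = task.priority
        # Raise BEFORE acquiring, so no mid-priority task can preempt the holder.
        task.priority = max(task.priority, self.ceiling)
        self._lock.acquire()
        try:
            yield
        finally:
            self._lock.release()
            task.priority = saved  # drop back once the lock is released


decoder = Task("decoder", priority=8)   # hypothetical values
buffer_lock = CeilingLock(ceiling=24)
with buffer_lock.held_by(decoder):
    inside = decoder.priority
print(inside, decoder.priority)  # 24 8
```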

43
Implications of Results
  • Periods of threads in complex legacy apps can be
    reverse engineered
  • Amounts are platform-dependent and are harder
  • Next step: store application requirements and
    use middleware to automatically assign
    reservations
  • No application support needed
  • Potentially a way around the chicken/egg problem
    of using reservations in a world of legacy OSs
    and applications
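
One possible shape for such middleware, as a sketch only. The requirements table, the key format, and the `request_reservation` callback are all hypothetical; the single entry reuses the 1 ms every 16 ms decoder reservation from the Media Player experiments.

```python
# Hypothetical store: (application, thread role) -> (amount_ms, period_ms).
REQUIREMENTS = {
    ("media player", "mp3 decoder"): (1, 16),
}


def assign_reservations(running_threads, request_reservation):
    """Request a reservation for every running thread with a stored requirement."""
    granted = []
    for key in running_threads:
        if key in REQUIREMENTS:
            amount_ms, period_ms = REQUIREMENTS[key]
            request_reservation(key, amount_ms, period_ms)
            granted.append(key)
    return granted


calls = []
granted = assign_reservations(
    [("media player", "mp3 decoder"), ("editor", "ui")],
    lambda key, amount, period: calls.append((key, amount, period)),
)
print(calls)  # [(('media player', 'mp3 decoder'), 1, 16)]
```

Because amounts are platform-dependent, a real store would likely need per-machine entries; the periods are the part that carries over.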

44
Possible Continued Media Experiments
  • Study software DVD player
  • CPU intensive and time sensitive

45
Overall Conclusions
  • Status quo insufficient
  • Applications either inflate their priorities
  • as did the soft modem driver
  • or are at the mercy of applications that may be
    run at higher priorities
  • as is the case with the digital audio player
  • CPU Reservations solve this problem
  • by allowing applications to reliably obtain the
    time they need
  • while allowing other applications to do the same

46
For More Information
  • See Mike Jones (mbj@microsoft.com)
  • http://research.microsoft.com/~mbj/
  • or John Regehr (regehr@cs.utah.edu)
  • http://www.cs.utah.edu/~regehr/
  • or Stefan Saroiu (tzoompy@cs.washington.edu)
  • http://www.cs.washington.edu/homes/tzoompy/
  • Related papers at Mike's web site