Title: Understanding Operating Systems Sixth Edition
Understanding Operating Systems, Sixth Edition
- Chapter 12
- System Management
Learning Objectives
- After completing this chapter, you should be able to describe:
- The tradeoffs to be considered when attempting to improve overall system performance
- The roles of system measurement tools such as positive and negative feedback loops
- Two system monitoring techniques
- The fundamentals of patch management
- The importance of sound accounting practices by system administrators
System Management: Evaluating an Operating System
- Most operating systems were designed to work with a certain piece of hardware, a category of processors, or specific groups of users.
- Although most evolved over time to operate multiple systems, most still favor some users and some computing environments over others.
- To evaluate an OS, you need to know:
- Its design goals and history
- How it communicates with its users
- How its resources are managed
- What tradeoffs were made to achieve its goals
System Management: Evaluating an Operating System
- An operating system's strengths and weaknesses need to be weighed in relation to:
- Who will be using the operating system
- On what hardware
- For what purpose
System Management: Cooperation Among Components
- The performance of any one resource depends on the performance of the other resources in the system.
- Memory management is intrinsically linked with device management when memory is used to buffer data between a very fast processor and slower secondary storage devices.
System Management: Cooperation Among Components
- If you managed an organization's computer system and were allocated money to upgrade it, where would you put the investment to best use?
- A faster CPU
- Additional processors
- More disk drives
- A RAID system
- New file management software
- Or, if you bought a new system, what characteristics would you look for that would make it more efficient than the old one?
System Management: Cooperation Among Components
- Any system improvement can be made only after extensive analysis of:
- The needs of the system's resources
- Requirements
- Managers
- Users
- Whenever changes are made to a system, you're often trading one set of problems for another.
- The key is to consider the performance of the entire system and not just the individual components.
System Management: Role of Memory Management
- Memory management schemes were discussed in Chapters 2 and 3.
- If you increase memory or change to another memory allocation scheme, you must consider the actual operating environment in which the system will reside.
- There's a trade-off between memory use and CPU overhead.
- As the memory algorithms grow more complex, the CPU overhead increases and overall performance can suffer.
- However, some operating systems perform remarkably better with additional memory.
System Management: Role of Processor Management
- Processor management was covered in Chapters 4, 5, and 6.
- If you decide to implement a multiprogramming system to increase your processor's utilization, remember that multiprogramming requires a great deal of synchronization between:
- The Memory Manager
- The Processor Manager
- The I/O devices
System Management: Role of Processor Management
- The tradeoff:
- Better use of the CPU versus increased overhead
- Slower response time
- Decreased throughput
- Problems to watch for:
- A system could reach a saturation point if the CPU is fully utilized but is allowed to accept additional jobs.
- This would result in higher overhead and less time to run programs.
System Management: Role of Processor Management
- Problems to watch for:
- Under heavy loads, the CPU time required to manage I/O queues (which under normal circumstances don't require a great deal of time) could dramatically increase the time required to run jobs.
- With long queues forming at the channels, control units, and I/O devices, the CPU could be left idle waiting for processes to finish their I/O.
- Likewise, increasing the number of processors necessarily increases the overhead required to manage multiple jobs among multiple processors.
- The payoff can be faster turnaround time.
System Management: Role of Device Management
- Device management, covered in Chapter 7, offers several ways to improve I/O device utilization, including blocking, buffering, and rescheduling I/O requests to optimize access time.
- Tradeoffs:
- Each of these options also increases CPU overhead and uses additional memory space.
- Blocking:
- Reduces the number of physical I/O requests (good).
- But it's the CPU's responsibility to block and later deblock the records, and that's overhead (bad).
System Management: Role of Device Management
- Buffering:
- Helps the CPU match slower I/O device speeds and vice versa, but it requires memory space for the buffers, either dedicated space or a temporarily allocated section of main memory.
- This reduces the level of processing that can take place.
- Tradeoff:
- Reduced multiprogramming versus better use of I/O devices.
System Management: Role of Device Management
- Rescheduling requests:
- A technique that can help optimize I/O times; it's a queue reordering technique.
- It's also an overhead function, so the speed of both the CPU and the I/O device must be weighed against the time it would take to execute the reordering algorithm.
System Management: Role of Device Management
- Let's assume that a system consisting of CPU 1 and Disk Drive A has to access Track 1, Track 9, Track 1, and then Track 9, and the arm is already located at Track 1 (so the first access needs no arm movement).
- Without reordering, Drive A requires approximately 35 ms for each of the remaining accesses: 35 + 35 + 35 = 105 ms (Figure 12.2).
System Management: Role of Device Management
- Example without reordering:
- CPU 1 and Disk Drive A
- Access Track 1, Track 9, Track 1, Track 9
- Arm already located at Track 1
System Management: Role of Device Management
- After reordering (which requires 30 ms), the arm can perform both accesses on Track 1 before traveling, in 35 ms, to Track 9 for the other two accesses, resulting in a speed nearly twice as fast: 30 + 35 = 65 ms (Figure 12.3).
System Management: Role of Device Management
- Example after reordering:
- Arm performs both accesses on Track 1 before traveling to Track 9 (35 ms)
System Management: Role of Device Management
- However, when the same situation is faced by CPU 1 and the much faster Disk Drive C, the disk will again begin at Track 1 and make all four accesses in 15 ms (5 + 5 + 5); but if it stops to reorder these accesses (which requires 30 ms), it takes 35 ms (30 + 5) to complete the task.
- Therefore, reordering requests is not always warranted (see the sketch below).
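- A minimal Python sketch of the timing model behind Figures 12.2 and 12.3, assuming each move to a different track costs a fixed time (35 ms on Drive A, 5 ms on Drive C), accesses to the track the arm is already on are treated as free, and reordering always costs 30 ms:

# Sketch of the reordering tradeoff from the Drive A / Drive C example.
REORDER_COST_MS = 30  # time to run the reordering algorithm (from the example)

def service_time(requests, start_track, move_ms, reorder=False):
    """Return total ms to service the requests, optionally reordering them first."""
    total = REORDER_COST_MS if reorder else 0
    if reorder:
        # Group requests so those on the arm's current track are served first.
        requests = sorted(requests, key=lambda t: t != start_track)
    arm = start_track
    for track in requests:
        if track != arm:          # only arm movement costs time in this model
            total += move_ms
            arm = track
    return total

requests = [1, 9, 1, 9]
print(service_time(requests, 1, 35))                # Drive A, no reorder: 105
print(service_time(requests, 1, 35, reorder=True))  # Drive A, reordered:   65
print(service_time(requests, 1, 5))                 # Drive C, no reorder:   15
print(service_time(requests, 1, 5, reorder=True))   # Drive C, reordered:    35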
System Management: Role of Device Management
- Remember that when the system is configured, the reordering algorithm is either always on or always off.
- It can't be changed by the system's operator without reconfiguration, so the initial setting, on or off, must be determined by evaluating the system based on average system performance.
System Management: Role of File Management
- The discussion of file management in Chapter 8 looked at how secondary storage allocation schemes help the user organize and access the files on the system.
- Almost every factor discussed in that chapter can affect overall system performance.
- File organization is an important consideration.
- If a file is stored noncontiguously and has several sections residing in widely separated cylinders of a disk pack, sequentially accessing all of its records could be a time-consuming task.
System Management: Role of File Management
- Such a case would suggest that the file should be compacted (defragmented) so each section of the file resides near the others.
- Recompaction, however, takes CPU time and makes the files unavailable to users while it's being done.
- Another file management issue that could affect retrieval time is the location of a volume's directory.
- Some systems read the directory into main memory and hold it there until the user terminates the session.
System Management: Role of File Management
- Looking at Figure 12.1:
- The first retrieval would take 35 ms when the system retrieves the directory for Drive A and loads it into memory.
- Every subsequent access would be performed at the CPU's much faster speed without the need to access the disk.
- This poses a problem if the system crashes before any modifications have been recorded permanently in secondary storage.
- The I/O time that was saved by not having to access secondary storage every time the user requested to see the directory would be negated by not having current information in the user's directory.
System Management: Role of File Management
- Similarly, the location of a volume's directory on the disk might make a significant difference in the time it takes to access it.
- If the directories are stored on the outermost track, then the disk drive arm has to travel farther to access each file than it would if the directories were kept in the center tracks.
- File management is closely related to the device on which the files are stored.
- Designers must consider both issues at the same time when evaluating or modifying computer systems.
System Management: Role of File Management
- Different schemes offer different flexibility,
but the trade-off for increased file flexibility
is increased CPU overhead.
System Management: Role of File Management
- File management is closely related to the device where the files are stored.
System Management: Role of Network Management
- The discussion of network management in Chapters 9 and 10 examined the impact of adding networking capability to the OS and the overall effect on system performance.
- The Network Manager:
- Routinely synchronizes the load among remote processors
- Determines message priorities
- Tries to select the most efficient communication paths over multiple data communication lines
System Management: Role of Network Management
- When an application program requires data from a disk drive at a different location, the Network Manager attempts to provide this service seamlessly.
- When networked devices (printers, plotters, disk drives) are required, the Network Manager has the responsibility of allocating and deallocating the required resources correctly.
- In addition, the Network Manager allows a network administrator to monitor the use of individual computers and shared hardware, and ensure compliance with software license agreements.
System Management: Role of Network Management
- The Network Manager also simplifies the process
of updating data files and programs on networked
computers by coordinating changes through a
communications server instead of making the
changes on each individual computer.
System Management: Measuring System Performance
- Total system performance can be defined as the efficiency with which a computer system meets its goals; that is, how well it serves its users.
- System efficiency is affected by three major components:
- User programs
- Operating system programs
- Hardware
- In addition, system performance can be very subjective and difficult to quantify.
- For example, how can anyone objectively gauge ease of use?
Measuring System Performance: Measurement Tools
- Throughput:
- A composite measure that indicates the productivity of the system as a whole.
- Usually measured under steady-state conditions and reflects quantities such as:
- The number of jobs processed per day
- The number of online transactions handled per hour
- Throughput can also be a measure of the volume of work handled by one unit of the computer system, an isolation that's useful when analysts are looking for bottlenecks in the system.
Measuring System Performance: Measurement Tools
- Capacity:
- Bottlenecks tend to develop when resources reach their capacity (maximum throughput level).
- Thrashing is a result of a saturated disk.
- Bottlenecks also occur when main memory has been overcommitted and the level of multiprogramming has reached a peak point.
- In that case, the working sets for the active jobs can't be kept in main memory, so the Memory Manager is continuously swapping pages between main memory and secondary storage.
Measuring System Performance: Measurement Tools
- Capacity:
- Throughput and capacity can be monitored by either hardware or software.
- Bottlenecks can be detected by monitoring the queues forming at each resource.
- When a queue starts to grow rapidly, this is an indication that the arrival rate is greater than, or close to, the service rate and the resource is saturated (this is the role of a feedback loop, discussed later).
- Once a bottleneck is detected, the appropriate action can be taken to resolve the problem (see the sketch below).
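- A minimal Python sketch (the counters and threshold are assumptions, not from the text) of how a monitor might flag a saturated resource by comparing its arrival rate with its service rate over an observation window:

from dataclasses import dataclass

@dataclass
class ResourceStats:
    name: str
    arrivals: int      # requests that joined the queue during the window
    completions: int   # requests serviced during the window
    window_s: float    # length of the observation window in seconds

def is_saturated(stats: ResourceStats, threshold: float = 0.9) -> bool:
    """Flag the resource when arrivals approach or exceed the service rate."""
    arrival_rate = stats.arrivals / stats.window_s
    service_rate = stats.completions / stats.window_s
    return service_rate == 0 or arrival_rate >= threshold * service_rate

printer = ResourceStats("Printer 1", arrivals=48, completions=30, window_s=60)
if is_saturated(printer):
    print(f"{printer.name} is saturated; redirect new jobs elsewhere")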
Measuring System Performance: Measurement Tools
- Response time (online interactive users):
- An important measure of system performance.
- The interval required to process a user's request: from when the user presses the key to send the message until the system indicates receipt of the message.
- Turnaround time (batch jobs):
- The time from the submission of a job until its output is returned to the user.
- Whether in an online or batch context, this measure depends on both the workload being handled by the system at the time of the request and the type of job or request being submitted.
Measuring System Performance: Measurement Tools
- Resource utilization:
- A measure of how much each unit is contributing to the overall operation.
- Usually given as a percentage of time that a resource is actually in use. For example:
- The CPU is busy 60 percent of the time
- The line printer is busy 90 percent of the time
- How heavily are the terminals used?
- How busy is the seek mechanism on a disk?
- This data helps determine whether there is balance among the units of a system or whether a system is I/O-bound or CPU-bound.
Measuring System Performance: Measurement Tools
- Availability:
- Indicates the likelihood that a resource will be ready when a user needs it.
- For online users, it may mean the probability that a port is free or a terminal is available when they attempt to log on.
- For those already on the system, it may mean the probability that one or several specific resources will be ready when their programs make requests.
- In other words, a unit will be operational and not out of service when a user needs it.
Measuring System Performance: Measurement Tools
- Availability is influenced by two factors:
- Mean time between failures (MTBF): the average time that a unit is operational before it breaks down.
- Mean time to repair (MTTR): the average time needed to fix a failed unit and put it back in service.
Measuring System Performance: Measurement Tools
- If you buy a terminal with an MTBF of 4,000 hours (the number given by the manufacturer), and you plan to use it for 4 hours a day, 20 days a month (80 hours per month), then you would expect it to fail every 50 months (4,000 / 80).
- Assuming the MTTR is 2 hours, its availability is:
- Availability (A) = MTBF / (MTBF + MTTR) = 4,000 / (4,000 + 2) = 0.9995
Measuring System Performance: Measurement Tools
- On average, this unit would be available 9,995 out of every 10,000 hours.
- Five failures out of 10,000 uses.
Measuring System Performance: Measurement Tools
- Reliability:
- Similar to availability.
- Measures the probability that a unit will not fail during a given time period (t).
- It is a function of MTBF: Reliability(t) = e^(-t / MTBF)
- Suppose you absolutely need to use the terminal for the 10 minutes before your upcoming deadline.
- With time expressed in hours (t = 10/60), the unit's reliability is:
- Reliability(10/60) = e^(-(10/60) / 4,000) = 0.9999584 (see the sketch below)
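- A minimal Python sketch of both formulas applied to the terminal example (MTBF = 4,000 hours, MTTR = 2 hours, t = 10 minutes):

import math

def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """A = MTBF / (MTBF + MTTR)"""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def reliability(t_hours: float, mtbf_hours: float) -> float:
    """R(t) = e^(-t / MTBF): probability of no failure during the next t hours."""
    return math.exp(-t_hours / mtbf_hours)

print(round(availability(4000, 2), 4))       # 0.9995
print(round(reliability(10 / 60, 4000), 5))  # 0.99996 (about the value on the slide)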
Measuring System Performance: Measurement Tools
- Performance measures can't be taken in isolation from the workload being handled by the system unless you're simply fine-tuning a specific portion of the system.
- Overall system performance varies from time to time, so it's important to define the actual working environment before making generalizations.
Measuring System Performance: Feedback Loops
- To prevent the processor from spending more time doing overhead than executing jobs, the OS must continuously monitor the system and feed this information to the Job Scheduler.
- The Scheduler can then either allow more jobs to enter the system or prevent new jobs from entering until some of the congestion has been relieved.
- This mechanism is a feedback loop, and it can be either negative or positive.
Measuring System Performance: Feedback Loops
- Negative feedback loop:
- Monitors the system and, when it becomes too congested, signals the Job Scheduler to slow down the arrival rate of the processes (Figure 12.4).
- A negative feedback loop monitoring I/O devices would inform the Device Manager that Printer 1 has too many jobs in its queue, causing the Device Manager to direct all newly arriving jobs to Printer 2, which isn't as busy (see the sketch below).
- The negative feedback helps stabilize the system and keeps queue lengths close to expected mean values.
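- A minimal Python sketch (the threshold and printer names are hypothetical, not from the text) of this negative feedback path: newly arriving print jobs are steered to the least busy printer, and arrivals are slowed when every queue is congested:

from collections import deque

QUEUE_LIMIT = 5  # hypothetical congestion threshold

printers = {"Printer 1": deque(), "Printer 2": deque()}

def dispatch(job: str) -> str:
    """Send the job to the least congested printer; this is the feedback path."""
    target = min(printers, key=lambda name: len(printers[name]))
    if len(printers[target]) >= QUEUE_LIMIT:
        raise RuntimeError("all printers saturated; slow the arrival rate")
    printers[target].append(job)
    return target

for i in range(6):
    print(f"job{i} -> {dispatch(f'job{i}')}")   # jobs alternate between printers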
Measuring System Performance: Feedback Loops
- Positive feedback loop:
- Monitors the system and, when the system becomes underutilized, causes the arrival rate to increase (Figure 12.5).
- Used in paged virtual memory systems.
- Must be used cautiously because positive loops are more difficult to implement than negative loops.
Measuring System Performance: Feedback Loops
- Positive feedback loop, how it works:
- The positive feedback loop informs the Job Scheduler that the CPU is underutilized.
- The Scheduler allows more jobs to enter the system to give more work to the CPU.
- As more jobs enter, the amount of main memory allocated to each job decreases.
- If too many jobs are allowed to enter the system, the result can be an increase in page faults.
- This may cause CPU performance to deteriorate.
- The monitoring mechanisms for positive feedback loops must be designed with great care.
Measuring System Performance: Feedback Loops
- Positive feedback loop:
- An algorithm for a positive feedback loop should monitor the effect of new arrivals in two places:
- The Processor Manager's control of the CPU
- The Device Manager's read and write operations
- Both areas experience the most dynamic changes, which can lead to unstable conditions.
- Such an algorithm should check to see whether the arrival produces the anticipated result and whether system performance is actually improved.
Measuring System Performance: Feedback Loops
- Positive feedback loop:
- If the arrival causes performance to deteriorate, then the monitoring algorithm could cause the OS to adjust its allocation strategies until a stable mode of operation has been reached again (see the sketch below).
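- A minimal Python sketch (hypothetical thresholds, not the book's algorithm) of the admission decision in a positive feedback loop: let more jobs in while the CPU is underutilized, but stop as soon as the page-fault rate indicates memory is overcommitted:

def admit_more_jobs(cpu_utilization: float, page_fault_rate: float) -> bool:
    """Return True when the Job Scheduler should let another job in."""
    CPU_LOW = 0.60        # below this, the CPU is considered underutilized
    FAULTS_HIGH = 50.0    # page faults per second that signal thrashing
    if page_fault_rate > FAULTS_HIGH:
        return False      # new arrivals made things worse; stabilize first
    return cpu_utilization < CPU_LOW

print(admit_more_jobs(cpu_utilization=0.45, page_fault_rate=5.0))   # True
print(admit_more_jobs(cpu_utilization=0.45, page_fault_rate=80.0))  # False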
Measuring System Performance: Patch Management
- Patch management is the systematic updating of the operating system and other system software.
- A patch is a piece of programming code that replaces or changes code that makes up the software.
Measuring System Performance: Patch Management
- There are three primary reasons for the emphasis on software patches for sound system administration:
- The need for vigilant security precautions against constantly changing system threats
- The need to assure system compliance with government regulations regarding privacy and financial accountability
- The need to keep systems running at peak efficiency
Measuring System Performance: Patch Management
- The task of keeping computing systems patched correctly has become a challenge because of:
- The complexity of the entire system (the OS, network, various platforms, remote users)
- The speed with which software vulnerabilities are exploited by worms, viruses, and other system assaults
- Overall responsibility lies with the CIO, the CSO, the network administrator, or individual users.
- It is only through rigorous patching that the system's resources can reach top performance and its information can be best protected.
Measuring System Performance: Patch Management
- Manual and automatic patch technologies
- Among the top eight tools used by organizations
Patch Management: Patching Fundamentals
- While installation of the patch is the most public event, several essential steps take place before that happens:
- Identify the required patch
- Verify the patch's source and integrity
- Test the patch in a safe environment
- Deploy the patch throughout the system
- Audit the system to gauge the success of the patch deployment
Patch Management: Patching Fundamentals
- All changes to the OS or other critical system software must be undertaken in an environment that makes regular system backups and tests restoration from backups.
Patch Management: Patching Fundamentals
- Patch availability:
- Identify the criticality of the patch.
- If the patch is critical, it should be applied as soon as possible.
- If the patch is not critical, you might choose to delay installation until a regular patch cycle begins.
- Patch integrity:
- Authentic patches will have a digital signature or patch validation tool.
- Before applying a patch, validate the digital signature used by the vendor to send the new software (see the sketch below).
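- A minimal Python sketch of one integrity check, assuming the vendor publishes a SHA-256 checksum for the patch file (an assumption for illustration; an actual deployment would verify the vendor's digital signature with the vendor-supplied validation tool):

import hashlib

def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def patch_is_intact(path: str, published_checksum: str) -> bool:
    return sha256_of(path) == published_checksum.lower()

# Hypothetical usage (file name and checksum are placeholders):
# if patch_is_intact("patch-12.4.bin", "9f2a...e71c"):
#     deploy_patch()   # proceed only when the checksum matches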
Patch Management: Patching Fundamentals
- Patch testing:
- Before installation on a live system, test the new patch on a sample system or an isolated machine (a development system) to verify its worth.
- Tests:
- Test to see if the system restarts after the patch is installed.
- Check to see if the patched software performs its assigned tasks.
- The tested system should resemble the complexity of the target system as closely as possible.
- Test the contingency plans to uninstall the patch and recover the old software if it becomes necessary to do so.
Patch Management: Patching Fundamentals
- Patch deployment:
- Single-user computer: install the software and reboot the computer.
- Multiplatform system (many users): an exceptionally complicated task.
- Maintain an accurate inventory of all hardware and software on those computers that need the patch.
- On a large network, this information can be gleaned from network mapping software that surveys the network and takes a detailed inventory of the system.
- Because it's impossible to use the system during the patching process, schedule the patch deployment when system use is low (evenings or weekends).
Patch Management: Patching Fundamentals
- Audit the finished system:
- Confirm that the resulting system meets expectations.
- Verify that all computers are patched correctly and perform fundamental tasks as expected.
- Verify that no users had unexpected or unauthorized versions of software that may not accept the patch.
- Verify that no users were left out of the deployment.
- This process should include documentation of the changes made to the system and the success or failure of each stage of the process.
- Get feedback from users to verify the deployment's success.
Patch Management: Software Options
- Patches can be installed manually, one at a time, or via software that's written to perform the task automatically.
- Deployment software falls into two groups:
- Programs that require an agent (agent-based software)
- Programs that do not (agentless software)
Patch Management: Software Options
- If the deployment software uses an agent (software that assists in patch installation), the agent software must be installed on every target computer system before patches can be deployed.
- On a very large or dynamic system, this can be a daunting task.
- For administrators of large, complex networks, agentless software may offer some time-saving efficiencies.
Patch Management: Timing the Patch Cycle
- While critical system patches must be applied immediately, less-critical patches can be scheduled at the convenience of the systems group.
- These patch cycles can be based on calendar events or vendor events.
- The advantage of having routine patch cycles is that they allow for thorough review of the patch and testing cycles before deployment.
Measuring System Performance: System Monitoring
- Several techniques for measuring the performance of a working system have been developed as computer systems have evolved; they can be implemented using either hardware or software components.
- Hardware monitors:
- More expensive, but they have the advantage of having a minimal impact on the system because they're outside of it and attached electronically.
- Examples: hard-wired counters, clocks, and comparative elements.
Measuring System Performance: System Monitoring
- Software monitors:
- Relatively inexpensive.
- Because they become part of the system, they can distort the results of the analysis; the software must use the resources it's trying to monitor.
- Software tools must be developed for each specific system, so it's difficult to move them from system to system.
- In early systems, performance was measured simply by timing the processing of specific instructions.
- The system analysts might have calculated the number of times an ADD instruction could be done in one second (see the sketch below).
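- A minimal Python sketch of that early style of measurement, timing a loop of additions; the absolute number depends entirely on the interpreter and hardware, so it only illustrates the technique of timing a specific operation:

import time

def adds_per_second(iterations: int = 1_000_000) -> float:
    """Time a tight loop of additions and report the rate per second."""
    total = 0
    start = time.perf_counter()
    for i in range(iterations):
        total += i                      # the operation being measured
    elapsed = time.perf_counter() - start
    return iterations / elapsed

print(f"{adds_per_second():,.0f} additions per second")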
Measuring System Performance: System Monitoring
- They might also have measured the processing time of a typical set of instructions.
- These measurements monitored only the CPU speed because in those days the CPU was the most important resource, so the remainder of the system was ignored.
- Today, system measurements must include the other hardware units as well as the OS, compilers, and other system software.
- Measurements are made in a variety of ways.
- Some are made using real programs, usually production programs that are used extensively by the users of the system, which are run with different configurations of CPUs, operating systems, and other components.
Measuring System Performance: System Monitoring
- The results are called benchmarks and are useful when comparing systems that have gone through extensive changes.
- Benchmarks are often used by vendors to demonstrate to prospective clients the specific advantages of a new CPU, OS, compiler, or piece of hardware.
- Benchmark results are highly dependent upon:
- The system's workload
- The system's design and implementation
- The specific requirements of the applications loaded on the system
Measuring System Performance: System Monitoring
- Performance data is usually obtained in a rigorously controlled environment, so results will probably differ in real-life operation.
- Still, benchmarks offer valuable comparison data: a place to begin a system evaluation.
- If it's not possible to experiment with the system itself, a simulation model can be used to measure performance.
- A simulation model is a computerized abstraction of what is represented in reality.
- The amount of detail built into the model is dictated by time and money.
Measuring System Performance: Accounting
- The accounting function pays the bills and keeps the system financially operable.
- Most computer system resources are paid for by the users.
- In a single-user environment, it's easy to calculate the cost of the system.
- In a multiuser environment, computer costs are usually distributed among users based on how much each one uses the system's resources.
Measuring System Performance: Accounting
- To do this distribution, the OS must be able to:
- Set up user accounts
- Assign passwords
- Identify which resources are available to each user
- Define quotas for available resources (such as disk space or maximum CPU time allowed per job)
Measuring System Performance: Accounting
- Pricing policies vary from system to system. Typical measurements include some or all of the following:
- Total amount of time spent between job submission and completion; in interactive environments this is the time from logon to logoff (connect time).
- CPU time: the time spent by the processor executing the job.
- Main memory usage: represented in units of time, bytes of storage, or bytes of storage multiplied by units of time.
Measuring System Performance: Accounting
- Pricing policy measurements:
- A job that requires 200K for 4 seconds followed by 120K for 2 seconds could be billed for 6 seconds of main memory usage, for 320K of memory usage, or for a combination of kilobyte-seconds of memory usage: (200 × 4) + (120 × 2) = 1,040 kilobyte-seconds of memory usage (see the sketch below).
- Secondary storage used during program execution, like main memory use, can be given in units of time, space, or both.
- Secondary storage used during the billing period is usually given in terms of the number of disk tracks allocated.
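- A minimal Python sketch of the kilobyte-seconds calculation from this example: each phase of the job is charged as memory size (in KB) multiplied by the seconds it was held:

def memory_charge_k_seconds(phases):
    """phases: list of (kilobytes, seconds) tuples; returns total kilobyte-seconds."""
    return sum(kb * seconds for kb, seconds in phases)

job = [(200, 4), (120, 2)]            # 200K for 4 seconds, then 120K for 2 seconds
print(memory_charge_k_seconds(job))   # 1040 kilobyte-seconds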
Measuring System Performance: Accounting
- Pricing policy measurements:
- Use of system software, including utility packages, compilers, and/or databases.
- Number of I/O operations, usually grouped by device class (line printer, terminal, disks).
- Time spent waiting for I/O completion.
- Number of input records read, usually grouped by type of input device.
- Number of output records printed, usually grouped by type of output device.
- Number of page faults, reported in paging systems.
Measuring System Performance: Accounting
- Pricing policies are sometimes used as a way to achieve specific operational goals.
- By varying the price of system services, users can be convinced to distribute their workload to the system manager's advantage.
- By offering reduced rates during off-hours, some users might be persuaded to run long jobs in batch mode inexpensively overnight instead of interactively during peak hours.
- Pricing incentives can also be used to encourage users to access more plentiful and cheap resources rather than those that are scarce and expensive.
- By putting a high price on printer output, users might be encouraged to order a minimum of printouts.
Measuring System Performance: Accounting
- Should the system give each user billing information at the end of each job or at the end of each online session?
- It depends on the environment.
- Some systems only give information on resource usage.
- Other systems also calculate the price of the most costly items (CPU utilization, disk storage use, supplies) at the end of each job.
- This gives the user an up-to-date report of expenses and shows how much is left in the user's account.
Measuring System Performance: Accounting
- The advantage of maintaining billing records online is that the status of each user can be checked before the user's job is allowed to enter the READY state.
- The disadvantage is overhead.
- When billing records are kept online and an accounting program is kept active:
- Memory space is used
- CPU processing is increased
- One compromise is to defer the accounting program until off-hours, when the system is lightly loaded.
Summary
- The OS is more than the sum of its parts; it's the orchestrated cooperation of every piece of hardware and every piece of software.
- When one part of the system is favored, it's often at the expense of the others.
- System managers must make sure they're using the appropriate measurement tools and techniques to verify the effectiveness of the system before and after modification, and then evaluate the degree of improvement.