INSIDE THE LINUX KERNEL

1
INSIDE THE LINUX KERNEL
UnixForum Chicago - March 8, 2001
Daniel P. Bovet University of Rome "Tor Vergata"
2
WHAT IS A KERNEL? (1/2)
  • it's a program that runs in Kernel Mode
  • CPUs run either in Kernel Mode or in User Mode
  • when in User Mode, some parts of RAM can't be
    addressed, some instructions can't be executed,
    and I/O ports can't be accessed
  • when in Kernel Mode, no restriction is put on
    the program

3
WHAT IS A KERNEL? (2/2)
  • besides running in Kernel Mode, kernels have
    three other peculiarities:
  • large size (millions of machine language
    instructions)
  • machine dependency (some parts of the kernel
    must be coded in Assembly language)
  • loading into RAM at boot time in a rather
    primitive way

4
ENTERING THE KERNEL PROGRAM (1/2)
  • when the CPU is running in User Mode

5
ENTERING THE KERNEL PROGRAM (2/2)
  • when the CPU is running in Kernel Mode

6
NESTED KERNEL INVOCATIONS
  • some similarity with nested function calls
  • different because events causing kernel
    invocations are not (usually) related to the
    running program

7
KERNEL ENTRY POINTS
software interrupt --->
I/O device requires attention --->
time interval elapsed --->
hardware failure --->
faulty instruction --->
8
IS AN INSTRUCTION REALLY FAULTY?
  • faulty instructions may occur for two distinct
    reasons:
  • programming error
  • deferred allocation of some kind of resource
  • the kernel must be able to identify the reason
    that caused the exception

9
EXCEPTIONS RELATED TO DEFERRED ALLOCATION
  • two cases of deferred allocation of resources in
    Linux:
  • page frames (demand paging, Copy On Write)
  • floating point registers

10
WHY IS A KERNEL SO COMPLEX?
  • large program with many entry points
  • must offer disk caching to lower average disk
    access time
  • must support nested kernel invocations -->
    must run with the interrupts enabled most
    of the time
  • must be updated quite frequently to support new
    hardware circuits and devices

11
HW CONCURRENCY (1/2)
  • the I/O APIC polls the devices and issues
    interrupts
  • no new interrupt can be issued until the CPU
    acknowledges the previous one
  • good kernels run with interrupts enabled most of
    the time

12
HW CONCURRENCY (2/2)
  • Symmetrical MultiProcessor architectures (SMP)
    include two or more CPUs
  • SMP kernels must be able to execute concurrently
    on available CPUs
  • one service routine related to networking runs
    on a CPU while another routine related to file
    system runs concurrently on another CPU

13
LIMITING KERNEL SIZE
  • try to distribute kernel functions in smaller
    programs that can be linked separately
  • two approaches: microkernels and modules
  • Linux prefers modules for reasons of efficiency

14
MICROKERNELS
  • only a few functions, such as process scheduling
    and interprocess communication, are included in
    the microkernel
  • other kernel functions such as memory
    allocation, file system handling, and device
    drivers are implemented as system processes
    running in User Mode
  • microkernels introduce a lot of interprocess
    communication

15
MODULES (1/2)
  • modules are object files containing kernel
    functions that are linked dynamically to the
    kernel
  • Linux offers excellent support for
    implementing and handling modules

16
MODULES (2/2)
thanks to the kernel symbol table, it is possible
to defer linking of an object module
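
The deferred linking described above can be sketched in user-space C. This is only an illustration, not kernel code: the table `ksyms`, the type `kfunc_t`, the structure `mod_import`, and the functions `k_double()`, `k_negate()`, and `link_module()` are all hypothetical names chosen for this example. The idea shown is that a module carries unresolved references by name, and they are bound to addresses only when the module is linked against the symbol table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical signature shared by all exported functions in this sketch. */
typedef int (*kfunc_t)(int);

/* One entry of the (simulated) kernel symbol table: name -> address. */
struct ksym {
    const char *name;
    kfunc_t addr;
};

static int k_double(int x) { return 2 * x; }
static int k_negate(int x) { return -x; }

static const struct ksym ksyms[] = {
    { "k_double", k_double },
    { "k_negate", k_negate },
};

/* An unresolved reference inside a module: known by name only
 * until the module is linked. */
struct mod_import {
    const char *name;
    kfunc_t addr;   /* NULL until resolved */
};

/* Resolve every import against the symbol table.
 * Returns 0 on success, -1 if some symbol is missing. */
static int link_module(struct mod_import *imports, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        imports[i].addr = NULL;
        for (size_t j = 0; j < sizeof ksyms / sizeof ksyms[0]; j++)
            if (strcmp(imports[i].name, ksyms[j].name) == 0)
                imports[i].addr = ksyms[j].addr;
        if (imports[i].addr == NULL)
            return -1;
    }
    return 0;
}
```

A module whose imports all resolve can then call through the `addr` pointers; a module that refers to a missing symbol is rejected at link time rather than at compile time.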
17
MODULES AND DISTRIBUTIONS
  • modern computer architectures based on PCI
    buses support autoprobe of installed I/O devices
    while booting the system
  • recent Linux distributions put all non-critical
    I/O drivers into modules
  • at boot time, only the I/O modules of identified
    I/O devices are dynamically linked to the kernel

18
SUPPORT FOR CLIENT/SERVER APPLICATIONS
  • scenario: many tasks executing concurrently on a
    common address space (for instance, a web server
    handling thousands of requests per second)
  • problem: implementing each client request as a
    new process causes a lot of overhead
  • process creation and elimination are
    time-consuming kernel functions

19
THE THREAD SOLUTION
  • introduce a new kernel object called thread
  • each process includes one or more threads
  • all threads associated with a given process
    share the same address space
  • CPU scheduling is done at the thread level
    (Windows NT)
  • thread switching is more efficient than process
    switching

20
THE CLONE SOLUTION
  • introduce groups of lightweight processes called
    clones that share a common address space, opened
    files, signals, etc.
  • CPU scheduling is done at the process level in a
    standard way
  • clones were introduced by Linux
  • the mpmt_pthread and dexter modules used by
    the Linux version of Apache 2.0 are both based on
    clones
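
A minimal user-space sketch of a clone sharing its parent's address space, assuming a Linux system with glibc. It uses the real `clone()` wrapper with the `CLONE_VM` flag; the names `child_fn()`, `run_clone_demo()`, and `CHILD_STACK_SIZE` are made up for this example. Because the address space is shared, a value written by the clone is visible to the parent once the clone has exited.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* needed for the clone() declaration in glibc */
#endif
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* Runs in the clone: writes into memory owned by the parent. */
static int child_fn(void *arg)
{
    *(int *)arg = 42;
    return 0;
}

/* Returns the value the parent sees after the clone exits,
 * or -1 if the clone could not be created or waited for. */
static int run_clone_demo(void)
{
    int shared = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward, so pass the top of the buffer.
     * CLONE_VM: share the address space; SIGCHLD: let waitpid() work. */
    pid_t pid = clone(child_fn, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &shared);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared;
}
```

Dropping `CLONE_VM` from the flags would give the clone its own copy of the address space, and the parent would still read 0.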

21
LINUX PEARLS
  • we selected in a rather arbitrary way a few
    pearls related to two distinct kernel design
    areas:
  • clever design choices
  • efficient coding

22
CLEVER DESIGN CHOICES
  • isolate the architecture-dependent code
  • rely on the VFS abstraction
  • avoid over-designing

23
ISOLATE THE ARCHITECTURE-DEPENDENT CODE (1/2)
  • Linux source code includes two
    architecture-dependent directories:
    /usr/src/linux/arch and /usr/src/linux/include

24
ISOLATE THE ARCHITECTURE-DEPENDENT CODE (2/2)
  • the schedule() function invokes the switch_to()
    Assembly language function to perform process
    switching
  • the code for switch_to() is stored in the
    include/asm/system.h file
  • depending on the target system, the asm symbolic
    link is set to asm-i386, asm-s390, etc.

25
RELY ON THE VFS ABSTRACTION
  • VFS is an abstraction for representing several
    kinds of information containers (IC) in a common
    way
  • standard operations on ICs: open(), close(),
    seek(), ioctl(), read(), write()
  • VFS associates a logical inode with each opened
    IC
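
The "common way" of operating on different information containers can be sketched as a table of function pointers. This is a user-space illustration under stated assumptions, not the kernel's actual VFS structures: `struct ic`, `struct ic_operations`, `mem_read()`, `zero_read()`, and `ic_read()` are hypothetical names, and only a `read` operation is modeled.

```c
#include <stddef.h>
#include <string.h>

struct ic;

/* Operations every kind of information container must provide
 * (reduced to read() for this sketch). */
struct ic_operations {
    long (*read)(struct ic *ic, char *buf, size_t len);
};

/* A hypothetical opened IC: the generic part is the ops pointer. */
struct ic {
    const struct ic_operations *ops;
    const char *data;   /* backing bytes, used by the memory-backed IC only */
    size_t size;
    size_t pos;
};

/* IC kind 1: a file-like container backed by a byte array. */
static long mem_read(struct ic *ic, char *buf, size_t len)
{
    size_t left = ic->size - ic->pos;
    if (len > left)
        len = left;
    memcpy(buf, ic->data + ic->pos, len);
    ic->pos += len;
    return (long)len;
}

/* IC kind 2: a device-like container that yields zero bytes forever. */
static long zero_read(struct ic *ic, char *buf, size_t len)
{
    (void)ic;
    memset(buf, 0, len);
    return (long)len;
}

static const struct ic_operations mem_ops  = { .read = mem_read };
static const struct ic_operations zero_ops = { .read = zero_read };

/* Generic entry point: the caller never needs to know the IC kind. */
static long ic_read(struct ic *ic, char *buf, size_t len)
{
    return ic->ops->read(ic, buf, len);
}
```

The caller always goes through `ic_read()`; which implementation runs is decided by the `ops` pointer stored when the IC was opened.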

26
EXAMPLES OF ICs
  • files stored in a disk-based filesystem
  • files stored in a network filesystem
  • disk partitions
  • kernel data structures (/proc filesystem)
  • RAM content (/dev/mem)
  • RAM disk (/dev/ram0)
  • serial port (/dev/ttyS0)

27
AVOID OVER-DESIGNING
  • Linux scheduler is simple and works for most
    applications
  • no attempt to transform Linux into a real-time
    system

28
A GENERAL-PURPOSE SCHEDULER
  • the scheduler of the System V Release 4 provides
    a set of class-independent routines that
    implement common services
  • object-oriented approach based on scheduling
    classes: the scheduler represents an abstract
    base class, and each scheduling class acts as a
    subclass

29
A HEATED DISCUSSION
  • If the Linux development community is not
    responsive to the end user community, refusing to
    incorporate necessary functionality on the basis
    of aesthetics, then that community will abandon
    Linux in favor of something else. Is that really
    what you want?
  • Yes - If it turns into a pile of shit they'll
    abandon it even faster. I'd rather have a decent
    OS that works and does the right thing for most
    people than a single OS that tries to do
    everything and does nothing right (Alan Cox)

30
EXAMPLES OF EFFICIENT CODING
  • retrieving the process descriptor of the running
    process
  • handling dynamic timers
  • catching invalid addresses passed as system call
    parameters

31
RETRIEVING THE PROCESS DESCRIPTOR OF THE RUNNING
PROCESS (1/3)
  • classic solution: introduce an array
    current[NCPU] whose components point to the
    process descriptors of the processes running on
    the CPUs
  • clever solution: store the process Kernel Mode
    stack and the process descriptor into contiguous
    addresses so that the value of the CPU stack
    pointer register (esp register) is linked to that
    of the process descriptor

32
DESCRIPTOR OF THE RUNNING PROCESS (2/3)
  • Kernel Mode stack and process descriptor are
    stored in 2 contiguous page frames (8 KB)

33
DESCRIPTOR OF THE RUNNING PROCESS (3/3)
[figure: a mask applied to the esp register]
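
The masking step can be sketched in user-space C, assuming the descriptor sits at the lowest address of an 8 KB area that is itself aligned on an 8 KB boundary. The names `struct task`, `union task_area`, `alloc_task_area()`, and `task_from_sp()` are hypothetical, and an ordinary pointer stands in for the esp register.

```c
#include <stdint.h>
#include <stdlib.h>

#define STACK_AREA_SIZE 8192u   /* two 4 KB page frames, as on the slide */

/* Hypothetical, heavily reduced process descriptor. */
struct task {
    int pid;
};

/* The descriptor occupies the lowest addresses of the area; the
 * Kernel Mode stack grows downward from the top of the same area. */
union task_area {
    struct task task;
    unsigned char stack[STACK_AREA_SIZE];
};

/* Allocate one 8 KB area aligned on an 8 KB boundary. */
static union task_area *alloc_task_area(int pid)
{
    union task_area *area = aligned_alloc(STACK_AREA_SIZE, sizeof *area);
    if (area != NULL)
        area->task.pid = pid;
    return area;
}

/* Given any stack pointer value inside the area, clear the low
 * 13 bits to obtain the address of the descriptor. */
static struct task *task_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(STACK_AREA_SIZE - 1);
    return (struct task *)base;
}
```

No per-CPU array has to be consulted: one AND operation on the stack pointer is enough, wherever the stack pointer currently is within the area.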
34
HANDLING DYNAMIC TIMERS (1/3)
  • I/O drivers and user applications may create
    hundreds of timers
  • find an efficient way to check at each timer
    interrupt whether at least one timer has expired
  • trivial solution: maintain a list of timers
    ordered by increasing expiration times and start
    checking from the first element of the list

35
HANDLING DYNAMIC TIMERS (2/3)
  • clever solution (timing wheel): use percolation
    and maintain strict ordering only for the next
    256 ticks (in Linux/i386, one tick = 10 ms)
  • use several lists of timers

36
HANDLING DYNAMIC TIMERS (3/3)
tv1: slots 0 1 2 ... 255
     index incremented by 1 once every tick
tv2: slots 0 1 2 ... 63
     index incremented by 1 once every 256 ticks
when tv1 becomes empty, it is replenished
by emptying one slot of tv2, and so forth
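
A simplified two-level timing wheel can be sketched as follows. It keeps only tv1 (256 slots) and tv2 (64 slots), so it rejects timers further than 256 * 64 ticks in the future; the names `struct wheel`, `wheel_link()`, `wheel_add()`, and `wheel_tick()` are hypothetical, and a `fired` flag stands in for calling a timer function.

```c
#include <stddef.h>

#define TV1_SLOTS 256u
#define TV2_SLOTS 64u

struct timer {
    unsigned long expires;   /* absolute expiration time, in ticks */
    int fired;               /* set to 1 when the timer expires */
    struct timer *next;
};

struct wheel {
    unsigned long now;                 /* current tick */
    struct timer *tv1[TV1_SLOTS];      /* one slot per tick */
    struct timer *tv2[TV2_SLOTS];      /* one slot per 256 ticks */
};

/* Put a timer in the slot that matches its distance from now. */
static void wheel_link(struct wheel *w, struct timer *t)
{
    unsigned long delta = t->expires - w->now;
    struct timer **slot = (delta < TV1_SLOTS)
        ? &w->tv1[t->expires % TV1_SLOTS]
        : &w->tv2[(t->expires / TV1_SLOTS) % TV2_SLOTS];
    t->next = *slot;
    *slot = t;
}

/* Returns 0 on success, -1 if the timer is not strictly in the
 * future or lies beyond the range covered by tv1 and tv2. */
static int wheel_add(struct wheel *w, struct timer *t)
{
    unsigned long delta = t->expires - w->now;
    if (delta == 0 || delta >= TV1_SLOTS * TV2_SLOTS)
        return -1;
    t->fired = 0;
    wheel_link(w, t);
    return 0;
}

/* Advance time by one tick; returns the number of expired timers. */
static int wheel_tick(struct wheel *w)
{
    int fired = 0;
    w->now++;

    /* tv1 has wrapped around: percolate one tv2 slot down into tv1. */
    if (w->now % TV1_SLOTS == 0) {
        struct timer **src = &w->tv2[(w->now / TV1_SLOTS) % TV2_SLOTS];
        struct timer *t = *src;
        *src = NULL;
        while (t != NULL) {
            struct timer *next = t->next;
            wheel_link(w, t);
            t = next;
        }
    }

    /* Only the current tv1 slot has to be inspected at each tick. */
    struct timer **due = &w->tv1[w->now % TV1_SLOTS];
    while (*due != NULL) {
        struct timer *t = *due;
        *due = t->next;
        t->next = NULL;
        t->fired = 1;
        fired++;
    }
    return fired;
}
```

Each tick touches a single tv1 slot, and the cost of sorting distant timers is paid only once every 256 ticks when a tv2 slot is emptied.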
37
CATCHING INVALID ADDRESSES (1/4)
  • many system calls require one or more addresses
    specified as parameters
  • invalid addresses passed as parameters should
    not cause a system crash
  • classic solution: perform a preliminary check
    before servicing the system call
  • clever solution: defer checking until an
    exception caused by the invalid address occurs
    in Kernel Mode

38
CATCHING INVALID ADDRESSES (2/4)
  • deferred checking is more efficient since system
    calls are issued most of the time with correct
    parameters
  • if an addressing error occurs in Kernel Mode,
    the kernel must be able to distinguish whether it
    is caused by a faulty process or by a kernel bug
  • in the first case, the kernel sends a SIGSEGV
    signal to the faulty process

39
CATCHING INVALID ADDRESSES (3/4)
  • clever idea: force the kernel to always use the
    same group of functions when copying data to or
    from the process address space
  • if an addressing error occurs while doing that,
    the CPU will signal the address of the
    instruction that contained an invalid address
    operand

40
CATCHING INVALID ADDRESSES (4/4)
  • the kernel knows from the address of the faulty
    instruction that it belongs to one of the
    functions used to access data in the process
    address space
  • it can then execute some kind of fixup code
  • as a result, the system call returns an error code
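
The lookup from faulting instruction address to fixup code can be sketched as a search in a sorted table. This is a user-space illustration with plain integers standing in for addresses; `struct exception_entry`, `search_fixup()`, `classify_kernel_fault()`, and the enum values are hypothetical names, and the table contents in any usage are made up.

```c
#include <stddef.h>

/* One entry per instruction that is allowed to fault while
 * accessing the process address space. */
struct exception_entry {
    unsigned long insn;    /* address of the instruction */
    unsigned long fixup;   /* address of its fixup code */
};

/* Binary search in a table sorted by insn.
 * Returns the fixup address, or 0 if the address is not listed. */
static unsigned long search_fixup(const struct exception_entry *table,
                                  size_t count, unsigned long fault_addr)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (table[mid].insn == fault_addr)
            return table[mid].fixup;
        if (table[mid].insn < fault_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

enum fault_verdict { FAULT_BAD_USER_ADDRESS, FAULT_KERNEL_BUG };

/* Decide what an addressing error in Kernel Mode means: if the
 * faulting instruction is listed, resume at its fixup code (which
 * makes the system call return an error); otherwise it is a bug. */
static enum fault_verdict classify_kernel_fault(
    const struct exception_entry *table, size_t count,
    unsigned long fault_addr, unsigned long *resume_at)
{
    unsigned long fixup = search_fixup(table, count, fault_addr);
    if (fixup == 0)
        return FAULT_KERNEL_BUG;
    *resume_at = fixup;
    return FAULT_BAD_USER_ADDRESS;
}
```

The common case pays nothing: the table is consulted only after an exception has actually occurred.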