ecs150 Spring 2006: Operating System - PowerPoint PPT Presentation

Transcript and Presenter's Notes

1
ecs150 Spring 2006: Operating System #7
mbuf (Chapter 11)
  • Dr. S. Felix Wu
  • Computer Science Department
  • University of California, Davis
  • http://www.cs.ucdavis.edu/~wu/
  • sfelixwu@gmail.com

2
IPC
  • Uniform communication for distributed processes
  • socket: network programming
  • operating system kernel issues
  • Semaphores, message queues, and shared memory
    for local processes

3
Socket: an IPC Abstraction Layer
4
Mbufs: Memory Buffers
  • The main data structure for network processing in
    the kernel
  • Why can't we use "kernel memory management"
    facilities such as kernel malloc (power-of-2
    style), pages, or VM objects directly?

5
Packet
  • Ethernet or 802.11 header
  • IP header
  • IPsec header
  • Transport headers (TCP/UDP/...)
  • SSL header
  • Others???

6
Properties: Network Packet Processing
  • Variable sizes
  • Prepend or remove headers
  • Fragment/divide or defragment/combine
  • → can we avoid COPYING as much as possible? (see
    the sketch after this list)
  • Queue
  • Parallel processing for high speed
  • E.g., Juniper routers are running FreeBSD
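The prepend/trim pattern above is visible directly in the mbuf API. A
minimal sketch, assuming the classic FreeBSD routines M_PREPEND, m_adj,
and mtod from sys/mbuf.h (check your tree for the exact signatures);
the function name is only for illustration:

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>

    /* Sketch: grow the chain at the front for a new header and trim the
     * tail, without ever copying the payload bytes. */
    static struct mbuf *
    add_ip_header(struct mbuf *m)
    {
            M_PREPEND(m, sizeof(struct ip), M_DONTWAIT); /* may prepend a new mbuf */
            if (m == NULL)
                    return (NULL);                       /* allocation failed */
            struct ip *ip = mtod(m, struct ip *);        /* header space, filled in place */
            ip->ip_v = IPVERSION;
            m_adj(m, -4);                                /* trim 4 bytes off the tail */
            return (m);
    }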

7
[Figure: mbuf layout, 256 bytes total; source files: sys/mbuf.h,
kern/kern_mbuf.c, kern/uipc_mbuf.c, kern/uipc_mbuf2.c]
8
[Figure: mbuf chains: m_next links mbufs belonging to the same packet,
m_nextpkt links to the next packet; flag bits shown: M_EXT, M_PKTHDR,
M_EOR, M_BCAST, M_MCAST]
9
#define M_EXT            0x0001
#define M_PKTHDR         0x0002
#define M_EOR            0x0004
#define M_RDONLY         0x0008
#define M_PROTO1         0x0010
#define M_PROTO2         0x0020
#define M_PROTO3         0x0040
#define M_PROTO4         0x0080
#define M_PROTO5         0x0100
#define M_SKIP_FIREWALL  0x4000
#define M_FREELIST       0x8000
#define M_BCAST          0x0200
#define M_MCAST          0x0400
#define M_FRAG           0x0800
#define M_FIRSTFRAG      0x1000
#define M_LASTFRAG       0x2000
10
struct mbuf {
        struct m_hdr m_hdr;
        union {
                struct {
                        struct pkthdr MH_pkthdr;        /* valid if M_PKTHDR */
                        union {
                                struct m_ext MH_ext;    /* valid if M_EXT */
                                char MH_databuf[MHLEN];
                        } MH_dat;
                } MH;
                char M_databuf[MLEN];                   /* !M_PKTHDR, !M_EXT */
        } M_dat;
};
11
(same struct mbuf definition, repeated)
12
(same struct mbuf definition, repeated)
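The flag bits say which members of the union are meaningful. A minimal
sketch using the conventional accessor macros (m_flags, m_pkthdr, m_ext)
that sys/mbuf.h defines on top of the structure above; the field names
are from the classic BSD header and may differ slightly between versions:

    #include <sys/param.h>
    #include <sys/systm.h>
    #include <sys/mbuf.h>

    /* Sketch: which part of the union is valid depends on the flags. */
    static void
    describe_mbuf(struct mbuf *m)
    {
            if (m->m_flags & M_PKTHDR)
                    /* first mbuf of a packet: the pkthdr member is valid */
                    printf("packet header, total length %d\n", m->m_pkthdr.len);
            if (m->m_flags & M_EXT)
                    /* data lives in an external cluster, not the internal buffer */
                    printf("external storage, %u bytes\n", m->m_ext.ext_size);
            else
                    printf("data stored inline in the mbuf itself\n");
    }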
13
[Figure: mbuf header fields, 24 bytes]
14
IPsec_IN_DONE, IPsec_OUT_DONE, IPsec_IN_CRYPTO_DONE, IPsec_OUT_CRYPTO_DONE
15
mbuf
  • Current: 256 bytes
  • Old: 128 bytes (shown in the following slides)

16
(No Transcript)
17
A Typical UDP Packet
18
(No Transcript)
19
m_devget: when an IP packet comes in
20
(No Transcript)
21
(No Transcript)
22
mtod / dtom
  • mbuf ptr ↔ data region
  • e.g., struct ip
  • mtod?
  • dtom?

23
mtod / dtom
  • mbuf ptr ↔ data region
  • e.g., struct ip
  • mtod?
  • dtom?
  • #define dtom(x) ((struct mbuf *)((int)(x) & ~(MSIZE - 1)))

24
mtod / dtom
  • mbuf ptr ↔ data region
  • e.g., struct ip
  • mtod?
  • dtom?
  • #define dtom(x) ((struct mbuf *)((int)(x) & ~(MSIZE - 1)))
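A usage sketch: mtod simply casts the mbuf's m_data pointer, and dtom (as
defined on the slide) masks off the low bits of a pointer into the mbuf's
own storage, which works because mbufs are allocated on MSIZE-aligned
boundaries. If your mbuf.h no longer defines dtom, use the definition
shown above; note it is only valid when the data lives inside the mbuf
itself, not in an external (M_EXT) cluster:

    #include <sys/param.h>
    #include <sys/mbuf.h>
    #include <netinet/in.h>
    #include <netinet/ip.h>

    /* mtod(m, t) is ((t)((m)->m_data)); dtom(x) rounds x down to the
     * enclosing MSIZE-aligned mbuf (internal storage only). */
    static struct mbuf *
    header_back_to_mbuf(struct mbuf *m)
    {
            struct ip *ip = mtod(m, struct ip *);  /* data region viewed as an IP header */
            return (dtom(ip));                     /* recover the owning mbuf (== m here) */
    }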

25
netstat -m
  • Check for mbuf statistics

26
mbuf
  • IP input/output/forward
  • IPsec
  • IP fragmentation/defragmentation
  • Device ↔ IP ↔ Socket

27
Memory Management for IPC
  • Why do we need something like MBUF?

28
(No Transcript)
29
(No Transcript)
30
(No Transcript)
31
I/O Architecture
[Figure: CPU and memory connected over data and I/O buses to a device
controller (with an internal buffer) and its I/O device; the control bus
carries initialization, input, output, configuration, and interrupt (IRQ)
traffic]
32
Direct Memory Access
  • Used to avoid programmed I/O for large data
    movement
  • Requires DMA controller
  • Bypasses CPU to transfer data directly between
    I/O device and memory

33
DMA Requests
  • Disk address to start copying
  • Destination memory address
  • Number of bytes to copy
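To make the three parameters concrete, here is a hypothetical descriptor
a driver might hand to a DMA controller; the struct name and fields are
illustrative, not taken from any real controller:

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical DMA request descriptor: the three items listed above. */
    struct dma_request {
            uint64_t disk_addr;   /* disk address to start copying from */
            uint64_t mem_addr;    /* destination physical memory address */
            size_t   nbytes;      /* number of bytes to copy */
    };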

34
(No Transcript)
35
Is DMA a good idea?
  • The CPU is a lot faster
  • Controllers/devices have larger internal buffers
  • DMA might be much slower than the CPU
  • Controllers become more and more intelligent
  • USB doesn't have DMA.

36
Network Processor
37
File System Mounting
  • A file system must be mounted before it can be
    accessed.
  • An unmounted file system is mounted at a mount
    point.

38
(No Transcript)
39
Mount Point
40
logical disks
[Figure: fs0 (/dev/hd0a) holds the root file system /, containing usr,
sys, dev, etc, bin; fs1 (/dev/hd0e) holds local, adm, home, lib, bin and
is attached under /usr with:
  mount -t ufs /dev/hd0e /usr
A remote file system is attached the same way:
  mount -t nfs 152.1.23.12:/export/cdrom /mnt/cdrom]
41
Distributed FS
  • Distributed File System
  • NFS (Network File System)
  • AFS (Andrew File System)
  • CODA

42
Distributed FS
[Figure: the local root / (usr, sys, dev, etc, bin) lives on
ftp.cs.ucdavis.edu (fs0, /dev/hd0a); a remote subtree (local, adm, home,
lib, bin) is served from Server.yahoo.com (fs0, /dev/hd0e)]
43
Distributed File System
  • Transparency and Location Independence
  • Reliability and Crash Recovery
  • Scalability and Efficiency
  • Correctness and Consistency
  • Security and Safety

44
Correctness
  • One-copy Unix Semantics??

45
Correctness
  • One-copy Unix Semantics
  • every modification to every byte of a file has to
    be immediately and permanently visible to every
    client.

46
Correctness
  • One-copy Unix Semantics
  • every modification to every byte of a file has to
    be immediately and permanently visible to every
    client.
  • Conceptually: sequential FS access
  • Make sense in a local file system
  • Single processor versus shared memory
  • Is this necessary?

47
DFS Architecture
  • Server
  • storage for the distributed/shared files.
  • provides an access interface for the clients.
  • Client
  • consumer of the files.
  • runs applications in a distributed environment.

[Figure: applications use the standard file interface: open, close,
read, write, opendir, stat, readdir]
48
NFS (SUN, 1985)
  • Based on RPC (Remote Procedure Call) and XDR
    (eXternal Data Representation)
  • Server maintains no state
  • a READ on the server opens, seeks, reads, and
    closes (see the sketch below)
  • a WRITE is similar, but the buffer is flushed to
    disk before closing
  • Server crash: the client keeps retrying until the
    server reboots; no loss
  • Client crash: the client must rebuild its own
    state; no effect on the server
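A minimal sketch of what "stateless" means on the server side: every
request carries the file handle and offset, and the server opens, reads,
and closes per request. The handler name and the handle-to-path lookup
are hypothetical, shown only to illustrate the idea:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Hypothetical stateless READ handler: no per-client session survives
     * this call; the client supplies the handle and offset every time. */
    static ssize_t
    nfs_read_handler(const char *path_from_handle, off_t offset,
                     void *buf, size_t count)
    {
            int fd = open(path_from_handle, O_RDONLY);  /* open ... */
            if (fd < 0)
                    return (-1);
            ssize_t n = pread(fd, buf, count, offset);  /* ... seek + read ... */
            close(fd);                                  /* ... close, nothing retained */
            return (n);
    }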

49
RPC - XDR
  • RPC: a standard protocol for calling procedures on
    another machine
  • the procedure call is packaged with authorization
    and admin info
  • XDR: a standard format for data, needed because
    computer manufacturers cannot agree on byte
    ordering.
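Byte ordering is the classic reason a wire format like XDR is needed:
XDR fixes a big-endian, 4-byte-aligned encoding so both ends agree. A
minimal sketch using the standard htonl/ntohl conversions (not the XDR
library itself):

    #include <arpa/inet.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint32_t host_value = 0x12345678;
            uint32_t wire_value = htonl(host_value);  /* host order -> big-endian wire order */
            /* Both a little-endian and a big-endian machine now see the byte
             * sequence 12 34 56 78 on the wire; ntohl() converts back. */
            printf("0x%08x\n", ntohl(wire_value));    /* prints 0x12345678 again */
            return 0;
    }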

50
rpcgen
[Figure: rpcgen takes an RPC program specification and generates the
client stub (RPC client.c), the server stub (RPC server.c), and a shared
header (RPC.h)]
51
NFS Operations
  • Every operation is independent: the server opens
    the file for every operation
  • A file is identified by a handle -- no state
    information is retained by the server
  • the client maintains the mount table, v-node,
    offset in the file table, etc.
What do these imply???
52
[Figure: on the client computer, application programs call into the
virtual file system, which dispatches either to the local UNIX file
system or to the NFS client; the NFS client talks to the NFS server on
the server computer, which in turn uses that machine's UNIX file system]
mount -t nfs home.yahoo.com:/pub/linux /mnt/linux

53
Final: 06/15/2006, 8-10 am
  • 1062 Bainer
  • Midterm material plus
  • 5.1-5.8, 5.11-5.12
  • 6.1, 6.5-6.7
  • 8.1-8.9
  • 9.1-9.3
  • 11.3
  • Notes/PPT, Homeworks, Brainstorming

54
State-ful vs. State-less
  • State-ful: the server is fully aware of its clients
  • does the client have the newest copy?
  • what is the offset of an opened file?
  • a session between a client and a server!
  • State-less: the server is completely unaware of its
    clients
  • memory-less: "I do not remember you!"
  • "Just tell me what you want to get (and where)."
  • "I am not responsible for your offset values" (the
    client needs to maintain the state).

55
The State
[Figure: applications issue open, read, stat, lseek; the question is
which side keeps the per-file state, such as the current offset]
56
Unix file semantics
  • NFS
  • open a file with read-write mode
  • later, the server's copy becomes read-only mode
  • now, the application tries to write it!!

57
Problems with NFS
  • Performance: not scalable
  • maybe it is OK for a local office
  • will be horrible with large-scale systems

58
  • Similar to UNIX file caching for local files
  • pages (blocks) from disk are held in a main
    memory buffer cache until the space is required
    for newer pages. Read-ahead and delayed-write
    optimisations.
  • For local files, writes are deferred to next sync
    event (30 second intervals)
  • Works well in local context, where files are
    always accessed through the local cache, but in
    the remote case it doesn't offer necessary
    synchronization guarantees to clients.
  • NFS v3 servers offer two strategies for updating
    the disk
  • write-through - altered pages are written to disk
    as soon as they are received at the server. When
    a write() RPC returns, the NFS client knows that
    the page is on the disk.
  • delayed commit - pages are held only in the cache
    until a commit() call is received for the
    relevant file. This is the default mode used by
    NFS v3 clients. A commit() is issued by the
    client whenever a file is closed.


59
  • Server caching does nothing to reduce RPC traffic
    between client and server
  • further optimisation is essential to reduce
    server load in large networks
  • NFS client module caches the results of read,
    write, getattr, lookup and readdir operations
  • synchronization of file contents (one-copy
    semantics) is not guaranteed when two or more
    clients are sharing the same file.
  • Timestamp-based validity check
  • reduces inconsistency, but doesn't eliminate it
  • validity condition for cache entries at the
    client (sketched in code after this slide):
  • (T - Tc < t) ∨ (Tm_client = Tm_server)
  • t is configurable (per file) but is typically set
    to 3 seconds for files and 30 secs. for
    directories
  • it remains difficult to write distributed
    applications that share files with NFS

t: freshness guarantee
Tc: time when the cache entry was last validated
Tm: time when the block was last updated at the server
T: current time
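A minimal sketch of that validity check, with hypothetical struct and
function names (a real client also has to fetch Tm_server with a getattr
once the freshness window has expired):

    #include <stdbool.h>
    #include <time.h>

    struct cache_entry {
            time_t Tc;         /* when this entry was last validated */
            time_t Tm_client;  /* file modification time as known to the client */
    };

    /* Validity condition: (T - Tc < t) OR (Tm_client == Tm_server). */
    static bool
    entry_is_valid(const struct cache_entry *e, time_t T,
                   time_t t_fresh, time_t Tm_server)
    {
            if (T - e->Tc < t_fresh)
                    return true;               /* still within the freshness window */
            return e->Tm_client == Tm_server;  /* otherwise compare modification times */
    }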

60
AFS
  • State-ful clients and servers.
  • Caching the files to clients.
  • File close => check in the changes.
  • How to maintain consistency?
  • Using Callback in v2/3 (Valid or Cancelled)

[Figure: applications open/read from the locally cached copy; when the
server breaks the callback, the client invalidates and re-caches the
file]
61
Why AFS?
  • Shared files are infrequently updated
  • Local cache of a few hundred megabytes
  • Now 50-100 gigabytes
  • Unix workload
  • Files are small, read operations dominate,
    sequential access is common, files are read/written
    by one user, reference bursts.
  • Are these still true?

62

63

64
Fault Tolerance in AFS
  • a server crashes
  • a client crashes
  • check for call-back tokens first.

65
Problems with AFS
  • Availability
  • what happens if call-back itself is lost??

66
GFS Google File System
  • failures are the norm
  • Multiple-GB files are common
  • Append rather than overwrite
  • Random writes are rare
  • Can we relax the consistency?

67
(No Transcript)
68
(No Transcript)
69
CODA
  • Server Replication
  • if one server goes down, I can get another.
  • Disconnected Operation
  • if all go down, I will use my own cache.

70
Consistency
  • What if John updates file X on server A while Mary
    reads file X on server B?

Read-one Write-all
71
Read x, Write (N - x + 1)
  • a read quorum of x replicas and a write quorum of
    N - x + 1 replicas always overlap in at least one
    replica, since x + (N - x + 1) = N + 1 > N
72
Example: R = 3, W = 4 (N = 6)
             r1  r2  r3  r4  r5  r6
Initial       0   0   0   0   0   0
Alice-W       2   2   0   2   2   0
Bob-W         2   3   3   3   3   0
Alice-R       2   3   3   3   3   0
Chris-W       2   1   1   1   1   0
Dan-R         2   1   1   1   1   0
Emily-W       7   7   1   1   1   7
Frank-R       7   7   1   1   1   7
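A quick sketch of the intersection arithmetic behind read/write quorums
(plain arithmetic, not tied to any particular system):

    #include <stdbool.h>
    #include <stdio.h>

    /* With N replicas, any read quorum of R and write quorum of W share at
     * least one replica whenever R + W > N.  The example above uses
     * N = 6, R = 3, W = 4, so R + W = 7 = N + 1. */
    static bool
    quorums_intersect(int n, int r, int w)
    {
            return r + w > n;
    }

    int main(void)
    {
            printf("R=3, W=4, N=6 -> %s\n",
                   quorums_intersect(6, 3, 4) ? "overlap guaranteed"
                                              : "no overlap guarantee");
            return 0;
    }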
73