1
Distributed File Systems - I
  • Trishali Nayar
  • Staff Software Engineer
  • India Systems Technology Lab
  • IBM Pune.

2
Agenda
  • File Systems
  • Distributed File Systems
  • History, Issues and Need
  • Characteristics
  • Examples
  • Network File System (NFS)
  • Introduction
  • NFS Version 4
  • Features

3
File Systems
  • A file is a collection of related information
    defined by its creator. 
  • A file system is a hierarchical structure (file
    tree) of files and directories. This file tree
    uses directories to organize data and programs
    into groups, allowing the management of several
    directories and files at one time.
  • Some tasks are performed more efficiently on a
    file system than on each directory within the
    file system. For example, you can back up, or
    move an entire file system.

4
File System
5
History
  • Early days - large, simple, centrally
    administered management environment.
  • Data was accessed using character-cell terminals
    directly connected to individual machines.
  • No local storage was available to the user except
    in the form of offline media such as magnetic
    tape.
  • Required users to share processor time and other
    resources.
  • Difficult to provide access to geographically
    remote systems, and such connections were
    extremely slow or otherwise unreliable.

6
History
  • This situation changed significantly in the
    1980s, as minicomputers, workstations, and
    personal computer systems began appearing in the
    marketplace.
  • Because these machines contained local disks, data
    began spreading across the enterprise in a
    relatively unconstrained fashion. This created a
    more complex management environment for the
    following reasons:
  • Revision control and Multiple Disk space
  • The problem becomes worse as files proliferate
    across the enterprise, resulting in many copies
    that might all contain locally introduced
    changes.
  • Files invariably will become out of sync with the
    master copies.

7
Issues
  • Critical data requires proper backup and
    archiving.
  • Needs more hardware such as a local tape drive.
  • Proper storage of media to prevent data loss or
    theft.
  • Outages to individual workstations holding
    important data might impact overall business
    activities.
  • The risk of an outage due to hardware failure or
    user error is much higher on an individual
    workstation than on a highly available, managed
    server system.

8
Issues
  • Lacking a centralized infrastructure available to
    all client hardware, users might find it
    difficult to locate resources such as
    applications, shared hardware, and data files.
    Therefore, a great deal of time is wasted in
    attempts to locate a particular file.
  • Management of an increasing number of remote
    systems becomes progressively more difficult. Each
    machine requires hardware maintenance, operating
    system and other software upgrades, and
    individual copies of licensed software.
  • Distribution of corporate data might involve
    shipping tapes or other media to remote offices,
    or copying files over a network link on a
    periodic basis.

9
Evolution
  • These situations generated a great deal of
    research into distributed computing technologies.
    The areas of namespace management, remote access
    to file systems and other resources, replication
    and other high availability technologies,
    centralized backup and data archiving, and
    authorization received particular attention for
    the reasons noted earlier.

10
Distributed File Systems
  • A distributed file system enables co-operating
    hosts (clients and servers) to efficiently share
    file system resources across both local area and
    wide area networks.
  • It allows users to access remote files and
    directories and treat those files and directories
    as if they were local.
  • Operating system commands can be used to create,
    remove, read, write, and set file attributes for
    remote files and directories.

11
Distributed File Systems
  • A network or distributed file system is a
    collection of servers and storage devices that
    are dispersed across machines on a network.
  • Activity to the storage devices must be carried
    out across the network. Instead of a single
    centralized data repository, the file system
    consists of multiple, independent storage
    devices. The configuration of a distributed file
    system can vary. Servers can run on dedicated
    machines, while other machines can be both a
    server and a client.

12
Distributed File Systems
  • Early versions of these enterprise file systems
    did not operate well in wide area network (WAN)
    environments. These file systems were designed
    for use on fast local area networks (LANs), and
    the long latencies of WANs greatly impacted their
    performance.
  • The design point for these file systems was the
    small workgroup, and their security was weak for
    this reason. Support for replication and location
    independence was also limited.

13
Distributed File Systems
14
Characteristics
  • Security: Authentication and Authorization
  • Federated Namespace
  • Caching
  • Replication
  • Migration

15
Authentication
  • With central authentication, users do not
    require individual login IDs for numerous
    systems.
  • Requires fewer management tasks in the form of
    account maintenance.
  • The absence of many individual per-machine user
    accounts also helps avoid creating a fragmented
    environment in regard to enterprise-wide
    applications.
  • Access management is easier, because a given user
    might require different levels of authorization
    to data and other resources located on each
    system.

16
Federated Namespace
  • Providing a single, federated namespace that is
    seen by all users greatly enhances
    collaboration and sharing of file system data.
  • The namespace can exist across many server
    systems that might be geographically far apart or
    reside in different administrative domains and
    organizations.
  • Significant administrative control over how the
    namespace is rendered, including which parts of
    the namespace are visible.
  • Redundancy to protect from single points of
    failure or loss of connectivity to central
    resources.
  • Integration with data management features to
    allow physical location transparency within a
    namespace.

17
Caching
  • Most network file system client implementations
    do caching of both data and attributes to improve
    performance and reduce network traffic. This
    brings the performance of networked file
    operations closer to those seen by accessing data
    on storage local to the client.
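
A minimal sketch of the attribute-caching idea, assuming a simple fixed time-to-live policy. The names `AttrCache` and `fetch` and the 3-second default are illustrative assumptions, not taken from any particular NFS client.

```python
import time


class AttrCache:
    """Toy client-side attribute cache with a fixed time-to-live (assumed policy)."""

    def __init__(self, fetch, ttl_seconds=3.0, clock=time.monotonic):
        self._fetch = fetch        # hypothetical function that asks the server
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # path -> (attributes, time fetched)
        self.server_calls = 0      # counts simulated network round trips

    def getattr(self, path):
        now = self._clock()
        hit = self._entries.get(path)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]          # still fresh: no network round trip
        self.server_calls += 1
        attrs = self._fetch(path)
        self._entries[path] = (attrs, now)
        return attrs
```

Repeated `getattr` calls inside the time-to-live window are answered locally; only the first call, and calls after the entry has aged out, reach the server.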

18
Replication
  • Replication provides for copies of file system
    data on multiple servers.
  • Clients are aware of the replicas and can switch
    to another server when the currently accessed
    server becomes unavailable.
  • Clients can analyze available sites and choose
    which one to access based on network properties
    or performance.
  • Replication has classically been provided with
    read-only or read-mostly data.
  • Ideally, the administrative model allows for a
    high degree of control over replica content and
    release of updates.
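
The replica selection and fail-over behaviour described above can be sketched roughly as follows. The replica records, the `latency_ms` field, and the `fetch` callable are hypothetical stand-ins for whatever a real client measures and calls.

```python
def read_with_failover(replicas, path, fetch):
    """Try replicas in order of measured latency; move on when one is unavailable.

    replicas: list of dicts such as {"name": "srv1", "latency_ms": 12}
    fetch:    hypothetical callable fetch(server_name, path) that returns data
              or raises ConnectionError when the server cannot be reached
    """
    errors = []
    for server in sorted(replicas, key=lambda r: r["latency_ms"]):
        try:
            return server["name"], fetch(server["name"], path)
        except ConnectionError as exc:
            errors.append((server["name"], str(exc)))
    raise ConnectionError(f"no replica reachable for {path}: {errors}")
```

Sorting by latency is only one possible policy; a client could equally rank replicas by load or by administrative preference.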

19
Fail-Over Replication
20
Load-Balancing Replication
21
Replication - Benefits
  • Better performance through affinity (hosting
    replicas closer to clients)
  • Better performance through distribution of load
    across multiple servers.
  • Reduced network (especially WAN) traffic.
  • Continued access to data when there is loss of
    connectivity to resources across the WAN.
  • A convenient means for controlled geographic
    distribution of data.

22
Replication - Benefits
  • Increased availability without single points of
    failure.
  • Controlled consistency through
    administrator-initiated updates.

23
Migration
  • Migration enables the relocation of file system
    data from one server to another.
  • To be effective, it includes server mechanisms
    that keep data available (online) during
    migration events with minimal access delays.
  • Clients are migration aware. They recognize
    migration events, follow the data to its new
    location, and avoid disruptions or unexpected
    events when accessing applications.
  • Benefits of migration include
  • Managing load across servers.
  • Server off-load for replacement, retirement, or
    maintenance.

24
Strategic /Business Context
  • Although an enterprise file system is an
    important component of an information technology
    infrastructure, it is only part of the solution.
    Customers are faced
    with the problem of simplifying and optimizing
    existing infrastructures. This includes servers,
    networks, clients, management processes, and
    applications.
  • The overall goal is to reduce cost and complexity
    while providing a foundation for growth.
  • The overall performance of an enterprise file
    system is directly affected by the larger
    environment in which it operates. Servers,
    storage, and network all have impacts. There is a
    wide range of middleware products, application
    packages, and custom applications that might need
    modifications in order to effectively exploit an
    enterprise file system.

25
AFS
  • The Andrew File System (AFS) was developed at the
    Information Technology Center at Carnegie-Mellon
    University in the early 1980s.
  • Transarc Corporation, a startup company, provided
    a commercially available version of AFS. Transarc
    was fully integrated into IBM in 1999.
  • AFS provides many advantages over NFSv2 and NFSv3
    in a large enterprise environment. These include
  • Client-side caching that reduces network traffic.
  • Global namespace that eliminates the need for
    user logins on multiple servers and multiple
    mount points.

26
AFS
  • A derivative of MIT Kerberos IV was used to
    provide authentication and encryption services.
  • Greatly improved security over public networks.
  • Powerful administrator functions that enable data
    movement and backup without shutting down user
    access to the data.
  • Replication capabilities.
  • Clients are provided with an automatic fail-over
    capability, allowing them to detect the loss of a
    server and connect to another machine with no
    user intervention required.

27
DCE/DFS
  • Distributed Computing Environment / Distributed
    File System
  • The Distributed Computing Environment (DCE) was
    designed in the late 1980s by the Open Software
    Foundation (OSF)
  • The DCE product fundamentally offers a remote
    procedure call (RPC) mechanism, which provides
    the basis for the product's centralized namespace
    management, encryption and authentication
    subsystem based on an early release of the MIT
    Kerberos V5 protocol, Distributed Time Service
    (DTS), and an application threading mechanism.

28
DCE/DFS
  • DFS provides centralized file system creation,
    management, optimization, replication, and backup
    services, along with local caching to improve
    performance across the network.
  • Large installations often involve multiple
    replicas of critical file systems, distributed
    geographically.
  • DFS extends AFS technology, providing file-level
    access control, improved encryption and security,
    and other enhancements including
  • Improvements to ACL management and granularity.
  • Full POSIX file system semantics, including
    byte-range file locking.

29
DCE/DFS
  • Elimination of the AFS 2 GB file size limitation.
  • Scheduled replication, in addition to the AFS
    release replication.
  • Better performance with a kernel resident file
    server.
  • Better security and directory services.
  • The more robust Episode file system.
  • Log-based file system, which is more reliable and
    offers better recovery.
  • DFS is also an OSF/Open Group product, which is
    licensed to IBM and other vendors that offer it
    commercially on a number of platforms.

30
Network File System
  • NFS is a distributed file system that enables
    users to access files and directories on remote
    servers as if they were local.
  • The user can use operating system commands to
    create, remove, read, write, and set file
    attributes for remote files and directories.
  • NFS is independent of machine types, operating
    systems, and network architectures through the
    use of Remote Procedure Calls (RPCs)

31
History
  • Developed originally by Sun Microsystems in the
    early 1980s, Network File System v1 (NFSv1) was
    used internally by Sun to access and move files
    over the network between servers.
  • NFS enabled servers to mount remote file systems
    from other servers over the network and allowed
    local access to the files on those remote file
    systems.
  • In 1985, NFSv2 was released with the SunOS
    (UNIX) operating system. NFS was a useful tool
    that became very popular, and several variants
    were produced by various vendors. The University
    of California created a free version of NFS in
    the late 1980s, and a standard protocol was
    published as RFC 1094 in 1989.

32
History
  • By the early 1990s, efforts were underway to
    produce an enhanced version of NFSv2: NFSv3 (RFC
    1813). NFSv3 was released in 1995. Its focus was
    to provide enhancements to performance while
    maintaining backward compatibility with NFSv2.
    The basic design of NFS did not significantly
    change from version 2 to version 3, and it
    retained major design features such as stateless
    design, security, and recovery.
  • In 1998, Sun initiated an effort to design NFSv4
    (RFC 3530). This design resulted in significant
    changes for NFS.

33
NFS
  • NFS has evolved into a powerful enterprise file
    system.
  • Standards-based with multiple vendor support,
    NFSv4 offers the ability to quickly deploy an
    enterprise file system without imposing
    dependencies on custom code.
  • Because the NFS protocol is a standard, it can
    interoperate with other clients and platforms
    offering NFS support.

34
NFS
  • NFS operates on a client/server basis.
  • An NFS server has files on a local disk, which
    are accessed through NFS on a client machine.
    To handle these operations, NFS consists of
  • Networking protocols
  • Client and server daemons
  • Kernel extensions

35
NFS Client/Server Model
36
NFS
  • NFS is built on top of the TCP/IP protocol stack
    over the Remote Procedure Call (RPC) protocol.
  • NFS is an application layer protocol that uses
    other underlying protocols defined in the TCP/IP
    model.

37
Representing NFS in OSI 7 Layer
38
Remote Procedure Call
  • RPC is a library of procedures.
  • These procedures enable one process (the client
    process) to direct another process (the server
    process) to execute procedure calls as though the
    client process had executed the calls in its own
    address space.
  • Because the client and the server are two
    separate processes, they are not required to be
    on the same physical system, although they can
    be.
  • The RPC call used is based on the file system
    action taken by the user.
  • The server in turn will send the output from the
    command through RPC back to the client. To the
    user, this transaction is totally transparent.
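
A toy illustration of the stub-and-dispatch idea behind RPC. It uses JSON and an in-process "transport" purely for brevity; this is not ONC RPC, and `FILES`, `PROCEDURES`, `server_handle`, and `ClientStub` are hypothetical names chosen for the sketch.

```python
import json

# Server side: an in-memory stand-in for files on the server's disk.
FILES = {"/exports/readme": "hello"}


def _read(path):
    return FILES[path]


def _remove(path):
    del FILES[path]
    return True


# Procedures the server is willing to run on behalf of clients.
PROCEDURES = {"read": _read, "remove": _remove}


def server_handle(request_bytes):
    """Decode a request, run the named procedure, and encode the reply."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


class ClientStub:
    """Makes a remote call look like a local method call."""

    def __init__(self, transport):
        self._transport = transport   # callable: request bytes -> reply bytes

    def __getattr__(self, proc):
        def call(*args):
            request = json.dumps({"proc": proc, "args": list(args)})
            reply = self._transport(request.encode("utf-8"))
            return json.loads(reply.decode("utf-8"))["result"]
        return call
```

With `nfs = ClientStub(server_handle)`, a call such as `nfs.read("/exports/readme")` looks local to the caller even though the work is done by the server-side procedure. In a real deployment the transport would be a network connection rather than a direct function call.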

39
XDR
  • External Data Representation is the specification
    for a standard representation of various data
    types. RPC uses data types defined by the XDR
    protocol.
  • Server and client processes can reside on two
    different physical systems with completely
    different architectures that represent data in
    different ways.
  • By using a standard data type representation,
    data can be interpreted correctly, even if the
    source of the data is a machine with a completely
    different architecture.
  • A process converts data into XDR format before
    sending it. Conversely, when a process receives
    data, it converts the data from XDR format into
    its own specific data type representation.
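
A small sketch of what conversion to and from XDR looks like for two basic types, using Python's standard `struct` module. The helper names are assumptions for illustration; only unsigned integers and ASCII strings are covered.

```python
import struct


def xdr_uint(value):
    """XDR unsigned integer: 4 bytes, big-endian."""
    return struct.pack(">I", value)


def xdr_string(text):
    """XDR string: 4-byte length, the bytes, then zero padding to a 4-byte boundary."""
    data = text.encode("ascii")
    padding = (4 - len(data) % 4) % 4
    return xdr_uint(len(data)) + data + b"\x00" * padding


def xdr_decode_string(buf, offset=0):
    """Return (text, next_offset) for an XDR string starting at offset."""
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    text = buf[start:start + length].decode("ascii")
    padded = length + (4 - length % 4) % 4
    return text, start + padded
```

Because the byte order and padding are fixed by the format, a little-endian client and a big-endian server both produce and read the same bytes for the same value.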

40
NFS
  • A server has files on a local disk, irrespective
    of file system type, that are accessed through
    NFS on a client machine.
  • Underlying File System Types could be JFS, JFS2,
    UFS, ZFS.

41
NFS
42
NFS Mounts
43
Export on Server
44
Commands
  • Server
  • /exports/home/sally -vers=4,ro
  • /exports/home/bob -vers=4,ro
  • /exports/home/mary -vers=4,ro
  • /exports/project -vers=4,ro
  • /exports/project/projA -vers=4,ro
  • /exports/project/projB -vers=4,ro
  • exportfs -va
  • Client
  • mount -o vers=4 <nfsv4_svr_name>:/ /nfs

45
Mount on Client
46
NFS V4 Goals
  • Improved access and good performance on the
    Internet
  • The protocol is designed to transit firewalls
    easily, perform well where latency is high and
    bandwidth is low, and scale to very large numbers
    of clients per server.
  • Strong security with negotiation built into the
    protocol
  • The protocol builds on the work of the ONC RPC
    working group in supporting the RPCSEC_GSS
    protocol. Additionally, the NFS version 4
    protocol provides a mechanism to allow clients
    and servers the ability to negotiate security and
    require clients and servers to support a minimal
    set of security schemes.

47
NFS V4 Goals
  • Good cross-platform interoperability
  • The protocol features a file system model that
    provides a useful, common set of features that
    does not unduly favor one file system or
    operating system over another.
  • Designed for protocol extensions
  • The protocol is designed to accept standard
    extensions that do not compromise backward
    compatibility.

48
NFS V4 Features
  • The new features incorporated into NFSv4 include
  • File system model
  • Access control lists (ACLs)
  • Caching and delegation
  • Stronger security
  • Compound RPC
  • Attributes
  • File Locking
  • Referral
  • Replication

49
Pseudo File System Model
  • The server provides multiple file systems by
    gluing them together with pseudo file systems.
    These pseudo file systems provide for potential
    gaps in the path names between real file systems.
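
One way to picture the "gaps" is to compute which directories the server has to synthesize so that a client can walk from the root down to each exported file system. The sketch below is a simplified model, and the function name `pseudo_directories` is an assumption.

```python
from pathlib import PurePosixPath


def pseudo_directories(exported_paths):
    """Return ancestor directories that are not themselves exported.

    These are the directories a server would have to present as pseudo file
    system nodes so that every export is reachable from "/".
    """
    exported = {PurePosixPath(p) for p in exported_paths}
    pseudo = set()
    for path in exported:
        for ancestor in path.parents:
            if ancestor not in exported:
                pseudo.add(ancestor)
    return sorted(str(p) for p in pseudo)
```

For exports of `/exports/home/sally` and `/exports/project`, the pseudo nodes in this model are `/`, `/exports`, and `/exports/home`.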

50
Pseudo File System Model
51
Access Control Permissions
  • r (READ_DATA or LIST_DIRECTORY): Permission to read
    the data of the file or list the contents of the
    directory
  • w (WRITE_DATA or ADD_FILE): Permission to modify
    the file's data or add a new file to the
    directory
  • p (APPEND_DATA or ADD_SUBDIRECTORY): Permission to
    append data to the file or add a new subdirectory
    to the directory
  • R/W (READ/WRITE_NAMED_ATTRS): Permission to
    read/write the named attributes of the file or
    directory
  • x (EXECUTE): Permission to execute the file or
    traverse the directory

52
Access Control Permissions
  • D (DELETE_CHILD): Permission to delete files or
    subdirectories from within the directory
  • a/A (READ/WRITE_ATTRIBUTES): Permission to
    read/change basic attributes (non-ACLs) of the
    file or directory
  • d (DELETE): Permission to delete the file or
    directory
  • c/C (READ/WRITE_ACL): Permission to read/change the
    ACL of the file or directory
  • o (WRITE_OWNER): Permission to change the owner of
    the file or directory
  • s (SYNCHRONIZE): Permission to access file locally
    at the server with synchronous reads and writes
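
A rough sketch of how an ordered list of ALLOW and DENY entries built from these permission letters might be evaluated. The tuple layout and the rule that anything not explicitly allowed is refused are assumptions of this sketch, not a statement of how any particular server behaves.

```python
def access_allowed(acl, who, requested):
    """Evaluate access control entries in order for one requester.

    acl:       list of (ace_type, ace_who, permissions), ace_type "ALLOW" or "DENY"
    requested: string of permission letters, for example "rw"
    """
    pending = set(requested)
    for ace_type, ace_who, permissions in acl:
        if ace_who != who:
            continue                      # entry is about someone else
        if ace_type == "DENY" and pending & set(permissions):
            return False                  # a still-undecided bit is denied
        if ace_type == "ALLOW":
            pending -= set(permissions)   # these bits are now decided
        if not pending:
            return True
    return not pending  # assumption: leftover, never-allowed bits mean refusal
```

Order matters in this model: a DENY entry placed before an ALLOW entry for the same user wins for the permissions it names.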

53
Access Control Lists
54
Access Control Lists
55
Caching and Delegation
  • The file, attribute, and directory caching for
    the NFS version 4 protocol is similar to previous
    versions.
  • Delegation: allows a server to delegate specific
    actions on a file to a client.
  • Delegations are optional and are granted at the
    NFS server's discretion.
  • WRITE delegation: when a single client is accessing
    a file, full control of all operations is delegated
    by the server.
  • READ delegation: when multiple clients share a file
    in the absence of writing, the server delegates the
    read-only OPEN to these clients.
  • Revocation of delegation requires a callback
    path.
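
The bookkeeping behind read and write delegations can be sketched as below. This toy always grants a delegation, whereas a real server grants them at its discretion, and it is not the NFSv4 state model. `DelegationServer` and the `recall` callback are hypothetical names.

```python
class DelegationServer:
    """Toy per-file delegation bookkeeping (illustrative only)."""

    def __init__(self, recall):
        self._recall = recall     # hypothetical callback path: recall(client, path)
        self._holders = {}        # path -> (kind, set of client names)

    def open(self, client, path, want_write):
        kind, clients = self._holders.get(path, (None, set()))
        others = clients - {client}
        if want_write or kind == "WRITE":
            # A writer conflicts with every other holder, so recall them first.
            for other in sorted(others):
                self._recall(other, path)
            others = set()
        if want_write:
            self._holders[path] = ("WRITE", {client})
            return "WRITE"
        self._holders[path] = ("READ", others | {client})
        return "READ"
```

Several readers can hold a READ delegation at once; as soon as a writer appears, the other holders are recalled through the callback, which is why a working callback path is needed.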

56
Security
  • Traditional RPC implementations have included
    AUTH_NONE, AUTH_SYS, AUTH_DH, and AUTH_KRB4 as
    security flavors.
  • An additional security flavor, RPCSEC_GSS, has
    been introduced, which uses the functionality of
    GSS-API (for example, with Kerberos 5).
  • For NFS version 4 conformance, the RPCSEC_GSS
    security flavor MUST be implemented. Other
    flavors, such as AUTH_NONE, AUTH_SYS, and
    AUTH_DH MAY be implemented as well.
  • The use of RPCSEC_GSS requires selection of
    mechanism, quality of protection, and service
    (authentication, integrity, privacy).
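
Security negotiation can be pictured as picking the first offer both sides support. The sketch below is a simplification: it assumes the server lists its offers in its own order of preference, leaves out quality of protection, and uses a hypothetical function name.

```python
def choose_security(server_offers, client_supported):
    """Pick the first server offer that the client also supports.

    server_offers:    list of (flavor, mechanism, service) tuples, assumed to be
                      in the server's order of preference; non-GSS flavors use
                      None for mechanism and service
    client_supported: set of tuples in the same shape
    """
    for offer in server_offers:
        if offer in client_supported:
            return offer
    raise LookupError("no mutually supported security flavor")
```

A client that supports only integrity protection with Kerberos 5 would skip a server's first offer of privacy and settle on the integrity offer, if the server lists one.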

57
Compound RPC
  • Reduces number of RPCs in a complex request.
  • With the use of the COMPOUND procedure, the
    client is able to build simple or complex
    requests.
  • Combines related operations (READ LOOKUP OPEN
    READ)
  • There is no logical ORing or ANDing of operations.
    The operations combined within a COMPOUND request
    are evaluated in order by the server.
  • Once an operation returns a failing result, the
    evaluation ends and the results of all evaluated
    operations are returned to the client.
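
The in-order, stop-at-first-failure evaluation can be sketched as follows. The handler table and the `"OK"` status string are assumptions for illustration; real operations and status codes are defined by the protocol.

```python
def run_compound(operations, handlers):
    """Evaluate operations in order and stop at the first failure.

    operations: list of (name, args) pairs
    handlers:   dict mapping an operation name to a callable returning (status, value)
    Returns (status, results), where results covers every operation that was
    evaluated, including the failing one.
    """
    results = []
    for name, args in operations:
        status, value = handlers[name](*args)
        results.append((name, status, value))
        if status != "OK":
            return status, results
    return "OK", results
```

Operations that come after a failing one are never evaluated, so the client sees results only up to and including the failure.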

58
Attribute Classes
  • Better interoperability
  • The set of attributes that were passed
    over-the-wire with earlier versions of NFS were
    very UNIX-oriented. This meant that the
    information returned by the server was sufficient
    to respond to a stat() call on the client. This
    made it difficult for non-UNIX systems to
    understand the protocol properly.
  • NFS V4 introduces a new set of file attributes in
    three different classes: Mandatory, Recommended,
    and Named

59
File Locking
  • Lease: A time-bound grant of control of the state
    of a file, through a lock or a delegation, from
    the server to the client.
  • At the end of a lease period the lock may be
    revoked if the lease has not been extended. The
    lock must be revoked if a conflicting lock has
    been granted after the lease interval.
  • A lease is associated with a delegation. If the
    lease expires, delegation is revoked unless lease
    has been extended.
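
A minimal sketch of lease bookkeeping, assuming time is passed in explicitly as a number of seconds. The `Lease` class and its method names are illustrative assumptions.

```python
class Lease:
    """Toy lease: valid until expires_at unless renewed in time."""

    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def expired(self, now):
        return now >= self.expires_at

    def renew(self, now):
        """Extend the lease; this only succeeds before it has expired."""
        if self.expired(now):
            return False
        self.expires_at = now + self.duration
        return True
```

A client that keeps renewing before `expires_at` keeps its locks and delegations; once the lease has lapsed, the server is free to revoke them.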

60
File Locking
  • Mandatory locking: blocks I/O operations by other
    applications on a file that contains a record
    lock. NLM provided only advisory locking.
  • Share reservation: grants a client access to open
    a file and the ability to deny other clients access
    to this file. The granularity is the entire file.
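
Share reservations can be modelled as a pair of bit masks per open, one for the access requested and one for the access denied to others. The constants and the `can_open` helper below are assumptions made for this sketch.

```python
READ, WRITE = 1, 2   # bit flags; "both" is READ | WRITE and "none" is 0


def can_open(existing_opens, access, deny):
    """Check a new open against the share reservations already held.

    existing_opens: list of (access, deny) pairs already granted on the file
    access, deny:   bit masks requested by the new open
    """
    for held_access, held_deny in existing_opens:
        if access & held_deny:     # someone already denies what we ask for
            return False
        if deny & held_access:     # we want to deny what someone already has
            return False
    return True
```

For example, if one client has the file open for reading while denying writers, a second client can still open it for reading but not for writing.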

61
Referral (AIX)
  • A feature useful for namespace building.
  • Allows you to distribute data across multiple
    servers in a way that is transparent to users.
  • A primary server redirects operations to a
    referral server. The client needs only a single
    mount of the primary server.
  • An AIX feature useful for load balancing or
    general resource reallocation.
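
The redirection can be pictured as a longest-prefix lookup in a referral table. This is a conceptual sketch only and does not reflect how AIX implements referrals. The table layout, the server names, and the `resolve` function are hypothetical.

```python
def resolve(path, referrals, primary="primary"):
    """Return (server, path_on_that_server) using the longest matching referral.

    referrals: dict mapping a path prefix in the namespace to a
               (server_name, root_path_on_that_server) pair
    """
    best = None
    for prefix in referrals:
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    if best is None:
        return primary, path            # no referral: the primary serves it
    server, remote_root = referrals[best]
    suffix = path[len(best):]
    return server, (remote_root.rstrip("/") + suffix) or "/"
```

From the user's point of view there is one tree under a single mount; which server actually holds a subtree is decided by the table.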

62
Referral (AIX)
63
Replication
  • A means of specifying multiple locations where
    copies of data reside.
  • If the primary data source goes down, use data
    from replication servers.

64
Questions
65
Thank You