Distributed File System - PowerPoint PPT Presentation

Slides: 24
Provided by: mans49

Transcript and Presenter's Notes



1
Distributed File System
2
Outline
  • Basic Concepts
  • Current project
  • Hadoop Distributed File System
  • Future work
  • Reference

3
DFS
  • A distributed implementation of the classical
    time sharing model of a file system, where
    multiple users share files and storage resources.

4
Key Characteristics of DFS
  • Dispersion: clients and files are dispersed
    across different machines
  • Multiplicity: there are multiple clients and
    multiple files

5
Primary issues of DFS
  • Naming and Transparency
  • Fault Tolerance

6
Naming
  • Naming: the mapping between logical and physical
    objects.
  • Multilevel mapping.
  • Transparency of replicas and location: a file
    name does not reveal how many copies of the file
    exist or where they are stored.
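The multilevel mapping on this slide can be illustrated with a minimal Python sketch. It assumes a two-level scheme (path name to a low-level file ID, then file ID to a list of (host, local name) replicas); the class and method names are hypothetical. Moving a replica changes only the second level, so the user-visible name stays the same.

```python
from typing import Dict, List, Tuple

Location = Tuple[str, str]  # (host name, local file name on that host)


class NameService:
    """Hypothetical two-level name mapping: path -> file ID -> replicas."""

    def __init__(self) -> None:
        self.directory: Dict[str, int] = {}             # level 1: path -> file ID
        self.locations: Dict[int, List[Location]] = {}  # level 2: file ID -> replicas
        self._next_id = 1

    def create(self, path: str, replicas: List[Location]) -> int:
        file_id = self._next_id
        self._next_id += 1
        self.directory[path] = file_id
        self.locations[file_id] = list(replicas)
        return file_id

    def resolve(self, path: str) -> List[Location]:
        """Map a logical name to physical objects through two lookups."""
        return self.locations[self.directory[path]]

    def migrate(self, path: str, old: Location, new: Location) -> None:
        """Move one replica; the logical name and file ID do not change."""
        replicas = self.locations[self.directory[path]]
        replicas[replicas.index(old)] = new
```

For example, after creating "/home/ann/report.txt" on ("hostA", "f17") and ("hostB", "f42") and migrating the first replica to ("hostC", "f03"), clients still open the same path and simply resolve to the new location.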

7
Naming Schemes: Three Main Approaches
  • Combination of host name and local name
      • Guarantees a unique system-wide name.
  • Mount remote directories to local directories
      • Once mounted, files can be referenced in a
        location-transparent manner.
  • Total integration of the component file systems
      • A single global name structure spans all the
        files in the system.
      • If a server is unavailable, some arbitrary set
        of directories on different machines also
        becomes unavailable.
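The second approach (mounting remote directories) can be sketched as a client-side mount table. This is a hypothetical illustration, not any particular system's implementation: each mount point maps to a (server, exported directory) pair, and the longest matching mount point wins. Mounting at the root directory "/" is not handled in this sketch.

```python
from typing import Dict, Optional, Tuple


class MountTable:
    """Hypothetical per-client mount table for location-transparent paths."""

    def __init__(self) -> None:
        # mount point (no trailing slash) -> (server, remote directory)
        self.mounts: Dict[str, Tuple[str, str]] = {}

    def mount(self, mount_point: str, server: str, remote_dir: str) -> None:
        self.mounts[mount_point.rstrip("/")] = (server, remote_dir.rstrip("/"))

    def resolve(self, path: str) -> Optional[Tuple[str, str]]:
        """Return (server, remote path), or None if the path is local."""
        best = None
        for point in self.mounts:
            if path == point or path.startswith(point + "/"):
                if best is None or len(point) > len(best):
                    best = point  # prefer the longest (most specific) mount
        if best is None:
            return None  # not under any mount point: an ordinary local file
        server, remote_dir = self.mounts[best]
        return server, remote_dir + path[len(best):]
```

After mounting serverA's "/export/shared" at "/usr/shared", a program opens "/usr/shared/bin/tool" like any local file; only the mount table knows it lives on serverA.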

8
Transparency(1)
  • Login transparency: a user can log in at any host
    with a uniform login procedure and perceive a
    uniform view of the file system.
  • Access transparency: a client process on a host
    has a uniform mechanism to access all files in
    the system, regardless of whether the files are
    on a local or remote host.
  • Location transparency: the names of files do
    not reveal their physical location.

9
Transparency(2)
  • Concurrency transparency: an update to a file
    should not affect the correct execution of
    other processes that are concurrently sharing
    the file.
  • Replication transparency: files may be
    replicated to provide redundancy for availability
    and also to permit concurrent access for
    efficiency.

10
Fault Tolerance
  • Stateful vs. stateless service
      • Whether the server maintains information on
        its clients
  • File replication

11
Distinctions Between Stateful and Stateless Service
  • Failure recovery
      • A stateful server loses all its volatile state
        in a crash.
      • With a stateless server, the effects of server
        failure and recovery are almost unnoticeable.
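The difference in failure recovery can be shown with two toy servers. Under the assumptions of this sketch (hypothetical class names, in-memory files, a "crash" modeled as clearing volatile fields), a stateful server's file handles become invalid after a crash, while every stateless request carries the file name and offset and so works unchanged after a restart.

```python
from typing import Dict


class StatefulServer:
    """Keeps an open-file table: handle -> [file name, current offset]."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files                     # stands in for stable storage
        self.open_table: Dict[int, list] = {}  # volatile per-client state
        self._next_handle = 1

    def open(self, name: str) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self.open_table[handle] = [name, 0]
        return handle

    def read(self, handle: int, count: int) -> bytes:
        name, offset = self.open_table[handle]  # raises KeyError after a crash
        data = self.files[name][offset:offset + count]
        self.open_table[handle][1] = offset + len(data)
        return data

    def crash_and_restart(self) -> None:
        self.open_table = {}  # the open-file table is lost in the crash


class StatelessServer:
    """Every request is self-contained: file name plus offset."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def read(self, name: str, offset: int, count: int) -> bytes:
        return self.files[name][offset:offset + count]

    def crash_and_restart(self) -> None:
        pass  # nothing volatile to lose
```

A client of the stateful server would have to detect the failure, reopen the file and re-establish its offset; a client of the stateless server only needs to retry the same request.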

12
File Replication
  • Several copies of a file's contents at different
    locations enable multiple servers to share the
    load of providing the service.
  • The naming scheme maps a replicated file name to
    a particular replica.
  • Updates: changes must be propagated to the
    replicas to keep them consistent.
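A minimal sketch of the replicated-file slide, under simplifying assumptions: the name service keeps an ordered server list per file, a read is mapped to the first available replica that holds the newest version among the available ones, and updates use a naive "write to all available replicas" policy with version numbers so that replicas that missed an update can be detected. All names are hypothetical; real systems use more careful consistency protocols.

```python
from typing import Dict, List, Set


class ReplicatedFile:
    """Hypothetical replicated file with naive write-all-available updates."""

    def __init__(self, servers: List[str], data: bytes = b"") -> None:
        self.servers = servers
        self.copies: Dict[str, bytes] = {s: data for s in servers}
        self.versions: Dict[str, int] = {s: 0 for s in servers}

    def pick_replica(self, up: Set[str]) -> str:
        """Map the file name to one available replica with the newest version."""
        candidates = [s for s in self.servers if s in up]
        if not candidates:
            raise RuntimeError("no replica available")
        newest = max(self.versions[s] for s in candidates)
        return next(s for s in candidates if self.versions[s] == newest)

    def read(self, up: Set[str]) -> bytes:
        return self.copies[self.pick_replica(up)]

    def update(self, data: bytes, up: Set[str]) -> None:
        new_version = max(self.versions.values()) + 1
        for s in self.servers:
            if s in up:  # replicas that are down miss this update
                self.copies[s] = data
                self.versions[s] = new_version

    def stale_replicas(self) -> List[str]:
        latest = max(self.versions.values())
        return [s for s in self.servers if self.versions[s] < latest]
```

If s1 is down while the file is updated on s2 and s3, s1 is reported as stale and later reads are routed to an up-to-date copy. Note that if the only reachable replica is stale, this sketch still returns it.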

13
Current Project
  • HDFS: Hadoop Distributed File System
  • A distributed, parallel, fault-tolerant file
    system, designed to reliably store very large
    files across machines in a large cluster.
  • Efficient, reliable, and open source

14
Hadoop is a framework for running applications on
large clusters built of commodity hardware. The
Hadoop framework transparently provides
applications both reliability and data motion.
Hadoop implements a computational paradigm named
Map/Reduce, where the application is divided into
many small fragments of work, each of which may
be executed or re-executed on any node in the
cluster. In addition, it provides a distributed
file system (HDFS) that stores data on the
compute nodes, providing very high aggregate
bandwidth across the cluster. Both Map/Reduce and
the distributed file system are designed so that
node failures are automatically handled by the
framework.
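The Map/Reduce paradigm described above can be illustrated with a single-process word count. This is not Hadoop's Java API; `map_fn`, `reduce_fn` and `run_job` are hypothetical names. Each input split is processed independently, which is what allows a framework to re-execute a failed fragment on another node.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_fn(split: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield word.lower(), 1


def reduce_fn(key: str, values: List[int]) -> int:
    """Combine all values emitted for one key."""
    return sum(values)


def run_job(splits: List[str]) -> Dict[str, int]:
    # Map phase: each split is an independent fragment of work.
    intermediate: List[Tuple[str, int]] = []
    for split in splits:
        intermediate.extend(map_fn(split))
    # Shuffle phase: group intermediate values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce phase: one reduce call per distinct key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}
```

In a real cluster the map calls would run on the nodes that hold each split's data, which is why HDFS stores data on the compute nodes.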
15
HDFS
  • Hadoop's Distributed File System is designed to
    reliably store very large files across machines
    in a large cluster. It is inspired by the Google
    File System. Hadoop DFS stores each file as a
    sequence of blocks; all blocks in a file except
    the last block are the same size. Blocks
    belonging to a file are replicated for fault
    tolerance. The block size and replication factor
    are configurable per file. Files in HDFS are
    "write once" and have strictly one writer at any
    time.
  • Hadoop Distributed File System goals:
      • Store large data sets
      • Cope with hardware failure
      • Emphasize streaming data access
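The block layout described on this slide (equal-size blocks, with only the last block possibly shorter) can be sketched as follows. The function name is hypothetical and the 4-byte block size in the example is purely illustrative; HDFS blocks are many megabytes.

```python
from typing import List


def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into consecutive blocks; only the last may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]
```

For example, a 10-byte file with a 4-byte block size becomes blocks of 4, 4 and 2 bytes. Because block size is a per-file parameter, two files in the same cluster may be split differently.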

16
Architecture
  • Like Hadoop Map/Reduce, HDFS follows a
    master/slave architecture. An HDFS installation
    consists of a single Namenode, a master server
    that manages the filesystem namespace and
    regulates access to files by clients. In
    addition, there are a number of Datanodes, one
    per node in the cluster, which manage storage
    attached to the nodes that they run on. The
    Namenode makes filesystem namespace operations
    like opening, closing, renaming etc. of files and
    directories available via an RPC interface. It
    also determines the mapping of blocks to
    Datanodes. The Datanodes are responsible for
    serving read and write requests from filesystem
    clients; they also perform block creation,
    deletion, and replication upon instruction from
    the Namenode.
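A toy, in-memory model of the Namenode/Datanode division of labor. Class and function names are hypothetical and do not mirror the real HDFS RPC interface. Replica placement is naive round-robin, and the client writes every replica itself, which is a simplification. What the model shows is that the Namenode handles only metadata (the namespace and the block-to-Datanode mapping) while block data flows directly between clients and Datanodes.

```python
import itertools
from typing import Dict, List, Tuple


class DataNode:
    """Stores block contents for the node it runs on."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[int, bytes] = {}


class NameNode:
    """Holds only metadata: the namespace and the block -> Datanode map."""

    def __init__(self, datanodes: List[DataNode], replication: int) -> None:
        if not 1 <= replication <= len(datanodes):
            raise ValueError("replication must be between 1 and the number of Datanodes")
        self.datanodes = datanodes
        self.replication = replication
        self.namespace: Dict[str, List[int]] = {}       # path -> ordered block IDs
        self.block_map: Dict[int, List[DataNode]] = {}  # block ID -> replica holders
        self._ids = itertools.count(1)
        self._next_start = itertools.cycle(range(len(datanodes)))  # round-robin

    def allocate_block(self, path: str) -> Tuple[int, List[DataNode]]:
        block_id = next(self._ids)
        start = next(self._next_start)
        n = len(self.datanodes)
        targets = [self.datanodes[(start + k) % n] for k in range(self.replication)]
        self.namespace.setdefault(path, []).append(block_id)
        self.block_map[block_id] = targets
        return block_id, targets

    def get_block_locations(self, path: str) -> List[Tuple[int, List[DataNode]]]:
        return [(b, self.block_map[b]) for b in self.namespace[path]]


def write_file(nn: NameNode, path: str, data: bytes, block_size: int) -> None:
    for i in range(0, len(data), block_size):
        block_id, targets = nn.allocate_block(path)  # metadata request to Namenode
        for dn in targets:                           # data goes straight to Datanodes
            dn.blocks[block_id] = data[i:i + block_size]


def read_file(nn: NameNode, path: str) -> bytes:
    parts = []
    for block_id, holders in nn.get_block_locations(path):
        parts.append(holders[0].blocks[block_id])    # read one replica per block
    return b"".join(parts)
```

With three Datanodes and a replication factor of 2, writing a 10-byte file with 4-byte blocks creates three blocks and six stored block replicas.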

17
(No Transcript)
18
  • Naming: central metadata server
  • Synchronization: write-once-read-many; locks on
    objects are given to clients using leases
  • Consistency and replication: server-side
    replication, asynchronous replication, checksums
  • Fault tolerance: failure is treated as the norm
  • Security: no dedicated security mechanism
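The lease-based synchronization listed above can be sketched as a time-limited, renewable write lease per file. The class is hypothetical. Time is passed in explicitly so the behavior is deterministic, and a single expiry period simplifies whatever policy a real metadata server uses. A second writer is refused while the lease is valid and can take it over once it expires, for instance after the holder has crashed.

```python
from typing import Dict, Tuple


class LeaseManager:
    """Hypothetical single-writer lease table kept by a metadata server."""

    def __init__(self, duration: float) -> None:
        self.duration = duration
        self.leases: Dict[str, Tuple[str, float]] = {}  # path -> (holder, expiry)

    def acquire(self, path: str, client: str, now: float) -> bool:
        current = self.leases.get(path)
        if current is not None and current[0] != client and current[1] > now:
            return False  # another writer holds a valid lease
        self.leases[path] = (client, now + self.duration)
        return True

    def renew(self, path: str, client: str, now: float) -> bool:
        current = self.leases.get(path)
        if current is None or current[0] != client or current[1] <= now:
            return False  # not the holder, or the lease already expired
        self.leases[path] = (client, now + self.duration)
        return True

    def release(self, path: str, client: str) -> None:
        if self.leases.get(path, ("", 0.0))[0] == client:
            del self.leases[path]
```

A crashed writer simply stops renewing, so the lease expires and the file becomes writable again without any explicit recovery step.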

19
Future Work
  • Robustness of the data sharing model
  • The topics of the preceding sections:
    architecture, naming, synchronization,
    availability, heterogeneity, and support for
    databases
  • Security

20
Reference
  • [1] Thanh, T.D., Mohan, S., Choi, E., Kim, S.,
    Kim, P. (2008). A Taxonomy and Survey on
    Distributed File Systems. Networked Computing and
    Advanced Information Management.
  • [2] Chow, R. (1997). Distributed Operating Systems
    & Algorithms.
  • [3] Levy, E., Silberschatz, A. (December 1990).
    Distributed File Systems: Concepts and Examples.
    Computing Surveys (CSUR), Volume 22, Issue 4.
  • [4] http://hadoop.apache.org/common/docs/current/hdfs_design.html#Introduction
  • [5] http://www.snia.org/events/wintersymp2009/cloud/dhruba_hadoop_snia.pdf

21
  • [6] http://en.wikipedia.org/wiki/List_of_file_systems#Distributed_file_systems
  • [7] http://en.wikipedia.org/wiki/Hadoop#Hadoop_Distributed_File_System
  • [8] http://www.cs.gsu.edu/cscyqz/courses/aos/slides08/ch6.1-Fall08.pptx

22
Q&A?
23
Thank you!