15-440, Hadoop Distributed File System, Allison Naaktgeboren


1
15-440, Hadoop Distributed File System
Allison Naaktgeboren

Ur doin' it rong kitteh

Wut u mean? I iz loadin a HA-doop fileh

2
Announcements

Go Vote!
Interpretive Dances happen only after Lecture
Office Hour Change
Mon 6:30-9:30
Tues 6-7:30
Exams are graded

3
Hadoop Core at 30,000 ft
4
Back to the Map Reduce Model

Recall that
map (in_key, in_value) -> (inter_key, inter_value) list
combine (inter_key, inter_value) -> (inter_key, inter_value)
reduce (inter_key, inter_value list) -> (out_key, out_value)
What resource are we most constrained by?
Oceans of Data, Skinny pipes
How many types of data will the file system care about?
How long will we need each kind?
What is the common case for each?
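
To make the three signatures concrete, here is a minimal single-machine sketch using word count. The function names (map_fn, combine_fn, reduce_fn, run_job) are made up for illustration and are not Hadoop's API; the combiner here is assumed to receive all values one mapper produced for a key, so that less data has to cross the skinny pipes.

```python
from collections import defaultdict

def map_fn(in_key, in_value):
    """map: (in_key, in_value) -> list of (inter_key, inter_value)."""
    return [(word, 1) for word in in_value.split()]

def combine_fn(inter_key, inter_values):
    """combine: pre-sum counts on the mapper's side (assumed signature)."""
    return (inter_key, sum(inter_values))

def reduce_fn(inter_key, inter_values):
    """reduce: (inter_key, inter_value list) -> (out_key, out_value)."""
    return (inter_key, sum(inter_values))

def group_by_key(pairs):
    """Collect all values that share a key, preserving arrival order."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def run_job(documents):
    """Run map + combine per document, then shuffle and reduce."""
    combined = []
    for name, text in documents.items():
        local = group_by_key(map_fn(name, text))
        combined.extend(combine_fn(k, vs) for k, vs in local.items())
    shuffled = group_by_key(combined)
    return dict(reduce_fn(k, vs) for k, vs in shuffled.items())

counts = run_job({"a.txt": "cat hat cat", "b.txt": "hat cat"})
# counts == {"cat": 3, "hat": 2}
```

Without the combiner, a.txt would ship three pairs to the reducers; with it, only two.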

5
(No Transcript)
6
What would an MR Filesystem need?

General use case: large files
Mostly append to end, long sequential reads, few deletes
Appends might be concurrent
Scalability
Adding (or losing) machines should be relatively painless
Nodes work on nearby data
Minimize moving data between machines
Bandwidth is our limiting resource
Remember how much data
Failure (handling) is common
Yea, yea, we know, we took 213, we know hardware sucks
No, really, failure (handling) is common (constant)
Disks, processors, whole nodes, racks, and datacenters

7
Addressing Those Concerns

Sequential Reads, appends need to be fast
Deletes can be painful
Hot plug machines
Add or lose machines while system is running jobs
System should auto detect the change
HDFS should distribute data somewhat evenly
So that all workers have a reasonable amount of data to chew on
And coordinate with the Jobtracker (job master)
Data Replication
Should be spread out. Why?
What type of problems could arise?

8
Moving into the Details

Nodes in HDFS
NameNode (master) (like GFS Master)
DataNodes (slaves) (like GFS chunkservers)
NB: Hadoop and HDFS are closely paired
careful use of jargon defines the true expert
worker node A and data node 1 are frequently the same machine
Two types of Masters
Jobtracker (Hadoop Job Master)
NameNode (file system Master)
The NameNode is what I mean by 'master' for the rest of the lecture

9
Your Data goes in ....

Files are divided into Chunks
64 MB
The mapping between filename and chunks goes to the Master
Each chunk is replicated and sent off to DataNodes
By default, 3 replicas
The master determines which dataNodes
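
A rough sketch of the bookkeeping this slide describes: split a file into 64 MB chunks and record, per chunk, which DataNodes hold a replica. The round-robin placement and every name here (split_into_chunks, place_replicas, build_master_table) are assumptions for illustration only; the slide does not say how the master actually picks DataNodes.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, as on the slide
REPLICATION = 3                # default replica count, as on the slide

def split_into_chunks(file_size, chunk_size=CHUNK_SIZE):
    """Return (offset, length) for each chunk of a file of file_size bytes."""
    return [(off, min(chunk_size, file_size - off))
            for off in range(0, file_size, chunk_size)]

def place_replicas(chunk_index, datanodes, replication=REPLICATION):
    """Hypothetical policy: round-robin over the DataNode list."""
    n = min(replication, len(datanodes))
    return [datanodes[(chunk_index + i) % len(datanodes)] for i in range(n)]

def build_master_table(filename, file_size, datanodes):
    """Mapping the master would hold: (filename, chunk index) -> chunk info."""
    table = {}
    for i, (off, length) in enumerate(split_into_chunks(file_size)):
        table[(filename, i)] = {
            "offset": off,
            "length": length,
            "replicas": place_replicas(i, datanodes),
        }
    return table

# A 150 MB file becomes two full chunks plus one 22 MB chunk.
table = build_master_table("logs.dat", 150 * 1024 * 1024,
                           ["dn1", "dn2", "dn3", "dn4"])
```

Round-robin at least spreads chunks somewhat evenly; a real policy would also have to consider racks and free space.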

10
What the Clients Do

Where the data starts
On file creation, creates a separate file with a checksum
When data is fetched back from a dataNode, the checksum is computed again
Cache file data
Avoid bothering the Master too often
When a Client has 1 chunk's worth of data
Contacts the Master,
Master sends name of dataNodes to send it to
ONLY sends it to the 1st
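
The client-side checksum idea can be sketched as follows. CRC32 from Python's zlib is an assumption chosen for the example; the slide does not name the checksum algorithm, and the helper names (store, verify) are hypothetical.

```python
import zlib

def checksum(data: bytes) -> int:
    """Checksum of a block of bytes (CRC32 is an assumption, not from the slide)."""
    return zlib.crc32(data)

def store(data: bytes) -> dict:
    """On file creation: keep the checksum separately alongside the data."""
    return {"data": data, "checksum": checksum(data)}

def verify(fetched: bytes, expected_checksum: int) -> bool:
    """On read: recompute the checksum and compare with the stored one."""
    return checksum(fetched) == expected_checksum

record = store(b"hello hdfs")
ok = verify(record["data"], record["checksum"])       # data came back intact
corrupted = verify(b"hellO hdfs", record["checksum"])  # one flipped bit
```

A mismatch tells the client that the copy from that dataNode is bad, so it can ask for another replica.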

11
What the DataNodes Do

Heartbeat to the Master
Opens, closes, or replicates a chunk if requested by the Master
During replication, sends data to the next dataNode in the chain
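
The replication chain can be illustrated with a toy in-memory model: the client hands the chunk to the first dataNode only, and each dataNode stores it and forwards it to the next one. The DataNode class and client_write function are invented for this sketch; real dataNodes stream data over the network rather than calling each other directly.

```python
class DataNode:
    """Toy dataNode that keeps chunks in a dict (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.chunks = {}

    def receive(self, chunk_id, data, rest_of_chain):
        """Store the chunk, then pass it to the next dataNode in the chain."""
        self.chunks[chunk_id] = data
        if rest_of_chain:
            next_node, remaining = rest_of_chain[0], rest_of_chain[1:]
            next_node.receive(chunk_id, data, remaining)

def client_write(chunk_id, data, chain):
    """The client ONLY sends to the first dataNode named by the master."""
    chain[0].receive(chunk_id, data, chain[1:])

nodes = [DataNode("dn1"), DataNode("dn2"), DataNode("dn3")]
client_write("chunk-0", b"payload", nodes)
holders = [n.name for n in nodes if "chunk-0" in n.chunks]
```

The client's outgoing bandwidth is spent once per chunk instead of once per replica.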

12
What the Namespace Node Does

System metadata!
Holds Name->ID mapping
Chunk replicas locations
Transaction Logs
EditLog
FSImage
It is responsible for coherency
Uses the logs atomically
Addresses the concurrent writes issue
It is checkpointed
Similar to AFS volume snapshots
Will pull last consistent log upon restart
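
One way to picture how a checkpoint and a log work together on restart: the FSImage is a snapshot of the namespace, the EditLog lists the changes made since that snapshot, and recovery replays the log on top of the image. The tuple format for edits and the function names below are assumptions for illustration; the slide does not describe the real on-disk formats.

```python
import copy

def apply_edit(namespace, edit):
    """Apply one logged change; edit is a hypothetical (op, name, chunk_ids) tuple."""
    op, name, chunk_ids = edit
    if op == "create":
        namespace[name] = list(chunk_ids)
    elif op == "append":
        namespace[name].extend(chunk_ids)
    elif op == "delete":
        namespace.pop(name, None)
    else:
        raise ValueError(f"unknown op: {op}")

def recover(fsimage, edit_log):
    """On restart: start from the last checkpoint and replay the log in order."""
    namespace = copy.deepcopy(fsimage)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return namespace

def checkpoint(fsimage, edit_log):
    """Fold the log into a new image and start an empty log."""
    return recover(fsimage, edit_log), []

fsimage = {"a.dat": [1]}
edit_log = [("append", "a.dat", [2]), ("create", "b.dat", [3]), ("delete", "a.dat", [])]
state = recover(fsimage, edit_log)
new_image, new_log = checkpoint(fsimage, edit_log)
```

After a checkpoint the log is short again, so the next restart has less to replay.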

13
What the Namespace Node Does

Listens for Heartbeats
Listens for Client Requests
If no heartbeat
marks a node as dead
Its data is deregistered
It selects dataNodes
Which nodes get which chunks
Signals creating, opening, closing
Deletes
Orders move to /trash
Starts delete timer
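
The heartbeat bookkeeping might look roughly like this. The HeartbeatMonitor class, the 30-second timeout, and the explicit now arguments are all assumptions made so the sketch is easy to follow and test; they are not taken from HDFS.

```python
class HeartbeatMonitor:
    """Toy version of the master's liveness tracking (hypothetical design)."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}        # node name -> time of last heartbeat
        self.chunk_locations = {}  # chunk id -> set of node names

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def register(self, chunk_id, node):
        self.chunk_locations.setdefault(chunk_id, set()).add(node)

    def sweep(self, now):
        """Mark silent nodes as dead and deregister their data."""
        dead = [n for n, t in self.last_seen.items() if now - t > self.timeout_s]
        for node in dead:
            del self.last_seen[node]
            for holders in self.chunk_locations.values():
                holders.discard(node)
        return sorted(dead)

    def under_replicated(self, replication=3):
        """Chunks that now have fewer live replicas than wanted."""
        return sorted(c for c, holders in self.chunk_locations.items()
                      if len(holders) < replication)

monitor = HeartbeatMonitor(timeout_s=30)
for node in ("dn1", "dn2", "dn3"):
    monitor.heartbeat(node, now=0)
    monitor.register("chunk-0", node)
monitor.heartbeat("dn1", now=20)
monitor.heartbeat("dn2", now=20)
dead = monitor.sweep(now=31)             # dn3 has been silent for 31 s
needs_copy = monitor.under_replicated()  # chunk-0 is down to 2 replicas
```

Chunks reported as under-replicated are the ones the master would ask a surviving dataNode to replicate.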

14
All together Now!
15
Additional Resources

Hadoop wiki
YouTube -> Hadoop -> Google developer videos (1-3 will be helpful)
Google University
Includes UW course, the other UW course, a couple
others
Use at your own risk
The Google File System paper is rather readable
as research papers go
