Provided by: Andre615

Title: Hadoop


1
Hadoop
  • Lecture 3.
  • ???????? MapReduce

2
????
  • ??????? ???????? MapReduce
  • ?????? MapReduce
  • ??????? ????????????? MapReduce
  • ??????????? ?????????? MapReduce ? Hadoop
  • ?????????? MapReduce

3
??????? ????????
  • MapReduce is a programming model designed for parallel
    processing of large volumes of data by splitting a task
    into independent parts
  • MapReduce originated at Google
  • Jeffrey Dean, Sanjay Ghemawat. MapReduce:
    Simplified Data Processing on Large Clusters.
    2004.
  • ???? ????????? ?????????? Web ??? ?????????
    ???????

4
MapReduce at Google
  • ???????? ???? ????????? ???, ?????????? ??
    ?????? ????
  • ?????????? ? ????????? ??????
  • ???????????????, ????????????????? ? ?????????
    ???????
  • ??????????
  • ????? ??????? ? ???????? ?????????? ???. ???
    ????? ?? ??? ?????????? ??? ?????????? ? 3800
    ????? C ?? 700
  • ??????? ???????? ????????? ? ??????? ??????????
    (??? ??? ?????? ?????? ???????)
  • ??????? ??????????????? ????? ??????????? ?????
    ????? ??? ????????? ?????????? ?????????

5
MapReduce implementations
  • Google: proprietary implementation in C++
  • Apache Hadoop: open-source implementation in Java
  • Erlang
  • NoSQL
  • MongoDB
  • CouchDB

6
?????? MapReduce
  • ?????????????? ????????????????
  • ????????? ???????
  • ??????? ?????? ?? ??????????
  • ???????????, ??? ????? ??????? ? ???????, ? ??
    ??? ??? ??????
  • MapReduce ????????????? ???????????? ????????????
    ????????? ?? ??????? ?????
  • ????? ??????, ???? ?????? ???????? ????????????
    ???????

7
Map function
  • ?????? ? ???????????? ?????? ?????? ??????
  • Example of a Map function: toUpper(str)
  • The original list is not modified; a new one is created!

8
Reduce function
  • ?????? ? ???????????? ?????? ???? ????????
  • Example of a Reduce function
  • The input list is not modified either
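
The two functional building blocks from the slides above can be tried out in plain Java. This is only a minimal sketch: `mapToUpper` mirrors the `toUpper(str)` example named on the Map slide, while the summing reduce and the class name `MapReduceFunctionsSketch` are assumptions made for illustration, since the transcript does not preserve the original Reduce example.

```java
import java.util.List;
import java.util.stream.Collectors;

public class MapReduceFunctionsSketch {
    // Map: apply a function to every element; the result is a NEW list,
    // the input list is left untouched.
    static List<String> mapToUpper(List<String> words) {
        return words.stream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    // Reduce: fold all elements of a list into a single value.
    // Summation is an assumed example, not taken from the slides.
    static int reduceSum(List<Integer> numbers) {
        return numbers.stream().reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        List<String> words = List.of("hello", "hadoop");
        System.out.println(mapToUpper(words));               // [HELLO, HADOOP]
        System.out.println(words);                           // [hello, hadoop]
        System.out.println(reduceSum(List.of(1, 2, 3, 4)));  // 10
    }
}
```

Because neither function changes its input, independent calls can run in any order or in parallel, which is the property MapReduce relies on.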

9
MapReduce
  • ? MapReduce ? ???????? ?????? ???????????
    ??????????????? ??????? Map ? Reduce
  • Data consists of key-value pairs
  • ??????? Map ? Reduce ?? ?? ?????? ?
    ?????????????? ??????
  • Map ????? ???????????? ??? ??????? ????????
    ???????? ?????? ????????? ????????? ?????????
  • Reduce ????? ???????????? ????????? ????????
    ????????

10
?????? ????-????????
  • ?????? ?????? ???1
  • ???? ??? ????????, ??????????? ?????
  • ???????? ?????? ? ????? ?????

1 http://ftp.rts.ru/pub/info/stats/
11
????? ? Reduce
  • ????????? ????????? ?????? ? ??????? ???????
    ????????? Reduce ??????????? ????????
  • ??? ??????? ????? ???????????? ????????? ????????
    ????????

12
WordCount example
  • Count the number of words in the input files
  • Input data:
  • file1: Hello World Bye World
  • file2: Hello Hadoop Goodbye Hadoop
  • Result:
  • Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2

13
Map function for WordCount
  • ????????
  • map (filename, file-contents):
      for each word in file-contents:
        emit (word, 1)
  • Result:
  • file1:
      Hello 1
      World 1
      Bye 1
      World 1
  • file2:
      Hello 1
      Hadoop 1
      Goodbye 1
      Hadoop 1

14
Reduce function for WordCount
  • ????????
  • reduce (word, values):
      sum = 0
      for each value in values:
        sum = sum + value
      emit (word, sum)
  • Result:
  • Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2
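
To see how the map and reduce pseudocode for WordCount fit together, the whole pipeline can be simulated in memory without a Hadoop cluster. The sketch below is an illustration only: the class name `WordCountSketch` and the in-memory grouping step (standing in for the shuffle that the framework performs between Map and Reduce) are assumptions, not part of the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map step: emit a (word, 1) pair for every word in one file's contents.
    static List<Map.Entry<String, Integer>> map(String fileContents) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : fileContents.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: sum all values collected for one word.
    static int reduce(String word, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum = sum + value;
        }
        return sum;
    }

    // Driver: run map on every file, group values by key
    // (a stand-in for the shuffle), then reduce each group.
    static Map<String, Integer> wordCount(List<String> files) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String contents : files) {
            for (Map.Entry<String, Integer> pair : map(contents)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // The two input files from the WordCount example slide.
        System.out.println(wordCount(List.of(
                "Hello World Bye World",
                "Hello Hadoop Goodbye Hadoop")));
        // prints {Bye=1, Goodbye=1, Hadoop=2, Hello=2, World=2}
    }
}
```

A `TreeMap` is used so that the output is sorted by key, which matches the order shown on the slide.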

15
????? ?????? ? MapReduce
16
????? ?????? ? MapReduce
  • Input files are loaded into HDFS and distributed
    across all nodes
  • On each node, Map processes are started and
    process the input files
  • Any Map process can process any file
  • Map processes generate intermediate lists of
    key-value pairs

17
????? ?????? MapReduce
  • Key-value pairs are sent over the network to the
    Reduce processes
  • All values with the same key are sent to a single
    Reduce process
  • Output data is written to HDFS as files

18
????? ?????? ? MapReduce
  • ?????????? ??????? ?????? MapReduce ????????????
    ?????????????
  • ??????????? ?? ????? ?????? ????? ??????
  • ????????? ?????????? ?????? ?? ????????????
    ??????? ????? ?????
  • ? ????????? ?????? ?????????? ??????????????
    ?????????????? ????? ????
  • ? ?????? ???? ???? ???????? Map ? Reduce
    ????????????? ??????????????? ?? ?????? ????

19
????????????? MapReduce
  • MapReduce ?????????? ???
  • ??????? ?????? ??????? ?????? (?? ????????
    ???????? ? ??????)
  • ??????? ?????????? ????? (?? ???????? ????? ?
    ??????)
  • ??? ????????? ??????? ??????? ?????? ?????????
    ???????
  • Hadoop ?? ???????? ??????????? ???????????????????
  • ?????????????????? ??????? ???????? ???????

20
Distributed grep
  • Task: search for a substring in text files
  • The Map function reads a line of a file and compares it
    with the pattern. On a match, it emits a pair:
  • key: file name
  • value: the matching line
  • The Reduce function copies the input data
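
Distributed grep is one of the example applications in the Dean and Ghemawat paper cited earlier: the map function emits a line if it matches a pattern, and the reduce function is an identity that copies its input to the output. A minimal in-memory sketch follows. The class name `GrepSketch`, the use of the file name as the key, and the in-memory grouping step are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

public class GrepSketch {
    // Map step: emit (file name, line) for each line that matches the pattern.
    static List<Map.Entry<String, String>> map(String fileName, List<String> lines,
                                               Pattern pattern) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String line : lines) {
            if (pattern.matcher(line).find()) {
                pairs.add(Map.entry(fileName, line));
            }
        }
        return pairs;
    }

    // Reduce step: identity function, copies the matched lines to the output.
    static List<String> reduce(String fileName, List<String> matches) {
        return new ArrayList<>(matches);
    }

    // Driver: map every file, group pairs by key, reduce each group.
    static Map<String, List<String>> grep(Map<String, List<String>> files, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            for (Map.Entry<String, String> pair : map(file.getKey(), file.getValue(), pattern)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = Map.of(
                "file1", List.of("Hello World", "Bye World"),
                "file2", List.of("Hello Hadoop", "Goodbye Hadoop"));
        System.out.println(grep(files, "Hadoop"));
        // prints {file2=[Hello Hadoop, Goodbye Hadoop]}
    }
}
```

All the filtering happens in the map step, so the job parallelizes across files with no coordination; the reduce step only collects results.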

21
URL accesses
  • Task: count the number of accesses to each URL
  • The Map function reads the log of requests to a
    Web server and emits a pair:
  • key: URL
  • value: 1
  • The Reduce function sums the number of occurrences of
    the same URL and emits a pair:
  • key: URL
  • value: total number of accesses

22
Inverted index
  • Task: build a list of the documents in which
    given words occur
  • The Map function reads a document and, for each word,
    emits a pair:
  • key: word
  • value: document identifier
  • The Reduce function merges the data for each
    word and emits a pair:
  • key: word
  • value: list of document identifiers
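
The inverted index is another example from the Dean and Ghemawat paper: map emits a (word, document ID) pair for each word, and reduce merges all document IDs for a word into a sorted list. The following is a minimal in-memory sketch under assumptions: the class name `InvertedIndexSketch`, the document identifiers `doc1` and `doc2`, and whitespace tokenization are all chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class InvertedIndexSketch {
    // Map step: emit a (word, document id) pair for every word in the document.
    static List<Map.Entry<String, String>> map(String docId, String text) {
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, docId));
            }
        }
        return pairs;
    }

    // Reduce step: merge all document ids seen for one word
    // into a sorted list without duplicates.
    static List<String> reduce(String word, List<String> docIds) {
        return new ArrayList<>(new TreeSet<>(docIds));
    }

    // Driver: map every document, group pairs by word, reduce each group.
    static Map<String, List<String>> buildIndex(Map<String, String> documents) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> doc : documents.entrySet()) {
            for (Map.Entry<String, String> pair : map(doc.getKey(), doc.getValue())) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            index.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> documents = Map.of(
                "doc1", "Hello World Bye World",
                "doc2", "Hello Hadoop Goodbye Hadoop");
        System.out.println(buildIndex(documents));
        // prints {Bye=[doc1], Goodbye=[doc2], Hadoop=[doc2], Hello=[doc1, doc2], World=[doc1]}
    }
}
```

Sorting and deduplicating in the reduce step makes the result independent of the order in which documents are processed.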

23
????????? ??? ?????
  • ?????? ????? ???????????? ??????? ????????? ???
    ????? ?? ????????? ??????
  • ??????? Map ?????? ?????? ? ????????? ????? ?
    ?????????? ????
  • ???? ??? ????????
  • ???????? ????????? ???? ????? ?? ????
  • ??????? Reduce ???? ???????? ????? ????????? ??
    ???? ??? ??????? ????????

24
MapReduce in Hadoop
  • Hadoop is a free, open-source implementation
    of MapReduce
  • Programming language: Java
  • Map and Reduce functions can also be written in
    other languages using Streaming
  • Operating systems: Linux and Windows
    (development), any Unix with Java

25
Running jobs in Hadoop
26
Running jobs in Hadoop
  • Job: a MapReduce job
  • Task: a part of a Job that runs Map or Reduce
  • Job Tracker: a service in the Hadoop cluster
    responsible for launching jobs; it distributes tasks
    across the nodes
  • Task Tracker: runs Tasks

27
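Structure of a Hadoop program
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Class for the Map function
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> { /* ... */ }

    // Class for the Reduce function
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> { /* ... */ }

    // The main function launches the Hadoop job
    public static void main(String[] args) throws Exception { /* ... */ }
}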
28
WordCount Map in Hadoop
public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Called once per input line: key is the byte offset, value is the line.
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output,
                    Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);  // emit (word, 1)
        }
    }
}
29
WordCount Reduce in Hadoop
public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Called once per key with all values emitted for that key.
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));  // emit (word, sum)
    }
}
30
Running a job in Hadoop
public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // Types of the output key-value pairs
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // Input and output paths come from the command line
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
}

31
Combiner
  • What does this line mean:
  • conf.setCombinerClass(Reduce.class)
  • The combiner performs local aggregation of values
    on the node right after Map finishes, before the data
    is passed to Reduce
  • Example:
  • Map: (<Hello, 1>, <World, 1>, <Hello, 1>,
    <Hadoop, 1>)
  • Combiner: (<Hello, 2>, <World, 1>, <Hadoop, 1>)
  • The combiner reduces the amount of data transferred
    over the network
  • The Reducer can be used as the combiner if the function
    is commutative and associative
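
The effect of a combiner on the example above can be reproduced with a few lines of plain Java. This sketch is an illustration, not Hadoop's implementation: the class name `CombinerSketch` is hypothetical, and the combiner is modeled as a local sum over the output of a single Map process.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CombinerSketch {
    // Combine step: locally sum the values per key in one Map process's output.
    // A LinkedHashMap keeps keys in the order in which they were first seen.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> pair : mapOutput) {
            combined.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Output of one Map process, as on the slide.
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("Hello", 1),
                Map.entry("World", 1),
                Map.entry("Hello", 1),
                Map.entry("Hadoop", 1));
        System.out.println(combine(mapOutput));
        // prints {Hello=2, World=1, Hadoop=1}
    }
}
```

Four pairs shrink to three before anything is sent over the network. Summation is commutative and associative, so applying it locally first and again in Reduce gives the same totals, which is why WordCount can reuse its Reduce class as the combiner.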

32
????????????? ??????
  • MapReduce ????????????? ???????????? ???????
    ?????? ????? ?????????? Map
  • ?????? ??????? Map ???????????? ???? ???
    ????????? ??????? ??????
  • ???? ???? ???????, ?? ?? ??????? ?? ????? ?
    ?????????????? ??????? ?????????? Map
  • ?? ????????? ?????? ????? ?????? ????? 64??
  • Hadoop ????????? ????????? ?????? Map ?? ???
    ????, ??? ????? ??????? ??????
  • ??????????? ?????????? ? ??????

33
??????????? ?????????? ? ??????
34
???????? ??????????? HDFS
  • ? MapReduce ??????? ?????? ?? ??????????, ?
    ????????? ?????
  • ? HDFS ????? ???????????? ?????? ???? ??? (WORM)
  • ? MapReduce ?????? ?? ??????? Map ??????????????
    ???????? ?? 64??
  • ?????? ????? HDFS 64 ??. ?????? ??? ????? ??????
    ??????????? ?? ???? ????????
  • MapReduce ?????? ??????? ????? ??????? ??????
    ???????????????, ? ????? ???????????????
    ?????????? ??????? ????? ???????? ??????
  • HDFS ?????????????? ??? ????????? ????????

35
?????????? MapReduce
  • ??????? ?????????????? ??????????
  • ???? HBase, Hive, Pig, Mahout ? ??.
  • ???????????? ? ????????? ????????? ? ?????????
    ??????? ??????
  • ????????? ??????? ??????
  • ???????? ? ????? ???????? Map ??? Reduce ????? ?
    ???????? ???? ??????

36
???????? ? MapReduce
  • ???? ?? ?????????????? ???????? Map, Reduce ??
    ????? ???????????
  • ??????? ?????????????????????

37
???????? ? MapReduce
38
?????
  • MapReduce is a programming model designed for parallel
    processing of large volumes of data by splitting a task
    into independent parts
  • ??????? Map ? Reduce ???????? ? ??????????????
    ?????????????????
  • ?????????? ??????????? ?????????? ? ??????
  • ?????????? MapReduce ?????????????? ??????????,
    ????????????? ?????? ? ??????? ?????????????,
    ????????

39
Additional materials
  • MapReduce: Simplified Data Processing on Large
    Clusters
  • http://labs.google.com/papers/mapreduce.html
  • MapReduce Tutorial
  • http://hadoop.apache.org/common/docs/stable/mapred_tutorial.html
  • A Study of Skew in MapReduce Applications
  • http://nuage.cs.washington.edu/pubs/opencirrus2011.pdf

40
  • Questions?