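A Brief Introduction to Hadoop | Technical Materials | IoT | China Computing Network (中国计算网), Industrial Internet One-Stop Service Platform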

        System.err.println("Usage: wordcount [...] ");
        System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountTask.class);
    job.setMapperClass(WordCountMap.class);
    job.setReducerClass(WordCountReduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileSystem fs = FileSystem.get(conf);
    // Every argument except the last one is an input path.
    for (int i = 0; i < otherArgs.length - 1; i++) {
        FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    // The last argument is the output path. The job refuses to start if the
    // output directory already exists, so delete it (recursively) first.
    Path outputPath = new Path(otherArgs[otherArgs.length - 1]);
    if (fs.exists(outputPath)) {
        fs.delete(outputPath, true);
    }
    FileOutputFormat.setOutputPath(job, outputPath);
    job.setNumReduceTasks(1);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}

6. Submitting the job:

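hadoop jar hadoop-examples.jar demo.wordcount -Dmapreduce.job.queuename=XX input output

Here demo.wordcount is the main class name, and -Dmapreduce.job.queuename=XX sets a configuration property (in this case, the queue the job is submitted to).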

Drawback: there is no scheduled (periodic) execution; every run has to be submitted explicitly.
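To make the argument handling concrete, here is a hypothetical, JDK-only sketch. The class SubmitArgsDemo and its methods are invented for illustration. It assumes, as in Hadoop's stock WordCount example, that the driver's otherArgs array is what remains after generic options such as -Dmapreduce.job.queuename=XX have been stripped out (in real Hadoop this is done by GenericOptionsParser, which supports more options than the two -D forms handled here). It also assumes the driver's usage check requires at least one input and one output, since that condition sits on an earlier page. The remaining arguments then follow the driver's convention: every argument except the last is an input path, and the last one is the output path.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: separates "-D key=value" / "-Dkey=value" options from the
// job's own arguments, then applies the WordCount driver's convention
// (all but the last argument are inputs, the last one is the output).
final class SubmitArgsDemo {
    final Map<String, String> properties = new LinkedHashMap<>();
    final List<String> inputs = new ArrayList<>();
    String output;

    static SubmitArgsDemo parse(String[] args) {
        SubmitArgsDemo parsed = new SubmitArgsDemo();
        int i = 0;
        // Only -D options placed before the positional arguments are recognized.
        while (i < args.length && args[i].startsWith("-D")) {
            String kv;
            if (args[i].equals("-D")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-D needs key=value");
                }
                kv = args[i + 1];          // "-D key=value" form
                i += 2;
            } else {
                kv = args[i].substring(2); // "-Dkey=value" form
                i += 1;
            }
            int eq = kv.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Bad property: " + kv);
            }
            parsed.properties.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        List<String> remaining = Arrays.asList(args).subList(i, args.length);
        // Assumption: the driver needs at least one input and one output path.
        if (remaining.size() < 2) {
            throw new IllegalArgumentException("Usage: wordcount [...] ");
        }
        parsed.inputs.addAll(remaining.subList(0, remaining.size() - 1));
        parsed.output = remaining.get(remaining.size() - 1);
        return parsed;
    }

    public static void main(String[] args) {
        SubmitArgsDemo p = parse(new String[] {
            "-Dmapreduce.job.queuename=XX", "input", "output"});
        System.out.println(p.properties + " " + p.inputs + " -> " + p.output);
    }
}
```

Running main prints {mapreduce.job.queuename=XX} [input] -> output. The property ends up in the configuration, and only input and output reach the driver's path-handling loop.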

Common InputFormats:

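TextInputFormat: the key is the line's byte offset within the file and the value is the line's text. The split size is computed as splitSize = max(mapred.min.split.size, min(mapred.max.split.size, blockSize)). In Hadoop 2 and later these properties are named mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize. When a large volume of text is being processed and the number of map tasks needs to be controlled, tune mapred.min.split.size: raising it above the block size produces larger splits and therefore fewer map tasks.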
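This minimal Java sketch applies the split-size rule above to predict how many map tasks a single large text file produces. The class and method names are illustrative. The 1.1 "slop" factor reflects how Hadoop's FileInputFormat cuts files: the last split may be up to 10% larger than splitSize instead of becoming a tiny extra split. Treat this as an approximation of that logic, not the library code itself. It also assumes a splittable, non-empty file.

```java
// Illustrative sketch of FileInputFormat-style split sizing (not the Hadoop source).
final class SplitSizeDemo {
    // The last split may be up to 10% larger than splitSize instead of
    // producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (one map task each) for a non-empty, splittable file.
    static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // whatever is left becomes the final split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 128 * mb;
        long fileLength = 1024 * mb; // a 1 GB text file

        // Default min/max: the split size equals the block size.
        long defaultSize = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println(defaultSize / mb + " MB, " + countSplits(fileLength, defaultSize) + " maps");

        // Raising the minimum split size to 256 MB gives larger, fewer splits.
        long bigger = computeSplitSize(blockSize, 256 * mb, Long.MAX_VALUE);
        System.out.println(bigger / mb + " MB, " + countSplits(fileLength, bigger) + " maps");
    }
}
```

With a 128 MB block size, the 1 GB file yields 8 splits by default ("128 MB, 8 maps"). Setting the minimum split size to 256 MB halves that to 4 ("256 MB, 4 maps").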

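CombineTextInputFormat (the default on the author's cluster; stock Hadoop defaults to TextInputFormat): packs the splits of many small files into a single map task, mainly so that each small file does not tie up a map task of its own.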
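The idea behind CombineTextInputFormat can be shown with a simplified, hypothetical Java sketch. It packs files greedily, in input order, into combined splits of at most maxSplitSize bytes. This greedy strategy is an assumption made for illustration: the real CombineFileInputFormat also groups blocks by node and rack locality and can cut large files into chunks, none of which is modeled here. A file larger than the limit simply gets a split of its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified sketch: greedy packing of small files into combined splits.
final class CombineSplitDemo {

    // Returns one list of file sizes per combined split (i.e. per map task).
    static List<List<Long>> combine(long[] fileSizes, long maxSplitSize) {
        List<List<Long>> splits = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long currentBytes = 0;
        for (long size : fileSizes) {
            // Close the current split if adding this file would exceed the limit.
            if (!current.isEmpty() && currentBytes + size > maxSplitSize) {
                splits.add(current);
                current = new ArrayList<>();
                currentBytes = 0;
            }
            current.add(size);
            currentBytes += size;
        }
        if (!current.isEmpty()) {
            splits.add(current);
        }
        return splits;
    }

    public static void main(String[] args) {
        long kb = 1024L;
        long[] sizes = new long[10];
        Arrays.fill(sizes, 300 * kb); // ten small 300 KB files
        List<List<Long>> splits = combine(sizes, 1024 * kb); // at most 1 MB per split
        System.out.println(splits.size() + " map tasks");
    }
}
```

Under TextInputFormat, ten 300 KB files would need ten map tasks. Packed into splits of at most 1 MB, they need four (groups of 3, 3, 3 and 1 files).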

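SequenceFileInputFormat: reads SequenceFiles, which use Hadoop's own binary serialization. A common pattern stores the file name as the key and the file contents as the value. This packs many small files into one large file at the storage level, which relieves the NameNode of tracking each small file separately.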
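The "file name as key, contents as value" packing can be sketched with the JDK alone. The class below is hypothetical. Its record layout (a length-prefixed name followed by length-prefixed bytes) is an assumption for illustration only and is not Hadoop's SequenceFile format, which adds a header, sync markers and optional compression. On a real cluster one would write the records with Hadoop's SequenceFile.Writer, typically using Text keys and BytesWritable values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative container of (fileName, contents) records; NOT the SequenceFile format.
final class SmallFilePackDemo {

    // Write each (fileName, contents) pair as one length-prefixed record.
    static byte[] pack(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                out.writeUTF(e.getKey());          // key: file name
                out.writeInt(e.getValue().length); // value length
                out.write(e.getValue());           // value: file contents
            }
        }
        return buffer.toByteArray();
    }

    // Read the records back in order, the way a record reader would.
    static Map<String, byte[]> unpack(byte[] packed) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(packed))) {
            while (in.available() > 0) {
                String name = in.readUTF();
                byte[] contents = new byte[in.readInt()];
                in.readFully(contents);
                files.put(name, contents);
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "hello hadoop".getBytes(StandardCharsets.UTF_8));
        files.put("b.txt", "hello world".getBytes(StandardCharsets.UTF_8));
        // Two small files become one container: one file for the NameNode to track.
        byte[] packed = pack(files);
        for (Map.Entry<String, byte[]> e : unpack(packed).entrySet()) {
            System.out.println(e.getKey() + " -> " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```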
