Contents

1. Advanced WordCount
   1.1 Sorting IntWritable in descending order
   1.2 Input and output formats
   1.3 Processing flow
2. Code and results
   2.1 Dependencies in pom.xml
   2.2 Utility class util
   2.3 Advanced WordCount
   2.4 Results

The Apache Hadoop source code quoted in this article is distributed under the Apache License, Version 2.0; see the Apache License 2.0 for details.

1. Advanced WordCount

The text to be counted is the code in section 2.3 below. The goal is to count the words in the text and output the count first and the word second, with the counts sorted in descending order.

1.1 Sorting IntWritable in descending order

The IntWritable class already contains a comparator that sorts in ascending order, shown below. To sort IntWritable in descending order, you only need to define a new class that extends IntWritable.Comparator and overrides public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) so that it returns the negation of the parent class's return value. In addition, if you want IntWritable keys to be sorted in descending order, you must also call Job.setSortComparatorClass() with your comparator class in the driver code that configures the MapReduce job.

```java
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
public static class Comparator extends WritableComparator {
    public Comparator() {
        super(IntWritable.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
        int thisValue = readInt(b1, s1);
        int thatValue = readInt(b2, s2);
        return (thisValue < thatValue ? -1 : (thisValue == thatValue ? 0 : 1));
    }
}
```

1.2 Input and output formats

| Java class | Role | Function |
| --- | --- | --- |
| org.apache.hadoop.mapreduce.lib.input.TextInputFormat | MapReduce's default input format | Splits input files by line; each line becomes a <key, value> pair, where key is the offset of the line within the file (starting from 0) and value is the content of the line |
| org.apache.hadoop.mapreduce.lib.output.TextOutputFormat | MapReduce's default output format | Writes output as a text file, one <key, value> pair per line, with key and value separated by a tab (\t) |
| org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat | Input format for SequenceFiles | Reads Hadoop's binary SequenceFile format |
| org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat | Output format for SequenceFiles | Writes output in Hadoop's binary SequenceFile format |

(A SequenceFile is Hadoop's efficient, splittable binary file format, with support for compression.)
(Hadoop defines many more input and output formats; since I have not used them in depth, they are not covered here.)

When running several MapReduce jobs in sequence, intermediate results can be stored as SequenceFiles to speed up the overall run.

1.3 Processing flow

First, the advanced WordCount has to count words just like the plain WordCount, so the reduce function's input key-value type is <Text, IntWritable>. The required final result, however, is <IntWritable, Text>. If you simply change the reduce output to <IntWritable, Text> and use a single job, you will find that the descending IntWritable sort never happens (even though you may already have set SortComparatorClass). That is because the shuffle's sorting takes place only after the map phase ends and before the reduce phase starts, and at that point the keys are of type Text, not IntWritable.

To solve the task, two jobs are needed. The first job does the counting and writes its output as a SequenceFile; its map output and its reduce input and output are all <Text, IntWritable>, and its output format is SequenceFileOutputFormat. The second job swaps keys and values, reading the intermediate results back in SequenceFile format, and then sorts the keys in descending order. The swap uses Hadoop's built-in org.apache.hadoop.mapreduce.lib.map.InverseMapper, which exchanges keys and values. This job's input format is SequenceFileInputFormat, and its map input and output types are <Text, IntWritable> and <IntWritable, Text> respectively; now that the keys are IntWritable, setting SortComparatorClass achieves the descending sort.
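To make the swap concrete, InverseMapper can be pictured as the following minimal mapper, which re-emits every record with its key and value exchanged. This is a simplified sketch of what the stock Hadoop class does, not a copy of its source, and the class name InverseMapperSketch is mine, for illustration only:

```java
import java.io.IOException;

import org.apache.hadoop.mapreduce.Mapper;

// Sketch of InverseMapper's behavior: every (key, value) input record
// is written back out as (value, key).
public class InverseMapperSketch<K, V> extends Mapper<K, V, V, K> {
    @Override
    protected void map(K key, V value, Context context)
            throws IOException, InterruptedException {
        context.write(value, key); // swap key and value
    }
}
```

Because the swap is all the second job's map phase has to do, no custom reducer is needed there; the default identity reducer suffices.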
2. Code and results

2.1 Dependencies in pom.xml

```xml
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.3.6</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>3.3.6</version>
        <type>pom</type>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
        <version>3.3.6</version>
    </dependency>
</dependencies>
```

2.2 Utility class util

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class util {
    // Obtain a FileSystem handle for the given URI
    public static FileSystem getFileSystem(String uri, Configuration conf) throws Exception {
        URI add = new URI(uri);
        return FileSystem.get(add, conf);
    }

    // Recursively delete a single path if it exists
    public static void removeALL(String uri, Configuration conf, String path) throws Exception {
        FileSystem fs = getFileSystem(uri, conf);
        if (fs.exists(new Path(path))) {
            boolean isDeleted = fs.delete(new Path(path), true);
            System.out.println("Delete Output Folder? " + isDeleted);
        }
    }

    // Recursively delete every path in the list that exists
    public static void removeALL(String uri, Configuration conf, String[] pathList) throws Exception {
        FileSystem fs = getFileSystem(uri, conf);
        for (String path : pathList) {
            if (fs.exists(new Path(path))) {
                boolean isDeleted = fs.delete(new Path(path), true);
                System.out.println(String.format("Delete %s? %s", path, isDeleted));
            }
        }
    }

    // Print the contents of every part-r-* file under the given output directory
    public static void showResult(String uri, Configuration conf, String path) throws Exception {
        FileSystem fs = getFileSystem(uri, conf);
        String regex = "part-r-";
        Pattern pattern = Pattern.compile(regex);
        if (fs.exists(new Path(path))) {
            FileStatus[] files = fs.listStatus(new Path(path));
            for (FileStatus file : files) {
                Matcher matcher = pattern.matcher(file.getPath().toString());
                if (matcher.find()) {
                    System.out.println(file.getPath() + ":");
                    FSDataInputStream openStream = fs.open(file.getPath());
                    IOUtils.copyBytes(openStream, System.out, 1024);
                    openStream.close();
                }
            }
        }
    }
}
```
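As a side note, the intermediate SequenceFile written by the first job can be inspected programmatically before wiring up the second job. The snippet below is a minimal inspection sketch, not part of the original article's code; the path Temp/part-r-00000 is an assumption based on the temporary directory and the two reduce tasks configured in section 2.3:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PeekSequenceFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed location: one partition written by the first job (section 2.3)
        Path path = new Path("hdfs://localhost:9000/user/developer/Temp/part-r-00000");
        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            Text key = new Text();                  // job 1 writes <Text, IntWritable>
            IntWritable value = new IntWritable();
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```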
2.3 Advanced WordCount

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.InverseMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class App {
    // Sorts IntWritable keys in descending order by negating the
    // ascending comparison inherited from IntWritable.Comparator
    public static class IntWritableDecreaseingComparator extends IntWritable.Comparator {
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return -super.compare(b1, s1, l1, b2, s2, l2);
        }
    }

    public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] splitStr = value.toString().split("\\s");
            for (String str : splitStr) {
                context.write(new Text(str), new IntWritable(1));
            }
        }
    }

    public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String tempPath = "hdfs://localhost:9000/user/developer/Temp";
        String[] myArgs = {
            "file:///home/developer/CodeArtsProjects/advanced-word-count/AdvancedWordCount.txt",
            "hdfs://localhost:9000/user/developer/AdvancedWordCount/output"
        };
        util.removeALL("hdfs://localhost:9000", conf, new String[] { tempPath, myArgs[myArgs.length - 1] });
        // Job 1: count words and write <Text, IntWritable> pairs to a SequenceFile
        Job job = Job.getInstance(conf, "AdvancedWordCount");
        job.setJarByClass(App.class);
        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setNumReduceTasks(2);
        for (int i = 0; i < myArgs.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(myArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(tempPath));
        int res1 = job.waitForCompletion(true) ? 0 : 1;
        if (res1 == 0) {
            // Job 2: swap keys and values with InverseMapper, then sort the
            // IntWritable keys in descending order
            Job sortJob = Job.getInstance(conf, "Sort");
            sortJob.setJarByClass(App.class);
            sortJob.setMapperClass(InverseMapper.class);
            sortJob.setInputFormatClass(SequenceFileInputFormat.class);
            sortJob.setOutputKeyClass(IntWritable.class);
            sortJob.setOutputValueClass(Text.class);
            sortJob.setSortComparatorClass(IntWritableDecreaseingComparator.class);
            FileInputFormat.addInputPath(sortJob, new Path(tempPath));
            FileOutputFormat.setOutputPath(sortJob, new Path(myArgs[myArgs.length - 1]));
            int res2 = sortJob.waitForCompletion(true) ? 0 : 1;
            if (res2 == 0) {
                System.out.println("高级WordCount结果为:");
                util.showResult("hdfs://localhost:9000", conf, myArgs[myArgs.length - 1]);
            }
            System.exit(res2);
        }
        System.exit(res1);
    }
}
```
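To convince yourself that the comparator really reverses the order, you can exercise it directly on serialized bytes, which is how the shuffle invokes it. This is a throwaway check of my own, not part of the original article; it assumes the App class above is on the classpath:

```java
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.IntWritable;

public class ComparatorCheck {
    public static void main(String[] args) throws Exception {
        // Serialize two IntWritable values into raw bytes, as the shuffle sees them
        DataOutputBuffer small = new DataOutputBuffer();
        new IntWritable(3).write(small);
        DataOutputBuffer large = new DataOutputBuffer();
        new IntWritable(7).write(large);

        App.IntWritableDecreaseingComparator cmp =
                new App.IntWritableDecreaseingComparator();
        int r = cmp.compare(small.getData(), 0, small.getLength(),
                            large.getData(), 0, large.getLength());
        System.out.println(r); // positive: 3 sorts after 7, i.e. descending order
    }
}
```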
2.4 Results

The contents of the result file are as follows, with the count first and the counted token second, separated by a tab (the count of 64 for the empty token comes from the empty strings produced by split("\\s")):

```
64
14  {
13  }
12  import
8   int
7   public
7
4   static
4   class
4   -
4   new
4   @Override
3   for
3   :
3   void
3   throws
3   extends
2   l1,
2   1;
2   0;
2   String[]
2   s2,
2   s1,
2   i
2   context.write(new
2   context)
2   conf,
2   InterruptedException
2   key,
2   IntWritable,
2   return
2   IOException,
2   b2,
2   sum
2   Context
2   protected
2   myArgs[myArgs.length
2   Text,
2   1]);
1   };
1   values,
1   values)
1   value.toString().split("\\s");
1   value,
1   val.get();
1   val
1   util.showResult("hdfs://localhost:9000",
1   util.removeALL("hdfs://localhost:9000",
1   str
1   splitStr)
1   splitStr
1   res
1   reduce(Text
1   org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
1   org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
1   org.apache.hadoop.mapreduce.Reducer;
1   org.apache.hadoop.mapreduce.Mapper;
1   org.apache.hadoop.mapreduce.Job;
1   org.apache.hadoop.io.WritableComparable;
1   org.apache.hadoop.io.Text;
1   org.apache.hadoop.io.LongWritable;
1   org.apache.hadoop.io.IntWritable;
1   org.apache.hadoop.fs.Path;
1   org.apache.hadoop.conf.Configuration;
1   myArgs.length
1   myArgs
1   map(LongWritable
1   main(String[]
1   l2);
1   l2)
1   key);
1   job.waitForCompletion(true)
1   job.setSortComparatorClass(IntWritableDecreaseingComparator.class);
1   job.setReducerClass(MyReducer.class);
1   job.setOutputValueClass(Text.class);
1   job.setOutputKeyClass(IntWritable.class);
1   job.setMapperClass(MyMapper.class);
1   job.setJarByClass(App.class);
1   job.setCombinerClass(MyReducer.class);
1   job
1   java.io.IOException;
1   if
1   i)
1   compare(byte[]
1   compare(WritableComparable
1   byte[]
1   b1,
1   b);
1   b)
1   args)
1   a,
1   WritableComparable
1   Text
1   Text(str),
1   Text
1   System.out.println("高级WordCount结果为:");
1   System.exit(res);
1   Reducer<Text,
1   Path(myArgs[myArgs.length
1   Path(myArgs[i]));
1   MyReducer
1   MyMapper
1   Mapper<LongWritable,
1   Job.getInstance(conf,
1   Job
1   Iterable<IntWritable>
1   IntWritableDecreaseingComparator
1   IntWritable
1   IntWritable.Comparator
1   IntWritable(sum),
1   IntWritable(1));
1   FileOutputFormat.setOutputPath(job,
1   FileInputFormat.addInputPath(job,
1   Exception
1   Configuration();
1   Configuration
1   App
1   ?
1   1
1   1]));
1   0)
1   0
1   -super.compare(b1,
1   -super.compare(a,
1
1   (res
1   (int
1   (String
1   (IntWritable
1   "hdfs://localhost:9000/user/developer/AdvancedWordCount/output"
1   "file:///home/developer/CodeArtsProjects/AdvancedWordCount.txt",
1   "AdvancedWordCount");
1   conf
```