

news 2025/12/21 17:15:02

Bucket Map Join

The Map Join described earlier suits the case of a large table joined with a small table. When both tables are relatively large, an ordinary Map Join needs a lot of map-side memory to cache the data. You can allocate more memory to the map side so that the job succeeds, but map-side memory cannot be increased without limit. So when both tables in the join are too large, consider the Bucket Map Join algorithm. Take a join between the following two tables as an example:

    Table name        Size (bytes)
    order_detail      1176009934 (about 1122M)
    payment_detail    334198480 (about 319M)

First, create two bucketed tables from the source tables. order_detail should get 16 buckets and payment_detail 8 buckets. Pay attention to two things: the bucket count of one table must be an integer multiple of the other's, and each table must be bucketed on its join column.

    -- order table
    hive (default)>
    drop table if exists order_detail_bucketed;
    create table order_detail_bucketed(
        id           string comment 'order id',
        user_id      string comment 'user id',
        product_id   string comment 'product id',
        province_id  string comment 'province id',
        create_time  string comment 'order time',
        product_num  int comment 'number of items',
        total_amount decimal(16, 2) comment 'order amount'
    )
    clustered by (id) into 16 buckets
    row format delimited fields terminated by '\t';

    -- payment table
    hive (default)>
    drop table if exists payment_detail_bucketed;
    create table payment_detail_bucketed(
        id              string comment 'payment id',
        order_detail_id string comment 'order detail id',
        user_id         string comment 'user id',
        payment_time    string comment 'payment time',
        total_amount    decimal(16, 2) comment 'payment amount'
    )
    clustered by (order_detail_id) into 8 buckets
    row format delimited fields terminated by '\t';

Then load data into the two bucketed tables:

    -- order table
    hive (default)>
    insert overwrite table order_detail_bucketed
    select
        id,
        user_id,
        product_id,
        province_id,
        create_time,
        product_num,
        total_amount
    from order_detail
    where dt='2020-06-14';

    -- payment table
    hive (default)>
    insert overwrite table payment_detail_bucketed
    select
        id,
        order_detail_id,
        user_id,
        payment_time,
        total_amount
    from payment_detail
    where dt='2020-06-14';

Then set the following parameters:

    -- Disable CBO. CBO causes hints to be ignored, so this must be set to false.
    set hive.cbo.enable=false;
    -- The map join hint is ignored by default (it is deprecated), so this must be set to false.
    set hive.ignore.mapjoin.hint=false;
    -- Enable the bucket map join optimization. It is off by default, so this must be set to true.
    set hive.optimize.bucketmapjoin=true;

Finally, rewrite the SQL statement as follows:

    select /*+ mapjoin(pd) */
        *
    from order_detail_bucketed od
    join payment_detail_bucketed pd
    on od.id = pd.order_detail_id;

Note that the basic execution plan of a Bucket Map Join looks the same as that of an ordinary Map Join. To see the difference, run the statement below to view the detailed execution plan. If "BucketMapJoin: true" appears in the Map Join Operator of the detailed plan, the join algorithm in use is Bucket Map Join.

    explain extended
    select /*+ mapjoin(pd) */
        *
    from order_detail_bucketed od
    join payment_detail_bucketed pd
    on od.id = pd.order_detail_id;

Sort Merge Bucket Map Join

When both tables are relatively large, SMB Join is another option besides Bucket Map Join. Unlike Bucket Map Join, SMB Map Join places no requirement on bucket size, because it merges sorted buckets instead of caching a whole bucket in memory. The following parameters need to be set:

    -- Enable the Sort Merge Bucket Map Join optimization
    set hive.optimize.bucketmapjoin.sortedmerge=true;
    -- Automatically convert to SMB Join
    set hive.auto.convert.sortmerge.join=true;
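To make the two algorithms above more concrete, here is a minimal Python sketch of what happens per bucket. It is an illustration under stated assumptions, not Hive's implementation: the function names (bucket_of, bucketize, bucket_map_join, smb_join) are hypothetical, CRC32 stands in for Hive's real bucketing hash, and the tiny in-memory tables are made up. The SMB part additionally assumes that every bucket is sorted by the join key, which the table definitions in this article do not declare. The sketch shows why the bucket counts should be multiples of each other: with 16 and 8 buckets, bucket i of the larger table only ever needs bucket i % 8 of the smaller one.

```python
import zlib
from collections import defaultdict


def bucket_of(key: str, num_buckets: int) -> int:
    # Assumption: CRC32 is a stand-in hash, not Hive's actual bucketing function.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def bucketize(rows, key_index, num_buckets, sort=False):
    """Split rows into buckets by hashing the join key; optionally sort each bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_index], num_buckets)].append(row)
    if sort:
        for bucket in buckets:
            bucket.sort(key=lambda r: r[key_index])
    return buckets


def bucket_map_join(big_buckets, big_key, small_buckets, small_key):
    """Hash join per bucket: only one small-table bucket is cached at a time."""
    n_small = len(small_buckets)
    assert len(big_buckets) % n_small == 0, "bucket counts must be multiples"
    out = []
    for i, big_bucket in enumerate(big_buckets):
        cached = defaultdict(list)  # in-memory hash table for one small bucket
        for s in small_buckets[i % n_small]:
            cached[s[small_key]].append(s)
        for b in big_bucket:
            for s in cached.get(b[big_key], []):
                out.append(b + s)
    return out


def smb_join(big_buckets, big_key, small_buckets, small_key):
    """Merge join per bucket: both sides are streamed in key order, nothing is cached."""
    n_small = len(small_buckets)
    out = []
    for i, left in enumerate(big_buckets):
        right = small_buckets[i % n_small]
        li = ri = 0
        while li < len(left) and ri < len(right):
            lk, rk = left[li][big_key], right[ri][small_key]
            if lk < rk:
                li += 1
            elif lk > rk:
                ri += 1
            else:
                # Find the run of equal keys on each side, then emit their cross product.
                l_end = li
                while l_end < len(left) and left[l_end][big_key] == lk:
                    l_end += 1
                r_end = ri
                while r_end < len(right) and right[r_end][small_key] == lk:
                    r_end += 1
                for l_row in left[li:l_end]:
                    for r_row in right[ri:r_end]:
                        out.append(l_row + r_row)
                li, ri = l_end, r_end
    return out


# Made-up sample data: (order id, user id) and (payment id, order id).
orders = [(f"o{i}", f"u{i % 5}") for i in range(40)]
payments = [(f"p{i}", f"o{i * 2}") for i in range(15)] + [("p_dup", "o4")]

order_buckets = bucketize(orders, 0, 16, sort=True)
payment_buckets = bucketize(payments, 1, 8, sort=True)

# Reference result from a plain nested-loop join.
naive = sorted(o + p for o in orders for p in payments if o[0] == p[1])

print(sorted(bucket_map_join(order_buckets, 0, payment_buckets, 1)) == naive)
print(sorted(smb_join(order_buckets, 0, payment_buckets, 1)) == naive)
```

Both functions return the same rows as the nested-loop reference. The difference is in memory use: bucket_map_join holds one bucket of the smaller table in a hash table at a time, while smb_join only keeps a cursor on each sorted bucket, which is why bucket size stops being a constraint.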
