电商网站建设策划方案,怎么在网上发布广告,营销策划公司排名,物流公司网站建设摘要#xff1a; 标签 PostgreSQL , Greenplum , varbitx , roaring bitmap , pilosa , varbit , hll , 多阶段聚合 背景 roaring bitmap是一个压缩比很高同时性能不错的BIT库#xff0c;被广泛使用#xff08;例如Greenplum, ES, InfluxDB. 点此查看原文 标签 PostgreSQL ,… 摘要 标签 PostgreSQL , Greenplum , varbitx , roaring bitmap , pilosa , varbit , hll , 多阶段聚合 背景 roaring bitmap是一个压缩比很高同时性能不错的BIT库被广泛使用例如Greenplum, ES, InfluxDB. 点此查看原文 标签 PostgreSQL , Greenplum , varbitx , roaring bitmap , pilosa , varbit , hll , 多阶段聚合 背景 roaring bitmap是一个压缩比很高同时性能不错的BIT库被广泛使用例如Greenplum, ES, InfluxDB...... Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. They are used by several major systems such as Apache Lucene and derivative systems such as Solr and Elasticsearch, Metamarkets Druid, LinkedIn Pinot, Netflix Atlas, Apache Spark, OpenSearchServer, Cloud Torrent, Whoosh, InfluxDB, Pilosa, Bleve, Microsoft Visual Studio Team Services (VSTS), and eBays Apache Kylin. 《Roaring Bitmap - A better compressed bitset》 https://github.com/RoaringBitmap/CRoaring 在PostgreSQL中内置了varbit的数据类型阿里云在其基础上扩展了对varbit的操作符 《阿里云RDS for PostgreSQL varbitx插件与实时画像应用场景介绍》 是的阿里云RDS PG支持以更低的成本、更高的性能支持海量画像的实时计算 《阿里云RDS PostgreSQL varbitx实践 - 流式标签 (阅后即焚流式批量计算) - 万亿级任意标签圈人毫秒响应》 《基于 阿里云 RDS PostgreSQL 打造实时用户画像推荐系统(varbitx)》 《惊天性能单RDS PostgreSQL实例 支撑 2000亿 - 实时标签透视案例》 对于Greenplum同样有社区的朋友贡献的插件让Greenplum可以支持roaringbitmap类型。 开源代码如下感谢贡献代码的小伙伴 https://github.com/zeromax007/gpdb-roaringbitmap 目前这个版本没有将聚合做到计算节点而是走了gather motion再聚合的方式聚合性能不佳。
postgres# explain select rb_cardinality(rb_and_agg(bitmap)) from t1; QUERY PLAN
---------------------------------------------------------------------------------------- Aggregate (cost1.05..1.07 rows1 width4) - Gather Motion 3:1 (slice1; segments: 3) (cost0.00..1.05 rows1 width1254608) - Seq Scan on t1 (cost0.00..1.01 rows1 width1254608)
(3 rows) Time: 0.727 ms 建议有兴趣的同学可以改进一下 roaringbitmap for Greenplum 聚合代码改成多阶段聚合让聚合先在计算节点做。 自定义分布式聚合的方法参考如下 《PostgreSQL 10 自定义并行计算聚合函数的原理与实践》 《Postgres-XC customized aggregate introduction》 《PostgreSQL aggregate function customize》 《Greenplum 最佳实践 - 估值插件hll的使用(以及hll分式聚合函数优化)》 接下来简单介绍一下roaringbitmap的安装与功能点。 安装 1、首先你需要安装好greenplum。 2、然后你需要下载gpdb-roaringbitmap
git clone https://github.com/zeromax007/gpdb-roaringbitmap 3、编译gpdb-roaringbitmap
If $GPHOME is /usr/local/gpdb . gcc -marchnative -O3 -stdc11 -Wall -Wpointer-arith -Wendif-labels -Wformat-security \
-fno-strict-aliasing -fwrapv -fexcess-precisionstandard -fno-aggressive-loop-optimizations \
-Wno-unused-but-set-variable -Wno-address -fpic -D_GNU_SOURCE \
-I/usr/local/gpdb/include/postgresql/server \
-I/usr/local/gpdb/include/postgresql/internal \
-c -o roaringbitmap.o roaringbitmap.c 或如下主要看你的头文件在哪里 gcc -marchnative -O3 -stdc11 -Wall -Wpointer-arith -Wendif-labels -Wformat-security \
-fno-strict-aliasing -fwrapv -fexcess-precisionstandard -fno-aggressive-loop-optimizations \
-Wno-unused-but-set-variable -Wno-address -fpic -D_GNU_SOURCE \
-I/usr/local/gpdb/include/server \
-I/usr/local/gpdb/include/internal \
-c -o roaringbitmap.o roaringbitmap.c gcc -O3 -stdgnu99 -Wall -Wpointer-arith -Wendif-labels -Wformat-security \
-fno-strict-aliasing -fwrapv -fexcess-precisionstandard -fno-aggressive-loop-optimizations \
-Wno-unused-but-set-variable -Wno-address -fpic -shared --enable-new-dtags \
-o roaringbitmap.so roaringbitmap.o 4、将so文件拷贝到所有gpdb节点(所有master, slave, segment, mirror等)的软件目录对应的lib目录中.
cp ./roaringbitmap.so /usr/local/gpdb/lib/postgresql/ 5、在MASTER节点连接到需要使用roaringbitmap的DB中执行如下SQL安装对应的类型操作符函数等。
psql -f ./roaringbitmap.sql 使用DEMO 1、建表使用roaringbitmap数据类型
CREATE TABLE t1 (id integer, bitmap roaringbitmap); 2、使用rb_build生成roaringbitmap的数据(输入为数组输出为roaringbitmap。含义数组位置对应的bit值设置为1)。
INSERT INTO t1 SELECT 1,RB_BUILD(ARRAY[1,2,3,4,5,6,7,8,9,200]); -- 将输入的多条记录的值对应位置的BIT值设置为1最后聚合为一个roaringbitmap INSERT INTO t1 SELECT 2,RB_BUILD_AGG(e) FROM GENERATE_SERIES(1,100) e; 3、两个roaringbitmap的BIT计算OR, AND, XOR, ANDNOT。andnot表示第一个参数与第二个参数的NOT进行AND操作等同于andnot(c1,c2)and(c1, not(c2))
SELECT RB_OR(a.bitmap,b.bitmap) FORM (SELECT bitmap FROM t1 WHERE id 1) AS a, (SELECT bitmap FROM t1 WHERE id 2) AS b; 4、一些聚合操作并生成新的roaringbitmap (OR, AND, XOR, BUILD)
SELECT RB_OR_AGG(bitmap) FROM t1;
SELECT RB_AND_AGG(bitmap) FORM t1;
SELECT RB_XOR_AGG(bitmap) FROM t1;
SELECT RB_BUILD_AGG(e) FROM GENERATE_SERIES(1,100) e; 5、Cardinality即roaringbitmap中包含多少个位置为1的BIT位。
SELECT RB_CARDINALITY(bitmap) FROM t1; 6、从roaringbitmap返回位置为1的BIT的下标位置值。
SELECT RB_ITERATE(bitmap) FROM t1 WHERE id 1; postgres# select rb_iterate(rb_build({1,4,100})); rb_iterate
------------ 1 4 100
(3 rows) 7、一些bit设置操作
postgres# select rb_iterate(rb_flip(rb_build({1,2,3,100,4,5}),7,10)); rb_iterate
------------ 1 2 3 4 5 7 8 9 100
(9 rows) 内置计算函数说明 List of functions Schema | Name | Result data type | Argument data types | Type
---------------------------------------------------------------------------------------------------------- public | rb_and | roaringbitmap | roaringbitmap, roaringbitmap | normal public | rb_and_cardinality | integer | roaringbitmap, roaringbitmap | normal public | rb_andnot | roaringbitmap | roaringbitmap, roaringbitmap | normal public | rb_andnot_cardinality | integer | roaringbitmap, roaringbitmap | normal public | rb_build | roaringbitmap | integer[] | normal public | rb_cardinality | integer | roaringbitmap | normal public | rb_equals | boolean | roaringbitmap, roaringbitmap | normal public | rb_flip | roaringbitmap | roaringbitmap, integer, integer | normal public | rb_intersect | boolean | roaringbitmap, roaringbitmap | normal public | rb_is_empty | boolean | roaringbitmap | normal public | rb_iterate | SETOF integer | roaringbitmap | normal public | rb_maximum | integer | roaringbitmap | normal public | rb_minimum | integer | roaringbitmap | normal public | rb_or | roaringbitmap | roaringbitmap, roaringbitmap | normal public | rb_or_cardinality | integer | roaringbitmap, roaringbitmap | normal public | rb_rank | integer | roaringbitmap, integer | normal public | rb_remove | roaringbitmap | roaringbitmap, integer | normal public | rb_xor | roaringbitmap | roaringbitmap, roaringbitmap | normal public | rb_xor_cardinality | integer | roaringbitmap, roaringbitmap | normal Function Input Output Desc Example rb_build integer[] roaringbitmap Build a roaringbitmap tuple from integer array. rb_build({1,2,3,4,5}) rb_and roraingbitmap,roaringbitmap roaringbitmap Two roaringbitmap tuples and calculation. rb_and(rb_build({1,2,3}),rb_build({3,4,5})) rb_or roraingbitmap,roaringbitmap roaringbitmap Two roaringbitmap tuples or calculation. rb_or(rb_build({1,2,3}),rb_build({3,4,5})) rb_xor roraingbitmap,roaringbitmap roaringbitmap Two roaringbitmap tuples xor calculation. rb_xor(rb_build({1,2,3}),rb_build({3,4,5})) rb_andnot roraingbitmap,roaringbitmap roaringbitmap Two roaringbitmap tuples andnot calculation. rb_andnot(rb_build({1,2,3}),rb_build({3,4,5})) rb_cardinality roraingbitmap integer Retrun roaringbitmap tuples cardinality. rb_cardinality(rb_build({1,2,3,4,5})) rb_and_cardinality roraingbitmap,roaringbitmap integer Two roaringbitmap tuples and calculation, return cardinality. rb_and_cardinality(rb_build({1,2,3}),rb_build({3,4,5})) rb_or_cardinality roraingbitmap,roaringbitmap integer Two roaringbitmap tuples or calculation, return cardinality. rb_or_cardinality(rb_build({1,2,3}),rb_build({3,4,5})) rb_xor_cardinality roraingbitmap,roaringbitmap integer Two roaringbitmap tuples xor calculation, return cardinality. rb_xor_cardinality(rb_build({1,2,3}),rb_build({3,4,5})) rb_andnot_cardinality roraingbitmap,roaringbitmap integer Two roaringbitmap tuples andnot calculation, return cardinality. rb_andnot_cardinality(rb_build({1,2,3}),rb_build({3,4,5})) rb_is_empty roraingbitmap boolean Check if roaringbitmap tuple is empty. rb_is_empty(rb_build({1,2,3,4,5})) rb_equals roraingbitmap,roaringbitmap boolean Check two roaringbitmap tuples are equal. rb_equals(rb_build({1,2,3}),rb_build({3,4,5})) rb_intersect roraingbitmap,roaringbitmap boolean Check two roaringbitmap tuples are intersect. rb_intersect(rb_build({1,2,3}),rb_build({3,4,5})) rb_remove roraingbitmap,integer roraingbitmap Remove the specified offset from roaringbitmap tuple. rb_remove(rb_build({1,2,3}),3) rb_flip roraingbitmap,integer,integer roraingbitmap Flip the specified offsets range (not include the end) from roaringbitmap tuple. rb_flip(rb_build({1,2,3}),7,10) -- 翻转BIT位置为7到10不含10的BIT值 rb_minimum roraingbitmap integer Return the smallest offset in roaringbitmap tuple. Return UINT32_MAX if the bitmap tuple is empty. rb_minimum(rb_build({1,2,3})) -- 返回该roaringbitmap中bit值设置为1的最小位置 rb_maximum roraingbitmap integer Return the greatest offset in roaringbitmap tuple. Return 0 if the bitmap tuple is empty. rb_maximum(rb_build({1,2,3})) -- 返回该roaringbitmap中bit值设置为1的最大位置 rb_rank roraingbitmap,integer integer Return the number of offsets that are smaller or equal to the specified offset. rb_rank(rb_build({1,2,3}),3) -- BIT位置小于等于N的BIT中有多少个bit位置被设置为1 rb_iterate roaringbitmap SETOF integer Bitmap to SETOF integer rb_iterate(rb_build({1,2,3,100})) 内置聚合函数说明 List of functions Schema | Name | Result data type | Argument data types | Type
---------------------------------------------------------------------------------------------------------------------- public | rb_and_agg | roaringbitmap | roaringbitmap | agg public | rb_and_cardinality_agg | integer | roaringbitmap | agg public | rb_build_agg | roaringbitmap | integer | agg public | rb_or_agg | roaringbitmap | roaringbitmap | agg public | rb_or_cardinality_agg | integer | roaringbitmap | agg public | rb_xor_agg | roaringbitmap | roaringbitmap | agg public | rb_xor_cardinality_agg | integer | roaringbitmap | agg Function Input Output Desc Example rb_build_agg integer roraingbitmap Build a roaringbitmap tuple from a integer set. rb_build_agg(1) rb_or_agg roraingbitmap roraingbitmap Or Aggregate calculations from a roraingbitmap set. rb_or_agg(rb_build({1,2,3})) rb_and_agg roraingbitmap roraingbitmap And Aggregate calculations from a roraingbitmap set. rb_and_agg(rb_build({1,2,3})) rb_xor_agg roraingbitmap roraingbitmap Xor Aggregate calculations from a roraingbitmap set. rb_xor_agg(rb_build({1,2,3})) rb_or_cardinality_agg roraingbitmap integer Or Aggregate calculations from a roraingbitmap set, return cardinality. rb_or_cardinality_agg(rb_build({1,2,3})) rb_and_cardinality_agg roraingbitmap integer And Aggregate calculations from a roraingbitmap set, return cardinality. rb_and_cardinality_agg(rb_build({1,2,3})) rb_xor_cardinality_agg roraingbitmap integer Xor Aggregate calculations from a roraingbitmap set, return cardinality. rb_xor_cardinality_agg(rb_build({1,2,3})) 例子 《惊天性能单RDS PostgreSQL实例 支撑 2000亿 - 实时标签透视案例》 背景 有20亿个BIT有几千万的标签。意味着有几千万行每一行有20亿个BIT组成的roaringbitmap。 求任意标签组合的cardinate. (rb_???_cardinality_agg) 设计 数据按标签字段分布
create table tbl (tagid int primary key, bitmap roaringbitmap)
distributed by (tagid) ; SQL 1、求合并的BIT中有多少为1的BIT
select rb_and_cardinality_agg(bitmap) from tbl where tagid in (?,......?); 2、求合并的BIT对应的BIT位置
select RB_ITERATE(rb) from (select rb_and_agg(bitmap) as rb from tbl where tagid in(1,2,3)) t; 加速 由于目前roaringbitmap gp这个插件没有支持agg中的prefunc所以聚合是收集到master节点操作的这个势必影响性能。
postgres# explain select rb_and_cardinality_agg(bitmap) from tbl where tagid in (1,2,3,4,5,6,7,8); QUERY PLAN
----------------------------------------------------------------------------------- Aggregate (cost0.04..0.06 rows1 width4) - Gather Motion 3:1 (slice1; segments: 3) (cost0.00..0.04 rows1 width32) - Seq Scan on tbl (cost0.00..0.00 rows1 width32) Filter: tagid ANY ({1,2,3,4,5,6,7,8}::integer[])
(4 rows) postgres# explain select RB_ITERATE(rb) from (select rb_and_agg(bitmap) as rb from tbl where tagid in(1,2,3)) t; QUERY PLAN
----------------------------------------------------------------------------------------- Result (cost0.04..0.07 rows3 width32) - Aggregate (cost0.04..0.06 rows1 width32) - Gather Motion 3:1 (slice1; segments: 3) (cost0.00..0.04 rows1 width32) - Seq Scan on tbl (cost0.00..0.00 rows1 width32) Filter: tagid ANY ({1,2,3}::integer[])
(5 rows) 为了加速务必要实现这些聚合函数的prefunc。 Greenplum支持的两种聚合运算模式 1. 如果只配置了sfunc则相关数据全部收集到master节点在master节点对所有数据依条加上sfunc的结果(第一次可选为initcond)输入给sfunc计算直到所有数据都跑完sfunc最后如果设置了finalfunc则计算并得到最终结果。 2. 如果同时配置了sfunc和prefunc则在segment节点并行完成sfunc然后将segment节点执行的结果发给master在master调用prefunc进行再次聚合输出结果如果配置了finalfunc则这个结果再给finalfunc执行并输出最终结果。 优化例子
//bitmap and trans
PG_FUNCTION_INFO_V1(rb_and_trans_pre);
Datum rb_and_trans_pre(PG_FUNCTION_ARGS); Datum
rb_and_trans_pre(PG_FUNCTION_ARGS) { MemoryContext aggctx; roaring_bitmap_t *r1; roaring_bitmap_t *r2; // We must be called as a transition routine or we fail. if (!AggCheckCallContext(fcinfo, aggctx)) ereport(ERROR, (errcode(ERRCODE_DATA_EXCEPTION), errmsg(rb_and_trans outside transition context))); // Is the first argument a NULL? if (PG_ARGISNULL(0)) { r1 setup_roaringbitmap(aggctx); } else { r1 (roaring_bitmap_t *) PG_GETARG_POINTER(0); } // Is the second argument non-null? if (!PG_ARGISNULL(1)) { r2 (roaring_bitmap_t *) PG_GETARG_POINTER(1); if (PG_ARGISNULL(0)) { r1 roaring_bitmap_copy(r2); } else { roaring_bitmap_and_inplace(r1, r2); } roaring_bitmap_free(r2); } PG_RETURN_POINTER(r1);
} CREATE OR REPLACE FUNCTION rb_and_trans_pre(internal, internal) RETURNS internal AS roaringbitmap.so, rb_and_trans_pre LANGUAGE C IMMUTABLE; CREATE AGGREGATE rb_and_agg(roaringbitmap)( SFUNC rb_and_trans, PREFUNC rb_and_trans_pre, STYPE internal, FINALFUNC rb_serialize
); 实现prefunc后执行计划就会变成这样的先在计算节点执行一阶段聚合然后再到master执行第二阶段的聚合效率明显提升。
postgres# explain select RB_ITERATE(rb) from (select rb_and_agg(bitmap) as rb from tbl where tagid in(1,2,3)) t;QUERY PLAN
----------------------------------------------------------------------------------------Result (cost0.07..0.10 rows3 width32)- Aggregate (cost0.07..0.08 rows1 width32)- Gather Motion 3:1 (slice1; segments: 3) (cost0.01..0.06 rows1 width4)- Aggregate (cost0.01..0.01 rows1 width4)- Seq Scan on tbl (cost0.00..0.00 rows1 width32)Filter: tagid ANY ({1,2,3}::integer[])
(6 rows)postgres# explain select rb_and_agg(bitmap) from tbl where tagid in (1,2,3,4,5,6,7,8);QUERY PLAN
----------------------------------------------------------------------------------Aggregate (cost0.07..0.08 rows1 width32)- Gather Motion 3:1 (slice1; segments: 3) (cost0.01..0.06 rows1 width4)- Aggregate (cost0.01..0.01 rows1 width4)- Seq Scan on tbl (cost0.00..0.00 rows1 width32)Filter: tagid ANY ({1,2,3,4,5,6,7,8}::integer[])
(5 rows)小结 gpdb-roaringbitmap是一个很好的插件可以帮助用户高效的实现多组标签的人群圈选。 目前需要实现prefunc来支持多阶段聚合否则只能gather到master去聚合。文中有例子。 参考 《PostgreSQL (varbit, roaring bitmap) VS pilosa(bitmap库)》 《Roaring Bitmap - A better compressed bitset》 《阿里云RDS PostgreSQL varbitx实践 - 流式标签 (阅后即焚流式批量计算) - 万亿级任意标签圈人毫秒响应》 《基于 阿里云 RDS PostgreSQL 打造实时用户画像推荐系统(varbitx)》 《阿里云RDS for PostgreSQL varbitx插件与实时画像应用场景介绍》 《Greenplum 最佳实践 - 估值插件hll的使用(以及hll分式聚合函数优化)》 《PostgreSQL 10 自定义并行计算聚合函数的原理与实践》 《Postgres-XC customized aggregate introduction》 《PostgreSQL aggregate function customize》 https://github.com/RoaringBitmap/CRoaring https://github.com/zeromax007/gpdb-roaringbitmap 《惊天性能单RDS PostgreSQL实例 支撑 2000亿 - 实时标签透视案例》 扫描二维码获取更多消息