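Unlike the MergeTree table engine, ReplacingMergeTree removes duplicate rows that share the same sorting key value (the value of the ORDER BY expression, not the partition key).
Deduplication happens only when data parts are merged in the background. Background merges run at an unpredictable time, so some rows may remain unprocessed. You can trigger a merge manually with OPTIMIZE, but the official documentation advises against counting on it, because OPTIMIZE reads and writes a large amount of data (probably because it has to merge everything again from scratch).
ReplacingMergeTree is therefore suited to clearing out duplicates in the background to save space, but it does not guarantee that no duplicates remain (that is the official wording, not mine).
Creating a table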
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = ReplacingMergeTree([ver [, is_deleted]])
[PARTITION BY expr]
[ORDER BY expr]
[PRIMARY KEY expr]
[SAMPLE BY expr]
[SETTINGS name=value, clean_deleted_rows=value, ...]
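Table creation parameters
ver
Optional. Column type: UInt*, Date, DateTime or DateTime64.
This column decides which row is kept when rows with the same sorting key are merged.
Rule 1: if ver is not set, the most recently inserted row is kept.
Rule 2: if ver is set, the row with the largest ver value is kept. If several rows share the largest ver, the most recently inserted of them is kept.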
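The two rules above can be sketched as a toy Python model. This is only an illustration of the selection logic, not ClickHouse code; the function name pick_survivor and the dict-based rows are made up for this sketch, and the timestamps are kept as strings because ISO-formatted timestamps compare correctly as text.

```python
from typing import Dict, List, Optional


def pick_survivor(rows: List[Dict], ver: Optional[str] = None) -> Dict:
    """Return the row a merge would keep among rows sharing one sorting key.

    `rows` must be given in insertion order. Hypothetical helper, not a
    ClickHouse API.
    """
    if ver is None:
        # Rule 1: no version column, so the last inserted row wins.
        return rows[-1]
    # Rule 2: the largest version wins. Comparing (version, position)
    # makes the later insert win when versions are equal.
    best = max(range(len(rows)), key=lambda i: (rows[i][ver], i))
    return rows[best]


rows = [
    {"key": 1, "someCol": "first", "eventTime": "2020-01-01 01:01:01"},
    {"key": 1, "someCol": "second", "eventTime": "2020-01-01 00:00:00"},
]

print(pick_survivor(rows)["someCol"])               # second
print(pick_survivor(rows, "eventTime")["someCol"])  # first
```

The two printed values correspond to a table declared without ver and a table declared with ver set to eventTime.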
Example
-- without ver - the last inserted 'wins'
CREATE TABLE myFirstReplacingMT
(
    key Int64,
    someCol String,
    eventTime DateTime
)
ENGINE = ReplacingMergeTree
ORDER BY key;

INSERT INTO myFirstReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO myFirstReplacingMT Values (1, 'second', '2020-01-01 00:00:00');

SELECT * FROM myFirstReplacingMT FINAL;

┌─key─┬─someCol─┬───────────eventTime─┐
│   1 │ second  │ 2020-01-01 00:00:00 │
└─────┴─────────┴─────────────────────┘
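The query above uses FINAL because each INSERT writes its own data part, and duplicates only disappear once those parts are merged. A rough Python sketch of that behaviour, assuming a table without ver; the list-of-parts representation and the merge_parts helper are invented for illustration and say nothing about how ClickHouse stores data internally:

```python
def merge_parts(parts):
    """Merge all parts into one, keeping the last inserted row per key.

    Toy model only: `parts` is a list of parts in creation order, each
    part a list of (key, value) rows in insertion order.
    """
    merged = {}
    for part in parts:
        for key, value in part:
            merged[key] = value  # a later row overwrites an earlier one
    return [sorted(merged.items())]


def read_all(parts):
    """What a plain SELECT (without FINAL) sees: every row of every part."""
    return [row for part in parts for row in part]


# Two INSERTs, two parts, same sorting key.
parts = [[(1, "first")], [(1, "second")]]

before_merge = read_all(parts)
print(before_merge)  # [(1, 'first'), (1, 'second')]

parts = merge_parts(parts)
after_merge = read_all(parts)
print(after_merge)   # [(1, 'second')]
```

Until the merge runs, both rows are visible to a plain read, which is why the duplicates cannot be assumed gone at any given moment.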
-- with ver - the row with the biggest ver 'wins'
CREATE TABLE mySecondReplacingMT
(
    key Int64,
    someCol String,
    eventTime DateTime
)
ENGINE = ReplacingMergeTree(eventTime)
ORDER BY key;

INSERT INTO mySecondReplacingMT Values (1, 'first', '2020-01-01 01:01:01');
INSERT INTO mySecondReplacingMT Values (1, 'second', '2020-01-01 00:00:00');

SELECT * FROM mySecondReplacingMT FINAL;

┌─key─┬─someCol─┬───────────eventTime─┐
│   1 │ first   │ 2020-01-01 01:01:01 │
└─────┴─────────┴─────────────────────┘

is_deleted
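is_deleted can be set only when ver is set. It marks whether the row has been deleted: 1 means a "deleted" row, 0 means a "state" row.
To actually remove the data, run OPTIMIZE ... FINAL CLEANUP or OPTIMIZE ... FINAL, or set the table engine setting clean_deleted_rows to Always.
Example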
-- with ver and is_deleted
CREATE OR REPLACE TABLE myThirdReplacingMT
(
    key Int64,
    someCol String,
    eventTime DateTime,
    is_deleted UInt8
)
ENGINE = ReplacingMergeTree(eventTime, is_deleted)
ORDER BY key;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 0);
INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 01:01:01', 1);

select * from myThirdReplacingMT final;

0 rows in set. Elapsed: 0.003 sec.

-- delete the rows whose is_deleted is 1
OPTIMIZE TABLE myThirdReplacingMT FINAL CLEANUP;

INSERT INTO myThirdReplacingMT Values (1, 'first', '2020-01-01 00:00:00', 0);

select * from myThirdReplacingMT final;

┌─key─┬─someCol─┬───────────eventTime─┬─is_deleted─┐
│   1 │ first   │ 2020-01-01 00:00:00 │          0 │
└─────┴─────────┴─────────────────────┴────────────┘
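The example above shows why the cleanup step matters: once the delete marker is physically gone, a row with an older eventTime becomes visible again. A small Python sketch of that sequence, under the assumption that a delete marker behaves like any other version of the row until it is cleaned up; all function names here are hypothetical, and the model ignores parts, partitions and everything else ClickHouse actually does:

```python
def surviving_rows(rows):
    """Keep one row per key: largest ver wins, later insert wins a tie."""
    latest = {}
    for row in rows:  # rows are in insertion order
        current = latest.get(row["key"])
        if current is None or row["ver"] >= current["ver"]:
            latest[row["key"]] = row
    return list(latest.values())


def select_final(rows):
    """Toy `SELECT ... FINAL`: hide keys whose surviving row is a delete marker."""
    return [r for r in surviving_rows(rows) if r["is_deleted"] == 0]


def optimize_final_cleanup(rows):
    """Toy `OPTIMIZE ... FINAL CLEANUP`: drop superseded rows and delete markers."""
    return select_final(rows)


older = {"key": 1, "ver": "2020-01-01 00:00:00", "is_deleted": 0}
table = [
    {"key": 1, "ver": "2020-01-01 01:01:01", "is_deleted": 0},
    {"key": 1, "ver": "2020-01-01 01:01:01", "is_deleted": 1},
]

print(select_final(table))  # [] because the delete marker wins

# Without cleanup, the older row still loses to the delete marker.
without_cleanup = select_final(table + [older])
print(without_cleanup)      # []

# With cleanup, the marker is gone and the older row is visible.
with_cleanup = select_final(optimize_final_cleanup(table) + [older])
print(with_cleanup)         # [{'key': 1, 'ver': '2020-01-01 00:00:00', 'is_deleted': 0}]
```

In this model the only difference between the last two results is whether the cleanup ran before the older row was inserted.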