电商设计素材网站推荐,百度一下下载,杭州编程培训机构排名,wordpress options简介#xff1a; 本篇介绍子查询、分析表和JOIN的复杂转换过程
一 背景和架构
在《庖丁解牛-图解MySQL 8.0优化器查询解析篇》一文中我们重点介绍了MySQL最新版本8.0.25关于SQL基本元素表、列、函数、聚合、分组、排序等元素的解析、设置和转换过程#xff0c;本篇我们继续…简介 本篇介绍子查询、分析表和JOIN的复杂转换过程
一 背景和架构
在《庖丁解牛-图解MySQL 8.0优化器查询解析篇》一文中我们重点介绍了MySQL最新版本8.0.25关于SQL基本元素表、列、函数、聚合、分组、排序等元素的解析、设置和转换过程本篇我们继续来介绍更为复杂的子查询、分区表和JOIN的复杂转换过程大纲如下
Transformation
remove_redundant_subquery_clause :Permanently remove redundant parts from the query if 1) This is a subquery 2) Not normalizing a view. Removal should take place when a query involving a view is optimized, not when the view is created.remove_base_options:Remove SELECT_DISTINCT options from a query block if can skip distinctresolve_subquery :Resolve predicate involving subquery, perform early unconditional subquery transformationsConvert subquery predicate into semi-join, orMark the subquery for execution using materialization, orPerform IN-EXISTS transformation, orPerform more/less ALL/ANY - MIN/MAX rewriteSubstitute trivial scalar-context subquery with its valuetransform_scalar_subqueries_to_join_with_derived:Transform eligible scalar subqueries to derived tables.flatten_subqueries :Convert semi-join subquery predicates into semi-join join nests. Convert candidate subquery predicates into semi-join join nests. This transformation is performed once in query lifetime and is irreversible.apply_local_transforms :delete_unused_merged_columns : If query block contains one or more merged derived tables/views, walk through lists of columns in select lists and remove unused columns.simplify_joins : Convert all outer joins to inner joins if possible.prune_partitions Perform partition pruning for a given table and condition.push_conditions_to_derived_tables :Pushing conditions down to derived tables must be done after validity checks of grouped queries done by apply_local_transforms();Window::eliminate_unused_objects:Eliminate unused window definitions, redundant sorts etc.
二 详细转换过程
1 解析子查询resolve_subquery
解析条件中带有子查询的语句做一些早期的无限制的子查询转换包括
标记subquery是否变成semi-join
转换判断条件
检查OPTIMIZER_SWITCH_SEMIJOIN和HINT没有限制子查询是IN/ANY和EXIST subquery的谓词子查询是简单查询块而不是UNION子查询无隐形和显性的GROUP BY子查询没有HAVING、WINDOW函数Resolve的阶段是Query_block::RESOLVE_CONDITION和Query_block::RESOLVE_JOIN_NEST并且没有用到最新的Hyper optimizer优化器。外查询块可以支持semijoins至少要一个表而不是类似SELECT 1子查询的策略还没有指定Subquery_strategy::UNSPECIFIED父查询也至少有一个表父查询和子查询都不能有straight join父查询块不禁止semijoinIN谓词返回值是否是确定的不是RAND根据子查询判断结果是否需要转成true还是false以及是否为NULL判断是可以做antijoin还是semijoinAntijoin是可以支持的或者是semijoinoffset和limit对于semjoin是有效的offset是从第一行开始limit也不是0
设置Subquery_strategy::CANDIDATE_FOR_SEMIJOIN并添加sj_candidates
标记subquery是否执行时采用materialization方案如果不符合转换semijoin尝试使用物化方式转换判断条件Optimzier开关subquery_to_derivedon子查询是IN/ANY or EXISTS谓词子查询是简单查询块而不是UNION如果是[NOT] EXISTS必须没有聚合Subquery谓词在WHERE子句目前没有在ON子句实现而且是ANDs or ORs的表达式tree父查询块支持semijoins子查询的策略还没有指定Subquery_strategy::UNSPECIFIED父查询也至少有一个表然后可以做LEFT JOIN父查询块不禁止semijoinIN谓词返回值是否是确定的不是RAND根据子查询判断结果是否需要转成true还是false以及是否为NULL判断是可以做antijoin还是semijoin不支持左边参数不是multi-column子查询WHERE (outer_subq) ROW(derived.col1,derived.col2)该子查询不支持转换为Derived tablem_subquery_to_derived_is_impossible设置Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE并添加sj_candidates如果上面两个策略无法使用根据类型选择transformerItem_singlerow_subselect::select_transformer对于简单的标量子查询在查询中直接用执行结果代替
select * from t1 where a (select 1); select * from t1 where a 1;Item_in_subselect/Item_allany_subselect::select_transformer-select_in_like_transformer
select_in_like_transformer函数来处理 IN/ALL/ANY/SOME子查询转换transformation处理SELECT 1Item_in_optimizer如果目前还没有子查询的执行方式也就是无法使用semijoin/antijoin执行的子查询会做IN-EXISTS的转换本质是在物化执行和迭代式循环执行中做选择。IN语法代表非相关子查询仅执行一次将查询结果物化成临时表之后需要结果时候就去物化表中查找EXISTS代表对于外表的每一条记录子查询都会执行一次是迭代式循环执行。子查询策略设定为Subquery_strategy::CANDIDATE_FOR_IN2EXISTS_OR_MAT重写single-column的IN/ALL/ANY子查询single_value_transformer
oe $cmp$ (SELECT ie FROM ... WHERE subq_where ... HAVING subq_having)- oe $cmp$ (SELECT MAX(...) ) // handled by Item_singlerow_subselect
- oe $cmp$ \max\(SELECT ...) // handled by Item_maxmin_subselect
failsItem_in_optimizer
- 对于已经是materialized方案不转换
- 通过equi-join转换IN到EXISTS如果是ALL/ANY单值subquery谓词尝试用MIN/MAX子查询转换
SELECT * FROM t1 WHERE a ANY (SELECT a FROM t1); SELECT * FROM t1 WHERE a (SELECT MAX(a) FROM t1)不满足上面调用single_value_in_to_exists_transformer转换IN到EXISTS转换将要将子查询设置为相关子查询设置UNCACHEABLE_DEPENDENT标识如果子查询包含聚合函数、窗口函数、GROUP语法、HAVING语法将判断条件加入到HAVING子句中另外通过ref_or_null_helper来区分NULL和False的结果如需要处理NULL IN (SELECT ...)还需要封装到Item_func_trig_cond触发器中。
SELECT ... FROM t1 WHERE t1.b IN (SELECT expr of SUM(t1.a) FROM t2)SELECT ... FROM t1 WHERE t1.b IN (SELECT expr of SUM(t1.a) FROM t2[trigcond] HAVING t1.bref-to-expr of SUM(t1.a))如果子查询不包含聚合函数、窗口函数、GROUP语法会放在WHERE查询条件中当然如果需要处理NULL情况还是要放入HAVING子句Item_func_trig_condItem_is_not_null_test。
不需要区分NULL和FALSE的子查询:
SELECT 1 FROM ... WHERE (oe $cmp$ ie) AND subq_where
需要区分的子查询:
SELECT 1 FROM ...WHERE subq_where AND trigcond((oe $cmp$ ie) OR (ie IS NULL))HAVING trigcond(is_not_null_test(ie))JOIN::optimize()会计算materialization和EXISTS转换的代价进行选择设置m_subquery_to_derived_is_impossible trueROW值转换通过Item_in_optimizer不支持ALL/ANY/SOMErow_value_transformerItem_in_subselect::row_value_in_to_exists_transformer
for (each left operand)create the equi-join conditionif (is_having_used || !abort_on_null)create the is null and is_not_null_test itemsif (is_having_used)add the equi-join and the null tests to HAVINGelseadd the equi-join and the is null to WHEREadd the is_not_null_test to HAVING没有HAVING表达式
(l1, l2, l3) IN (SELECT v1, v2, v3 ... WHERE where)
EXISTS (SELECT ... WHERE where and(l1 v1 or is null v1) and(l2 v2 or is null v2) and(l3 v3 or is null v3)[ HAVING is_not_null_test(v1) andis_not_null_test(v2) andis_not_null_test(v3)) ] -- 保证不为NULL可以去掉HAVING有HAVING表达式
(l1, l2, l3) IN (SELECT v1, v2, v3 ... HAVING having)
EXISTS (SELECT ... HAVING having and(l1 v1 or is null v1) and(l2 v2 or is null v2) and(l3 v3 or is null v3) andis_not_null_test(v1) andis_not_null_test(v2) andis_not_null_test(v3))2 转换的标量子查询转换成Derived Tabletransform_scalar_subqueries_to_join_with_derived
该特性是官方在8.0.16中为了更好的支持Secondary Engine(Heapwave)的分析下推增强了子查询的转换能力。可以先直观的看下转换和不转换的执行计划的不同
root:test set optimizer_switch subquery_to_derivedoff;
Query OK, 0 rows affected (0.00 sec)
root:test EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma (SELECT MAX(t2.a) FROM t2 WHERE t2.bt4.b);
----------------------------------------------------------------------------------------------------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
----------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary |
| 2 | DEPENDENT SUBQUERY | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 33.33 | Using where |
----------------------------------------------------------------------------------------------------------------------
2 rows in set, 3 warnings (0.00 sec)
root:test set optimizer_switch subquery_to_derivedon;
Query OK, 0 rows affected (0.00 sec)
root:test EXPLAIN SELECT b, MAX(a) AS ma FROM t4 GROUP BY b HAVING ma (SELECT MAX(t2.a) FROM t2 WHERE t2.bt4.b);
-----------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | t4 | NULL | ALL | NULL | NULL | NULL | NULL | 10 | 100.00 | Using temporary |
| 1 | PRIMARY | derived2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using where; Using join buffer (hash join) |
| 2 | DERIVED | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 3 | 100.00 | Using temporary |
-----------------------------------------------------------------------------------------------------------------------------------------------
3 rows in set, 3 warnings (0.01 sec)transform_scalar_subqueries_to_join_with_derived具体转换的过程如下首先从JOIN条件、WHERE条件、HAVING条件和SELECT list中收集可以转换的标量子查询Item::collect_scalar_subqueries。遍历这些子查询判断是否可以增加一个额外的转换transform_grouped_to_derived把隐性的GROUP BY标量子查询变成Derived Table。
SELECT SUM(c1), (SELECT SUM(c1) FROM t3) scalar FROM t1;
转换为
SELECT derived0.summ, derived1.scalar
FROM (SELECT SUM(a) AS summ FROM t1) AS derived0LEFT JOIN(SELECT SUM(b) AS scalar FROM t3) AS derived1ON TRUE
执行计划如下
explain SELECT SUM(a), (SELECT SUM(c1) FROM t3) scalar FROM t1;
-----------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 1 | PRIMARY | derived3 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 1 | PRIMARY | derived4 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Using join buffer (hash join) |
| 4 | DERIVED | t3 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | NULL |
| 3 | DERIVED | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 100.00 | NULL |
-----------------------------------------------------------------------------------------------------------------------------------------------收集唯一的聚合函数Item列表collect_aggregates这些Item将会被新的Derived Table的列代替。还需要添加所有引用到这些Item的fields包括直接在SELECT列表的Window函数参数、ORDER by、Partition by包含的还有该查询块中ORDER BY的列因为他们都会引动到Derived Table里。创建Derived Table需要的Query_expression/Query_blockcreate_query_expr_and_block。添加Derived Table到查询块和top_join_list中。保留旧的子查询单元块如果包含可以转化的Derived的移到Derived Table下面的Query_block如果不包含保留到原来的子查询块中。将之前的聚合函数Item列表插入到Derived Table的查询块中。收集除GROUP AGG表达式中的列由于这些fields已经移动到Derived Table中删除不合理的fields引用。收集所有唯一的列和View的引用后将他们加到新的Derived Table列表中。对新的新的Derived Table进行flatten_subqueries/setup_tables重新resolve_placeholder_tables不处理进行转换后的子查询。处理Derived Table中新加入的HAVING条件中的聚合函数Item并通过Item_aggregate_refs引用到new_derived-base_ref_items而不是之前的父查询块base_ref_items。永久代替父查询块中的聚合函数列表变成Derived Table的列并删除他们。之前保存和加入到Derived Table的唯一的列和View的引用也要替换新的fields代替他们的引用。但目前不支持HAVING表达式中包含该子查询其实也是可以转换的。
SELECT SUM(a), (SELECT SUM(b) FROM t3) scalar
FROM t1
HAVING SUM(a) scalar;
转换为
SELECT derived0.summ, derived1.scalar
FROM (SELECT SUM(a) AS summ FROM t1) AS derived0LEFT JOIN(SELECT SUM(b) AS scalar FROM t3) AS derived1ON TRUE
WHERE derived0.sum derived1.scalar;接下来遍历所有可以转换的子查询把他们转换成derived tables并替换相应的表达式变成列transform_subquery_to_derived。生成derived table的TABLE_LISTsynthesize_derived。将可以移动到derived table的where_cond设置到join_cond上。添加derived table到查询块的表集合中。decorrelate_derived_scalar_subquery_pre添加非相关引用列NCF到SELECT list这些条件被JOIN条件所引用并且还有另外一个fields包含了外查询相关的列我们称之为lifted_where添加COUNT(*)到SELECT list这样转换的查询块可以进行cardinality的检查。比如没有任何聚合函数在子查询中。如果确定包含聚合函数返回一行一定是NCF同时在GROUP BY列表中。添加NCF到子查询的GROUP列表中如果已经在了需要加到最后如果发生GROUP BY的列由于依赖性检查失败还要加Item_func_any_value非聚合列到SELECT list。对于NCF会创建 derived.field和derived.count(field) 。设置物化的一些准备setup_materialized_derived。decorrelate_derived_scalar_subquery_post创建对应的lifted_fields。更新JOIN条件中相关列的引用不在引用外查询而换成Derived table相关的列。代替WHERE、JOIN、HAVING条件和SELECT list中的子查询的表达式变成对应的Derived Table里面列。
下面图解该函数的转换过程和结果 3 扁平化子查询flatten_subqueries
该函数主要是将Semi-join子查询转换为nested JOIN这个过程只有一次并且不可逆。
简单来讲步骤可以简化理解为创建SEMI JOIN (it1 ... itN)语以部分并加入到外层查询块的执行计划中。将子查询的WHERE条件以及JOIN条件加入到父查询的WHERE条件中。将子查询谓词从父查询的判断谓词中消除。由于MySQL在一个query block中能够join的tables数是有限的(MAX_TABLES)不是所有sj_candidates都可以做因此做flatten_subqueries 的因此需要有优先级决定的先后顺序先unnesting掉优先级规则如下相关子查询优先于非相关的inner tables多的子查询大于inner tables少的位置前的子查询大于位置后的
subq_item-sj_convert_priority (((dependent * MAX_TABLES_FOR_SIZE) // dependent subqueries firstchild_query_block-leaf_table_count) *65536) // then with many tables(65536 - subq_no); // then based on position另外由于递归调用flatten_subqueries是bottom-up依次把下层的子查询展开到外层查询块中。for SELECT#1 WHERE X IN (SELECT #2 WHERE Y IN (SELECT#3)) :
Query_block::prepare() (select#1)- fix_fields() on IN condition- Query_block::prepare() on subquery (select#2)- fix_fields() on IN condition- Query_block::prepare() on subquery (select#3)- Query_block::prepare()- fix_fields()- flatten_subqueries: merge #3 in #2- flatten_subqueries- Query_block::prepare()- fix_fields()- flatten_subqueries: merge #2 in #1遍历子查询列表删除Item::clean_up_after_removal标记为Subquery_strategy::DELETED的子查询并且根据优先级规则设置sj_convert_priority。根据优先级进行排序。遍历排序后的子查询列表对于Subquery_strategy::CANDIDATE_FOR_DERIVED_TABLE策略的子查询转换子查询([NOT] {IN, EXISTS})为JOIN的Derived tabletransform_table_subquery_to_join_with_derived
FROM [tables] WHERE ... AND/OR oe IN (SELECT ie FROM it) ...FROM (tables) LEFT JOIN (SELECT DISTINCT ie FROM it) AS derivedON oe derived.ie WHERE ... AND/OR derived.ie IS NOT NULL ...设置策略为Subquery_strategy::DERIVED_TABLEsemijoin子查询不能和antijoin子查询相互嵌套或者外查询表已经超过MAX_TABLE不做转换否则标记为Subquery_strategy::SEMIJOIN策略。判断子查询的WHERE条件是否为常量。如果判断条件永远为FALSE那么子查询结果永远为空。该情况下调用Item::clean_up_after_removal标记为Subquery_strategy::DELETED删除该子查询。如果无法标记为Subquery_strategy::DELETED/设置Subquery_strategy::SEMIJOIN策略的重新标记会Subquery_strategy::UNSPECIFIED继续下一个。替换外层查询的WHERE条件中子查询判断的条件replace_subcondition子查询内条件并不永远为FALSE或者永远为FALSE的情况下需要改写为antijoinantijoin情况下子查询结果永远为空外层查询条件永远通过。此时将条件改为永远为True。子查询永远为FALSE且不是antijoin。那么将外层查询中的条件改成永远为False。Item_subselect::EXISTS_SUBS不支持有聚合操作convert_subquery_to_semijoin函数解析如下模式的SQLIN/ANY谓词如果条件满足解关联解关联decorrelate_condition添加解关联的内表表达式到 SELECT list收集FROM子句中的外表相关的 derived table或join条件去掉关联标识UNCACHEABLE_DEPENDENT更新used tableDerived table子查询增加SELECT_DISTINCT标识转换子查询成为一个derived table并且插入到所属于的查询块FROM后transform_subquery_to_derived创建derived table及其join条件遍历父查询块的WHERE替换该子查询的Item代替成derived tablereplace_subcondition遍历排序后的子查询列表对于Subquery_strategy::CANDIDATE_FOR_SEMIJOIN策略的子查询。判断是否可以转换为semijoin遍历排序后的子查询列表对于Subquery_strategy::SEMIJOIN的子查询开始转换为semijoin/antijoinconvert_subquery_to_semijoinconvert_subquery_to_semijoin函数解析如下模式的SQLIN/ANY谓词
SELECT ...FROM ot1 ... otNWHERE (oe1, ... oeM) IN (SELECT ie1, ..., ieMFROM it1 ... itK[WHERE inner-cond])[AND outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]
SELECT ...FROM (ot1 ... otN) SJ (it1 ... itK)ON (oe1, ... oeM) (ie1, ..., ieM)[AND inner-cond][WHERE outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]EXISTS谓词
SELECT ...FROM ot1 ... otNWHERE EXISTS (SELECT expressionsFROM it1 ... itK[WHERE inner-cond])[AND outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]
SELECT ...FROM (ot1 ... otN) SJ (it1 ... itK)[ON inner-cond][WHERE outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]NOT EXISTS谓词
SELECT ...FROM ot1 ... otNWHERE NOT EXISTS (SELECT expressionsFROM it1 ... itK[WHERE inner-cond])[AND outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]
SELECT ...FROM (ot1 ... otN) AJ (it1 ... itK)[ON inner-cond][WHERE outer-cond AND is-null-cond(it1)][GROUP BY ...] [HAVING ...] [ORDER BY ...]NOT IN谓词
SELECT ...FROM ot1 ... otNWHERE (oe1, ... oeM) NOT IN (SELECT ie1, ..., ieMFROM it1 ... itK[WHERE inner-cond])[AND outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]
SELECT ...FROM (ot1 ... otN) AJ (it1 ... itK)ON (oe1, ... oeM) (ie1, ..., ieM)[AND inner-cond][WHERE outer-cond][GROUP BY ...] [HAVING ...] [ORDER BY ...]查找可以插入semi-join嵌套和其生成的条件的位置比如对于 t1 LEFT JOIN t2, embedding_join_nest为t2t2也可以是nested如t1 LEFT JOIN (t2 JOIN t3))生成一个新的semijoin嵌套的TABLE_LIST表处理Antijoin将子查询中潜在的表合并到上述join表TABLE_LIST::merge_underlying_tables将子查询的叶子表插入到当前查询块的叶子表后面重新设置子查询的叶子表的序号和依赖的外表。将子查询的叶子表重置。如果是outer join的话在join链表中传递可空性propagate_nullability将内层子查询中的关联条件去关联化这些条件被加入到semijoin的列表里。这些条件必须是确定的仅支持简单判断条件或者由简单判断条件组成的AND条件Query_block::decorrelate_condition判断左右条件是否仅依赖于内外层表将其表达式分别加入到semijoin内外表的表达式列表中decorrelate_equality解关联内层查询的join条件Query_block::decorrelate_condition移除该子查询表达式在父查询的ASTQuery_express::exclude_level根据semi-join嵌套产生的WHERE/JOIN条件更新对应的table bitmapQuery_block::fix_tables_after_pullout将子查询的WHERE条件上拉更新使用表的信息Item_cond_and::fix_after_pullout()根据semijoin的条件列表创建AND条件如果有条件为常量True则去除该条件如果常量为False则整个条件都去除Query_block::build_sj_cond将创建出来的semijoin条件加入到外层查询的WHERE条件中最后遍历排序后的子查询列表对于没有转换的子查询对于Subquery_strategy::UNSPECIFIED的策略执行IN-EXISTS改写select_transformer如果确实原有的子查询已经有替代的Item调用replace_subcondition解析并把他们加入到合适的WHERE或者ON子句。清除所有的sj_candidates列表Semi-join有5中执行方式本文并不介绍Optimizer和Execution过程详细可以参考引用文章中关于semijoin的介绍最后引入下控制semijoin优化和执行的优化器开关其中semijoinon/off是总开关。
SELECT optimizer_switch\G
*************************** 1. row ***************************
optimizer_switch: ......materializationon,semijoinon,loosescanon,firstmatchon,subquery_materialization_cost_basedon,......下图举例说明该转换过程
SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 0);/* select#1 */
SELECT t1.a AS a
FROM t1
SEMI JOIN (t2)
WHERE ((t1.a t2.c1) and (t2.c1 0))
执行计划如下
explain SELECT * FROM t1 WHERE t1.a in (SELECT t2.c1 FROM t2 where t2.c1 0);
---------------------------------------------------------------------------------------------------------------------------------------------------------
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
---------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 | SIMPLE | t2 | NULL | ALL | NULL | NULL | NULL | NULL | 1 | 100.00 | Using where; Start temporary |
| 1 | SIMPLE | t1 | NULL | ALL | NULL | NULL | NULL | NULL | 2 | 50.00 | Using where; End temporary; Using join buffer (hash join) |
---------------------------------------------------------------------------------------------------------------------------------------------------------4 应用当前查询块转换apply_local_transforms
该函数在flattern subqueries之后bottom-up调用主要分几个步骤
删除无用列delete_unused_merged_columns
如果查询块已经删除了一些derived tables/views遍历SELECT列表的列删除不必要的列
简化JOINsimplify_joins
该函数会把Query_block中的top_join_list的嵌套join的简化为扁平化的join list。嵌套连接包括table1 join table2也包含table1, (table2, table3)这种形式。如果所示的简化过程 分区表的静态剪枝prune_partitions
由于剪枝根据HASH/RANGE/LIST及二级分区都有不同这里简单介绍下剪枝过程现有prune_partitions是在prepare和optimize阶段会被调用某些常量子查询被评估执行完。
struct TABLE {...... partition_info *part_info{nullptr}; /* Partition related information *//* If true, all partitions have been pruned away */bool all_partitions_pruned_away{false};......
}SQL tranformation phase
SELECT_LEX::apply_local_transforms
-- prune_partitions
for example, select * from employee where company_id 1000 ;
SQL optimizer phase
JOIN::prune_table_partitions
-- prune_partitions
------ based on tbl-join_cond_optim() or JOIN::where_cond
for example, explain select * from employee where company_id (select c1 from t1);举例下面RANGE剪枝的过程
root:ref CREATE TABLE R2 (- a INT,- d INT- ) PARTITION BY RANGE(a) (- PARTITION p20 VALUES LESS THAN (20),- PARTITION p40 VALUES LESS THAN (40),- PARTITION p60 VALUES LESS THAN (60),- PARTITION p80 VALUES LESS THAN (80),- PARTITION p100 VALUES LESS THAN MAXVALUE- );
Query OK, 0 rows affected (0.09 sec)
root:ref Select * From R2 where a 40 and a 80;剪枝详细过程如下由于剪枝需要根据不同条件产生的pruning结果进行交集因此剪枝过程中需要使用read_partitions这样的bitmap来保存是否使用该对应分区。另外剪枝过程类似迭代判断因此引入了part_iterator来保存开始、结束和当前以及对应需要获取区间范围的endpoint函数和获取下一个值next的迭代器函数。这里巧妙的运用了指针来兼容不同分区类型Hash/Range/List类型如下图所示获取join_cond或者m_where_cond的SEL_TREE红黑树get_mm_tree)调用find_used_partitions来获取满足的分区对于SEL_TREE的每个区间interval1. 获取区间的左右端点 2.从左边继续获取下一个满足的分区直到到右边端点结束每次调用完满足条件的分区需要使用bitmap_set_bit设置该分区在part_info-read_partitions上的位点。find_used_partitions是根据SEL_TREE的结构进行递归如图从左到右遍历next_key_partand condition然后再遍历SEL_TREE的左右也就是上下方向or condition深度递归。(start)| $| Partitioning keyparts $ subpartitioning keyparts| $| ... ... $| | | $| --------- --------- $ ----------- -----------\-| par1c1 |--| par2c2 |-----| subpar1c3|--| subpar2c5|--------- --------- $ ----------- -----------| $ | || $ | -----------| $ | | subpar2c6|| $ | -----------| $ || $ ----------- -----------| $ | subpar1c4|--| subpar2c8|| $ ----------- -----------| $| $--------- $ ------------ ------------| par1c2 |------------------| subpar1c10|--| subpar2c12|--------- $ ------------ ------------| $... $
例如第一行par1c1 and par2c2 and subpar1c3 and subpar2c5的遍历的stack将是
in find_used_partitions(key_tree subpar2c5) (***)
in find_used_partitions(key_tree subpar1c3)
in find_used_partitions(key_tree par2c2) (**)
in find_used_partitions(key_tree par1c1)
in prune_partitions(...)
然后是继续下面的条件以此类推
orpar1c1 and par2c2 and subpar1c3 and subpar2c6
orpar1c1 and par2c2 and subpar1c4 and subpar2c8
orpar1c2 and subpar1c10 and subpar2c12下图来展示了pruning的结构和过程5 下推条件到Derived Tablepush_conditions_to_derived_tables
该函数将条件下推到derived tables详细见WL#8084 - Condition pushdown to materialized derived table。
root:test set optimizer_switch derived_mergeoff; // 关闭dervied_merge 测试下推能力
Query OK, 0 rows affected (0.00 sec)
root:test EXPLAIN FORMATtree SELECT * FROM (SELECT c1,c2 FROM t1) as dt WHERE c1 10;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| EXPLAIN |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| - Table scan on dt (cost2.51..2.51 rows1)- Materialize (cost2.96..2.96 rows1)- Filter: (t1.c1 10) (cost0.35 rows1)- Table scan on t1 (cost0.35 rows1)|
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------过程如下
遍历derived table列表判断是否可以下推can_push_condition_to_derived如果包括下面的情况则不能下推Derived table有UNIONDerived table有LIMITDerived table不能是outer join中的内表会导致更多NULL补偿的行不能是CTE包含的Derived table创建可以下推到的Derived table的where condCondition_pushdown::make_cond_for_derived保留剩余不能下推的条件Condition_pushdown::get_remainder_condTop-down递归调用push_conditions_to_derived_tables
详细图解该过程如下 三 综述
两篇文章重点介绍了下优化器的基于规则的优化部分并没有涉及更多的基于代价的优化可以看到对于直接运用规则优化带来执行的加速那么可以直接转换尤其是对于查询结构上面的变化类转换如merge_derived。对于运用规则优化无法判断是否带来执行的加速那么优化器会保留一些临时结构为后续的代价估算提供更多选择如IN/EXIST/Materialized转换。当然还有一些又改变查询结构又无法判定是否规则转换带来的执行加速MySQL目前还不支持。文章虽然详尽但无法覆盖全部情况也是为了抛砖引玉还需要读者自己通过调试的方法更进一步了解某一类SQL的具体过程。
四 参考资料
《MySQL 8.0 Server层最新架构详解》
《WL#13520: Transform correlated scalar subqueries》
《WL#8084 - Condition pushdown to materialized derived table》
《WL#2980: Subquery optimization: Semijoin》
WL#3740: Subquery optimization: Semijoin: Pull-out of inner tablesWL#3741: Subquery optimization: Semijoin: Duplicate elimination strategyWL#3750: Subquery optimization: Semijoin: First-match strategyWL#3751: Subquery optimization: Semijoin: Inside-out strategy
《WL#4389: Subquery optimizations: Make IN optimizations also handle EXISTS》
《WL#4245: Subquery optimization: Transform NOT EXISTS and NOT IN to anti-join》
《WL#2985: Perform Partition Pruning of Range conditions》 《MySQL · 源码分析 · Semi-join优化执行代码分析》 《MySQL·源码分析·子查询优化源码分析》《Optimizing Subqueries, Derived Tables, View References, and Common Table Expressions》
原文链接 本文为阿里云原创内容未经允许不得转载。