Index merging, if you can’t use it, don’t use it!

In the previous article, Brother Song and his friends shared the data structure of the InnoDB storage engine in MySQL. Friends know that when we use indexes to search, every search is in a certain Searching in a B+Tree may also involve table backing if a secondary index is used.

So now the question is, if our search conditions contain two fields, and these two fields have independent indexes, how will MySQL handle it? Today we will discuss this topic.

1. Problem recurrence

In order to make it easier for friends to understand, I will first repeat my question through SQL.

The test data I used is the test data provided by the MySQL official website. The relevant introduction documents are at:

The corresponding database script is at:

Friends can download this database script and import it into their own database.

In the official case, there is a table like this:

CREATE  TABLE `film_actor` (
  `actor_id` smallint unsigned NOT  NULL ,
  `film_id` smallint unsigned NOT  NULL ,
   PRIMARY KEY (`actor_id`,`film_id`),
  KEY `idx_fk_film_id` (`film_id`)

There are two indexes in this table, one of which is the primary key index, which is a joint index, and the other is an ordinary index based on film_id. Now suppose I have the following SQL that needs to be executed:

select  *  from film_actor where film_id = 1  or actor_id = 1 ;

So the question is, will this query use the index?

If you want to know whether the index is used, you can use the explain keyword to check:

explain select  *  from film_actor where film_id = 1  or actor_id = 1 ;

The execution results are as follows:

Friends, you can see that at this time , there are two indexes for , and 中, and typethe index_mergevalue possible_keysof中 is .keyExtraUsing union(idx_fk_film_id,PRIMARY); Using where

It seems that an index is used, but how is it used specifically? How to interpret this execution plan?

This is actually an index merge. Next, let’s take a look at what index merge is.

2. Index merge

index_merge means index merging. When there are multiple indexes in the search conditions in the same table, MySQL will scan these indexes separately and then merge the scan results. There are three situations of merging:

  1. Compute unions of respective scan results.
  2. Find intersections of respective scan results.
  3. A combination of the first two.

The official documentation gives four examples where index merging may be used:

SELECT  *  FROM tbl_name WHERE key1 =  10  OR key2 =  20 ;

SELECT  *  FROM tbl_name
   WHERE (key1 =  10  OR key2 =  20 ) AND non_key =  30 ;

SELECT  *  FROM t1, t2
   WHERE (t1.key1 IN ( 1 , 2 ) OR t1.key2 LIKE  'value%' )
   AND t2.key1 = t1.some_col;

SELECT  *  FROM t1, t2
   WHERE t1.key1 =  1 
  AND (t2.key1 = t1.some_col OR t2.key2 = t1.some_col2);

Sometimes, the SQL we write can obviously be merged, but the system does not. At this time, we make some adjustments to the query conditions, for example:

  • (x AND y) OR z => (x OR z) AND (y OR z)
  • (x OR y) AND z => (x AND z) OR (y AND z)

Also note that index merging does not work with full-text indexes.

In the explain execution plan, if index merging is used, the value of the Extra field is generally divided into three situations, namely:

  • Using intersect(…)
  • Using union(…)
  • Using sort_union(…)

The above case belongs to the second situation.

Then let’s talk about these three situations with our friends.

2.1 Using intersect(…)

This is to find the intersection of multiple scan results.

It is not triggered as long as multiple indexes are involved and AND is involved Using intersect. There are two conditions:

  1. If it is a secondary index, it must be an equivalent query. If the secondary index is a composite index, each column of the composite index must be covered, not just some of the columns.
  2. Primary key indexes can be range queries.

Let’s take a look at an official example, as follows:

key_part1 = const1 AND key_part2 = const2 ... AND key_partN = constN

key_part1-That key_partNis, all columns in the composite index (must be all columns).

For point 2, if the primary key index is involved, the primary key index can be a range query, such as the following (but the secondary index can still only be an equivalent query):

SELECT  *  FROM innodb_table WHERE primary_key <  10  AND key_col1 =  20 ;

If it is a composite index and a normal index, then the composite index must cover all columns and the composite index and the normal index must match equal values, for example, as follows:

SELECT  *  FROM tbl_name WHERE key1_part1 =  1  AND key1_part2 =  2  AND key2 =  2 ;

key1_part1and key1_part2respectively represent the first and second columns of the same composite index (two columns in total). At this time, they are used as query conditions together with key2, and index merging may also be used.

In the above cases, the intersection is obtained after the respective searches are completed.

Let’s take a simple example. This is MySQL’s official test data. There is an actor table in the sakila library. The table structure is as follows:

CREATE  TABLE `actor` (
  `actor_id` smallint unsigned NOT  NULL AUTO_INCREMENT,
  `first_name` varchar ( 45 ) NOT  NULL ,
  `last_name` varchar ( 45 ) NOT  NULL ,
   PRIMARY KEY (`actor_id`),
  KEY `idx_actor_last_name` (`last_name`)

As you can see, there is a primary key and a normal index. I execute the following SQL:

select  *  from actor where actor_id < 10  and last_name = 'WAHLBERG'

The execution plan is as follows:

As you can see, index merging is used, and yes Using intersect.

2.2 Using union(…)

Finding the union is more similar to finding the intersection, that is, AND becomes OR.

When the secondary index is an equivalent query or a composite index, each column of the composite index must be covered, not just some columns. For example, the following query conditions:

key_part1 = const1 OR key_part2 = const2 ... OR key_partN = constN

key_part1~key_partN are different columns of the same composite index. At the same time, there are only these N fields in the composite index, which will be used in this situation Using union.

Primary key range queries on InnoBD tables may also trigger Using union.

In line with the situation in Section 2.1, it may also be triggered after replacing AND with OR Using union.

There is no need to give this example, it is mentioned at the beginning of the article.

2.3 Using sort_union(…)

Obviously, the conditions in Section 2.2 are relatively strict. The secondary index must be an equivalent query to be triggered Using union. In our daily use, range queries are also very common, so again Using sort_union, this requirement is looser:

  • Secondary indexes can also match by range
  • Composite indexes do not need to cover all columns

For example, consider the following SQL:

SELECT  *  FROM tbl_name
   WHERE key_col1 <  10  OR key_col2 <  20 ;

SELECT  *  FROM tbl_name
   WHERE (key_col1 >  10  OR key_col2 =  20 ) AND nonkey_col =  30 ;

Secondary index range search may also be triggered Using sort_union.

2.4 Principle of index merging

In Sections 2.1 and 2.2, intersection and union are found respectively. In order to facilitate intersect and union operations, when scanning each individual index, an ordered collection of primary key values ​​must be obtained, and each index must obtain It will be more convenient to find the ordered primary keys and then find the intersection or union.

Therefore, in sections 2.1 and 2.2, the primary key index can be range searched, because the primary key index itself is ordered; the secondary index has many restrictions, and the ultimate purpose of these restrictions is to achieve the final primary key. Key values ​​are ordered.

For example:

  • The secondary index must match with equal values. Equivalent matching means that the primary key value on the leaf of the B+Tree finally obtained is unique; if the secondary index can be searched according to the range, then the final B+Tree value from the secondary index The primary key values ​​obtained from the leaf nodes are not ordered.
  • Similarly, the composite index must cover all columns for a similar reason, because if it does not cover all columns, it means that the primary key values ​​finally obtained are also out of order.

Section 2.3 allows the secondary index to search according to the range. This is because in Using sort_union, the obtained primary key values ​​will be sorted first, and then the intersection or union will be found. Of course, compared to sections 2.1 and 2.2, section 2.3 The performance will also be reduced somewhat.

3. Issues with index merging

Index merging seems to improve the performance of MySQL search. However, index merging generally occurs because the index creation is unreasonable. We need to re-examine our indexes.

As mentioned in Section 2.3 above, this method requires caching temporary data during the query process and sorting before finding the intersection or union. These operations will consume most of the CPU and memory resources. And these consumption will not be calculated into the query cost, because the MySQL optimizer only cares about the reading of random pages and does not care about the additional calculation issues involved here. Therefore, in some extreme cases, the performance of index merging Probably not as good as a full table scan.

Therefore, sometimes if we are sure that we do not need index merging, we can ignore some indexes by ignoring index, as follows (compare to the screenshot in Section 2.1):

You can also turn off the index merging function through optimizer_switch, as follows:

Okay, let’s talk about index merging with my friends so much~ Friends who are interested can also try it!