Monday, March 19, 2012

Index Condition Pushdown to the rescue!

A while ago, I explained how range access in a multiple-part index works and why MySQL can't utilize key parts beyond the first occurrence of some often used comparison operators. Luckily, there is a great improvement underway in MySQL 5.6 that will remedy much of this limitation. Meet Index Condition Pushdown.

How does ICP work?

Index Condition Pushdown is a new way for MySQL to evaluate conditions. Instead of evaluating conditions on rows read from a table, ICP makes it possible to evaluate conditions in the index and thereby avoid looking at the table if the condition is false.

Let's assume that we have a multiple-part index covering columns (keypart_1, ..., keypart_n). Further assume that we have a condition with a comparison operator on keypart_1 that does not allow keypart_2,...,keypart_n to be used as a range condition (more on that here).

When MySQL does range access without ICP enabled, the index is traversed to locate rows in the table that are within the range(s) relevant for the conditions on keypart_1. The rows in these ranges are read from the table and returned from the storage engine to the MySQL server. The server then evaluates the remaining conditions, including those that apply to keypart_2,..,keypart_n.

With ICP enabled, the same ranges of the index are traversed. However, instead of immediately reading and returning rows from the table, all conditions that apply to any of the indexed key parts are evaluated in the index. The rows in the table are read only if all the conditions that were pushed to the index evaluate to true. Potentially, that's a lot of disk access saved.

In the illustration we have an index consisting of two keyparts. The query contains a "column < value" condition on the first keypart and some other condition on the second keypart. Without ICP, the storage engine will read all rows in the qualifying range for the first keypart (red and black arrows). With ICP enabled, only the ones with a black arrow will be read. The query result is the same, of course. This is only a question of performance.

Example: Orders with low value
A slightly modified version of the orders table presented here is used in the example. It contains 1M rows:

CREATE TABLE orders (
    order_id INT NOT NULL PRIMARY KEY,
    /* come columns */
    customer_id INT,
    product_id INT,
    no_products INT,    /* Number of products in the order */
    value INT,
    order_date DATE,
    KEY noprod_val (no_products, value),
    KEY custid_prodid (customer_id, product_id),
);




mysql> set optimizer_switch='index_condition_pushdown=off';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select customer_id, value from orders where no_products < 15 and value <= 7000;
+----+-------------+--------+-------+------------+---------+------+--------+-------------+
| id | select_type | table  | type  | key        | key_len | ref  | rows   | Extra       |
+----+-------------+--------+-------+------------+---------+------+--------+-------------+
|  1 | SIMPLE      | orders | range | noprod_val | 5       | NULL | 161092 | Using where |
+----+-------------+--------+-------+------------+---------+------+--------+-------------+
1 row in set (0.00 sec)

mysql> select customer_id, value from orders where no_products < 15 and value <= 7000;
...
2624 rows in set (0.97 sec)

As you can see from the explain, range access is used for this query. Unfortunately, as shown by key_len=5, only the first keypart can be used by range access. There are about 160K rows in this range and 2624 that match the whole condition. Now let's try that with ICP enabled:

mysql> set optimizer_switch='index_condition_pushdown=on';
Query OK, 0 rows affected (0.00 sec)

mysql> explain select customer_id, value from orders where no_products < 15 and value <= 7000;
+----+-------------+--------+-------+------------+---------+------+--------+-----------------------+
| id | select_type | table  | type  | key        | key_len | ref  | rows   | Extra                 |
+----+-------------+--------+-------+------------+---------+------+--------+-----------------------+
|  1 | SIMPLE      | orders | range | noprod_val | 5       | NULL | 161092 | Using index condition |
+----+-------------+--------+-------+------------+---------+------+--------+-----------------------+
1 row in set (0.00 sec)

mysql> select customer_id, value from orders where no_products < 15 and value <= 7000;
...
2624 rows in set (0.20 sec)

Success! We got a five time improvement to response time.

Note that just like with ICP disabled, only the first keypart can used by range access, but this time a lot of table reads are avoided because value <= 7000 is evaluated in the index.

If rearranging the column order in the index would allow range access to use more keyparts, that would be even better since it would reduce the number of index entries to read as well. But when this is not possible, ICP is an excellent second best. The expected improvement in response time depends on how many table reads can be avoided, i.e. the selectivity of the conditions on keyparts not used by range access.

Further reading
MySQL 5.6: Index Condition Pushdown, blog, by Olav Sandstå.
The MySQL 5.6 documentation.

2 comments:

  1. This doesn't seem to work if the combined index is a Primary Key. Is there a reason that this doesn't work?

    ReplyDelete
    Replies
    1. Hi S Groot,

      Sorry for the late reply. I noticed that you have now created BUG#69846 for this issue and got an explanation from Olav in the bug report. I'm quoting Olav's reply for completeness:

      "The main goal of ICP is to reduce the number of full records that need to be read and thus potentially save on IO operations. For the case where the index is the InnoDB clustered index, the complete record is already read into the InnoDB buffer. In this case we will not save any IO by using ICP. For cases where the index condition will filter out a major part of records, there could have been some saving in CPU usage. For cases where the index condition does not filter out any records or only a small fraction of the records, there might even be a small overhead by pushing the index condition to the storage engine and evaluate it there. Thus, the current ICP implementation is conservative and only uses ICP for the case where there is most to gain and least likely to introduce CPU overhead."

      Delete