引起索引失效
简介
索引的作用,加速检索,排序,分组。
优点: 检索
缺点: 新增,更新时需要维护索引,占磁盘空间,创建时锁表。
维护: 根据统计表发生全表扫描次数,索引使用次数。合理添加删除索引。
索引失效的场景
如果where过滤条件设置不合理,即使索引存在,且where过滤条件中包含索引列,也会导致全表扫描,索引不起作用。什么条件下会导致索引失效呢?
1.任何计算、函数、类型转换
2.!=
3.NOT,相当于使用函数
4.模糊查询通配符在开头
5.索引字段在表中占比较高
6.多字段btree索引查询条件不包含第一列
7.多字段索引查询条件使用OR(有时也会走索引扫描,但查询效率不高)
8.表中数据量太少时
实例
测试表
创建表
postgres#=create table tbl_index(a bigint,b timestamp without time zone ,c varchar(12));
插入1kw数据,打开计时器 对比创建索引对数据插入的影响。
postgres=# \timing
Timing is on.
postgres=# insert into tbl_index select generate_series(1,10000000),clock_timestamp()::timestamp without time zone,'zhang';
INSERT 0 10000000
Time: 25004.214 ms (00:25.004)
postgres=# create index tbl_index_a ON tbl_index using btree (a);
CREATE INDEX
Time: 4119.733 ms (00:04.120)
postgres=# create index tbl_index_b ON tbl_index using btree (b);
CREATE INDEX
Time: 6229.857 ms (00:06.230)
postgres=# insert into tbl_index select generate_series(1,10000000),clock_timestamp()::timestamp without time zone,'eamon';
INSERT 0 10000000
Time: 153963.850 ms (02:33.964)
tips
大量数据导入时建议先导入数据后创建索引
更新频繁的字段不建议建索引,如update_time
1.任何计算、函数、类型转换
#索引检索
postgres=# explain analyze select * from tbl_index where a = 100;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using tbl_index_a on tbl_index (cost=0.44..19.78 rows=4 width=22) (actual time=0.012..0.015 rows=2 loops=1)
Index Cond: (a = 100)
Planning time: 0.053 ms
Execution time: 0.027 ms
(4 rows)
#计算
postgres=# explain analyze select * from tbl_index where a +1 = 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..263387.92 rows=99999 width=22) (actual time=0.495..478.375 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_index (cost=0.00..252388.02 rows=41666 width=22) (actual time=220.138..465.801 rows=1 loops=3)
Filter: ((a + 1) = 100)
Rows Removed by Filter: 6666666
Planning time: 0.107 ms
Execution time: 478.411 ms
(8 rows)
#解决方法
create index tbl_index_a_1 on tbl_index using btree ((a+1));
#类型转换
postgres=# explain analyze select * from tbl_index where a::varchar = '100';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..284221.09 rows=99999 width=22) (actual time=0.529..724.035 rows=2 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_index (cost=0.00..273221.19 rows=41666 width=22) (actual time=355.491..713.103 rows=1 loops=3)
Filter: (((a)::character varying)::text = '100'::text)
Rows Removed by Filter: 6666666
Planning time: 0.168 ms
Execution time: 724.075 ms
(8 rows)
postgres=# explain analyze select * from tbl_index where b::date = '2020-04-15';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..263387.92 rows=99999 width=22) (actual time=0.505..4764.531 rows=20000000 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_index (cost=0.00..252388.02 rows=41666 width=22) (actual time=0.021..789.147 rows=6666667 loops=3)
Filter: ((b)::date = '2020-04-15'::date)
Planning time: 0.146 ms
Execution time: 5182.919 ms
(7 rows)
Time: 5184.795 ms (00:05.185)
#解决方法
create index tbl_index_a2str on tbl_index using btree ((a::varchar));
create index tbl_index_b2date on tbl_index using btree ((b::date));
select * from tbl_index where b < '2020-04-16 00:00:00' and b >= '2020-04-15 00:00:00';
2.!=
postgres=# explain analyze select * from tbl_index where a != 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index (cost=0.00..377387.04 rows=19999839 width=22) (actual time=0.020..1384.726 rows=19999998 loops=1)
Filter: (a <> 100)
Rows Removed by Filter: 2
Planning time: 0.132 ms
Execution time: 1790.685 ms
(5 rows)
Time: 1792.412 ms (00:01.792)
3. NOT
postgres=# explain analyze select * from tbl_index where a is null;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Index Scan using tbl_index_a on tbl_index (cost=0.44..8.21 rows=1 width=22) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: (a IS NULL)
Planning time: 0.123 ms
Execution time: 0.055 ms
(4 rows)
Time: 1.694 ms
postgres=# explain analyze select * from tbl_index where a is not null;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index (cost=0.00..327389.00 rows=20000000 width=22) (actual time=0.025..1375.125 rows=20000000 loops=1)
Filter: (a IS NOT NULL)
Planning time: 0.121 ms
Execution time: 1789.101 ms
(4 rows)
Time: 1790.625 ms (00:01.791)
4.模糊查询通配符在开头
5.索引字段在表中占比较高
postgres=# create index tbl_index_c on tbl_index using btree (c);
CREATE INDEX
Time: 10552.062 ms (00:10.552)
postgres=# analyze tbl_index ;
ANALYZE
Time: 118.018 ms
postgres=# explain analyze select * from tbl_index where c = 'zhang';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index (cost=0.00..377389.90 rows=10034703 width=22) (actual time=0.021..1409.096 rows=10000000 loops=1)
Filter: ((c)::text = 'zhang'::text)
Rows Removed by Filter: 10000000
Planning time: 0.562 ms
Execution time: 1612.769 ms
(5 rows)
Time: 1615.034 ms (00:01.615)
postgres=# explain analyze select * from tbl_index where c = 'eamon';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index (cost=0.00..377389.90 rows=9965369 width=22) (actual time=616.952..1404.717 rows=10000000 loops=1)
Filter: ((c)::text = 'eamon'::text)
Rows Removed by Filter: 10000000
Planning time: 0.066 ms
Execution time: 1607.572 ms
(5 rows)
Time: 1609.019 ms (00:01.609)
postgres=# insert into tbl_index values(1,clock_timestamp()::timestamp without time zone,'zhangeamon');
INSERT 0 1
Time: 2.598 ms
postgres=# explain analyze select * from tbl_index where c = 'zhangeamon';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using tbl_index_c on tbl_index (cost=0.44..7.46 rows=1 width=22) (actual time=0.086..96.848 rows=1 loops=1)
Index Cond: ((c)::text = 'zhangeamon'::text)
Planning time: 0.157 ms
Execution time: 96.881 ms
(4 rows)
Time: 98.464 ms
6.多字段btree索引查询条件不包含第一列
#创建表
postgres=# create table tbl_indexes(a int ,b varchar,c varchar);
CREATE TABLE
Time: 6.942 ms
#插入数据
postgres=# insert into tbl_indexes select generate_series(1,5000000),substring(md5(random()::text),0,6),substring(md5(random()::text),0,6);
INSERT 0 5000000
Time: 15003.647 ms (00:15.004)
#创建多值索引
postgres=# create index tbl_indexes_a_b_c on tbl_indexes using btree (a,b,c);
CREATE INDEX
Time: 2207.480 ms (00:02.207)
postgres=# select * from tbl_indexes limit 10;
a | b | c
----+-------+-------
1 | 0e7fb | d6370
2 | e2eb1 | 51d3e
3 | 93521 | 5f6b6
4 | 5880d | 23527
5 | 66f8e | f462f
6 | 6ceb3 | c9beb
7 | 18d44 | 11d64
8 | f76a4 | edd04
9 | 91975 | 4c79d
10 | 56f26 | 09e16
(10 rows)
#走索引
postgres=# explain analyze select * from tbl_indexes where a = 10;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_indexes (cost=582.18..28488.04 rows=25000 width=68) (actual time=0.030..0.031 rows=1 loops=1)
Recheck Cond: (a = 10)
Heap Blocks: exact=1
-> Bitmap Index Scan on tbl_indexes_a_b_c (cost=0.00..575.93 rows=25000 width=0) (actual time=0.022..0.022 rows=1 loops=1)
Index Cond: (a = 10)
Planning time: 0.127 ms
Execution time: 0.075 ms
(7 rows)
Time: 2.878 ms
postgres=# explain analyze select * from tbl_indexes where a = 10 and b = '91975';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_indexes (cost=5.71..482.09 rows=125 width=68) (actual time=0.025..0.025 rows=0 loops=1)
Recheck Cond: ((a = 10) AND ((b)::text = '91975'::text))
-> Bitmap Index Scan on tbl_indexes_a_b_c (cost=0.00..5.68 rows=125 width=0) (actual time=0.022..0.022 rows=0 loops=1)
Index Cond: ((a = 10) AND ((b)::text = '91975'::text))
Planning time: 0.146 ms
Execution time: 0.074 ms
(6 rows)
Time: 2.886 ms
postgres=# explain analyze select * from tbl_indexes where a = 10 and b = '91975' and c = '4c79d';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using tbl_indexes_a_b_c on tbl_indexes (cost=0.43..8.45 rows=1 width=68) (actual time=0.025..0.025 rows=0 loops=1)
Index Cond: ((a = 10) AND (b = '91975'::text) AND (c = '4c79d'::text))
Heap Fetches: 0
Planning time: 0.158 ms
Execution time: 0.067 ms
(5 rows)
Time: 3.136 ms
postgres=# explain analyze select * from tbl_indexes where a = 10 and c = '4c79d';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_indexes (cost=638.46..1114.84 rows=125 width=68) (actual time=0.027..0.027 rows=0 loops=1)
Recheck Cond: ((a = 10) AND ((c)::text = '4c79d'::text))
-> Bitmap Index Scan on tbl_indexes_a_b_c (cost=0.00..638.43 rows=125 width=0) (actual time=0.023..0.024 rows=0 loops=1)
Index Cond: ((a = 10) AND ((c)::text = '4c79d'::text))
Planning time: 0.178 ms
Execution time: 0.077 ms
(6 rows)
Time: 2.917 ms
#不走索引
postgres=# explain analyze select * from tbl_indexes where b = '91975' and c = '4c79d';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..59290.50 rows=125 width=68) (actual time=0.577..127.956 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_indexes (cost=0.00..58278.00 rows=52 width=68) (actual time=78.366..119.555 rows=0 loops=3)
Filter: (((b)::text = '91975'::text) AND ((c)::text = '4c79d'::text))
Rows Removed by Filter: 1666666
Planning time: 0.154 ms
Execution time: 127.993 ms
(8 rows)
Time: 130.603 ms
7.多字段索引查询条件使用OR(有时也会走索引扫描,但查询效率不高)
postgres=# explain analyze select * from tbl_indexes where a = 10 or c = '4c79d';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..64265.50 rows=49875 width=68) (actual time=0.520..138.035 rows=5 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_indexes (cost=0.00..58278.00 rows=20781 width=68) (actual time=52.632..129.858 rows=2 loops=3)
Filter: ((a = 10) OR ((c)::text = '4c79d'::text))
Rows Removed by Filter: 1666665
Planning time: 0.151 ms
Execution time: 138.075 ms
(8 rows)
Time: 139.952 ms
postgres=# explain analyze select * from tbl_indexes where a = 10 or b = '4c79d';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..64265.50 rows=49875 width=68) (actual time=0.513..131.024 rows=8 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_indexes (cost=0.00..58278.00 rows=20781 width=68) (actual time=51.260..123.413 rows=3 loops=3)
Filter: ((a = 10) OR ((b)::text = '4c79d'::text))
Rows Removed by Filter: 1666664
Planning time: 0.152 ms
Execution time: 131.064 ms
(8 rows)
Time: 132.946 ms
postgres=# explain analyze select * from tbl_indexes where a = 10 or a = 11;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbl_indexes (cost=1176.80..29434.85 rows=49875 width=68) (actual time=0.034..0.036 rows=2 loops=1)
Recheck Cond: ((a = 10) OR (a = 11))
Heap Blocks: exact=1
-> BitmapOr (cost=1176.80..1176.80 rows=50000 width=0) (actual time=0.026..0.027 rows=0 loops=1)
-> Bitmap Index Scan on tbl_indexes_a_b_c (cost=0.00..575.93 rows=25000 width=0) (actual time=0.020..0.020 rows=1 loops=1)
Index Cond: (a = 10)
-> Bitmap Index Scan on tbl_indexes_a_b_c (cost=0.00..575.93 rows=25000 width=0) (actual time=0.005..0.005 rows=1 loops=1)
Index Cond: (a = 11)
Planning time: 0.152 ms
Execution time: 0.090 ms
(10 rows)
Time: 2.928 ms
如果检索条件为同一个字段 如a = 1 or a =2 转换为 a in (1,2) 会更优。
如果多个字段为同一类型可使用数组化索引
indexdb=# CREATE TABLE tbloom AS
indexdb-# SELECT
indexdb-# (random() * 1000000)::int as i1,
indexdb-# (random() * 1000000)::int as i2,
indexdb-# (random() * 1000000)::int as i3,
indexdb-# (random() * 1000000)::int as i4,
indexdb-# (random() * 1000000)::int as i5,
indexdb-# (random() * 1000000)::int as i6
indexdb-# FROM
indexdb-# generate_series(1,10000000);
SELECT 10000000
-- 创建bloom索引
indexdb=# CREATE index btreeidx ON tbloom (i1, i2, i3, i4, i5, i6);
CREATE INDEX
indexdb=# EXPLAIN ANALYZE SELECT * FROM tbloom WHERE i2 = 898732 OR i5 = 123451;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
Gather (cost=249337.50..385507.50 rows=99750 width=24) (actual time=345.415..684.760 rows=19 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Bitmap Heap Scan on tbloom (cost=248337.50..374532.50 rows=41562 width=24) (actual time=397.157..672.875 rows=6 loops=3)
Filter: ((i2 = 898732) OR (i5 = 123451))
Rows Removed by Filter: 3333327
Heap Blocks: exact=21455
-> Bitmap Index Scan on btreeidx (cost=0.00..248312.56 rows=10000000 width=0) (actual time=331.686..331.686 rows=10000000 loops=1)
Planning time: 0.165 ms
Execution time: 684.813 ms
(10 rows)
-- 创建数组化索引
indexdb=# create index ON tbloom USING gin ( (array[i2,i5]));
CREATE INDEX
indexdb=# explain analyze select * from tbloom where (ARRAY[i2, i5]) && array[898732] or (ARRAY[i2, i5]) && array[123451];
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tbloom (cost=991.87..68961.41 rows=99750 width=24) (actual time=0.068..0.174 rows=41 loops=1)
Recheck Cond: ((ARRAY[i2, i5] && '{898732}'::integer[]) OR (ARRAY[i2, i5] && '{123451}'::integer[]))
Heap Blocks: exact=41
-> BitmapOr (cost=991.87..991.87 rows=100000 width=0) (actual time=0.046..0.046 rows=0 loops=1)
-> Bitmap Index Scan on tbloom_array_idx (cost=0.00..471.00 rows=50000 width=0) (actual time=0.030..0.030 rows=23 loops=1)
Index Cond: (ARRAY[i2, i5] && '{898732}'::integer[])
-> Bitmap Index Scan on tbloom_array_idx (cost=0.00..471.00 rows=50000 width=0) (actual time=0.015..0.015 rows=18 loops=1)
Index Cond: (ARRAY[i2, i5] && '{123451}'::integer[])
Planning time: 0.266 ms
Execution time: 0.242 ms
(10 rows)
Time: 1.311 ms
性能提升明显
8.表中数据量少时
postgres=# create table tbl_index_less(a int);
CREATE TABLE
postgres=# create index tbl_index_less_a on tbl_index_less using btree (a);
CREATE INDEX
-- 加10条
postgres=# insert into tbl_index_less select generate_series(1,10);
INSERT 0 10
postgres=# analyze tbl_index_less ;
ANALYZE
postgres=# explain analyze select * from tbl_index_less where a = 4;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index_less (cost=0.00..1.12 rows=1 width=4) (actual time=0.013..0.016 rows=1 loops=1)
Filter: (a = 4)
Rows Removed by Filter: 9
Planning time: 0.276 ms
Execution time: 0.054 ms
(5 rows)
-- 加100条
postgres=# insert into tbl_index_less select generate_series(10,100);
INSERT 0 91
postgres=# explain analyze select * from tbl_index_less where a = 4;
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on tbl_index_less (cost=0.00..2.26 rows=1 width=4) (actual time=0.017..0.033 rows=1 loops=1)
Filter: (a = 4)
Rows Removed by Filter: 100
Planning time: 0.236 ms
Execution time: 0.062 ms
(5 rows)
-- 1000条
postgres=# insert into tbl_index_less select generate_series(100,1000);
INSERT 0 901
postgres=# analyze tbl_index_less ;
ANALYZE
postgres=# explain analyze select * from tbl_index_less where a = 4;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Only Scan using tbl_index_less_a on tbl_index_less (cost=0.28..8.29 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=1)
Index Cond: (a = 4)
Heap Fetches: 1
Planning time: 0.073 ms
Execution time: 0.023 ms
(5 rows)
tips
数据库是如何知道表中的数据量及数据分布情况 ,主要是依赖统计信息 pg_class ,pg_stats。
当表数据变更很大时,如批量导入数据或删除数据时。需要及时使用analyze更新统计信息。
关于 null https://yq.aliyun.com/articles/241219
查看表顺序扫描和索引的次数
select * from pg_stat_all_tables where relname = 'tab_name';
select * from pg_stat_all_indexes where relname = 'tbl_name';
创建索引 http://www.cnblogs.com/alianbog/p/5631505.html
mysql 索引建议 参考 https://mp.weixin.qq.com/s/xdbo67F72a9eTV93TEuL6w
A PostgreSQL extension for collecting statistics about predicates, helping find what indices are missing
利用索引失效改变执行计划案例
表定义&数据分布
create table tb1(id int primary key,c1 int);
create index on tb1(c1);
insert into tb1 select id, id/10000 from generate_series(1,10000000)id;
SQL&执行计划
postgres=# explain analyze select * from tb1 where c1=999 order by id limit 10; QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.43..332.29 rows=10 width=8) (actual time=1571.315..1571.319 rows=10 loops=1)
-> Index Scan using tb1_pkey on tb1 (cost=0.43..328935.03 rows=9912 width=8) (actual time=1571.314..1571.316 rows=10 loops=1)
Filter: (c1 = 999)
Rows Removed by Filter: 9989999 Planning Time: 0.112 ms
Execution Time: 1571.337 ms
(6 rows)
上面Index Scan估算的行数和cost都比较准确,但评估LIMIT子句时,优化器假设数据分布是均匀的, 只需扫描主键索引的 10/9912即可找到10条匹配的记录,最终的估算代价也被LIMIT降到10/9921。但实际上满足条件的记录都集中在索引的尾部。
sql 改写
SQL改写方法1:破坏索引排序
select * from tb1 where c1=999 order by id+0 limit 10;
SQL改写方法2:物化子查询
WITH t AS MATERIALIZED(
select * from tb1 where c1=999 )
select * from t order by id limit 10