PostgreSQL index optimization
Some time ago, I worked on query execution time optimizations for PostgreSQL, I talk about it here: The VISUALVM SQL PROFILE.
Kestra is a highly scalable data orchestration and scheduling platform that creates, executes, schedules, and monitors millions of complex pipelines. It’s also the company I work for!
The open source version of Kestra uses a database engine, while the Enterprise Edition lets you use an alternative engine based on Kafka and Elasticsearch. The database engine supports H2, PostgreSQL and MySQL. The optimization in question was done for PostgreSQL.
PostgreSQL doesn’t allow you to force the use of an index. During my previous research into improving performance, I noticed that a very important index on the queues
table was not being used, the primary key index was being used instead. To overcome this problem, I removed the primary key and replaced it with a hash index on the same column. But it turned out that this trick didn’t work in 100% of cases. Sometimes, PostgreSQL would insist on using the wrong index, and alas, the larger the size of the queues
table, the less PostgreSQL would choose the right index.
During my tests on a queues table containing 3 million records, PostgreSQL almost systematically chose the wrong index, leading to execution times of over a second, whereas using the right index takes no more than a millisecond.
So I had to understand why PostgreSQL chose a different index. And to understand that, you have to look in detail at the table structure, the queries, and the “almost” … the cases where it chose the right index.
Here is the structure of the queues
table:
CREATE TABLE queues (
“offset” serial4 NOT NULL,
“type” public.“queue_type” NOT NULL,
“key” varchar(250) NOT NULL,
value jsonb NOT NULL,
updated timestamptz NOT NULL DEFAULT CURRENT_TIMESTAMP,
consumer_indexer bool NULL DEFAULT false,
consumer_executor bool NULL DEFAULT false,
consumer_worker bool NULL DEFAULT false,
consumer_scheduler bool NULL DEFAULT false,
consumer_flow_topology bool NULL DEFAULT false,
consumer_group varchar(250) NULL
);
This table is queried every second by each consumer (indexer, executor, worker, scheduler, flow_topology) to find out if there are any messages to process. There is one index per consumer, which should ensure that these queries are very fast.
Here is one such index:
CREATE INDEX queues_type__consumer_executor
ON queues USING btree (type, consumer_executor);
Unfortunately, the index is not systematically used when querying. Instead, PostgreSQL most often chooses to scan the queues_offset
index, which is an index containing only the offset column. It therefore prefers a seq scan to a bitmap index scan. One trick to force the use of the queues_type__consumer_executor
, index is to configure PostgreSQL with a random page cost lower than the default of 4 (random_page_cost=1.1 for example). The problem, is that PostgreSQL’s cost analyzer decides that using the queues_type__consumer_executor
index has a greater potential cost than scanning the queues_offset
index. Since we can’t ask each of our users to configure PostgreSQL in a certain way, we need to find another solution.
Let’s take a look at one of the problematic queries:
select “value”, “offset” from queues
where ((type = CAST(‘io.kestra.core.models.executions.Execution’ AS queue_type))
and “consumer_scheduler” = false and “consumer_group” is null)
order by “offset” asc fetch next 100 rows only for update skip locked
Each consumer (in this case, the scheduler) will request a message type (in this case, Execution), then set their flag to true to indicate that the message has been processed. These requests are called poll requests.
At first look, the request should use my index. Except that, after analyzing the table statistics, I realized that the data values were poorly distributed. And that’s logical! Some consumers have very few messages, so their flag is set to false in over 95% of rows, making the index indiscriminate. Some consume almost 50% of the table, so the flag will be set to 50% true, making the index too costly.
Let’s take a closer look at the example above: 60% of rows are of type io.kestra.core.models.executions.Execution
and 50% of rows have the consumer_scheduler
flag set to false, giving us a probability of having both columns at the desired value of 30%, which is not discriminating enough to use the index. What the optimizer can’t know is that, in fact, only a few percent of rows have both columns at the desired values.
What’s more, even if you don’t query the offset column, it’s part of the order by clause. This is certainly why PostgreSQL prefers the index on the offset column, as the index is not very discriminating.
By adding the offset column to the queues_type__consumer_executor
index, PostgreSQL will then prefer this index, as the column will be useful for making its order by, it’s a bit like forcing it to use the index.
This results in the following index, which is now used for my query:
CREATE INDEX queues_type__consumer_executor
ON queues USING btree (type, consumer_executor, “offset”);
But we can do even better! Since queries are always made with a flag set to false, I can make indexes partial to save disk space, because once the message has been processed, it will never be accessed again.
This gives the following index:
CREATE INDEX queues_type__consumer_executor
ON queues USING btree (type, consumer_executor, “offset”) WHERE (consumer_executor = false);
With this new index, the average time for a poll query on the queues
table has dropped from 1s to 1ms!
PostgreSQL is a great database, and the fact that we can’t force an index forces us to ask ourselves questions such as: why isn’t my index being used, what is the distribution of data in my table, etc? These are good questions, which enable us to better understand our use of the database, and improve the performance of our queries by providing PostgreSQL with the right index, not just the index we thought was the right one.
If you too would like to analyze table statistics, here’s the query I used:
SELECT tablename, schemaname, attname As colname, n_distinct,
array_to_string(most_common_vals, E’\n’) AS common_vals,
array_to_string(most_common_freqs, E’\n’) As dist_freq
FROM pg_stats
WHERE tablename = ‘queues’
ORDER BY schemaname, tablename, attname;