How do I force Postgres to use a particular index?

Question

How do I force Postgres to use an index when it would otherwise insist on doing a sequential scan?

User · Answer

The question in itself is not really valid. Forcing a plan (for example with enable_seqscan = off) is a very bad idea. It can be useful for checking whether the other plan would be faster, but production code should never rely on such tricks.

Instead, run EXPLAIN ANALYZE on your query, read the output, and find out why PostgreSQL chooses a bad (in your opinion) plan.
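As a minimal sketch of that first step, assuming a hypothetical orders table with an index on customer_id (neither name comes from the question), the plan and its real timings could be inspected like this:

```sql
-- Hypothetical table and column names, for illustration only.
-- ANALYZE actually executes the query; BUFFERS adds cache hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE customer_id = 42;
```

In the output, compare the estimated row counts with the actual ones. A large mismatch often points to stale statistics, which running ANALYZE on the table refreshes. Because EXPLAIN ANALYZE really runs the statement, wrap data-modifying statements in BEGIN ... ROLLBACK.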

There are tools on the web that help with reading EXPLAIN ANALYZE output. One of them, explain.depesz.com, was written by me.

Another option is to join the #postgresql channel on the freenode IRC network and talk to the people there, because optimizing a query is not a matter of "ask a question, get an answer, be happy". It is more like a conversation, with many things to check and many things to learn.

User · Answer

Check your random_page_cost

This problem typically happens when the estimated cost of an index scan is too high and doesn't correctly reflect reality. You may need to lower the random_page_cost configuration parameter to fix this. From the Postgres documentation:

    "Reducing this value [...] will cause the system to prefer index scans; raising it will make index scans look relatively more expensive."

You can do a quick test of whether this will actually make Postgres use the index:

    EXPLAIN <query>;              -- Uses sequential scan
    SET random_page_cost = 1;
    EXPLAIN <query>;              -- May use index scan now

You can restore the default value with SET random_page_cost = DEFAULT; again.

Background

Index scans require non-sequential disk page fetches. Postgres uses random_page_cost to estimate the cost of such non-sequential fetches in relation to sequential fetches. The default value is 4.0, thus assuming an average cost factor of 4 compared to sequential fetches (taking caching effects into account).

The problem, however, is that this default value is unsuitable in the following important real-life scenarios:

1) Solid-state drives

As per the documentation:

    "Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might be better modeled with a lower value for random_page_cost, e.g., 1.1."

This slide from a talk at PostgresConf 2018 also says that random_page_cost should be set to something between 1.0 and 2.0 for solid-state drives.

2) Cached data

If the required index data is already cached in RAM, an index scan will always be significantly faster than a sequential scan. The documentation says:

    "If your data is likely to be completely in cache, [...] decreasing random_page_cost can be appropriate."

The problem is that you of course can't easily know whether the relevant data is already cached. However, if a specific index is frequently used, and if the system has sufficient RAM, then the data is likely to be cached eventually, and random_page_cost should be set to a lower value. You'll have to experiment with different values and see what works for you.

You might also want to use the pg_prewarm extension for explicit data caching.
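Once a lower value has proven itself in a session, it can be persisted at a narrower or wider scope. The following is only a sketch: mydb, fast_ssd, and my_index are hypothetical placeholder names, and 1.1 is just the example value quoted from the documentation, not a recommendation for any particular system.

```sql
-- mydb, fast_ssd and my_index are hypothetical placeholders.

-- Try the setting for the current session only.
SET random_page_cost = 1.1;

-- Persist it for one database (takes effect in new sessions).
ALTER DATABASE mydb SET random_page_cost = 1.1;

-- Or persist it only for relations stored in one tablespace.
ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);

-- Load an index into the buffer cache explicitly with pg_prewarm.
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('my_index');
```

The per-tablespace form is useful when only some of the storage is solid-state, since the other tablespaces keep the global value.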

User · Answer

Sometimes PostgreSQL fails to make the best choice of indexes for a particular condition. As an example, suppose there is a transactions table with several million rows, of which there are several hundred for any given day, and the table has four indexes: transaction_id, client_id, date, and description. You want to run the following query:

    SELECT client_id, SUM(amount)
    FROM transactions
    WHERE date >= 'yesterday'::timestamp AND date < 'today'::timestamp AND
          description = 'Refund'
    GROUP BY client_id

PostgreSQL may choose to use the index transactions_description_idx instead of transactions_date_idx, which may lead to the query taking several minutes instead of less than one second. If this is the case, you can force using the index on date by fudging the condition like this:

    SELECT client_id, SUM(amount)
    FROM transactions
    WHERE date >= 'yesterday'::timestamp AND date < 'today'::timestamp AND
          description||'' = 'Refund'
    GROUP BY client_id

Concatenating an empty string turns the left-hand side into an expression that no longer matches the indexed column, so the index on description cannot be used for that condition.

User · Answer

Probably the only valid reason for using set enable_seqscan=false is when you're writing queries and want to quickly see what the query plan would actually be were there large amounts of data in the table(s). Or, of course, if you need to quickly confirm that your query is not using an index simply because the dataset is too small.
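For that kind of one-off check, a sketch like the following keeps the setting from leaking into the rest of the session. The orders table and customer_id column are hypothetical names used only for illustration.

```sql
-- Hypothetical table and column names, for illustration only.
BEGIN;
-- SET LOCAL lasts only until the end of this transaction.
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;
```

Note that enable_seqscan = off does not forbid sequential scans; it only makes the planner treat them as very expensive. If the plan still shows a Seq Scan, there is most likely no index that can serve the query at all.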

User · Answer

Assuming you're asking about the common "index hinting" feature found in many databases, PostgreSQL doesn't provide such a feature. This was a conscious decision made by the PostgreSQL team. A good overview of why and what you can do instead can be found here. The reasons are basically that it's a performance hack that tends to cause more problems later down the line as your data changes, whereas PostgreSQL's optimizer can re-evaluate the plan based on the statistics. In other words, what might be a good query plan today probably won't be a good query plan for all time, and index hints force a particular query plan for all time.

As a very blunt hammer, useful for testing, you can use the enable_seqscan and enable_indexscan parameters. See:

- Examining index usage
- enable_ parameters

These are not suitable for ongoing production use. If you have issues with query plan choice, you should see the documentation for tracking down query performance issues. Don't just set enable_ params and walk away.

Unless you have a very good reason for using the index, Postgres may be making the correct choice. Why?

- For small tables, it's faster to do sequential scans.
- Postgres doesn't use indexes when datatypes don't match properly; you may need to include appropriate casts.
- Your planner settings might be causing problems.

See also this old newsgroup post.
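The datatype point is the easiest one to overlook, so here is a small sketch of it. The accounts table is a hypothetical schema invented for this illustration, and the plans you see will depend on your version, statistics, and table size.

```sql
-- Hypothetical schema, for illustration only.
CREATE TABLE accounts (id integer PRIMARY KEY, name text);

-- The numeric literal makes Postgres compare id::numeric with 42.0,
-- so the btree index on the integer column cannot serve this condition.
EXPLAIN SELECT * FROM accounts WHERE id = 42.0;

-- Matching the column's type keeps the comparison indexable.
EXPLAIN SELECT * FROM accounts WHERE id = 42;
EXPLAIN SELECT * FROM accounts WHERE id = CAST('42' AS integer);
```

The same pattern tends to show up when an application driver binds a parameter with a wider or different type than the column; casting the parameter, rather than the column, is usually the fix.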

User · Answer

There is a trick to push Postgres to prefer a seqscan: adding an OFFSET 0 in the subquery. This is handy for optimizing requests that link huge tables when all you need is only the first or last n elements.

Let's say you are looking for the first or last 20 elements involving multiple tables having 100k (or more) entries: there is no point building up the query over all the data when what you'll be looking for is in the first 100 or 1000 entries. In this scenario, for example, it turns out to be over 10x faster to do a sequential scan.

See: How can I prevent Postgres from inlining a subquery?
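A minimal sketch of the OFFSET 0 trick, using hypothetical orders and customers tables that do not come from the answer. OFFSET 0 skips no rows, but it acts as an optimization fence: the subquery is planned on its own instead of being merged into the outer query. Whether that is actually faster depends on the data, so compare both forms with EXPLAIN ANALYZE.

```sql
-- Hypothetical tables and columns, for illustration only.
SELECT recent.id, recent.created_at, c.name
FROM (
    SELECT id, customer_id, created_at
    FROM orders
    WHERE created_at >= now() - interval '1 day'
    OFFSET 0   -- skips no rows; keeps the subquery from being flattened
) AS recent
JOIN customers AS c ON c.id = recent.customer_id
ORDER BY recent.created_at DESC
LIMIT 20;
```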
