Welcome to a weekly blog where I get to answer (like, really answer) some of the questions I’ve seen in the webinars I’ve presented lately. If you missed the latest one, PostgreSQL Performance Tuning Secrets, it might be helpful to give some of it a listen before or after you read this post. Each week, I’ll dive deep into one question. Let me know what you think in the comments. 

We constantly hear that indexes improve read performance and it’s usually true, but we also know that it will always have an impact on writes. What we don’t hear about too often is that in some cases, it may not give any performance improvement at all. This happens more than we want and might happen more than we even notice, and foreign keys (FKs) are a great example. I’m not saying that all FK’s indexes are bad, but most of the ones I’ve seen are just unnecessary, only adding load to the system.

For example, the below relationship where we have a 1:N relationship between the table “Supplier” and table “Product”:

foreign keys index

If we pay close attention to the FK’s in this example it won’t have a high number of lookups on the child table using the FK column, “SupplierID” in this example, if we compare with the number of lookups using “ProductID” and probably “ProductName”. The major usage will be to keep the relationship consistent and search in the other direction, finding the supplier for a certain product. In this circumstance, adding an index to the FK child without ensuring the access pattern requires it will add the extra cost of updating the index every time we update the “Product” table.

Another point we need to pay attention to is the index cardinality. If the index cardinality is too low Postgres won’t use it and the index will be just ignored. One can ask why that happens and if that wouldn’t still be cheaper for the database to go, for example, through half of the indexes instead of doing a full table scan? The answer is no, especially for databases that use heap tables like Postgres. The table access in Postgres (heap table) is mostly sequential, which is faster than random access in spinning HDD disks and still a bit faster on SSDs, while b+-tree index access is random by nature.

When Postgres uses an index it needs to open the index file, find the records it needs and then open the table file, do a lookup using the page addresses it got from the indexes changing the access pattern from sequential to random and depending on the data distribution it will probably access the majority of the table pages, ending up in a full table scan but now using random access, which is much more expensive. If we have columns with low cardinality and we really need to index them we need to use an alternative to b-tree indexes, for example, a GIN index, but this is a topic for another discussion.

With all of that, we may think that FK indexes are always evil and never use them on a child table, right? Well, that’s not true either and there are many circumstances they are useful and needed, for example, the below picture has another two tables, “Customer” and “Order”:

FK child table

It can be handy to have an index on the child table “Order->CustomerId” as it’s common to show all orders from a certain user and the column “CustomerId” on the table “Order” will be used quite frequently as the lookup key.

Another good example is to provide a faster method to validate referential integrity. If one needs to change the parent table (update or delete any parent key) the children need to be checked to make sure that the relationship isn’t broken. In this case, having an index on the child’s side would help to improve performance. When it worths the index however is “load dependant”. If the parent key has many deletes it might be a case to consider, however, if it’s a mostly static table or mostly has inserts or updates to the other columns other than the parent key column then it’s not a good candidate to have an index on the children tables.

There are many other examples that can be given to explain why an index on the child table might be useful and worth the extra write cost/penalty.

Conclusion

The takeaway here is that we should not indiscriminately create indexes on all FKs because many of them will just not be used or so rarely used that they aren’t worth the cost. It’s better to initially design the database with the FKs but not the indexes and add them while the database grows and we understand the workload. It’s possible that at some point we find that we need an index on “Product->SupplierId” due to our workload and the index on “Order->CustomerId” isn’t necessary anymore. Loads change and data distribution as well, index shall follow them and not be treated as immutable entities.


As more companies look at migrating away from Oracle or implementing new databases alongside their applications, PostgreSQL is often the best option for those who want to run on open source databases.

Read Our New White Paper:

Why Customers Choose Percona for PostgreSQL

Subscribe
Notify of
guest

5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Serhii Bulieienko

In the Oracle DB the recommendation to create the index on FK columns is to avoid the full table lock when update/delete rows in primary table. It is not for SELECT performance.

Daniel Lidström

If I indiscriminately add the indexes then I know I won’t have worse query performance. I might have slightly worse insert performance but in my applications reads have outnumbered writes by several orders usually.

Daniel Lidström

Hi Charly, interesting article from the guru. I think his example doesn’t apply to FK indexes. FK indexes shouldn’t be used in sorting operations. They are used when looking up related information. I am certainly not a guru though, but I still think the advice to always index foreign keys is good advice and as you say we can always adjust later if really necessary. A helpful article would be one that explains easy ways to detect those situations.