April 5, 2026
Relational databases form the core of most scalable backend architectures. As user traffic grows, database read/write latency becomes the primary bottleneck for application responsiveness. In PostgreSQL, a query that executes in 5 milliseconds with 10,000 rows can easily balloon to 15 seconds when tables scale to 50 million rows.
Optimizing these queries requires more than just adding indexes. It demands a deep understanding of the PostgreSQL query planner, data layouts on disk, execution algorithms, and how Multi-Version Concurrency Control (MVCC) impacts disk reads. In this article, we will analyze PostgreSQL query plans, explore advanced indexing strategies, dissect join algorithms, resolve the slow count issue, and learn how to tune autovacuum parameters.
EXPLAIN (ANALYZE, BUFFERS)To optimize a slow query, you must inspect the execution plan generated by the PostgreSQL planner. By default, running EXPLAIN returns a static estimate of the cost. To get real-world performance statistics, you should always run EXPLAIN (ANALYZE, BUFFERS):
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, email, created_at
FROM users
WHERE company_id = 45
ORDER BY created_at DESC
LIMIT 50;
Adding BUFFERS tells the planner to output data block operations, which is the most critical metric for evaluating query efficiency.
Query Plan Execution Tree
+--------------------------------------+
| 1. Limit (Actual Time) |
+--------------------------------------+
|
+--------------------------------------+
| 2. Sort (Key: created_at) |
+--------------------------------------+
|
+--------------------------------------+
| 3. Bitmap Heap Scan (company_id) |
+--------------------------------------+
|
+--------------------------------------+
| 4. Bitmap Index Scan (idx_company) |
+--------------------------------------+
SELECT clause, PostgreSQL bypasses the heap table completely and reads data directly from the index. This is the fastest read operation.Shared Hit: Data blocks loaded directly from the PostgreSQL shared buffer cache in RAM.Shared Read: Data blocks read directly from disk (much slower).Shared Read count, it indicates that PostgreSQL is performing expensive disk I/O operations.Your goal is to optimize queries so that the planner chooses Index Only Scans with high Shared Hit ratios.
While B-Tree indexes are standard, they must be designed to match your query patterns.
When creating an index on multiple columns, the order of columns matters. The rule of thumb is: Equality columns first, range columns second.
Consider this query:
SELECT id FROM orders WHERE user_id = 1054 AND created_at >= '2024-01-01';
If you create an index like this:
-- Inefficient: Range column prevents efficient filtering on user_id
CREATE INDEX idx_orders_bad ON orders(created_at, user_id);
PostgreSQL will scan the index for dates, and then filter by user_id inside those date ranges. Instead, create it in this order:
-- Efficient: Equality column first
CREATE INDEX idx_orders_good ON orders(user_id, created_at);
Here, PostgreSQL instantly jumps to the matching user_id branch of the B-Tree and reads the chronological index entries linearly.
INCLUDE ClauseIf you want to achieve an Index Only Scan but do not want to add columns to the search index tree (which would bloat the index and slow down updates), use the INCLUDE clause. This appends data columns strictly to the leaf nodes of the B-Tree:
CREATE INDEX idx_users_company_covering
ON users(company_id)
INCLUDE(email, created_at);
Now, the query SELECT email, created_at FROM users WHERE company_id = 45 will execute as a fast Index Only Scan.
When joining tables, the planner chooses one of three algorithms depending on data sizes and index layouts.
For every row in the outer table, the database checks matching rows in the inner table.
The database reads the inner table and builds a hash table in memory (using Work Mem). It then scans the outer table, probing the hash table for matches.
work_mem buffer size, PostgreSQL is forced to write temporary files to disk, dragging down query speeds.Both datasets are sorted by the join key and scanned in parallel to align matches.
Debugging Tip: If you notice high execution times due to Hash Joins writing temp files to disk, increase the allocated memory block configuration for queries:
-- Increase query working memory to 64MB for the current session
SET work_mem = '64MB';
COUNT(*) Problem on Large TablesIn MySQL or other engines, running a count query is fast because they store pre-computed totals. In PostgreSQL, running SELECT COUNT(*) FROM orders takes a long time on large tables.
Because of MVCC, each transaction sees a different snapshot of the database. The database engine must perform a sequential scan, verifying the visibility of every single row against the current transaction status.
For dashboards or landing pages, you rarely need a 100% accurate count down to the single digit. You can query the system tables (pg_class) to get an instant, approximate count:
SELECT reltuples::bigint AS approximate_count
FROM pg_class
WHERE relname = 'orders';
This returns the approximate row count in less than a millisecond, even if the table contains billions of records.
If you need a 100% accurate count for a subset of records, use a covering index with a index-only scan, or maintain a dedicated counters table updated via triggers.
PostgreSQL uses MVCC to manage concurrent transactions. When you update a row, PostgreSQL marks the old row as invisible (creating a "dead tuple") and inserts a new version. Deleting a row simply marks it as dead.
If these dead tuples are not cleaned up, they remain on disk, causing table bloat. Future queries must scan through these inactive, dead slots, slowing down all reads and consuming disk space.
The autovacuum daemon is a background worker that cleans up dead tuples. If your database experiences high write/update volumes, the default autovacuum configurations are often too passive, leading to rapid bloat.
To prevent bloat, tune these configuration parameters in postgresql.conf (or apply them directly to high-traffic tables):
-- Trigger vacuum when 10% of rows are updated or deleted (default is 20%)
ALTER TABLE orders SET (autovacuum_vacuum_scale_factor = 0.10);
-- Set vacuum threshold to start after 500 changes
ALTER TABLE orders SET (autovacuum_vacuum_threshold = 500);
-- Decrease vacuum delay to make it run faster (default is 2ms)
ALTER TABLE orders SET (autovacuum_vacuum_cost_delay = 2);
-- Increase vacuum cost limit to allow it to do more work per cycle
ALTER TABLE orders SET (autovacuum_vacuum_cost_limit = 1000);
By monitoring slow queries with EXPLAIN ANALYZE BUFFERS, designing covering indexes with equality-first column ordering, and tuning autovacuum parameters, you can scale your PostgreSQL database to handle massive workloads.
Analytical queries often use Window Functions (ROW_NUMBER(), RANK(), SUM() OVER(...)) to calculate running totals, partitions, or rank rankings. However, window functions are notoriously slow because they require the database to group and sort data partitions in memory.
If you run EXPLAIN ANALYZE on a query containing window functions, you will often see a Sort node or an Incremental Sort node:
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, amount,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) as row_num
FROM orders;
This sort step is expensive. If the sorted dataset exceeds the size of work_mem, PostgreSQL will spill the sort buffer to disk, causing high read/write latency.
You can eliminate the sort step entirely by creating an index that matches the PARTITION BY and ORDER BY clauses of your window function. The database planner can read the index nodes in pre-sorted order, bypassing the sort execution phase completely.
Create a composite index following this pattern:
PARTITION BY columns firstORDER BY columns second-- Optimize: ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC)
CREATE INDEX idx_orders_window_opt ON orders(user_id, created_at DESC);
When you re-run EXPLAIN ANALYZE after adding this index, you will notice that the Sort node is replaced by a fast WindowAgg node sitting directly on top of an Index Scan.
This eliminates CPU-intensive memory sorting and disk I/O, decreasing execution times by up to 90% on large datasets. Always ensure your analytical queries align with index-supported sorting structures to maximize speed.