Source linked

A 10,000-Row Postgres Metadata Table Can Wreck Your 20ms Queries

hackernoon.com@systems_wire2 hours ago·Systems Engineering·4 comments

Stale statistics on a small devices table caused a join to degrade from 20ms to 2.8 seconds. Here's how to find and fix it in under five minutes.

postgresquery performanceautovacuumexplain analyzedatabase optimizationmetadata tables

Stale statistics on a 10,000-row Postgres metadata table turned a 20ms join into a 2.8-second drag on every dashboard query — and nobody audits the small tables.

Most time-series performance work starts with the fact table: five hundred million sensor readings, daily partitions, autovacuum fighting write load. That's the wrong place to look first. A 10,000-row devices table, rarely updated, sitting in the FROM clause of every join can quietly push your query from the indexed-access tier into linear degradation. No schema migration required.

Why Postgres Plans a Join for a Table It Thinks Is 600 Rows

Postgres triggers ANALYZE based on autovacuum_analyze_scale_factor, which defaults to 0.2. On a 10,000-row table, that means 2,000 rows must change before a statistics refresh fires. If your device metadata only sees occasional inserts, ANALYZE might have run once in the last quarter. Meanwhile pg_statistic still describes the distribution of device_id values from six months ago — making a 10,000-row table look like 600 rows to the planner.

When the planner sees 600 rows, it may choose a nested loop when a hash join is needed, or allocate an undersized hash table that spills to disk. The direction of the error matters less than the consequence: a join that ran in 20ms with accurate statistics degrades to 4 seconds as the mismatch grows, with no change to the fact table, indexes, or query itself.

Finding the Mismatch in Under Five Minutes

Run EXPLAIN (ANALYZE, BUFFERS) on any query joining your fact table to a metadata table. Look at the join node: Postgres shows two numbers — rows=X (estimated) and actual rows=Y. A ratio above 5:1 in either direction means stale statistics.

A real mismatch looks like this:

Hash Join (cost=45.00..12890.43 rows=620 width=48)
 (actual time=0.821..2847.332 rows=9841 loops=1)
 -> Seq Scan on devices (cost=0.00..42.20 rows=620 width=24)
 (actual time=0.015..0.441 rows=9841 loops=1)

The planner expected 620 rows from devices. It got 9,841. That gap is why the hash table bucketing is wrong and the join ran for 2.8 seconds instead of 150ms.

Fix 1: Force Fresher Statistics (No Migration)

Lower the scale factor on the metadata table so ANALYZE fires more aggressively:

ALTER TABLE devices SET (
 autovacuum_analyze_scale_factor = 0.01,
 autovacuum_analyze_threshold = 50
);

With these settings, ANALYZE fires after roughly 150 rows change rather than 2,000. Run ANALYZE devices; immediately. Re-run the EXPLAIN — estimated rows now match actual rows (9,930 vs 9,841), and total execution time drops from 2.8 seconds to 312ms.

Fix 2: Audit All Small Metadata Tables

The same problem applies to any metadata table that grows slowly but joins to large fact tables. This query finds candidates:

SELECT schemaname, relname, n_live_tup, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE n_live_tup < 100000 AND last_analyze < now() - interval '7 days';

No maintenance window, no schema migration. Just a one-time change.

Fix 3: Denormalize for the Highest-Volume Joins

For stable metadata like location, store it directly on the fact table row. Remove the join entirely. A trigger on devices keeps it current when location changes infrequently. If device moves are frequent, use an application-layer background job with a chunked loop to limit I/O pressure.

Your next step: open psql and run the audit query from Fix 2. It takes under a minute. Identify every under-analyzed metadata table in your schema, apply the scale factor fix, and watch your join queries return to the 20ms tier.


Source: How Small Postgres Metadata Tables Quietly Throttle Your Largest Queries
Domain: hackernoon.com

Read original source ->

External source stays available while the OJO article and comment thread stay local.

Comments load interactively on the live page.