Airflow on Kubernetes, Part 2: The Metadata Database and PgBouncer

The metadata database is the one stateful thing Airflow cannot run without, and on Kubernetes it’s the component most likely to fall over under load, not because the queries are heavy, but because of how many connections Airflow opens. This part explains where those connections come from, how to size the SQLAlchemy pool, and why PgBouncer is mandatory rather than optional for any serious Postgres deployment.

Why Airflow hammers Postgres connection counts

PgBouncer earned its place in this stack for one concrete reason: Airflow opens a connection per task, and a busy cluster opens enough of them to take an RDS instance down. Postgres uses a process-per-connection model and every open connection is a separate backend process with its own memory so it’s the number of connections, not the weight of the queries, that saturates the database first.

The Airflow docs say it directly:

“Airflow is known especially in high-performance setup to open many connections to metadata database. This might cause problems for Postgres resource usage, because in Postgres, each connection creates a new process and it makes Postgres resource-hungry when a lot of connections are opened. Therefore we recommend to use PGBouncer as database proxy for all Postgres production installations.” Set up a Database Backend

And the Helm chart’s production guide makes the same point:

“Due to distributed nature of Airflow, it can open a lot of database connections. Using a connection pooler can significantly reduce the number of open connections on the database.” - Helm production guide → PgBouncer

That’s the entire motivation for the pooler: multiplex a large, spiky number of client connections (one per running task, plus every server component) onto a small, fixed set of real Postgres backends. With Celery, every running task talks to the database, so your peak connection count tracks your peak task concurrency. Scale concurrency up and RDS marches toward max_connections and starts refusing connections; the whole instance gets overwhelmed by connection count long before CPU or IO is the bottleneck. PgBouncer is how you decouple the two: collapse ~100 client connections onto ~10 real backends so concurrency stops being a database-killer.

Version note. This per-task connection pressure is most acute on Airflow 2, where workers and tasks open metadata-DB connections directly. Airflow 3 isolates workers behind the Execution API, so task pods no longer open their own metadata connections, but the server components (scheduler, DAG processor, triggerer, API server) still each hold a pool, and the Celery result backend still needs connections, so PgBouncer remains the right call. If you sized PgBouncer on Airflow 2 to survive task concurrency, expect that pressure to drop after upgrading to 3.x.

Two databases, two pools

A Celery-based deployment actually puts two distinct kinds of connection on Postgres, and the chart pools them separately:

The metadata database: task and DAG state, the scheduler’s bookkeeping, connections/variables, and XCom rows. This is sql_alchemy_conn, pooled by pgbouncer.metadataPoolSize (default 10). This is the connection that “manages the tasks.”
The Celery result backend: where workers write the result/return state of each executed task so the scheduler can collect the outcome. This is [celery] result_backend, pooled by pgbouncer.resultBackendPoolSize (default 5). This is the connection that carries task output back. (config ref → celery, helm production guide)

A common conflation: XCom values don’t live in the result backend. XCom rows are in the metadata DB (or in object storage once you configure that; see Part 4), while the result backend only carries Celery’s per-task return state. Both usually point at the same Postgres instance, so one PgBouncer with two pools covers them.

pgbouncer:
  enabled: true
  metadataPoolSize: 10 # pool for the metadata DB  (sql_alchemy_conn): task/DAG state
  resultBackendPoolSize: 5 # pool for the Celery result backend: task results/output

Where do the server-side connections come from? Every server component holds its own SQLAlchemy pool: each scheduler replica, each DAG processor, the triggerer, and the API server. Multiply pool size by replica count and you can saturate a small Postgres’ max_connections before a single task runs, on top of the per-task connections above.

Sizing the SQLAlchemy pool

Each component’s pool is governed by these [database] settings (config ref):

Setting	Default	What it controls
`sql_alchemy_pool_size`	5	Persistent connections kept open per component
`sql_alchemy_max_overflow`	10	Extra connections allowed above pool size under burst (discarded when returned)
`sql_alchemy_pool_recycle`	1800	Seconds before an idle connection is recycled
`sql_alchemy_pool_pre_ping`	True	`SELECT 1` on checkout to detect dead connections

So a single component can hold up to pool_size + max_overflow = 15 connections under burst. The worst case across components:

worst-case connections ≈ Σ over components of (pool_size + max_overflow) × replicas

Example. Two schedulers + two API servers + one DAG processor + one triggerer at defaults is 6 × 15 = 90 connections before headroom for migrations jobs, db clean, and your own monitoring. That’s already past the default max_connections = 100 on many managed Postgres tiers. You have two levers: shrink the pools, or put a pooler in front. Do both.

pool_pre_ping = True is worth keeping on; it costs a trivial round-trip but saves you from the stale-connection errors that managed databases love to throw after idle periods.

Managed Postgres: also set keepalives

This one is unrelated to connection counts; it’s about idle connections being dropped, and it’s a distinct issue from why we need PgBouncer. RDS, Cloud SQL and Azure Postgres silently close connections that sit idle (typically after ~300s), and Airflow then trips over the dead socket: psycopg2.OperationalError: SSL SYSCALL error: EOF detected. The Airflow docs call this out specifically for managed Postgres:

“For managed Postgres such as Azure Postgresql, CloudSQL, Amazon RDS, you should use keepalives_idle in the connection parameters and set it to less than the idle time because those services will close idle connections after some time of inactivity (typically 300 seconds), which results with error … psycopg2.operationalerror: SSL SYSCALL error: EOF detected.” (Set up a Database Backend)

The fix is TCP keepalives via sql_alchemy_connect_args, pointed at a dict in your local settings (same source):

# airflow_local_settings.py
keepalive_kwargs = {
    "keepalives": 1,
    "keepalives_idle": 30,
    "keepalives_interval": 5,
    "keepalives_count": 5,
}

# airflow.cfg / env
[database]
sql_alchemy_connect_args = airflow_local_settings.keepalive_kwargs

PgBouncer: not optional for Postgres

A connection pooler sits between Airflow and Postgres, multiplexing many client connections onto a small set of real server connections. The Airflow docs recommend PgBouncer “for all Postgres production installations,” and the chart ships a pre-configured instance behind a boolean flag.

# values.yaml
pgbouncer:
  enabled: true
  maxClientConn: 100 # max client conns PgBouncer accepts
  metadataPoolSize: 10 # real server conns to the metadata DB
  resultBackendPoolSize: 5 # real server conns to the Celery result backend

The shape that makes this work is transaction pooling. The chart’s PgBouncer config uses pool_mode = transaction on listen_port = 6543 (production guide):

[pgbouncer]
pool_mode = transaction
listen_port = 6543
auth_type = scram-sha-256
max_client_conn = 100
ignore_startup_parameters = extra_float_digits

In transaction mode a server connection is assigned to a client only for the duration of a transaction, then returned to the pool. That’s what lets metadataPoolSize: 10 serve a hundred Airflow client connections; the real backends are released between transactions instead of being held idle. The effect: your 90 worst-case client connections collapse onto ~10–15 real Postgres backends.

When you enable PgBouncer, Airflow components connect to PgBouncer’s service and port (6543), not to Postgres directly. If you store DB creds in a Kubernetes Secret rather than the values file, the connection string has to target PgBouncer:

kubectl create secret generic airflow-metadata \
  --from-literal=connection=postgresql://user:pass@RELEASE-pgbouncer.airflow:6543/airflow-metadata

Scheduler ×2 ─┐
API server ×2 ┤
DAG processor ┼──▶  PgBouncer (transaction mode, :6543)  ──▶  Postgres
Triggerer    ─┘                                  ~10–15 real conns

Two gotchas with transaction pooling

extra_float_digits must be ignored. Postgres clients set session parameters on connect; in transaction mode those don’t survive across the shared backends. The chart already adds ignore_startup_parameters = extra_float_digits, which is why it works out of the box. If you hand-roll PgBouncer, omitting it produces cryptic connection failures.

Prepared statements and session state don’t carry across transactions. Transaction pooling is incompatible with anything that relies on session-scoped state surviving between transactions. Airflow core is built to tolerate this; just don’t point your own session-stateful tooling at the 6543 port expecting session semantics.

Running more than one scheduler (and what the DB needs)

Scheduler HA in Airflow relies entirely on the database: no Raft, no Zookeeper. The schedulers coordinate through row-level locks (SELECT ... FOR UPDATE) so that only one scheduler is in the “critical section” (where task instances move from scheduled to queued) at a time, respecting pool and concurrency limits. (Scheduler HA)

This needs a database that implements SKIP LOCKED / NOWAIT:

PostgreSQL 12+ and MySQL 8.0+: fully supported, run as many schedulers as you like, no extra config.
MariaDB: didn’t implement those clauses until 10.6; multiple schedulers have produced deadlocks. Avoid for HA.
SQL Server: untested with HA.
Keep [scheduler] use_row_level_locking = True (the default). If you set it to False, you must run exactly one scheduler.

Practically: use Postgres 12+ or MySQL 8+, and the HA story is free.

Keep the database from growing forever

The metadata DB accretes task instance history, logs metadata, XCom rows, and serialized DAG versions. Left alone it grows without bound and queries slow down. The chart can schedule airflow db clean as a CronJob:

databaseCleanup:
  enabled: true
  retentionDays: 90

This matters more once you’re versioning DAGs and storing XCom references (Part 4); both add rows. Decide a retention window early; reclaiming space from a bloated Airflow DB later is no fun.

Next: Part 3: executors and the execution lifecycle, including how a worker actually receives the task it runs and what happens when code changes mid-run.