Summary:

A deadlock was introduced in Redpanda 26.1.1 within the transactional state machine (rm_stm). Under transactional or idempotent produce workloads, a three-way circular lock inversion between _state_lock and _apply_lock can cause the affected partition to hang permanently. Once triggered, no produce or transactional operation can progress on that partition and recovery requires restarting the broker.

This is fixed in Redpanda v26.1.6, releasing on April 21, 2026.

Severity:

High

Redpanda Products Affected:

Redpanda Self-Managed - Enterprise
Redpanda Self-Managed - Community

Release Affected:

Redpanda Core v26.1.1 - v26.1.5

Identification:

You are potentially impacted if ALL of the following are true:

You are running Redpanda v26.1.1 - v26.1.5.
Your workload uses Kafka transactions (producers configured with a transactional.id).
- rpk cluster txn list — if it returns any results, transactions are in use.
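The two conditions above can be sketched as a small check for use in fleet audits. This is a minimal illustration, not a Redpanda tool: the function names are hypothetical, the version string is assumed to be collected by whatever means you already use, and the affected range is copied from this bulletin.

```python
import re

# Affected range taken from this bulletin: v26.1.1 through v26.1.5 inclusive.
FIRST_AFFECTED = (26, 1, 1)
LAST_AFFECTED = (26, 1, 5)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as 'v26.1.3' or '26.1.3'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_potentially_impacted(version_text, uses_transactions):
    """Return True only if BOTH bulletin conditions hold.

    uses_transactions is an input you supply, for example based on whether
    the transaction listing for the cluster returns any results.
    """
    version = parse_version(version_text)
    in_affected_range = FIRST_AFFECTED <= version <= LAST_AFFECTED
    return in_affected_range and uses_transactions
```

Tuple comparison keeps the range check readable and avoids string comparison pitfalls such as "26.1.10" sorting before "26.1.5".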

The deadlock is more likely to be triggered on clusters with high transactional write throughput, frequent Raft leadership changes (e.g. during a rolling upgrade), or a large number of partitions.

Additionally, if you are impacted you may see the following log entries on affected brokers:

WARN rm_stm.cc:1039 - Timed out while waiting for offset: XXXXXXX, ms_since_last_update: 10054ms, status: ongoing

ERROR tx_gateway_frontend.cc - begin_tx result: tx::errc::leader_not_found

INFO consensus.cc:227 - [external_stepdown - do_commit_tx wait error] Stepping down as leader
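To look for these entries across a large log file, a scan along the following lines may help. It is a sketch under assumptions: the patterns are derived only from the three sample lines above, the source line numbers (such as 1039 and 227) are deliberately not matched exactly because they may differ in your logs, and the signature names are hypothetical labels. A match is a hint that the broker may be affected, not a confirmation of the deadlock.

```python
import re

# Patterns built from the sample log lines in this bulletin.
# Source line numbers are matched loosely or ignored.
SIGNATURES = {
    "stm_offset_wait_timeout": re.compile(
        r"rm_stm\.cc:\d+ - Timed out while waiting for offset"
    ),
    "begin_tx_leader_not_found": re.compile(
        r"tx_gateway_frontend\.cc.* - begin_tx result: tx::errc::leader_not_found"
    ),
    "commit_wait_stepdown": re.compile(
        r"external_stepdown - do_commit_tx wait error"
    ),
}


def count_signatures(lines):
    """Count how many lines match each signature; accepts any iterable of strings."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Because the function accepts any iterable of strings, an open file handle for a broker log can be passed to it directly.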

Impact:

Once triggered, the deadlock can cause the affected partition to hang. This results in:

Transaction failures — all transactional and idempotent produce operations on the affected partition hang permanently until the broker is restarted.
REQUEST_TIMED_OUT errors on Kafka producers — Kafka Streams and other EOS applications will see repeated retries with essentially infinite retry counts.
Flat consumer lag — with Exactly-Once Semantics, uncommitted records are invisible to read_committed consumers. Lag appearing flat is not a sign of health — it means transactions are failing to commit.
Leadership transfer hangs — the deadlocked partition cannot complete leadership transfers, blocking rolling upgrades and rebalancing on the affected shard.
Leaderless partitions — repeated leader stepdowns from STM timeouts cause partitions to cycle through elections. rpk cluster health may report leaderless partitions on the affected brokers.

Action required:

Running Redpanda v26.1.1–26.1.5 with a transactional workload:

Upgrade to 26.1.6 as soon as possible. There are no configuration-level workarounds. If a broker is currently in a deadlocked state, restart the affected broker to restore partition availability, then upgrade.

Running Redpanda v26.1.1–26.1.5 without transactions:

No immediate action required. When upgrading, proceed directly to 26.1.6 rather than stopping at an intermediate 26.1.x release.

Running Redpanda v25.x or earlier:

Not affected. When upgrading to the 26.1 series, upgrade directly to 26.1.6.

Redpanda Cloud:

Redpanda Cloud customers will be automatically moved to 26.1.6 as part of their scheduled maintenance window.

Questions? If you have any questions about this TSB or need further guidance, please contact Redpanda Support.