Summary:
A recent change in Tiered Storage can affect clients configured with a negative or single-digit-millisecond fetch.max.wait. Our internal Schema Registry client uses a negative fetch.max.wait and hit the bug, so schemas that had to be read from Tiered Storage failed to load.
Recently, we improved Tiered Storage to correctly respect the fetch.max.wait timeout during request processing. As part of that work, a safeguard was introduced to handle cases where fetch.max.wait is configured too low to allow Tiered Storage to make meaningful progress. In this context, "too low" means the timeout expires before there is sufficient time to even initiate a single object download from remote storage (low single digit milliseconds). When this occurs, the system automatically falls back to the kafka_fetch_request_timeout_ms configuration value to ensure the request can proceed successfully.
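As a rough illustration of the intended fallback described above, here is a minimal Python sketch. It is not Redpanda's implementation: the function name, the MIN_USEFUL_WAIT_MS threshold, and the 5000 ms value used in the example calls are assumptions chosen only to make the behavior concrete.

```python
# Illustrative sketch only; this is NOT Redpanda's actual code.
# MIN_USEFUL_WAIT_MS is a hypothetical stand-in for "low single digit
# milliseconds"; the real threshold is not stated in this bulletin.
MIN_USEFUL_WAIT_MS = 5


def effective_fetch_timeout_ms(fetch_max_wait_ms: int,
                               kafka_fetch_request_timeout_ms: int) -> int:
    """Pick the timeout a Tiered Storage read should honor.

    If the client's fetch.max.wait is too low (including negative) to
    start even one remote object download, fall back to the cluster's
    kafka_fetch_request_timeout_ms value instead.
    """
    if fetch_max_wait_ms < MIN_USEFUL_WAIT_MS:
        # Covers negative values as well as very small positive ones.
        return kafka_fetch_request_timeout_ms
    return fetch_max_wait_ms


if __name__ == "__main__":
    # 5000 is a placeholder for kafka_fetch_request_timeout_ms.
    for wait in (-1, 2, 500):
        print(wait, "->", effective_fetch_timeout_ms(wait, 5000))
```

In this sketch, a negative value such as the one sent by the Schema Registry client takes the fallback branch, which is the path the safeguard was meant to handle.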
However, the implementation of this safeguard contained a bug that prevented Tiered Storage from properly limiting the timeout value, which caused schema fetches from Tiered Storage to fail.
When combined with the negative fetch.max.wait values sent by the Schema Registry client, this bug could cause schema fetch operations to fail entirely. In affected environments, it may become impossible to retrieve schemas from Redpanda. Other clients that send negative fetch.max.wait values may also experience timeouts.
This bug has been patched in Redpanda version 25.3.9 and above.
Severity:
Medium
Redpanda Products Affected:
- Redpanda Self-Managed - Enterprise
- Redpanda Self-Managed - Community
Releases Affected:
- 25.3.7
- 25.3.8
Identification:
- You are on an impacted version of Redpanda.
- Tiered Storage is enabled. You can check this with rpk cluster config get cloud_storage_enabled. If this returns true, Tiered Storage is enabled.
- For clients outside Schema Registry, the fetch.max.wait value is either negative or in the low single-digit milliseconds.
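The identification criteria above can be expressed as a small check. This is a hedged Python sketch, not an official tool: the function names, the LOW_WAIT_MS cut-off of 10 ms (standing in for "single digit milliseconds"), and the handling of a leading "v" in the version string are all assumptions. You would supply the version and the cloud_storage_enabled value yourself, for example from the rpk command mentioned above.

```python
# Illustrative sketch only; names and thresholds are assumptions.
AFFECTED_VERSIONS = {"25.3.7", "25.3.8"}  # releases listed in this bulletin
LOW_WAIT_MS = 10  # hypothetical cut-off: anything below is "single digit"


def cluster_is_affected(redpanda_version: str,
                        cloud_storage_enabled: bool) -> bool:
    """True if the cluster runs an affected release with Tiered Storage on."""
    # Tolerate a leading "v" (e.g. "v25.3.8"); this format is an assumption.
    version = redpanda_version.strip().lstrip("v")
    return version in AFFECTED_VERSIONS and cloud_storage_enabled


def client_is_affected(redpanda_version: str,
                       cloud_storage_enabled: bool,
                       fetch_max_wait_ms: int) -> bool:
    """True if a non-Schema-Registry client matches all three criteria."""
    if not cluster_is_affected(redpanda_version, cloud_storage_enabled):
        return False
    # Negative values and low single-digit values both qualify.
    return fetch_max_wait_ms < LOW_WAIT_MS


if __name__ == "__main__":
    print(cluster_is_affected("v25.3.8", True))
    print(client_is_affected("25.3.8", True, 500))
```

Because the internal Schema Registry client already sends a negative fetch.max.wait, the cluster-level check alone is enough to flag possible Schema Registry impact; the client-level check applies only to other clients.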
Impact:
In affected environments, it may become impossible to read topics, including schemas, from Tiered Storage in Redpanda.
You may see errors in Redpanda Console when trying to access the Schema Registry.
You may also receive 500 errors when accessing the Schema Registry by other means, such as rpk or the API.
Action required:
Redpanda Self-Managed Customers:
Immediate Action: Avoid upgrading to Redpanda version 25.3.7 or 25.3.8.
Remediation: If you are on an impacted version and experiencing issues, upgrade to Redpanda 25.3.9 or above. All others should prioritize upgrading to 25.3.9 at their convenience.
Clients Sending a Negative fetch.max.wait:
Immediate Action: Set any connection timeout settings to reasonably high values; 50 seconds is a suggested starting point. Reach out to Redpanda Support (see below) if needed.
Remediation: If you are using a client that is experiencing issues, upgrade to Redpanda 25.3.9 or above. All others should prioritize upgrading to 25.3.9 at their convenience.
Questions? If you have any questions about this TSB or need further guidance, please contact Redpanda Support.