Summary:
Redpanda brokers could crash if a reactor stall report is emitted at an inopportune time. Although uncommon, reactor stall reports can occur on Redpanda clusters under higher load, and these reports are used to assist with support investigations. If a report is emitted at the exact moment when the interrupted path of execution is unwinding the stack, a segfault may occur. Further details are outlined in the GitHub issue here.
Severity:
Medium
Redpanda Products Affected:
- Redpanda Self-Managed - Enterprise
- Redpanda Self-Managed - Community
Release Affected:
Redpanda versions < 24.2.22
Redpanda 24.3 versions < 24.3.11
Redpanda version 25.1.1
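The affected ranges above can be checked mechanically. The following is a minimal sketch, not an official tool: the function names `parse_version` and `is_affected` are hypothetical, the input is assumed to be a plain `major.minor.patch` string (optionally prefixed with `v`), and the ranges are a literal reading of the three lines above.

```python
def parse_version(text):
    """Turn '24.2.21' or 'v24.2.21' into the tuple (24, 2, 21)."""
    parts = text.strip().lstrip("v").split(".")
    return tuple(int(p) for p in parts[:3])


def is_affected(text):
    """Return True if the version falls in a range listed in this bulletin."""
    v = parse_version(text)
    # Everything below 24.2.22 is listed as affected.
    if v < (24, 2, 22):
        return True
    # The 24.3 line is affected below 24.3.11.
    if (24, 3, 0) <= v < (24, 3, 11):
        return True
    # On the 25.1 line, only 25.1.1 is listed.
    return v == (25, 1, 1)


if __name__ == "__main__":
    for candidate in ["24.2.21", "24.2.22", "24.3.10", "24.3.11", "25.1.1"]:
        status = "affected" if is_affected(candidate) else "not listed as affected"
        print(candidate, status)
```

Versions with suffixes such as release-candidate tags are not handled by this sketch and would need additional parsing.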
Identification:
If you are running an impacted version, you could experience this issue. The key symptom is that the Redpanda broker terminates with exit code 139, without any error message or backtrace. Brokers that terminate in this manner may be hitting this issue.
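Exit code 139 is consistent with a segfault under the common convention, used by POSIX shells and many process supervisors, that a process killed by a signal is reported as 128 plus the signal number (SIGSEGV is 11 on Linux). A minimal sketch of that decoding follows; the helper name `signal_from_exit_code` is hypothetical, and it assumes your supervisor follows the 128 + N convention.

```python
import signal


def signal_from_exit_code(code):
    """Return the signal name implied by an exit code, or None.

    Assumes the 128 + signal-number convention for processes
    terminated by a signal.
    """
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # The remainder does not map to a known signal on this platform.
        return None


if __name__ == "__main__":
    print(signal_from_exit_code(139))
```

A result of `SIGSEGV` for a broker that exited without a backtrace matches the symptom described above, although it does not by itself prove that this specific issue was the cause.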
Impact:
This issue can cause Redpanda brokers to crash. A single broker can typically recover, but depending on your system the crash can affect performance.
Action required:
This issue will be resolved in Redpanda versions 24.2.22, 24.3.11, and 25.1.2. You should upgrade to one of the patched versions as soon as they are available. These releases should all be available the week of the 28th and can be checked here.
To mitigate the issue on clusters running impacted versions, set the following Seastar startup flag:
--blocked-reactor-reports-per-minute=0
This flag disables reactor stall reports, which prevents the log event that triggers the crash from being emitted.
Via redpanda.yaml for Manual Deployments
Add the flag to the rpk section in your redpanda.yaml configuration file:
rpk:
  additional_start_flags:
    - "--blocked-reactor-reports-per-minute=0"
This configuration file is typically located at /etc/redpanda/redpanda.yaml or at your custom configuration location.
Via Helm/Operator for Kubernetes Deployments
You can set this in your values file:
config:
  rpk:
    additional_start_flags:
      - "--blocked-reactor-reports-per-minute=0"
Then apply it with:
helm upgrade --install redpanda redpanda/redpanda --namespace <namespace> --create-namespace --values your-values.yaml
Via Ansible for Automated OS Package Deployments
For deployments via Ansible, please reach out to Redpanda Support to assist you in updating this setting.
Questions? If you have any questions about this TSB, or need further guidance, please contact Redpanda Support.