Summary:
A bug in Redpanda can cause a topic to stop uploading to Tiered Storage when Tiered Storage is enabled on a topic that has already been compacted. When this happens, the error "New segment does not line up with previous segment" is written to the logs.
Historically this has only been seen on the __consumer_offsets topic, but it could affect any topic with cleanup.policy set to compact.
Severity:
Medium
Redpanda Products Affected:
- Redpanda Self-Managed - Enterprise
- Redpanda Self-Managed - Community
Release Affected:
Redpanda Versions prior to 24.1.15
Identification:
Redpanda clusters on versions below 24.1.15 with Tiered Storage enabled could be impacted by this bug.
To confirm, check your cluster logs for the following message:
New segment does not line up with previous segment
If you see many of these messages for a single topic, that topic may be impacted. Only topics that start at offset 0 and have cleanup.policy set to compact can be impacted.
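As a rough way to gauge how often the message appears, the sketch below counts matching lines in a log file. The function name and the log path in the usage comment are hypothetical; point it at wherever your Redpanda logs are actually collected.

```shell
# Count log lines that contain the segment misalignment error.
# Usage: count_misaligned_segments <path-to-log-file>
count_misaligned_segments() {
    log_file="$1"
    # -F matches the message as a fixed string; -c prints the number of matching lines.
    grep -c -F "New segment does not line up with previous segment" "$log_file"
}

# Example usage (the path is an assumption, not a Redpanda default):
# count_misaligned_segments /path/to/redpanda.log
```

A count of 0 also makes grep return a non-zero exit status, so check the printed number rather than the exit code.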
If a topic is impacted, its size will grow continuously. You can check the topic size with the vectorized_storage_log_partition_size metric, aggregated by topic.
For example:
sum by(topic) (vectorized_storage_log_partition_size{redpanda_cluster="<cluster-name>", host=~".*", topic="<topic_name>"})
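If your metrics are scraped by Prometheus, a query like the one above can also be run from a shell. The sketch below only builds the query string; the helper name and the PROM_URL variable are hypothetical, and the redpanda_cluster label is carried over from the example above and may not exist in your setup.

```shell
# Build the PromQL query for one topic's on-disk size.
# Usage: topic_size_query <cluster-name> <topic_name>
topic_size_query() {
    cluster="$1"
    topic="$2"
    printf 'sum by(topic) (vectorized_storage_log_partition_size{redpanda_cluster="%s", topic="%s"})\n' "$cluster" "$topic"
}

# Example usage against the Prometheus HTTP API (PROM_URL is an assumption):
# curl -s --get "$PROM_URL/api/v1/query" \
#     --data-urlencode "query=$(topic_size_query my-cluster __consumer_offsets)"
```

Running the query a few times over an hour or more shows whether the topic size is still growing.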
Impact:
If a topic is impacted by this bug, it cannot be uploaded to Tiered Storage, so its size on disk keeps growing. The impact depends on which topic is affected and how large it grows.
Historically we’ve only seen this issue affect the __consumer_offsets topic, where it can negatively impact any operation that interacts with that topic. This can leave consumer groups unable to consume from the cluster.
Action required:
For an impacted __consumer_offsets topic
If you are experiencing impact to your cluster, you can mitigate this issue by turning off remote writes (the redpanda.remote.write property) for the __consumer_offsets topic. This allows compaction to clean up the additional space used, and it also unblocks any operations that rely on that topic. Depending on the size of the topic, this process could take multiple hours.
To turn off remote writes, you can use the rpk topic alter-config command to set the redpanda.remote.write property to false. Here's how you can do it:
rpk topic alter-config __consumer_offsets --set redpanda.remote.write=false
This command disables the uploading of data from Redpanda to object storage for the __consumer_offsets topic.
NOTE: For the __consumer_offsets topic you will not need to turn remote writes back on.
You can use the vectorized_storage_log_partition_size metric described above to verify that the topic size is shrinking after disabling remote writes.
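To confirm that the property change took effect, you can inspect the topic configuration. The sketch below is a minimal filter, assuming your rpk version supports rpk topic describe with the -c (print configs) flag and prints a table with the property name in the first column and its value in the second; the helper name is hypothetical.

```shell
# Print the value of redpanda.remote.write from topic config output read on stdin.
# Assumes column 1 is the property name and column 2 is its value.
remote_write_value() {
    awk '$1 == "redpanda.remote.write" { print $2 }'
}

# Example usage against a live cluster:
# rpk topic describe __consumer_offsets -c | remote_write_value
```

If nothing is printed, the property was not found in the output and the table layout should be checked by hand.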
For impacted topics other than __consumer_offsets
If you are experiencing impact to your cluster, you can mitigate this issue by turning off remote writes (the redpanda.remote.write property) for the impacted topic. This allows compaction to clean up the additional space used. Depending on the size of the topic, this process could take multiple hours.
To turn remote writes back on for an impacted topic, you will need to upgrade the cluster to at least 25.1.1, because features introduced in that release prevent issues when disabling and re-enabling Tiered Storage. Please reach out to the Redpanda Support Team to understand the impact of disabling remote writes on a topic before doing so.
To turn off remote writes, you can use the rpk topic alter-config command to set the redpanda.remote.write property to false. Here's how you can do it:
rpk topic alter-config <topic_name> --set redpanda.remote.write=false
Replace <topic_name> with the name of the topic for which you want to disable remote writes.
This command will disable the uploading of data from Redpanda to object storage for the specified topic.
You can use the vectorized_storage_log_partition_size metric described above to verify that the topic size is shrinking after disabling remote writes.
After upgrading to 25.1.1 or later, run the following command to enable remote writes for your topic again:
rpk topic alter-config <topic_name> --set redpanda.remote.write=true
Replace <topic_name> with the name of the topic for which you want to enable remote writes.
This command will enable the uploading of data from Redpanda to object storage for the specified topic.
If you are on an impacted version but not experiencing this issue, or have experienced it and mitigated it, we recommend that you upgrade to at least Redpanda version 24.1.15 or a newer supported version to prevent the issue from occurring.
Questions? If you have any questions about this TSB, or need further guidance, please contact Redpanda Support.