Problem Statement
If events in Redpanda contain timestamps set to a date/time in the future, this can prevent segments/logs from being deleted via the retention mechanism, which can lead to low disk space issues.
Detail
Event timestamps with a date in the future are also known as "bogus retention timestamps".
Typical causes of bogus retention timestamps are:
- An out-of-sync system clock on the machine of the client/producer (a quick check for this is shown below)
- An application/client manually setting an event's timestamp to a date in the future
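A quick way to check the first cause, an out-of-sync client clock (this assumes systemd's timedatectl and/or chrony are available on the machine; adjust for your NTP tooling):
timedatectl | grep -i synchronized
chronyc tracking | grep "System time"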
Redpanda has a storage garbage collector (storage-gc) which scans events to see if they are eligible for deletion based on the topic's retention policy.
By default, the storage-gc mechanism will not remove segments that contain events with bogus retention timestamps.
Any events/segments created later than the segment containing the bogus retention timestamp will also not be removed (even if their timestamps are legitimate).
This is where you can encounter disk space issues.
To illustrate a bogus retention timestamp:
We have a topic named time-test with a single partition and a short retention time of 10 minutes, set via the topic config:
retention.ms 600000
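For reference, a topic with this configuration could be created via rpk, for example:
rpk topic create time-test -p 1 -c retention.ms=600000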
When viewing the topic's data directory: /var/lib/redpanda/data/kafka/time-test/
(viewed at Mon Jun 19 11:36:03 UTC 2023)
/var/lib/redpanda/data/kafka/time-test/0_21:
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:45 1568918-3-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:45 1568918-3-v1.log
-rw-r--r-- 1 redpanda redpanda 1.2K Jun 19 11:41 tx.snapshot
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:29 1568917-2-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:29 1568917-2-v1.log
-rw-r--r-- 1 redpanda redpanda 9.7K Jun 19 11:20 1523846-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 37M Jun 19 11:20 1523846-1-v1.log
-rw-r--r-- 1 redpanda redpanda 28K Jun 19 10:37 1371988-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1371988-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 10:37 1220081-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1220081-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 10:37 1068184-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1068184-1-v1.log
-rw-r--r-- 1 redpanda redpanda 25K Jun 19 10:37 916094-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 916094-1-v1.log
-rw-r--r-- 1 redpanda redpanda 22K Jun 19 10:36 763466-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:36 763466-1-v1.log
-rw-r--r-- 1 redpanda redpanda 24K Jun 19 10:05 611262-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:05 611262-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 09:45 459298-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 459298-1-v1.log
-rw-r--r-- 1 redpanda redpanda 26K Jun 19 09:45 307329-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 307329-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 09:45 155375-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 155375-1-v1.log
We can see there are numerous log files that have "rolled" at 130MB and are no longer active, meaning they are candidates for retention deletion based on their event timestamps.
The file timestamps are older than the topic's 10-minute retention time (retention.ms), which would indicate they are candidates for deletion via Redpanda's storage garbage collector (storage-gc).
If we take a look at the Redpanda logs, we can see more detail of what is going on.
Redpanda logs:
journalctl -u redpanda --no-pager --since today|grep "found segment with bogus retention timestamp"
..WARN 2023-06-19 11:00:46,164 [shard 1] storage-gc - disk_log_impl.cc:261 - [{kafka/time-test/0}] found segment with bogus retention timestamp: {timestamp: 1687219279849} (base {timestamp: 1687219210598}, max {timestamp: 1687219279849}) - {offset_tracker:{term:1, base_offset:155375, committed_offset:307328, dirty_offset:307328}, compacted_segment=0, finished_self_compaction=0, generation={17159}, reader={/var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log, (129385955 bytes)}, writer=nullptr, cache={cache_size=8579}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.base_index, offsets:{155375}, index:{header_bitflags:0, base_offset:{155375}, max_offset:{307328}, base_timestamp:{timestamp: 1687219210598}, max_timestamp:{timestamp: 1687219279849}, batch_timestamps_are_monotonic:1, index(1704,1704,1704)}, step:32768, needs_persistence:0}}
In this case (a rather extreme one):
The log with the bogus retention timestamp is: /var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log
The problem timestamp referenced:
found segment with bogus retention timestamp: {timestamp: 1687219279849}
The epoch-millisecond value converts to a human-readable date with:
date -d @$(echo "(1687219279849 + 500) / 1000" | bc)
Tue Jun 20 00:01:20 UTC 2023
This is approximately 12 hours in the future relative to the broker clock, hence the bogus retention timestamp.
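The skew in hours can also be computed directly from the epoch-millisecond value; a small sketch:
event_ts_ms=1687219279849
echo $(( (event_ts_ms / 1000 - $(date +%s)) / 3600 ))  # prints 12 if run at the capture time above (Jun 19 ~11:36 UTC)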
Solution
Note:
One way to avoid issues with "bogus retention timestamp" events
could be to use the topic config parameter:
message.timestamp.type=LogAppendTime
With this parameter, the broker's system time is used for event timestamps.
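For reference, this can be set on an existing topic with rpk, for example:
rpk topic alter-config time-test --set message.timestamp.type=LogAppendTime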
However, there is currently a bug: Issue 11922,
which means that events created by clients with future-based timestamps still trigger the "bogus retention timestamp" scenario.
So the options below are still applicable to topics created with message.timestamp.type=LogAppendTime.
When the bug is resolved, this document will be updated to reflect that.
Option 1 - Cluster param storage_ignore_timestamps_in_future_sec
In the latest Redpanda 22.x and 23.x releases,
there is a cluster configuration parameter to help deal with bogus retention timestamps:
storage_ignore_timestamps_in_future_sec
If this value is set (it defaults to null), the storage-gc component compares the broker's current time with the event timestamp to work out the skew.
If Redpanda finds a skew >= storage_ignore_timestamps_in_future_sec, storage-gc ignores the event's timestamp and instead uses a valid timestamp from another event in the same segment, or, if no other event's valid timestamp is available, the timestamp of when the segment file was last modified (mtime).
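As an illustrative sketch only (not Redpanda's actual implementation), the decision logic is roughly:
# Hypothetical sketch of the storage-gc skew check; values are illustrative.
ignore_sec=60                      # storage_ignore_timestamps_in_future_sec
event_ts_ms=1687219279849          # max timestamp recorded in the segment index
now_ms=$(( $(date +%s) * 1000 ))
skew_sec=$(( (event_ts_ms - now_ms) / 1000 ))
if (( skew_sec >= ignore_sec )); then
  echo "skew of ${skew_sec}s: ignore this timestamp, fall back to another event's timestamp or the segment mtime"
fi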
In the example case above, the timestamp on the events is approx 12 hours ahead of the system clock on the Redpanda broker machine.
So adjusting the cluster config param storage_ignore_timestamps_in_future_sec
to 60 seconds will trigger storage-gc to take action on the events with future-based timestamps.
rpk cluster config set storage_ignore_timestamps_in_future_sec 60
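The new value can be confirmed with:
rpk cluster config get storage_ignore_timestamps_in_future_sec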
Now, in the Redpanda logs, we can see storage-gc reporting a different message:
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,549 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219279849} to file mtime {timestamp: 1687167913014} on /var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219291308} to file mtime {timestamp: 1687167918261} on /var/lib/redpanda/data/kafka/time-test/0_21/307329-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219303867} to file mtime {timestamp: 1687167945608} on /var/lib/redpanda/data/kafka/time-test/0_21/459298-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219309281} to file mtime {timestamp: 1687169110956} on /var/lib/redpanda/data/kafka/time-test/0_21/611262-1-v1.log
This tells us there were no other suitable event timestamps in the segments, so the retention timestamps have been changed to the mtime of the actual log files, making them eligible for eviction.
Checking the data directory again (at Mon Jun 19 11:59),
we can see the non-active logs that were more than 10 minutes old (retention.ms=600000) have now been removed:
date;ls -lthR /var/lib/redpanda/data/kafka/time-test/*
Mon Jun 19 11:59:52 UTC 2023
/var/lib/redpanda/data/kafka/time-test/0_21:
total 33M
-rw-r--r-- 1 redpanda redpanda 140 Jun 19 11:48 snapshot
-rw-r--r-- 1 redpanda redpanda 1.2K Jun 19 11:48 tx.snapshot
-rw-r--r-- 1 redpanda redpanda 4.0K Jun 19 11:48 1568919-4-v1.log
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:45 1568918-3-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:45 1568918-3-v1.log
Option 2 - Tweak topic parameter retention.bytes
(This option is kept mainly for legacy reasons, for older releases of Redpanda or if for some reason you don't want to use the storage_ignore_timestamps_in_future_sec
option. Take caution with this approach, as there is a risk of removing events that have not yet passed their retention time.)
To work around bogus retention timestamps, it is also possible to temporarily adjust a topic's retention.bytes.
See the documentation for more on size-based retention:
https://docs.redpanda.com/docs/manage/cluster-maintenance/disk-utilization/#size-based-retention
Here is an example topic with 10 partitions and a short retention time of 10 minutes, for illustrative purposes:
rpk topic describe -a time-bug
SUMMARY
=======
NAME time-bug
PARTITIONS 10
REPLICAS 1
CONFIGS
=======
KEY VALUE SOURCE
cleanup.policy delete DYNAMIC_TOPIC_CONFIG
compression.type producer DEFAULT_CONFIG
max.message.bytes 1048576 DEFAULT_CONFIG
message.timestamp.type CreateTime DEFAULT_CONFIG
redpanda.remote.delete true DEFAULT_CONFIG
redpanda.remote.read false DEFAULT_CONFIG
redpanda.remote.write false DEFAULT_CONFIG
retention.bytes -1 DEFAULT_CONFIG
retention.local.target.bytes -1 DEFAULT_CONFIG
retention.local.target.ms 86400000 DEFAULT_CONFIG
retention.ms 600000 DYNAMIC_TOPIC_CONFIG
segment.bytes 5000000 DYNAMIC_TOPIC_CONFIG
segment.ms 1209600000 DEFAULT_CONFIG
PARTITIONS
==========
PARTITION LEADER EPOCH REPLICAS LOG-START-OFFSET HIGH-WATERMARK
0 0 2 [0] 1060487 1387098
1 0 2 [0] 1005291 1338914
2 0 2 [0] 997621 1326536
3 0 2 [0] 1017114 1350107
4 0 2 [0] 998114 1329691
5 0 2 [0] 1014902 1352799
6 0 2 [0] 1000134 1336685
7 0 2 [0] 1007809 1337255
8 0 2 [0] 153263 1321078
9 0 2 [0] 152053 1338368
Checking the Redpanda logs shows that partitions 8 and 9 have bogus retention timestamps.
(Note: in the "rpk topic" output above, partitions 8 and 9 also have a much lower LOG-START-OFFSET than the rest.)
Jul 10 08:48:28 ip-10-0-8-55 rpk[775]: WARN 2023-07-10 08:48:28,499 [shard 0] storage-gc - disk_log_impl.cc:261 - [{kafka/time-bug/8}] found segment with bogus retention timestamp: {timestamp: 2060094277119} (base {timestamp: 2060094277119}, max {timestamp: 2060094277119}) - {offset_tracker:{term:1, base_offset:153264, committed_offset:159416, dirty_offset:159416}, compacted_segment=0, finished_self_compaction=0, generation={0}, reader={/var/lib/redpanda/data/kafka/time-bug/8_150/153264-1-v1.log, (5248341 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-bug/8_150/153264-1-v1.base_index, offsets:{153264}, index:{header_bitflags:0, base_offset:{153264}, max_offset:{159416}, base_timestamp:{timestamp: 2060094277119}, max_timestamp:{timestamp: 2060094277119}, batch_timestamps_are_monotonic:0, index(89,89,89)}, step:32768, needs_persistence:0}}
Jul 10 08:48:30 ip-10-0-8-55 rpk[775]: WARN 2023-07-10 08:48:30,640 [shard 1] storage-gc - disk_log_impl.cc:261 - [{kafka/time-bug/9}] found segment with bogus retention timestamp: {timestamp: 2060094296631} (base {timestamp: 2060094296631}, max {timestamp: 2060094296631}) - {offset_tracker:{term:1, base_offset:152054, committed_offset:158095, dirty_offset:158095}, compacted_segment=0, finished_self_compaction=0, generation={0}, reader={/var/lib/redpanda/data/kafka/time-bug/9_150/152054-1-v1.log, (5157410 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-bug/9_150/152054-1-v1.base_index, offsets:{152054}, index:{header_bitflags:0, base_offset:{152054}, max_offset:{158095}, base_timestamp:{timestamp: 2060094296631}, max_timestamp:{timestamp: 2060094296631}, batch_timestamps_are_monotonic:0, index(88,88,88)}, step:32768, needs_persistence:0}}
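To quickly list which partitions of a topic are affected, the warnings can be filtered, for example:
journalctl -u redpanda --no-pager --since today | grep "found segment with bogus retention timestamp" | grep -o 'kafka/time-bug/[0-9]*' | sort -u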
Checking the directories for these partitions shows segment logs well past the retention time of 10 minutes:
date;ls -ltr /var/lib/redpanda/data/kafka/time-bug/9_150|head
Mon Jul 10 09:07:08 UTC 2023
total 800872
-rw-r--r-- 1 redpanda redpanda 146 Jul 7 08:58 snapshot
-rw-r--r-- 1 redpanda redpanda 5157410 Jul 10 08:28 152054-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1473 Jul 10 08:28 152054-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5149126 Jul 10 08:28 158096-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1521 Jul 10 08:28 158096-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5169860 Jul 10 08:28 164124-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1601 Jul 10 08:28 164124-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5179348 Jul 10 08:28 170173-1-v1.log
date;ls -ltr /var/lib/redpanda/data/kafka/time-bug/8_150|head
Mon Jul 10 09:07:08 UTC 2023
total 780100
-rw-r--r-- 1 redpanda redpanda 146 Jul 7 08:58 snapshot
-rw-r--r-- 1 redpanda redpanda 5248341 Jul 10 08:28 153264-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1489 Jul 10 08:28 153264-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5246975 Jul 10 08:28 159417-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1489 Jul 10 08:28 159417-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5268600 Jul 10 08:28 165566-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1521 Jul 10 08:28 165566-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5241930 Jul 10 08:28 171737-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1441 Jul 10 08:28 171737-1-v1.base_index
Running "du -shc" on the relevant data-directory shows that partitions 8 + 9 have a clear skew with the other partitions as well.
du -shc /var/lib/redpanda/data/kafka/time-bug/*
208M /var/lib/redpanda/data/kafka/time-bug/0_139
211M /var/lib/redpanda/data/kafka/time-bug/1_150
208M /var/lib/redpanda/data/kafka/time-bug/2_150
210M /var/lib/redpanda/data/kafka/time-bug/3_150
211M /var/lib/redpanda/data/kafka/time-bug/4_150
212M /var/lib/redpanda/data/kafka/time-bug/5_150
215M /var/lib/redpanda/data/kafka/time-bug/6_150
210M /var/lib/redpanda/data/kafka/time-bug/7_150
762M /var/lib/redpanda/data/kafka/time-bug/8_150
783M /var/lib/redpanda/data/kafka/time-bug/9_150
3.2G total
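On topics with many partitions, outliers are easier to spot if the same check is sorted numerically (sizes in MB):
du -sm /var/lib/redpanda/data/kafka/time-bug/* | sort -n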
Adjust the retention.bytes parameter for the topic.
retention.bytes applies to the space used at the partition level.
In the example above, the typical partition size is in the range of 208MB to 215MB,
whereas the problem partitions 8 and 9 are > 750MB.
So changing retention.bytes to 300MB in this case should be safe for size-based retention to kick in for the 2 partitions containing logs with bogus retention timestamps.
rpk topic alter-config time-bug --set retention.bytes=300000000
(In the real world, retention.bytes might need further tweaking, which is why caution is needed with this approach and why storage_ignore_timestamps_in_future_sec
is the preferred option.)
You should see that the Redpanda logs no longer report the bogus retention timestamp warning, and the du command shows the space has been reclaimed.
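To confirm, re-run the earlier journalctl filter; it should produce no new output:
journalctl -u redpanda --no-pager --since today|grep "found segment with bogus retention timestamp"
Then check disk usage again: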
du -shc /var/lib/redpanda/data/kafka/time-bug/*
205M /var/lib/redpanda/data/kafka/time-bug/0_139
211M /var/lib/redpanda/data/kafka/time-bug/1_150
204M /var/lib/redpanda/data/kafka/time-bug/2_150
210M /var/lib/redpanda/data/kafka/time-bug/3_150
215M /var/lib/redpanda/data/kafka/time-bug/4_150
209M /var/lib/redpanda/data/kafka/time-bug/5_150
200M /var/lib/redpanda/data/kafka/time-bug/6_150
209M /var/lib/redpanda/data/kafka/time-bug/7_150
211M /var/lib/redpanda/data/kafka/time-bug/8_150
201M /var/lib/redpanda/data/kafka/time-bug/9_150
Now set retention.bytes back to its original value:
rpk topic alter-config time-bug --set retention.bytes=-1
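Confirm the value has reverted (this assumes the default describe output includes the configs section, as shown earlier):
rpk topic describe time-bug | grep "retention.bytes"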