Problem Statement
If events in Redpanda contain timestamps set to a date/time in the future, this can prevent segments/logs from being deleted via the retention mechanism, which can lead to low disk space issues.
Detail
Event timestamps with a date in the future are also known as "bogus retention timestamps".
Typical causes of bogus retention timestamps are:
- An out-of-sync system clock on the machine of the client/producer (a quick check for this is shown below)
- An application/client manually setting an event's timestamp to a date in the future
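A quick way to check the first cause, an out-of-sync client clock (this assumes systemd's timedatectl and/or chrony are available on the machine; adjust for your NTP tooling):
timedatectl | grep -i synchronized
chronyc tracking | grep "System time"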
Redpanda has a storage garbage collector (storage-gc) which scans events to see if they are eligible for deletion based on the topic's retention policy.
By default, the storage-gc mechanism will not remove segments that contain events with bogus retention timestamps.
Any events/segments created later than the segment containing the bogus retention timestamp will also not be removed (even if their timestamps are legitimate).
This is where you can encounter disk space issues.
To illustrate a bogus retention timestamp:
We have a topic named time-test with a single partition and a short retention time of 10 minutes, set via the topic config:
retention.ms 600000
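For reference, a topic with this configuration could be created via rpk, for example:
rpk topic create time-test -p 1 -c retention.ms=600000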
When viewing the topic's data directory: /var/lib/redpanda/data/kafka/time-test/
(viewed at Mon Jun 19 11:36:03 UTC 2023)
/var/lib/redpanda/data/kafka/time-test/0_21:
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:45 1568918-3-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:45 1568918-3-v1.log
-rw-r--r-- 1 redpanda redpanda 1.2K Jun 19 11:41 tx.snapshot
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:29 1568917-2-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:29 1568917-2-v1.log
-rw-r--r-- 1 redpanda redpanda 9.7K Jun 19 11:20 1523846-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 37M Jun 19 11:20 1523846-1-v1.log
-rw-r--r-- 1 redpanda redpanda 28K Jun 19 10:37 1371988-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1371988-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 10:37 1220081-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1220081-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 10:37 1068184-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 1068184-1-v1.log
-rw-r--r-- 1 redpanda redpanda 25K Jun 19 10:37 916094-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:37 916094-1-v1.log
-rw-r--r-- 1 redpanda redpanda 22K Jun 19 10:36 763466-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:36 763466-1-v1.log
-rw-r--r-- 1 redpanda redpanda 24K Jun 19 10:05 611262-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 10:05 611262-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 09:45 459298-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 459298-1-v1.log
-rw-r--r-- 1 redpanda redpanda 26K Jun 19 09:45 307329-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 307329-1-v1.log
-rw-r--r-- 1 redpanda redpanda 27K Jun 19 09:45 155375-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 130M Jun 19 09:45 155375-1-v1.log
We can see there are numerous log files that have "rolled" at 130MB and are no longer active, meaning they are candidates for retention deletion based on their event timestamps.
The file timestamps are older than the topic's 10-minute retention time (retention.ms), which would indicate they are candidates for deletion via Redpanda's storage garbage collector (storage-gc).
If we take a look at the Redpanda logs, we can see more detail of what is going on.
Redpanda logs:
journalctl -u redpanda --no-pager --since today|grep "found segment with bogus retention timestamp"
..WARN 2023-06-19 11:00:46,164 [shard 1] storage-gc - disk_log_impl.cc:261 - [{kafka/time-test/0}] found segment with bogus retention timestamp: {timestamp: 1687219279849} (base {timestamp: 1687219210598}, max {timestamp: 1687219279849}) - {offset_tracker:{term:1, base_offset:155375, committed_offset:307328, dirty_offset:307328}, compacted_segment=0, finished_self_compaction=0, generation={17159}, reader={/var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log, (129385955 bytes)}, writer=nullptr, cache={cache_size=8579}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.base_index, offsets:{155375}, index:{header_bitflags:0, base_offset:{155375}, max_offset:{307328}, base_timestamp:{timestamp: 1687219210598}, max_timestamp:{timestamp: 1687219279849}, batch_timestamps_are_monotonic:1, index(1704,1704,1704)}, step:32768, needs_persistence:0}}
In this case (a rather extreme one):
The log with the bogus retention timestamp is: /var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log
The problem timestamp referenced:
found segment with bogus retention timestamp: {timestamp: 1687219279849}
The epoch-millisecond value converts to a human-readable date with:
date -d @$(echo "(1687219279849 + 500) / 1000" | bc)
Tue Jun 20 00:01:20 UTC 2023
This is approximately 12 hours in the future relative to the broker clock, hence the bogus retention timestamp.
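The skew in hours can also be computed directly from the epoch-millisecond value; a small sketch:
event_ts_ms=1687219279849
echo $(( (event_ts_ms / 1000 - $(date +%s)) / 3600 ))  # prints 12 if run at the capture time above (Jun 19 ~11:36 UTC)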
Solution
Note:
One way to avoid issues with "bogus retention timestamp" events
could be to use the topic config parameter:
message.timestamp.type=LogAppendTime
With this parameter, the broker's system time is used for event timestamps.
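For reference, this can be set on an existing topic with rpk, for example:
rpk topic alter-config time-test --set message.timestamp.type=LogAppendTime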
However, there is currently a bug: Issue 11922,
which means that events created by clients with future-based timestamps still trigger the "bogus retention timestamp" scenario.
So the options below are still applicable to topics created with message.timestamp.type=LogAppendTime.
When the bug is resolved, this document will be updated to reflect that.
Option 1 - Cluster param storage_ignore_timestamps_in_future_sec
In the latest Redpanda 22.x and 23.x releases,
there is a cluster configuration parameter to help deal with bogus retention timestamps:
storage_ignore_timestamps_in_future_sec
If this value is set (it defaults to null), the storage-gc component compares the broker's current time with the event timestamp to work out the skew.
If Redpanda finds a skew >= storage_ignore_timestamps_in_future_sec, storage-gc ignores the event's timestamp and instead uses a valid timestamp from another event in the same segment, or, if no other event's valid timestamp is available, the timestamp of when the segment file was last modified (mtime).
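As an illustrative sketch only (not Redpanda's actual implementation), the decision logic is roughly:
# Hypothetical sketch of the storage-gc skew check; values are illustrative.
ignore_sec=60                      # storage_ignore_timestamps_in_future_sec
event_ts_ms=1687219279849          # max timestamp recorded in the segment index
now_ms=$(( $(date +%s) * 1000 ))
skew_sec=$(( (event_ts_ms - now_ms) / 1000 ))
if (( skew_sec >= ignore_sec )); then
  echo "skew of ${skew_sec}s: ignore this timestamp, fall back to another event's timestamp or the segment mtime"
fi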
In the example case above, the timestamp on the events is approx 12 hours ahead of the system clock on the Redpanda broker machine.
So adjusting the cluster config param storage_ignore_timestamps_in_future_sec
to 60 seconds will trigger storage-gc to take action on the events with future-based timestamps.
rpk cluster config set storage_ignore_timestamps_in_future_sec 60
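The new value can be confirmed with:
rpk cluster config get storage_ignore_timestamps_in_future_sec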
Now, in the Redpanda logs, we can see storage-gc reporting a different message:
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,549 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219279849} to file mtime {timestamp: 1687167913014} on /var/lib/redpanda/data/kafka/time-test/0_21/155375-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219291308} to file mtime {timestamp: 1687167918261} on /var/lib/redpanda/data/kafka/time-test/0_21/307329-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219303867} to file mtime {timestamp: 1687167945608} on /var/lib/redpanda/data/kafka/time-test/0_21/459298-1-v1.log
Jun 19 11:48:41 ip-10-0-8-55 rpk[2774569]: WARN 2023-06-19 11:48:41,550 [shard 1] storage-gc - disk_log_impl.cc:818 - [{kafka/time-test/0}] Timestamp in future detected, check client clocks. Adjusting retention timestamp from {timestamp: 1687219309281} to file mtime {timestamp: 1687169110956} on /var/lib/redpanda/data/kafka/time-test/0_21/611262-1-v1.log
This tells us there were no other suitable event timestamps in the segments, so the retention timestamps have been changed to the mtime of the actual log files, making them eligible for eviction.
Checking the data directory again (at Mon Jun 19 11:59),
we can see the non-active logs that were more than 10 minutes old (retention.ms=600000) have now been removed:
date;ls -lthR /var/lib/redpanda/data/kafka/time-test/*
Mon Jun 19 11:59:52 UTC 2023
/var/lib/redpanda/data/kafka/time-test/0_21:
total 33M
-rw-r--r-- 1 redpanda redpanda 140 Jun 19 11:48 snapshot
-rw-r--r-- 1 redpanda redpanda 1.2K Jun 19 11:48 tx.snapshot
-rw-r--r-- 1 redpanda redpanda 4.0K Jun 19 11:48 1568919-4-v1.log
-rw-r--r-- 1 redpanda redpanda 81 Jun 19 11:45 1568918-3-v1.base_index
-rw-r--r-- 1 redpanda redpanda 164 Jun 19 11:45 1568918-3-v1.log
Option 2 - Tweak topic parameter retention.bytes
(This option is kept mainly for legacy reasons, for older releases of Redpanda or if for some reason you don't want to use the storage_ignore_timestamps_in_future_sec
option. Take caution with this approach, as there is a risk of removing events that have not yet passed their retention time.)
To work around bogus retention timestamps, it is also possible to temporarily adjust a topic's retention.bytes.
See the documentation for more on size-based retention:
https://docs.redpanda.com/docs/manage/cluster-maintenance/disk-utilization/#size-based-retention
Here is an example topic with 10 partitions and a short retention time of 10 minutes, for illustrative purposes:
rpk topic describe -a time-bug
SUMMARY
=======
NAME time-bug
PARTITIONS 10
REPLICAS 1
CONFIGS
=======
KEY VALUE SOURCE
cleanup.policy delete DYNAMIC_TOPIC_CONFIG
compression.type producer DEFAULT_CONFIG
max.message.bytes 1048576 DEFAULT_CONFIG
message.timestamp.type CreateTime DEFAULT_CONFIG
redpanda.remote.delete true DEFAULT_CONFIG
redpanda.remote.read false DEFAULT_CONFIG
redpanda.remote.write false DEFAULT_CONFIG
retention.bytes -1 DEFAULT_CONFIG
retention.local.target.bytes -1 DEFAULT_CONFIG
retention.local.target.ms 86400000 DEFAULT_CONFIG
retention.ms 600000 DYNAMIC_TOPIC_CONFIG
segment.bytes 5000000 DYNAMIC_TOPIC_CONFIG
segment.ms 1209600000 DEFAULT_CONFIG
PARTITIONS
==========
PARTITION LEADER EPOCH REPLICAS LOG-START-OFFSET HIGH-WATERMARK
0 0 2 [0] 1060487 1387098
1 0 2 [0] 1005291 1338914
2 0 2 [0] 997621 1326536
3 0 2 [0] 1017114 1350107
4 0 2 [0] 998114 1329691
5 0 2 [0] 1014902 1352799
6 0 2 [0] 1000134 1336685
7 0 2 [0] 1007809 1337255
8 0 2 [0] 153263 1321078
9 0 2 [0] 152053 1338368
Checking the Redpanda logs shows that partitions 8 and 9 have bogus retention timestamps.
(Note: in the "rpk topic" output above, partitions 8 and 9 also have a much lower LOG-START-OFFSET than the rest.)
Jul 10 08:48:28 ip-10-0-8-55 rpk[775]: WARN 2023-07-10 08:48:28,499 [shard 0] storage-gc - disk_log_impl.cc:261 - [{kafka/time-bug/8}] found segment with bogus retention timestamp: {timestamp: 2060094277119} (base {timestamp: 2060094277119}, max {timestamp: 2060094277119}) - {offset_tracker:{term:1, base_offset:153264, committed_offset:159416, dirty_offset:159416}, compacted_segment=0, finished_self_compaction=0, generation={0}, reader={/var/lib/redpanda/data/kafka/time-bug/8_150/153264-1-v1.log, (5248341 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-bug/8_150/153264-1-v1.base_index, offsets:{153264}, index:{header_bitflags:0, base_offset:{153264}, max_offset:{159416}, base_timestamp:{timestamp: 2060094277119}, max_timestamp:{timestamp: 2060094277119}, batch_timestamps_are_monotonic:0, index(89,89,89)}, step:32768, needs_persistence:0}}
Jul 10 08:48:30 ip-10-0-8-55 rpk[775]: WARN 2023-07-10 08:48:30,640 [shard 1] storage-gc - disk_log_impl.cc:261 - [{kafka/time-bug/9}] found segment with bogus retention timestamp: {timestamp: 2060094296631} (base {timestamp: 2060094296631}, max {timestamp: 2060094296631}) - {offset_tracker:{term:1, base_offset:152054, committed_offset:158095, dirty_offset:158095}, compacted_segment=0, finished_self_compaction=0, generation={0}, reader={/var/lib/redpanda/data/kafka/time-bug/9_150/152054-1-v1.log, (5157410 bytes)}, writer=nullptr, cache={cache_size=0}, compaction_index:nullopt, closed=0, tombstone=0, index={file:/var/lib/redpanda/data/kafka/time-bug/9_150/152054-1-v1.base_index, offsets:{152054}, index:{header_bitflags:0, base_offset:{152054}, max_offset:{158095}, base_timestamp:{timestamp: 2060094296631}, max_timestamp:{timestamp: 2060094296631}, batch_timestamps_are_monotonic:0, index(88,88,88)}, step:32768, needs_persistence:0}}
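To quickly list which partitions of a topic are affected, the warnings can be filtered, for example:
journalctl -u redpanda --no-pager --since today | grep "found segment with bogus retention timestamp" | grep -o 'kafka/time-bug/[0-9]*' | sort -u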
Checking the directories for these partitions shows segment logs well past the retention time of 10 minutes:
date;ls -ltr /var/lib/redpanda/data/kafka/time-bug/9_150|head
Mon Jul 10 09:07:08 UTC 2023
total 800872
-rw-r--r-- 1 redpanda redpanda 146 Jul 7 08:58 snapshot
-rw-r--r-- 1 redpanda redpanda 5157410 Jul 10 08:28 152054-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1473 Jul 10 08:28 152054-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5149126 Jul 10 08:28 158096-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1521 Jul 10 08:28 158096-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5169860 Jul 10 08:28 164124-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1601 Jul 10 08:28 164124-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5179348 Jul 10 08:28 170173-1-v1.log
date;ls -ltr /var/lib/redpanda/data/kafka/time-bug/8_150|head
Mon Jul 10 09:07:08 UTC 2023
total 780100
-rw-r--r-- 1 redpanda redpanda 146 Jul 7 08:58 snapshot
-rw-r--r-- 1 redpanda redpanda 5248341 Jul 10 08:28 153264-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1489 Jul 10 08:28 153264-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5246975 Jul 10 08:28 159417-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1489 Jul 10 08:28 159417-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5268600 Jul 10 08:28 165566-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1521 Jul 10 08:28 165566-1-v1.base_index
-rw-r--r-- 1 redpanda redpanda 5241930 Jul 10 08:28 171737-1-v1.log
-rw-r--r-- 1 redpanda redpanda 1441 Jul 10 08:28 171737-1-v1.base_index
Running "du -shc" on the relevant data-directory shows that partitions 8 + 9 have a clear skew with the other partitions as well.
du -shc /var/lib/redpanda/data/kafka/time-bug/*
208M /var/lib/redpanda/data/kafka/time-bug/0_139
211M /var/lib/redpanda/data/kafka/time-bug/1_150
208M /var/lib/redpanda/data/kafka/time-bug/2_150
210M /var/lib/redpanda/data/kafka/time-bug/3_150
211M /var/lib/redpanda/data/kafka/time-bug/4_150
212M /var/lib/redpanda/data/kafka/time-bug/5_150
215M /var/lib/redpanda/data/kafka/time-bug/6_150
210M /var/lib/redpanda/data/kafka/time-bug/7_150
762M /var/lib/redpanda/data/kafka/time-bug/8_150
783M /var/lib/redpanda/data/kafka/time-bug/9_150
3.2G total
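On topics with many partitions, outliers are easier to spot if the same check is sorted numerically (sizes in MB):
du -sm /var/lib/redpanda/data/kafka/time-bug/* | sort -n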
Adjust the retention.bytes parameter for the topic.
retention.bytes applies to the space used at the partition level.
In the example above, the typical partition size is in the range of 208MB to 215MB,
whereas the problem partitions 8 and 9 are > 750MB.
So changing retention.bytes to 300MB in this case should be safe for size-based retention to kick in for the 2 partitions containing logs with bogus retention timestamps.
rpk topic alter-config time-bug --set retention.bytes=300000000
(In the real world, retention.bytes might need further tweaking, which is why caution is needed with this approach and why storage_ignore_timestamps_in_future_sec
is the preferred option.)
You should see that the Redpanda logs no longer report the bogus retention timestamp warning, and the du command shows the space has been reclaimed.
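To confirm, re-run the earlier journalctl filter; it should produce no new output:
journalctl -u redpanda --no-pager --since today|grep "found segment with bogus retention timestamp"
Then check disk usage again: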
du -shc /var/lib/redpanda/data/kafka/time-bug/*
205M /var/lib/redpanda/data/kafka/time-bug/0_139
211M /var/lib/redpanda/data/kafka/time-bug/1_150
204M /var/lib/redpanda/data/kafka/time-bug/2_150
210M /var/lib/redpanda/data/kafka/time-bug/3_150
215M /var/lib/redpanda/data/kafka/time-bug/4_150
209M /var/lib/redpanda/data/kafka/time-bug/5_150
200M /var/lib/redpanda/data/kafka/time-bug/6_150
209M /var/lib/redpanda/data/kafka/time-bug/7_150
211M /var/lib/redpanda/data/kafka/time-bug/8_150
201M /var/lib/redpanda/data/kafka/time-bug/9_150
Now set retention.bytes back to its original value:
rpk topic alter-config time-bug --set retention.bytes=-1
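Confirm the value has reverted (this assumes the default describe output includes the configs section, as shown earlier):
rpk topic describe time-bug | grep "retention.bytes"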