-
log.retention.hours
The most common configuration for how long Kafka will retain messages is by time. The default is specified in the configuration file using the log.retention.hours parameter, and it is set to 168 hours, the equivalent of one week. Setting it to a higher value will result in more disk space being used on brokers for that particular topic. On the other hand, setting it to a very small value keeps data available for less time, so consumers that are unavailable for a long period may miss data. Two other parameters are allowed, log.retention.minutes and log.retention.ms. All three specify the same configuration: the amount of time after which messages may be deleted. If more than one is specified, the smaller unit size takes precedence. Because the Kafka CLI command only allows you to set the ms version of this parameter, we recommend using that one across all your configurations.
-
log.retention.bytes
Another way to expire messages is based on the total number of bytes of messages retained. This value is set using the log.retention.bytes parameter, and it is applied per partition. For example, a topic with 8 partitions and a limit of 1 GB may retain up to 8 GB in total. The default is -1, meaning that there is no limit and only a time limit is applied. Setting this parameter to a positive value is useful if you want to keep the size of a log under a threshold.
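The precedence rule between the three time-based parameters can be illustrated with a small Python sketch. This is only a model of the rule described above, not Kafka's own code; the function name effective_retention_ms and its argument names are made up for the example.

```python
from typing import Optional

MS_PER_MINUTE = 60 * 1000
MS_PER_HOUR = 60 * MS_PER_MINUTE


def effective_retention_ms(
    retention_ms: Optional[int] = None,       # models log.retention.ms
    retention_minutes: Optional[int] = None,  # models log.retention.minutes
    retention_hours: int = 168,               # models log.retention.hours (default: one week)
) -> int:
    """Return the retention time in milliseconds, or -1 for no time limit.

    The smallest unit that is set wins: ms, then minutes, then hours.
    """
    if retention_ms is not None:
        millis = retention_ms
    elif retention_minutes is not None:
        millis = retention_minutes * MS_PER_MINUTE
    else:
        millis = retention_hours * MS_PER_HOUR
    # Any negative value is treated as "no time limit" in this sketch.
    return -1 if millis < 0 else millis


print(effective_retention_ms())                                          # 604800000
print(effective_retention_ms(retention_minutes=30, retention_hours=48))  # 1800000
```

In the second call the minutes value overrides the hours value, because it is the smaller unit.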
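Broker-level vs Topic-level
Kafka broker-level topic configurations are prefixed by "log."; in most cases, removing that prefix gives the equivalent topic-level configuration. For example, log.retention.bytes at the broker level corresponds to retention.bytes at the topic level.
Configuring retention by size and time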
We learned earlier that new data gets appended into the active segment. Retention by time is performed by examining the last modified time on each log segment file on disk. This is the time that the log segment was closed, and represents the timestamp of the last message in the file.
If both log.retention.bytes and log.retention.hours are set, messages may be removed when either criterion is met.
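To make the "either criterion" behaviour concrete, here is a simplified Python model of how closed segments could be selected for deletion. It is a sketch under assumptions, not Kafka's implementation: the Segment class and the segments_to_delete function are hypothetical, the active segment is never deleted, and segment age is taken from the last modified time as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    name: str
    size_bytes: int
    last_modified_ms: int  # time the segment was last written to


def segments_to_delete(
    segments: List[Segment],  # oldest first; the last entry is the active segment
    now_ms: int,
    retention_ms: int,        # -1 disables the time limit
    retention_bytes: int,     # -1 disables the size limit
) -> List[str]:
    """Return names of closed segments that either criterion would remove."""
    closed, active = list(segments[:-1]), segments[-1]
    doomed: List[str] = []

    # Time criterion: drop the oldest segments whose age exceeds retention_ms.
    if retention_ms >= 0:
        while closed and now_ms - closed[0].last_modified_ms > retention_ms:
            doomed.append(closed.pop(0).name)

    # Size criterion: drop whole segments, oldest first, only while the
    # remaining partition would still hold at least retention_bytes.
    if retention_bytes >= 0:
        total = sum(s.size_bytes for s in closed) + active.size_bytes
        excess = total - retention_bytes
        while closed and excess - closed[0].size_bytes >= 0:
            excess -= closed[0].size_bytes
            doomed.append(closed.pop(0).name)

    return doomed


HOUR = 60 * 60 * 1000
log = [
    Segment("a", 100, 1 * HOUR),
    Segment("b", 100, 5 * HOUR),
    Segment("c", 100, 9 * HOUR),
    Segment("d", 50, 10 * HOUR),  # active segment
]
# "a" is removed by time (9 hours old), "b" is then removed by size.
print(segments_to_delete(log, now_ms=10 * HOUR, retention_ms=6 * HOUR, retention_bytes=150))
```

Because deletion works on whole segments, a partition can temporarily hold more data than the configured limits suggest until its oldest segment is closed and becomes eligible.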
Let us see how we can implement two common use-cases with these parameters.
-
One week of retention
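To specify retention by time, we have to set the retention period to one week. At the topic level this is done with retention.ms, the millisecond equivalent of log.retention.hours. We also have to make sure the data is not expired by size. So, configure the values as shown:
retention.ms = 604800000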
retention.bytes = -1
-
Infinite time retention bounded by 500MB
We have to set retention.bytes to 500MB. We also have to make sure the data is not expired by time. This can be achieved by setting retention.ms to the special -1 value. So, configure the values as shown:
retention.ms = -1
retention.bytes = 524288000
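The numeric values in both use-cases can be derived rather than memorised. The Python sketch below computes them and assembles a kafka-configs.sh command line for each case without running it. The topic names, the localhost:9092 bootstrap address, and the helper alter_topic_command are assumptions made for illustration; adjust them to your environment.

```python
from typing import Dict, List

ONE_WEEK_MS = 7 * 24 * 60 * 60 * 1000  # 604800000
FIVE_HUNDRED_MB = 500 * 1024 * 1024    # 524288000


def alter_topic_command(
    topic: str,
    configs: Dict[str, int],
    bootstrap_server: str = "localhost:9092",  # assumed broker address
) -> List[str]:
    """Build (but do not run) a kafka-configs.sh invocation for a topic."""
    pairs = ",".join(f"{key}={value}" for key, value in configs.items())
    return [
        "kafka-configs.sh",
        "--bootstrap-server", bootstrap_server,
        "--entity-type", "topics",
        "--entity-name", topic,
        "--alter",
        "--add-config", pairs,
    ]


# One week of retention, no size limit (hypothetical topic name).
one_week = alter_topic_command(
    "weekly-topic", {"retention.ms": ONE_WEEK_MS, "retention.bytes": -1}
)
# Infinite time retention bounded by 500MB per partition (hypothetical topic name).
bounded = alter_topic_command(
    "bounded-topic", {"retention.ms": -1, "retention.bytes": FIVE_HUNDRED_MB}
)

print(" ".join(one_week))
print(" ".join(bounded))
```

Keep in mind that retention.bytes applies per partition, so the 500MB bound is multiplied by the number of partitions in the topic.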