When a Kafka consumer starts and there are no committed offsets for its consumer group, or when the committed offset is no longer valid (e.g., because the data has been deleted), the consumer needs to decide where to start reading from. This behavior is controlled by the auto.offset.reset configuration.

Auto offset reset options

earliest

auto.offset.reset=earliest
  • Consumer starts from the earliest offset still available in the partition
  • Reads every retained message; data already removed by retention cannot be recovered
  • Applies only when no valid committed offset exists; a group with committed offsets resumes from them
  • Use case: Data migration, audit requirements, complete reprocessing

latest (default)

auto.offset.reset=latest
  • Consumer starts from the end of the partition (the log end offset)
  • Only processes messages produced after its position is reset; anything produced earlier is skipped
  • Use case: Real-time processing where historical data is not needed

none

auto.offset.reset=none
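  • Consumer throws an exception instead of resetting: NoOffsetForPartitionException if no committed offset is found, OffsetOutOfRangeException if the committed offset is out of range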
  • Forces explicit offset management
  • Use case: Strict control over consumer behavior, prevents accidental data loss or reprocessing

When auto offset reset is triggered

The auto.offset.reset behavior is triggered in these scenarios:
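  1. New consumer group: The group has never committed an offset for the partition
  2. Expired committed offset: The group was inactive for longer than the broker's offset retention (offsets.retention.minutes), so its committed offsets were removed
  3. Offset out of range: The committed offset lies outside the current log boundaries (e.g., the data was deleted due to retention)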
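The decision behind these scenarios can be modeled in a few lines of plain Java. This is an illustrative sketch, not Kafka client code: the class OffsetResetModel, its resolveStartOffset method, and the use of IllegalStateException as a stand-in for the client's exceptions are hypothetical names introduced here. It assumes a committed offset is valid when it lies between the log start offset and the log end offset, inclusive.

```java
/** Hypothetical model of how auto.offset.reset picks a starting position. Not part of the Kafka API. */
final class OffsetResetModel {

    /**
     * @param committed   last committed offset for the group, or null if none exists
     * @param logStart    earliest offset still available in the partition
     * @param logEnd      offset that the next produced record will receive
     * @param resetPolicy "earliest", "latest", or "none"
     * @return the offset the consumer would start reading from
     */
    static long resolveStartOffset(Long committed, long logStart, long logEnd, String resetPolicy) {
        boolean valid = committed != null && committed >= logStart && committed <= logEnd;
        if (valid) {
            return committed; // a valid committed offset always wins; no reset happens
        }
        switch (resetPolicy) {
            case "earliest":
                return logStart;
            case "latest":
                return logEnd;
            case "none":
                // Stand-in for the exception the real consumer raises from poll().
                throw new IllegalStateException("no valid offset and auto.offset.reset=none");
            default:
                throw new IllegalArgumentException("unknown policy: " + resetPolicy);
        }
    }

    public static void main(String[] args) {
        // The partition currently holds offsets 100..499; older data was deleted by retention.
        System.out.println(resolveStartOffset(null, 100, 500, "earliest")); // prints 100
        System.out.println(resolveStartOffset(null, 100, 500, "latest"));   // prints 500
        System.out.println(resolveStartOffset(42L, 100, 500, "earliest"));  // prints 100 (out of range)
        System.out.println(resolveStartOffset(250L, 100, 500, "latest"));   // prints 250 (no reset)
    }
}
```

Note the last call: with a valid committed offset, the reset policy is never consulted, whatever its value.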

Common scenarios

Scenario 1: New consumer group

// First time this consumer group runs
Properties props = new Properties();
props.put("group.id", "new-consumer-group");
props.put("auto.offset.reset", "earliest"); // Will read from beginning

Scenario 2: Data retention cleanup

// Consumer was offline long enough for retention to delete the data at its committed offset
// (or for the committed offset itself to expire); behavior depends on auto.offset.reset
Properties props = new Properties();
props.put("group.id", "existing-group");
props.put("auto.offset.reset", "latest"); // Skips to the log end; messages in between are not read

Best practices

For production systems

# Be explicit about offset reset behavior
auto.offset.reset=latest

# Enable offset commits
enable.auto.commit=true
auto.commit.interval.ms=5000

For development/testing

# Often want to reprocess data
auto.offset.reset=earliest

# May want manual control
enable.auto.commit=false

For critical data processing

# Prevent accidental data loss or reprocessing
auto.offset.reset=none

# Handle exceptions explicitly in code

Monitoring offset reset events

Key metrics to monitor:
  • Consumer group lag
  • Offset reset occurrences
  • Consumer restarts and rebalances
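Consumer group lag is the gap, per partition, between the log end offset and the group's committed offset. The sketch below shows the arithmetic only; LagCalculator and its inputs are hypothetical, and in practice the two offset maps would come from Kafka's admin tooling or a metrics exporter.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: computes per-partition lag from two offset snapshots. */
final class LagCalculator {

    /**
     * @param endOffsets       partition name -> log end offset
     * @param committedOffsets partition name -> last committed offset for the group
     * @return partition name -> lag; partitions without a committed offset are omitted,
     *         because their starting point depends on auto.offset.reset
     */
    static Map<String, Long> lag(Map<String, Long> endOffsets, Map<String, Long> committedOffsets) {
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            Long committed = committedOffsets.get(e.getKey());
            if (committed != null) {
                // Clamp at zero in case the snapshots were taken at slightly different times.
                result.put(e.getKey(), Math.max(0L, e.getValue() - committed));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> end = Map.of("my-topic-0", 1_000L, "my-topic-1", 800L);
        Map<String, Long> committed = Map.of("my-topic-0", 950L);
        System.out.println(lag(end, committed)); // prints {my-topic-0=50}
    }
}
```

A lag that abruptly drops to zero, or jumps to roughly the full size of the partition, can be a hint that a reset to latest or earliest took place.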

Error handling example

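import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.InvalidOffsetException;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
props.put("group.id", "existing-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("auto.offset.reset", "none");

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Arrays.asList("my-topic"));
    while (true) {
        try {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
            // Process records
        } catch (InvalidOffsetException e) {
            // Parent of NoOffsetForPartitionException and OffsetOutOfRangeException.
            // Catching inside the loop lets the consumer keep polling after the seek.
            // Decide whether to seek to beginning or end
            consumer.seekToBeginning(e.partitions());
            // or consumer.seekToEnd(e.partitions());
        }
    }
}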

Offset management strategies

Automatic offset management

  • Use enable.auto.commit=true
  • Set appropriate auto.commit.interval.ms
  • Choose suitable auto.offset.reset policy

Manual offset management

  • Use enable.auto.commit=false
  • Call commitSync() or commitAsync() after processing
  • Handle offset reset scenarios explicitly

External offset storage

  • Store offsets in external systems (database, file system)
  • Use seek() methods to position consumer
  • Implement custom offset management logic
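A minimal sketch of the external-storage idea, using an in-memory map in place of a real database. ExternalOffsetStore and its methods are hypothetical; a real implementation would typically persist the offset together with the processing result, and the value returned by nextOffset is what would be passed to the consumer's seek() on startup or after a rebalance.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical offset store; the map stands in for a database table. */
final class ExternalOffsetStore {
    private final Map<String, Long> nextOffsets = new ConcurrentHashMap<>();

    private static String key(String topic, int partition) {
        return topic + "-" + partition;
    }

    /** Records that the record at processedOffset has been fully handled. */
    void markProcessed(String topic, int partition, long processedOffset) {
        // Store the offset of the NEXT record to read, matching Kafka's commit convention.
        // Math::max keeps the position from moving backwards on out-of-order calls.
        nextOffsets.merge(key(topic, partition), processedOffset + 1, Math::max);
    }

    /** Offset to seek to, or empty if this partition has never been processed. */
    OptionalLong nextOffset(String topic, int partition) {
        Long next = nextOffsets.get(key(topic, partition));
        return next == null ? OptionalLong.empty() : OptionalLong.of(next);
    }

    public static void main(String[] args) {
        ExternalOffsetStore store = new ExternalOffsetStore();
        store.markProcessed("my-topic", 0, 41L);
        System.out.println(store.nextOffset("my-topic", 0)); // prints OptionalLong[42]
        System.out.println(store.nextOffset("my-topic", 1)); // prints OptionalLong.empty
    }
}
```

An empty result is the external-storage equivalent of "no committed offset": the application itself must then decide whether to start from the beginning or the end.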

Data loss vs duplication

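  • auto.offset.reset=latest can cause data loss when a reset is triggered: messages produced before the consumer's position is reset to the log end are never read (a consumer with a valid committed offset resumes from it and is not affected)
  • auto.offset.reset=earliest can cause message duplication when a reset is triggered, for example if the consumer group is recreated under a new group.id or its committed offsets have expired
  • auto.offset.reset=none requires explicit error handling but provides the most control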

Configuration recommendations

High-throughput applications

auto.offset.reset=latest
enable.auto.commit=true
auto.commit.interval.ms=1000

Critical data processing

auto.offset.reset=none
enable.auto.commit=false
# Handle offsets manually with explicit commits

Replay/reprocessing scenarios

auto.offset.reset=earliest
enable.auto.commit=false
# Process all historical data, commit when safe
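
The three profiles above could be captured in a small helper so that the choice is made in one place. This is a sketch using only java.util.Properties; the class ConsumerProfiles and the profile names are hypothetical, and connection settings such as bootstrap.servers, group.id, and the deserializers still have to be added before the properties are handed to a consumer.

```java
import java.util.Properties;

/** Hypothetical helper that returns the offset-related settings for a named profile. */
final class ConsumerProfiles {

    static Properties forProfile(String profile) {
        Properties props = new Properties();
        switch (profile) {
            case "high-throughput":
                props.setProperty("auto.offset.reset", "latest");
                props.setProperty("enable.auto.commit", "true");
                props.setProperty("auto.commit.interval.ms", "1000");
                break;
            case "critical":
                // Offsets are committed manually; a missing offset surfaces as an exception.
                props.setProperty("auto.offset.reset", "none");
                props.setProperty("enable.auto.commit", "false");
                break;
            case "replay":
                // Start from the earliest retained data and commit only when safe.
                props.setProperty("auto.offset.reset", "earliest");
                props.setProperty("enable.auto.commit", "false");
                break;
            default:
                throw new IllegalArgumentException("unknown profile: " + profile);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = forProfile("critical");
        System.out.println(props.getProperty("auto.offset.reset"));  // prints none
        System.out.println(props.getProperty("enable.auto.commit")); // prints false
    }
}
```

Rejecting unknown profile names keeps a typo from silently falling back to the client defaults.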