Kafka isn't a message queue: it's a distributed commit log. Understanding its internals (partitions, replication, consumer groups, offset management) is what separates a surface-level answer from one that impresses.
Kafka is a distributed commit log, not a traditional message queue. Messages are appended to an immutable, ordered log (a partition) and are not deleted after consumption; they're retained based on a time or size policy. This fundamental difference means multiple consumers can independently read the same data at different speeds.
A topic is a logical channel (e.g., orders). Each topic is split into partitions, the unit of parallelism. Within a partition, every message gets a monotonically increasing offset. Ordering is guaranteed within a partition only. Across partitions, there's no global order.
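The append-only model above can be sketched in a few lines of Python. This is a toy illustration of a single partition's log, not broker code: appends assign the next offset, and reads never remove anything.

```python
class Partition:
    """Toy append-only log: each append gets the next offset."""

    def __init__(self):
        self.log = []  # records are never mutated or deleted on read

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the new record

    def read(self, offset):
        return self.log[offset]   # consumers read by offset; nothing is removed

p = Partition()
assert p.append("order-created") == 0
assert p.append("order-paid") == 1
assert p.read(0) == "order-created"  # still present after being read
```

Because reads are positional rather than destructive, any number of consumers can walk the same log at their own pace.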
Zero-copy reads: Kafka uses the sendfile() syscall to transfer data directly from the page cache to the network socket, bypassing user space entirely. This saves 2 copies and 2 context switches per read.

Each consumer in a consumer group is assigned a subset of partitions. Within a group, each partition is consumed by exactly one consumer. If you have 3 partitions and 3 consumers, each gets one partition. If you add a 4th consumer, it sits idle. If a consumer dies, its partitions are rebalanced to the remaining consumers.
Multiple consumer groups can independently read the same topic; this is how Kafka supports both "queue" and "pub/sub" semantics.
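The assignment rule above (each partition to exactly one consumer in a group, extra consumers idle) can be sketched as a simple round-robin. This is an illustration only; the real protocol negotiates assignments via a group coordinator and pluggable assignors.

```python
def assign(partitions, consumers):
    """Toy round-robin assignment: each partition goes to exactly one
    consumer; consumers beyond the partition count end up idle."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 3 partitions, 4 consumers: the 4th consumer gets nothing
a = assign(["p0", "p1", "p2"], ["c1", "c2", "c3", "c4"])
assert a == {"c1": ["p0"], "c2": ["p1"], "c3": ["p2"], "c4": []}
```

A second group calling `assign` with its own consumer list would get its own independent mapping over the same partitions, which is the pub/sub side of the model.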
Producer acknowledgment settings (acks):

acks=0: Fire and forget. The producer doesn't wait for any acknowledgment. Maximum throughput, but messages can be lost if the broker crashes before writing to disk. Use for metrics/logs where loss is tolerable.

acks=1: The leader acknowledges after writing to its local log. If the leader crashes before replicas catch up, data is lost. A good balance of throughput and durability for most use cases.

acks=all: The leader waits until all in-sync replicas (ISR) have written the message. Strongest durability guarantee. Combined with min.insync.replicas=2, it ensures at least 2 copies before ack.
Set enable.idempotence=true. Each producer gets a PID + sequence number. Broker deduplicates retries. Prevents duplicates from network retries. Required for exactly-once.
linger.ms controls how long the producer waits to fill a batch before sending. batch.size sets the max batch size in bytes. Higher linger = better throughput but higher latency. Typical: linger.ms=5, batch.size=64KB.
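Putting the producer settings above together, here is a hedged sketch of two configuration profiles. The keys are real Kafka producer configs; the values are illustrative choices for this page's scenarios, not prescriptive defaults.

```python
# Durability-first profile: acks=all + idempotence, modest batching.
durable_producer = {
    "acks": "all",               # wait for all in-sync replicas
    "enable.idempotence": True,  # PID + sequence number dedupes retries
    "linger.ms": 5,              # wait up to 5 ms to fill a batch
    "batch.size": 64 * 1024,     # 64 KB max batch, in bytes
}

# Throughput-first profile: fire and forget, tolerate loss (metrics/logs).
fast_producer = {
    "acks": "0",                 # no acknowledgment: fastest, lossy
    "linger.ms": 0,              # send immediately
}

assert durable_producer["acks"] == "all"
```

The linger/batch pair is the knob for the throughput-vs-latency tradeoff: a larger `linger.ms` fills batches more fully but delays every message by up to that amount.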
If you set a message key (e.g., user_id), Kafka hashes it to determine the partition: partition = hash(key) % num_partitions. All messages with the same key go to the same partition, guaranteeing order for that key. This is how you ensure all events for a given user/order are processed in sequence.
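The keyed routing above is just a hash and a modulo. A minimal sketch (note: crc32 here only illustrates the idea; the real Java client partitioner uses murmur2):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Toy keyed partitioner: partition = hash(key) % num_partitions."""
    return zlib.crc32(key) % num_partitions

# Same key always lands on the same partition, so per-key order holds
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

This is also why increasing the partition count breaks key-based routing: the modulus changes, so existing keys start hashing to different partitions.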
At-most-once: Commit the offset before processing. If processing fails, the message is skipped (lost). Simple but risky; acceptable for non-critical analytics.

At-least-once: Process, then commit the offset. If the consumer crashes after processing but before the commit, the message is re-processed on restart. The most common pattern; requires idempotent consumers.

Exactly-once: Uses a transactional producer + consumer in a read-process-write loop with isolation.level=read_committed. Works for Kafka-to-Kafka pipelines. For external sinks, you still need idempotent writes.
enable.auto.commit=true commits every 5s by default: easy, but risks at-most-once behavior on a crash. Manual commit (commitSync() / commitAsync()) gives you control over exactly when to mark progress.
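The at-least-once ordering (side effects first, progress marker second) is the crux, and a toy loop makes it concrete. This sketch stands in for a real poll/commit cycle:

```python
processed, committed = [], []

def process(msg):
    processed.append(msg)      # stand-in for real side effects

def commit(offset):
    committed.append(offset)   # stand-in for committing to Kafka

# At-least-once: process, then commit the *next* offset to read.
for offset, msg in enumerate(["a", "b", "c"]):
    process(msg)       # side effects happen first
    commit(offset + 1) # a crash before this line means msg is re-read

assert processed == ["a", "b", "c"]
assert committed[-1] == 3
```

Swapping the two calls inside the loop turns this into at-most-once: a crash after the commit but before processing silently drops the message.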
When consumers join/leave a group (or crash), a rebalance redistributes partitions. During a rebalance, the group stops consuming: a "stop-the-world" event. Kafka 2.4+ introduced incremental cooperative rebalancing to minimize this pause: only affected partitions are revoked, not all of them.
Lag = latest offset − consumer's committed offset. High lag means the consumer can't keep up. Monitor via kafka-consumer-groups.sh --describe or tools like Burrow. If lag grows without bound, you need more consumers (up to the partition count) or need to optimize processing.
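The lag formula is a per-partition subtraction, as this small sketch shows (offsets here are made-up illustrative numbers):

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Lag per partition = log-end offset minus committed offset.
    A partition with no commit yet counts its whole log as lag."""
    return {p: log_end_offsets[p] - committed_offsets.get(p, 0)
            for p in log_end_offsets}

lag = consumer_lag({"p0": 1500, "p1": 900}, {"p0": 1400, "p1": 900})
assert lag == {"p0": 100, "p1": 0}  # p0 is 100 messages behind, p1 caught up
```

Summing or maxing these per-partition values gives the group-level number most dashboards alert on.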
Replication Factor: Each partition has N replicas spread across different brokers. Standard production setup: replication.factor=3. One replica is the leader (handles all reads/writes), the rest are followers that replicate from the leader.
In-Sync Replicas (ISR): Replicas that are caught up to the leader within replica.lag.time.max.ms (default 30s). Only ISR members can become the new leader. If a follower falls behind, it's removed from ISR. The leader only considers a write "committed" when all ISR members have it.
min.insync.replicas: Minimum ISR count required for a write to succeed. With replication.factor=3 and min.insync.replicas=2, you can tolerate 1 broker failure without data loss and without halting writes. If ISR drops below this, producers get NotEnoughReplicasException.
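The interaction between acks, ISR size, and min.insync.replicas can be condensed into one broker-side check. A hedged sketch of the decision, not the actual broker logic:

```python
def accept_write(isr_size, min_insync_replicas, acks):
    """Toy acceptance check: with acks=all, a write is rejected when the
    ISR has shrunk below min.insync.replicas (NotEnoughReplicas)."""
    if acks == "all" and isr_size < min_insync_replicas:
        return False
    return True

# RF=3, min.insync.replicas=2: one broker down (ISR=2) is fine,
# two brokers down (ISR=1) halts acks=all producers.
assert accept_write(isr_size=2, min_insync_replicas=2, acks="all")
assert not accept_write(isr_size=1, min_insync_replicas=2, acks="all")
assert accept_write(isr_size=1, min_insync_replicas=2, acks="1")  # acks=1 unaffected
```

Note the last case: acks=1 producers keep writing even when the ISR is thin, which is exactly where the durability gap opens up.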
Leader Election: When the leader broker goes down, the controller (a designated broker) picks a new leader from the ISR. This happens within seconds.
Unclean Leader Election: If all ISR members are down, Kafka can either wait (availability loss) or elect a non-ISR replica (unclean.leader.election.enable=true). The tradeoff: availability vs potential data loss. Default is false โ correctness over availability.
Choose Kafka when: you need high throughput (100K+ msg/s), log replay, event sourcing, stream processing, multiple consumers reading the same data, or long retention. Kafka excels at "event backbone" use cases.

Choose something else when: you need traditional task queues, complex routing (fanout/topic exchanges), message-level acknowledgment, low latency at small volumes, or simple request-reply patterns. SQS is the zero-ops option on AWS.
Partitions determine max consumer parallelism. You can increase partition count but never decrease it (would break key-based routing). Start with num.partitions = expected_throughput / partition_throughput. Rule of thumb: 10โ30 partitions per topic for most workloads. Don't go over 10K partitions per broker โ metadata overhead grows.
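The sizing formula above can be written down directly. The headroom factor here is an illustrative assumption (since you can add partitions later but never remove them, it's cheaper to over-provision up front), not an official rule:

```python
import math

def partitions_needed(target_mb_s, per_partition_mb_s, headroom=2.0):
    """Rough sizing: target throughput / per-partition throughput,
    padded with headroom (assumption: 2x) for future growth."""
    return math.ceil(target_mb_s / per_partition_mb_s * headroom)

# e.g. 100 MB/s target, ~10 MB/s per partition -> 20 with 2x headroom
assert partitions_needed(100, 10) == 20
```

Whatever the formula says, cap the result against the per-broker partition limits from the metrics table below: metadata overhead, not throughput, is usually what bites first.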
Deletion (cleanup.policy=delete): retention.ms (default 7 days) or retention.bytes. Old segments are deleted. Good for event streams where you only care about recent data.

Compaction (cleanup.policy=compact): keeps the latest value for each key. A background thread removes older records with the same key. Perfect for changelog/CDC topics where you want current state forever (e.g., user profiles).
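Compaction's "latest value per key" semantics reduce to a last-write-wins map. A toy sketch (the real cleaner works segment by segment and keeps offsets; this only shows the end state):

```python
def compact(log):
    """Toy log compaction: keep only the newest value for each key."""
    latest = {}
    for key, value in log:  # later writes overwrite earlier ones
        latest[key] = value
    return latest

state = compact([("user-1", "alice"), ("user-2", "bob"), ("user-1", "alice-v2")])
assert state == {"user-1": "alice-v2", "user-2": "bob"}
```

This is why a compacted topic can serve as a durable changelog: replaying it from the beginning always reconstructs the current state.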
Use Apache Avro + Schema Registry for typed, evolvable messages. Schemas are registered centrally and referenced by ID in messages (saves bandwidth vs embedding schema). Supports backward/forward/full compatibility modes. Prevents producers from breaking consumers with incompatible changes.
Interview recognition signals: If the interviewer mentions "event-driven," "stream processing," "real-time pipeline," "decoupling services," or "replay events," Kafka is almost certainly the expected answer. Show you understand why it's a commit log and not just another queue.
| Metric | Value |
|---|---|
| Single broker throughput (write) | ~200 MB/s (sequential disk) |
| Single broker throughput (messages) | ~200Kโ800K msg/s (1KB msgs) |
| Max partitions per broker (recommended) | ~4,000โ10,000 |
| Max message size (default) | 1 MB (configurable) |
| Replication overhead | ~2× storage & network (RF=3) |
| Leader election time | ~5โ30 seconds |
| End-to-end latency (99th %ile) | ~2โ10 ms (acks=1) |
| Consumer lag (healthy threshold) | < 1,000 messages |
| Default retention | 7 days (168 hours) |
| LinkedIn daily volume | 7+ trillion messages |
Learn by doing: download the Kafka Lab, a ready-to-run playground with a 3-broker cluster, Kotlin client, and 7 guided experiments covering everything on this page.