Kafka-SMQ Integration Implementation Summary

Overview

This implementation provides SMQ native offset management for the Kafka Gateway, leveraging SeaweedMQ's existing distributed infrastructure instead of building custom persistence layers.

Core Concept

Instead of implementing separate consumer-offset persistence, the gateway leverages SMQ's existing distributed offset management, in which offsets are replicated across brokers and survive client disconnections and broker failures.

Architecture

┌─ Kafka Gateway ──────────────────────────────────────┐
│ ┌─ Kafka Abstraction ──────────────────────────────┐ │
│ │ • Presents Kafka-style sequential offsets        │ │
│ │ • Translates to/from SMQ consumer groups         │ │
│ └──────────────────────────────────────────────────┘ │
│                          ↕                           │
│ ┌─ SMQ Native Offset Management ───────────────────┐ │
│ │ • Consumer groups with SMQ semantics             │ │
│ │ • Distributed offset tracking                    │ │
│ │ • Automatic replication & failover               │ │
│ │ • Built-in persistence                           │ │
│ └──────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────┘
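
The Kafka abstraction layer in this diagram reduces to a small translation interface. The Go sketch below is illustrative only; the interface name and method signatures are assumptions for exposition, not the actual SeaweedFS API:

```go
package kafka

import "time"

// KafkaOffsetTranslator is a hypothetical interface for the gateway layer:
// it presents Kafka-style sequential offsets to clients while delegating
// actual position tracking to SMQ consumer groups.
type KafkaOffsetTranslator interface {
	// KafkaToSMQ resolves a Kafka sequential offset to the SMQ timestamp
	// of the corresponding message.
	KafkaToSMQ(topic string, partition int32, offset int64) (time.Time, error)

	// SMQToKafka resolves an SMQ message timestamp back to the Kafka
	// sequential offset the gateway assigned to it.
	SMQToKafka(topic string, partition int32, ts time.Time) (int64, error)
}
```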

Key Components

1. Message Offset Persistence (Implemented)

  • File: weed/mq/kafka/offset/persistence.go
  • Purpose: Maps Kafka sequential offsets to SMQ timestamps
  • Storage: kafka-system/offset-mappings topic
  • Status: Fully implemented and working
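
As a rough illustration of what each persisted mapping carries, consider the sketch below; the struct fields and helper methods are assumptions for exposition, not the actual contents of persistence.go:

```go
package offset

import (
	"encoding/json"
	"fmt"
)

// OffsetMapping is an illustrative record tying a Kafka sequential offset
// to the SMQ timestamp of the same message. One record per message (or per
// batch) is published to the kafka-system/offset-mappings topic.
type OffsetMapping struct {
	Topic       string `json:"topic"`
	Partition   int32  `json:"partition"`
	KafkaOffset int64  `json:"kafka_offset"`
	SMQTimeNs   int64  `json:"smq_time_ns"` // SMQ message timestamp in nanoseconds
}

// Key returns a stable per-message key, so the mappings topic can be compacted.
func (m OffsetMapping) Key() string {
	return fmt.Sprintf("%s-%d-%d", m.Topic, m.Partition, m.KafkaOffset)
}

// Encode serializes the mapping for publishing to the mappings topic.
func (m OffsetMapping) Encode() ([]byte, error) {
	return json.Marshal(m)
}
```
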
2. SMQ Native Offset Management (Recommended)

  • File: weed/mq/kafka/offset/smq_native_offset_management.go
  • Purpose: Use SMQ's built-in consumer group offset management
  • Benefits:
    • Automatic persistence and replication
    • Survives broker failures and restarts
    • No custom persistence code needed
    • Leverages battle-tested SMQ infrastructure
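
With this approach, the gateway's offset-fetch path reduces to a thin query against SMQ, as in the following sketch; SMQConsumerClient is a placeholder interface pending the Phase 1 API assessment:

```go
package offset

import "context"

// SMQConsumerClient abstracts the subset of SMQ consumer-group operations
// the gateway needs. The real SMQ client API is still being assessed
// (Phase 1), so this interface is a placeholder.
type SMQConsumerClient interface {
	// CurrentPosition returns the SMQ-tracked position (message timestamp
	// in nanoseconds) for a consumer group on one topic-partition.
	CurrentPosition(ctx context.Context, group, topic string, partition int32) (int64, error)
}

// KafkaGroupPosition answers a Kafka OffsetFetch by querying SMQ's natively
// persisted consumer-group position instead of a custom store. The returned
// SMQ timestamp is then converted by the translation layer (section 3).
func KafkaGroupPosition(ctx context.Context, smq SMQConsumerClient,
	kafkaGroup, topic string, partition int32) (int64, error) {
	// Kafka consumer groups map 1:1 onto SMQ groups with a "kafka-cg-" prefix.
	smqGroup := "kafka-cg-" + kafkaGroup
	return smq.CurrentPosition(ctx, smqGroup, topic, partition)
}
```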

3. Offset Translation Layer

  • Purpose: Lightweight cache for Kafka offset ↔ SMQ timestamp mapping
  • Implementation: In-memory cache with LRU eviction
  • Scope: Only recent mappings needed for active consumers
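
The cache can be built from the Go standard library alone. The following is a minimal, illustrative LRU sketch, not the production implementation; it assumes a positive capacity and omits the locking a concurrent gateway would need:

```go
package offset

import "container/list"

// mappingKey identifies one Kafka offset within a topic-partition.
type mappingKey struct {
	topic     string
	partition int32
	offset    int64
}

type cacheEntry struct {
	key  mappingKey
	tsNs int64 // SMQ message timestamp in nanoseconds
}

// TranslationCache is a minimal LRU cache from Kafka offsets to SMQ
// timestamps. Only recent mappings are kept; misses fall back to the
// persisted kafka-system/offset-mappings topic.
type TranslationCache struct {
	capacity int
	order    *list.List // front = most recently used
	entries  map[mappingKey]*list.Element
}

func NewTranslationCache(capacity int) *TranslationCache {
	return &TranslationCache{
		capacity: capacity,
		order:    list.New(),
		entries:  make(map[mappingKey]*list.Element),
	}
}

// Get returns the cached SMQ timestamp for a Kafka offset, if present.
func (c *TranslationCache) Get(k mappingKey) (int64, bool) {
	if el, ok := c.entries[k]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*cacheEntry).tsNs, true
	}
	return 0, false
}

// Put inserts a mapping, evicting the least recently used entry when full.
func (c *TranslationCache) Put(k mappingKey, tsNs int64) {
	if el, ok := c.entries[k]; ok {
		el.Value.(*cacheEntry).tsNs = tsNs
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.entries, oldest.Value.(*cacheEntry).key)
	}
	c.entries[k] = c.order.PushFront(&cacheEntry{key: k, tsNs: tsNs})
}
```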

Kafka to SMQ Mapping

| Kafka Concept | SMQ Concept | Implementation |
|---|---|---|
| Consumer Group | SMQ Consumer Group | Direct 1:1 mapping with "kafka-cg-" prefix |
| Topic-Partition | SMQ Topic-Partition Range | Map Kafka partition N to SMQ ring [N*32, (N+1)*32-1] |
| Sequential Offset | SMQ Timestamp + Index | Use SMQ's natural timestamp ordering |
| Offset Commit | SMQ Consumer Position | Let SMQ track position natively |
| Offset Fetch | SMQ Consumer Status | Query SMQ for current position |
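
The partition mapping in the table above is simple arithmetic; a minimal sketch follows (the constant 32 is taken from this document, and the actual ring width may differ):

```go
package offset

// smqSlotsPerKafkaPartition is the ring range width per Kafka partition,
// per the mapping table above: Kafka partition N covers SMQ ring slots
// [N*32, (N+1)*32-1].
const smqSlotsPerKafkaPartition = 32

// kafkaPartitionToSMQRange computes the SMQ ring range for a Kafka partition.
func kafkaPartitionToSMQRange(kafkaPartition int32) (rangeStart, rangeStop int32) {
	rangeStart = kafkaPartition * smqSlotsPerKafkaPartition
	rangeStop = (kafkaPartition+1)*smqSlotsPerKafkaPartition - 1
	return rangeStart, rangeStop
}
```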

Long-Term Disconnection Handling

Scenario: 30-Day Disconnection

Day 0:   Consumer reads to offset 1000, disconnects
         → SMQ consumer group position: timestamp T1000

Day 1-30: System continues, messages 1001-5000 arrive
         → SMQ stores messages with timestamps T1001-T5000
         → SMQ preserves consumer group position at T1000

Day 15:  Kafka Gateway restarts
         → In-memory cache lost
         → SMQ consumer group position PRESERVED

Day 30:  Consumer reconnects
         → SMQ automatically resumes from position T1000
         → Consumer receives messages starting from offset 1001
         → Perfect resumption, no manual recovery needed
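
In code, resumption needs no gateway-side recovery logic at all. The sketch below illustrates the idea; SubscribeFn stands in for the real SMQ subscribe API, which is still under assessment (see Phase 1):

```go
package offset

import "context"

// SubscribeFn stands in for the real SMQ subscribe call. With
// resumeOrEarliest set, SMQ resumes from the group's durably stored
// position, or from the earliest message for a brand-new group.
type SubscribeFn func(ctx context.Context, group, topic string, resumeOrEarliest bool) error

// ResumeKafkaConsumer reconnects a Kafka consumer group after an arbitrary
// disconnection. SMQ has durably tracked the group's position, so the
// consumer picks up exactly where it left off (offset 1001 in the scenario
// above), even if the gateway restarted and lost its in-memory cache.
func ResumeKafkaConsumer(ctx context.Context, subscribe SubscribeFn, kafkaGroup, topic string) error {
	// The 1:1 group mapping means no translation state must survive restarts.
	return subscribe(ctx, "kafka-cg-"+kafkaGroup, topic, true)
}
```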

Benefits vs Custom Persistence

| Aspect | Custom Persistence | SMQ Native |
|---|---|---|
| Implementation | High complexity (dual systems) | Medium (SMQ integration) |
| Reliability | Custom code risks | Battle-tested SMQ |
| Performance | Dual-write penalty | Native SMQ speed |
| Operations | Monitor 2 systems | Single system |
| Maintenance | Custom bugs/fixes | SMQ team handles |
| Scalability | Custom scaling | SMQ distributed architecture |

Implementation Roadmap

Phase 1: SMQ API Assessment (1-2 weeks)

  • Verify SMQ consumer group persistence behavior
  • Test RESUME_OR_EARLIEST functionality
  • Confirm replication across brokers

Phase 2: Consumer Group Integration (2-3 weeks)

  • Implement SMQ consumer group wrapper
  • Map Kafka consumer lifecycle to SMQ
  • Handle consumer group coordination

Phase 3: Offset Translation (2-3 weeks)

  • Lightweight cache for Kafka ↔ SMQ offset mapping
  • Handle edge cases (rebalancing, etc.)
  • Performance optimization

Phase 4: Integration & Testing (3-4 weeks)

  • Replace current offset management
  • Comprehensive testing with long disconnections
  • Performance benchmarking

Total Estimated Effort: ~10 weeks

Current Status

Completed

  • Message offset persistence (Kafka offset → SMQ timestamp mapping)
  • SMQ publishing integration with offset tracking
  • SMQ subscription integration with offset mapping
  • Comprehensive integration tests
  • Architecture design for SMQ native approach

Next Steps

  1. Assess SMQ consumer group APIs
  2. Implement SMQ native consumer group wrapper
  3. Replace current in-memory offset management
  4. Test long-term disconnection scenarios

Key Advantages

  • Leverage Existing Infrastructure: Use SMQ's proven distributed systems
  • Reduced Complexity: No custom persistence layer to maintain
  • Better Reliability: Inherit SMQ's fault tolerance and replication
  • Operational Simplicity: One system to monitor instead of two
  • Performance: Native SMQ operations without translation overhead
  • Automatic Edge Cases: SMQ handles network partitions, broker failures, etc.

Recommendation

The SMQ Native Offset Management approach is strongly recommended over custom persistence because:

  1. Simpler Implementation: ~10 weeks vs 15+ weeks for custom persistence
  2. Higher Reliability: Leverage battle-tested SMQ infrastructure
  3. Better Performance: Native operations without dual-write penalty
  4. Lower Maintenance: SMQ team handles distributed systems complexity
  5. Operational Benefits: Single system to monitor and maintain

This approach solves the critical offset persistence problem while minimizing implementation complexity and maximizing reliability.