Kafka-SMQ Integration Implementation Summary
Overview
This implementation provides SMQ native offset management for the Kafka Gateway, leveraging SeaweedMQ's existing distributed infrastructure instead of building custom persistence layers.
Recommended Approach: SMQ Native Offset Management
Core Concept
Instead of implementing separate consumer offset persistence, leverage SMQ's existing distributed offset management system where offsets are replicated across brokers and survive client disconnections and broker failures.
Architecture
```
┌─ Kafka Gateway ─────────────────────────────────────┐
│ ┌─ Kafka Abstraction ──────────────────────────────┐ │
│ │ • Presents Kafka-style sequential offsets        │ │
│ │ • Translates to/from SMQ consumer groups         │ │
│ └──────────────────────────────────────────────────┘ │
│                          ↕                           │
│ ┌─ SMQ Native Offset Management ───────────────────┐ │
│ │ • Consumer groups with SMQ semantics              │ │
│ │ • Distributed offset tracking                     │ │
│ │ • Automatic replication & failover                │ │
│ │ • Built-in persistence                            │ │
│ └───────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
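A minimal sketch of that layering in Go follows. Both interface names (`KafkaOffsetView`, `SMQOffsetBackend`) are hypothetical and exist only to show the division of labor: the Kafka-facing layer speaks sequential offsets, while persistence and replication stay entirely inside SMQ.

```go
// Hypothetical interfaces mirroring the two layers in the diagram above.
// Neither type exists in SeaweedFS; they only illustrate the split of
// responsibilities between the Kafka abstraction and SMQ.
package gateway

import "context"

// KafkaOffsetView is the Kafka-facing layer: clients see ordinary
// sequential offsets per topic-partition, exactly as Kafka defines them.
type KafkaOffsetView interface {
	CommitOffset(ctx context.Context, group, topic string, partition int32, offset int64) error
	FetchOffset(ctx context.Context, group, topic string, partition int32) (int64, error)
}

// SMQOffsetBackend is the SMQ-facing layer: consumer group positions are
// tracked as SMQ timestamps and persisted/replicated by SMQ itself, so the
// gateway never needs its own durable offset store.
type SMQOffsetBackend interface {
	RecordPosition(ctx context.Context, group, topic string, partition int32, smqTimestamp int64) error
	CurrentPosition(ctx context.Context, group, topic string, partition int32) (smqTimestamp int64, err error)
}
```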
Key Components
1. Message Offset Persistence (Implemented)
- File: `weed/mq/kafka/offset/persistence.go`
- Purpose: Maps Kafka sequential offsets to SMQ timestamps
- Storage: `kafka-system/offset-mappings` topic
- Status: Fully implemented and working
2. SMQ Native Consumer Groups (Recommended)
- File: `weed/mq/kafka/offset/smq_native_offset_management.go`
- Purpose: Use SMQ's built-in consumer group offset management
- Benefits:
- Automatic persistence and replication
- Survives broker failures and restarts
- No custom persistence code needed
- Leverages battle-tested SMQ infrastructure
3. Offset Translation Layer
- Purpose: Lightweight cache for Kafka offset ↔ SMQ timestamp mapping
- Implementation: In-memory cache with LRU eviction
- Scope: Only recent mappings needed for active consumers
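As an illustration of this translation layer, the sketch below implements a bounded, LRU-evicting in-memory cache from Kafka offset to SMQ timestamp for a single topic-partition. The names (`OffsetCache`, `Put`, `Lookup`) are hypothetical, not the actual code in the files listed above.

```go
// Hypothetical sketch of the offset translation cache: a bounded map from
// Kafka offset to SMQ timestamp with simple LRU eviction, scoped to one
// topic-partition.
package offsetcache

import "container/list"

type entry struct {
	kafkaOffset  int64
	smqTimestamp int64 // nanoseconds, SMQ's native ordering key
}

type OffsetCache struct {
	capacity int
	order    *list.List              // front = most recently used
	items    map[int64]*list.Element // Kafka offset -> list element
}

func NewOffsetCache(capacity int) *OffsetCache {
	return &OffsetCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[int64]*list.Element),
	}
}

// Put records the Kafka offset -> SMQ timestamp mapping, evicting the
// least recently used entry once the cache is full.
func (c *OffsetCache) Put(kafkaOffset, smqTimestamp int64) {
	if el, ok := c.items[kafkaOffset]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).smqTimestamp = smqTimestamp
		return
	}
	if c.order.Len() >= c.capacity {
		if oldest := c.order.Back(); oldest != nil {
			c.order.Remove(oldest)
			delete(c.items, oldest.Value.(*entry).kafkaOffset)
		}
	}
	c.items[kafkaOffset] = c.order.PushFront(&entry{kafkaOffset, smqTimestamp})
}

// Lookup returns the SMQ timestamp for a Kafka offset, if still cached.
// On a miss, the gateway would fall back to the persisted mapping topic.
func (c *OffsetCache) Lookup(kafkaOffset int64) (int64, bool) {
	el, ok := c.items[kafkaOffset]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).smqTimestamp, true
}
```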
Kafka to SMQ Mapping
| Kafka Concept | SMQ Concept | Implementation |
|---|---|---|
| Consumer Group | SMQ Consumer Group | Direct 1:1 mapping with a `kafka-cg-` prefix |
| Topic-Partition | SMQ Topic-Partition Range | Map Kafka partition N to SMQ ring [N*32, (N+1)*32-1] |
| Sequential Offset | SMQ Timestamp + Index | Use SMQ's natural timestamp ordering |
| Offset Commit | SMQ Consumer Position | Let SMQ track position natively |
| Offset Fetch | SMQ Consumer Status | Query SMQ for current position |
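To make the table concrete, here is a small sketch of the two purely mechanical mappings: the `kafka-cg-` consumer group prefix and the partition-to-ring-range translation (32 ring slots per Kafka partition, as in the table). The function and constant names are illustrative only.

```go
// Illustrative mapping helpers (names are hypothetical, not the actual
// SeaweedFS API). They encode the table above: a "kafka-cg-" prefix for
// consumer groups and 32 SMQ ring slots per Kafka partition.
package kafkamap

import "fmt"

const ringSlotsPerKafkaPartition = 32

// SMQConsumerGroup maps a Kafka consumer group name to its SMQ counterpart.
func SMQConsumerGroup(kafkaGroup string) string {
	return "kafka-cg-" + kafkaGroup
}

// SMQRingRange maps Kafka partition N to the SMQ ring range
// [N*32, (N+1)*32-1].
func SMQRingRange(kafkaPartition int32) (start, stop int32) {
	start = kafkaPartition * ringSlotsPerKafkaPartition
	stop = start + ringSlotsPerKafkaPartition - 1
	return start, stop
}

func ExampleUsage() {
	group := SMQConsumerGroup("orders-service")
	start, stop := SMQRingRange(3)
	fmt.Printf("group=%s ring=[%d,%d]\n", group, start, stop) // group=kafka-cg-orders-service ring=[96,127]
}
```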
Long-Term Disconnection Handling
Scenario: 30-Day Disconnection
```
Day 0:    Consumer reads to offset 1000, disconnects
          → SMQ consumer group position: timestamp T1000

Day 1-30: System continues, messages 1001-5000 arrive
          → SMQ stores messages with timestamps T1001-T5000
          → SMQ preserves consumer group position at T1000

Day 15:   Kafka Gateway restarts
          → In-memory cache lost
          → SMQ consumer group position PRESERVED

Day 30:   Consumer reconnects
          → SMQ automatically resumes from position T1000
          → Consumer receives messages starting from offset 1001
          → Perfect resumption, no manual recovery needed
```
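The resumption step could be expressed roughly as below. `SMQConsumerGroupClient` and `OffsetIndex` are hypothetical abstractions, standing in for whatever consumer group and offset-mapping APIs the gateway ends up using; the point is that the gateway only translates SMQ's persisted position back into a Kafka offset and never stores the position itself.

```go
// Hypothetical resume flow after a long disconnection. The interfaces below
// are assumptions, not SeaweedMQ's real client API.
package resume

import "context"

// SMQConsumerGroupClient is an assumed abstraction over SMQ's consumer
// group API.
type SMQConsumerGroupClient interface {
	// Position returns the persisted consume position (SMQ timestamp in ns)
	// for a group on a topic-partition.
	Position(ctx context.Context, group, topic string, partition int32) (int64, error)
}

// OffsetIndex is an assumed lookup from SMQ timestamp to the next Kafka
// offset, backed by the persisted offset-mapping topic.
type OffsetIndex interface {
	NextKafkaOffset(ctx context.Context, topic string, partition int32, smqTimestamp int64) (int64, error)
}

// ResumeOffset computes where a reconnecting Kafka consumer should continue.
// In the scenario above, SMQ still holds position T1000, which maps to the
// next Kafka offset 1001.
func ResumeOffset(ctx context.Context, smq SMQConsumerGroupClient, idx OffsetIndex,
	group, topic string, partition int32) (int64, error) {

	ts, err := smq.Position(ctx, group, topic, partition) // survives gateway restarts
	if err != nil {
		return 0, err
	}
	return idx.NextKafkaOffset(ctx, topic, partition, ts) // e.g. T1000 -> 1001
}
```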
Benefits vs Custom Persistence
| Aspect | Custom Persistence | SMQ Native |
|---|---|---|
| Implementation | High complexity (two parallel systems) | Medium (SMQ integration) |
| Reliability | Risk of bugs in custom code | Battle-tested SMQ |
| Performance | Dual-write penalty | Native SMQ speed |
| Operations | Two systems to monitor | Single system |
| Maintenance | Custom bugs and fixes | Handled by the SMQ team |
| Scalability | Custom scaling logic | SMQ's distributed architecture |
Implementation Roadmap
Phase 1: SMQ API Assessment (1-2 weeks)
- Verify SMQ consumer group persistence behavior
- Test `RESUME_OR_EARLIEST` functionality
- Confirm replication across brokers
Phase 2: Consumer Group Integration (2-3 weeks)
- Implement SMQ consumer group wrapper
- Map Kafka consumer lifecycle to SMQ
- Handle consumer group coordination
Phase 3: Offset Translation (2-3 weeks)
- Lightweight cache for Kafka ↔ SMQ offset mapping
- Handle edge cases (rebalancing, etc.)
- Performance optimization
Phase 4: Integration & Testing (3-4 weeks)
- Replace current offset management
- Comprehensive testing with long disconnections
- Performance benchmarking
Total Estimated Effort: ~10 weeks
Current Status
Completed
- Message offset persistence (Kafka offset → SMQ timestamp mapping)
- SMQ publishing integration with offset tracking
- SMQ subscription integration with offset mapping
- Comprehensive integration tests
- Architecture design for SMQ native approach
Next Steps
- Assess SMQ consumer group APIs
- Implement SMQ native consumer group wrapper
- Replace current in-memory offset management
- Test long-term disconnection scenarios
Key Advantages
- Leverage Existing Infrastructure: Use SMQ's proven distributed systems
- Reduced Complexity: No custom persistence layer to maintain
- Better Reliability: Inherit SMQ's fault tolerance and replication
- Operational Simplicity: One system to monitor instead of two
- Performance: Native SMQ operations without translation overhead
- Edge-Case Handling: SMQ handles network partitions, broker failures, etc.
Recommendation
The SMQ Native Offset Management approach is strongly recommended over custom persistence because:
- Simpler Implementation: ~10 weeks vs 15+ weeks for custom persistence
- Higher Reliability: Leverage battle-tested SMQ infrastructure
- Better Performance: Native operations without dual-write penalty
- Lower Maintenance: SMQ team handles distributed systems complexity
- Operational Benefits: Single system to monitor and maintain
This approach solves the critical offset persistence problem while minimizing implementation complexity and maximizing reliability.