# Kafka-SMQ Integration Implementation Summary
## Overview
This implementation provides **SMQ native offset management** for the Kafka Gateway, leveraging SeaweedMQ's existing distributed infrastructure instead of building custom persistence layers.
## Recommended Approach: SMQ Native Offset Management
### Core Concept
Instead of implementing separate consumer offset persistence, the gateway leverages SMQ's existing distributed offset management, in which offsets are replicated across brokers and survive both client disconnections and broker failures.
### Architecture
```
┌─ Kafka Gateway ─────────────────────────────────────┐
│ ┌─ Kafka Abstraction ──────────────────────────────┐ │
│ │ • Presents Kafka-style sequential offsets │ │
│ │ • Translates to/from SMQ consumer groups │ │
│ └──────────────────────────────────────────────────┘ │
│ ↕ │
│ ┌─ SMQ Native Offset Management ──────────────────┐ │
│ │ • Consumer groups with SMQ semantics │ │
│ │ • Distributed offset tracking │ │
│ │ • Automatic replication & failover │ │
│ │ • Built-in persistence │ │
│ └──────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────┘
```
## Key Components
### 1. Message Offset Persistence (Implemented)
- **File**: `weed/mq/kafka/offset/persistence.go`
- **Purpose**: Maps Kafka sequential offsets to SMQ timestamps (record shape sketched below)
- **Storage**: `kafka-system/offset-mappings` topic
- **Status**: Fully implemented and working
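For illustration, a minimal sketch of the kind of record this layer could append to the mapping topic. The field names and JSON encoding are assumptions for this summary, not the actual contents of `persistence.go`:
```go
package offset

import "encoding/json"

// OffsetMapping ties one Kafka sequential offset to the SMQ timestamp
// of the message it refers to. (Illustrative shape only.)
type OffsetMapping struct {
	Topic        string `json:"topic"`
	Partition    int32  `json:"partition"`
	KafkaOffset  int64  `json:"kafka_offset"`
	SMQTimestamp int64  `json:"smq_ts_ns"` // nanoseconds, SMQ's native ordering key
}

// Encode serializes the mapping for publication to the
// kafka-system/offset-mappings topic.
func (m OffsetMapping) Encode() ([]byte, error) {
	return json.Marshal(m)
}
```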
### 2. SMQ Native Consumer Groups (Recommended)
- **File**: `weed/mq/kafka/offset/smq_native_offset_management.go`
- **Purpose**: Use SMQ's built-in consumer group offset management (see the wrapper sketch after this list)
- **Benefits**:
- Automatic persistence and replication
- Survives broker failures and restarts
- No custom persistence code needed
- Leverages battle-tested SMQ infrastructure
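A rough sketch of the wrapper idea. The `smqConsumer` interface here is hypothetical, standing in for SMQ's real consumer group client (whose concrete API is exactly what Phase 1 of the roadmap below assesses); the `kafka-cg-` prefix follows the mapping table in the next section:
```go
package offset

import "context"

// smqConsumer is a hypothetical stand-in for SMQ's real consumer group
// client; its concrete API is what Phase 1 of the roadmap assesses.
type smqConsumer interface {
	// ResumePosition returns the group's persisted position for a topic,
	// or found=false when the group has none (the caller then starts from
	// the earliest message, i.e. RESUME_OR_EARLIEST semantics).
	ResumePosition(ctx context.Context, group, topic string) (tsNs int64, found bool, err error)
	// CommitPosition persists the position; SMQ replicates it natively.
	CommitPosition(ctx context.Context, group, topic string, tsNs int64) error
}

// KafkaGroupBridge maps one Kafka consumer group onto one SMQ consumer
// group, per the 1:1 mapping in the table below.
type KafkaGroupBridge struct {
	smq smqConsumer
}

// groupName applies the "kafka-cg-" naming convention.
func groupName(kafkaGroup string) string {
	return "kafka-cg-" + kafkaGroup
}

// CommitOffset records the SMQ timestamp corresponding to the Kafka offset
// the client committed; no gateway-side persistence is involved.
func (b *KafkaGroupBridge) CommitOffset(ctx context.Context, kafkaGroup, topic string, smqTsNs int64) error {
	return b.smq.CommitPosition(ctx, groupName(kafkaGroup), topic, smqTsNs)
}
```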
### 3. Offset Translation Layer
- **Purpose**: Lightweight cache for Kafka offset ↔ SMQ timestamp mapping
- **Implementation**: In-memory cache with LRU eviction (sketched below)
- **Scope**: Only recent mappings needed for active consumers
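A minimal sketch of such a cache, built on Go's `container/list`; the real cache may also need a timestamp-keyed index, which is omitted here:
```go
package offset

import "container/list"

// offsetLRU keeps the most recent Kafka offset -> SMQ timestamp pairs
// and evicts the least recently used entry when over capacity.
type offsetLRU struct {
	cap   int
	order *list.List              // front = most recently used
	items map[int64]*list.Element // Kafka offset -> list element
}

type pair struct {
	kafkaOffset int64
	smqTsNs     int64
}

func newOffsetLRU(capacity int) *offsetLRU {
	return &offsetLRU{cap: capacity, order: list.New(), items: make(map[int64]*list.Element)}
}

// Put records a mapping, refreshing recency and evicting the oldest
// entry if the cache is full.
func (c *offsetLRU) Put(kafkaOffset, smqTsNs int64) {
	if el, ok := c.items[kafkaOffset]; ok {
		el.Value = pair{kafkaOffset, smqTsNs}
		c.order.MoveToFront(el)
		return
	}
	c.items[kafkaOffset] = c.order.PushFront(pair{kafkaOffset, smqTsNs})
	if c.order.Len() > c.cap {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(pair).kafkaOffset)
	}
}

// Get returns the SMQ timestamp for a Kafka offset, if cached.
func (c *offsetLRU) Get(kafkaOffset int64) (int64, bool) {
	el, ok := c.items[kafkaOffset]
	if !ok {
		return 0, false
	}
	c.order.MoveToFront(el)
	return el.Value.(pair).smqTsNs, true
}
```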
## Kafka to SMQ Mapping
| Kafka Concept | SMQ Concept | Implementation |
|---------------|-------------|----------------|
| Consumer Group | SMQ Consumer Group | Direct 1:1 mapping with "kafka-cg-" prefix |
| Topic-Partition | SMQ Topic-Partition Range | Map Kafka partition N to SMQ ring [N*32, (N+1)*32-1] |
| Sequential Offset | SMQ Timestamp + Index | Use SMQ's natural timestamp ordering |
| Offset Commit | SMQ Consumer Position | Let SMQ track position natively |
| Offset Fetch | SMQ Consumer Status | Query SMQ for current position |
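The partition-range row of this table is simple enough to pin down in code. A minimal sketch, assuming the fixed 32-slots-per-partition convention holds:
```go
package offset

// smqRingRange returns the inclusive SMQ ring range [start, stop] that
// backs Kafka partition n, using 32 ring slots per Kafka partition.
func smqRingRange(n int32) (start, stop int32) {
	return n * 32, (n+1)*32 - 1
}
```
Kafka partition 0 thus owns ring slots 0-31, partition 1 owns 32-63, and so on.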
## Long-Term Disconnection Handling
### Scenario: 30-Day Disconnection
```
Day 0: Consumer reads to offset 1000, disconnects
→ SMQ consumer group position: timestamp T1000
Day 1-30: System continues, messages 1001-5000 arrive
→ SMQ stores messages with timestamps T1001-T5000
→ SMQ preserves consumer group position at T1000
Day 15: Kafka Gateway restarts
→ In-memory cache lost
→ SMQ consumer group position PRESERVED
Day 30: Consumer reconnects
→ SMQ automatically resumes from position T1000
→ Consumer receives messages starting from offset 1001
→ Perfect resumption, no manual recovery needed
```
## Benefits vs Custom Persistence
| Aspect | Custom Persistence | SMQ Native |
|--------|-------------------|------------|
| Implementation | High complexity (dual systems) | Medium (SMQ integration) |
| Reliability | Custom code risks | Battle-tested SMQ |
| Performance | Dual writes penalty | Native SMQ speed |
| Operational | Monitor 2 systems | Single system |
| Maintenance | Custom bugs/fixes | SMQ team handles |
| Scalability | Custom scaling | SMQ distributed arch |
## Implementation Roadmap
### Phase 1: SMQ API Assessment (1-2 weeks)
- Verify SMQ consumer group persistence behavior
- Test `RESUME_OR_EARLIEST` functionality (see the test sketch after this phase)
- Confirm replication across brokers
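As a shape for this check, a test sketch reusing the hypothetical `smqConsumer` interface and `KafkaGroupBridge` from the wrapper sketch above. The in-memory `fakeSMQ` stands in for a real broker, so this only illustrates the property to verify; the actual assessment must run against real SMQ brokers:
```go
package offset

import (
	"context"
	"testing"
)

// fakeSMQ is an in-memory stand-in for a real SMQ broker, used only to
// state the behavior Phase 1 must confirm against real infrastructure.
type fakeSMQ struct{ pos map[string]int64 }

func newFakeSMQ() *fakeSMQ { return &fakeSMQ{pos: map[string]int64{}} }

func (f *fakeSMQ) CommitPosition(_ context.Context, group, topic string, tsNs int64) error {
	f.pos[group+"/"+topic] = tsNs
	return nil
}

func (f *fakeSMQ) ResumePosition(_ context.Context, group, topic string) (int64, bool, error) {
	ts, ok := f.pos[group+"/"+topic]
	return ts, ok, nil
}

// TestPositionSurvivesGatewayRestart encodes the property the roadmap
// depends on: a committed position outlives the gateway that wrote it.
func TestPositionSurvivesGatewayRestart(t *testing.T) {
	ctx := context.Background()
	smq := newFakeSMQ()

	first := &KafkaGroupBridge{smq: smq}
	if err := first.CommitOffset(ctx, "analytics", "clicks", 1000); err != nil {
		t.Fatal(err)
	}

	// Simulate a gateway restart: a fresh bridge over the same SMQ backend.
	restarted := &KafkaGroupBridge{smq: smq}
	ts, found, err := restarted.smq.ResumePosition(ctx, groupName("analytics"), "clicks")
	if err != nil || !found || ts != 1000 {
		t.Fatalf("want persisted position 1000, got ts=%d found=%v err=%v", ts, found, err)
	}
}
```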
### Phase 2: Consumer Group Integration (2-3 weeks)
- Implement SMQ consumer group wrapper
- Map Kafka consumer lifecycle to SMQ
- Handle consumer group coordination
### Phase 3: Offset Translation (2-3 weeks)
- Lightweight cache for Kafka ↔ SMQ offset mapping
- Handle edge cases (rebalancing, etc.)
- Performance optimization
### Phase 4: Integration & Testing (3-4 weeks)
- Replace current offset management
- Comprehensive testing with long disconnections
- Performance benchmarking
**Total Estimated Effort: ~10 weeks**
## Current Status
### Completed
- Message offset persistence (Kafka offset → SMQ timestamp mapping)
- SMQ publishing integration with offset tracking
- SMQ subscription integration with offset mapping
- Comprehensive integration tests
- Architecture design for SMQ native approach
### Next Steps
1. Assess SMQ consumer group APIs
2. Implement SMQ native consumer group wrapper
3. Replace current in-memory offset management
4. Test long-term disconnection scenarios
## Key Advantages
- **Leverage Existing Infrastructure**: Use SMQ's proven distributed systems
- **Reduced Complexity**: No custom persistence layer to maintain
- **Better Reliability**: Inherit SMQ's fault tolerance and replication
- **Operational Simplicity**: One system to monitor instead of two
- **Performance**: Native SMQ operations without translation overhead
- **Automatic Edge Cases**: SMQ handles network partitions, broker failures, etc.
## Recommendation
The **SMQ Native Offset Management** approach is strongly recommended over custom persistence because:
1. **Simpler Implementation**: ~10 weeks vs 15+ weeks for custom persistence
2. **Higher Reliability**: Leverage battle-tested SMQ infrastructure
3. **Better Performance**: Native operations without dual-write penalty
4. **Lower Maintenance**: SMQ team handles distributed systems complexity
5. **Operational Benefits**: Single system to monitor and maintain
This approach solves the critical offset persistence problem while minimizing implementation complexity and maximizing reliability.