mirror of https://github.com/chrislusf/seaweedfs synced 2025-06-29 16:22:46 +02:00
seaweedfs/test/mq/integration_test_design.md
2025-06-23 10:55:02 -07:00

# SeaweedMQ Integration Test Design
## Overview
This document outlines the integration test strategy for SeaweedMQ, from basic pub/sub operations to advanced features such as auto-scaling, failover, and performance under load.
## Architecture Under Test
SeaweedMQ consists of:
- **Masters**: Cluster coordination and metadata management
- **Volume Servers**: Storage layer for persistent messages
- **Filers**: File system interface for metadata storage
- **Brokers**: Message processing and routing (stateless)
- **Agents**: Client interface for pub/sub operations
- **Schema System**: Protobuf-based message schema management
## Test Categories
### 1. Basic Functionality Tests
#### 1.1 Basic Pub/Sub Operations
- **Test**: `TestBasicPublishSubscribe`
  - Publish messages to a topic
  - Subscribe and receive messages
  - Verify message content and ordering
  - Test with different data types (string, int, bytes, records)
- **Test**: `TestMultipleConsumers`
  - Multiple subscribers on same topic
  - Verify message distribution
  - Test consumer group functionality
- **Test**: `TestMessageOrdering`
  - Publish messages in sequence
  - Verify FIFO ordering within partitions
  - Test with different partition keys
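
The ordering check in `TestMessageOrdering` could be expressed roughly as below. This is a minimal sketch, not the project's implementation: the `Received` type and `CheckPerKeyOrder` helper are hypothetical, and it assumes the publisher stamps each message with a per-key sequence number and that all messages sharing a key land in the same partition.

```go
package main

import "fmt"

// Received is a hypothetical record of one message as seen by a subscriber.
type Received struct {
	Key string // partition key chosen by the publisher
	Seq int64  // per-key sequence number stamped by the publisher (assumption)
}

// CheckPerKeyOrder returns an error if, for any key, sequence numbers
// do not arrive in strictly increasing order. Interleaving between
// different keys is allowed.
func CheckPerKeyOrder(msgs []Received) error {
	last := make(map[string]int64)
	for i, m := range msgs {
		if prev, ok := last[m.Key]; ok && m.Seq <= prev {
			return fmt.Errorf("message %d: key %q seq %d arrived after seq %d",
				i, m.Key, m.Seq, prev)
		}
		last[m.Key] = m.Seq
	}
	return nil
}

func main() {
	inOrder := []Received{{"a", 1}, {"b", 1}, {"a", 2}, {"b", 2}}
	outOfOrder := []Received{{"a", 1}, {"a", 3}, {"a", 2}}
	fmt.Println("in-order stream error:", CheckPerKeyOrder(inOrder))
	fmt.Println("out-of-order stream error:", CheckPerKeyOrder(outOfOrder))
}
```

Checking order per key rather than globally matches the FIFO-within-partition guarantee: messages from different partitions may interleave freely at the subscriber.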
#### 1.2 Schema Management
- **Test**: `TestSchemaValidation`
  - Publish with valid schemas
  - Reject invalid schema messages
  - Test schema evolution scenarios
- **Test**: `TestRecordTypes`
  - Nested record structures
  - List types and complex schemas
  - Schema-to-Parquet conversion
### 2. Partitioning and Scaling Tests
#### 2.1 Partition Management
- **Test**: `TestPartitionDistribution`
  - Messages distributed across partitions based on keys
  - Verify partition assignment logic
  - Test partition rebalancing
- **Test**: `TestAutoSplitMerge`
  - Simulate high load to trigger auto-split
  - Simulate low load to trigger auto-merge
  - Verify data consistency during splits/merges
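
A test such as `TestPartitionDistribution` needs a way to tally where keys land. The sketch below illustrates the idea with an FNV-1a hash modulo the partition count. Both `PartitionFor` and `CountByPartition` are hypothetical helpers, and the hashing scheme is an assumption for illustration only; SeaweedMQ's actual assignment logic may differ, in which case the test would call the real function instead.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PartitionFor maps a key to one of n partitions using FNV-1a.
// Assumption: stands in for whatever key-to-partition mapping the
// system under test really uses.
func PartitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash writes never return an error
	return int(h.Sum32() % uint32(n))
}

// CountByPartition tallies how many keys land on each of n partitions.
func CountByPartition(keys []string, n int) []int {
	counts := make([]int, n)
	for _, k := range keys {
		counts[PartitionFor(k, n)]++
	}
	return counts
}

func main() {
	keys := make([]string, 0, 10000)
	for i := 0; i < 10000; i++ {
		keys = append(keys, fmt.Sprintf("user-%d", i))
	}
	// Print the per-partition tallies; a test would compare them
	// against a skew tolerance chosen by the team.
	fmt.Println(CountByPartition(keys, 4))
}
```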
#### 2.2 Broker Scaling
- **Test**: `TestBrokerAddRemove`
  - Add brokers during operation
  - Remove brokers gracefully
  - Verify partition reassignment
- **Test**: `TestLoadBalancing`
  - Verify even load distribution across brokers
  - Test with varying message sizes and rates
  - Monitor broker resource utilization
### 3. Failover and Reliability Tests
#### 3.1 Broker Failover
- **Test**: `TestBrokerFailover`
  - Kill leader broker during publishing
  - Verify seamless failover to follower
  - Test data consistency after failover
- **Test**: `TestBrokerRecovery`
  - Broker restart scenarios
  - State recovery from storage
  - Partition reassignment after recovery
#### 3.2 Data Durability
- **Test**: `TestMessagePersistence`
  - Publish messages and restart cluster
  - Verify all messages are recovered
  - Test with different replication settings
- **Test**: `TestFollowerReplication`
  - Leader-follower message replication
  - Verify consistency between replicas
  - Test follower promotion scenarios
### 4. Agent Functionality Tests
#### 4.1 Session Management
- **Test**: `TestPublishSessions`
  - Create/close publish sessions
  - Concurrent session management
  - Session cleanup after failures
- **Test**: `TestSubscribeSessions`
  - Subscribe session lifecycle
  - Consumer group management
  - Offset tracking and acknowledgments
#### 4.2 Error Handling
- **Test**: `TestConnectionFailures`
  - Network partitions between agent and broker
  - Automatic reconnection logic
  - Message buffering during outages
### 5. Performance and Load Tests
#### 5.1 Throughput Tests
- **Test**: `TestHighThroughputPublish`
  - Publish 100K+ messages/second
  - Monitor system resources
  - Verify no message loss
- **Test**: `TestHighThroughputSubscribe`
  - Multiple consumers processing high volume
  - Monitor processing latency
  - Test backpressure handling
#### 5.2 Spike Traffic Tests
- **Test**: `TestTrafficSpikes`
  - Sudden increase in message volume
  - Auto-scaling behavior verification
  - Resource utilization patterns
- **Test**: `TestLargeMessages`
  - Messages with large payloads (MB size)
  - Memory usage monitoring
  - Storage efficiency testing
### 6. End-to-End Scenarios
#### 6.1 Complete Workflow Tests
- **Test**: `TestProducerConsumerWorkflow`
  - Multi-stage data processing pipeline
  - Producer → Topic → Multiple Consumers
  - Data transformation and aggregation
- **Test**: `TestMultiTopicOperations`
  - Multiple topics with different schemas
  - Cross-topic message routing
  - Topic management operations
## Test Infrastructure
### Environment Setup
#### Docker Compose Configuration
```yaml
# test-environment.yml
version: '3.9'
services:
  master-cluster:
    # 3 master nodes for HA
  volume-cluster:
    # 3 volume servers for data storage
  filer-cluster:
    # 2 filers for metadata
  broker-cluster:
    # 3 brokers for message processing
  test-runner:
    # Container to run integration tests
```
#### Test Data Management
- Pre-defined test schemas
- Sample message datasets
- Performance benchmarking data
### Test Framework Structure
```go
// Base test framework
type IntegrationTestSuite struct {
	masters    []string
	brokers    []string
	filers     []string
	testClient *TestClient
	cleanup    []func()
}

// Test utilities
type TestClient struct {
	publishers  map[string]*pub_client.TopicPublisher
	subscribers map[string]*sub_client.TopicSubscriber
	agents      []*agent.MessageQueueAgent
}
```
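
The `cleanup []func()` field in the suite above suggests a stack of teardown callbacks. A minimal sketch of how such a stack might be run is shown here; the `Suite` type mirrors only that one field, and `AddCleanup` and `TearDown` are hypothetical method names rather than part of the actual framework.

```go
package main

import "fmt"

// Suite mirrors only the cleanup field of IntegrationTestSuite.
type Suite struct {
	cleanup []func()
}

// AddCleanup registers a teardown callback (hypothetical helper).
func (s *Suite) AddCleanup(fn func()) {
	s.cleanup = append(s.cleanup, fn)
}

// TearDown runs callbacks in reverse registration order, so components
// started last (for example brokers) stop before the ones they depend on.
func (s *Suite) TearDown() {
	for i := len(s.cleanup) - 1; i >= 0; i-- {
		s.cleanup[i]()
	}
	s.cleanup = nil
}

func main() {
	s := &Suite{}
	s.AddCleanup(func() { fmt.Println("stop master") })
	s.AddCleanup(func() { fmt.Println("stop filer") })
	s.AddCleanup(func() { fmt.Println("stop broker") })
	s.TearDown()
}
```

Running callbacks last-in, first-out is the same order Go's `defer` and `testing.T.Cleanup` use, which keeps dependency teardown predictable.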
### Monitoring and Metrics
#### Health Checks
- Broker connectivity status
- Master cluster health
- Storage system availability
- Network connectivity between components
#### Performance Metrics
- Message throughput (msgs/sec)
- End-to-end latency
- Resource utilization (CPU, Memory, Disk)
- Network bandwidth usage
## Test Execution Strategy
### Parallel Test Execution
- Categorize tests by resource requirements
- Run independent tests in parallel
- Serialize tests that modify cluster state
### Continuous Integration
- Automated test runs on PR submissions
- Performance regression detection
- Multi-platform testing (Linux, macOS, Windows)
### Test Environment Management
- Docker-based isolated environments
- Automatic cleanup after test completion
- Resource monitoring and alerts
## Success Criteria
### Functional Requirements
- ✅ All messages published are received by subscribers
- ✅ Message ordering preserved within partitions
- ✅ Schema validation works correctly
- ✅ Auto-scaling triggers at expected thresholds
- ✅ Failover completes within 30 seconds
- ✅ No data loss during normal operations
### Performance Requirements
- ✅ Throughput: 50K+ messages/second/broker
- ✅ Latency: P95 < 100ms end-to-end
- ✅ Memory usage: < 1GB per broker under normal load
- ✅ Storage efficiency: < 20% overhead vs raw message size
### Reliability Requirements
- 99.9% uptime during normal operations
- Automatic recovery from single component failures
- Data consistency maintained across all scenarios
- Graceful degradation under resource constraints
## Implementation Timeline
### Phase 1: Core Functionality (Weeks 1-2)
- Basic pub/sub tests
- Schema validation tests
- Simple failover scenarios
### Phase 2: Advanced Features (Weeks 3-4)
- Auto-scaling tests
- Complex failover scenarios
- Agent functionality tests
### Phase 3: Performance & Load (Weeks 5-6)
- Throughput and latency tests
- Spike traffic handling
- Resource utilization monitoring
### Phase 4: End-to-End (Weeks 7-8)
- Complete workflow tests
- Multi-component integration
- Performance regression testing
## Maintenance and Updates
### Regular Updates
- Add tests for new features
- Update performance baselines
- Expand coverage of error scenarios
### Test Data Refresh
- Generate new test datasets quarterly
- Update schema examples
- Refresh performance benchmarks

This test design is intended to verify SeaweedMQ's reliability, performance, and functionality across its critical use cases and failure scenarios.