# SeaweedMQ Integration Test Design

## Overview

This document outlines the comprehensive integration test strategy for SeaweedMQ, covering all critical functionality from basic pub/sub operations to advanced features like auto-scaling, failover, and performance testing.

## Architecture Under Test

SeaweedMQ consists of:

- **Masters**: Cluster coordination and metadata management
- **Volume Servers**: Storage layer for persistent messages
- **Filers**: File system interface for metadata storage
- **Brokers**: Message processing and routing (stateless)
- **Agents**: Client interface for pub/sub operations
- **Schema System**: Protobuf-based message schema management

## Test Categories

### 1. Basic Functionality Tests

#### 1.1 Basic Pub/Sub Operations

- **Test**: `TestBasicPublishSubscribe` (see the sketch after this list)
  - Publish messages to a topic
  - Subscribe and receive messages
  - Verify message content and ordering
  - Test with different data types (string, int, bytes, records)

- **Test**: `TestMultipleConsumers`
  - Multiple subscribers on the same topic
  - Verify message distribution
  - Test consumer group functionality

- **Test**: `TestMessageOrdering`
  - Publish messages in sequence
  - Verify FIFO ordering within partitions
  - Test with different partition keys
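
As a rough illustration, `TestBasicPublishSubscribe` could look like the sketch below. `NewTestClient`, `Publish`, and `Consume` are hypothetical helpers on the test framework described later in this document, not the actual client API:

```go
package integration

import (
	"testing"
	"time"
)

// Hypothetical helpers: NewTestClient wires a TestClient to the running
// cluster; Publish sends one key/value; Consume reads n messages or times out.
func TestBasicPublishSubscribe(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	topic := "test/basic-pubsub"
	want := []string{"m1", "m2", "m3"}

	// Publish a small ordered batch with a fixed partition key.
	for _, m := range want {
		if err := client.Publish(topic, []byte("key-1"), []byte(m)); err != nil {
			t.Fatalf("publish %q: %v", m, err)
		}
	}

	// Read the batch back and check content and ordering.
	got, err := client.Consume(topic, len(want), 10*time.Second)
	if err != nil {
		t.Fatalf("consume: %v", err)
	}
	if len(got) != len(want) {
		t.Fatalf("received %d messages, want %d", len(got), len(want))
	}
	for i := range want {
		if string(got[i]) != want[i] {
			t.Errorf("message %d: got %q, want %q", i, got[i], want[i])
		}
	}
}
```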

#### 1.2 Schema Management

- **Test**: `TestSchemaValidation` (see the sketch after this list)
  - Publish with valid schemas
  - Reject invalid schema messages
  - Test schema evolution scenarios

- **Test**: `TestRecordTypes`
  - Nested record structures
  - List types and complex schemas
  - Schema-to-Parquet conversion
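
A hedged sketch of `TestSchemaValidation`; `PublishRecord` is a hypothetical helper, and the `map[string]any` literals stand in for the protobuf `RecordValue` the schema system actually uses:

```go
package integration

import "testing"

// Sketch only: PublishRecord is a hypothetical helper that encodes a record
// against the topic's registered schema and publishes it.
func TestSchemaValidation(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	topic := "test/schema-validation"

	// A record that matches the registered schema should be accepted.
	valid := map[string]any{"id": int64(1), "name": "alice"}
	if err := client.PublishRecord(topic, valid); err != nil {
		t.Fatalf("valid record rejected: %v", err)
	}

	// A record with a wrong field type should be rejected by schema validation.
	invalid := map[string]any{"id": "not-an-int", "name": "bob"}
	if err := client.PublishRecord(topic, invalid); err == nil {
		t.Fatal("invalid record accepted; expected a schema validation error")
	}
}
```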

### 2. Partitioning and Scaling Tests

#### 2.1 Partition Management

- **Test**: `TestPartitionDistribution` (see the sketch after this list)
  - Messages distributed across partitions based on keys
  - Verify partition assignment logic
  - Test partition rebalancing

- **Test**: `TestAutoSplitMerge`
  - Simulate high load to trigger auto-split
  - Simulate low load to trigger auto-merge
  - Verify data consistency during splits/merges
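
A minimal sketch of `TestPartitionDistribution`, assuming a hypothetical `PartitionFor` helper that exposes the broker's key-to-partition assignment to the test:

```go
package integration

import (
	"fmt"
	"testing"
)

// Sketch only: PartitionFor is a hypothetical helper; the real assignment
// logic lives in the broker and may be observed differently.
func TestPartitionDistribution(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	topic := "test/partitioning"
	counts := map[int]int{}

	for i := 0; i < 1000; i++ {
		key := []byte(fmt.Sprintf("key-%d", i))
		if err := client.Publish(topic, key, []byte("payload")); err != nil {
			t.Fatalf("publish: %v", err)
		}
		counts[client.PartitionFor(topic, key)]++
	}

	// With 1000 distinct keys, data should land on more than one partition.
	if len(counts) < 2 {
		t.Fatalf("expected multiple partitions to receive data, got %d", len(counts))
	}
}
```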

#### 2.2 Broker Scaling

- **Test**: `TestBrokerAddRemove`
  - Add brokers during operation
  - Remove brokers gracefully
  - Verify partition reassignment

- **Test**: `TestLoadBalancing`
  - Verify even load distribution across brokers
  - Test with varying message sizes and rates
  - Monitor broker resource utilization

### 3. Failover and Reliability Tests

#### 3.1 Broker Failover

- **Test**: `TestBrokerFailover` (see the sketch after this list)
  - Kill leader broker during publishing
  - Verify seamless failover to follower
  - Test data consistency after failover

- **Test**: `TestBrokerRecovery`
  - Broker restart scenarios
  - State recovery from storage
  - Partition reassignment after recovery
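
A sketch of `TestBrokerFailover`, assuming hypothetical cluster-control helpers (`NewIntegrationTestSuite`, `LeaderBroker`, `StopBroker`, `Client`, `Cleanup`) on the suite; the message count and timeout are illustrative:

```go
package integration

import (
	"fmt"
	"testing"
	"time"
)

// Sketch only: the helpers below are assumptions, not existing APIs.
func TestBrokerFailover(t *testing.T) {
	suite := NewIntegrationTestSuite(t)
	defer suite.Cleanup()

	topic := "test/failover"
	const total = 10000

	// Kill the current leader partway through the publish loop.
	go func() {
		time.Sleep(2 * time.Second)
		suite.StopBroker(suite.LeaderBroker(topic))
	}()

	for i := 0; i < total; i++ {
		msg := []byte(fmt.Sprintf("m-%d", i))
		if err := suite.Client().Publish(topic, []byte("k"), msg); err != nil {
			t.Fatalf("publish %d during failover: %v", i, err)
		}
	}

	// Every message published before, during, and after failover must be readable.
	got, err := suite.Client().Consume(topic, total, 60*time.Second)
	if err != nil {
		t.Fatalf("consume after failover: %v", err)
	}
	if len(got) != total {
		t.Fatalf("expected %d messages after failover, got %d", total, len(got))
	}
}
```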

#### 3.2 Data Durability

- **Test**: `TestMessagePersistence` (see the sketch after this list)
  - Publish messages and restart cluster
  - Verify all messages are recovered
  - Test with different replication settings

- **Test**: `TestFollowerReplication`
  - Leader-follower message replication
  - Verify consistency between replicas
  - Test follower promotion scenarios
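
A sketch of `TestMessagePersistence`, assuming a hypothetical `RestartCluster` helper that stops and restarts brokers, filers, and volume servers while keeping stored data:

```go
package integration

import (
	"fmt"
	"testing"
	"time"
)

// Sketch only: RestartCluster, Client, and Cleanup are assumed helpers.
func TestMessagePersistence(t *testing.T) {
	suite := NewIntegrationTestSuite(t)
	defer suite.Cleanup()

	topic := "test/persistence"
	const total = 1000

	for i := 0; i < total; i++ {
		if err := suite.Client().Publish(topic, []byte("k"), []byte(fmt.Sprintf("m-%d", i))); err != nil {
			t.Fatalf("publish %d: %v", i, err)
		}
	}

	// Restart the whole cluster, then confirm every message survives.
	suite.RestartCluster()

	got, err := suite.Client().Consume(topic, total, 60*time.Second)
	if err != nil {
		t.Fatalf("consume after restart: %v", err)
	}
	if len(got) != total {
		t.Fatalf("recovered %d messages, want %d", len(got), total)
	}
}
```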

### 4. Agent Functionality Tests

#### 4.1 Session Management

- **Test**: `TestPublishSessions` (see the sketch after this list)
  - Create/close publish sessions
  - Concurrent session management
  - Session cleanup after failures

- **Test**: `TestSubscribeSessions`
  - Subscribe session lifecycle
  - Consumer group management
  - Offset tracking and acknowledgments
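
A sketch of `TestPublishSessions`; `OpenPublishSession`, `Send`, and `Close` are hypothetical helpers wrapping whatever session API the agent exposes:

```go
package integration

import (
	"fmt"
	"sync"
	"testing"
)

// Sketch only: open several publish sessions concurrently and make sure each
// can send and close cleanly.
func TestPublishSessions(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()

			sess, err := client.OpenPublishSession(fmt.Sprintf("test/session-%d", i))
			if err != nil {
				t.Errorf("open session %d: %v", i, err)
				return
			}
			defer sess.Close()

			if err := sess.Send([]byte("key"), []byte("value")); err != nil {
				t.Errorf("send on session %d: %v", i, err)
			}
		}(i)
	}
	wg.Wait()
}
```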

#### 4.2 Error Handling

- **Test**: `TestConnectionFailures` (see the sketch after this list)
  - Network partitions between agent and broker
  - Automatic reconnection logic
  - Message buffering during outages
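
A sketch of `TestConnectionFailures`, assuming hypothetical `PartitionNetwork` and `HealNetwork` helpers that drop and restore traffic between the agent and the brokers, and assuming the client buffers messages during the outage:

```go
package integration

import (
	"testing"
	"time"
)

// Sketch only: the network-fault helpers are assumptions about the test
// environment (e.g. container-level firewall rules), not existing APIs.
func TestConnectionFailures(t *testing.T) {
	suite := NewIntegrationTestSuite(t)
	defer suite.Cleanup()

	topic := "test/reconnect"

	// With buffering enabled the publish should not fail outright; the message
	// is expected to be delivered once the connection is re-established.
	suite.PartitionNetwork("agent", "brokers")
	if err := suite.Client().Publish(topic, []byte("k"), []byte("buffered")); err != nil {
		t.Fatalf("publish during outage: %v", err)
	}
	suite.HealNetwork("agent", "brokers")

	got, err := suite.Client().Consume(topic, 1, 30*time.Second)
	if err != nil || len(got) != 1 {
		t.Fatalf("buffered message not delivered after reconnect: n=%d err=%v", len(got), err)
	}
}
```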

### 5. Performance and Load Tests

#### 5.1 Throughput Tests

- **Test**: `TestHighThroughputPublish` (see the sketch after this list)
  - Publish 100K+ messages/second
  - Monitor system resources
  - Verify no message loss

- **Test**: `TestHighThroughputSubscribe`
  - Multiple consumers processing high volume
  - Monitor processing latency
  - Test backpressure handling
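
A sketch of `TestHighThroughputPublish` that measures sustained publish rate against the 50K msgs/sec/broker success criterion; `Publish` is the same hypothetical helper used in the earlier sketches, and the counts are illustrative:

```go
package integration

import (
	"fmt"
	"testing"
	"time"
)

// Sketch only: a single-goroutine publish loop; a real load test would likely
// fan out across multiple publishers.
func TestHighThroughputPublish(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	topic := "test/throughput"
	const total = 500000
	payload := make([]byte, 1024) // 1 KiB messages

	start := time.Now()
	for i := 0; i < total; i++ {
		if err := client.Publish(topic, []byte(fmt.Sprintf("k-%d", i)), payload); err != nil {
			t.Fatalf("publish %d: %v", i, err)
		}
	}
	elapsed := time.Since(start)

	rate := float64(total) / elapsed.Seconds()
	t.Logf("published %d messages in %v (%.0f msgs/sec)", total, elapsed, rate)
	if rate < 50000 {
		t.Errorf("throughput %.0f msgs/sec is below the 50K msgs/sec target", rate)
	}
}
```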

#### 5.2 Spike Traffic Tests

- **Test**: `TestTrafficSpikes`
  - Sudden increase in message volume
  - Auto-scaling behavior verification
  - Resource utilization patterns

- **Test**: `TestLargeMessages`
  - Messages with large payloads (MB size)
  - Memory usage monitoring
  - Storage efficiency testing

### 6. End-to-End Scenarios

#### 6.1 Complete Workflow Tests

- **Test**: `TestProducerConsumerWorkflow` (see the sketch after this list)
  - Multi-stage data processing pipeline
  - Producer → Topic → Multiple Consumers
  - Data transformation and aggregation

- **Test**: `TestMultiTopicOperations`
  - Multiple topics with different schemas
  - Cross-topic message routing
  - Topic management operations
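
A sketch of `TestProducerConsumerWorkflow` as a two-stage pipeline: messages published to an input topic are transformed and re-published to an output topic, then verified downstream. `Publish` and `Consume` are the same hypothetical helpers as in the earlier sketches:

```go
package integration

import (
	"strings"
	"testing"
	"time"
)

// Sketch only: a minimal produce -> transform -> verify pipeline.
func TestProducerConsumerWorkflow(t *testing.T) {
	client := NewTestClient(t)
	defer client.Close()

	in, out := "test/pipeline-in", "test/pipeline-out"
	inputs := []string{"a", "b", "c"}

	// Stage 1: produce raw messages.
	for _, m := range inputs {
		if err := client.Publish(in, []byte(m), []byte(m)); err != nil {
			t.Fatalf("publish input %q: %v", m, err)
		}
	}

	// Stage 2: consume, transform (uppercase), and forward to the output topic.
	raw, err := client.Consume(in, len(inputs), 10*time.Second)
	if err != nil {
		t.Fatalf("consume input: %v", err)
	}
	for _, m := range raw {
		if err := client.Publish(out, m, []byte(strings.ToUpper(string(m)))); err != nil {
			t.Fatalf("publish transformed %q: %v", m, err)
		}
	}

	// Stage 3: verify the transformed stream.
	final, err := client.Consume(out, len(inputs), 10*time.Second)
	if err != nil || len(final) != len(inputs) {
		t.Fatalf("consume output: n=%d err=%v", len(final), err)
	}
}
```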

## Test Infrastructure

### Environment Setup

#### Docker Compose Configuration

```yaml
# test-environment.yml
version: '3.9'
services:
  master-cluster:
    # 3 master nodes for HA
  volume-cluster:
    # 3 volume servers for data storage
  filer-cluster:
    # 2 filers for metadata
  broker-cluster:
    # 3 brokers for message processing
  test-runner:
    # Container to run integration tests
```

#### Test Data Management

- Pre-defined test schemas
- Sample message datasets
- Performance benchmarking data

### Test Framework Structure

```go
package integration

// Import paths assume the current SeaweedFS module layout.
import (
	"github.com/seaweedfs/seaweedfs/weed/mq/agent"
	"github.com/seaweedfs/seaweedfs/weed/mq/client/pub_client"
	"github.com/seaweedfs/seaweedfs/weed/mq/client/sub_client"
)

// Base test framework
type IntegrationTestSuite struct {
	masters    []string
	brokers    []string
	filers     []string
	testClient *TestClient
	cleanup    []func()
}

// Test utilities
type TestClient struct {
	publishers  map[string]*pub_client.TopicPublisher
	subscribers map[string]*sub_client.TopicSubscriber
	agents      []*agent.MessageQueueAgent
}
```
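
A sketch of how a test might drive the suite, assuming a hypothetical `NewIntegrationTestSuite` constructor that connects to the cluster and appends teardown functions to `cleanup`:

```go
package integration

import "testing"

// Hypothetical usage of the structs above; NewIntegrationTestSuite is an
// assumed constructor, not an existing function.
func TestWithSuite(t *testing.T) {
	suite := NewIntegrationTestSuite(t)
	defer func() {
		// Run registered cleanup hooks in reverse order, mirroring defer semantics.
		for i := len(suite.cleanup) - 1; i >= 0; i-- {
			suite.cleanup[i]()
		}
	}()

	// ... exercise suite.testClient against the running cluster ...
	if suite.testClient == nil {
		t.Fatal("test client not initialized")
	}
}
```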

### Monitoring and Metrics

#### Health Checks

- Broker connectivity status
- Master cluster health
- Storage system availability
- Network connectivity between components

#### Performance Metrics

- Message throughput (msgs/sec)
- End-to-end latency
- Resource utilization (CPU, Memory, Disk)
- Network bandwidth usage

## Test Execution Strategy

### Parallel Test Execution

- Categorize tests by resource requirements
- Run independent tests in parallel
- Serialize tests that modify cluster state (see the sketch after this list)
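
One way to express this split with the standard `testing` package: read-only tests opt in to `t.Parallel()`, while tests that mutate cluster state stay serial within the same package. The test names and bodies here are placeholders:

```go
package integration

import "testing"

// Read-only test: safe to run alongside other parallel tests.
func TestTopicMetadataLookup(t *testing.T) {
	t.Parallel()
	// ... query topic configuration and assert on the response ...
}

// State-changing test: no t.Parallel(), so it never overlaps with other
// tests that add/remove brokers or trigger splits and merges.
func TestBrokerScaleUp(t *testing.T) {
	// ... add a broker, wait for partition reassignment, verify balance ...
}
```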

### Continuous Integration

- Automated test runs on PR submissions
- Performance regression detection
- Multi-platform testing (Linux, macOS, Windows)

### Test Environment Management

- Docker-based isolated environments
- Automatic cleanup after test completion
- Resource monitoring and alerts

## Success Criteria

### Functional Requirements

- ✅ All messages published are received by subscribers
- ✅ Message ordering preserved within partitions
- ✅ Schema validation works correctly
- ✅ Auto-scaling triggers at expected thresholds
- ✅ Failover completes within 30 seconds
- ✅ No data loss during normal operations

### Performance Requirements

- ✅ Throughput: 50K+ messages/second per broker
- ✅ Latency: P95 < 100 ms end-to-end (see the helper sketch after this list)
- ✅ Memory usage: < 1 GB per broker under normal load
- ✅ Storage efficiency: < 20% overhead vs. raw message size
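
For the latency criterion, a small helper along these lines could compute P95 from per-message end-to-end latencies collected during a run (a generic sketch, not an existing utility):

```go
package integration

import (
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of the collected
// end-to-end latencies using the nearest-rank method.
func percentile(latencies []time.Duration, p float64) time.Duration {
	if len(latencies) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), latencies...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	rank := int(math.Ceil(float64(len(sorted))*p/100.0)) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

// Example check against the success criterion:
//	if p95 := percentile(latencies, 95); p95 > 100*time.Millisecond {
//		t.Errorf("P95 latency %v exceeds the 100 ms target", p95)
//	}
```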

### Reliability Requirements

- ✅ 99.9% uptime during normal operations
- ✅ Automatic recovery from single-component failures
- ✅ Data consistency maintained across all scenarios
- ✅ Graceful degradation under resource constraints

## Implementation Timeline

### Phase 1: Core Functionality (Weeks 1-2)

- Basic pub/sub tests
- Schema validation tests
- Simple failover scenarios

### Phase 2: Advanced Features (Weeks 3-4)

- Auto-scaling tests
- Complex failover scenarios
- Agent functionality tests

### Phase 3: Performance & Load (Weeks 5-6)

- Throughput and latency tests
- Spike traffic handling
- Resource utilization monitoring

### Phase 4: End-to-End (Weeks 7-8)

- Complete workflow tests
- Multi-component integration
- Performance regression testing

## Maintenance and Updates

### Regular Updates

- Add tests for new features
- Update performance baselines
- Enhance coverage of error scenarios

### Test Data Refresh

- Generate new test datasets quarterly
- Update schema examples
- Refresh performance benchmarks

This test design is intended to verify SeaweedMQ's reliability, performance, and functionality across its critical use cases and failure scenarios.