# SeaweedMQ Integration Test Design

## Overview

This document outlines the comprehensive integration test strategy for SeaweedMQ, covering all critical functionalities from basic pub/sub operations to advanced features like auto-scaling, failover, and performance testing.

## Architecture Under Test

SeaweedMQ consists of:

- **Masters**: Cluster coordination and metadata management
- **Volume Servers**: Storage layer for persistent messages
- **Filers**: File system interface for metadata storage
- **Brokers**: Message processing and routing (stateless)
- **Agents**: Client interface for pub/sub operations
- **Schema System**: Protobuf-based message schema management

## Test Categories

### 1. Basic Functionality Tests

#### 1.1 Basic Pub/Sub Operations

- **Test**: `TestBasicPublishSubscribe`
  - Publish messages to a topic
  - Subscribe and receive messages
  - Verify message content and ordering
  - Test with different data types (string, int, bytes, records)

- **Test**: `TestMultipleConsumers`
  - Multiple subscribers on same topic
  - Verify message distribution
  - Test consumer group functionality

- **Test**: `TestMessageOrdering`
  - Publish messages in sequence
  - Verify FIFO ordering within partitions
  - Test with different partition keys

#### 1.2 Schema Management

- **Test**: `TestSchemaValidation`
  - Publish with valid schemas
  - Reject invalid schema messages
  - Test schema evolution scenarios

- **Test**: `TestRecordTypes`
  - Nested record structures
  - List types and complex schemas
  - Schema-to-Parquet conversion

### 2. Partitioning and Scaling Tests

#### 2.1 Partition Management

- **Test**: `TestPartitionDistribution`
  - Messages distributed across partitions based on keys
  - Verify partition assignment logic
  - Test partition rebalancing

- **Test**: `TestAutoSplitMerge`
  - Simulate high load to trigger auto-split
  - Simulate low load to trigger auto-merge
  - Verify data consistency during splits/merges

#### 2.2 Broker Scaling

- **Test**: `TestBrokerAddRemove`
  - Add brokers during operation
  - Remove brokers gracefully
  - Verify partition reassignment

- **Test**: `TestLoadBalancing`
  - Verify even load distribution across brokers
  - Test with varying message sizes and rates
  - Monitor broker resource utilization

### 3. Failover and Reliability Tests

#### 3.1 Broker Failover

- **Test**: `TestBrokerFailover`
  - Kill leader broker during publishing
  - Verify seamless failover to follower
  - Test data consistency after failover

- **Test**: `TestBrokerRecovery`
  - Broker restart scenarios
  - State recovery from storage
  - Partition reassignment after recovery

#### 3.2 Data Durability

- **Test**: `TestMessagePersistence`
  - Publish messages and restart cluster
  - Verify all messages are recovered
  - Test with different replication settings

- **Test**: `TestFollowerReplication`
  - Leader-follower message replication
  - Verify consistency between replicas
  - Test follower promotion scenarios

### 4. Agent Functionality Tests

#### 4.1 Session Management

- **Test**: `TestPublishSessions`
  - Create/close publish sessions
  - Concurrent session management
  - Session cleanup after failures

- **Test**: `TestSubscribeSessions`
  - Subscribe session lifecycle
  - Consumer group management
  - Offset tracking and acknowledgments

#### 4.2 Error Handling

- **Test**: `TestConnectionFailures`
  - Network partitions between agent and broker
  - Automatic reconnection logic
  - Message buffering during outages

### 5. Performance and Load Tests

#### 5.1 Throughput Tests

- **Test**: `TestHighThroughputPublish`
  - Publish 100K+ messages/second
  - Monitor system resources
  - Verify no message loss

- **Test**: `TestHighThroughputSubscribe`
  - Multiple consumers processing high volume
  - Monitor processing latency
  - Test backpressure handling

#### 5.2 Spike Traffic Tests

- **Test**: `TestTrafficSpikes`
  - Sudden increase in message volume
  - Auto-scaling behavior verification
  - Resource utilization patterns

- **Test**: `TestLargeMessages`
  - Messages with large payloads (MB size)
  - Memory usage monitoring
  - Storage efficiency testing

### 6. End-to-End Scenarios

#### 6.1 Complete Workflow Tests

- **Test**: `TestProducerConsumerWorkflow`
  - Multi-stage data processing pipeline
  - Producer → Topic → Multiple Consumers
  - Data transformation and aggregation

- **Test**: `TestMultiTopicOperations`
  - Multiple topics with different schemas
  - Cross-topic message routing
  - Topic management operations

## Test Infrastructure

### Environment Setup

#### Docker Compose Configuration

```yaml
# test-environment.yml
version: '3.9'
services:
  master-cluster:
    # 3 master nodes for HA
  volume-cluster:
    # 3 volume servers for data storage
  filer-cluster:
    # 2 filers for metadata
  broker-cluster:
    # 3 brokers for message processing
  test-runner:
    # Container to run integration tests
```

#### Test Data Management

- Pre-defined test schemas
- Sample message datasets
- Performance benchmarking data

### Test Framework Structure

```go
// Base test framework
type IntegrationTestSuite struct {
    masters    []string
    brokers    []string
    filers     []string
    testClient *TestClient
    cleanup    []func()
}

// Test utilities
type TestClient struct {
    publishers  map[string]*pub_client.TopicPublisher
    subscribers map[string]*sub_client.TopicSubscriber
    agents      []*agent.MessageQueueAgent
}
```

### Monitoring and Metrics

#### Health Checks

- Broker connectivity status
- Master cluster health
- Storage system availability
- Network connectivity between components
#### Performance Metrics

- Message throughput (msgs/sec)
- End-to-end latency
- Resource utilization (CPU, Memory, Disk)
- Network bandwidth usage

## Test Execution Strategy

### Parallel Test Execution

- Categorize tests by resource requirements
- Run independent tests in parallel
- Serialize tests that modify cluster state

### Continuous Integration

- Automated test runs on PR submissions
- Performance regression detection
- Multi-platform testing (Linux, macOS, Windows)

### Test Environment Management

- Docker-based isolated environments
- Automatic cleanup after test completion
- Resource monitoring and alerts

## Success Criteria

### Functional Requirements

- ✅ All messages published are received by subscribers
- ✅ Message ordering preserved within partitions
- ✅ Schema validation works correctly
- ✅ Auto-scaling triggers at expected thresholds
- ✅ Failover completes within 30 seconds
- ✅ No data loss during normal operations

### Performance Requirements

- ✅ Throughput: 50K+ messages/second/broker
- ✅ Latency: P95 < 100ms end-to-end
- ✅ Memory usage: < 1GB per broker under normal load
- ✅ Storage efficiency: < 20% overhead vs raw message size

### Reliability Requirements

- ✅ 99.9% uptime during normal operations
- ✅ Automatic recovery from single component failures
- ✅ Data consistency maintained across all scenarios
- ✅ Graceful degradation under resource constraints

## Implementation Timeline

### Phase 1: Core Functionality (Week 1-2)

- Basic pub/sub tests
- Schema validation tests
- Simple failover scenarios

### Phase 2: Advanced Features (Week 3-4)

- Auto-scaling tests
- Complex failover scenarios
- Agent functionality tests

### Phase 3: Performance & Load (Week 5-6)

- Throughput and latency tests
- Spike traffic handling
- Resource utilization monitoring

### Phase 4: End-to-End (Week 7-8)

- Complete workflow tests
- Multi-component integration
- Performance regression testing

## Maintenance and Updates

### Regular Updates

- Add tests for new features
- Update performance baselines
- Enhance error scenarios coverage

### Test Data Refresh

- Generate new test datasets quarterly
- Update schema examples
- Refresh performance benchmarks

This comprehensive test design ensures SeaweedMQ's reliability, performance, and functionality across all critical use cases and failure scenarios.