mirror of https://github.com/chrislusf/seaweedfs synced 2025-06-29 16:22:46 +02:00
seaweedfs/test/mq/integration_test_design.md
2025-06-23 10:55:02 -07:00

SeaweedMQ Integration Test Design

Overview

This document outlines the integration test strategy for SeaweedMQ. It covers the critical functionality, from basic pub/sub operations to advanced features such as auto-scaling and failover, and includes performance testing.

Architecture Under Test

SeaweedMQ consists of:

  • Masters: Cluster coordination and metadata management
  • Volume Servers: Storage layer for persistent messages
  • Filers: File system interface for metadata storage
  • Brokers: Message processing and routing (stateless)
  • Agents: Client interface for pub/sub operations
  • Schema System: Protobuf-based message schema management

Test Categories

1. Basic Functionality Tests

1.1 Basic Pub/Sub Operations

  • Test: TestBasicPublishSubscribe

    • Publish messages to a topic
    • Subscribe and receive messages
    • Verify message content and ordering
    • Test with different data types (string, int, bytes, records)
  • Test: TestMultipleConsumers

    • Multiple subscribers on same topic
    • Verify message distribution
    • Test consumer group functionality
  • Test: TestMessageOrdering

    • Publish messages in sequence
    • Verify FIFO ordering within partitions
    • Test with different partition keys
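
The FIFO check in TestMessageOrdering could be sketched as follows. This is a minimal illustration that assumes each test message carries its partition key and a per-key sequence number assigned by the test publisher. The `receivedMsg` type and `firstOrderViolation` helper are hypothetical names and are not part of the SeaweedMQ client API.

```go
package main

import "fmt"

// receivedMsg is a hypothetical record of one consumed test message.
type receivedMsg struct {
	Key string // partition key used when publishing
	Seq int    // per-key sequence number assigned by the test publisher
}

// firstOrderViolation returns the index of the first message whose sequence
// number is not strictly greater than the previous one seen for the same key,
// or -1 if per-key order is preserved.
func firstOrderViolation(msgs []receivedMsg) int {
	last := make(map[string]int)
	for i, m := range msgs {
		if prev, seen := last[m.Key]; seen && m.Seq <= prev {
			return i
		}
		last[m.Key] = m.Seq
	}
	return -1
}

func main() {
	ordered := []receivedMsg{{"a", 1}, {"b", 1}, {"a", 2}, {"b", 2}}
	reordered := []receivedMsg{{"a", 1}, {"a", 3}, {"a", 2}}
	fmt.Println(firstOrderViolation(ordered), firstOrderViolation(reordered))
}
```

Messages with different keys may interleave freely, so the check only compares a message against the previous one with the same key. A duplicate delivery is also reported as a violation, which may or may not be desired depending on the delivery guarantee under test.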

1.2 Schema Management

  • Test: TestSchemaValidation

    • Publish with valid schemas
    • Reject invalid schema messages
    • Test schema evolution scenarios
  • Test: TestRecordTypes

    • Nested record structures
    • List types and complex schemas
    • Schema-to-Parquet conversion

2. Partitioning and Scaling Tests

2.1 Partition Management

  • Test: TestPartitionDistribution

    • Messages distributed across partitions based on keys
    • Verify partition assignment logic
    • Test partition rebalancing
  • Test: TestAutoSplitMerge

    • Simulate high load to trigger auto-split
    • Simulate low load to trigger auto-merge
    • Verify data consistency during splits/merges
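
A key-based distribution check for TestPartitionDistribution might look like the sketch below. It assumes a plain hash-modulo mapping built on FNV-1a, which is an illustrative stand-in and not necessarily how SeaweedMQ assigns partitions. `partitionFor` and `countPerPartition` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor maps a key to one of n partitions using FNV-1a hashing.
// Assumption: this stands in for the broker's real partitioner.
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

// countPerPartition tallies how many keys land in each partition.
func countPerPartition(keys []string, n int) []int {
	counts := make([]int, n)
	for _, k := range keys {
		counts[partitionFor(k, n)]++
	}
	return counts
}

func main() {
	keys := make([]string, 0, 1000)
	for i := 0; i < 1000; i++ {
		keys = append(keys, fmt.Sprintf("user-%d", i))
	}
	fmt.Println(countPerPartition(keys, 4))
}
```

A real test would compare the observed per-partition counts from the cluster against the expected mapping, and assert that the same key always lands in the same partition until a split or merge occurs.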

2.2 Broker Scaling

  • Test: TestBrokerAddRemove

    • Add brokers during operation
    • Remove brokers gracefully
    • Verify partition reassignment
  • Test: TestLoadBalancing

    • Verify even load distribution across brokers
    • Test with varying message sizes and rates
    • Monitor broker resource utilization

3. Failover and Reliability Tests

3.1 Broker Failover

  • Test: TestBrokerFailover

    • Kill leader broker during publishing
    • Verify seamless failover to follower
    • Test data consistency after failover
  • Test: TestBrokerRecovery

    • Broker restart scenarios
    • State recovery from storage
    • Partition reassignment after recovery

3.2 Data Durability

  • Test: TestMessagePersistence

    • Publish messages and restart cluster
    • Verify all messages are recovered
    • Test with different replication settings
  • Test: TestFollowerReplication

    • Leader-follower message replication
    • Verify consistency between replicas
    • Test follower promotion scenarios

4. Agent Functionality Tests

4.1 Session Management

  • Test: TestPublishSessions

    • Create/close publish sessions
    • Concurrent session management
    • Session cleanup after failures
  • Test: TestSubscribeSessions

    • Subscribe session lifecycle
    • Consumer group management
    • Offset tracking and acknowledgments

4.2 Error Handling

  • Test: TestConnectionFailures

    • Network partitions between agent and broker
    • Automatic reconnection logic
    • Message buffering during outages
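
The reconnection behaviour exercised by TestConnectionFailures can be illustrated with a capped exponential backoff loop. This is a generic sketch, not the agent's actual reconnection logic. `retryWithBackoff` and `errUnreachable` are hypothetical names, and `attempts` is assumed to be at least 1.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnreachable is a hypothetical error used to simulate a network partition.
var errUnreachable = errors.New("broker unreachable")

// retryWithBackoff calls connect until it succeeds or attempts are exhausted,
// doubling the delay after each failure up to maxDelay. It returns the number
// of attempts made and the last error, if any.
func retryWithBackoff(connect func() error, attempts int, base, maxDelay time.Duration) (int, error) {
	delay := base
	var err error
	for i := 1; i <= attempts; i++ {
		if err = connect(); err == nil {
			return i, nil
		}
		if i < attempts {
			time.Sleep(delay)
			delay *= 2
			if delay > maxDelay {
				delay = maxDelay
			}
		}
	}
	return attempts, fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	flaky := func() error {
		calls++
		if calls < 3 {
			return errUnreachable // first two attempts fail
		}
		return nil
	}
	n, err := retryWithBackoff(flaky, 5, time.Millisecond, 10*time.Millisecond)
	fmt.Println(n, err)
}
```

In a test, the `connect` callback would wrap the real client connection call, and the assertion would be that publishing resumes within the expected number of attempts once the partition heals.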

5. Performance and Load Tests

5.1 Throughput Tests

  • Test: TestHighThroughputPublish

    • Publish 100K+ messages/second
    • Monitor system resources
    • Verify no message loss
  • Test: TestHighThroughputSubscribe

    • Multiple consumers processing high volume
    • Monitor processing latency
    • Test backpressure handling
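
The throughput and no-message-loss assertions above could be computed with helpers like these. The sketch assumes the test publisher numbers messages consecutively from 0, which is a convention of this example and not of SeaweedMQ. `throughput` and `missingIDs` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// throughput returns messages per second for n messages over elapsed.
func throughput(n int, elapsed time.Duration) float64 {
	if elapsed <= 0 {
		return 0
	}
	return float64(n) / elapsed.Seconds()
}

// missingIDs returns the IDs in 0..sent-1 that never appear in received.
// Duplicates and out-of-range IDs in received are ignored.
func missingIDs(sent int, received []int) []int {
	seen := make([]bool, sent)
	for _, id := range received {
		if id >= 0 && id < sent {
			seen[id] = true
		}
	}
	var missing []int
	for id, ok := range seen {
		if !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	fmt.Println(throughput(200000, 2*time.Second))
	fmt.Println(missingIDs(5, []int{0, 1, 3, 4, 4}))
}
```

Tracking IDs rather than a bare count distinguishes a lost message from a duplicate delivery, which a simple sent-versus-received count would hide.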

5.2 Spike Traffic Tests

  • Test: TestTrafficSpikes

    • Sudden increase in message volume
    • Auto-scaling behavior verification
    • Resource utilization patterns
  • Test: TestLargeMessages

    • Messages with large payloads (MB size)
    • Memory usage monitoring
    • Storage efficiency testing

6. End-to-End Scenarios

6.1 Complete Workflow Tests

  • Test: TestProducerConsumerWorkflow

    • Multi-stage data processing pipeline
    • Producer → Topic → Multiple Consumers
    • Data transformation and aggregation
  • Test: TestMultiTopicOperations

    • Multiple topics with different schemas
    • Cross-topic message routing
    • Topic management operations

Test Infrastructure

Environment Setup

Docker Compose Configuration

# test-environment.yml (outline only: service definitions are placeholders)
version: '3.9'
services:
  master-cluster:
    # 3 master nodes for HA
  volume-cluster:
    # 3 volume servers for data storage
  filer-cluster:
    # 2 filers for metadata
  broker-cluster:
    # 3 brokers for message processing
  test-runner:
    # Container to run integration tests

Test Data Management

  • Pre-defined test schemas
  • Sample message datasets
  • Performance benchmarking data

Test Framework Structure

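// Base test framework
// The package name is a placeholder; import paths assume the upstream module layout.
package integration

import (
    "github.com/seaweedfs/seaweedfs/weed/mq/agent"
    "github.com/seaweedfs/seaweedfs/weed/mq/client/pub_client"
    "github.com/seaweedfs/seaweedfs/weed/mq/client/sub_client"
)

// IntegrationTestSuite describes the cluster under test and owns its teardown.
type IntegrationTestSuite struct {
    masters    []string    // master addresses
    brokers    []string    // broker addresses
    filers     []string    // filer addresses
    testClient *TestClient // shared client helpers
    cleanup    []func()    // teardown steps registered during setup
}

// Test utilities
// TestClient tracks the publishers, subscribers, and agents created by a test.
type TestClient struct {
    publishers  map[string]*pub_client.TopicPublisher
    subscribers map[string]*sub_client.TopicSubscriber
    agents      []*agent.MessageQueueAgent
}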
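
One way the `cleanup []func()` field could be used is as a stack that is unwound in reverse registration order, similar to `defer`. The sketch below uses a standalone, hypothetical `cleanupStack` type so that it compiles without the SeaweedFS packages; it is not taken from the SeaweedMQ test code.

```go
package main

import "fmt"

// cleanupStack is a hypothetical stand-in for the suite's cleanup field.
type cleanupStack struct {
	funcs []func()
}

// add registers a teardown step.
func (c *cleanupStack) add(f func()) {
	c.funcs = append(c.funcs, f)
}

// run executes teardown steps in reverse registration order, so components
// started last (for example brokers) stop before the ones they depend on.
// The stack is emptied afterwards, which makes a second call a no-op.
func (c *cleanupStack) run() {
	for i := len(c.funcs) - 1; i >= 0; i-- {
		c.funcs[i]()
	}
	c.funcs = nil
}

func main() {
	var c cleanupStack
	for _, name := range []string{"master", "filer", "broker"} {
		name := name // capture a per-iteration copy for the closure
		c.add(func() { fmt.Println("stopping", name) })
	}
	c.run()
}
```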

Monitoring and Metrics

Health Checks

  • Broker connectivity status
  • Master cluster health
  • Storage system availability
  • Network connectivity between components

Performance Metrics

  • Message throughput (msgs/sec)
  • End-to-end latency
  • Resource utilization (CPU, Memory, Disk)
  • Network bandwidth usage

Test Execution Strategy

Parallel Test Execution

  • Categorize tests by resource requirements
  • Run independent tests in parallel
  • Serialize tests that modify cluster state
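
The split between parallel and serialized tests can be sketched with a read/write lock: independent tests take a shared lock and may overlap, while tests that modify cluster state take an exclusive lock. This is one possible scheduling approach under that assumption, not a description of an existing runner. `testCase` and `runAll` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// testCase is a hypothetical descriptor for one integration test.
type testCase struct {
	Name           string
	MutatesCluster bool // true if the test adds, removes, or kills components
	Run            func()
}

// runAll starts every case in its own goroutine. Read-only cases hold a
// shared lock and can run concurrently; cluster-mutating cases hold an
// exclusive lock and therefore run alone.
func runAll(cases []testCase) {
	var cluster sync.RWMutex
	var wg sync.WaitGroup
	for _, tc := range cases {
		wg.Add(1)
		go func(tc testCase) {
			defer wg.Done()
			if tc.MutatesCluster {
				cluster.Lock()
				defer cluster.Unlock()
			} else {
				cluster.RLock()
				defer cluster.RUnlock()
			}
			tc.Run()
		}(tc)
	}
	wg.Wait()
}

func main() {
	runAll([]testCase{
		{"TestBasicPublishSubscribe", false, func() { fmt.Println("pub/sub") }},
		{"TestMessageOrdering", false, func() { fmt.Println("ordering") }},
		{"TestBrokerFailover", true, func() { fmt.Println("failover") }},
	})
}
```

With Go's standard `testing` package, a similar effect is usually achieved by calling `t.Parallel()` only in the independent tests.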

Continuous Integration

  • Automated test runs on PR submissions
  • Performance regression detection
  • Multi-platform testing (Linux, macOS, Windows)

Test Environment Management

  • Docker-based isolated environments
  • Automatic cleanup after test completion
  • Resource monitoring and alerts

Success Criteria

Functional Requirements

  • All messages published are received by subscribers
  • Message ordering preserved within partitions
  • Schema validation works correctly
  • Auto-scaling triggers at expected thresholds
  • Failover completes within 30 seconds
  • No data loss during normal operations
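
A time-bounded criterion such as the 30-second failover limit is typically asserted by polling a condition until a deadline. The sketch below is a generic polling helper; `waitUntil` is a hypothetical name, and the condition shown in `main` only simulates a recovery.

```go
package main

import (
	"fmt"
	"time"
)

// waitUntil polls cond every interval until it returns true or timeout
// elapses. It returns how long it waited and whether cond became true.
func waitUntil(cond func() bool, timeout, interval time.Duration) (time.Duration, bool) {
	start := time.Now()
	deadline := start.Add(timeout)
	for {
		if cond() {
			return time.Since(start), true
		}
		if !time.Now().Before(deadline) {
			return time.Since(start), false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated failover: the "cluster" becomes healthy after 50ms.
	recoverAt := time.Now().Add(50 * time.Millisecond)
	healthy := func() bool { return time.Now().After(recoverAt) }

	took, ok := waitUntil(healthy, 30*time.Second, 10*time.Millisecond)
	fmt.Println(ok, took < 30*time.Second)
}
```

In a real failover test, the condition would be a successful publish or a health check against the surviving brokers, started immediately after the leader is killed.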

Performance Requirements

  • Throughput: 50K+ messages/second/broker
  • Latency: P95 < 100ms end-to-end
  • Memory usage: < 1GB per broker under normal load
  • Storage efficiency: < 20% overhead vs raw message size
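
The P95 latency target can be checked with a nearest-rank percentile over the recorded end-to-end latencies. This is a minimal sketch that takes latencies in milliseconds; `percentile` and `meetsP95Target` are hypothetical helpers, and other percentile definitions (for example, interpolated) would give slightly different values.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of samplesMs using
// the nearest-rank method. It returns 0 for an empty slice and does not
// modify the input.
func percentile(samplesMs []float64, p float64) float64 {
	if len(samplesMs) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samplesMs...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

// meetsP95Target reports whether the P95 latency is strictly below limitMs.
func meetsP95Target(samplesMs []float64, limitMs float64) bool {
	return percentile(samplesMs, 95) < limitMs
}

func main() {
	latencies := []float64{12, 15, 9, 40, 22, 18, 95, 30, 11, 14}
	fmt.Println(percentile(latencies, 95), meetsP95Target(latencies, 100))
}
```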

Reliability Requirements

  • 99.9% uptime during normal operations
  • Automatic recovery from single component failures
  • Data consistency maintained across all scenarios
  • Graceful degradation under resource constraints

Implementation Timeline

Phase 1: Core Functionality (Weeks 1-2)

  • Basic pub/sub tests
  • Schema validation tests
  • Simple failover scenarios

Phase 2: Advanced Features (Weeks 3-4)

  • Auto-scaling tests
  • Complex failover scenarios
  • Agent functionality tests

Phase 3: Performance & Load (Weeks 5-6)

  • Throughput and latency tests
  • Spike traffic handling
  • Resource utilization monitoring

Phase 4: End-to-End (Weeks 7-8)

  • Complete workflow tests
  • Multi-component integration
  • Performance regression testing

Maintenance and Updates

Regular Updates

  • Add tests for new features
  • Update performance baselines
  • Expand error-scenario coverage

Test Data Refresh

  • Generate new test datasets quarterly
  • Update schema examples
  • Refresh performance benchmarks

This test design is intended to verify SeaweedMQ's reliability, performance, and functionality across its critical use cases and failure scenarios.