mirror of
https://github.com/chrislusf/seaweedfs
synced 2025-09-10 13:22:47 +02:00
# Filer Benchmark Tool
A simple Go program to benchmark SeaweedFS filer performance and detect race conditions with concurrent file operations.

## Overview

This tool creates 300 (configurable) goroutines that concurrently:

1. Create empty files on the filer
2. Add multiple chunks to each file (with fake file IDs)
3. Verify that the file was created successfully

This simulates the race condition scenario from [Issue #7062](https://github.com/seaweedfs/seaweedfs/issues/7062), where concurrent operations can lead to metadata inconsistencies.
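The three steps above follow a standard Go worker-pool pattern. A minimal, runnable sketch of that pattern is shown below; the per-step filer RPCs are replaced by atomic counters so it runs without a SeaweedFS cluster, and `runBenchmark` is an illustrative name, not the tool's actual API:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runBenchmark launches the worker pool. The three steps inside the
// loop stand in for the real filer RPCs and only count invocations,
// so the pattern is runnable without a SeaweedFS cluster.
func runBenchmark(goroutines, loops, chunksPerFile int) (created, chunksAdded, verified int64) {
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < loops; i++ {
				atomic.AddInt64(&created, 1)                        // 1. create an empty file
				atomic.AddInt64(&chunksAdded, int64(chunksPerFile)) // 2. add fake chunks
				atomic.AddInt64(&verified, 1)                       // 3. verify the file exists
			}
		}()
	}
	wg.Wait()
	return
}

func main() {
	c, ch, v := runBenchmark(300, 100, 5)
	fmt.Printf("files=%d chunks=%d verified=%d\n", c, ch, v)
}
```

With the defaults (300 goroutines × 100 loops × 5 chunks), this attempts 30,000 file operations and 150,000 chunk additions, matching the totals in the example output below.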

## Usage

### Build and Run Directly

```bash
# Build the tool
go build -o bin/filer_benchmark ./cmd/filer_benchmark/

# Basic usage (single filer)
./bin/filer_benchmark -filers=localhost:8888

# Test with multiple filers
./bin/filer_benchmark -filers=localhost:8888,localhost:8889,localhost:8890

# High-concurrency race condition test
./bin/filer_benchmark -goroutines=500 -loops=200 -verbose
```

### Using Helper Scripts

```bash
# Use the wrapper script with predefined configurations
./scripts/run_filer_benchmark.sh

# Run the example test suite
./examples/run_filer_race_test.sh
```

## Configuration Options

| Flag | Default | Description |
|------|---------|-------------|
| `-filers` | `localhost:8888` | Comma-separated list of filer addresses |
| `-goroutines` | `300` | Number of concurrent goroutines |
| `-loops` | `100` | Number of operations per goroutine |
| `-chunkSize` | `1048576` | Chunk size in bytes (1 MB) |
| `-chunksPerFile` | `5` | Number of chunks per file |
| `-testDir` | `/benchmark` | Test directory on the filer |
| `-verbose` | `false` | Enable verbose error logging |

## Race Condition Detection

The tool detects race conditions by monitoring for these error patterns:

- `leveldb: closed` - metadata cache closed during an operation
- `transport is closing` - gRPC connection closed during an operation
- `connection refused` - network connectivity issues
- `not found after creation` - file disappeared after being created

## Example Output

```
============================================================
FILER BENCHMARK RESULTS
============================================================
Configuration:
Filers: localhost:8888,localhost:8889,localhost:8890
Goroutines: 300
Loops per goroutine: 100
Chunks per file: 5
Chunk size: 1048576 bytes

Results:
Total operations attempted: 30000
Files successfully created: 29850
Total chunks added: 149250
Errors: 150
Race condition errors: 23
Success rate: 99.50%

Performance:
Total duration: 45.2s
Operations/second: 663.72
Files/second: 660.18
Chunks/second: 3300.88

Race Condition Analysis:
Race condition rate: 0.0767%
Race conditions detected: 23
🟡 MODERATE race condition rate
Overall error rate: 0.50%
============================================================
```

## Test Scenarios

### 1. Basic Functionality Test

```bash
./bin/filer_benchmark -goroutines=20 -loops=10
```

A low-concurrency test to verify basic functionality.

### 2. Race Condition Reproduction

```bash
./bin/filer_benchmark -goroutines=500 -loops=100 -verbose
```

A high-concurrency test designed to trigger race conditions.

### 3. Multi-Filer Load Test

```bash
./bin/filer_benchmark -filers=filer1:8888,filer2:8888,filer3:8888 -goroutines=300
```

Distributes load across multiple filers.

### 4. Small Files Benchmark

```bash
./bin/filer_benchmark -chunkSize=4096 -chunksPerFile=1 -goroutines=1000
```

Tests with many small files to stress metadata operations.
## How It Simulates Race Conditions
|
|
|
|
1. **Concurrent Operations**: Multiple goroutines perform file operations simultaneously
|
|
2. **Random Timing**: Small random delays create timing variations
|
|
3. **Fake Chunks**: Uses file IDs without actual volume server data to focus on metadata operations
|
|
4. **Verification Step**: Attempts to read files immediately after creation to catch race conditions
|
|
5. **Multiple Filers**: Distributes load randomly across multiple filer instances

## Prerequisites

- SeaweedFS master server running
- SeaweedFS filer server(s) running
- Go 1.19+ for building
- Network connectivity to the filer endpoints

## Integration with Issue #7062

This tool reproduces the core problem from the original issue:

- **Concurrent file operations** (simulated by goroutines)
- **Metadata race conditions** (detected through error patterns)
- **Transport disconnections** (monitored in error analysis)
- **File inconsistencies** (caught by verification steps)

The key difference is that this tool focuses on the filer metadata layer rather than the full CSI driver + mount stack, making it easier to isolate and debug the race condition.

## Debugging Findings

### Multi-Filer vs. Single-Filer Connection Issue

**Problem**: When using multiple filers with independent stores (a non-shared backend), the benchmark may fail with errors like:

- `update entry with chunks failed: rpc error: code = Unknown desc = not found /benchmark/file_X: filer: no entry is found in filer store`
- `CreateEntry /benchmark/file_X: /benchmark should be a directory`

**Root Cause**: The issue is NOT missing metadata events, but rather the benchmark's round-robin load balancing across filers:

1. **File Creation**: The benchmark creates `file_X` on `filer1`
2. **Chunk Updates**: The benchmark then tries to update `file_X` on `filer2` or `filer3`
3. **Error**: `filer2`/`filer3` do not have `file_X` in their local stores yet (metadata sync delay)

**Verification**: Running with a single filer connection (`-filers localhost:18888`) while 3 filers are running shows **no missed events**, confirming that metadata synchronization works correctly.

**Solutions**:

- Ensure the `/benchmark` directory exists on ALL filers before starting
- Use file affinity (the same filer for create and update operations on a given file)
- Add retry logic for cross-filer operations
- Add small delays between operations to allow metadata to sync
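The file-affinity solution amounts to deterministically mapping each path to one filer, so creates and updates for the same file always hit the same instance. A minimal sketch, assuming FNV-1a as the hash (the hash choice and function name are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// filerFor maps a file path to a fixed filer so that create and
// update operations for the same path always target the same
// instance, avoiding the metadata-sync delay described above.
func filerFor(path string, filers []string) string {
	h := fnv.New32a()
	h.Write([]byte(path))
	return filers[h.Sum32()%uint32(len(filers))]
}

func main() {
	filers := []string{"localhost:8888", "localhost:8889", "localhost:8890"}
	// The same path always resolves to the same filer.
	fmt.Println(filerFor("/benchmark/file_1", filers) == filerFor("/benchmark/file_1", filers)) // true
}
```

Load still spreads across filers because different paths hash to different instances; only per-file operations are pinned.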