Commit history:

* feat: Phase 1 - Add SQL query engine foundation for MQ topics
  Implements core SQL infrastructure with metadata operations.
  - New components: SQL parser integration using github.com/xwb1989/sqlparser; query engine framework in weed/query/engine/; schema catalog mapping MQ topics to SQL tables; interactive SQL CLI command 'weed sql'
  - Supported operations: SHOW DATABASES (lists MQ namespaces); SHOW TABLES (lists MQ topics); SQL statement parsing and routing; error handling and result formatting
  - Key design decisions: MQ namespaces ↔ SQL databases; MQ topics ↔ SQL tables; Parquet message storage ready for querying; backward-compatible schema evolution support
  - Testing: unit tests for core engine functionality; command integration tests; parse error handling validation
  - Assumptions (documented in code): all MQ messages stored in Parquet format; schema evolution maintains backward compatibility; MySQL-compatible SQL syntax via sqlparser; single-threaded usage per SQL session
  - Next phase: DDL operations (CREATE/ALTER/DROP TABLE)

* feat: Phase 2 - Add DDL operations and real MQ broker integration
  Implements comprehensive DDL support for MQ topic management.
  - New components: real MQ broker connectivity via BrokerClient; CREATE TABLE → ConfigureTopic gRPC calls; DROP TABLE → DeleteTopic operations; DESCRIBE table → schema introspection; SQL type mapping (SQL ↔ MQ schema types)
  - Enhanced features: live topic discovery from the MQ broker; fallback to cached/sample data when the broker is unavailable; MySQL-compatible DESCRIBE output; schema validation and error handling; CREATE TABLE with column definitions
  - Key infrastructure: broker_client.go (gRPC communication with the MQ broker); sql_types.go (bidirectional SQL/MQ type conversion); describe.go (table schema introspection); enhanced engine.go (full DDL routing and execution)
  - Supported SQL operations: SHOW DATABASES and SHOW TABLES (live + fallback); CREATE TABLE table_name (col1 INT, col2 VARCHAR(50), ...); DROP TABLE table_name; DESCRIBE table_name / SHOW COLUMNS FROM table_name
  - Known limitations: SQL parser issues with reserved keywords (e.g., 'timestamp'); requires a running MQ broker for full functionality; ALTER TABLE not yet implemented; DeleteTopic needs a broker-side implementation
  - Architecture decisions: broker discovery via the filer lock mechanism (same as shell commands); graceful fallback when the broker is unavailable; ConfigureTopic for CREATE TABLE with 6 default partitions; schema versioning ready for ALTER TABLE support
  - Testing: unit tests updated with the filer address parameter; integration tests for DDL operations; error handling for connection failures
  - Next phase: SELECT query execution with Parquet scanning

* fix: Resolve High Priority TODOs - Real MQ Broker Integration
  - Real FilerClient integration (engine.go:131): implemented GetFilerClient() in BrokerClient; added filerClientImpl with full FilerClient interface compliance, including AdjustedUrl() and GetDataCenter(); a real filerClient connection replaces the nil fallback
  - Partition discovery via the MQ broker (hybrid_message_scanner.go:116): added ListTopicPartitions() using the topic configuration; implemented discoverTopicPartitions() in HybridMessageScanner; reads the actual partition count from BrokerPartitionAssignments; generates proper partition ranges based on topic.PartitionCount
  - Technical fixes: fixed compilation errors from undefined variables; proper error handling with the filerClientErr variable; corrected ConfigureTopicResponse field usage (BrokerPartitionAssignments vs PartitionCount); complete FilerClient interface implementation
  - Impact: the SQL engine now connects to real MQ broker infrastructure; actual topic partition discovery instead of hardcoded defaults; production-ready broker integration with graceful fallbacks; backward compatibility with sample data when the broker is unavailable
  - All tests passing. Next: schema-aware message parsing and time filter optimization

* feat: Time Filter Extraction - Complete Performance Optimization
  - Time filter extraction and push-down (engine.go:198-199): replaced the hardcoded StartTimeNs=0, StopTimeNs=0 with intelligent extraction; added extractTimeFilters() with recursive WHERE clause analysis; time column detection (_timestamp_ns, created_at, timestamp, etc.); time value parsing for nanoseconds, ISO dates, and datetime formats; operator reversal handling (column op value vs value op column)
  - WHERE clause processing: AND expressions combine time bounds (intersection); OR expressions skip extraction for safety; parentheses unwrap recursively; comparison operators >, >=, <, <=, =; time formats include nanoseconds, RFC3339, date-only, and datetime
  - Performance impact: push-down filtering to the hybrid scanner level; reduced data scanning at the source (live logs + Parquet files); potential time-based partition pruning; significant gains for time-series queries
  - Testing (21 tests passing): time filter extraction (6 scenarios); case-insensitive time column recognition; time value parsing (5 formats); full integration with SELECT queries; backward compatibility maintained
  - Example: before, SELECT * FROM events WHERE _timestamp_ns > 1672531200000000000 scanned all data and filtered in memory; after, only the relevant time range is scanned at the source (StartTimeNs=1672531200000000000, StopTimeNs=0)
  - Production-ready details: multiple time column formats; graceful fallbacks for invalid dates; OR clause safety (avoids incorrect optimization); comprehensive error handling

* feat: Extended WHERE Operators - Complete Advanced Filtering

* feat: Enhanced SQL CLI Experience
  - Multiple execution modes: interactive shell with enhanced prompts and context; single query execution (--query 'SQL' --output format); batch file processing (--file queries.sql --output csv); database context switching (--database dbname)
  - Multi-format output: table (ASCII, default for interactive); JSON for programmatic use; CSV for spreadsheets; auto-detection based on execution mode
  - Enhanced interactive shell: database context switching with USE database_name; output format switching with \format table|json|csv; basic command history; enhanced help with WHERE operator examples; contextual prompts (seaweedfs:dbname>)
  - Production features: comprehensive error handling (JSON + user-friendly); query execution timing and performance metrics; 30-second timeout protection; real MQ integration with hybrid data scanning
  - CLI interface: full flag support (--server, --interactive, --file, --output, --database, --query); auto-detection of execution mode and output format; structured help with practical examples; batch processing of multi-query files
  - Advanced WHERE integration: all extended operators (<=, >=, !=, LIKE, IN) supported across all execution modes and output formats
  - Usage examples: weed sql --interactive; weed sql --query 'SHOW DATABASES' --output json; weed sql --file queries.sql --output csv; weed sql --database analytics --interactive

* Delete test_utils_test.go
* fmt
* integer conversion
* show databases works
* show tables works
* Update describe.go
* actual column types
* Update .gitignore
* scan topic messages
* remove emoji
* support aggregation functions
* column name case insensitive, better auto column names
* fmt
* fix reading system fields
* use parquet statistics for optimization
* remove emoji
* parquet file generate stats
* scan all files
* parquet file generation remember the sources also
* fmt
* sql
* truncate topic
* combine parquet results with live logs
* explain
* explain the execution plan
* add tests
* improve tests
* skip
* use mock for testing
* add tests
* refactor
* fix after refactoring
* detailed logs during explain; fix bugs on reading live logs
* fix decoding data
* save source buffer index start for log files
* process buffer from brokers
* filter out already flushed messages
* dedup with buffer start index
* explain with broker buffer
* the parquet file should also remember the first buffer_start attribute from the sources
* parquet file can query messages in broker memory, if log files do not exist
* buffer start stored as 8 bytes
* add jdbc
* add postgres protocol
* Revert "add jdbc" (this reverts commit ...)
Directory contents: config, .dockerignore, client.go, docker-compose.yml, Dockerfile.client, Dockerfile.producer, Dockerfile.seaweedfs, Makefile, producer.go, README.md, run-tests.sh, SETUP_OVERVIEW.md, validate-setup.sh
SeaweedFS PostgreSQL Protocol Test Suite
This directory contains a comprehensive Docker Compose test setup for the SeaweedFS PostgreSQL wire protocol implementation.
Overview
The test suite includes:
- SeaweedFS Cluster: Full SeaweedFS server with MQ broker and agent
- PostgreSQL Server: SeaweedFS PostgreSQL wire protocol server
- MQ Data Producer: Creates realistic test data across multiple topics and namespaces
- PostgreSQL Test Client: Comprehensive Go client testing all functionality
- Interactive Tools: psql CLI access for manual testing
Quick Start
1. Run Complete Test Suite (Automated)
./run-tests.sh all
This will automatically:
- Start SeaweedFS and PostgreSQL servers
- Create test data in multiple MQ topics
- Run comprehensive PostgreSQL client tests
- Show results
2. Manual Step-by-Step Testing
# Start the services
./run-tests.sh start
# Create test data
./run-tests.sh produce
# Run automated tests
./run-tests.sh test
# Connect with psql for interactive testing
./run-tests.sh psql
3. Interactive PostgreSQL Testing
# Connect with psql
./run-tests.sh psql
# Inside psql session:
postgres=> SHOW DATABASES;
postgres=> \c analytics;
postgres=> SHOW TABLES;
postgres=> SELECT COUNT(*) FROM user_events;
postgres=> \q
Test Data Structure
The producer creates realistic test data across multiple namespaces:
Analytics Namespace
- user_events (1000 records): User interaction events
  - Fields: id, user_id, user_type, action, status, amount, timestamp, metadata
  - User types: premium, standard, trial, enterprise
  - Actions: login, logout, purchase, view, search, click, download
- system_logs (500 records): System operation logs
  - Fields: id, level, service, message, error_code, timestamp
  - Levels: debug, info, warning, error, critical
  - Services: auth-service, payment-service, user-service, etc.
- metrics (800 records): System metrics
  - Fields: id, name, value, tags, timestamp
  - Metrics: cpu_usage, memory_usage, disk_usage, request_latency, etc.
E-commerce Namespace
- product_views (1200 records): Product interaction data
  - Fields: id, product_id, user_id, category, price, view_count, timestamp
  - Categories: electronics, books, clothing, home, sports, automotive
- user_events (600 records): E-commerce-specific user events
Logs Namespace
- application_logs (2000 records): Application logs
- error_logs (300 records): Error-specific logs with 4xx/5xx error codes
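For reference, one analytics user_events record could map to a Go struct like the sketch below. The field names come from the lists above; the struct name and concrete types are assumptions, since producer.go defines the actual schema.
// UserEvent is a hypothetical mapping of one analytics.user_events
// record. Field names are from this README; types are assumptions.
type UserEvent struct {
    ID        int64             `json:"id"`
    UserID    int64             `json:"user_id"`
    UserType  string            `json:"user_type"` // premium, standard, trial, enterprise
    Action    string            `json:"action"`    // login, logout, purchase, view, ...
    Status    string            `json:"status"`
    Amount    float64           `json:"amount"`
    Timestamp int64             `json:"timestamp"`
    Metadata  map[string]string `json:"metadata"`
}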
Architecture
┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐
│ PostgreSQL │ │ PostgreSQL │ │ SeaweedFS │
│ Clients │◄──►│ Wire Protocol │◄──►│ SQL Engine │
│ (psql, Go) │ │ Server │ │ │
└─────────────────┘ └──────────────────┘ └─────────────────┘
│ │
▼ ▼
┌──────────────────┐ ┌─────────────────┐
│ Session │ │ MQ Broker │
│ Management │ │ & Topics │
└──────────────────┘ └─────────────────┘
Services
SeaweedFS Server
- Ports: 9333 (master), 8888 (filer), 8333 (S3), 8085 (volume), 9533 (metrics), 26777→16777 (MQ agent), 27777→17777 (MQ broker)
- Features: Full MQ broker, S3 API, filer, volume server
- Data: Persistent storage in Docker volume
- Health Check: Cluster status endpoint
PostgreSQL Server
- Port: 5432 (standard PostgreSQL port)
- Protocol: Full PostgreSQL 3.0 wire protocol
- Authentication: Trust mode (no password for testing)
- Features: Real-time MQ topic discovery, database context switching
MQ Producer
- Purpose: Creates realistic test data
- Topics: 7 topics across 3 namespaces
- Data Types: JSON messages with varied schemas
- Volume: ~4,400 total records with realistic distributions
Test Client
- Language: Go with the standard lib/pq PostgreSQL driver
- Tests: 8 comprehensive test categories
- Coverage: System info, discovery, queries, aggregations, context switching (a minimal connection sketch follows)
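The client connects over the standard wire protocol in trust mode on port 5432, so no password is needed. A minimal standalone sketch of what it does (the seaweedfs user and analytics database names are illustrative; trust mode accepts any user):
package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/lib/pq" // registers the "postgres" driver
)

func main() {
    // Trust mode: no password required. User and dbname are illustrative.
    dsn := "host=localhost port=5432 user=seaweedfs dbname=analytics sslmode=disable"
    db, err := sql.Open("postgres", dsn)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    var count int
    if err := db.QueryRow("SELECT COUNT(*) FROM user_events").Scan(&count); err != nil {
        log.Fatal(err)
    }
    fmt.Println("user_events rows:", count)
}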
Available Commands
./run-tests.sh start # Start services
./run-tests.sh produce # Create test data
./run-tests.sh test # Run client tests
./run-tests.sh psql # Interactive psql
./run-tests.sh logs # Show service logs
./run-tests.sh status # Service status
./run-tests.sh stop # Stop services
./run-tests.sh clean # Complete cleanup
./run-tests.sh all # Full automated test
Test Categories
1. System Information
- PostgreSQL version compatibility
- Current user and database
- Server settings and encoding
2. Database Discovery
- SHOW DATABASES - lists MQ namespaces
- Dynamic namespace discovery from filer
3. Table Discovery
- SHOW TABLES - lists topics in the current namespace
- Real-time topic discovery
4. Data Queries
- Basic SELECT * FROM table queries
- Sample data retrieval and display
- Column information
5. Aggregation Queries
- COUNT(*), SUM(), AVG(), MIN(), MAX()
- Aggregation operations
- Statistical analysis
6. Database Context Switching
- USE database commands
- Session isolation testing
- Cross-namespace queries
7. System Columns
- _timestamp_ns, _key, _source access
- MQ metadata exposure
8. Complex Queries
- WHERE clauses with comparisons, LIMIT
- Multi-condition filtering (see the sketch after this list)
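As a combined sketch of categories 7 and 8, the function below follows the style of the client's test functions: it selects MQ system columns alongside regular fields with a multi-condition WHERE and LIMIT. The column names come from this README; the Scan types are assumptions.
// queryWithSystemColumns reads MQ system columns together with regular
// fields. Assumes the db handle from the connection sketch above;
// needs "database/sql" and "fmt".
func queryWithSystemColumns(db *sql.DB) error {
    rows, err := db.Query(`
        SELECT _timestamp_ns, _key, user_id, action
        FROM user_events
        WHERE status = 'active' AND amount > 10
        LIMIT 5`)
    if err != nil {
        return err
    }
    defer rows.Close()

    for rows.Next() {
        var tsNs, userID int64
        var key, action string // _key scanned as string; its actual type is an assumption
        if err := rows.Scan(&tsNs, &key, &userID, &action); err != nil {
            return err
        }
        fmt.Println(tsNs, key, userID, action)
    }
    return rows.Err()
}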
Expected Results
After running the complete test suite, you should see:
=== Test Results ===
✅ Test PASSED: System Information
✅ Test PASSED: Database Discovery
✅ Test PASSED: Table Discovery
✅ Test PASSED: Data Queries
✅ Test PASSED: Aggregation Queries
✅ Test PASSED: Database Context Switching
✅ Test PASSED: System Columns
✅ Test PASSED: Complex Queries
Test Results: 8/8 tests passed
🎉 All tests passed!
Manual Testing Examples
Connect with psql
./run-tests.sh psql
Basic Exploration
-- Check system information
SELECT version();
SELECT current_user, current_database();
-- Discover data structure
SHOW DATABASES;
\c analytics;
SHOW TABLES;
DESCRIBE user_events;
Data Analysis
-- Basic queries
SELECT COUNT(*) FROM user_events;
SELECT * FROM user_events LIMIT 5;
-- Aggregations
SELECT
COUNT(*) as events,
AVG(amount) as avg_amount
FROM user_events
WHERE amount IS NOT NULL;
-- Conditional aggregation
SELECT
COUNT(*) as count
FROM user_events
WHERE status = 'active';
Cross-Namespace Analysis
-- Switch between namespaces
USE ecommerce;
SELECT COUNT(*) FROM product_views;
USE logs;
SELECT COUNT(*) FROM application_logs;
Troubleshooting
Services Not Starting
# Check service status
./run-tests.sh status
# View logs
./run-tests.sh logs seaweedfs
./run-tests.sh logs postgres-server
No Test Data
# Recreate test data
./run-tests.sh produce
# Check producer logs
./run-tests.sh logs mq-producer
Connection Issues
# Test PostgreSQL server health
docker-compose exec postgres-server nc -z localhost 5432
# Test SeaweedFS health
curl http://localhost:9333/cluster/status
Clean Restart
# Complete cleanup and restart
./run-tests.sh clean
./run-tests.sh all
Development
Modifying Test Data
Edit producer.go to change:
- Data schemas and volume
- Topic names and namespaces
- Record generation logic (see the hypothetical sketch below)
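producer.go itself is not reproduced here; as a purely illustrative sketch (makeUserEvent and its fields are hypothetical names, not the actual producer.go API), a record generator could look like:
package main

import (
    "encoding/json"
    "fmt"
    "math/rand"
    "time"
)

// makeUserEvent is a hypothetical generator; the real logic lives in
// producer.go. It shows where schema and volume changes would go.
func makeUserEvent(id int64) map[string]interface{} {
    actions := []string{"login", "logout", "purchase", "view", "search", "click", "download"}
    return map[string]interface{}{
        "id":        id,
        "user_id":   rand.Int63n(10000),
        "action":    actions[rand.Intn(len(actions))],
        "amount":    rand.Float64() * 500,
        "timestamp": time.Now().UnixNano(),
    }
}

func main() {
    b, _ := json.Marshal(makeUserEvent(1))
    fmt.Println(string(b))
}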
Adding Tests
Edit client.go to add new test functions:
func testNewFeature(db *sql.DB) error {
    // Your test implementation; return a non-nil error to mark failure.
    return nil
}

// Register it in the tests slice in main():
{"New Feature", testNewFeature},
Custom Queries
Use the interactive psql session:
./run-tests.sh psql
Production Considerations
This test setup demonstrates:
- Real MQ Integration: Actual topic discovery and data access
- Universal PostgreSQL Compatibility: Works with any PostgreSQL client
- Production-Ready Features: Authentication, session management, error handling
- Scalable Architecture: Direct SQL engine integration, no translation overhead
The test validates that SeaweedFS can serve as a drop-in PostgreSQL replacement for read-only analytics workloads on MQ data.