* feat: Phase 1 - Add SQL query engine foundation for MQ topics Implements core SQL infrastructure with metadata operations: New Components: - SQL parser integration using github.com/xwb1989/sqlparser - Query engine framework in weed/query/engine/ - Schema catalog mapping MQ topics to SQL tables - Interactive SQL CLI command 'weed sql' Supported Operations: - SHOW DATABASES (lists MQ namespaces) - SHOW TABLES (lists MQ topics) - SQL statement parsing and routing - Error handling and result formatting Key Design Decisions: - MQ namespaces ↔ SQL databases - MQ topics ↔ SQL tables - Parquet message storage ready for querying - Backward-compatible schema evolution support Testing: - Unit tests for core engine functionality - Command integration tests - Parse error handling validation Assumptions (documented in code): - All MQ messages stored in Parquet format - Schema evolution maintains backward compatibility - MySQL-compatible SQL syntax via sqlparser - Single-threaded usage per SQL session Next Phase: DDL operations (CREATE/ALTER/DROP TABLE) * feat: Phase 2 - Add DDL operations and real MQ broker integration Implements comprehensive DDL support for MQ topic management: New Components: - Real MQ broker connectivity via BrokerClient - CREATE TABLE → ConfigureTopic gRPC calls - DROP TABLE → DeleteTopic operations - DESCRIBE table → Schema introspection - SQL type mapping (SQL ↔ MQ schema types) Enhanced Features: - Live topic discovery from MQ broker - Fallback to cached/sample data when broker unavailable - MySQL-compatible DESCRIBE output - Schema validation and error handling - CREATE TABLE with column definitions Key Infrastructure: - broker_client.go: gRPC communication with MQ broker - sql_types.go: Bidirectional SQL/MQ type conversion - describe.go: Table schema introspection - Enhanced engine.go: Full DDL routing and execution Supported SQL Operations: ✅ SHOW DATABASES, SHOW TABLES (live + fallback) ✅ CREATE TABLE table_name (col1 INT, col2 VARCHAR(50), ...) ✅ DROP TABLE table_name ✅ DESCRIBE table_name / SHOW COLUMNS FROM table_name Known Limitations: - SQL parser issues with reserved keywords (e.g., 'timestamp') - Requires running MQ broker for full functionality - ALTER TABLE not yet implemented - DeleteTopic method needs broker-side implementation Architecture Decisions: - Broker discovery via filer lock mechanism (same as shell commands) - Graceful fallback when broker unavailable - ConfigureTopic for CREATE TABLE with 6 default partitions - Schema versioning ready for ALTER TABLE support Testing: - Unit tests updated with filer address parameter - Integration tests for DDL operations - Error handling for connection failures Next Phase: SELECT query execution with Parquet scanning * fix: Resolve High Priority TODOs - Real MQ Broker Integration ✅ COMPLETED HIGH PRIORITY TODOs: 🔧 **Real FilerClient Integration** (engine.go:131) - Implemented GetFilerClient() method in BrokerClient - Added filerClientImpl with full FilerClient interface compliance - Added AdjustedUrl() and GetDataCenter() methods - Real filerClient connection replaces nil fallback 🔧 **Partition Discovery via MQ Broker** (hybrid_message_scanner.go:116) - Added ListTopicPartitions() method using topic configuration - Implemented discoverTopicPartitions() in HybridMessageScanner - Reads actual partition count from BrokerPartitionAssignments - Generates proper partition ranges based on topic.PartitionCount 📋 **Technical Fixes:** - Fixed compilation errors with undefined variables - Proper error handling with filerClientErr variable - Corrected ConfigureTopicResponse field usage (BrokerPartitionAssignments vs PartitionCount) - Complete FilerClient interface implementation 🎯 **Impact:** - SQL engine now connects to real MQ broker infrastructure - Actual topic partition discovery instead of hardcoded defaults - Production-ready broker integration with graceful fallbacks - Maintains backward compatibility with sample data when broker unavailable ✅ All tests passing - High priority TODO resolution complete! Next: Schema-aware message parsing and time filter optimization. * feat: Time Filter Extraction - Complete Performance Optimization ✅ FOURTH HIGH PRIORITY TODO COMPLETED! ⏰ **Time Filter Extraction & Push-Down Optimization** (engine.go:198-199) - Replaced hardcoded StartTimeNs=0, StopTimeNs=0 with intelligent extraction - Added extractTimeFilters() with recursive WHERE clause analysis - Smart time column detection (\_timestamp_ns, created_at, timestamp, etc.) - Comprehensive time value parsing (nanoseconds, ISO dates, datetime formats) - Operator reversal handling (column op value vs value op column) 🧠 **Intelligent WHERE Clause Processing:** - AND expressions: Combine time bounds (intersection) ✅ - OR expressions: Skip extraction (safety) ✅ - Parentheses: Recursive unwrapping ✅ - Comparison operators: >, >=, <, <=, = ✅ - Multiple time formats: nanoseconds, RFC3339, date-only, datetime ✅ 🚀 **Performance Impact:** - Push-down filtering to hybrid scanner level - Reduced data scanning at source (live logs + Parquet files) - Time-based partition pruning potential - Significant performance gains for time-series queries 📊 **Comprehensive Testing (21 tests passing):** - ✅ Time filter extraction (6 test scenarios) - ✅ Time column recognition (case-insensitive) - ✅ Time value parsing (5 formats) - ✅ Full integration with SELECT queries - ✅ Backward compatibility maintained 💡 **Real-World Query Examples:** Before: Scans ALL data, filters in memory SELECT * FROM events WHERE \_timestamp_ns > 1672531200000000000; After: Scans ONLY relevant time range at source level → StartTimeNs=1672531200000000000, StopTimeNs=0 → Massive performance improvement for large datasets! 🎯 **Production Ready Features:** - Multiple time column formats supported - Graceful fallbacks for invalid dates - OR clause safety (avoids incorrect optimization) - Comprehensive error handling **ALL MEDIUM PRIORITY TODOs NOW READY FOR NEXT PHASEtest ./weed/query/engine/ -v* 🎉 * feat: Extended WHERE Operators - Complete Advanced Filtering ✅ **EXTENDED WHERE OPERATORS IMPLEMENTEDtest ./weed/query/engine/ -v | grep -E PASS * feat: Enhanced SQL CLI Experience ✅ COMPLETE ENHANCED CLI IMPLEMENTATION: 🚀 **Multiple Execution Modes:** - Interactive shell with enhanced prompts and context - Single query execution: --query 'SQL' --output format - Batch file processing: --file queries.sql --output csv - Database context switching: --database dbname 📊 **Multi-Format Output:** - Table format (ASCII) - default for interactive - JSON format - structured data for programmatic use - CSV format - spreadsheet-friendly output - Smart auto-detection based on execution mode ⚙️ **Enhanced Interactive Shell:** - Database context switching: USE database_name; - Output format switching: \format table|json|csv - Command history tracking (basic implementation) - Enhanced help with WHERE operator examples - Contextual prompts: seaweedfs:dbname> 🛠️ **Production Features:** - Comprehensive error handling (JSON + user-friendly) - Query execution timing and performance metrics - 30-second timeout protection with graceful handling - Real MQ integration with hybrid data scanning 📖 **Complete CLI Interface:** - Full flag support: --server, --interactive, --file, --output, --database, --query - Auto-detection of execution mode and output format - Structured help system with practical examples - Batch processing with multi-query file support 💡 **Advanced WHERE Integration:** All extended operators (<=, >=, !=, LIKE, IN) fully supported across all execution modes and output formats. 🎯 **Usage Examples:** - weed sql --interactive - weed sql --query 'SHOW DATABASES' --output json - weed sql --file queries.sql --output csv - weed sql --database analytics --interactive Enhanced CLI experience complete - production ready! 🚀 * Delete test_utils_test.go * fmt * integer conversion * show databases works * show tables works * Update describe.go * actual column types * Update .gitignore * scan topic messages * remove emoji * support aggregation functions * column name case insensitive, better auto column names * fmt * fix reading system fields * use parquet statistics for optimization * remove emoji * parquet file generate stats * scan all files * parquet file generation remember the sources also * fmt * sql * truncate topic * combine parquet results with live logs * explain * explain the execution plan * add tests * improve tests * skip * use mock for testing * add tests * refactor * fix after refactoring * detailed logs during explain. Fix bugs on reading live logs. * fix decoding data * save source buffer index start for log files * process buffer from brokers * filter out already flushed messages * dedup with buffer start index * explain with broker buffer * the parquet file should also remember the first buffer_start attribute from the sources * parquet file can query messages in broker memory, if log files do not exist * buffer start stored as 8 bytes * add jdbc * add postgres protocol * Revert "add jdbc" This reverts commit |
||
---|---|---|
.. | ||
DESIGN.md | ||
protocol.go | ||
README.md | ||
server.go |
PostgreSQL Wire Protocol Package
This package implements PostgreSQL wire protocol support for SeaweedFS, enabling universal compatibility with PostgreSQL clients, tools, and applications.
Package Structure
weed/server/postgres/
├── README.md # This documentation
├── server.go # Main PostgreSQL server implementation
├── protocol.go # Wire protocol message handlers with MQ integration
├── DESIGN.md # Architecture and design documentation
└── IMPLEMENTATION.md # Complete implementation guide
Core Components
server.go
- PostgreSQLServer: Main server structure with connection management
- PostgreSQLSession: Individual client session handling
- PostgreSQLServerConfig: Server configuration options
- Authentication System: Trust, password, and MD5 authentication
- TLS Support: Encrypted connections with custom certificates
- Connection Pooling: Resource management and cleanup
protocol.go
- Wire Protocol Implementation: Full PostgreSQL 3.0 protocol support
- Message Handlers: Startup, query, parse/bind/execute sequences
- Response Generation: Row descriptions, data rows, command completion
- Data Type Mapping: SeaweedFS to PostgreSQL type conversion
- SQL Parser: Uses PostgreSQL-native parser for full dialect compatibility
- Error Handling: PostgreSQL-compliant error responses
- MQ Integration: Direct integration with SeaweedFS SQL engine for real topic data
- System Query Support: Essential PostgreSQL system queries (version, current_user, etc.)
- Database Context: Session-based database switching with USE commands
Key Features
Real MQ Topic Integration
The PostgreSQL server now directly integrates with SeaweedFS Message Queue topics, providing:
- Live Topic Discovery: Automatically discovers MQ namespaces and topics from the filer
- Real Schema Information: Reads actual topic schemas from broker configuration
- Actual Data Access: Queries real MQ data stored in Parquet and log files
- Dynamic Updates: Reflects topic additions and schema changes automatically
- Consistent SQL Engine: Uses the same SQL engine as
weed sql
command
Database Context Management
- Session Isolation: Each PostgreSQL connection has its own database context
- USE Command Support: Switch between namespaces using standard
USE database
syntax - Auto-Discovery: Topics are discovered and registered on first access
- Schema Caching: Efficient caching of topic schemas and metadata
Usage
Import the Package
import "github.com/seaweedfs/seaweedfs/weed/server/postgres"
Create and Start Server
config := &postgres.PostgreSQLServerConfig{
Host: "localhost",
Port: 5432,
AuthMethod: postgres.AuthMD5,
Users: map[string]string{"admin": "secret"},
Database: "default",
MaxConns: 100,
IdleTimeout: time.Hour,
}
server, err := postgres.NewPostgreSQLServer(config, "localhost:9333")
if err != nil {
return err
}
err = server.Start()
if err != nil {
return err
}
// Server is now accepting PostgreSQL connections
Authentication Methods
The package supports three authentication methods:
Trust Authentication
AuthMethod: postgres.AuthTrust
- No password required
- Suitable for development/testing
- Not recommended for production
Password Authentication
AuthMethod: postgres.AuthPassword,
Users: map[string]string{"user": "password"}
- Clear text password transmission
- Simple but less secure
- Requires TLS for production use
MD5 Authentication
AuthMethod: postgres.AuthMD5,
Users: map[string]string{"user": "password"}
- Secure hashed authentication with salt
- Recommended for production
- Compatible with all PostgreSQL clients
TLS Configuration
Enable TLS encryption for secure connections:
cert, err := tls.LoadX509KeyPair("server.crt", "server.key")
if err != nil {
return err
}
config.TLSConfig = &tls.Config{
Certificates: []tls.Certificate{cert},
}
Client Compatibility
This implementation is compatible with:
Command Line Tools
psql
- PostgreSQL command line clientpgcli
- Enhanced command line with auto-completion- Database IDEs (DataGrip, DBeaver)
Programming Languages
- Python: psycopg2, asyncpg
- Java: PostgreSQL JDBC driver
- JavaScript: pg (node-postgres)
- Go: lib/pq, pgx
- .NET: Npgsql
- PHP: pdo_pgsql
- Ruby: pg gem
BI Tools
- Tableau (native PostgreSQL connector)
- Power BI (PostgreSQL data source)
- Grafana (PostgreSQL plugin)
- Apache Superset
Supported SQL Operations
Data Queries
SELECT * FROM topic_name;
SELECT id, message FROM topic_name WHERE condition;
SELECT COUNT(*) FROM topic_name;
SELECT MIN(id), MAX(id), AVG(amount) FROM topic_name;
Schema Information
SHOW DATABASES;
SHOW TABLES;
DESCRIBE topic_name;
DESC topic_name;
System Information
SELECT version();
SELECT current_database();
SELECT current_user;
System Columns
SELECT id, message, _timestamp_ns, _key, _source FROM topic_name;
Configuration Options
Server Configuration
- Host/Port: Server binding address and port
- Authentication: Method and user credentials
- Database: Default database/namespace name
- Connections: Maximum concurrent connections
- Timeouts: Idle connection timeout
- TLS: Certificate and encryption settings
Performance Tuning
- Connection Limits: Prevent resource exhaustion
- Idle Timeout: Automatic cleanup of unused connections
- Memory Management: Efficient session handling
- Query Streaming: Large result set support
Error Handling
The package provides PostgreSQL-compliant error responses:
- Connection Errors: Authentication failures, network issues
- SQL Errors: Invalid syntax, missing tables
- Resource Errors: Connection limits, timeouts
- Security Errors: Permission denied, invalid credentials
Development and Testing
Unit Tests
Run PostgreSQL package tests:
go test ./weed/server/postgres
Integration Testing
Use the provided Python test client:
python postgres-examples/test_client.py --host localhost --port 5432
Manual Testing
Connect with psql:
psql -h localhost -p 5432 -U seaweedfs -d default
Documentation
- DESIGN.md: Complete architecture and design overview
- IMPLEMENTATION.md: Detailed implementation guide
- postgres-examples/: Client examples and test scripts
- Command Documentation:
weed db -help
Security Considerations
Production Deployment
- Use MD5 or stronger authentication
- Enable TLS encryption
- Configure appropriate connection limits
- Monitor for suspicious activity
- Use strong passwords
- Implement proper firewall rules
Access Control
- Create dedicated read-only users
- Use principle of least privilege
- Monitor connection patterns
- Log authentication attempts
Architecture Notes
SQL Parser Dialect Considerations
✅ POSTGRESQL ONLY: SeaweedFS SQL engine exclusively supports PostgreSQL syntax:
- ✅ Core Engine:
engine.go
uses custom PostgreSQL parser for proper dialect support - PostgreSQL Server: Uses PostgreSQL parser for optimal wire protocol compatibility
- Parser: Custom lightweight PostgreSQL parser for full PostgreSQL compatibility
- Support Status: Only PostgreSQL syntax is supported - MySQL parsing has been removed
Key Benefits of PostgreSQL Parser:
- Native Dialect Support: Correctly handles PostgreSQL-specific syntax and semantics
- System Catalog Compatibility: Supports
pg_catalog
,information_schema
queries - Operator Compatibility: Handles
||
string concatenation, PostgreSQL-specific operators - Type System Alignment: Better PostgreSQL type inference and coercion
- Reduced Translation Overhead: Eliminates need for dialect translation layer
PostgreSQL Syntax Support:
- Identifier Quoting: Uses PostgreSQL double quotes (
"
) for identifiers - String Concatenation: Supports PostgreSQL
||
operator - System Functions: Full support for PostgreSQL system catalogs (
pg_catalog
) and functions - Standard Compliance: Follows PostgreSQL SQL standard and dialect
Implementation Features:
- Native PostgreSQL query processing in
protocol.go
- System query support (
SELECT version()
,BEGIN
, etc.) - Type mapping between PostgreSQL and SeaweedFS schema types
- Error code mapping to PostgreSQL standards
- Comprehensive PostgreSQL wire protocol support
This package provides enterprise-grade PostgreSQL compatibility, enabling seamless integration of SeaweedFS with the entire PostgreSQL ecosystem.