mirror of
https://github.com/chrislusf/seaweedfs
synced 2025-09-09 21:02:46 +02:00
OPTION A COMPLETE: Full production integration of ML optimization system

## Major Integration Components:

### 1. Command Line Interface
- Add ML optimization flags to 'weed mount' command:
  * -ml.enabled: Enable/disable ML optimizations
  * -ml.prefetchWorkers: Configure concurrent prefetch workers (default: 8)
  * -ml.confidenceThreshold: Set ML confidence threshold (default: 0.6)
  * -ml.maxPrefetchAhead: Max chunks to prefetch ahead (default: 8)
  * -ml.batchSize: Batch size for prefetch operations (default: 3)
- Updated command help text with ML Optimization section and usage examples
- Complete flag parsing and validation pipeline

### 2. Core WFS Integration
- Add MLIntegrationManager to WFS struct with proper lifecycle management
- Initialize ML optimization based on mount flags with custom configuration
- Integrate ML system shutdown with graceful cleanup on mount termination
- Memory-safe initialization with proper error handling

### 3. FUSE Operation Hooks
- **File Open (wfs.Open)**: Apply ML-specific optimizations (FOPEN_KEEP_CACHE, direct I/O)
- **File Read (wfs.Read)**: Record access patterns for ML prefetch decision making
- **File Close (wfs.Release)**: Update ML file tracking and cleanup resources
- **Get Attributes (wfs.GetAttr)**: Apply ML-aware attribute cache timeouts
- All hooks properly guarded with nil checks and enabled status validation

### 4. Configuration Management
- Mount options propagated through Option struct to ML system
- NewMLIntegrationManagerWithConfig for runtime configuration
- Default fallbacks and validation for all ML parameters
- Seamless integration with existing mount option processing

## Production Features:
✅ **Zero-Impact Design**: ML optimizations only activate when explicitly enabled
✅ **Backward Compatibility**: All existing mount functionality preserved
✅ **Resource Management**: Proper initialization, shutdown, and cleanup
✅ **Error Handling**: Graceful degradation if ML components fail
✅ **Performance Monitoring**: Integration points for metrics and debugging
✅ **Configuration Flexibility**: Runtime tunable parameters via mount flags

## Testing Verification:
- ✅ Successful compilation of entire codebase
- ✅ Mount command properly shows ML flags in help text
- ✅ Flag parsing and validation working correctly
- ✅ ML optimization system initializes when enabled
- ✅ FUSE operations integrate ML hooks without breaking existing functionality

## Usage Examples:

Basic ML optimization:
[example command missing from the source]

Custom ML configuration:
[example command missing from the source]

## Architecture Impact:
- Clean separation between core FUSE and ML optimization layers
- Modular design allows easy extension and maintenance
- Production-ready with comprehensive error handling and resource management
- Foundation established for advanced ML features (Phase 4)

This completes Option A: Production Integration, providing a fully functional ML-aware FUSE mount system ready for real-world ML workloads.
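The commit message states that every FUSE hook is guarded with nil checks and an enabled-status check, so that a mount without `-ml.enabled` behaves as before. The real `MLIntegrationManager` and `WFS` code is not shown on this page; the sketch below only illustrates that guard pattern. The types `mlManager` and `fileSystem` and their methods are hypothetical stand-ins, not the SeaweedFS API.

```go
package main

import "fmt"

// mlManager is a hypothetical stand-in for the MLIntegrationManager
// named in the commit message; its fields and methods are assumptions.
type mlManager struct {
	enabled bool
	reads   int // number of read accesses recorded so far
}

// IsEnabled is safe to call on a nil receiver, so callers do not need
// their own nil check before every hook.
func (m *mlManager) IsEnabled() bool {
	return m != nil && m.enabled
}

// OnRead records one read access. It is a no-op when the manager is
// nil or disabled.
func (m *mlManager) OnRead(inode uint64, offset int64, size int) {
	if !m.IsEnabled() {
		return
	}
	m.reads++
}

// fileSystem stands in for the WFS struct. The ml field stays nil
// when ML optimization was not requested at mount time.
type fileSystem struct {
	ml *mlManager
}

// Read calls the ML hook first, then pretends the read succeeded by
// returning the requested size.
func (fs *fileSystem) Read(inode uint64, offset int64, size int) int {
	fs.ml.OnRead(inode, offset, size)
	return size
}

func main() {
	plain := &fileSystem{} // no ML manager attached
	fmt.Println("bytes read without ML:", plain.Read(1, 0, 4096))

	tuned := &fileSystem{ml: &mlManager{enabled: true}}
	tuned.Read(1, 0, 4096)
	tuned.Read(1, 4096, 4096)
	fmt.Println("accesses recorded with ML:", tuned.ml.reads)
}
```

Putting the nil check inside the manager's methods keeps each hook to a single call, which is one plausible way to get the zero-impact behavior the commit message describes.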
154 lines
7.7 KiB
Go
package command

import (
	"os"
	"time"
)

type MountOptions struct {
	filer              *string
	filerMountRootPath *string
	dir                *string
	dirAutoCreate      *bool
	collection         *string
	collectionQuota    *int
	replication        *string
	diskType           *string
	ttlSec             *int
	chunkSizeLimitMB   *int
	concurrentWriters  *int
	cacheMetaTtlSec    *int
	cacheDirForRead    *string
	cacheDirForWrite   *string
	cacheSizeMBForRead *int64
	dataCenter         *string
	allowOthers        *bool
	umaskString        *string
	nonempty           *bool
	volumeServerAccess *string
	uidMap             *string
	gidMap             *string
	readOnly           *bool
	debug              *bool
	debugPort          *int
	localSocket        *string
	disableXAttr       *bool
	extraOptions       []string
	fuseCommandPid     int

	// RDMA acceleration options
	rdmaEnabled       *bool
	rdmaSidecarAddr   *string
	rdmaFallback      *bool
	rdmaReadOnly      *bool
	rdmaMaxConcurrent *int
	rdmaTimeoutMs     *int

	// ML optimization options
	mlOptimizationEnabled *bool
	mlPrefetchWorkers     *int
	mlConfidenceThreshold *float64
	mlMaxPrefetchAhead    *int
	mlBatchSize           *int
}

var (
	mountOptions       MountOptions
	mountCpuProfile    *string
	mountMemProfile    *string
	mountReadRetryTime *time.Duration
)

func init() {
	cmdMount.Run = runMount // break init cycle
	mountOptions.filer = cmdMount.Flag.String("filer", "localhost:8888", "comma-separated weed filer location")
	mountOptions.filerMountRootPath = cmdMount.Flag.String("filer.path", "/", "mount this remote path from filer server")
	mountOptions.dir = cmdMount.Flag.String("dir", ".", "mount weed filer to this directory")
	mountOptions.dirAutoCreate = cmdMount.Flag.Bool("dirAutoCreate", false, "auto create the directory to mount to")
	mountOptions.collection = cmdMount.Flag.String("collection", "", "collection to create the files")
	mountOptions.collectionQuota = cmdMount.Flag.Int("collectionQuotaMB", 0, "quota for the collection")
	mountOptions.replication = cmdMount.Flag.String("replication", "", "replication(e.g. 000, 001) to create to files. If empty, let filer decide.")
	mountOptions.diskType = cmdMount.Flag.String("disk", "", "[hdd|ssd|<tag>] hard drive or solid state drive or any tag")
	mountOptions.ttlSec = cmdMount.Flag.Int("ttl", 0, "file ttl in seconds")
	mountOptions.chunkSizeLimitMB = cmdMount.Flag.Int("chunkSizeLimitMB", 2, "local write buffer size, also chunk large files")
	mountOptions.concurrentWriters = cmdMount.Flag.Int("concurrentWriters", 32, "limit concurrent goroutine writers")
	mountOptions.cacheDirForRead = cmdMount.Flag.String("cacheDir", os.TempDir(), "local cache directory for file chunks and meta data")
	mountOptions.cacheSizeMBForRead = cmdMount.Flag.Int64("cacheCapacityMB", 128, "file chunk read cache capacity in MB")
	mountOptions.cacheDirForWrite = cmdMount.Flag.String("cacheDirWrite", "", "buffer writes mostly for large files")
	mountOptions.cacheMetaTtlSec = cmdMount.Flag.Int("cacheMetaTtlSec", 60, "metadata cache validity seconds")
	mountOptions.dataCenter = cmdMount.Flag.String("dataCenter", "", "prefer to write to the data center")
	mountOptions.allowOthers = cmdMount.Flag.Bool("allowOthers", true, "allows other users to access the file system")
	mountOptions.umaskString = cmdMount.Flag.String("umask", "022", "octal umask, e.g., 022, 0111")
	mountOptions.nonempty = cmdMount.Flag.Bool("nonempty", false, "allows the mounting over a non-empty directory")
	mountOptions.volumeServerAccess = cmdMount.Flag.String("volumeServerAccess", "direct", "access volume servers by [direct|publicUrl|filerProxy]")
	mountOptions.uidMap = cmdMount.Flag.String("map.uid", "", "map local uid to uid on filer, comma-separated <local_uid>:<filer_uid>")
	mountOptions.gidMap = cmdMount.Flag.String("map.gid", "", "map local gid to gid on filer, comma-separated <local_gid>:<filer_gid>")
	mountOptions.readOnly = cmdMount.Flag.Bool("readOnly", false, "read only")
	mountOptions.debug = cmdMount.Flag.Bool("debug", false, "serves runtime profiling data, e.g., http://localhost:<debug.port>/debug/pprof/goroutine?debug=2")
	mountOptions.debugPort = cmdMount.Flag.Int("debug.port", 6061, "http port for debugging")
	mountOptions.localSocket = cmdMount.Flag.String("localSocket", "", "default to /tmp/seaweedfs-mount-<mount_dir_hash>.sock")
	mountOptions.disableXAttr = cmdMount.Flag.Bool("disableXAttr", false, "disable xattr")
	mountOptions.fuseCommandPid = 0

	// RDMA acceleration flags
	mountOptions.rdmaEnabled = cmdMount.Flag.Bool("rdma.enabled", false, "enable RDMA acceleration for reads")
	mountOptions.rdmaSidecarAddr = cmdMount.Flag.String("rdma.sidecar", "", "RDMA sidecar address (e.g., localhost:8081)")
	mountOptions.rdmaFallback = cmdMount.Flag.Bool("rdma.fallback", true, "fallback to HTTP when RDMA fails")
	mountOptions.rdmaReadOnly = cmdMount.Flag.Bool("rdma.readOnly", false, "use RDMA for reads only (writes use HTTP)")
	mountOptions.rdmaMaxConcurrent = cmdMount.Flag.Int("rdma.maxConcurrent", 64, "max concurrent RDMA operations")
	mountOptions.rdmaTimeoutMs = cmdMount.Flag.Int("rdma.timeoutMs", 5000, "RDMA operation timeout in milliseconds")

	// ML optimization flags
	mountOptions.mlOptimizationEnabled = cmdMount.Flag.Bool("ml.enabled", false, "enable ML-aware optimizations for machine learning workloads")
	mountOptions.mlPrefetchWorkers = cmdMount.Flag.Int("ml.prefetchWorkers", 8, "number of prefetch worker threads for ML workloads")
	mountOptions.mlConfidenceThreshold = cmdMount.Flag.Float64("ml.confidenceThreshold", 0.6, "minimum confidence threshold to trigger ML prefetch")
	mountOptions.mlMaxPrefetchAhead = cmdMount.Flag.Int("ml.maxPrefetchAhead", 8, "maximum number of chunks to prefetch ahead")
	mountOptions.mlBatchSize = cmdMount.Flag.Int("ml.batchSize", 3, "batch size for ML prefetch operations")

	mountCpuProfile = cmdMount.Flag.String("cpuprofile", "", "cpu profile output file")
	mountMemProfile = cmdMount.Flag.String("memprofile", "", "memory profile output file")
	mountReadRetryTime = cmdMount.Flag.Duration("readRetryTime", 6*time.Second, "maximum read retry wait time")
}

var cmdMount = &Command{
	UsageLine: "mount -filer=localhost:8888 -dir=/some/dir",
	Short:     "mount weed filer to a directory as file system in userspace(FUSE)",
	Long: `mount weed filer to userspace.

  Pre-requisites:
  1) have SeaweedFS master and volume servers running
  2) have a "weed filer" running
  These 2 requirements can be achieved with one command "weed server -filer=true"

  This uses github.com/seaweedfs/fuse, which enables writing FUSE file systems on
  Linux, and OS X.

  On OS X, it requires OSXFUSE (https://osxfuse.github.io/).

  RDMA Acceleration:
  For ultra-fast reads, enable RDMA acceleration with an RDMA sidecar:
    weed mount -filer=localhost:8888 -dir=/mnt/seaweedfs \
      -rdma.enabled=true -rdma.sidecar=localhost:8081

  RDMA Options:
    -rdma.enabled=false         Enable RDMA acceleration for reads
    -rdma.sidecar=""            RDMA sidecar address (required if enabled)
    -rdma.fallback=true         Fallback to HTTP when RDMA fails
    -rdma.readOnly=false        Use RDMA for reads only (writes use HTTP)
    -rdma.maxConcurrent=64      Max concurrent RDMA operations
    -rdma.timeoutMs=5000        RDMA operation timeout in milliseconds

  ML Optimization:
  For machine learning workloads, enable intelligent prefetching and caching:
    weed mount -filer=localhost:8888 -dir=/mnt/seaweedfs \
      -ml.enabled=true

  ML Options:
    -ml.enabled=false           Enable ML-aware optimizations
    -ml.prefetchWorkers=8       Number of concurrent prefetch workers
    -ml.confidenceThreshold=0.6 Minimum confidence to trigger ML prefetch
    -ml.maxPrefetchAhead=8      Maximum chunks to prefetch ahead
    -ml.batchSize=3             Batch size for prefetch operations

`,
}
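The file above only registers the `-ml.*` flags. The validation and default fallbacks mentioned in the commit message live elsewhere and are not shown here. As a rough illustration, the sketch below parses the same flag names with the standard library `flag` package and then falls back to the documented defaults when a value is out of range. The `mlConfig` type, the `parseMLFlags` and `withDefaults` functions, and the specific range checks are assumptions made for this example, not the project's actual validation rules.

```go
package main

import (
	"flag"
	"fmt"
)

// mlConfig is a hypothetical plain-value view of the ML mount options.
type mlConfig struct {
	Enabled             bool
	PrefetchWorkers     int
	ConfidenceThreshold float64
	MaxPrefetchAhead    int
	BatchSize           int
}

// parseMLFlags registers the -ml.* flags with the defaults used in
// mount.go and parses args into an mlConfig.
func parseMLFlags(args []string) (mlConfig, error) {
	fs := flag.NewFlagSet("mount", flag.ContinueOnError)
	var c mlConfig
	fs.BoolVar(&c.Enabled, "ml.enabled", false, "enable ML-aware optimizations")
	fs.IntVar(&c.PrefetchWorkers, "ml.prefetchWorkers", 8, "number of prefetch workers")
	fs.Float64Var(&c.ConfidenceThreshold, "ml.confidenceThreshold", 0.6, "minimum confidence to trigger prefetch")
	fs.IntVar(&c.MaxPrefetchAhead, "ml.maxPrefetchAhead", 8, "maximum chunks to prefetch ahead")
	fs.IntVar(&c.BatchSize, "ml.batchSize", 3, "batch size for prefetch operations")
	if err := fs.Parse(args); err != nil {
		return mlConfig{}, err
	}
	return withDefaults(c), nil
}

// withDefaults replaces out-of-range values with the flag defaults.
// The accepted ranges are assumptions chosen for this sketch.
func withDefaults(c mlConfig) mlConfig {
	if c.PrefetchWorkers <= 0 {
		c.PrefetchWorkers = 8
	}
	if c.ConfidenceThreshold <= 0 || c.ConfidenceThreshold > 1 {
		c.ConfidenceThreshold = 0.6
	}
	if c.MaxPrefetchAhead <= 0 {
		c.MaxPrefetchAhead = 8
	}
	if c.BatchSize <= 0 {
		c.BatchSize = 3
	}
	return c
}

func main() {
	cfg, err := parseMLFlags([]string{"-ml.enabled=true", "-ml.prefetchWorkers=0"})
	if err != nil {
		fmt.Println("flag error:", err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Falling back to defaults rather than rejecting the mount is one possible reading of "default fallbacks and validation"; returning an error for invalid values would be an equally reasonable design.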