-
Notifications
You must be signed in to change notification settings - Fork 1.2k
meta/sql: implement sql batch clone #6656
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
248644e to
3fdb981
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6656 +/- ##
==========================================
- Coverage 51.45% 42.20% -9.25%
==========================================
Files 169 168 -1
Lines 52242 53297 +1055
==========================================
- Hits 26882 22496 -4386
- Misses 22336 28004 +5668
+ Partials 3024 2797 -227 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a SQL batch clone optimization for JuiceFS metadata operations. The primary goal is to improve the performance of cloning file operations by batching multiple non-directory entries into a single database transaction, reducing the number of database round-trips.
Changes:
- Introduces
doBatchClonemethod for SQL backend that processes multiple file/symlink entries in one transaction - Modifies
cloneEntryin base.go to use batch cloning for non-directory entries with fallback to sequential cloning - Adds stub implementations returning
ENOTSUPfor Redis and TKV backends to maintain compatibility
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| pkg/meta/base.go | Adds BatchClone public API and refactors cloneEntry to separate directories from files, calling BatchClone for files |
| pkg/meta/sql.go | Implements doBatchClone with optimized batch INSERT/UPDATE operations and aggregated chunk_ref updates |
| pkg/meta/redis.go | Adds stub doBatchClone returning ENOTSUP for Redis backend |
| pkg/meta/tkv.go | Adds stub doBatchClone returning ENOTSUP for TKV backend |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/meta/base.go
Outdated
| wg.Wait() | ||
| if len(errCh) > 0 { | ||
| eno = <-errCh | ||
| goto END | ||
| } | ||
|
|
||
| // Batch clone all non-directory entries from this batch | ||
| if len(nonDirEntries) > 0 { | ||
| if eno = m.BatchClone(ctx, srcIno, ino, nonDirEntries, cmode, cumask, count); eno == syscall.ENOTSUP { | ||
| // Fallback: clone each file concurrently (same pattern as directories) | ||
| for _, e := range nonDirEntries { | ||
| select { | ||
| case concurrent <- struct{}{}: | ||
| wg.Add(1) | ||
| go func(entry *Entry) { | ||
| defer wg.Done() | ||
| childEno := cloneChild(entry) | ||
| if childEno != 0 { | ||
| errCh <- childEno | ||
| } | ||
| <-concurrent | ||
| }(e) | ||
| default: | ||
| // Synchronous fallback when channel is full | ||
| if childEno := cloneChild(e); childEno != 0 { | ||
| eno = childEno | ||
| goto END | ||
| } | ||
| } | ||
| } | ||
| // Reset error after spawning goroutines - errors will be reported via errCh | ||
| eno = 0 | ||
| } else if eno != 0 { | ||
| goto END | ||
| } | ||
| } | ||
|
|
||
| offset += len(batchEntries) | ||
| if ctx.Canceled() { | ||
| eno = syscall.EINTR | ||
| break | ||
| goto END | ||
| } | ||
| } | ||
|
|
||
| END: | ||
| wg.Wait() |
Copilot
AI
Feb 2, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential WaitGroup race condition in the fallback path. After spawning goroutines for nonDirEntries (lines 3295-3314), wg.Wait() is called again at line 3330. However, if the BatchClone succeeds (eno == 0), no goroutines were spawned for nonDirEntries, so the second wg.Wait() would only wait for the directory goroutines which were already waited for at line 3285. While this is not necessarily incorrect, it creates an unnecessary wait. More critically, if errors occur between lines 3293-3319, the code jumps to END without waiting for the spawned fallback goroutines, which could lead to goroutines writing to errCh after the function returns.
|
Could you provide the performance test report and the performance comparison before and after optimization ? |
Yes, will do so today |
|
@zhijian-pro @jiefenghuang @vyalamar This PR is still a tiny bit rough, but I think it's enough to get started conversing about the solution. In terms of performance here's what I observed with the directory-level batching approach. BaselineThe baseline in my environment. This is with a remote postgres metadata engine. Optimized versionWith the directory-level batching, I see a significant improvement though not quite at the level, which I think is theoretically possible. Optimized version + increased concurrencyTo improve this somewhat, I added a concurrency flag. Presently, this is fixed at 4. With the flag it can be customized as needed. This shows a noticeable improvement still. Thoughts? If we wanted to go the cross-dir batching route, I think we could achieve better performance. But let me know your opinions or if you see a way to make this current approach more performant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
pkg/meta/base.go
Outdated
| // Check for errors from concurrent subdir processing (non-blocking) | ||
| select { | ||
| case e := <-errCh: | ||
| eno = e | ||
| goto END |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential deadlock: The error channel errCh (created earlier in the function at line 3224 with capacity cap(concurrent)) could cause deadlocks. If multiple concurrent goroutines encounter errors and try to send to errCh while the main goroutine has already jumped to END with a different error (e.g., from line 3270, 3281, 3314, or 3322), the goroutines could block on sending to the full channel. The wg.Wait() at line 3334 would then wait indefinitely for blocked goroutines. Consider either: (1) making errCh buffered with a larger capacity, or (2) using context cancellation to signal all goroutines to stop when an error occurs, or (3) having a separate goroutine drain errCh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eakman-datadog Perhaps using errorgroup is simpler and clearer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@copilot Is using errorgroup a good practice here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored to use errorgroup
pkg/meta/sql.go
Outdated
| nodesIns = append(nodesIns, &info.dstNode) | ||
| edgesIns = append(edgesIns, &edge{ | ||
| Parent: dstParent, | ||
| Name: []byte(info.entry.Name), |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary type conversion: info.entry.Name is already []byte (as defined in Entry struct at pkg/meta/interface.go:308), so the conversion []byte(info.entry.Name) creates an unnecessary copy of the byte slice. This is a minor performance issue but not a bug. Consider using info.entry.Name directly to avoid the allocation.
| Name: []byte(info.entry.Name), | |
| Name: info.entry.Name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/meta/sql.go
Outdated
| chunksIns := make([]interface{}, 0) | ||
| symlinksIns := make([]interface{}, 0) | ||
|
|
||
| // CRITICAL: Aggregate chunk_ref updates (addresses TODO at line 5050) |
Copilot
AI
Feb 4, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated comment reference: The comment mentions "addresses TODO at line 5050" but this refers to the old doCloneEntry function's TODO comment, not a line in the current function. While the implementation does address that TODO by aggregating chunk_ref updates, the comment should either reference the doCloneEntry function explicitly (e.g., "addresses TODO in doCloneEntry at line 5050") or be reworded to describe what's being done without the line reference, for better maintainability.
| // CRITICAL: Aggregate chunk_ref updates (addresses TODO at line 5050) | |
| // CRITICAL: Aggregate chunk_ref updates per chunk to minimize database operations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
cmd/clone.go
Outdated
| Usage: "preserve the uid, gid, and mode of the file. (This is forced on Windows)", | ||
| }, | ||
| &cli.IntFlag{ | ||
| Name: "concurrency", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
threads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixing...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
Adding the
Theoretically, there should be better performance, especially for scenarios with high network latency in metadata engine, large directories, large files, and multiple extended attributes. |
| doDeleteSlice(id uint64, size uint32) error | ||
|
|
||
| doCloneEntry(ctx Context, srcIno Ino, parent Ino, name string, ino Ino, attr *Attr, cmode uint8, cumask uint16, top bool) syscall.Errno | ||
| doBatchClone(ctx Context, srcParent Ino, dstParent Ino, entries []*Entry, cmode uint8, cumask uint16, length *int64, space *int64, inodes *int64, userGroupQuotas *[]userGroupQuotaDelta) syscall.Errno |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark: Combine the statistics-related parameters into a single structure in new PR. Do the same for doBatchUnlink.
pkg/meta/base.go
Outdated
| if eno = batchEno; batchEno != 0 { | ||
| break | ||
| if batchEno != 0 { | ||
| return batchEno |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait for child
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
pkg/meta/base.go
Outdated
| // Batch clone files immediately (don't wait for subdirs to finish) | ||
| if len(nonDirEntries) > 0 { | ||
| if eno = m.BatchClone(ctx, srcIno, ino, nonDirEntries, cmode, cumask, count); eno == syscall.ENOTSUP { | ||
| // Fallback: clone each file concurrently (same pattern as directories) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark: move duplicate code into a func in new pr
pkg/meta/sql.go
Outdated
| }) | ||
|
|
||
| // Process in batches (500 per query to avoid size limits) | ||
| batchSize := 500 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
txnBatchNum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
pkg/meta/base.go
Outdated
| // Synchronous fallback when channel is full | ||
| if childEno := cloneChild(e); childEno != 0 { | ||
| eno = childEno | ||
| goto END |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mark: simplify code without goto in new pr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did this indirectly
3741b9b to
dc71e19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
cmd/clone.go
Outdated
| }, | ||
| &cli.IntFlag{ | ||
| Name: "threads", | ||
| Value: 4, |
Copilot
AI
Feb 6, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default concurrency value should be defined as a constant. The value 4 is hardcoded in multiple places (cmd/clone.go:58, pkg/vfs/internal.go:338, pkg/fs/fs.go:1054) and used as the default concurrency for clone operations. Consider defining this as a constant (e.g., CLONE_DEFAULT_CONCURRENCY = 4) in pkg/meta/utils.go alongside other clone-related constants like CLONE_MODE_PRESERVE_ATTR, to improve maintainability and ensure consistency across the codebase.
| Value: 4, | |
| Value: meta.CLONE_DEFAULT_CONCURRENCY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| defer handler.Close() | ||
|
|
||
| var wg sync.WaitGroup | ||
| var g errgroup.Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use the setlimit control of the error group to manage concurrency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Good call!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I think it's better to keep the concurrent channel. The reason for this is that currently, with the current approach, each call to cloneEntry waits for all of its child entries to complete, before returning i.e.:
// Wait for all goroutines and get first error
if err := g.Wait(); err != nil && eno == 0 {
eno = err.(syscall.Errno)
}If we were to pass in a global errorgroup, shared by all goroutines, the above section would take on a slightly different meaning. It would wait until all goroutines in the errorgroup have completed. I.e. not just children. Could be siblings, cousins, etc.
That might be okay, but I think it makes more sense to keep it the way it is.
| return nil | ||
| }) | ||
| default: | ||
| // Synchronous fallback when concurrency limit reached |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error group also has a trygo function, which you can also take a look at
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed. See #6656 (comment)
4e76feb to
c407cf1
Compare
|
@zhijian-pro @jiefenghuang don't merge. I seem to have introduced a bug somewhere. Need some time to debug. |
|
@zhijian-pro @jiefenghuang This should be ready for a re-review now. FYI, while I was cleaning this up and doing some testing, I noticed a bug with the batch clone implementation and sqlite. With sqlite, at the beginning of the In To resolve this, it's been changed to allocate the inodes prior to beginning the main transaction (see bb3a338). |
Incoroporate newly added doBatchClone engine function and rework clone looping mechanics.
1. consolidate cloneInfo loops 2. remove directory related code for doBatchClone as we only deal with non-directories 3. undo changes to makefile 4. rename concurrency CLI argument to threads 5. refactor to use errorgroup instead of WaitGroup to eliminate deadlock potential and simplify code 6. removes unneeded memory management line 7. properly track subdirectory info 8. remove unneeded type casting 9. use getTxnBatchNum 10. Fix outdated comment
This reverts commit b1e36a8.
This reverts commit c407cf1.
640df63 to
8aee855
Compare
ref #6546
Unit Tests
SQLite
SQLite unit tests:
MySQL
MySQL Unit Test Output:
PostgreSQL
PostgreSQL Output: