Skip to content

Conversation

@MuneebUllahKhan222
Copy link
Contributor

Description:
This PR re-enables symlink following in filesystem traversal and introduces configurable controls for symlink behavior, including:

  • Follow symlinks: a boolean flag to enable or disable following symlinks.
  • Max symlink depth: a limit on how many times a symlink can be resolved to prevent infinite loops. Currently set to 40.

Symlink resolution uses Go’s os.Readlink, which resolves a single symlink per call. If the number of resolutions exceeds the configured max depth, resolveSymLink returns an error.

Additionally, directory scanning is now robust against symlink loops (e.g., A/linkToB -> B and B/linkToA -> A). We maintain a map (visitedSymlinkPaths) of already-resolved directory symlinks to prevent infinite recursion while keeping traversal predictable and stable

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

}
// If symlink resolves to directory scan that directory
ctx.Logger().V(5).Info("Resolved symlink is a directory", "path", resolvedPath)
err = s.scanDir(ctx, resolvedPath, chunksChan)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recursive scanDir creates unbounded nested worker pools

Medium Severity

When a symlink to a directory is found during WalkDir, scanDir is called recursively at line 268 from within the walk callback. Each recursive scanDir creates its own errgroup.Group with SetLimit(s.concurrency). The outer worker pool may still have goroutines running (spawned for files found before the symlink), while the inner pool also spawns up to s.concurrency goroutines. With nested symlinked directories, the effective concurrency becomes depth * s.concurrency, bypassing the user-configured limit and potentially exhausting system resources.

Additional Locations (1)

Fix in Cursor Fix in Web

if s.maxSymlinkDepth > 0 {
s.followSymlinks = true
}
s.visitedSymlinkPaths = make(map[string]struct{})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visited symlink state leaks across independent ChunkUnit calls

Medium Severity

visitedSymlinkPaths is initialized once in Init and never cleared. markVisitedPath is called unconditionally at the start of every scanDir, including for non-symlink directories. When using the SourceUnitEnumChunker interface, where ChunkUnit is called per enumerated unit, the accumulated state from one unit's scan causes subsequent units to skip symlinks pointing to already-visited directories. This makes scan results depend on unit processing order and can silently miss content.

Additional Locations (1)

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant