Repo manager with parsing logic #566
Conversation
Caution: Review failed — the pull request is closed.
Walkthrough
Introduces a bare-repo + worktree RepoManager with multi-tenant lifecycle and eviction, integrates it into parsing (replacing tarball flows with clone/worktree paths), updates parsing/project method signatures (AsyncSession→Session, ints→strs, optional email), adds repository types/resources/exceptions, and switches token selection to secrets.choice.
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Controller as ParsingController
participant RepoMgr as RepoManager
participant Git as Git (clone/fetch)
participant FS as FileSystem
participant Meta as MetadataStore
Controller->>RepoMgr: prepare_for_parsing(repo_name, ref, repo_url?, auth_token?, is_commit?, user_id)
RepoMgr->>RepoMgr: _evict_if_needed()/evict_stale_worktrees()
RepoMgr->>RepoMgr: ensure_bare_repo(repo_name, repo_url, token)
RepoMgr->>Git: clone/fetch bare repo (authenticated)
Git->>FS: store bare repo (.bare)
RepoMgr->>Meta: write/update repo .meta
RepoMgr->>RepoMgr: create_worktree(ref, unique_id)
RepoMgr->>Git: checkout worktree from bare repo -> worktree path
RepoMgr->>Meta: store worktree metadata
RepoMgr-->>Controller: return worktree path
Controller->>Controller: parse_directory(worktree_path, db=Session)
sequenceDiagram
autonumber
participant Runtime as PotpieRuntime
participant RepoRes as RepositoryResource
participant RepoMgr as RepoManager
participant Config as RuntimeConfig
Runtime->>Runtime: access repositories property
Runtime->>Config: read repos_base_path & limits
Runtime->>RepoRes: instantiate RepositoryResource (lazy)
RepoRes->>RepoMgr: prepare_for_parsing(repo_name, ref, repo_url?, auth_token?, is_commit?, user_id)
RepoMgr-->>RepoRes: worktree path
RepoRes-->>Runtime: return path
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
app/modules/projects/projects_service.py (1)
200-254: Guard against branch_name=None before branch fallback. With the signature now allowing None, the branch fallback can accidentally match records where branch_name IS NULL (e.g., commit-only entries). If no branch name is available, return early instead of querying.
✅ Suggested guard
  if commit_id:
      # If commit_id is provided, only check by commit_id (no fallback to branch)
      # This ensures repo+commit_id maps to exactly one project
      project = query.filter(Project.commit_id == commit_id).first()
      if project:
          logger.info(f"Found project by commit_id: {project.id}")
          return project
      logger.info(
          f"No project found with commit_id={commit_id}; not falling back to branch lookup."
      )
      return None

+ if not branch_name:
+     logger.info("No branch_name provided and commit_id is None; skipping branch lookup.")
+     return None
+
  # Fall back to branch_name lookup only if commit_id was not provided
  project = query.filter(Project.branch_name == branch_name).first()

app/modules/parsing/graph_construction/parsing_helper.py (1)
333-344: Avoid masking clone errors during cleanup. If shutil.rmtree fails inside the exception handler, it can obscure the original clone error.
🐛 Safer cleanup
- if os.path.exists(temp_clone_dir):
-     shutil.rmtree(temp_clone_dir)
+ if os.path.exists(temp_clone_dir):
+     shutil.rmtree(temp_clone_dir, ignore_errors=True)
  ...
- if os.path.exists(temp_clone_dir):
-     shutil.rmtree(temp_clone_dir)
+ if os.path.exists(temp_clone_dir):
+     shutil.rmtree(temp_clone_dir, ignore_errors=True)

app/modules/repo_manager/repo_manager.py (1)
482-522: du -s returns blocks, not bytes; volume limits are broken. The function uses ["du", "-s", path] but returns the result as bytes, causing volume thresholds to be off by 512–1024x. A 100GB repository would be reported as ~200MB, so the volume limit is never enforced as configured and the feature is effectively broken. Use ["du", "-sb", path] to get actual byte counts.
Suggested fix
- result = subprocess.run(
-     ["du", "-s", path],
+ result = subprocess.run(
+     ["du", "-sb", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_repomap.py`:
- Around line 617-620: The hidden-directory filter is mistakenly skipping the
repo root because rel_root == "." matches the startswith(".") check; modify the
os.walk loop in parsing_repomap.py so you first guard against the repo root (and
optionally "..") before applying the hidden-part filter: e.g., compute rel_root
= os.path.relpath(root, repo_dir) and only run the any(part.startswith(".") for
part in rel_root.split(os.sep)) check when rel_root not in (".", ".."); update
the loop that references repo_dir, rel_root and os.walk accordingly.
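A minimal sketch of the guard described above (the repo_dir name and the generator wrapper are assumptions drawn from this comment, not the actual parsing_repomap.py code):

import os

def iter_visible_dirs(repo_dir: str):
    """Yield os.walk() entries, skipping hidden directories but never the repo root."""
    for root, dirs, files in os.walk(repo_dir):
        rel_root = os.path.relpath(root, repo_dir)
        # Guard the repo root first: its relpath is ".", which would otherwise
        # match the startswith(".") hidden-part check and skip everything.
        if rel_root not in (".", "..") and any(
            part.startswith(".") for part in rel_root.split(os.sep)
        ):
            continue
        yield root, dirs, files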
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 163-180: The code currently always prepares a worktree using
repo_details.branch_name, ignoring repo_details.commit_id; update the
branch/worktree creation logic in parsing_service.py so prepare_for_parsing
respects commit_id: when repo_details.commit_id is present call
self.repo_manager.prepare_for_parsing(repo_details.repo_name,
ref=repo_details.commit_id) (otherwise use ref=repo_details.branch_name), then
create Repo(worktree) and set extracted_dir from that worktree to ensure the
correct revision is parsed.
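A hedged sketch of the commit-aware flow this prompt describes; the Repo/worktree handling is inferred from the comment and GitPython, not copied from the PR:

from git import Repo

def prepare_ref_for_parsing(repo_manager, repo_details) -> str:
    # Prefer the pinned commit when one is supplied; otherwise fall back to the branch.
    ref = repo_details.commit_id or repo_details.branch_name
    worktree = repo_manager.prepare_for_parsing(
        repo_details.repo_name,
        ref=ref,
        is_commit=bool(repo_details.commit_id),
    )
    # Open the checked-out worktree and parse from its working tree directory.
    return Repo(worktree).working_tree_dir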
In `@app/modules/repo_manager/repo_manager.py`:
- Around line 526-567: In _update_bare_repo_metadata, the recomputed volume from
_calculate_volume_bytes is being discarded because the code sets volume =
data.get("volume_bytes", 0) when volume is not None; change the logic so that if
_calculate_volume_bytes returns None you fall back to the existing metadata
value, i.e. call _calculate_volume_bytes(str(bare_repo_path)) into volume and
then if volume is None set volume = data.get("volume_bytes", 0), before
assigning data["volume_bytes"] = volume (refer to symbols
_update_bare_repo_metadata, bare_repo_path, _calculate_volume_bytes, and
data["volume_bytes"]).
- Around line 947-1009: The metadata deletion call incorrectly swaps
branch/commit values — the current self._delete_metadata_entry(repo_name, None
if ref else ref, ref if ref else None) always passes None for the branch when
ref is set; change the call in remove_worktree to pass the ref as the branch (or
correctly map branch vs commit) e.g. call self._delete_metadata_entry(repo_name,
ref, None) when ref represents a branch (or implement proper detection and pass
(repo_name, None, ref) when ref is a commit) so branch metadata is removed
correctly.
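A small sketch of the corrected mapping; the is_commit flag is an assumption, since remove_worktree needs some way to know whether ref is a branch or a commit:

def delete_worktree_metadata(repo_manager, repo_name: str, ref: str, is_commit: bool) -> None:
    # Branch refs belong in the branch slot, commit refs in the commit slot;
    # the original call passed None for the branch whenever ref was set.
    if is_commit:
        repo_manager._delete_metadata_entry(repo_name, None, ref)
    else:
        repo_manager._delete_metadata_entry(repo_name, ref, None)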
In `@potpie/resources/repository.py`:
- Around line 45-78: The async method is_available (and any other async methods
in this class that call RepoManager) currently invokes blocking RepoManager
methods directly; wrap those calls with asyncio.to_thread (or
loop.run_in_executor for older Python) and await the result so the event loop is
not blocked—e.g., obtain repo_manager via self._get_repo_manager(), then call
await asyncio.to_thread(repo_manager.is_repo_available, repo_name, branch,
commit_id, user_id) and handle exceptions the same way; ensure asyncio is
imported and apply this pattern to all methods that call RepoManager (e.g.,
is_available, clone/fetch/worktree callers) so all blocking Git/filesystem
operations run in a threadpool.
- Around line 459-476: The current delete_worktree implementation calls the
private method _get_unique_worktree_path and manually deletes the directory with
shutil.rmtree; instead call the public
RepoManager.cleanup_unique_worktree(repo_name, user_id, unique_id) on the
repo_manager instance to perform proper git worktree cleanup (and fallback
removal) and return/propagate its boolean result or raise RepositoryError on
failure; remove direct use of _get_unique_worktree_path and manual
shutil.rmtree, and consider whether the unused ref parameter in delete_worktree
should be removed or ignored when delegating to cleanup_unique_worktree.
- Around line 276-303: get_volume_info currently calls the deprecated
repo_manager.list_available_repos() to compute repo_count (causing noisy
warnings and inefficiency); replace that call with a non-deprecated counting
approach: either count entries from the generator
repo_manager._iter_metadata_entries(user_id=...) to include all metadata
entries, or use repo_manager.list_repos(user_id=...) if the intent is to count
only bare repos (no worktrees). Update the repo_count calculation in
get_volume_info accordingly, remove the deprecated call, and ensure you pick and
document which set (all entries vs. bare repos) is intended for the volume
metrics.
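A sketch of the two counting options, assuming the method names quoted in this comment:

def count_repos(repo_manager, user_id: str, include_worktrees: bool = False) -> int:
    if include_worktrees:
        # Count every metadata entry, worktrees included.
        return sum(1 for _ in repo_manager._iter_metadata_entries(user_id=user_id))
    # Count bare repos only, via the non-deprecated API.
    return len(repo_manager.list_repos(user_id=user_id))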
In `@potpie/types/repository.py`:
- Around line 71-102: The from_dict classmethod in RepositoryInfo currently
forces status to RepositoryStatus.AVAILABLE; change it to read status =
data.get("status") and if status is already a RepositoryStatus use it, if it's a
str try to convert it to RepositoryStatus (normalizing case) and fall back to
RepositoryStatus.AVAILABLE only when missing or conversion fails; then pass that
parsed status into the cls(...) call instead of the hardcoded
RepositoryStatus.AVAILABLE.
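A self-contained sketch of the tolerant status parsing; the enum below is a minimal stand-in for the real RepositoryStatus, and its values are assumptions:

from enum import Enum

class RepositoryStatus(Enum):  # minimal stand-in for potpie/types/repository.py
    AVAILABLE = "available"
    CLONING = "cloning"

def parse_status(data: dict) -> RepositoryStatus:
    raw = data.get("status")
    if isinstance(raw, RepositoryStatus):
        return raw
    if isinstance(raw, str):
        try:
            return RepositoryStatus(raw.lower())  # normalize case before converting
        except ValueError:
            pass
    # Fall back only when the field is missing or unrecognized.
    return RepositoryStatus.AVAILABLE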
🧹 Nitpick comments (5)
app/modules/parsing/graph_construction/parsing_controller.py (1)
186-190: Silence the unused db arg and drop dead code. Ruff flags db as unused, and the commented task block can be removed to reduce clutter.
♻️ Suggested cleanup
- db: Session,
+ _db: Session,
  ...
- # asyncio.create_task(
- #     CodeProviderService(db).get_project_structure_async(new_project_id)
- # )

Also applies to: 206-208
potpie/config.py (3)
130-139: Add exception chaining for clearer error tracebacks. The get_env_float function should use raise ... from err to preserve the exception chain, consistent with Python best practices for exception handling.
♻️ Proposed fix
  def get_env_float(key: str, default: Optional[float] = None) -> Optional[float]:
      value = get_env(key)
      if value is None:
          return default
      try:
          return float(value)
-     except ValueError:
+     except ValueError as err:
          raise ConfigurationError(
              f"Environment variable {env_prefix}{key} must be a float"
-         )
+         ) from err
52-55: Consider adding validation for repos_eviction_threshold_percent. The eviction threshold represents a percentage. If a value is provided, it would be prudent to validate that it falls within a sensible range (0.0–100.0) in the validate() method to catch configuration errors early.
♻️ Suggested validation addition in validate()
  # Add to validate() method after existing validations:
  if self.repos_eviction_threshold_percent is not None:
      if not (0.0 <= self.repos_eviction_threshold_percent <= 100.0):
          raise ConfigurationError(
              "repos_eviction_threshold_percent must be between 0 and 100"
          )
282-293: Add a docstring for consistency with other builder methods. All other builder methods (postgres, neo4j, redis, llm, user, project_path, development_mode, pool_settings) include docstrings. Adding one here would maintain API consistency.
♻️ Suggested docstring
  def repositories(
      self,
      base_path: Optional[str] = None,
      volume_limit_bytes: Optional[int] = None,
      eviction_threshold_percent: Optional[float] = None,
      github_base_url: Optional[str] = None,
  ) -> RuntimeConfigBuilder:
+     """Set repository management configuration."""
      self._repos_base_path = base_path

potpie/resources/repository.py (1)
39-43: Consider caching the RepoManager instance.
_get_repo_manager() creates a new RepoManager instance on every call. If RepoManager has any initialization overhead or maintains state, this could be inefficient. Consider lazy-initializing and caching the instance.
♻️ Proposed caching pattern
  def __init__(
      self,
      config: RuntimeConfig,
      db_manager: DatabaseManager,
      neo4j_manager: Neo4jManager,
  ):
      super().__init__(config, db_manager, neo4j_manager)
+     self._repo_manager: Optional[RepoManager] = None

  def _get_repo_manager(self):
      """Get a RepoManager instance configured from RuntimeConfig."""
+     if self._repo_manager is None:
+         from app.modules.repo_manager.repo_manager import RepoManager
+         self._repo_manager = RepoManager(repos_base_path=self._config.repos_base_path)
+     return self._repo_manager
-     from app.modules.repo_manager.repo_manager import RepoManager
-
-     return RepoManager(repos_base_path=self._config.repos_base_path)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/repo_manager/repo_manager.py (1)
495-509: du -s returns blocks, not bytes — volume math is off. The code treats du -s output as bytes, but it reports disk usage in blocks (typically 1 KiB units on GNU systems). This undercounts repo size and weakens eviction thresholds. Use byte-sized output explicitly with du -sb.
🐛 Proposed fix
- result = subprocess.run(
-     ["du", "-s", path],
+ result = subprocess.run(
+     ["du", "-sb", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_service.py`:
- Around line 190-194: Call prepare_for_parsing with the tenant/user context:
update the call to self.repo_manager.prepare_for_parsing(...) to pass the user
id (e.g., repo_details.user_id or the current user_id variable) so worktrees are
registered per-tenant; also update the RepoManager.prepare_for_parsing signature
and any callers to accept a user_id parameter and use it for multi-tenant
metadata/eviction logic (identify symbols: RepoManager.prepare_for_parsing,
parsing_service.prepare_for_parsing call site, repo_details.user_id).
In `@app/modules/repo_manager/repo_manager.py`:
- Around line 651-658: The fetch for existing bare repos uses auth_token
directly, which can be None even after resolving a token into github_token;
change the call to _fetch_ref to pass the resolved github_token (not the
original auth_token) so private repos use the available credentials — i.e.,
after computing github_token via github_token = auth_token or
self._get_github_token() and building clone_url with _build_authenticated_url,
call self._fetch_ref(bare_repo_path, ref, github_token, repo_url) (and ensure
subsequent calls that need credentials use github_token) before calling
_update_bare_repo_metadata.
- Around line 582-599: The code currently sets remote.origin.url to an
authenticated URL (via _build_authenticated_url) which persists the token in the
repo config; instead, remove the subprocess.run call that configures
remote.origin.url and pass the authenticated_url directly to the git fetch
invocation (i.e., use authenticated_url as the remote/URL argument in the
subprocess.run that executes ["git", "-C", str(bare_repo_path), "fetch", ... ,
ref]) so the token is used for that single network operation and is never
written to .git/config.
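A hedged sketch of the single-use authenticated fetch; the oauth2-style URL, the timeout, and the helper name are assumptions rather than the PR's actual implementation:

import subprocess

def fetch_ref_without_persisting_token(bare_repo_path: str, repo_url: str,
                                       ref: str, token: str) -> None:
    # Embed the token only in the URL handed to this one fetch invocation;
    # remote.origin.url in .git/config is never rewritten, so nothing persists.
    authenticated_url = repo_url.replace("https://", f"https://oauth2:{token}@")
    subprocess.run(
        ["git", "-C", bare_repo_path, "fetch", authenticated_url, ref],
        check=True,
        capture_output=True,
        text=True,
        timeout=600,
    )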
🧹 Nitpick comments (2)
app/modules/parsing/graph_construction/parsing_repomap.py (1)
618-621: Prefer != "." (or a real tuple) for clarity. rel_root not in (".") is a string-membership check, which is easy to misread as a tuple. Consider the explicit comparison to avoid ambiguity.
♻️ Suggested tweak
- if rel_root not in (".") and any(
+ if rel_root != "." and any(
      part.startswith(".") for part in rel_root.split(os.sep)
  ):

potpie/types/repository.py (1)
81-89: Handle invalid datetime strings gracefully.
datetime.fromisoformat will raise on malformed values; a single bad metadata row will crash callers. Consider a safe fallback to keep from_dict resilient.
♻️ Proposed fix
- if isinstance(registered_at, str):
-     registered_at = datetime.fromisoformat(registered_at)
+ if isinstance(registered_at, str):
+     try:
+         registered_at = datetime.fromisoformat(registered_at)
+     except ValueError:
+         registered_at = None
@@
- if isinstance(last_accessed, str):
-     last_accessed = datetime.fromisoformat(last_accessed)
+ if isinstance(last_accessed, str):
+     try:
+         last_accessed = datetime.fromisoformat(last_accessed)
+     except ValueError:
+         last_accessed = None
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
app/modules/parsing/graph_construction/parsing_helper.py (1)
470-500: Missing @staticmethod decorator. The method extract_local_repo_metadata lacks a self parameter but is missing the @staticmethod decorator. While calling it via ParseHelper.extract_local_repo_metadata(repo) works, this is fragile:
- Calling via an instance (self.extract_local_repo_metadata(repo)) would fail, with repo being bound to self.
- Static analysis tools and IDEs will flag this as an error.
- It's inconsistent with get_local_repo_languages, which correctly has @staticmethod.
Suggested fix
+ @staticmethod
  def extract_local_repo_metadata(repo: Repo):
      languages = ParseHelper.get_local_repo_languages(repo.working_tree_dir)

app/modules/parsing/graph_construction/parsing_service.py (1)
418-508: Critical: duplicate_graph is defined outside the ParsingService class. The duplicate_graph function at line 418 has incorrect indentation — it sits at module level rather than inside ParsingService, so self won't be bound when the function tries to access self.search_service and self.inference_service, and calls will fail at runtime.
🐛 Proposed fix: Indent the entire function to be a class method
- async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
-     await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
-     node_batch_size = 3000  # Fixed batch size for nodes
-     relationship_batch_size = 3000  # Fixed batch size for relationships
-     try:
+     async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
+         await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
+         node_batch_size = 3000  # Fixed batch size for nodes
+         relationship_batch_size = 3000  # Fixed batch size for relationships
+         try:

The entire function body (lines 418-508) needs to be indented by one level (4 spaces) to become a method of ParsingService.

app/modules/auth/tests/auth_service_test.py (1)
114-167: Update signup error tests to match the new tuple return contract.
AuthService.signup now catches exceptions and returns (None, {"error": ...}), so the duplicate/invalid/weak/empty-display-name tests expecting exceptions will fail. Update those tests to assert on error instead of pytest.raises.
🔧 Example adjustment (apply the pattern to the other error tests)
- with patch(
-     "firebase_admin.auth.create_user",
-     side_effect=Exception("Email already exists"),
- ):
-     with pytest.raises(Exception) as exc_info:
-         auth_service.signup(
-             "existing@example.com", "password123", "Test User"
-         )
-     assert "Email already exists" in str(exc_info.value)
+ with patch(
+     "firebase_admin.auth.create_user",
+     side_effect=Exception("Email already exists"),
+ ):
+     success_response, error = auth_service.signup(
+         "existing@example.com", "password123", "Test User"
+     )
+     assert success_response is None
+     assert "Email already exists" in error["error"]
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_helper.py`:
- Line 543: The method extract_remote_repo_metadata is defined without self but
missing the `@staticmethod` decorator; add `@staticmethod` above
extract_remote_repo_metadata (matching extract_local_repo_metadata) so it can be
called on the class or instance without requiring self and to keep behavior
consistent with the other static extractor methods.
- Around line 526-534: The language byte counts are double-counting because the
chain uses separate if statements so a matched `.py` or `.ts` still falls
through into the final else; update the conditional in parsing_helper.py that
inspects entry.suffix to use an if/elif/elif/else chain (checking ".py", ".ts",
".js" in order) so only one branch increments language_bytes (the language_bytes
dict and the entry.suffix checks are the unique identifiers to locate the
change).
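A self-contained sketch of the single-branch counting described above; the language names and the use of pathlib are illustrative assumptions:

from collections import defaultdict
from pathlib import Path

def count_language_bytes(root: Path) -> dict:
    language_bytes: dict = defaultdict(int)
    for entry in root.rglob("*"):
        if not entry.is_file():
            continue
        size = entry.stat().st_size
        # if/elif/else so a matched suffix is counted exactly once.
        if entry.suffix == ".py":
            language_bytes["python"] += size
        elif entry.suffix == ".ts":
            language_bytes["typescript"] += size
        elif entry.suffix == ".js":
            language_bytes["javascript"] += size
        else:
            language_bytes["other"] += size
    return dict(language_bytes)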
♻️ Duplicate comments (1)
app/modules/parsing/graph_construction/parsing_service.py (1)
190-196: user_id is not passed to prepare_for_parsing for tenant isolation. The user_id parameter is available in scope but not passed to prepare_for_parsing. Per the RepoManager.prepare_for_parsing signature, user_id is used for multi-tenant metadata and eviction tracking. Not passing it may cause worktrees to be registered globally, leading to cross-tenant eviction or lookup mismatches.
Suggested fix
  worktree = self.repo_manager.prepare_for_parsing(
      repo_details.repo_name,
      ref=ref,
      is_commit=bool(repo_details.commit_id),
+     user_id=user_id,
  )
🧹 Nitpick comments (2)
app/modules/parsing/graph_construction/parsing_helper.py (2)
53-58: Remove commented-out code. The commented-out block for filtering symlinked directories is dead code. With followlinks=False already set, symlinked directories won't be followed during traversal, and symlinked files are explicitly skipped at lines 62-63.
Also, per static analysis hint (B007), consider renaming the unused dirnames to _dirnames to explicitly indicate that it is intentionally unused.
Suggested cleanup
- for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
-     # # Skip symlinked directories
-     # dirnames[:] = [
-     #     d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
-     # ]
-
+ for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
      for f in filenames:
344-357: Cleanup failure during exception handling could mask the original error. The shutil.rmtree() calls in exception handlers (lines 348, 354) are now unguarded. If cleanup fails (e.g., permission denied), it will raise a new exception that masks the original GitCommandError or unexpected error, making debugging harder. Consider wrapping cleanup in a try/except to ensure the original exception propagates:
Suggested fix
  except GitCommandError as e:
      logger.exception("ParsingHelper: Git clone failed")
      # Clean up temp directory on error
      if os.path.exists(temp_clone_dir):
-         shutil.rmtree(temp_clone_dir)
+         try:
+             shutil.rmtree(temp_clone_dir)
+         except Exception as cleanup_err:
+             logger.warning(f"ParsingHelper: Failed to clean up temp directory: {cleanup_err}")
      raise ParsingFailedError(f"Failed to clone repository: {e}") from e
  except Exception as e:
      logger.exception("ParsingHelper: Unexpected error during git clone")
      # Clean up temp directory on error
      if os.path.exists(temp_clone_dir):
-         shutil.rmtree(temp_clone_dir)
+         try:
+             shutil.rmtree(temp_clone_dir)
+         except Exception as cleanup_err:
+             logger.warning(f"ParsingHelper: Failed to clean up temp directory: {cleanup_err}")
      raise ParsingFailedError(
          f"Unexpected error during repository clone: {e}"
      ) from e
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_helper.py`:
- Around line 502-541: In get_local_repo_languages, skip following symlinks to
avoid escaping the root or creating cycles: before using entry.is_dir() or
entry.is_file(), check entry.is_symlink() and continue if True (or use
entry.is_dir(follow_symlinks=False) on Python 3.13+); ensure both directory
traversal (stack.append(entry)) and file size/stat logic only run for
non-symlink entries and consider validating entry.resolve().is_relative_to(root)
if you want extra protection against escapes.
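A sketch of a symlink-safe traversal using os.scandir, whose DirEntry.is_dir(follow_symlinks=False) is available on every supported Python version; the function name and stack-based walk are assumptions:

import os

def walk_without_symlinks(root: str):
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as it:
            for entry in it:
                if entry.is_symlink():
                    continue  # never follow symlinked files or directories
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    yield entry.path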
♻️ Duplicate comments (3)
potpie/resources/repository.py (2)
290-290: Still using the deprecated list_available_repos() method. Line 290 calls list_available_repos(), which was flagged in a previous review as deprecated (it logs the warning: "list_available_repos() is deprecated. Use list_repos() for bare repos only."). This was marked as addressed in commit e2dd43f, but the code still shows the deprecated call. Replace it with a non-deprecated counting approach. If counting bare repos only:
- repo_count = len(repo_manager.list_available_repos(user_id=user_id))
+ repo_count = len(repo_manager.list_repos(user_id=user_id))

Or, if counting all metadata entries (including worktrees):
- repo_count = len(repo_manager.list_available_repos(user_id=user_id))
+ repo_count = sum(1 for _ in repo_manager._iter_metadata_entries(user_id=user_id))

Verify which repositories should be counted for volume metrics.
457-471: Use RepoManager.cleanup_unique_worktree instead of manual deletion. Lines 459-471 access the private method _get_unique_worktree_path and manually delete with shutil.rmtree, bypassing proper git worktree cleanup. RepoManager provides the public method cleanup_unique_worktree(repo_name, user_id, unique_id), which handles proper git cleanup before falling back to filesystem removal.
🔧 Proposed fix
  async def delete_worktree(
      self,
      repo_name: str,
      ref: str,
      user_id: str,
      unique_id: str,
  ) -> bool:
      ...
      repo_manager = self._get_repo_manager()
      try:
-         worktree_path = repo_manager._get_unique_worktree_path(
-             repo_name=repo_name, ref=ref, user_id=user_id, unique_id=unique_id
-         )
-
-         if worktree_path and worktree_path.exists():
-             import shutil
-
-             shutil.rmtree(worktree_path)
-             logger.info(f"Deleted worktree: {worktree_path}")
-             return True
-
-         logger.warning(f"Worktree not found: {worktree_path}")
-         return False
+         return repo_manager.cleanup_unique_worktree(
+             repo_name=repo_name,
+             user_id=user_id,
+             unique_id=unique_id,
+         )
      except Exception as e:
          raise RepositoryError(f"Failed to delete worktree: {e}") from e

Note: The ref parameter appears unused by cleanup_unique_worktree. Consider removing it from the method signature if it's not needed elsewhere.

app/modules/parsing/graph_construction/parsing_helper.py (1)
543-543: Missing @staticmethod on extract_remote_repo_metadata (same as the prior comment).
🧹 Nitpick comments (2)
potpie/resources/repository.py (1)
39-43: Consider caching the RepoManager instance. Each method call creates a new RepoManager instance. If methods are called frequently, this could add overhead. Consider caching the instance as a private attribute initialized once:

  def __init__(self, config: RuntimeConfig, db_manager: DatabaseManager, neo4j_manager: Neo4jManager):
      super().__init__(config, db_manager, neo4j_manager)
      self._repo_manager = None

  def _get_repo_manager(self):
      """Get a RepoManager instance configured from RuntimeConfig."""
      if self._repo_manager is None:
          from app.modules.repo_manager.repo_manager import RepoManager
          self._repo_manager = RepoManager(repos_base_path=self._config.repos_base_path)
      return self._repo_manager

However, if RepoManager is intentionally created per-operation for isolation or thread-safety, the current approach is fine.
app/modules/parsing/graph_construction/parsing_helper.py (1)
51-64: Rename the unused dirnames loop variable to satisfy lint. This addresses the B007 warning without behavior change.
♻️ Proposed diff
- for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
+ for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/repo_manager/repo_manager.py (1)
495-508: du -s returns a block count, not bytes — the -b flag is missing. The code comment claims to use du -sb, but line 499 only specifies ["du", "-s", path]. Without the -b flag, du returns size in display block units (typically 1024-byte blocks by default, configurable via environment variables), not actual bytes. This causes the function to return values that are ~1000x smaller than intended, breaking volume tracking.
Fix
  # Use 'du -sb' to get size in bytes (summary, bytes)
  # This is more accurate and faster than walking the directory tree
  result = subprocess.run(
-     ["du", "-s", path],
+     ["du", "-sb", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_helper.py`:
- Line 470: The method extract_local_repo_metadata is defined without self but
missing the `@staticmethod` decorator; add `@staticmethod` above
extract_local_repo_metadata so it behaves like get_local_repo_languages and can
be called both on the class and instances without accidentally receiving self,
ensuring the signature matching and preventing bugs when invoking
self.extract_local_repo_metadata(repo).
- Around line 520-523: The code uses entry.is_dir(follow_symlinks=False) (and
inconsistently entry.is_file()) which requires Python 3.13; either raise the
package minimum to >=3.13 or change these calls to be compatible with 3.10–3.12
— e.g., replace entry.is_dir(follow_symlinks=False) with a symlink-safe check
using entry.is_symlink() and entry.is_dir() (and make entry.is_file() likewise
consistent) in parsing_helper.py so the directory/file detection does not rely
on the newer follow_symlinks parameter; update all uses (entry.is_dir,
entry.is_file) accordingly to ensure consistent symlink handling or bump
requires-python to >=3.13 in pyproject.toml.
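A minimal sketch of the 3.10–3.12-compatible check suggested above (pathlib's Path.is_dir only gained follow_symlinks in Python 3.13); entry is assumed to be a pathlib.Path:

from pathlib import Path

def is_real_dir(entry: Path) -> bool:
    # Check is_symlink() first so behavior matches is_dir(follow_symlinks=False)
    # without requiring Python 3.13.
    return not entry.is_symlink() and entry.is_dir()

def is_real_file(entry: Path) -> bool:
    return not entry.is_symlink() and entry.is_file()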
In `@app/modules/repo_manager/repo_manager.py`:
- Around line 1522-1534: The current repo eviction removes the filesystem
directory but leaves orphaned metadata; update the eviction logic in the
function containing the shutil.rmtree call (e.g., evict_repo / the method in
repo_manager.py handling repo eviction) to also remove metadata keyed to that
repo: before/after deleting repo_dir, query the centralized metadata store (use
the repository identifier repo_name) and delete or mark all entries belonging to
that repo (e.g., call the metadata service method to list entries by repo and
then delete them or use a bulk delete API such as
delete_entries_by_repo/delete_by_namespace), handle and log any metadata
deletion errors but continue cleanup, and ensure the code paths use the same
repo identifier and transactional ordering so metadata removal won’t be left
orphaned if rmtree succeeds.
- Around line 805-809: The call to _fetch_ref uses auth_token directly which may
be None; instead, resolve the token the same way ensure_bare_repo does and pass
that resolved token to _fetch_ref (e.g., obtain resolved_token via the same
helper used in ensure_bare_repo and replace auth_token with resolved_token when
calling _fetch_ref with repo_url). Ensure you use the same token-resolution path
as in ensure_bare_repo so private repos can be fetched.
🧹 Nitpick comments (8)
app/modules/repo_manager/repo_manager.py (7)
66-66: Unused constant _DEFAULT_VOLUME_LIMIT_GB. The constant is defined but the initialization code manually calculates 100 * 1024 * 1024 * 1024 instead of using it.
♻️ Suggested fix
- self.volume_limit_bytes = 100 * 1024 * 1024 * 1024
+ self.volume_limit_bytes = self._DEFAULT_VOLUME_LIMIT_GB * 1024 * 1024 * 1024
  else:
-     self.volume_limit_bytes = 100 * 1024 * 1024 * 1024
+     self.volume_limit_bytes = self._DEFAULT_VOLUME_LIMIT_GB * 1024 * 1024 * 1024

Also applies to: 106-108
144-151: Token sanitization may miss newer GitHub token formats. The regex ghp_[a-zA-Z0-9]{36} only matches classic PATs. GitHub fine-grained tokens use the github_pat_ prefix with variable length.
🔒 Suggested fix for broader token coverage
  @staticmethod
  def _sanitize_error_message(error_msg: str) -> str:
      """Remove potential credentials from error messages."""
      import re

      sanitized = re.sub(r"oauth2:[^@]+@", "oauth2:***@", error_msg)
-     sanitized = re.sub(r"ghp_[a-zA-Z0-9]{36}", "***", sanitized)
+     # Classic PATs (ghp_), fine-grained PATs (github_pat_), OAuth tokens (gho_), etc.
+     sanitized = re.sub(r"(ghp_|gho_|ghu_|ghs_|ghr_)[a-zA-Z0-9_]+", "***", sanitized)
+     sanitized = re.sub(r"github_pat_[a-zA-Z0-9_]+", "***", sanitized)
      return sanitized
196-205: Minor duplication in path helpers. _get_repo_dir and _get_repo_local_path are identical. Consider consolidating them or documenting the semantic distinction.
291-298: datetime.utcnow() is deprecated in Python 3.12+. Consider using datetime.now(timezone.utc) for forward compatibility.
♻️ Suggested fix
+ from datetime import datetime, timedelta, timezone

  @staticmethod
  def _deserialize_datetime(dt_str: Optional[str]) -> datetime:
      if not dt_str:
-         return datetime.utcnow()
+         return datetime.now(timezone.utc).replace(tzinfo=None)
      try:
          return datetime.fromisoformat(dt_str)
      except ValueError:
          logger.warning(f"Failed to parse datetime '{dt_str}'; defaulting to now()")
-         return datetime.utcnow()
+         return datetime.now(timezone.utc).replace(tzinfo=None)

Note: This pattern should be applied to all datetime.utcnow() calls in the file (lines 542, 547, 1175, 1312, 1319, 1393, 1577).
701-708: Use logger.exception for better stack-trace logging. When catching exceptions in except blocks, logger.exception automatically includes the traceback, while logger.error does not.
♻️ Suggested fix
  except subprocess.TimeoutExpired:
-     logger.error("Git clone timeout", repo_name=repo_name)
+     logger.exception("Git clone timeout for %s", repo_name)
      raise RuntimeError(
          f"Git clone timed out after {self._CLONE_TIMEOUT // 60} minutes"
-     )
+     ) from None
1134-1136: Redundant exception in the logging.exception call. logger.exception automatically includes the exception info; passing e in the message is redundant.
♻️ Suggested fix
  except Exception as e:
-     logger.exception(f"Error cleaning up unique worktrees: {e}")
+     logger.exception("Error cleaning up unique worktrees")
      return False
1447-1449: A deprecation warning on every call may be noisy. Consider using warnings.warn with the DeprecationWarning category for proper deprecation handling, or log once using a class-level flag.
♻️ Suggested fix using the warnings module
+ import warnings

  def list_available_repos(
      self,
      user_id: Optional[str] = None,
      limit: Optional[int] = None,
  ) -> List[Dict[str, Any]]:
      """
      List all available repositories including worktrees.

      DEPRECATED: Use list_repos() for v2-style behavior (bare repos only).
      """
-     logger.warning(
-         "list_available_repos() is deprecated. Use list_repos() for bare repos only."
-     )
+     warnings.warn(
+         "list_available_repos() is deprecated. Use list_repos() for bare repos only.",
+         DeprecationWarning,
+         stacklevel=2,
+     )

app/modules/parsing/graph_construction/parsing_helper.py (1)
53-64: Clean up the unused variable and dead code. The dirnames variable is unused (as flagged by static analysis), and the commented-out code block (lines 54-57) is redundant since followlinks=False already prevents traversing into symlinked directories.
Suggested cleanup
  @staticmethod
  def get_directory_size(path):
      total_size = 0
-     for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
-         # # Skip symlinked directories
-         # dirnames[:] = [
-         #     d for d in dirnames if not os.path.islink(os.path.join(dirpath, d))
-         # ]
-
+     for dirpath, _, filenames in os.walk(path, followlinks=False):
          for f in filenames:
              fp = os.path.join(dirpath, f)
              # Skip all symlinks
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@potpie/resources/repository.py`:
- Around line 379-429: The create_worktree method currently ignores the
exists_ok parameter (it’s accepted in the signature but
repo_manager.create_worktree is called with exists_ok=True), so propagate the
caller's intent by passing the exists_ok parameter through to
repo_manager.create_worktree (i.e., replace the hardcoded exists_ok=True with
exists_ok=exists_ok) or, if the design decides existence behavior should not be
configurable, remove exists_ok from create_worktree's signature and callers;
update references to the create_worktree function accordingly to maintain
consistency.
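A short sketch of forwarding the flag instead of hardcoding it; the wrapper signature is paraphrased from this comment, not taken from the PR:

def create_worktree(repo_manager, repo_name: str, ref: str, user_id: str,
                    unique_id: str, exists_ok: bool = False):
    # Forward the caller's exists_ok rather than pinning it to True.
    return repo_manager.create_worktree(
        repo_name=repo_name,
        ref=ref,
        user_id=user_id,
        unique_id=unique_id,
        exists_ok=exists_ok,
    )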
🧹 Nitpick comments (1)
potpie/resources/repository.py (1)
40-44: Consider caching the RepoManager instance.
_get_repo_manager() creates a new RepoManager instance on every method call. If RepoManager initialization is lightweight and stateless, this is fine. However, if it involves any setup cost (e.g., filesystem checks, configuration parsing), consider lazy initialization with caching:
♻️ Optional: lazy singleton pattern
+ _repo_manager: Optional["RepoManager"] = None

  def _get_repo_manager(self):
      """Get a RepoManager instance configured from RuntimeConfig."""
+     if self._repo_manager is None:
+         from app.modules.repo_manager.repo_manager import RepoManager
+         self._repo_manager = RepoManager(repos_base_path=self._config.repos_base_path)
+     return self._repo_manager
-     from app.modules.repo_manager.repo_manager import RepoManager
-
-     return RepoManager(repos_base_path=self._config.repos_base_path)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/repo_manager/repo_manager.py (1)
531-543: du -s returns blocks, not bytes; the code misinterprets the output. The comment claims "Use 'du -sb'" but the actual code only uses -s. Across systems, du -s returns blocks (1024-byte on Linux, 512-byte on macOS/BSD), not bytes. The code treats these directly as bytes, causing incorrect size calculations that affect eviction and quota checks.
🔧 Suggested fix
  result = subprocess.run(
-     ["du", "-s", path],
+     ["du", "-sk", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )

  if result.returncode == 0:
      # du output format: "size_bytes\tpath"
-     size_str = result.stdout.split()[0]
-     return int(size_str)
+     size_kb = int(result.stdout.split()[0])
+     return size_kb * 1024
🧹 Nitpick comments (1)
app/modules/parsing/graph_construction/parsing_helper.py (1)
51-53: Rename the unused loop variable to avoid lint noise. dirnames isn't used; rename it to _dirnames (or re-enable dir filtering) to align with Ruff B007.
♻️ Suggested tweak
- for dirpath, dirnames, filenames in os.walk(path, followlinks=False):
+ for dirpath, _dirnames, filenames in os.walk(path, followlinks=False):
The merge-base changed after approval.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/parsing/graph_construction/parsing_controller.py (1)
117-176: ⚠️ Potential issue | 🔴 Critical
Fix NameError: asyncio is not imported. Line 152 calls asyncio.create_task(...), but only create_task is imported. This will raise at runtime in the demo-repo path. Use the imported create_task (or import asyncio).
🛠️ Proposed fix
- asyncio.create_task(
+ create_task(
      CodeProviderService(db).get_project_structure_async(
          new_project_id
      )
  )
🧹 Nitpick comments (2)
app/modules/parsing/graph_construction/parsing_controller.py (2)
261-265: The db parameter is unused after disabling the async task. Ruff flags db as unused. Consider renaming it to _db or removing it (and updating call sites) to keep the signature clean.
🧹 Minimal lint-friendly tweak
- db: Session,
+ _db: Session,
281-283: Remove commented-out code or guard with a feature flag.
Leaving dead code makes future intent unclear. If it's needed later, wrap it behind a feature flag/config instead of commenting it out.
🧹 Proposed cleanup
- # asyncio.create_task(
- #     CodeProviderService(db).get_project_structure_async(new_project_id)
- # )
The merge-base changed after approval.
The merge-base changed after approval.
The merge-base changed after approval.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
app/modules/repo_manager/repo_manager.py (2)
1304-1369: ⚠️ Potential issue | 🟠 Major
Metadata keys ignore user_id/unique_id, causing collisions. _write_metadata_entry() keys only by repo_name/branch/commit, so multiple users (or multiple unique worktrees for the same ref) overwrite each other. That breaks multi-tenant isolation and eviction accuracy.
🛠️ Suggested direction (propagate user_id/unique_id into the metadata key)
- def _metadata_filename(self, branch: Optional[str], commit_id: Optional[str]) -> str:
+ def _metadata_filename(
+     self,
+     branch: Optional[str],
+     commit_id: Optional[str],
+     user_id: Optional[str] = None,
+     unique_id: Optional[str] = None,
+ ) -> str:
      parts: List[str] = []
      if branch:
          parts.append(f"branch={branch}")
      if commit_id:
          parts.append(f"commit={commit_id}")
+     if user_id:
+         parts.append(f"user={user_id}")
+     if unique_id:
+         parts.append(f"unique={unique_id}")
      if not parts:
          parts.append("default")

Update _metadata_path, _load_metadata_entry, _write_metadata_entry, _delete_metadata_entry, and callers (e.g., register_repo, update_last_accessed, remove_worktree) to thread through user_id and unique_id.
517-557: ⚠️ Potential issue | 🔴 Critical
The code uses du -s (blocks) but treats the output as bytes, breaking eviction thresholds. The comment says "Use 'du -sb'" but the code runs ["du", "-s", path]. On Linux, du -s returns disk usage in blocks (typically 1024 bytes/block), not bytes. The current code treats this as bytes, undercounting volume by roughly 1000x. This will severely distort eviction calculations — repos won't trigger eviction until usage far exceeds the configured limit.
🛠️ Suggested fix
- result = subprocess.run(
-     ["du", "-s", path],
+ result = subprocess.run(
+     ["du", "-sb", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )

This aligns the code with the documented intent and returns bytes directly.
🤖 Fix all issues with AI agents
In `@app/modules/repo_manager/repo_manager.py`:
- Around line 477-513: _format_repo_info currently parses metadata but
discards a top-level "type" field, making bare repos invisible to list_repos
which filters on metadata["type"]; update _format_repo_info to merge
raw_data.get("type") into the parsed metadata (only setting metadata["type"]
when it's missing or empty) before returning so the returned dict's "metadata"
always contains the repo type used by list_repos; reference symbols:
_format_repo_info, raw_data, metadata, list_repos.
- Around line 981-1053: The remove_worktree method currently always deletes
metadata and returns True even when worktree removal fails; change it to track
removal success and only delete metadata (via _delete_metadata_entry) and return
True when the worktree was actually removed. Concretely, in remove_worktree use
a boolean (e.g., worktree_removed = False), set it to True when subprocess.run
returns returncode==0, and if you fall back to shutil.rmtree, attempt removal
and set worktree_removed based on whether worktree_path no longer exists
(os.path.exists). Move the metadata deletion block to run only if
worktree_removed is True, otherwise log the failure and return False. Ensure you
still catch TimeoutExpired/Exception, log them, attempt rmtree, and set
worktree_removed accordingly before deciding metadata deletion and the returned
value.
- Around line 561-602: The metadata's local_path is being set to the repo root
via _get_repo_local_path which can point to a non-git container; change
_update_bare_repo_metadata so local_path is set to the actual bare repo
directory (use _get_bare_repo_path(repo_name) and stringify it) and only fall
back to _get_repo_local_path if the bare repo path does not exist; keep the rest
of the logic (volume calculation via _get_bare_repo_path, writing via
_write_metadata_entry) unchanged and preserve the function name
_update_bare_repo_metadata.
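A small sketch of the fallback described above; the two path helpers are the names quoted in the comment and are treated as assumptions:

from pathlib import Path

def pick_local_path(bare_repo_path: Path, repo_root_path: Path) -> str:
    # Prefer the actual bare repo directory; fall back to the repo root
    # container only when the bare repo does not exist yet.
    return str(bare_repo_path if bare_repo_path.exists() else repo_root_path)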
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
app/modules/parsing/graph_construction/parsing_controller.py (1)
58-70: ⚠️ Potential issue | 🟡 Minor
~ paths are detected but not expanded. os.path.isdir(os.path.expanduser(...)) is used for detection, but the stored repo_path keeps ~, which later fails os.path.exists.
🛠️ Proposed fix
  if is_path:
      # Move from repo_name to repo_path
-     repo_details.repo_path = repo_details.repo_name
+     repo_details.repo_path = os.path.expanduser(repo_details.repo_name)
      repo_details.repo_name = repo_details.repo_path.split("/")[-1]

app/modules/parsing/graph_construction/parsing_service.py (1)
425-510: ⚠️ Potential issue | 🔴 Critical
duplicate_graph is de‑indented and no longer a class method. As written, this becomes a module‑level function and self.duplicate_graph(...) will fail at runtime.
🛠️ Proposed fix
- async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
-     await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
+     async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
+         await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
  ...

app/modules/repo_manager/repo_manager.py (1)
531-543: ⚠️ Potential issue | 🔴 Critical
Volume units mismatch: du -s returns blocks, not bytes. The function docstring specifies "Size in bytes" but the code runs du -s, which returns block counts (typically 512B–4KB per block). This causes directory sizes to be underreported by 500–4000×, breaking volume tracking and eviction limits. The comment claims "Use 'du -sb'" but the actual command omits the -b flag. Replace it with du -b or du -sb to get byte-accurate results.
🛠️ Proposed fix
- # Use 'du -sb' to get size in bytes (summary, bytes)
- # This is more accurate and faster than walking the directory tree
- result = subprocess.run(
-     ["du", "-s", path],
+ # Use 'du -b' to get size in bytes
+ result = subprocess.run(
+     ["du", "-b", path],
      capture_output=True,
      text=True,
      timeout=60,  # 60 second timeout for large directories
  )
🤖 Fix all issues with AI agents
In `@app/modules/conversations/conversation/conversation_service.py`:
- Around line 267-275: The done-callback for fetch_task calls t.exception()
twice and doesn't guard against CancelledError; change the callback used with
asyncio.create_task(_fetch_structure_with_timeout()) so it first checks
t.cancelled() and returns early for cancelled tasks, then calls t.exception()
once (catching CancelledError if needed), stores that result in a variable, and
only calls logger.exception("Failed to fetch project structure", exc_info=...)
when a real exception is present—refer to fetch_task,
_fetch_structure_with_timeout, and Task.exception()/Task.cancelled() when making
this change.
In `@app/modules/repo_manager/repo_manager.py`:
- Around line 951-963: register_repo is currently keyed only by (repo_name,
branch, commit_id) so multiple unique worktrees (metadata with unique_id and
differing user_id) overwrite each other; update the metadata keying logic used
by register_repo and the helpers that build storage paths (e.g.,
_metadata_filename / _metadata_path) to include user_id and/or unique_id (or
introduce a dedicated namespace for worktree entries) so each worktree metadata
entry is stored under a unique composite key; change calls that create metadata
(see metadata dict in this snippet) to ensure unique_id and user_id are present
and adjust lookup/cleanup logic to account for the extended key to preserve
multi‑tenant isolation.
🧹 Nitpick comments (1)
app/modules/parsing/graph_construction/parsing_controller.py (1)
273-280: The db parameter is unused in handle_new_project. Ruff flags this (ARG004). Consider removing it from the signature or using it (e.g., for async helpers).
# Register worktree in metadata
metadata = {"type": self._TYPE_WORKTREE, "is_commit": is_commit}
if unique_id:
    metadata["unique_id"] = unique_id

self.register_repo(
    repo_name=repo_name,
    local_path=str(worktree_path),
    branch=None if is_commit else ref,
    commit_id=ref if is_commit else None,
    user_id=user_id,
    metadata=metadata,
)
Unique worktree metadata overwrites existing entries.
register_repo() keys metadata by (repo_name, branch, commit_id) only, so multiple unique worktrees for the same ref (or multiple users) clobber each other. This breaks multi‑tenant isolation and cleanup targeting.
Suggested direction: include user_id/unique_id in the metadata key (e.g., extend _metadata_filename/_metadata_path) or store unique worktree entries under a dedicated namespace.
🤖 Prompt for AI Agents
In `@app/modules/repo_manager/repo_manager.py` around lines 951 - 963,
register_repo is currently keyed only by (repo_name, branch, commit_id) so
multiple unique worktrees (metadata with unique_id and differing user_id)
overwrite each other; update the metadata keying logic used by register_repo and
the helpers that build storage paths (e.g., _metadata_filename / _metadata_path)
to include user_id and/or unique_id (or introduce a dedicated namespace for
worktree entries) so each worktree metadata entry is stored under a unique
composite key; change calls that create metadata (see metadata dict in this
snippet) to ensure unique_id and user_id are present and adjust lookup/cleanup
logic to account for the extended key to preserve multi‑tenant isolation.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/parsing/graph_construction/parsing_service.py (1)
431-520: ⚠️ Potential issue | 🔴 Critical
duplicate_graph is no longer inside the class. It is currently module-scoped but uses self, which will raise at call sites like parsing_service.duplicate_graph(...).
🧭 Suggested fix (indent into the class)
- async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
-     await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
+     async def duplicate_graph(self, old_repo_id: str, new_repo_id: str):
+         await self.search_service.clone_search_indices(old_repo_id, new_repo_id)
  ...
🤖 Fix all issues with AI agents
In `@app/modules/parsing/graph_construction/parsing_controller.py`:
- Around line 286-292: The parameter named db: Session is unused in the function
that takes repo_details: ParsingRequest, user_id, user_email, new_project_id,
project_manager: ProjectService, db: Session; rename db to _db to silence lint
warnings (or if callers can be updated, remove the parameter entirely) and
update any call sites or references accordingly so the signature and callers
remain consistent.
- Around line 181-197: The code creates an async email task regardless of
whether user_email is present, causing background errors when user_email is
None; update the block around create_task/EmailHelper().send_email so it only
constructs email_task and registers _on_email_done if user_email (or a truthy
email) is available — i.e., wrap the create_task call, the
definition/registration of _on_email_done, and email_task.add_done_callback in a
conditional (if user_email:) so no task or callback is created when user_email
is missing.
email_task = create_task(
    EmailHelper().send_email(
        user_email, repo_name, repo_details.branch_name
    )
)

def _on_email_done(t: asyncio.Task) -> None:
    if t.cancelled():
        return
    try:
        exc = t.exception()
    except asyncio.CancelledError:
        return
    if exc is not None:
        logger.exception("Failed to send email", exc_info=exc)

email_task.add_done_callback(_on_email_done)
Skip email task when user_email is missing.
user_email is now optional; creating a send task with None will just log background errors. Guard it.
💡 Suggested fix
- email_task = create_task(
- EmailHelper().send_email(
- user_email, repo_name, repo_details.branch_name
- )
- )
+ if user_email:
+ email_task = create_task(
+ EmailHelper().send_email(
+ user_email, repo_name, repo_details.branch_name
+ )
+ )
- def _on_email_done(t: asyncio.Task) -> None:
- if t.cancelled():
- return
- try:
- exc = t.exception()
- except asyncio.CancelledError:
- return
- if exc is not None:
- logger.exception("Failed to send email", exc_info=exc)
-
- email_task.add_done_callback(_on_email_done)
+ def _on_email_done(t: asyncio.Task) -> None:
+ if t.cancelled():
+ return
+ try:
+ exc = t.exception()
+ except asyncio.CancelledError:
+ return
+ if exc is not None:
+ logger.exception("Failed to send email", exc_info=exc)
+
+ email_task.add_done_callback(_on_email_done)
🤖 Prompt for AI Agents
In `@app/modules/parsing/graph_construction/parsing_controller.py` around lines
181 - 197, The code creates an async email task regardless of whether user_email
is present, causing background errors when user_email is None; update the block
around create_task/EmailHelper().send_email so it only constructs email_task and
registers _on_email_done if user_email (or a truthy email) is available — i.e.,
wrap the create_task call, the definition/registration of _on_email_done, and
email_task.add_done_callback in a conditional (if user_email:) so no task or
callback is created when user_email is missing.
  repo_details: ParsingRequest,
  user_id: str,
- user_email: str,
+ user_email: str | None,
  new_project_id: str,
  project_manager: ProjectService,
- db: AsyncSession,
+ db: Session,
  ):
Unused db parameter.
If this is intentionally unused, consider renaming to _db to avoid lint noise.
🧹 Suggested fix
- db: Session,
+ _db: Session,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
  repo_details: ParsingRequest,
  user_id: str,
  user_email: str | None,
  new_project_id: str,
  project_manager: ProjectService,
  _db: Session,
  ):
🧰 Tools
🪛 Ruff (0.14.14)
[warning] 291-291: Unused static method argument: db
(ARG004)
🤖 Prompt for AI Agents
In `@app/modules/parsing/graph_construction/parsing_controller.py` around lines
286 - 292, The parameter named db: Session is unused in the function that takes
repo_details: ParsingRequest, user_id, user_email, new_project_id,
project_manager: ProjectService, db: Session; rename db to _db to silence lint
warnings (or if callers can be updated, remove the parameter entirely) and
update any call sites or references accordingly so the signature and callers
remain consistent.



RepoManager now handles all the repositories used for parsing and creates user- and task-specific worktrees. Additionally:
Summary by CodeRabbit
New Features
Refactor
Security