Skip to content

Conversation

@evanscastonguay
Copy link
Contributor

@evanscastonguay evanscastonguay commented Jan 10, 2026

User description

Summary

  • Support GitLab clone URLs on custom domains (not only gitlab.*)
  • Preserve legacy parsing for gitlab.* hostnames
  • Add unit tests for gitlab.com, custom domains, and invalid URLs

Testing

  • python3 -m pytest tests/unittest/test_gitlab_provider.py

PR Type

Bug fix, Tests


Description

  • Refactor GitLab clone URL parsing to support custom domains

    • Use urllib.parse.urlparse() for robust URL handling
    • Support any GitLab host, not just gitlab.* domains
  • Preserve legacy parsing as fallback for raw URLs

  • Add comprehensive unit tests for gitlab.com, custom domains, and invalid URLs


Diagram Walkthrough

flowchart LR
  A["Clone URL Input"] --> B{"Has http/https scheme?"}
  B -->|Yes| C["Parse with urlparse"]
  C --> D["Extract scheme, host, path"]
  D --> E["Build authenticated URL"]
  E --> F["Return clone URL"]
  B -->|No| G{"Contains gitlab.?"}
  G -->|Yes| H["Legacy gitlab. parsing"]
  H --> F
  G -->|No| I["Return error"]
Loading

File Walkthrough

Relevant files
Bug fix
gitlab_provider.py
Refactor clone URL parsing for custom domains                       

pr_agent/git_providers/gitlab_provider.py

  • Replaced string-based gitlab. domain splitting with
    urllib.parse.urlparse() for robust URL parsing
  • Added support for custom GitLab domains (e.g., gitlab.example.com) in
    addition to gitlab.* domains
  • Simplified token validation logic to check only for access token
    existence
  • Preserved legacy parsing as fallback for non-HTTP URLs
  • Improved error handling with detailed logging and exception artifacts
+29/-15 
Tests
test_gitlab_provider.py
Add clone URL parsing tests                                                           

tests/unittest/test_gitlab_provider.py

  • Added provider.gl.oauth_token = "fake_token" to fixture for token
    availability
  • Added test for gitlab.com clone URL with token injection
  • Added test for custom domain clone URL (e.g., gitlab.example.com)
  • Added test for invalid URL without scheme handling
+25/-0   

@qodo-free-for-open-source-projects
Copy link
Contributor

qodo-free-for-open-source-projects bot commented Jan 10, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Token exposure in URL

Description: The access token is embedded directly in the clone URL string, which may be logged,
cached, or exposed through error messages, potentially leaking sensitive credentials.
gitlab_provider.py [971-971]

Referred Code
    return f"{parsed.scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
except Exception as exc:
Credential logging risk

Description: The error logging includes the full repo_url_to_clone which may contain embedded
credentials if the URL was already authenticated, potentially exposing tokens in logs.
gitlab_provider.py [973-976]

Referred Code
get_logger().error(
    f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
    artifact={"error": str(exc)},
)
Ticket Compliance
🎫 No ticket provided
  • Create ticket/issue
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Consistent Naming Conventions

Objective: All new variables, functions, and classes must follow the project's established naming
standards

Status: Passed

No Dead or Commented-Out Code

Objective: Keep the codebase clean by ensuring all submitted code is active and necessary

Status: Passed

Robust Error Handling

Objective: Ensure potential errors and edge cases are anticipated and handled gracefully throughout
the code

Status: Passed

Single Responsibility for Functions

Objective: Each function should have a single, well-defined responsibility

Status: Passed

🔴
When relevant, utilize early return

Objective: In a code snippet containing multiple logic conditions (such as 'if-else'), prefer an
early return on edge cases than deep nesting

Status:
Missing early return: The function checks if not access_token at line 958 but returns None at line 960 without
an early return statement, continuing to execute subsequent code blocks unnecessarily.

Referred Code
if not access_token:
    get_logger().error("No access token found for GitLab clone.")
    return None
  • Update
Compliance status legend 🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-free-for-open-source-projects
Copy link
Contributor

qodo-free-for-open-source-projects bot commented Jan 10, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Improve parsing for non-HTTP URLs
Suggestion Impact:The commit directly implements the suggested refactoring. It replaces the legacy string-splitting approach (splitting by "gitlab.") with urlparse-based parsing, adds SCP-style URL handling, and improves error handling with try-except blocks. The implementation matches the suggestion's approach almost exactly.

code diff:

+        # Fallback for non-HTTP URLs (e.g., ssh or scp-style).
+        try:
+            if "@" in repo_url_to_clone and ":" in repo_url_to_clone and not repo_url_to_clone.startswith("ssh://"):
+                # Handle SCP-like URLs: git@gitlab.com:group/repo.git
+                repo_url_to_clone = "ssh://" + repo_url_to_clone.replace(":", "/", 1)
+
+            from urllib.parse import urlparse
+            parsed = urlparse(repo_url_to_clone)
+            if not parsed.netloc:
+                raise ValueError("missing host")
+
+            netloc = parsed.netloc.split("@")[-1]
+            scheme = parsed.scheme if parsed.scheme else "https"
+            return f"{scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
+        except Exception as e:
             get_logger().error(
-                f"Repo URL: {repo_url_to_clone} is missing prefix: {scheme} and/or base URL: {base_url}."
+                f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
+                artifact={"error": str(e)},
             )
             return None

Refactor the fallback URL parsing logic to use urlparse for non-HTTP(S) URLs,
including SCP-style git URLs, instead of relying on string splitting by gitlab..

pr_agent/git_providers/gitlab_provider.py [979-989]

 # Fallback to legacy gitlab.* parsing when a raw URL is provided.
-if "gitlab." not in repo_url_to_clone:
-    get_logger().error(f"Repo URL: {repo_url_to_clone} is not a valid gitlab URL.")
-    return None
-scheme, base_url = repo_url_to_clone.split("gitlab.")
-if not all([scheme, base_url]):
+try:
+    if "@" in repo_url_to_clone and ":" in repo_url_to_clone and not repo_url_to_clone.startswith("ssh://"):
+        # Handle SCP-like URLs, e.g., git@gitlab.com:group/repo.git
+        repo_url_to_clone = "ssh://" + repo_url_to_clone.replace(":", "/", 1)
+
+    parsed = urlparse(repo_url_to_clone)
+    if not parsed.netloc:
+        raise ValueError("missing host")
+
+    netloc = parsed.netloc.split("@")[-1]
+    scheme = parsed.scheme if parsed.scheme else "https"
+    return f"{scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
+except Exception as exc:
     get_logger().error(
-        f"Repo URL: {repo_url_to_clone} is missing prefix: {scheme} and/or base URL: {base_url}."
+        f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
+        artifact={"error": str(exc)},
     )
     return None
-return f"{scheme}oauth2:{access_token}@gitlab.{base_url}"

[Suggestion processed]

Suggestion importance[1-10]: 8

__

Why: This suggestion significantly improves the robustness of the fallback URL parsing logic, which was the main goal of the PR, by correctly handling non-HTTP and custom domain URLs.

Medium
Learned
best practice
Use consistent exception naming convention
Suggestion Impact:The suggestion was directly implemented. The exception variable was renamed from 'exc' to 'e' in both the except clause (line 6) and the error logging statement (line 10). Additionally, a new exception handler was added later in the code (line 34) that also uses 'e' as the exception variable name, following the same convention.

code diff:

-            except Exception as exc:
+            except Exception as e:
                 get_logger().error(
                     f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
-                    artifact={"error": str(exc)},
+                    artifact={"error": str(e)},

The exception is caught as 'exc' but the pattern should use 'as e' for
consistency with the codebase's error handling conventions. This ensures uniform
exception handling across the project.

pr_agent/git_providers/gitlab_provider.py [965-977]

 try:
     from urllib.parse import urlparse
     parsed = urlparse(repo_url_to_clone)
     if not parsed.scheme or not parsed.netloc:
         raise ValueError("missing scheme or host")
     netloc = parsed.netloc.split("@")[-1]
     return f"{parsed.scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
-except Exception as exc:
+except Exception as e:
     get_logger().error(
         f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
-        artifact={"error": str(exc)},
+        artifact={"error": str(e)},
     )
     return None

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why:
Relevant best practice - When catching exceptions in try-except blocks, always capture the exception object using 'as e' syntax to enable proper error logging and debugging.

Low
General
Remove redundant import statement
Suggestion Impact:The commit removed the redundant import statement on line 5 (previously line 966) as suggested. The commit also made additional changes including renaming the exception variable from `exc` to `e` and refactoring the fallback URL parsing logic, but the core suggestion to remove the redundant import was implemented.

code diff:

-                from urllib.parse import urlparse

Remove the redundant from urllib.parse import urlparse statement inside the
_prepare_clone_url_with_token method, as it is already imported at the file
level.

pr_agent/git_providers/gitlab_provider.py [962-977]

 # Note: GitLab instances are not always hosted under a gitlab.* domain.
 # Build a clone URL that works with any host (e.g., gitlab.example.com).
 if repo_url_to_clone.startswith(("http://", "https://")):
     try:
-        from urllib.parse import urlparse
         parsed = urlparse(repo_url_to_clone)
         if not parsed.scheme or not parsed.netloc:
             raise ValueError("missing scheme or host")
         netloc = parsed.netloc.split("@")[-1]
         return f"{parsed.scheme}://oauth2:{access_token}@{netloc}{parsed.path}"
     except Exception as exc:
         get_logger().error(
             f"Repo URL: {repo_url_to_clone} could not be parsed for clone.",
             artifact={"error": str(exc)},
         )
         return None

[Suggestion processed]

Suggestion importance[1-10]: 3

__

Why: The suggestion correctly identifies a redundant import statement, and removing it is a good code cleanup practice that improves maintainability.

Low
  • Update
  • Author self-review: I have reviewed the PR code suggestions, and addressed the relevant ones.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant