Skip to content

Conversation

@arceuzvx
Copy link
Owner

@arceuzvx arceuzvx commented Jan 27, 2026

Summary by CodeRabbit

  • New Features

    • Dockerfile analysis: security, size, and best-practice checks with severity filtering and security-only mode.
    • Image recommendations (hardened/smaller/optimized), multi-stage detection, non-root and HEALTHCHECK checks, and JSON output for CI/CD.
    • New CLI commands for Docker analysis with path, JSON, severity, and security-only options.
  • Documentation

    • Added Docker Commands section to README and manual; added built-in manual entry (slim man).

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Jan 27, 2026

📝 Walkthrough

Walkthrough

Adds a Dockerfile analysis feature: new CLI docker subcommand, a Dockerfile scanner producing structured issues and recommendations, a core image-alternatives module, and corresponding README, man page, and changelog updates for JSON output, severity/security filters, and hardened-image suggestions. (33 words)

Changes

Cohort / File(s) Summary
Documentation & Changelog
CHANGELOG.md, README.md, man/slim.1
Adds Docker Commands / Built-in manual entries, README examples for slim docker (--json, --severity, --security-only), and changelog notes describing Docker analysis capabilities.
CLI Integration
slim/cli.py
Adds docker subcommand and handler cmd_docker_scan(args) -> int; flags --json, --path, --severity, --security-only; formats human/JSON output and returns non-zero on critical issues.
Core Image Alternatives
slim/core/docker_images.py
New module with ImageAlternative dataclass, AlternativeReason enum, IMAGE_ALTERNATIVES mapping, tag/registry heuristics (RISKY_TAGS, HARDENED_REGISTRIES, OPTIMIZED_TAG_MARKERS), parsing helpers, and get_alternatives(image) recommendation engine.
Dockerfile Scanner
slim/scanners/docker_scanner.py
New scanner with models (DockerfileIssue, ImageRecommendation, DockerScanResult) and functions (find_dockerfile, parse_dockerfile, analyze_dockerfile, scan_dockerfile, get_scan_result_dict). Detects base images, multi-stage, USER, HEALTHCHECK, secrets, risky tags, ADD/COPY/pipes, package-manager patterns, and emits issues + recommendations.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant Scanner as docker_scanner
    participant Images as docker_images
    participant FS as Filesystem

    User->>CLI: slim docker [--path PATH | --json | --severity ...]
    CLI->>Scanner: scan_dockerfile(project_path or dockerfile_path)
    Scanner->>FS: find_dockerfile(project_path)
    FS-->>Scanner: dockerfile_path
    Scanner->>FS: parse_dockerfile(dockerfile_path)
    FS-->>Scanner: lines (line_number, text)
    Scanner->>Scanner: analyze_dockerfile(lines)
    loop per FROM base image
        Scanner->>Images: get_alternatives(image)
        Images-->>Scanner: ImageAlternative[]
    end
    Scanner-->>CLI: DockerScanResult
    CLI->>CLI: apply filters (severity, security-only), format (human or JSON)
    CLI-->>User: output report (exit code reflects critical issues)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Poem

🐇 I hop through lines of FROM and RUN,

I sniff for secrets hidden from the sun.
I nudge toward slim, away from "latest" blight,
With carrot-flavored tips, I set things right. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: add Dockerfile security and optimization analysis' accurately and concisely summarizes the main changes: introduction of a new Dockerfile analysis feature with security scanning and optimization recommendations.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@slim/scanners/docker_scanner.py`:
- Around line 174-183: The parse_dockerfile function currently swallows all
errors; update it to either log the exception with details or re-raise so
failures aren’t silent: catch specific exceptions (e.g., IOError/OSError and
UnicodeDecodeError) around dockerfile_path.read_text and in the except block
call a logger (or processLogger) to record a clear message including
dockerfile_path and the exception (or re-raise the exception after logging),
then return the parsed lines as before; reference the parse_dockerfile function
and the dockerfile_path variable when making this change.
🧹 Nitpick comments (3)
README.md (1)

163-199: Add language specifier to fenced code block.

The example output code block is missing a language identifier. Per markdown best practices, specify a language (e.g., text or plaintext) for consistency with other code blocks in this file.

📝 Proposed fix
 **Example output:**
-```
+```text
 SlimStack Dockerfile Analysis
slim/core/docker_images.py (1)

256-259: Remove redundant no-op assignment.

The conditional if version == "latest": version = "latest" has no effect. This appears to be leftover from development.

🧹 Proposed fix
     # Extract version from tag (e.g., "3.12-bookworm" -> "3.12")
     version = tag.split("-")[0] if "-" in tag else tag
-    if version == "latest":
-        version = "latest"
slim/scanners/docker_scanner.py (1)

15-15: Remove unused import.

format_size is imported but never used in this module.

🧹 Proposed fix
-from slim.core.utils import format_size

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 248-271: The version extraction in get_alternatives produces
"latest" for images with no explicit version which yields invalid alternatives
like "python:latest-slim"; update get_alternatives to detect when tag is
"latest" or empty (use get_image_tag result) and skip version templating in that
case: when tag == "latest" or tag == ""/None, format alternative images without
inserting the version (e.g., pass an empty version or remove the "{version}-"
segment) and ensure you strip any leftover separators so IMAGE_ALTERNATIVES
entries produce valid image names; adjust the loop that builds formatted_image
(and keep returning ImageAlternative with reason/description/size_estimate) so
get_alternatives, IMAGE_ALTERNATIVES, get_base_image_name, get_image_tag, and
ImageAlternative are referenced and used accordingly.
- Around line 192-201: The has_risky_tag() function currently checks tags by
exact matching against the RISKY_TAGS set, missing tags with risky substrings
like "3.12-rc1". Update has_risky_tag() to iterate over RISKY_TAGS and check if
any risky substring exists within the provided tag (substring matching) rather
than exact equality, so tags containing risky substrings are properly flagged.
🧹 Nitpick comments (2)
slim/scanners/docker_scanner.py (2)

15-15: Unused import.

format_size is imported but never used in this file.

🧹 Remove unused import
-from slim.core.utils import format_size

129-145: Cleanup patterns may produce false positives for multi-line RUN commands.

The negative lookahead patterns (apt_no_cleanup, npm_no_cleanup, pip_no_cache) check within a single line. Multi-line RUN commands using backslash continuation are common and may have cleanup commands on subsequent lines:

RUN apt-get install -y foo \
    && rm -rf /var/lib/apt/lists/*

This is an inherent limitation of line-by-line analysis. Since these are info severity, the impact is acceptable, but consider adding a note in the suggestion that users should verify multi-line RUN commands.

Comment on lines 192 to 201
# Tags that indicate security risk
RISKY_TAGS = {
"latest",
"dev",
"development",
"beta",
"rc",
"nightly",
"unstable",
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

RISKY_TAGS check may miss compound version tags.

has_risky_tag() uses exact matching against this set, so tags like 3.12-rc1, 18-beta, or 2.0-nightly won't be flagged as risky because the full tag string isn't in the set.

🔧 Proposed fix to use substring matching in has_risky_tag()
 def has_risky_tag(image: str) -> bool:
     """Check if an image uses a risky tag."""
     tag = get_image_tag(image).lower()
-    return tag in RISKY_TAGS
+    # Check for exact match or as substring (e.g., "3.12-rc1" contains "rc")
+    if tag in RISKY_TAGS:
+        return True
+    return any(risky in tag for risky in RISKY_TAGS)
🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 192 - 201, The has_risky_tag()
function currently checks tags by exact matching against the RISKY_TAGS set,
missing tags with risky substrings like "3.12-rc1". Update has_risky_tag() to
iterate over RISKY_TAGS and check if any risky substring exists within the
provided tag (substring matching) rather than exact equality, so tags containing
risky substrings are properly flagged.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 204-223: The tag parsing currently splits on ":" globally causing
registry:port and digest refs to be mis-parsed; update get_image_tag to first
strip any digest by splitting on "@" and taking the left side, then take only
the final path segment after the last "/" and parse a tag from that segment by
splitting on the last ":" (returning the part after the last ":" or "latest" if
none). Also apply the same digest stripping to get_base_image_name (remove
anything after "@") before removing registry and tag so base names are computed
correctly; reference get_image_tag and get_base_image_name to locate and change
the logic.
🧹 Nitpick comments (1)
slim/core/docker_images.py (1)

279-280: Remove unreachable duplicate return.

There are two consecutive return result statements; the second is dead code.

♻️ Proposed fix
-    return result
-    return result
+    return result

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 279-280: The function in slim/core/docker_images.py contains a
duplicated final statement "return result" (two consecutive returns); remove the
second, unreachable "return result" so the function ends with a single return,
ensuring no change to logic in the surrounding function (look for the duplicate
"return result" occurrences to locate and delete the trailing one).
🧹 Nitpick comments (2)
slim/core/docker_images.py (2)

216-222: Add docstring for consistency.

The function is missing a docstring. For consistency with the other functions in this module, add a brief docstring describing the purpose and behavior.

📝 Suggested docstring
 def get_image_tag(image: str) -> str:
+    """Extract the tag from an image string."""
     # Drop digest if present
     image = image.split("@", 1)[0]

259-261: Simplify redundant assignment.

The code assigns version = "latest" when version is already "latest". This self-assignment is redundant and can be removed or clarified with a comment.

♻️ Proposed simplification
-    # Skip version templating for 'latest' - use a placeholder or skip templated alternatives
-    if version == "latest":
-        version = "latest"  # Hardcoded alternatives will still work
+    # Note: When version is 'latest', templated alternatives will be skipped below
+    # and only hardcoded alternatives (without {version}) will be returned

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 256-280: In get_alternatives, the computed formatted string is not
used so callers receive template strings; update the function to append the
resolved alternative (use the formatted value) rather than the original
template: when building results inside the for loop, replace adding alt (or
alt.image) with a constructed ImageAlternative that contains the formatted image
string (use formatted for alt.image) so the returned list contains concrete
image names; keep the existing checks (skip when formatted == image and when
"{version}" present but version is empty) and preserve other alt metadata when
constructing the returned ImageAlternative.
- Around line 219-223: get_image_registry currently only treats the first path
segment as a registry when it contains a dot, so images like
"localhost:5000/repo" are misclassified; update get_image_registry to treat the
first segment as a registry if it contains a dot OR a colon OR equals
"localhost" (i.e., check parts[0] for "." or ":" or exact "localhost") so
private registries with ports or localhost hosts are returned correctly; ensure
callers such as is_hardened_image use the updated get_image_registry behavior.

Comment on lines +219 to +223
def get_image_registry(image: str) -> Optional[str]:
parts = image.split("/")
if len(parts) > 1 and "." in parts[0]:
return parts[0]
return None
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Detect registries with ports / localhost.

get_image_registry() returns None for images like localhost:5000/repo because it only checks for "." in the first segment. That drops legitimate registries and makes is_hardened_image() unreliable for private registries.

🔧 Suggested fix
 def get_image_registry(image: str) -> Optional[str]:
-    parts = image.split("/")
-    if len(parts) > 1 and "." in parts[0]:
+    image = _strip_digest(image)
+    parts = image.split("/")
+    if len(parts) > 1 and (
+        "." in parts[0] or ":" in parts[0] or parts[0] == "localhost"
+    ):
         return parts[0]
     return None
🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 219 - 223, get_image_registry
currently only treats the first path segment as a registry when it contains a
dot, so images like "localhost:5000/repo" are misclassified; update
get_image_registry to treat the first segment as a registry if it contains a dot
OR a colon OR equals "localhost" (i.e., check parts[0] for "." or ":" or exact
"localhost") so private registries with ports or localhost hosts are returned
correctly; ensure callers such as is_hardened_image use the updated
get_image_registry behavior.

Comment on lines +256 to +280
def get_alternatives(image: str) -> list[ImageAlternative]:
base_name = get_base_image_name(image)
tag = get_image_tag(image).lower()
version = _extract_version(tag)

alternatives = IMAGE_ALTERNATIVES.get(base_name, [])
results: list[ImageAlternative] = []

for alt in alternatives:
if "{version}" in alt.image and not version:
continue

formatted = (
alt.image.format(name=base_name, version=version)
if "{version}" in alt.image
else alt.image
)

# Do not recommend the same image
if formatted == image:
continue

results.append(alt)

return results
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Return formatted alternatives, not templates.

formatted is computed but never used in the returned list; callers will get template strings like {name}:{version}-slim. This breaks the recommendation output.

🔧 Suggested fix
-        results.append(alt)
+        results.append(
+            ImageAlternative(
+                image=formatted,
+                reason=alt.reason,
+                description=alt.description,
+                size_estimate=alt.size_estimate,
+            )
+        )
🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 256 - 280, In get_alternatives, the
computed formatted string is not used so callers receive template strings;
update the function to append the resolved alternative (use the formatted value)
rather than the original template: when building results inside the for loop,
replace adding alt (or alt.image) with a constructed ImageAlternative that
contains the formatted image string (use formatted for alt.image) so the
returned list contains concrete image names; keep the existing checks (skip when
formatted == image and when "{version}" present but version is empty) and
preserve other alt metadata when constructing the returned ImageAlternative.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants