feat: add Dockerfile security and optimization analysis #1

arceuzvx · 2026-01-27T17:59:31Z

Summary by CodeRabbit

New Features
- Dockerfile analysis: security, size, and best-practice checks with severity filtering and security-only mode.
- Image recommendations (hardened/smaller/optimized), multi-stage detection, non-root and HEALTHCHECK checks, and JSON output for CI/CD.
- New CLI commands for Docker analysis with path, JSON, severity, and security-only options.
Documentation
- Added Docker Commands section to README and manual; added built-in manual entry (slim man).

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-27T17:59:53Z

📝 Walkthrough

Walkthrough

Adds a Dockerfile analysis feature: new CLI docker subcommand, a Dockerfile scanner producing structured issues and recommendations, a core image-alternatives module, and corresponding README, man page, and changelog updates for JSON output, severity/security filters, and hardened-image suggestions. (33 words)

Changes

Cohort / File(s)	Summary
Documentation & Changelog `CHANGELOG.md`, `README.md`, `man/slim.1`	Adds Docker Commands / Built-in manual entries, README examples for `slim docker` (`--json`, `--severity`, `--security-only`), and changelog notes describing Docker analysis capabilities.
CLI Integration `slim/cli.py`	Adds `docker` subcommand and handler `cmd_docker_scan(args) -> int`; flags `--json`, `--path`, `--severity`, `--security-only`; formats human/JSON output and returns non-zero on critical issues.
Core Image Alternatives `slim/core/docker_images.py`	New module with `ImageAlternative` dataclass, `AlternativeReason` enum, `IMAGE_ALTERNATIVES` mapping, tag/registry heuristics (`RISKY_TAGS`, `HARDENED_REGISTRIES`, `OPTIMIZED_TAG_MARKERS`), parsing helpers, and `get_alternatives(image)` recommendation engine.
Dockerfile Scanner `slim/scanners/docker_scanner.py`	New scanner with models (`DockerfileIssue`, `ImageRecommendation`, `DockerScanResult`) and functions (`find_dockerfile`, `parse_dockerfile`, `analyze_dockerfile`, `scan_dockerfile`, `get_scan_result_dict`). Detects base images, multi-stage, USER, HEALTHCHECK, secrets, risky tags, ADD/COPY/pipes, package-manager patterns, and emits issues + recommendations.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant Scanner as docker_scanner
    participant Images as docker_images
    participant FS as Filesystem

    User->>CLI: slim docker [--path PATH | --json | --severity ...]
    CLI->>Scanner: scan_dockerfile(project_path or dockerfile_path)
    Scanner->>FS: find_dockerfile(project_path)
    FS-->>Scanner: dockerfile_path
    Scanner->>FS: parse_dockerfile(dockerfile_path)
    FS-->>Scanner: lines (line_number, text)
    Scanner->>Scanner: analyze_dockerfile(lines)
    loop per FROM base image
        Scanner->>Images: get_alternatives(image)
        Images-->>Scanner: ImageAlternative[]
    end
    Scanner-->>CLI: DockerScanResult
    CLI->>CLI: apply filters (severity, security-only), format (human or JSON)
    CLI-->>User: output report (exit code reflects critical issues)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

Poem

🐇 I hop through lines of FROM and RUN,

I sniff for secrets hidden from the sun.
I nudge toward slim, away from "latest" blight,
With carrot-flavored tips, I set things right. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 55.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat: add Dockerfile security and optimization analysis' accurately and concisely summarizes the main changes: introduction of a new Dockerfile analysis feature with security scanning and optimization recommendations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@slim/scanners/docker_scanner.py`:
- Around line 174-183: The parse_dockerfile function currently swallows all
errors; update it to either log the exception with details or re-raise so
failures aren’t silent: catch specific exceptions (e.g., IOError/OSError and
UnicodeDecodeError) around dockerfile_path.read_text and in the except block
call a logger (or processLogger) to record a clear message including
dockerfile_path and the exception (or re-raise the exception after logging),
then return the parsed lines as before; reference the parse_dockerfile function
and the dockerfile_path variable when making this change.

🧹 Nitpick comments (3)

README.md (1)
163-199: Add language specifier to fenced code block.

The example output code block is missing a language identifier. Per markdown best practices, specify a language (e.g., text or plaintext) for consistency with other code blocks in this file.
📝 Proposed fix
 **Example output:**
-```
+```text
 SlimStack Dockerfile Analysis
slim/core/docker_images.py (1)
256-259: Remove redundant no-op assignment.

The conditional if version == "latest": version = "latest" has no effect. This appears to be leftover from development.
🧹 Proposed fix
     # Extract version from tag (e.g., "3.12-bookworm" -> "3.12")
     version = tag.split("-")[0] if "-" in tag else tag
-    if version == "latest":
-        version = "latest"
slim/scanners/docker_scanner.py (1)
15-15: Remove unused import.

format_size is imported but never used in this module.
🧹 Proposed fix
-from slim.core.utils import format_size

slim/scanners/docker_scanner.py

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@slim/core/docker_images.py`:
- Around line 248-271: The version extraction in get_alternatives produces
"latest" for images with no explicit version which yields invalid alternatives
like "python:latest-slim"; update get_alternatives to detect when tag is
"latest" or empty (use get_image_tag result) and skip version templating in that
case: when tag == "latest" or tag == ""/None, format alternative images without
inserting the version (e.g., pass an empty version or remove the "{version}-"
segment) and ensure you strip any leftover separators so IMAGE_ALTERNATIVES
entries produce valid image names; adjust the loop that builds formatted_image
(and keep returning ImageAlternative with reason/description/size_estimate) so
get_alternatives, IMAGE_ALTERNATIVES, get_base_image_name, get_image_tag, and
ImageAlternative are referenced and used accordingly.
- Around line 192-201: The has_risky_tag() function currently checks tags by
exact matching against the RISKY_TAGS set, missing tags with risky substrings
like "3.12-rc1". Update has_risky_tag() to iterate over RISKY_TAGS and check if
any risky substring exists within the provided tag (substring matching) rather
than exact equality, so tags containing risky substrings are properly flagged.

🧹 Nitpick comments (2)

slim/scanners/docker_scanner.py (2)
15-15: Unused import.

format_size is imported but never used in this file.
🧹 Remove unused import
-from slim.core.utils import format_size
129-145: Cleanup patterns may produce false positives for multi-line RUN commands.

The negative lookahead patterns (apt_no_cleanup, npm_no_cleanup, pip_no_cache) check within a single line. Multi-line RUN commands using backslash continuation are common and may have cleanup commands on subsequent lines:
RUN apt-get install -y foo \
    && rm -rf /var/lib/apt/lists/*
This is an inherent limitation of line-by-line analysis. Since these are info severity, the impact is acceptable, but consider adding a note in the suggestion that users should verify multi-line RUN commands.

coderabbitai · 2026-01-27T18:24:41Z

slim/core/docker_images.py

+# Tags that indicate security risk
+RISKY_TAGS = {
+    "latest",
+    "dev",
+    "development",
+    "beta",
+    "rc",
+    "nightly",
+    "unstable",
+}


⚠️ Potential issue | 🟡 Minor

RISKY_TAGS check may miss compound version tags.

has_risky_tag() uses exact matching against this set, so tags like 3.12-rc1, 18-beta, or 2.0-nightly won't be flagged as risky because the full tag string isn't in the set.

🔧 Proposed fix to use substring matching in has_risky_tag()

def has_risky_tag(image: str) -> bool: """Check if an image uses a risky tag.""" tag = get_image_tag(image).lower() - return tag in RISKY_TAGS + # Check for exact match or as substring (e.g., "3.12-rc1" contains "rc") + if tag in RISKY_TAGS: + return True + return any(risky in tag for risky in RISKY_TAGS)

🤖 Prompt for AI Agents

In `@slim/core/docker_images.py` around lines 192 - 201, The has_risky_tag() function currently checks tags by exact matching against the RISKY_TAGS set, missing tags with risky substrings like "3.12-rc1". Update has_risky_tag() to iterate over RISKY_TAGS and check if any risky substring exists within the provided tag (substring matching) rather than exact equality, so tags containing risky substrings are properly flagged.

slim/core/docker_images.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@slim/core/docker_images.py`:
- Around line 204-223: The tag parsing currently splits on ":" globally causing
registry:port and digest refs to be mis-parsed; update get_image_tag to first
strip any digest by splitting on "@" and taking the left side, then take only
the final path segment after the last "/" and parse a tag from that segment by
splitting on the last ":" (returning the part after the last ":" or "latest" if
none). Also apply the same digest stripping to get_base_image_name (remove
anything after "@") before removing registry and tag so base names are computed
correctly; reference get_image_tag and get_base_image_name to locate and change
the logic.

🧹 Nitpick comments (1)

slim/core/docker_images.py (1)
279-280: Remove unreachable duplicate return.

There are two consecutive return result statements; the second is dead code.
♻️ Proposed fix
-    return result
-    return result
+    return result

slim/core/docker_images.py

…arceuzvx/SlimStack into feature/dockerfile-optimization merge

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@slim/core/docker_images.py`:
- Around line 279-280: The function in slim/core/docker_images.py contains a
duplicated final statement "return result" (two consecutive returns); remove the
second, unreachable "return result" so the function ends with a single return,
ensuring no change to logic in the surrounding function (look for the duplicate
"return result" occurrences to locate and delete the trailing one).

🧹 Nitpick comments (2)

slim/core/docker_images.py (2)
216-222: Add docstring for consistency.

The function is missing a docstring. For consistency with the other functions in this module, add a brief docstring describing the purpose and behavior.
📝 Suggested docstring
 def get_image_tag(image: str) -> str:
+    """Extract the tag from an image string."""
     # Drop digest if present
     image = image.split("@", 1)[0]
259-261: Simplify redundant assignment.

The code assigns version = "latest" when version is already "latest". This self-assignment is redundant and can be removed or clarified with a comment.
♻️ Proposed simplification
-    # Skip version templating for 'latest' - use a placeholder or skip templated alternatives
-    if version == "latest":
-        version = "latest"  # Hardcoded alternatives will still work
+    # Note: When version is 'latest', templated alternatives will be skipped below
+    # and only hardcoded alternatives (without {version}) will be returned

slim/core/docker_images.py

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@slim/core/docker_images.py`:
- Around line 256-280: In get_alternatives, the computed formatted string is not
used so callers receive template strings; update the function to append the
resolved alternative (use the formatted value) rather than the original
template: when building results inside the for loop, replace adding alt (or
alt.image) with a constructed ImageAlternative that contains the formatted image
string (use formatted for alt.image) so the returned list contains concrete
image names; keep the existing checks (skip when formatted == image and when
"{version}" present but version is empty) and preserve other alt metadata when
constructing the returned ImageAlternative.
- Around line 219-223: get_image_registry currently only treats the first path
segment as a registry when it contains a dot, so images like
"localhost:5000/repo" are misclassified; update get_image_registry to treat the
first segment as a registry if it contains a dot OR a colon OR equals
"localhost" (i.e., check parts[0] for "." or ":" or exact "localhost") so
private registries with ports or localhost hosts are returned correctly; ensure
callers such as is_hardened_image use the updated get_image_registry behavior.

coderabbitai · 2026-01-27T19:07:15Z

slim/core/docker_images.py

+def get_image_registry(image: str) -> Optional[str]:
+    parts = image.split("/")
+    if len(parts) > 1 and "." in parts[0]:
+        return parts[0]
+    return None


⚠️ Potential issue | 🟡 Minor

Detect registries with ports / localhost.

get_image_registry() returns None for images like localhost:5000/repo because it only checks for "." in the first segment. That drops legitimate registries and makes is_hardened_image() unreliable for private registries.

🔧 Suggested fix

def get_image_registry(image: str) -> Optional[str]: - parts = image.split("/") - if len(parts) > 1 and "." in parts[0]: + image = _strip_digest(image) + parts = image.split("/") + if len(parts) > 1 and ( + "." in parts[0] or ":" in parts[0] or parts[0] == "localhost" + ): return parts[0] return None

🤖 Prompt for AI Agents

In `@slim/core/docker_images.py` around lines 219 - 223, get_image_registry currently only treats the first path segment as a registry when it contains a dot, so images like "localhost:5000/repo" are misclassified; update get_image_registry to treat the first segment as a registry if it contains a dot OR a colon OR equals "localhost" (i.e., check parts[0] for "." or ":" or exact "localhost") so private registries with ports or localhost hosts are returned correctly; ensure callers such as is_hardened_image use the updated get_image_registry behavior.

coderabbitai · 2026-01-27T19:07:15Z

slim/core/docker_images.py

+def get_alternatives(image: str) -> list[ImageAlternative]:
+    base_name = get_base_image_name(image)
+    tag = get_image_tag(image).lower()
+    version = _extract_version(tag)
+
+    alternatives = IMAGE_ALTERNATIVES.get(base_name, [])
+    results: list[ImageAlternative] = []
+
+    for alt in alternatives:
+        if "{version}" in alt.image and not version:
+            continue
+
+        formatted = (
+            alt.image.format(name=base_name, version=version)
+            if "{version}" in alt.image
+            else alt.image
+        )
+
+        # Do not recommend the same image
+        if formatted == image:
+            continue
+
+        results.append(alt)
+
+    return results


⚠️ Potential issue | 🟠 Major

Return formatted alternatives, not templates.

formatted is computed but never used in the returned list; callers will get template strings like {name}:{version}-slim. This breaks the recommendation output.

🔧 Suggested fix

- results.append(alt) + results.append( + ImageAlternative( + image=formatted, + reason=alt.reason, + description=alt.description, + size_estimate=alt.size_estimate, + ) + )

🤖 Prompt for AI Agents

In `@slim/core/docker_images.py` around lines 256 - 280, In get_alternatives, the computed formatted string is not used so callers receive template strings; update the function to append the resolved alternative (use the formatted value) rather than the original template: when building results inside the for loop, replace adding alt (or alt.image) with a constructed ImageAlternative that contains the formatted image string (use formatted for alt.image) so the returned list contains concrete image names; keep the existing checks (skip when formatted == image and when "{version}" present but version is empty) and preserve other alt metadata when constructing the returned ImageAlternative.

feat: add Dockerfile security and optimization analysis

5394185