-
Notifications
You must be signed in to change notification settings - Fork 0
feat: add Dockerfile security and optimization analysis #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
📝 WalkthroughWalkthroughAdds a Dockerfile analysis feature: new CLI Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI as CLI Parser
participant Scanner as docker_scanner
participant Images as docker_images
participant FS as Filesystem
User->>CLI: slim docker [--path PATH | --json | --severity ...]
CLI->>Scanner: scan_dockerfile(project_path or dockerfile_path)
Scanner->>FS: find_dockerfile(project_path)
FS-->>Scanner: dockerfile_path
Scanner->>FS: parse_dockerfile(dockerfile_path)
FS-->>Scanner: lines (line_number, text)
Scanner->>Scanner: analyze_dockerfile(lines)
loop per FROM base image
Scanner->>Images: get_alternatives(image)
Images-->>Scanner: ImageAlternative[]
end
Scanner-->>CLI: DockerScanResult
CLI->>CLI: apply filters (severity, security-only), format (human or JSON)
CLI-->>User: output report (exit code reflects critical issues)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@slim/scanners/docker_scanner.py`:
- Around line 174-183: The parse_dockerfile function currently swallows all
errors; update it to either log the exception with details or re-raise so
failures aren’t silent: catch specific exceptions (e.g., IOError/OSError and
UnicodeDecodeError) around dockerfile_path.read_text and in the except block
call a logger (or processLogger) to record a clear message including
dockerfile_path and the exception (or re-raise the exception after logging),
then return the parsed lines as before; reference the parse_dockerfile function
and the dockerfile_path variable when making this change.
🧹 Nitpick comments (3)
README.md (1)
163-199: Add language specifier to fenced code block.The example output code block is missing a language identifier. Per markdown best practices, specify a language (e.g.,
textorplaintext) for consistency with other code blocks in this file.📝 Proposed fix
**Example output:** -``` +```text SlimStack Dockerfile Analysisslim/core/docker_images.py (1)
256-259: Remove redundant no-op assignment.The conditional
if version == "latest": version = "latest"has no effect. This appears to be leftover from development.🧹 Proposed fix
# Extract version from tag (e.g., "3.12-bookworm" -> "3.12") version = tag.split("-")[0] if "-" in tag else tag - if version == "latest": - version = "latest"slim/scanners/docker_scanner.py (1)
15-15: Remove unused import.
format_sizeis imported but never used in this module.🧹 Proposed fix
-from slim.core.utils import format_size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 248-271: The version extraction in get_alternatives produces
"latest" for images with no explicit version which yields invalid alternatives
like "python:latest-slim"; update get_alternatives to detect when tag is
"latest" or empty (use get_image_tag result) and skip version templating in that
case: when tag == "latest" or tag == ""/None, format alternative images without
inserting the version (e.g., pass an empty version or remove the "{version}-"
segment) and ensure you strip any leftover separators so IMAGE_ALTERNATIVES
entries produce valid image names; adjust the loop that builds formatted_image
(and keep returning ImageAlternative with reason/description/size_estimate) so
get_alternatives, IMAGE_ALTERNATIVES, get_base_image_name, get_image_tag, and
ImageAlternative are referenced and used accordingly.
- Around line 192-201: The has_risky_tag() function currently checks tags by
exact matching against the RISKY_TAGS set, missing tags with risky substrings
like "3.12-rc1". Update has_risky_tag() to iterate over RISKY_TAGS and check if
any risky substring exists within the provided tag (substring matching) rather
than exact equality, so tags containing risky substrings are properly flagged.
🧹 Nitpick comments (2)
slim/scanners/docker_scanner.py (2)
15-15: Unused import.
format_sizeis imported but never used in this file.🧹 Remove unused import
-from slim.core.utils import format_size
129-145: Cleanup patterns may produce false positives for multi-line RUN commands.The negative lookahead patterns (
apt_no_cleanup,npm_no_cleanup,pip_no_cache) check within a single line. Multi-line RUN commands using backslash continuation are common and may have cleanup commands on subsequent lines:RUN apt-get install -y foo \ && rm -rf /var/lib/apt/lists/*This is an inherent limitation of line-by-line analysis. Since these are
infoseverity, the impact is acceptable, but consider adding a note in the suggestion that users should verify multi-line RUN commands.
slim/core/docker_images.py
Outdated
| # Tags that indicate security risk | ||
| RISKY_TAGS = { | ||
| "latest", | ||
| "dev", | ||
| "development", | ||
| "beta", | ||
| "rc", | ||
| "nightly", | ||
| "unstable", | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RISKY_TAGS check may miss compound version tags.
has_risky_tag() uses exact matching against this set, so tags like 3.12-rc1, 18-beta, or 2.0-nightly won't be flagged as risky because the full tag string isn't in the set.
🔧 Proposed fix to use substring matching in has_risky_tag()
def has_risky_tag(image: str) -> bool:
"""Check if an image uses a risky tag."""
tag = get_image_tag(image).lower()
- return tag in RISKY_TAGS
+ # Check for exact match or as substring (e.g., "3.12-rc1" contains "rc")
+ if tag in RISKY_TAGS:
+ return True
+ return any(risky in tag for risky in RISKY_TAGS)🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 192 - 201, The has_risky_tag()
function currently checks tags by exact matching against the RISKY_TAGS set,
missing tags with risky substrings like "3.12-rc1". Update has_risky_tag() to
iterate over RISKY_TAGS and check if any risky substring exists within the
provided tag (substring matching) rather than exact equality, so tags containing
risky substrings are properly flagged.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 204-223: The tag parsing currently splits on ":" globally causing
registry:port and digest refs to be mis-parsed; update get_image_tag to first
strip any digest by splitting on "@" and taking the left side, then take only
the final path segment after the last "/" and parse a tag from that segment by
splitting on the last ":" (returning the part after the last ":" or "latest" if
none). Also apply the same digest stripping to get_base_image_name (remove
anything after "@") before removing registry and tag so base names are computed
correctly; reference get_image_tag and get_base_image_name to locate and change
the logic.
🧹 Nitpick comments (1)
slim/core/docker_images.py (1)
279-280: Remove unreachable duplicate return.There are two consecutive
return resultstatements; the second is dead code.♻️ Proposed fix
- return result - return result + return result
…arceuzvx/SlimStack into feature/dockerfile-optimization merge
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 279-280: The function in slim/core/docker_images.py contains a
duplicated final statement "return result" (two consecutive returns); remove the
second, unreachable "return result" so the function ends with a single return,
ensuring no change to logic in the surrounding function (look for the duplicate
"return result" occurrences to locate and delete the trailing one).
🧹 Nitpick comments (2)
slim/core/docker_images.py (2)
216-222: Add docstring for consistency.The function is missing a docstring. For consistency with the other functions in this module, add a brief docstring describing the purpose and behavior.
📝 Suggested docstring
def get_image_tag(image: str) -> str: + """Extract the tag from an image string.""" # Drop digest if present image = image.split("@", 1)[0]
259-261: Simplify redundant assignment.The code assigns
version = "latest"whenversionis already"latest". This self-assignment is redundant and can be removed or clarified with a comment.♻️ Proposed simplification
- # Skip version templating for 'latest' - use a placeholder or skip templated alternatives - if version == "latest": - version = "latest" # Hardcoded alternatives will still work + # Note: When version is 'latest', templated alternatives will be skipped below + # and only hardcoded alternatives (without {version}) will be returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@slim/core/docker_images.py`:
- Around line 256-280: In get_alternatives, the computed formatted string is not
used so callers receive template strings; update the function to append the
resolved alternative (use the formatted value) rather than the original
template: when building results inside the for loop, replace adding alt (or
alt.image) with a constructed ImageAlternative that contains the formatted image
string (use formatted for alt.image) so the returned list contains concrete
image names; keep the existing checks (skip when formatted == image and when
"{version}" present but version is empty) and preserve other alt metadata when
constructing the returned ImageAlternative.
- Around line 219-223: get_image_registry currently only treats the first path
segment as a registry when it contains a dot, so images like
"localhost:5000/repo" are misclassified; update get_image_registry to treat the
first segment as a registry if it contains a dot OR a colon OR equals
"localhost" (i.e., check parts[0] for "." or ":" or exact "localhost") so
private registries with ports or localhost hosts are returned correctly; ensure
callers such as is_hardened_image use the updated get_image_registry behavior.
| def get_image_registry(image: str) -> Optional[str]: | ||
| parts = image.split("/") | ||
| if len(parts) > 1 and "." in parts[0]: | ||
| return parts[0] | ||
| return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Detect registries with ports / localhost.
get_image_registry() returns None for images like localhost:5000/repo because it only checks for "." in the first segment. That drops legitimate registries and makes is_hardened_image() unreliable for private registries.
🔧 Suggested fix
def get_image_registry(image: str) -> Optional[str]:
- parts = image.split("/")
- if len(parts) > 1 and "." in parts[0]:
+ image = _strip_digest(image)
+ parts = image.split("/")
+ if len(parts) > 1 and (
+ "." in parts[0] or ":" in parts[0] or parts[0] == "localhost"
+ ):
return parts[0]
return None🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 219 - 223, get_image_registry
currently only treats the first path segment as a registry when it contains a
dot, so images like "localhost:5000/repo" are misclassified; update
get_image_registry to treat the first segment as a registry if it contains a dot
OR a colon OR equals "localhost" (i.e., check parts[0] for "." or ":" or exact
"localhost") so private registries with ports or localhost hosts are returned
correctly; ensure callers such as is_hardened_image use the updated
get_image_registry behavior.
| def get_alternatives(image: str) -> list[ImageAlternative]: | ||
| base_name = get_base_image_name(image) | ||
| tag = get_image_tag(image).lower() | ||
| version = _extract_version(tag) | ||
|
|
||
| alternatives = IMAGE_ALTERNATIVES.get(base_name, []) | ||
| results: list[ImageAlternative] = [] | ||
|
|
||
| for alt in alternatives: | ||
| if "{version}" in alt.image and not version: | ||
| continue | ||
|
|
||
| formatted = ( | ||
| alt.image.format(name=base_name, version=version) | ||
| if "{version}" in alt.image | ||
| else alt.image | ||
| ) | ||
|
|
||
| # Do not recommend the same image | ||
| if formatted == image: | ||
| continue | ||
|
|
||
| results.append(alt) | ||
|
|
||
| return results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Return formatted alternatives, not templates.
formatted is computed but never used in the returned list; callers will get template strings like {name}:{version}-slim. This breaks the recommendation output.
🔧 Suggested fix
- results.append(alt)
+ results.append(
+ ImageAlternative(
+ image=formatted,
+ reason=alt.reason,
+ description=alt.description,
+ size_estimate=alt.size_estimate,
+ )
+ )🤖 Prompt for AI Agents
In `@slim/core/docker_images.py` around lines 256 - 280, In get_alternatives, the
computed formatted string is not used so callers receive template strings;
update the function to append the resolved alternative (use the formatted value)
rather than the original template: when building results inside the for loop,
replace adding alt (or alt.image) with a constructed ImageAlternative that
contains the formatted image string (use formatted for alt.image) so the
returned list contains concrete image names; keep the existing checks (skip when
formatted == image and when "{version}" present but version is empty) and
preserve other alt metadata when constructing the returned ImageAlternative.
Summary by CodeRabbit
New Features
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.