Skip to content

Fix Jindo DataLoad without target for multi mount Datasets#5631

Open
Nakshatra480 wants to merge 2 commits intofluid-cloudnative:masterfrom
Nakshatra480:fix-jindo-dataload-multi-mount
Open

Fix Jindo DataLoad without target for multi mount Datasets#5631
Nakshatra480 wants to merge 2 commits intofluid-cloudnative:masterfrom
Nakshatra480:fix-jindo-dataload-multi-mount

Conversation

@Nakshatra480
Copy link

Closes #4439

Summary

This PR fixes Jindo DataLoad behavior when spec.target is not set but the target Dataset has multiple mounts.

  • If DataLoad.spec.target is specified, behavior is unchanged.
  • If DataLoad.spec.target is empty, the controller now derives targetPaths from Dataset.spec.mounts (one entry per mount path, replicas default to 1).
  • Adds a unit test to cover the no target, multi mount case for genDataLoadValue.

This addresses the issue where a JindoRuntime DataLoad could fail when no target was provided for a dataset with multiple mount points (see issue #4439).

Implementation details

  • Updated JindoEngine.genDataLoadValue to:

    • Keep existing logic for explicit spec.target.
    • Fallback to iterating targetDataset.Spec.Mounts when spec.target is empty, building cdataload.TargetPath entries with:
      • Path from the mount Path
      • Replicas set to 1
      • FluidNative decided via utils.IsTargetPathUnderFluidNativeMounts.
  • Extended Test_genDataLoadValue to include:

    • "dataset with multiple mounts and no explicit target":
      • Verifies that two mount paths (/mnt0, /mnt1) are translated into TargetPaths with correct Replicas and FluidNative values.
      • Confirms options such as loadMemorydata and hdfsConfig are still set as before.

Why this change

For JindoRuntime, users reasonably expect a DataLoad without spec.target to preload all dataset mounts. Previously, this scenario was not handled explicitly, leading to failures when the dataset had multiple mount points. By falling back to dataset mounts, we align the controller behavior with user expectations and the other runtimes’ patterns.

Testing

  • Unit tests:
    • go test ./pkg/ddc/jindo -run Test_genDataLoadValue$
  • All existing Test_genDataLoadValue cases continue to pass.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Nakshatra480, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the JindoRuntime's data loading capabilities by introducing a more robust handling mechanism for datasets with multiple mount points when no explicit target is specified in the DataLoad configuration. Previously, such configurations could lead to failures; now, the system intelligently infers the intended target paths from the dataset's mounts, aligning its behavior with user expectations and other runtime patterns. This change improves the reliability and user-friendliness of data loading for multi-mount Jindo datasets.

Highlights

  • Jindo DataLoad Fix: Resolved an issue where Jindo DataLoad operations would fail when spec.target was not explicitly set, but the target Dataset had multiple mount points.
  • Automatic Target Path Derivation: When DataLoad.spec.target is empty, the controller now automatically derives targetPaths from the Dataset.spec.mounts, creating one entry per mount path with replicas defaulting to 1.
  • Unit Test Coverage: A new unit test has been added to cover the specific scenario of a DataLoad without a target on a dataset with multiple mounts, ensuring correct behavior for genDataLoadValue.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request correctly fixes an issue where a Jindo DataLoad would fail if no target paths were specified for a dataset with multiple mounts. The logic now correctly falls back to using the dataset's mount points, with a default replica count of 1. The implementation is clear and the addition of a unit test for the multi-mount scenario is a great improvement. I've added one suggestion to further enhance test coverage by including an edge case.

Signed-off-by: Nakshatra Sharma <nakshatra.sharma3012@gmail.com>
@Nakshatra480 Nakshatra480 force-pushed the fix-jindo-dataload-multi-mount branch from 6f7a6bf to 406ea8d Compare January 29, 2026 09:48
Signed-off-by: Nakshatra Sharma <nakshatrasharma@Nakshatras-MacBook-Air.local>
@sonarqubecloud
Copy link

@Nakshatra480
Copy link
Author

@cheyang pls review these changes and give me the feedback if anything comes up thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] JindoRuntime Dataload failed without specifying the target field for multiple mountpoints in Dataset

1 participant