Skip to content

Path: strengthen Windows invalid-char handling, clarify constructors, document S3 edge cases #197

@ethe

Description

@ethe

Summary

  • The unified Path design is solid across local and remote backends, but a few areas need hardening and clearer guidance:
    • Windows invalid characters in path_to_local can still surface and break file operations.
    • Constructor semantics between parse and From<&str>/child are easy to misuse.
    • S3 keys with “weird” segments (//, ./..) are deliberately unsupported; this should be documented or optionally relaxed.

Current Behavior

  • Core implementation in fusio/src/path/mod.rs and fusio/src/path/parts.rs.
  • Delimiter is / everywhere; segments percent-encode reserved/unsafe chars.
  • Local:
    • Path::from_filesystem_pathfile:// URL → percent-decoded once → validated Path.
    • path_to_local(&Path) builds a file:///... URL and decodes to PathBuf. On Windows, only subsequent colons : are remapped to %3A (the first C: is preserved).
  • S3:
    • Object URL path uses strict encoding; list uses encoded prefix query parameter.
    • Path::parse rejects empty segments and ./...

Problems / Risks

  • Windows invalid characters:
    • path_to_local only special-cases :, but Windows forbids < > : " / \\ | ? * and has trailing dot/space rules. These can survive conversion and cause open/create failures.
  • Constructor semantics surprise:
    • From<&str>/FromIterator/child encode and normalize (empty segments are dropped), but parse only validates and rejects empty/./... The same input string can yield different outcomes depending on the constructor.
  • S3 key edge cases:
    • S3 allows keys with //, ./.., or literal %. Path::parse rejects some of these by design, which is safe but should be explicitly documented. Some ingestion workflows may need a “relaxed” mode.
  • Round-tripping:
    • Local → Path → local on Windows is only partially stable (colons handled), but other invalid characters may not map predictably.
  • Minor:
    • Error::Canonicalize variant is declared but not used.

Proposed Improvements

  • Windows hardening

    • Expand path_to_local (Windows) to percent-encode additional invalid characters per segment: < > : " / \\ | ? *.
    • Preserve the drive C: prefix, continue to encode subsequent colons in the rest of the path.
    • Keep trailing dot/space encoded so underlying FS operations succeed.
    • Add clear comments on why encoding is applied at this boundary to preserve Path semantics while keeping local FS operations valid.
  • Constructor clarity

    • Document and illustrate intended usage:
      • Use Path::from, from_iter, and child for constructing portable paths from application strings (handles encoding).
      • Use Path::parse for external/raw sources (URL paths, S3 keys) to avoid double-encoding; list the validation rules (no empty segments, ./.. forbidden).
    • Consider adding Path::from_os_str_lossy (non-wasm) for ergonomic local construction.
  • S3 edge cases

    • Document that Path::parse rejects keys with ./.. segments or //. Clarify this is a safety/portability choice.
    • Optionally add a clearly-named escape hatch (e.g., Path::parse_relaxed or Path::from_raw_key_unchecked) for ingestion scenarios that must handle arbitrary keys, with warnings about local portability.
  • API polish

    • Add try_to_local_path() returning a structured error on unsafe Windows cases; optionally to_local_path_lossy() to force-encode and always succeed.
    • Either wire in canonicalization or remove the unused Canonicalize error variant.
    • Optionally rename from_url_path to a name that signals one-time decode + validation, or document it clearly.
  • CI/tests

    • Add Windows unit tests for:
      • Invalid chars mapping in path_to_local and round-trip via from_filesystem_path.
      • Drive-letter paths (C:\\foo\\bar), and a UNC sample if supported.
    • Add tests clarifying constructor differences and S3 encoding behavior (including % in keys).
    • Option: add a minimal Windows CI job that runs the path module tests only.

Acceptance Criteria

  • Windows: path_to_local maps all invalid filename characters safely; creating/opening files with those characters (encoded) works.
  • Round-trip tests pass on Windows for typical and edge-case names.
  • Documentation:
    • Clear guidance on parse vs from/child, with examples.
    • S3 edge cases documented, with an optional relaxed constructor if implemented.
  • API:
    • try_to_local_path() (and/or to_local_path_lossy()) added with documented semantics, or decision documented.
    • Canonicalize variant either used or removed.
  • CI: Windows job executes path tests successfully.

Affected Code

  • fusio/src/path/mod.rs (path_to_local, docs for Path API)
  • fusio/src/path/parts.rs (docs for PathPart encoding)
  • S3: fusio/src/impls/remotes/aws/{fs.rs,s3.rs,credential.rs,multipart_upload.rs} (docs only)
  • Tests: fusio/src/path/mod.rs tests; new Windows-specific tests

Backwards Compatibility

  • Changes are mostly additive and documentation-driven.
  • Windows path_to_local behavior expands encoding; paths with invalid chars will now map to different on-disk names than before (more reliable). Note in changelog and consider a feature flag if needed.

Open Questions

  • Do we want a “relaxed” parse for S3 ingestion? If yes, how should it interact with local FS (likely not supported → documented)?
  • Should From<&str> keep silently dropping empty segments, or do we add a strict constructor to avoid surprises?

Checklist

  • Expand Windows invalid-char handling in path_to_local
  • Add Windows unit tests for invalid chars and round-trip
  • Clarify parse vs from/child in docs with examples
  • Document S3 key restrictions; optionally add “relaxed” constructor
  • Add try_to_local_path() or to_local_path_lossy() and document
  • Use or remove Error::Canonicalize
  • Add a minimal Windows CI job for path tests

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions