Skip to content

feat(sagemaker): Reconnection toolkit #8485

Merged
laileni-aws merged 26 commits intoaws:masterfrom
msgupta-amazon:reconnection-toolkit
Feb 23, 2026
Merged

feat(sagemaker): Reconnection toolkit #8485
laileni-aws merged 26 commits intoaws:masterfrom
msgupta-amazon:reconnection-toolkit

Conversation

@msgupta-amazon
Copy link
Contributor

@msgupta-amazon msgupta-amazon commented Jan 6, 2026

##Problem
Right now, we don't support reconnection from toolkit for hyperpod spaces.

##Solution
This PR will add the reconnection functionality which leads to seamless user experience.

The duplicate check is failing but we need that code at 2 places, one for initial connection and in detached-server so that when the user closes the main window, detached server is able to provide the seamless user experience.

##Notes
Tested the unit test cases and reconnection use cases end to end locally.


  • Treat all work as PUBLIC. Private feature/x branches will not be squash-merged at release time.
  • Your code changes must meet the guidelines in CONTRIBUTING.md.
  • License: I confirm that my contribution is made under the terms of the Apache 2.0 license.

@msgupta-amazon msgupta-amazon requested a review from a team as a code owner January 6, 2026 21:33
@github-actions
Copy link

github-actions bot commented Jan 6, 2026

  • This pull request implements a feat or fix, so it must include a changelog entry (unless the fix is for an unreleased feature). Review the changelog guidelines.
    • Note: beta or "experiment" features that have active users should announce fixes in the changelog.
    • If this is not a feature or fix, use an appropriate type from the title guidelines. For example, telemetry-only changes should use the telemetry type.

@msgupta-amazon msgupta-amazon changed the title feat(Sagemaker): Reconnection toolkit feat(sagemaker): Reconnection toolkit Jan 6, 2026
@msgupta-amazon
Copy link
Contributor Author

lint update

@msgupta-amazon msgupta-amazon reopened this Jan 6, 2026
@msgupta-amazon msgupta-amazon marked this pull request as draft January 9, 2026 18:03
@msgupta-amazon msgupta-amazon marked this pull request as ready for review January 9, 2026 22:51
@aws-ajangg
Copy link
Contributor

has this been tested locally? if so, can you update the description

appType
appType,
workspaceName,
undefined,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please reconsider refactoring prepareDevEnvConnection to accept an options object instead of many ordered arguments. Especially since it seems like some of them can be optional/undefined. When the list of arguments eventually grows, it can be hard for others to use it because they have to maintain the same order of arguments, a lot of which may not be relevant to them.

@laileni-aws laileni-aws closed this Feb 9, 2026
@laileni-aws laileni-aws reopened this Feb 9, 2026
@laileni-aws laileni-aws closed this Feb 9, 2026
@laileni-aws laileni-aws reopened this Feb 9, 2026
@amazon-inspector-ohio
Copy link

⏳ I'm reviewing this pull request for security vulnerabilities and code quality issues. I'll provide an update when I'm done

@amazon-inspector-ohio
Copy link

✅ I finished the code review, and didn't find any security or code quality issues.

mgupta-om and others added 2 commits February 11, 2026 15:27
- Parse API response JSON once instead of 3 times for better efficiency
- Add error handling for JSON parsing failures
- Extract complex connection key fallback logic to separate function
- Add URL encoding for query parameters to handle special characters
- Replace ChildProcess with native execFile in detached server to avoid vscode dependency
- Add SAGEMAKER_LOCAL_SERVER_FILE_PATH environment variable to hyperpod connection
- Store wsUrl and token in hyperpod mapping for reconnection support
return JSON.parse(content)
} catch (err: any) {
if (err.code === 'ENOENT') {
return {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we return empty here?

export class KubectlClient {
private kubeConfig: k8s.KubeConfig
private k8sApi: k8s.CustomObjectsApi
private eksCluster: any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assign datatype instead of any?

Copy link
Contributor

@laileni-aws laileni-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Need internal team approval
  • Need to add a changeLog

@laileni-aws
Copy link
Contributor

/retryBuilds

@NewtonDer
Copy link
Contributor

Approved

try {
// Attempt to fetch EKS cluster details using AWS CLI
const region = connectionInfo.region || parse(connectionInfo.clusterArn).region
const { stdout } = await execFileAsync('aws', [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using the CLI instead of a typescript SDK? This will fail if the user does not have cli installed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, is this the "Deeplink" use case? If so can you clarify the UX we decided on.

@@ -0,0 +1,4 @@
{
"type": "Feature",
"description": "We can connect to hyperpod spaces through toolkit but now user will be able to save their work in case of connection disruption as we now support reconnection."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the right wording here?
Kinda sounds like we are releasing connect to hyperpod spaces and that the user can manually save their work.

Using kiro:
"This update enhances HyperPod Space connectivity in the Toolkit by adding reconnection support. In the event of a connection disruption, users can now seamlessly reconnect to their existing session and resume work without losing in-progress changes."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed

@laileni-aws
Copy link
Contributor

/retryBuilds

@ashishrp-aws
Copy link
Contributor

/retryBuilds

@laileni-aws
Copy link
Contributor

/retryBuilds

@laileni-aws
Copy link
Contributor

/retryBuilds

@laileni-aws laileni-aws merged commit 747f18c into aws:master Feb 23, 2026
25 of 34 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants