Welcome to this quickstart guide on integrating Cloudsmith artifact management with your Sagemaker workflows.
You will need a Cloudsmith repository with:
- 2 Python upstreams
- an upstream for the distilgpt2 base model
Install: awscli v2, docker, python3.
Copy the .env-example to .env and fill in:
CLOUDSMITH_NAMESPACE=
CLOUDSMITH_REPO=your-
CLOUDSMITH_USERNAME=
CLOUDSMITH_API_KEY=
AWS_REGION=us-east-1
Before running the scripts below, make sure your current shell session is authenticated with the AWS CLI, and that you are logged in (via docker login) to the Cloudsmith Docker repository you created.
The bootstrap script creates a secret, an S3 bucket, an IAM role, a VPC, and a registry-auth Lambda, then auto-updates .env with the bootstrap output:
bash infrastructure/bootstrap.sh
export DOCKER_BUILDKIT=1
bash scripts/build_and_push_training.sh
bash scripts/build_and_push_inference.sh
python scripts/launch_training_job.py --epochs 1 --instance-type ml.m5.large
Training downloads base model from Cloudsmith HF endpoint, fine‑tunes, then uploads a new revision and records CLOUDSMITH_HF_FINETUNED_REVISION.
python scripts/deploy_inference_endpoint.py \
--endpoint-name cloudsmith-inference-endpoint \
--startup-timeout 180
aws sagemaker-runtime invoke-endpoint \
--endpoint-name cloudsmith-inference-endpoint \
--region $AWS_REGION \
--content-type application/json \
--cli-binary-format raw-in-base64-out \
--body '{"inputs":"Hi"}' /dev/stdout
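The same invocation can be made from Python with boto3's sagemaker-runtime client. The helper below simply mirrors the CLI arguments above (the endpoint name is the one deployed earlier):

```python
import json

def build_invoke_args(endpoint_name, text):
    """Keyword arguments for sagemaker-runtime invoke_endpoint, mirroring the CLI call."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": text}),
    }

# Usage (requires AWS credentials and the deployed endpoint):
# import boto3, os
# client = boto3.client("sagemaker-runtime", region_name=os.environ["AWS_REGION"])
# response = client.invoke_endpoint(**build_invoke_args("cloudsmith-inference-endpoint", "Hi"))
# print(response["Body"].read().decode())
```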
Use infrastructure/cleanup.sh to remove demo resources (endpoint, models/configs, IAM roles, Lambda, VPC + sub-resources, S3 bucket, secret).
Note that some VPC resources may need to be removed manually.
Dry run:
DRY_RUN=true SKIP_CONFIRM=true bash infrastructure/cleanup.sh
Cleanup:
bash infrastructure/cleanup.sh
Use Cloudsmith’s Hugging Face-compatible endpoint (upload and snapshot download only). Upload a model:
from huggingface_hub import HfApi
import os
ns = os.environ['CLOUDSMITH_NAMESPACE']
repo = os.environ['CLOUDSMITH_REPO']
token = os.environ['CLOUDSMITH_API_KEY'] # or fetched from Secrets Manager
endpoint = f"https://huggingface.cloudsmith.io/{ns}/{repo}"
api = HfApi(token=token, endpoint=endpoint)
api.upload_folder(
    folder_path="/opt/ml/model",  # directory with model files
    repo_id="distilbert-base-uncased-finetuned",
    repo_type="model",
    token=token,
)

Download a snapshot:

from huggingface_hub import HfApi
import os
ns = os.environ['CLOUDSMITH_NAMESPACE']
repo = os.environ['CLOUDSMITH_REPO']
token = os.environ['CLOUDSMITH_API_KEY']
endpoint = f"https://huggingface.cloudsmith.io/{ns}/{repo}"
api = HfApi(token=token, endpoint=endpoint)
local_dir = "/opt/ml/model"
api.snapshot_download(
    repo_id="distilbert-base-uncased-finetuned",  # or base model name
    repo_type="model",
    revision="main",  # or a specific uploaded revision hash/tag
    local_dir=local_dir,
    token=token,
)

SageMaker must reach the private Cloudsmith registry (container and Hugging Face endpoints) over the public internet. When you use a private image, launch the training job or endpoint inside a VPC so that the underlying instance has:
- Private subnets (we create two) where the job runs.
- An egress path (NAT) so those subnets can reach docker.cloudsmith.io and huggingface.cloudsmith.io to pull the image and model files.
- A security group that allows outbound HTTPS (inbound is not needed for pulls).
If egress is missing (no NAT / route), image pull or model download will fail.
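A quick way to verify egress from inside the VPC (for example, from a debug container running in the private subnets) is to test outbound TCP on port 443 to both Cloudsmith endpoints. This is an illustrative check, not part of the repo's scripts:

```python
import socket

CLOUDSMITH_HOSTS = ["docker.cloudsmith.io", "huggingface.cloudsmith.io"]

def check_egress(hosts=CLOUDSMITH_HOSTS, port=443, timeout=5):
    """Return {host: True/False} for outbound TCP connectivity on the given port."""
    results = {}
    for host in hosts:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[host] = True
        except OSError:  # DNS failure, timeout, or refused connection
            results[host] = False
    return results
```

If either host comes back False, check the NAT route on the private subnets and the security group's outbound rules before re-running the job.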
AWS does not currently authenticate to private third‑party registries during SageMaker CreateModel for real‑time inference; only ECR (or public/no‑auth images) are supported.