I'm currently using pi via a LiteLLM proxy that connects to various Bedrock models.
With lots of help from LLMs I was able to get a semi-working setup, and I would like to discuss some points.

Extension code for the custom provider:
```typescript
import type * as pi from "pi";
import {
  type Api,
  type AssistantMessageEventStream,
  type Context,
  createAssistantMessageEventStream,
  type Model,
  type SimpleStreamOptions,
  streamSimpleOpenAIResponses,
} from "@mariozechner/pi-ai";

export default function (pi: typeof import("pi")) {
  const MY_PROVIDER_BASE_URL = "https://my-provider.url";
  const MY_PROVIDER_API_KEY_CMD = "my-token-refresher-cmd";

  // Comprehensive compat flags to prevent myProvider/OpenAI validation errors
  const myProviderCompat = {
    supportsStore: false,
    requiresToolResultName: false, // Don't add 'name' to tool results (causes empty string errors)
    requiresAssistantAfterToolResult: false,
    requiresMistralToolIds: false,
  };

  function getMyProviderApiKey(): string {
    const { execSync } = require("child_process");
    return execSync(MY_PROVIDER_API_KEY_CMD, { encoding: "utf-8" }).trim();
  }

  /**
   * Post-process the OpenAI Responses API payload to fix Bedrock compatibility.
   *
   * When a tool result contains images, pi-ai inserts a follow-up `role: "user"`
   * message with the image content immediately after the `function_call_output`.
   * LiteLLM's Bedrock proxy doesn't understand this pattern: Bedrock requires that
   * every `tool_use` block is immediately followed by a `tool_result` block, so any
   * extra `user` message injected in between causes:
   *   "tool_use ids were found without tool_result blocks immediately after"
   *
   * Fix: move the image content from the follow-up user message into the
   * `function_call_output`'s `output` field as a list (the OpenAI Responses API
   * supports `output: string | ResponseFunctionCallOutputItemList`). This way the
   * image reaches the model while keeping tool results contiguous, satisfying
   * Bedrock's strict ordering requirement.
   */
  function fixBedrockPayload(params: any): void {
    if (!Array.isArray(params.input)) return;
    const input: any[] = params.input;
    let i = 0;
    while (i < input.length) {
      const item = input[i];
      if (item?.type === "function_call_output" && i + 1 < input.length) {
        const next = input[i + 1];
        // Check if the next item is the synthetic image user message
        if (
          next?.role === "user" &&
          Array.isArray(next.content) &&
          next.content.length > 0 &&
          next.content[0]?.type === "input_text" &&
          next.content[0]?.text === "Attached image(s) from tool result:"
        ) {
          // Move image content into the function_call_output itself.
          // Build a content list: start with the existing text output, then append images.
          const outputList: any[] = [];
          if (typeof item.output === "string" && item.output.length > 0) {
            outputList.push({ type: "input_text", text: item.output });
          }
          for (const part of next.content) {
            if (part.type === "input_image") {
              outputList.push(part);
            }
          }
          item.output = outputList;
          // Remove the now-absorbed follow-up user message
          input.splice(i + 1, 1);
          // Don't advance i; re-check i+1 in case there are consecutive tool results
          continue;
        }
      }
      i++;
    }
  }

  function streamViaOpenAIResponses(model: any, context: any, options: any) {
    const stream = createAssistantMessageEventStream();
    (async () => {
      try {
        const apiKey = getMyProviderApiKey();
        const headers = model?.headers ? { ...model.headers } : undefined;
        if (headers) {
          delete (headers as Record<string, string>).Authorization;
          delete (headers as Record<string, string>).authorization;
        }
        const modelWithAuth = { ...model, baseUrl: MY_PROVIDER_BASE_URL, headers };
        // Disable caching for myProvider to prevent prompt_cache_key being sent to Bedrock.
        // LiteLLM's Bedrock proxy doesn't support OpenAI's prompt_cache_key parameter.
        const originalOnPayload = options?.onPayload;
        const streamOptions = {
          ...options,
          apiKey,
          cacheRetention: "none" as const,
          onPayload: (params: any) => {
            fixBedrockPayload(params);
            originalOnPayload?.(params);
          },
        };
        const innerStream = streamSimpleOpenAIResponses(
          modelWithAuth as Model<"openai-responses">,
          context,
          streamOptions,
        );
        for await (const event of innerStream) {
          stream.push(event);
        }
        stream.end();
      } catch (error) {
        stream.push({
          type: "error",
          reason: "error",
          error: {
            role: "assistant",
            content: [],
            api: model.api,
            provider: model.provider,
            model: model.id,
            usage: {
              input: 0,
              output: 0,
              cacheRead: 0,
              cacheWrite: 0,
              totalTokens: 0,
              cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
            },
            stopReason: "error",
            errorMessage: error instanceof Error ? error.message : String(error),
            timestamp: Date.now(),
          },
        });
        stream.end();
      }
    })();
    return stream;
  }

  // LiteLLM OpenAI Chat Completions compatible provider
  pi.registerProvider("myProvider", {
    baseUrl: MY_PROVIDER_BASE_URL,
    apiKey: `!${MY_PROVIDER_API_KEY_CMD}`,
    authHeader: true,
    api: "openai-responses",
    streamSimple: streamViaOpenAIResponses,
    models: [
      {
        id: "bedrock/anthropic.claude-sonnet-4-6",
        name: "Claude Sonnet 4.6",
        reasoning: true,
        input: ["text", "image"],
        compat: myProviderCompat,
        contextWindow: 200000,
        maxTokens: 64000,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
      {
        id: "bedrock/deepseek-r1",
        name: "DeepSeek R1",
        reasoning: true,
        input: ["text"],
        compat: {
          supportsStore: false,
          supportsFunctionCalling: false, // DeepSeek R1 doesn't support tools
          supportsToolChoice: false,
          supportsVision: false,
        },
        contextWindow: 128000,
        maxTokens: 32768,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
      {
        id: "openai/gpt-5.2",
        name: "GPT-5.2",
        reasoning: false,
        input: ["text", "image"],
        compat: myProviderCompat,
        contextWindow: 272000,
        maxTokens: 128000,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
    ],
  });
}
```
My thoughts and concerns, in random order:
For such a use case, isn't there anything more "batteries included"? Is it expected and correct to override `streamSimple` this way?
`apiKey` with `!` commands seems to only run when the session starts, and the result is then cached forever. Is there a plan to eventually support rotation with an expiration in minutes or similar? Is it a proper solution to fetch the key and inject the headers on each request inside the stream function, as above (assuming the command used to refresh the token already handles caching internally)?
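In the meantime, per-request rotation can be approximated by wrapping the refresh command in a small time-based cache, so calling it on every request stays cheap even if the command itself is slow. This is only a sketch; the command string and the 5-minute TTL are placeholders, not anything pi prescribes:

```typescript
import { execSync } from "child_process";

// Hypothetical TTL cache around a token-refresher command: the command is
// re-run only when the cached token is older than maxAgeMs, so calling
// getToken() per request is safe.
function makeTokenCache(cmd: string, maxAgeMs: number) {
  let token: string | undefined;
  let fetchedAt = 0;
  return function getToken(): string {
    const now = Date.now();
    if (token === undefined || now - fetchedAt > maxAgeMs) {
      token = execSync(cmd, { encoding: "utf-8" }).trim();
      fetchedAt = now;
    }
    return token;
  };
}

// Example: refresh at most every 5 minutes (placeholder command).
const getMyProviderApiKey = makeTokenCache("my-token-refresher-cmd", 5 * 60 * 1000);
```

The closure only shells out lazily, so constructing it at module load is free; the `streamSimple` override would then call `getMyProviderApiKey()` per request as before.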
`myProviderCompat` looks peculiar, but maybe that is expected for LiteLLM or specific models. I assume it would make sense to have a broad preset plus guidelines for specific exceptions (e.g. DeepSeek not supporting tools), but in general, is this a good approach, defining custom compats like that?
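The "broad preset plus exceptions" idea can at least be expressed locally with object spread, so each model only states what differs from the proxy-wide baseline. The flag names below just mirror the ones used in the extension above; which flags actually exist (and their defaults) is up to pi-ai, so treat this as a pattern, not an API:

```typescript
// One broad preset for everything behind the LiteLLM/Bedrock proxy.
const litellmBedrockCompat = {
  supportsStore: false,
  requiresToolResultName: false,
  requiresAssistantAfterToolResult: false,
  requiresMistralToolIds: false,
};

// Per-model exception: DeepSeek R1 layers "no tools / no vision" on top
// of the base preset without mutating it.
const deepseekR1Compat = {
  ...litellmBedrockCompat,
  supportsFunctionCalling: false,
  supportsToolChoice: false,
  supportsVision: false,
};
```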
That manual error handling looks dodgy; is there a default handler for that? (Or should there be one?)
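Short of a built-in default, the boilerplate could at least be factored into a helper so each custom provider doesn't repeat the zeroed-out usage block. The field names below are simply copied from the `catch` block in the snippet above; whether this matches pi-ai's actual error-event type is an assumption:

```typescript
// Hypothetical helper that builds the error event pushed in the catch block
// above: an assistant message with zeroed usage and the error text attached.
function makeErrorEvent(
  model: { api: string; provider: string; id: string },
  error: unknown,
) {
  const zeroUsage = {
    input: 0, output: 0, cacheRead: 0, cacheWrite: 0, totalTokens: 0,
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
  };
  return {
    type: "error" as const,
    reason: "error" as const,
    error: {
      role: "assistant" as const,
      content: [],
      api: model.api,
      provider: model.provider,
      model: model.id,
      usage: zeroUsage,
      stopReason: "error" as const,
      errorMessage: error instanceof Error ? error.message : String(error),
      timestamp: Date.now(),
    },
  };
}
```

The `catch` block then shrinks to `stream.push(makeErrorEvent(model, error)); stream.end();`.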
The most horrible piece of code is the logic for image inputs (I used to get 400 errors when an image-reading tool was triggered). To me it looks like a hack, and likely very error-prone (even though so far it seems to work on simple tool usages).
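To make the behavior of that rewrite concrete, here is the same splice logic as a standalone sketch run against a minimal fake `input` array (plain objects, no pi-ai imports; the item shapes are copied from the extension above):

```typescript
// Standalone copy of the payload rewrite: absorb the synthetic
// "Attached image(s) from tool result:" user message into the preceding
// function_call_output so tool results stay contiguous for Bedrock.
function fixBedrockPayload(params: { input?: any[] }): void {
  if (!Array.isArray(params.input)) return;
  const input = params.input;
  let i = 0;
  while (i < input.length) {
    const item = input[i];
    const next = input[i + 1];
    if (
      item?.type === "function_call_output" &&
      next?.role === "user" &&
      Array.isArray(next.content) &&
      next.content[0]?.type === "input_text" &&
      next.content[0]?.text === "Attached image(s) from tool result:"
    ) {
      // Fold the text output and the images into one output list.
      const outputList: any[] = [];
      if (typeof item.output === "string" && item.output.length > 0) {
        outputList.push({ type: "input_text", text: item.output });
      }
      for (const part of next.content) {
        if (part.type === "input_image") outputList.push(part);
      }
      item.output = outputList;
      input.splice(i + 1, 1); // drop the absorbed user message
      continue; // re-check this index for consecutive tool results
    }
    i++;
  }
}

// Before: a function_call_output followed by the injected image user message.
const params = {
  input: [
    { type: "function_call", call_id: "c1", name: "read_image", arguments: "{}" },
    { type: "function_call_output", call_id: "c1", output: "ok" },
    {
      role: "user",
      content: [
        { type: "input_text", text: "Attached image(s) from tool result:" },
        { type: "input_image", image_url: "data:image/png;base64,..." },
      ],
    },
  ],
};
fixBedrockPayload(params);
// After: two items remain, and the image now lives inside the
// function_call_output's output list, right after the "ok" text.
```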
I also noticed issues with thinking levels, where sometimes enabling thinking shows the thinking stream but never the final response.
Not expecting a thorough code review :P but I wanted to share this for a general direction, especially because this is probably a relatively common setup in enterprise settings, and maybe other people have encountered the same challenges and found better solutions. There may be opportunities to extract some reusable logic, but I'm not 100% sure what, or how "custom" different setups can get.