I'm currently using pi via a LiteLLM proxy that connects to various Bedrock models.
With lots of help from LLMs I was able to get a semi-working setup, and I would like to discuss some points.

Extension code for the custom provider:
```typescript
import type * as pi from "pi";
import {
  type Api,
  type AssistantMessageEventStream,
  type Context,
  createAssistantMessageEventStream,
  type Model,
  type SimpleStreamOptions,
  streamSimpleOpenAIResponses,
} from "@mariozechner/pi-ai";

export default function (pi: typeof import("pi")) {
  const MY_PROVIDER_BASE_URL = "https://my-provider.url";
  const MY_PROVIDER_API_KEY_CMD = "my-token-refresher-cmd";

  // Comprehensive compat flags to prevent myProvider/OpenAI validation errors
  const myProviderCompat = {
    supportsStore: false,
    requiresToolResultName: false, // Don't add 'name' to tool results (causes empty string errors)
    requiresAssistantAfterToolResult: false,
    requiresMistralToolIds: false,
  };

  function getMyProviderApiKey(): string {
    const { execSync } = require("child_process");
    return execSync(MY_PROVIDER_API_KEY_CMD, { encoding: "utf-8" }).trim();
  }

  /**
   * Post-process the OpenAI Responses API payload to fix Bedrock compatibility.
   *
   * When a tool result contains images, pi-ai inserts a follow-up `role: "user"`
   * message with the image content immediately after the `function_call_output`.
   * LiteLLM's Bedrock proxy doesn't understand this pattern: Bedrock requires that
   * every `tool_use` block is immediately followed by a `tool_result` block, so any
   * extra `user` message injected in between causes:
   *   "tool_use ids were found without tool_result blocks immediately after"
   *
   * Fix: move the image content from the follow-up user message into the
   * `function_call_output`'s `output` field as a list (the OpenAI Responses API
   * supports `output: string | ResponseFunctionCallOutputItemList`). This way the
   * image reaches the model while keeping tool results contiguous, satisfying
   * Bedrock's strict ordering requirement.
   */
  function fixBedrockPayload(params: any): void {
    if (!Array.isArray(params.input)) return;
    const input: any[] = params.input;
    let i = 0;
    while (i < input.length) {
      const item = input[i];
      if (item?.type === "function_call_output" && i + 1 < input.length) {
        const next = input[i + 1];
        // Check if the next item is the synthetic image user message
        if (
          next?.role === "user" &&
          Array.isArray(next.content) &&
          next.content.length > 0 &&
          next.content[0]?.type === "input_text" &&
          next.content[0]?.text === "Attached image(s) from tool result:"
        ) {
          // Move image content into the function_call_output itself.
          // Build a content list: start with the existing text output, then append images.
          const outputList: any[] = [];
          if (typeof item.output === "string" && item.output.length > 0) {
            outputList.push({ type: "input_text", text: item.output });
          }
          for (const part of next.content) {
            if (part.type === "input_image") {
              outputList.push(part);
            }
          }
          item.output = outputList;
          // Remove the now-absorbed follow-up user message
          input.splice(i + 1, 1);
          // Don't advance i; re-check i+1 in case there are consecutive tool results
          continue;
        }
      }
      i++;
    }
  }

  function streamViaOpenAIResponses(model: any, context: any, options: any) {
    const stream = createAssistantMessageEventStream();
    (async () => {
      try {
        const apiKey = getMyProviderApiKey();
        const headers = model?.headers ? { ...model.headers } : undefined;
        if (headers) {
          delete (headers as Record<string, string>).Authorization;
          delete (headers as Record<string, string>).authorization;
        }
        const modelWithAuth = { ...model, baseUrl: MY_PROVIDER_BASE_URL, headers };
        // Disable caching for myProvider to prevent prompt_cache_key being sent to Bedrock.
        // LiteLLM's Bedrock proxy doesn't support OpenAI's prompt_cache_key parameter.
        const originalOnPayload = options?.onPayload;
        const streamOptions = {
          ...options,
          apiKey,
          cacheRetention: "none" as const,
          onPayload: (params: any) => {
            fixBedrockPayload(params);
            originalOnPayload?.(params);
          },
        };
        const innerStream = streamSimpleOpenAIResponses(
          modelWithAuth as Model<"openai-responses">,
          context,
          streamOptions,
        );
        for await (const event of innerStream) {
          stream.push(event);
        }
        stream.end();
      } catch (error) {
        stream.push({
          type: "error",
          reason: "error",
          error: {
            role: "assistant",
            content: [],
            api: model.api,
            provider: model.provider,
            model: model.id,
            usage: {
              input: 0,
              output: 0,
              cacheRead: 0,
              cacheWrite: 0,
              totalTokens: 0,
              cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
            },
            stopReason: "error",
            errorMessage: error instanceof Error ? error.message : String(error),
            timestamp: Date.now(),
          },
        });
        stream.end();
      }
    })();
    return stream;
  }

  // LiteLLM OpenAI Chat Completions compatible provider
  pi.registerProvider("myProvider", {
    baseUrl: MY_PROVIDER_BASE_URL,
    apiKey: `!${MY_PROVIDER_API_KEY_CMD}`,
    authHeader: true,
    api: "openai-responses",
    streamSimple: streamViaOpenAIResponses,
    models: [
      {
        id: "bedrock/anthropic.claude-sonnet-4-6",
        name: "Claude Sonnet 4.6",
        reasoning: true,
        input: ["text", "image"],
        compat: myProviderCompat,
        contextWindow: 200000,
        maxTokens: 64000,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
      {
        id: "bedrock/deepseek-r1",
        name: "DeepSeek R1",
        reasoning: true,
        input: ["text"],
        compat: {
          supportsStore: false,
          supportsFunctionCalling: false, // DeepSeek R1 doesn't support tools
          supportsToolChoice: false,
          supportsVision: false,
        },
        contextWindow: 128000,
        maxTokens: 32768,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
      {
        id: "openai/gpt-5.2",
        name: "GPT-5.2",
        reasoning: false,
        input: ["text", "image"],
        compat: myProviderCompat,
        contextWindow: 272000,
        maxTokens: 128000,
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      },
    ],
  });
}
```
My thoughts and concerns, in random order:
For such a use case, isn't there anything more "batteries included"? Is it expected and correct to override `streamSimple` this way?
`apiKey` with `!` commands seems to only run when the session starts, and the result is then cached forever. Is there a plan to eventually support rotation with an expiration in minutes or similar? Is it a proper solution to fetch the key and inject the headers on each request inside the stream function, as above (assuming the command used to refresh the token already handles caching internally)?
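In the meantime, per-request rotation can be approximated by wrapping the refresh command in a small time-based cache, so calling it on every request stays cheap even if the command itself is slow. This is only a sketch; the command string and the 5-minute TTL are placeholders, not anything pi prescribes:

```typescript
import { execSync } from "child_process";

// Hypothetical TTL cache around a token-refresher command: the command is
// re-run only when the cached token is older than maxAgeMs, so calling
// getToken() per request is safe.
function makeTokenCache(cmd: string, maxAgeMs: number) {
  let token: string | undefined;
  let fetchedAt = 0;
  return function getToken(): string {
    const now = Date.now();
    if (token === undefined || now - fetchedAt > maxAgeMs) {
      token = execSync(cmd, { encoding: "utf-8" }).trim();
      fetchedAt = now;
    }
    return token;
  };
}

// Example: refresh at most every 5 minutes (placeholder command).
const getMyProviderApiKey = makeTokenCache("my-token-refresher-cmd", 5 * 60 * 1000);
```

The closure only shells out lazily, so constructing it at module load is free; the `streamSimple` override would then call `getMyProviderApiKey()` per request as before.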
`myProviderCompat` looks peculiar, but maybe that is expected for LiteLLM or specific models. I assume it would make sense to have a broad preset plus guidelines for specific exceptions (e.g. DeepSeek not supporting tools), but in general, is this a good approach, defining custom compats like that?
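The "broad preset plus exceptions" idea can at least be expressed locally with object spread, so each model only states what differs from the proxy-wide baseline. The flag names below just mirror the ones used in the extension above; which flags actually exist (and their defaults) is up to pi-ai, so treat this as a pattern, not an API:

```typescript
// One broad preset for everything behind the LiteLLM/Bedrock proxy.
const litellmBedrockCompat = {
  supportsStore: false,
  requiresToolResultName: false,
  requiresAssistantAfterToolResult: false,
  requiresMistralToolIds: false,
};

// Per-model exception: DeepSeek R1 layers "no tools / no vision" on top
// of the base preset without mutating it.
const deepseekR1Compat = {
  ...litellmBedrockCompat,
  supportsFunctionCalling: false,
  supportsToolChoice: false,
  supportsVision: false,
};
```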
That manual error handling looks dodgy; is there a default handler for that? (Or should there be one?)
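Short of a built-in default, the boilerplate could at least be factored into a helper so each custom provider doesn't repeat the zeroed-out usage block. The field names below are simply copied from the `catch` block in the snippet above; whether this matches pi-ai's actual error-event type is an assumption:

```typescript
// Hypothetical helper that builds the error event pushed in the catch block
// above: an assistant message with zeroed usage and the error text attached.
function makeErrorEvent(
  model: { api: string; provider: string; id: string },
  error: unknown,
) {
  const zeroUsage = {
    input: 0, output: 0, cacheRead: 0, cacheWrite: 0, totalTokens: 0,
    cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 },
  };
  return {
    type: "error" as const,
    reason: "error" as const,
    error: {
      role: "assistant" as const,
      content: [],
      api: model.api,
      provider: model.provider,
      model: model.id,
      usage: zeroUsage,
      stopReason: "error" as const,
      errorMessage: error instanceof Error ? error.message : String(error),
      timestamp: Date.now(),
    },
  };
}
```

The `catch` block then shrinks to `stream.push(makeErrorEvent(model, error)); stream.end();`.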
The most horrible piece of code is the logic for image inputs (I used to get 400 errors when an image-reading tool was triggered). To me it looks like a hack, and likely very error-prone (even though so far it seems to work on simple tool usages).
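To make the behavior of that rewrite concrete, here is the same splice logic as a standalone sketch run against a minimal fake `input` array (plain objects, no pi-ai imports; the item shapes are copied from the extension above):

```typescript
// Standalone copy of the payload rewrite: absorb the synthetic
// "Attached image(s) from tool result:" user message into the preceding
// function_call_output so tool results stay contiguous for Bedrock.
function fixBedrockPayload(params: { input?: any[] }): void {
  if (!Array.isArray(params.input)) return;
  const input = params.input;
  let i = 0;
  while (i < input.length) {
    const item = input[i];
    const next = input[i + 1];
    if (
      item?.type === "function_call_output" &&
      next?.role === "user" &&
      Array.isArray(next.content) &&
      next.content[0]?.type === "input_text" &&
      next.content[0]?.text === "Attached image(s) from tool result:"
    ) {
      // Fold the text output and the images into one output list.
      const outputList: any[] = [];
      if (typeof item.output === "string" && item.output.length > 0) {
        outputList.push({ type: "input_text", text: item.output });
      }
      for (const part of next.content) {
        if (part.type === "input_image") outputList.push(part);
      }
      item.output = outputList;
      input.splice(i + 1, 1); // drop the absorbed user message
      continue; // re-check this index for consecutive tool results
    }
    i++;
  }
}

// Before: a function_call_output followed by the injected image user message.
const params = {
  input: [
    { type: "function_call", call_id: "c1", name: "read_image", arguments: "{}" },
    { type: "function_call_output", call_id: "c1", output: "ok" },
    {
      role: "user",
      content: [
        { type: "input_text", text: "Attached image(s) from tool result:" },
        { type: "input_image", image_url: "data:image/png;base64,..." },
      ],
    },
  ],
};
fixBedrockPayload(params);
// After: two items remain, and the image now lives inside the
// function_call_output's output list, right after the "ok" text.
```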
I also noticed issues with thinking levels, where sometimes enabling thinking shows the thinking stream but never the final response.
Not expecting a thorough code review :P but I wanted to share this for a general direction, especially because this is probably a relatively common setup in enterprise settings, and maybe other people have encountered the same challenges and found better solutions. There may be opportunities to extract some reusable logic, but I'm not 100% sure what, or how "custom" different setups can get.