A unified Node.js API gateway for OpenAI, Anthropic, and Mistral with intelligent caching, comprehensive cost tracking, and error normalization. Built with TypeScript and Fastify.
Repository: https://github.com/natiassefa/AI-Proxy
- Multi-Provider Support: Unified interface for OpenAI, Anthropic (Claude), and Mistral AI
- Streaming Support (SSE): Real-time token streaming using Server-Sent Events for all providers
- MCP Server Integration: Connect to Model Context Protocol (MCP) servers for automatic tool execution
- Automatic Tool Execution: AI models can automatically use MCP tools during conversations
- Intelligent Caching: Optional Redis caching to reduce API costs and improve response times
- Cost Tracking: Detailed per-model cost tracking with breakdowns for input/output tokens
- Type-Safe: Full TypeScript support with comprehensive type definitions
- Error Handling: Normalized error responses across providers, with the expected request schema included on validation failures
- Hot Reloading: Development server with automatic reload on file changes
- Request Validation: Zod schema validation with human-readable error messages
- Node.js 18+
- pnpm (or npm/yarn)
- API keys for at least one provider (OpenAI, Anthropic, or Mistral)
# Clone the repository
git clone https://github.com/natiassefa/AI-Proxy.git
cd AI-Proxy
# Install dependencies
pnpm install
# Copy environment template
cp .env.example .env

Edit your .env file with your API keys:
# Server Configuration
PORT=8080
# AI Provider API Keys (at least one required)
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
MISTRAL_API_KEY=your_mistral_api_key_here
# Optional: Redis Cache Configuration
# For local Redis with Docker:
REDIS_URL=redis://localhost:6379
# Optional: MCP Servers Configuration
# Create mcp-servers.json file at project root (see MCP section below)

# Development mode (with hot reloading)
pnpm dev
# Production mode
pnpm build
pnpm start

The server will start on http://localhost:8080 (or your configured port).
Caching is completely optional. The server works perfectly without Redis, but enabling it can significantly reduce API costs and improve response times for repeated requests.
The easiest way to run Redis locally is with Docker:
# Run Redis in a Docker container
docker run -d \
--name redis-aiproxy \
-p 6379:6379 \
redis:7-alpine
# Or with Docker Compose (create docker-compose.yml):

docker-compose.yml:
version: "3.8"
services:
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
volumes:
redis-data:Then run:
docker-compose up -d

Once Redis is running, add to your .env:
REDIS_URL=redis://localhost:6379

The cache will automatically:
- Cache responses for 10 minutes (600 seconds)
- Reduce redundant API calls
- Improve response times for cached requests
Note: If REDIS_URL is not set, the server will work normally without caching - no errors, no warnings, just direct API calls.
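For context, a cache layer like this is usually a thin wrapper around ioredis that hashes the request body into a key and skips itself when no REDIS_URL is configured. The sketch below is illustrative only and is not the project's actual src/utils/cache.ts; the helper names are made up:

// Illustrative sketch only -- not the project's actual cache code.
import Redis from "ioredis";
import { createHash } from "node:crypto";

const redis = process.env.REDIS_URL ? new Redis(process.env.REDIS_URL) : null;
const CACHE_TTL_SECONDS = 600; // matches the 10-minute TTL described above

// Hypothetical helper: derive a deterministic cache key from the request body.
function cacheKey(body: unknown): string {
  return "chat:" + createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

export async function getCachedResponse(body: unknown): Promise<string | null> {
  if (!redis) return null; // no REDIS_URL: fall straight through to the provider API
  return redis.get(cacheKey(body));
}

export async function cacheResponse(body: unknown, response: string): Promise<void> {
  if (!redis) return;
  await redis.set(cacheKey(body), response, "EX", CACHE_TTL_SECONDS); // expire after 10 minutes
}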
The proxy supports connecting to MCP servers to enable automatic tool execution. MCP servers expose tools and resources that AI models can use during conversations.
Create a mcp-servers.json file at the project root:
[
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
},
{
"name": "remote-http",
"transport": "http",
"url": "http://localhost:8000/mcp",
"headers": {
"Authorization": "Bearer your-token-here"
},
"timeout": 30000
},
{
"name": "remote-sse",
"transport": "sse",
"url": "http://localhost:8001",
"headers": {
"Authorization": "Bearer your-token-here"
},
"timeout": 30000,
"reconnectDelay": 1000,
"maxReconnectAttempts": 5
}
]

Transport Types:
- stdio: Local process transport (requires command and args)
- http: HTTP-based transport for remote servers (requires url)
- sse: Server-Sent Events transport for remote servers (requires url)
Configuration Options:
- name: Unique identifier for the MCP server
- transport: Transport type (stdio, http, or sse)
- url: Server URL (required for http and sse transports)
- command: Command to execute (required for stdio transport)
- args: Command arguments (optional, for stdio transport)
- env: Environment variables (optional, for stdio transport)
- headers: HTTP headers (optional, for http and sse transports)
- timeout: Request timeout in milliseconds (optional, default: 30000)
- reconnectDelay: Initial reconnect delay in ms (optional, for sse, default: 1000)
- maxReconnectAttempts: Maximum reconnect attempts (optional, for sse, default: 5)
The server will automatically discover and connect to configured MCP servers on startup.
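For reference, the options above imply a per-server config shape roughly like the following TypeScript type. This is just a restatement of the list, not the project's source (the real definitions presumably live in src/mcp/types.ts):

// Restates the configuration options above as a type; field names follow the list.
type McpServerConfig = {
  name: string;                    // unique identifier for the MCP server
  transport: "stdio" | "http" | "sse";
  // stdio transport
  command?: string;                // required for stdio
  args?: string[];                 // optional command arguments
  env?: Record<string, string>;    // optional environment variables
  // http / sse transports
  url?: string;                    // required for http and sse
  headers?: Record<string, string>;
  timeout?: number;                // request timeout in ms (default 30000)
  // sse only
  reconnectDelay?: number;         // initial reconnect delay in ms (default 1000)
  maxReconnectAttempts?: number;   // default 5
};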
List all available MCP tools:
curl http://localhost:8080/v1/mcp/tools

Get tools for a specific server:
curl http://localhost:8080/v1/mcp/filesystem/tools

Get information about a specific tool:
curl http://localhost:8080/v1/mcp/tools/read_file

List MCP servers and their status:
curl http://localhost:8080/v1/mcp/servers

Call an MCP tool directly (for testing):
curl -X POST http://localhost:8080/v1/mcp/tools/read_file/call \
-H "Content-Type: application/json" \
-d '{"arguments":{"path":"README.md"}}'Enable automatic tool execution by setting useMcpTools: true:
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Read README.md and summarize it"}
],
"useMcpTools": true
}'

What happens automatically:
- Proxy includes MCP tools in the request
- AI model requests a tool (e.g., read_file)
- Proxy executes the tool via MCP protocol
- Tool results are sent back to the model
- Model generates final response using tool results
All tool execution happens automatically in a single request!
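Conceptually, the proxy runs a loop like the sketch below. This is not the project's actual code: the message shape and the three function parameters are hypothetical stand-ins for the real provider and MCP layers.

// Conceptual sketch of the automatic tool-execution loop described above.
type Msg = { role: string; content: string };
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ProviderReply = { message: Msg; toolCalls?: ToolCall[] };

async function chatWithMcpTools(
  messages: Msg[],
  listMcpTools: () => Promise<unknown[]>,                                   // hypothetical MCP manager hook
  callProvider: (msgs: Msg[], tools: unknown[]) => Promise<ProviderReply>,  // hypothetical provider call
  executeMcpTool: (call: ToolCall) => Promise<unknown>,                     // hypothetical MCP tool call
): Promise<ProviderReply> {
  const tools = await listMcpTools();            // 1. include MCP tools in the request
  const history = [...messages];

  while (true) {
    const reply = await callProvider(history, tools);
    if (!reply.toolCalls?.length) return reply;  // 5. no tool requested: final response

    history.push(reply.message);                 // keep the assistant turn in context
    for (const call of reply.toolCalls) {        // 2. the model requested one or more tools
      const result = await executeMcpTool(call); // 3. proxy executes each via the MCP protocol
      history.push({ role: "tool", content: JSON.stringify(result) }); // 4. results go back to the model
    }
  }
}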
Example with filesystem server:
# List directory contents
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What files are in the current directory?"}
],
"useMcpTools": true
}'

For more details, see MCP_AUTOMATIC_TOOLS.md.
curl http://localhost:8080/

Response:
{
"status": "ok",
"service": "AI Proxy",
"mcp": {
"enabled": true,
"serverCount": 2,
"toolCount": 15
}
}

The health check includes MCP status when MCP servers are configured.
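To script the same check (for example, as a readiness probe), a plain fetch on Node.js 18+ is enough; the fields read here are exactly the ones shown in the response above:

// Minimal readiness check against the health endpoint.
const res = await fetch("http://localhost:8080/");
const health = await res.json();

if (health.status !== "ok") {
  throw new Error(`AI Proxy unhealthy: ${JSON.stringify(health)}`);
}
// The mcp block is only present when MCP servers are configured.
console.log(`MCP servers: ${health.mcp?.serverCount ?? 0}, tools: ${health.mcp?.toolCount ?? 0}`);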
Non-Streaming (default):
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'

With MCP Tools (Automatic Tool Execution):
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Read package.json and list the dependencies"}
],
"useMcpTools": true
}'

When useMcpTools: true is set, the proxy automatically:
- Includes all available MCP tools in the request
- Executes tools when the model requests them
- Returns tool results to the model
- Continues the conversation until a final response
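The same request from TypeScript instead of curl (Node.js 18+ fetch); the fields read from the response follow the response format documented later in this README:

// Non-streaming chat request with automatic MCP tool execution enabled.
const res = await fetch("http://localhost:8080/v1/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4o",
    messages: [{ role: "user", content: "Read package.json and list the dependencies" }],
    useMcpTools: true, // let the proxy execute MCP tools for you
  }),
});

const data = await res.json();
console.log(data.message.content);          // final assistant reply
console.log(data.usage.total_tokens);       // token usage
console.log(data.cost.estimated_cost_usd);  // cost estimate from the proxy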
With Custom Tools:
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
]
}'

Note: Custom tools must be executed by the client. Only MCP tools are automatically executed by the proxy.
Stream responses using Server-Sent Events (SSE) for real-time token delivery:
Request:
curl -N -H "Accept: text/event-stream" \
"http://localhost:8080/v1/chat?stream=true" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Tell me a story"}]
}'

Or with stream in the request body:
curl -N -H "Accept: text/event-stream" \
"http://localhost:8080/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'

Response (SSE format):
event: chunk
data: {"content":"Once","role":"assistant"}
event: chunk
data: {"content":" upon","role":"assistant"}
event: chunk
data: {"content":" a","role":"assistant"}
event: done
data: {"usage":{"prompt_tokens":10,"completion_tokens":150,"total_tokens":160},"cost":{"total_tokens":160,"estimated_cost_usd":"0.001600",...},"latency_ms":2345}
JavaScript Client Example:
// Note: EventSource only supports GET requests, so for POST you'll need fetch or a library
const response = await fetch("http://localhost:8080/v1/chat?stream=true", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
provider: "openai",
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // SSE events are separated by a blank line; a network chunk may end mid-event,
  // so keep the trailing partial event in the buffer for the next read.
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    const [eventLine, dataLine] = event.split("\n");
    if (!dataLine || !dataLine.startsWith("data: ")) continue;
    const data = JSON.parse(dataLine.slice(6));

    if (eventLine === "event: chunk") {
      console.log(data.content); // accumulate content
    } else if (eventLine === "event: done") {
      console.log("Usage:", data.usage);
      console.log("Cost:", data.cost);
    }
  }
}

Python Client Example:
import requests
import json
url = "http://localhost:8080/v1/chat?stream=true"
data = {
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, json=data, stream=True)
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
try:
data = json.loads(line[6:])
print(data)
except json.JSONDecodeError:
pass

Important Notes:
- Streaming requests are not cached: each streaming request makes a fresh API call
- The stream parameter can be passed as a query parameter (?stream=true) or in the request body ("stream": true)
- Use the curl -N flag to disable buffering and see chunks in real time
- Postman and similar tools may buffer the entire response; use curl -N or a browser EventSource for real streaming
- Anthropic streaming: usage data is not available in streaming mode (it will show zeros); use non-streaming requests for accurate token counts
OpenAI:
- gpt-4o, gpt-4o-mini
- gpt-4-turbo, gpt-4
- gpt-3.5-turbo
- o1, o1-mini, o3, o3-mini
Anthropic:
- claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
- Legacy: claude-sonnet-4, claude-3-7-sonnet, etc.
Mistral:
- mistral-large, mistral-small, mistral-medium
- mistral-nemo, pixtral-12b, codestral
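Because the interface is unified, switching providers only means changing the provider and model fields; the rest of the request stays the same. For example, with Anthropic:

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Whichever provider you use, a successful non-streaming response has the same shape, including the assistant message, token usage, a cost breakdown, and latency: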
{
"provider": "openai",
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you for asking..."
},
"usage": {
"prompt_tokens": 10,
"completion_tokens": 25,
"total_tokens": 35
},
"cost": {
"total_tokens": 35,
"input_tokens": 10,
"output_tokens": 25,
"estimated_cost_usd": "0.000350",
"input_cost_usd": "0.000025",
"output_cost_usd": "0.000250"
},
"latency_ms": 1234
}

If validation fails, you'll get a helpful error response:
{
"error": "Invalid request",
"details": {
"provider": {
"_errors": ["Required"]
}
},
"expectedSchema": {
"structure": {
"type": "object",
"properties": {
"provider": {
"type": "enum",
"options": ["openai", "anthropic", "mistral"]
},
"model": { "type": "string" },
"messages": {
"type": "array",
"items": {
"type": "object",
"properties": {
"role": {
"type": "enum",
"options": ["system", "user", "assistant"]
},
"content": { "type": "string" }
}
}
}
}
},
"description": "provider: enum: openai | anthropic | mistral\nmodel: string\nmessages: array of:\n role: enum: system | user | assistant\n content: string",
"example": {
"provider": "openai",
"model": "gpt-4-turbo",
"messages": [{ "role": "user", "content": "Hello" }]
}
}
}

src/
├── config.ts               # Environment configuration
├── index.ts                # Application entry point
├── server.ts               # Fastify server setup
├── providers/              # AI provider implementations
│   ├── base.ts             # Provider routing logic
│   ├── types.ts            # Shared provider types
│   ├── openai.ts           # OpenAI integration
│   ├── anthropic.ts        # Anthropic integration
│   ├── mistral.ts          # Mistral integration
│   └── streaming/          # Streaming implementations
│       ├── index.ts        # Streaming router
│       ├── openai.ts       # OpenAI streaming
│       ├── anthropic.ts    # Anthropic streaming
│       └── mistral.ts      # Mistral streaming
├── mcp/                    # MCP (Model Context Protocol) support
│   ├── types.ts            # MCP protocol types
│   ├── client.ts           # MCP client base class
│   ├── manager.ts          # MCP server manager
│   └── transport/          # MCP transport implementations
│       ├── index.ts        # Transport factory
│       ├── stdio.ts        # Stdio transport
│       ├── http.ts         # HTTP transport
│       └── sse.ts          # SSE transport
├── routes/                 # API routes
│   ├── chat.ts             # Chat completions endpoint
│   ├── health.ts           # Health check endpoint
│   └── mcp.ts              # MCP API endpoints
└── utils/                  # Utility modules
    ├── cache.ts            # Redis caching
    ├── costTracker/        # Cost tracking module
    │   ├── index.ts        # Main cost tracker
    │   ├── types.ts        # Cost types
    │   ├── pricing/        # Provider pricing data
    │   └── calculators/    # Cost calculation logic
    ├── logger.ts           # Winston logger
    ├── schemaFormatter.ts  # Schema validation helpers
    ├── sse.ts              # SSE utilities
    └── mcpToolConverter.ts # MCP tool conversion utilities
The cost tracker provides detailed per-model pricing:
- OpenAI: Comprehensive pricing for all GPT models, O1/O3 series
- Anthropic: Claude Sonnet, Haiku, and Opus models
- Mistral: Large, Small, Medium, Nemo, Pixtral, and Codestral models
Costs are calculated based on:
- Separate input/output token pricing
- Per-million-token rates
- Detailed breakdowns in responses
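As a concrete illustration, the formula is the standard per-million-token calculation. The rates below are example values for the arithmetic only; the project's real rates live under src/utils/costTracker/pricing/.

// Illustrative cost math only; rates are example values, not the project's pricing table.
const INPUT_RATE_USD_PER_MTOK = 2.5;   // e.g. $2.50 per 1M input tokens
const OUTPUT_RATE_USD_PER_MTOK = 10;   // e.g. $10.00 per 1M output tokens

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_RATE_USD_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_RATE_USD_PER_MTOK;
  return inputCost + outputCost;
}

// 10 input + 25 output tokens at these rates:
// 0.000025 + 0.000250 = 0.000275 USD, the same breakdown shown in the response example above.
console.log(estimateCostUsd(10, 25).toFixed(6)); // "0.000275"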
# Development with hot reloading
pnpm dev
# Build for production
pnpm build
# Run tests
pnpm test
# Type checking
pnpm tsc --noEmit

- Runtime: Node.js with TypeScript
- Framework: Fastify
- Validation: Zod
- Caching: Redis (ioredis)
- Logging: Winston
- HTTP Client: Axios
- MCP Protocol: JSON-RPC 2.0 over stdio, HTTP, or SSE transport
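For orientation, a tool call like the read_file example above travels to an MCP server as a JSON-RPC 2.0 request along these lines (field names per the MCP specification; the exact payload depends on the server):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "README.md" }
  }
}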
We welcome contributions! This project is designed to be extensible and easy to contribute to.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes: Follow the existing code style and patterns
- Add tests: If applicable, add tests for new functionality
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request: Describe your changes and why they're valuable
- New Providers: Add support for additional AI providers (Cohere, Google, etc.)
- MCP Enhancements: Resource handling, prompt templates, WebSocket transport
- Pricing Updates: Keep pricing data current as providers update their rates
- Features: Caching improvements, rate limiting, request queuing, etc.
- Documentation: Improve docs, add examples, tutorials
- Testing: Add unit tests, integration tests, E2E tests
- Performance: Optimize caching, reduce latency, improve throughput
- Error Handling: Better error messages, retry logic, circuit breakers
- Use TypeScript with strict mode
- Follow existing patterns and conventions
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
- Keep functions focused and modular
Feel free to open an issue for:
- Bug reports
- Feature requests
- Questions about implementation
- Documentation improvements
This project is licensed under the MIT License - see the LICENSE file for details.
- MCP_AUTOMATIC_TOOLS.md: Complete guide to automatic MCP tool execution
- mcp-servers.json.example: Example MCP server configuration
- OpenAI, Anthropic, and Mistral for their excellent AI APIs
- The Fastify team for the amazing web framework
- The Model Context Protocol team for the MCP specification
- All contributors who help improve this project
Made with ❤️ for the AI community