A unified Node.js API gateway for OpenAI, Anthropic, and Mistral with intelligent caching, comprehensive cost tracking, and error normalization. Built with TypeScript and Fastify.
Repository: https://github.com/natiassefa/AI-Proxy
- Multi-Provider Support: Unified interface for OpenAI, Anthropic (Claude), and Mistral AI
- Streaming Support (SSE): Real-time token streaming using Server-Sent Events for all providers
- MCP Server Integration: Connect to Model Context Protocol (MCP) servers for automatic tool execution
- Automatic Tool Execution: AI models can automatically use MCP tools during conversations
- Intelligent Caching: Optional Redis caching to reduce API costs and improve response times
- Cost Tracking: Detailed per-model cost tracking with breakdowns for input/output tokens
- Type-Safe: Full TypeScript support with comprehensive type definitions
- Error Handling: Normalized error responses across providers, with the expected request schema included on validation failures
- Hot Reloading: Development server with automatic reload on file changes
- Request Validation: Zod schema validation with human-readable error messages
- Node.js 18+
- pnpm (or npm/yarn)
- API keys for at least one provider (OpenAI, Anthropic, or Mistral)
# Clone the repository
git clone https://github.com/natiassefa/AI-Proxy.git
cd AI-Proxy
# Install dependencies
pnpm install
# Copy environment template
cp .env.example .env

Edit your .env file with your API keys:
# Server Configuration
PORT=8080
# AI Provider API Keys (at least one required)
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
MISTRAL_API_KEY=your_mistral_api_key_here
# Optional: Redis Cache Configuration
# For local Redis with Docker:
REDIS_URL=redis://localhost:6379
# Optional: MCP Servers Configuration
# Create mcp-servers.json file at project root (see MCP section below)

# Development mode (with hot reloading)
pnpm dev
# Production mode
pnpm build
pnpm start

The server will start on http://localhost:8080 (or your configured port).
Caching is completely optional. The server works perfectly without Redis, but enabling it can significantly reduce API costs and improve response times for repeated requests.
The easiest way to run Redis locally is with Docker:
# Run Redis in a Docker container
docker run -d \
--name redis-aiproxy \
-p 6379:6379 \
redis:7-alpine
# Or with Docker Compose (create docker-compose.yml):

docker-compose.yml:
version: "3.8"
services:
redis:
image: redis:7-alpine
ports:
- "6379:6379"
volumes:
- redis-data:/data
volumes:
redis-data:Then run:
docker-compose up -d

Once Redis is running, add to your .env:
REDIS_URL=redis://localhost:6379

The cache will automatically:
- Cache responses for 10 minutes (600 seconds)
- Reduce redundant API calls
- Improve response times for cached requests
Note: If REDIS_URL is not set, the server will work normally without caching - no errors, no warnings, just direct API calls.
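For context, a cache layer like this is usually a thin wrapper around ioredis that hashes the request body into a key and skips itself when no REDIS_URL is configured. The sketch below is illustrative only and is not the project's actual src/utils/cache.ts; the helper names are made up:

// Illustrative sketch only -- not the project's actual cache code.
import Redis from "ioredis";
import { createHash } from "node:crypto";

const redis = process.env.REDIS_URL ? new Redis(process.env.REDIS_URL) : null;
const CACHE_TTL_SECONDS = 600; // matches the 10-minute TTL described above

// Hypothetical helper: derive a deterministic cache key from the request body.
function cacheKey(body: unknown): string {
  return "chat:" + createHash("sha256").update(JSON.stringify(body)).digest("hex");
}

export async function getCachedResponse(body: unknown): Promise<string | null> {
  if (!redis) return null; // no REDIS_URL: fall straight through to the provider API
  return redis.get(cacheKey(body));
}

export async function cacheResponse(body: unknown, response: string): Promise<void> {
  if (!redis) return;
  await redis.set(cacheKey(body), response, "EX", CACHE_TTL_SECONDS); // expire after 10 minutes
}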
The proxy supports connecting to MCP servers to enable automatic tool execution. MCP servers expose tools and resources that AI models can use during conversations.
Create a mcp-servers.json file at the project root:
[
{
"name": "filesystem",
"transport": "stdio",
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
},
{
"name": "remote-http",
"transport": "http",
"url": "http://localhost:8000/mcp",
"headers": {
"Authorization": "Bearer your-token-here"
},
"timeout": 30000
},
{
"name": "remote-sse",
"transport": "sse",
"url": "http://localhost:8001",
"headers": {
"Authorization": "Bearer your-token-here"
},
"timeout": 30000,
"reconnectDelay": 1000,
"maxReconnectAttempts": 5
}
]

Transport Types:
- stdio: Local process transport (requires command and args)
- http: HTTP-based transport for remote servers (requires url)
- sse: Server-Sent Events transport for remote servers (requires url)
Configuration Options:
- name: Unique identifier for the MCP server
- transport: Transport type (stdio, http, or sse)
- url: Server URL (required for http and sse transports)
- command: Command to execute (required for stdio transport)
- args: Command arguments (optional, for stdio transport)
- env: Environment variables (optional, for stdio transport)
- headers: HTTP headers (optional, for http and sse transports)
- timeout: Request timeout in milliseconds (optional, default: 30000)
- reconnectDelay: Initial reconnect delay in ms (optional, for sse, default: 1000)
- maxReconnectAttempts: Maximum reconnect attempts (optional, for sse, default: 5)
The server will automatically discover and connect to configured MCP servers on startup.
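For reference, the options above imply a per-server config shape roughly like the following TypeScript type. This is just a restatement of the list, not the project's source (the real definitions presumably live in src/mcp/types.ts):

// Restates the configuration options above as a type; field names follow the list.
type McpServerConfig = {
  name: string;                    // unique identifier for the MCP server
  transport: "stdio" | "http" | "sse";
  // stdio transport
  command?: string;                // required for stdio
  args?: string[];                 // optional command arguments
  env?: Record<string, string>;    // optional environment variables
  // http / sse transports
  url?: string;                    // required for http and sse
  headers?: Record<string, string>;
  timeout?: number;                // request timeout in ms (default 30000)
  // sse only
  reconnectDelay?: number;         // initial reconnect delay in ms (default 1000)
  maxReconnectAttempts?: number;   // default 5
};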
List all available MCP tools:
curl http://localhost:8080/v1/mcp/tools

Get tools for a specific server:
curl http://localhost:8080/v1/mcp/filesystem/tools

Get information about a specific tool:
curl http://localhost:8080/v1/mcp/tools/read_file

List MCP servers and their status:
curl http://localhost:8080/v1/mcp/servers

Call an MCP tool directly (for testing):
curl -X POST http://localhost:8080/v1/mcp/tools/read_file/call \
-H "Content-Type: application/json" \
-d '{"arguments":{"path":"README.md"}}'Enable automatic tool execution by setting useMcpTools: true:
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Read README.md and summarize it"}
],
"useMcpTools": true
}'

What happens automatically:
- Proxy includes MCP tools in the request
- AI model requests a tool (e.g., read_file)
- Proxy executes the tool via MCP protocol
- Tool results are sent back to the model
- Model generates final response using tool results
All tool execution happens automatically in a single request!
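Conceptually, the proxy runs a loop like the sketch below. This is not the project's actual code: the message shape and the three function parameters are hypothetical stand-ins for the real provider and MCP layers.

// Conceptual sketch of the automatic tool-execution loop described above.
type Msg = { role: string; content: string };
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ProviderReply = { message: Msg; toolCalls?: ToolCall[] };

async function chatWithMcpTools(
  messages: Msg[],
  listMcpTools: () => Promise<unknown[]>,                                   // hypothetical MCP manager hook
  callProvider: (msgs: Msg[], tools: unknown[]) => Promise<ProviderReply>,  // hypothetical provider call
  executeMcpTool: (call: ToolCall) => Promise<unknown>,                     // hypothetical MCP tool call
): Promise<ProviderReply> {
  const tools = await listMcpTools();            // 1. include MCP tools in the request
  const history = [...messages];

  while (true) {
    const reply = await callProvider(history, tools);
    if (!reply.toolCalls?.length) return reply;  // 5. no tool requested: final response

    history.push(reply.message);                 // keep the assistant turn in context
    for (const call of reply.toolCalls) {        // 2. the model requested one or more tools
      const result = await executeMcpTool(call); // 3. proxy executes each via the MCP protocol
      history.push({ role: "tool", content: JSON.stringify(result) }); // 4. results go back to the model
    }
  }
}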
Example with filesystem server:
# List directory contents
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What files are in the current directory?"}
],
"useMcpTools": true
}'

For more details, see MCP_AUTOMATIC_TOOLS.md.
curl http://localhost:8080/

Response:
{
"status": "ok",
"service": "AI Proxy",
"mcp": {
"enabled": true,
"serverCount": 2,
"toolCount": 15
}
}

The health check includes MCP status when MCP servers are configured.
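To script the same check (for example, as a readiness probe), a plain fetch on Node.js 18+ is enough; the fields read here are exactly the ones shown in the response above:

// Minimal readiness check against the health endpoint.
const res = await fetch("http://localhost:8080/");
const health = await res.json();

if (health.status !== "ok") {
  throw new Error(`AI Proxy unhealthy: ${JSON.stringify(health)}`);
}
// The mcp block is only present when MCP servers are configured.
console.log(`MCP servers: ${health.mcp?.serverCount ?? 0}, tools: ${health.mcp?.toolCount ?? 0}`);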
Non-Streaming (default):
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Hello, how are you?"}
]
}'

With MCP Tools (Automatic Tool Execution):
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "Read package.json and list the dependencies"}
],
"useMcpTools": true
}'

When useMcpTools: true is set, the proxy automatically:
- Includes all available MCP tools in the request
- Executes tools when the model requests them
- Returns tool results to the model
- Continues the conversation until a final response
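The same request from TypeScript instead of curl (Node.js 18+ fetch); the fields read from the response follow the response format documented later in this README:

// Non-streaming chat request with automatic MCP tool execution enabled.
const res = await fetch("http://localhost:8080/v1/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4o",
    messages: [{ role: "user", content: "Read package.json and list the dependencies" }],
    useMcpTools: true, // let the proxy execute MCP tools for you
  }),
});

const data = await res.json();
console.log(data.message.content);          // final assistant reply
console.log(data.usage.total_tokens);       // token usage
console.log(data.cost.estimated_cost_usd);  // cost estimate from the proxy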
With Custom Tools:
curl -X POST http://localhost:8080/v1/chat \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [
{"role": "user", "content": "What is the weather?"}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get weather for a location",
"parameters": {
"type": "object",
"properties": {
"location": {"type": "string"}
},
"required": ["location"]
}
}
}
]
}'

Note: Custom tools must be executed by the client. Only MCP tools are automatically executed by the proxy.
Stream responses using Server-Sent Events (SSE) for real-time token delivery:
Request:
curl -N -H "Accept: text/event-stream" \
"http://localhost:8080/v1/chat?stream=true" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Tell me a story"}]
}'

Or with stream in the request body:
curl -N -H "Accept: text/event-stream" \
"http://localhost:8080/v1/chat" \
-H "Content-Type: application/json" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Tell me a story"}],
"stream": true
}'

Response (SSE format):
event: chunk
data: {"content":"Once","role":"assistant"}
event: chunk
data: {"content":" upon","role":"assistant"}
event: chunk
data: {"content":" a","role":"assistant"}
event: done
data: {"usage":{"prompt_tokens":10,"completion_tokens":150,"total_tokens":160},"cost":{"total_tokens":160,"estimated_cost_usd":"0.001600",...},"latency_ms":2345}
JavaScript Client Example:
// Note: EventSource only supports GET requests, so for POST you'll need fetch or a library
const response = await fetch("http://localhost:8080/v1/chat?stream=true", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
provider: "openai",
model: "gpt-4o",
messages: [{ role: "user", content: "Hello" }],
}),
});
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // SSE events are separated by a blank line; a network chunk may end mid-event,
  // so keep the trailing partial event in the buffer for the next read.
  buffer += decoder.decode(value, { stream: true });
  const events = buffer.split("\n\n");
  buffer = events.pop() ?? "";

  for (const event of events) {
    const [eventLine, dataLine] = event.split("\n");
    if (!dataLine || !dataLine.startsWith("data: ")) continue;
    const data = JSON.parse(dataLine.slice(6));

    if (eventLine === "event: chunk") {
      console.log(data.content); // accumulate content
    } else if (eventLine === "event: done") {
      console.log("Usage:", data.usage);
      console.log("Cost:", data.cost);
    }
  }
}

Python Client Example:
import requests
import json
url = "http://localhost:8080/v1/chat?stream=true"
data = {
"provider": "openai",
"model": "gpt-4o",
"messages": [{"role": "user", "content": "Hello"}]
}
response = requests.post(url, json=data, stream=True)
for line in response.iter_lines():
if line:
line = line.decode('utf-8')
if line.startswith('data: '):
try:
data = json.loads(line[6:])
print(data)
except json.JSONDecodeError:
pass

Important Notes:
- Streaming requests are not cached: each streaming request makes a fresh API call
- The stream parameter can be passed as a query parameter (?stream=true) or in the request body ("stream": true)
- Use the curl -N flag to disable buffering and see chunks in real time
- Postman and similar tools may buffer the entire response; use curl -N or a browser EventSource for real streaming
- Anthropic streaming: usage data is not available in streaming mode (it will show zeros); use non-streaming requests for accurate token counts
OpenAI:
- gpt-4o, gpt-4o-mini
- gpt-4-turbo, gpt-4
- gpt-3.5-turbo
- o1, o1-mini, o3, o3-mini
Anthropic:
- claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
- Legacy: claude-sonnet-4, claude-3-7-sonnet, etc.
Mistral:
- mistral-large, mistral-small, mistral-medium
- mistral-nemo, pixtral-12b, codestral
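Because the interface is unified, switching providers only means changing the provider and model fields; the rest of the request stays the same. For example, with Anthropic:

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "anthropic",
    "model": "claude-sonnet-4-5",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Whichever provider you use, a successful non-streaming response has the same shape, including the assistant message, token usage, a cost breakdown, and latency: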
{
"provider": "openai",
"message": {
"role": "assistant",
"content": "Hello! I'm doing well, thank you for asking..."
},
"usage": {
"prompt_tokens": 10,
"completion_tokens": 25,
"total_tokens": 35
},
"cost": {
"total_tokens": 35,
"input_tokens": 10,
"output_tokens": 25,
"estimated_cost_usd": "0.000350",
"input_cost_usd": "0.000025",
"output_cost_usd": "0.000250"
},
"latency_ms": 1234
}

If validation fails, you'll get a helpful error response:
{
"error": "Invalid request",
"details": {
"provider": {
"_errors": ["Required"]
}
},
"expectedSchema": {
"structure": {
"type": "object",
"properties": {
"provider": {
"type": "enum",
"options": ["openai", "anthropic", "mistral"]
},
"model": { "type": "string" },
"messages": {
"type": "array",
"items": {
"type": "object",
"properties": {
"role": {
"type": "enum",
"options": ["system", "user", "assistant"]
},
"content": { "type": "string" }
}
}
}
}
},
"description": "provider: enum: openai | anthropic | mistral\nmodel: string\nmessages: array of:\n role: enum: system | user | assistant\n content: string",
"example": {
"provider": "openai",
"model": "gpt-4-turbo",
"messages": [{ "role": "user", "content": "Hello" }]
}
}
}

src/
├── config.ts               # Environment configuration
├── index.ts                # Application entry point
├── server.ts               # Fastify server setup
├── providers/              # AI provider implementations
│   ├── base.ts             # Provider routing logic
│   ├── types.ts            # Shared provider types
│   ├── openai.ts           # OpenAI integration
│   ├── anthropic.ts        # Anthropic integration
│   ├── mistral.ts          # Mistral integration
│   └── streaming/          # Streaming implementations
│       ├── index.ts        # Streaming router
│       ├── openai.ts       # OpenAI streaming
│       ├── anthropic.ts    # Anthropic streaming
│       └── mistral.ts      # Mistral streaming
├── mcp/                    # MCP (Model Context Protocol) support
│   ├── types.ts            # MCP protocol types
│   ├── client.ts           # MCP client base class
│   ├── manager.ts          # MCP server manager
│   └── transport/          # MCP transport implementations
│       ├── index.ts        # Transport factory
│       ├── stdio.ts        # Stdio transport
│       ├── http.ts         # HTTP transport
│       └── sse.ts          # SSE transport
├── routes/                 # API routes
│   ├── chat.ts             # Chat completions endpoint
│   ├── health.ts           # Health check endpoint
│   └── mcp.ts              # MCP API endpoints
└── utils/                  # Utility modules
    ├── cache.ts            # Redis caching
    ├── costTracker/        # Cost tracking module
    │   ├── index.ts        # Main cost tracker
    │   ├── types.ts        # Cost types
    │   ├── pricing/        # Provider pricing data
    │   └── calculators/    # Cost calculation logic
    ├── logger.ts           # Winston logger
    ├── schemaFormatter.ts  # Schema validation helpers
    ├── sse.ts              # SSE utilities
    └── mcpToolConverter.ts # MCP tool conversion utilities
The cost tracker provides detailed per-model pricing:
- OpenAI: Comprehensive pricing for all GPT models, O1/O3 series
- Anthropic: Claude Sonnet, Haiku, and Opus models
- Mistral: Large, Small, Medium, Nemo, Pixtral, and Codestral models
Costs are calculated based on:
- Separate input/output token pricing
- Per-million-token rates
- Detailed breakdowns in responses
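As a concrete illustration, the formula is the standard per-million-token calculation. The rates below are example values for the arithmetic only; the project's real rates live under src/utils/costTracker/pricing/.

// Illustrative cost math only; rates are example values, not the project's pricing table.
const INPUT_RATE_USD_PER_MTOK = 2.5;   // e.g. $2.50 per 1M input tokens
const OUTPUT_RATE_USD_PER_MTOK = 10;   // e.g. $10.00 per 1M output tokens

function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1_000_000) * INPUT_RATE_USD_PER_MTOK;
  const outputCost = (outputTokens / 1_000_000) * OUTPUT_RATE_USD_PER_MTOK;
  return inputCost + outputCost;
}

// 10 input + 25 output tokens at these rates:
// 0.000025 + 0.000250 = 0.000275 USD, the same breakdown shown in the response example above.
console.log(estimateCostUsd(10, 25).toFixed(6)); // "0.000275"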
# Development with hot reloading
pnpm dev
# Build for production
pnpm build
# Run tests
pnpm test
# Type checking
pnpm tsc --noEmit

- Runtime: Node.js with TypeScript
- Framework: Fastify
- Validation: Zod
- Caching: Redis (ioredis)
- Logging: Winston
- HTTP Client: Axios
- MCP Protocol: JSON-RPC 2.0 over stdio, HTTP, or SSE transport
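For orientation, a tool call like the read_file example above travels to an MCP server as a JSON-RPC 2.0 request along these lines (field names per the MCP specification; the exact payload depends on the server):

{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "read_file",
    "arguments": { "path": "README.md" }
  }
}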
We welcome contributions! This project is designed to be extensible and easy to contribute to.
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes: Follow the existing code style and patterns
- Add tests: If applicable, add tests for new functionality
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request: Describe your changes and why they're valuable
- New Providers: Add support for additional AI providers (Cohere, Google, etc.)
- MCP Enhancements: Resource handling, prompt templates, WebSocket transport
- Pricing Updates: Keep pricing data current as providers update their rates
- Features: Caching improvements, rate limiting, request queuing, etc.
- Documentation: Improve docs, add examples, tutorials
- Testing: Add unit tests, integration tests, E2E tests
- Performance: Optimize caching, reduce latency, improve throughput
- Error Handling: Better error messages, retry logic, circuit breakers
- Use TypeScript with strict mode
- Follow existing patterns and conventions
- Use meaningful variable and function names
- Add JSDoc comments for public APIs
- Keep functions focused and modular
Feel free to open an issue for:
- Bug reports
- Feature requests
- Questions about implementation
- Documentation improvements
This project is licensed under the MIT License - see the LICENSE file for details.
- MCP_AUTOMATIC_TOOLS.md: Complete guide to automatic MCP tool execution
- mcp-servers.json.example: Example MCP server configuration
- OpenAI, Anthropic, and Mistral for their excellent AI APIs
- The Fastify team for the amazing web framework
- The Model Context Protocol team for the MCP specification
- All contributors who help improve this project
Made with ❤️ for the AI community