🧠 AI Proxy Server

A unified Node.js API gateway for OpenAI, Anthropic, and Mistral with intelligent caching, comprehensive cost tracking, and error normalization. Built with TypeScript and Fastify.

Repository: https://github.com/natiassefa/AI-Proxy

✨ Features

  • Multi-Provider Support: Unified interface for OpenAI, Anthropic (Claude), and Mistral AI
  • Streaming Support (SSE): Real-time token streaming using Server-Sent Events for all providers
  • MCP Server Integration: Connect to Model Context Protocol (MCP) servers for automatic tool execution
  • Automatic Tool Execution: AI models can automatically use MCP tools during conversations
  • Intelligent Caching: Optional Redis caching to reduce API costs and improve response times
  • Cost Tracking: Detailed per-model cost tracking with breakdowns for input/output tokens
  • Type-Safe: Full TypeScript support with comprehensive type definitions
  • Error Handling: Normalized error responses with helpful schema validation
  • Hot Reloading: Development server with automatic reload on file changes
  • Request Validation: Zod schema validation with human-readable error messages

🚀 Quick Start

Prerequisites

  • Node.js 18+
  • pnpm (or npm/yarn)
  • API keys for at least one provider (OpenAI, Anthropic, or Mistral)

Installation

# Clone the repository
git clone https://github.com/natiassefa/AI-Proxy.git
cd AI-Proxy

# Install dependencies
pnpm install

# Copy environment template
cp .env.example .env

Configuration

Edit your .env file with your API keys:

# Server Configuration
PORT=8080

# AI Provider API Keys (at least one required)
OPENAI_API_KEY=your_openai_api_key_here
ANTHROPIC_API_KEY=your_anthropic_api_key_here
MISTRAL_API_KEY=your_mistral_api_key_here

# Optional: Redis Cache Configuration
# For local Redis with Docker:
REDIS_URL=redis://localhost:6379

# Optional: MCP Servers Configuration
# Create mcp-servers.json file at project root (see MCP section below)

Running the Server

# Development mode (with hot reloading)
pnpm dev

# Production mode
pnpm build
pnpm start

The server will start on http://localhost:8080 (or your configured port).

📦 Optional: Redis Caching Setup

Caching is completely optional. The server works perfectly without Redis, but enabling it can significantly reduce API costs and improve response times for repeated requests.

Using Docker (Recommended)

The easiest way to run Redis locally is with Docker:

# Run Redis in a Docker container
docker run -d \
  --name redis-aiproxy \
  -p 6379:6379 \
  redis:7-alpine

# Or with Docker Compose (create docker-compose.yml):

docker-compose.yml:

version: "3.8"
services:
  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis-data:/data

volumes:
  redis-data:

Then run:

docker-compose up -d

Once Redis is running, add to your .env:

REDIS_URL=redis://localhost:6379

The cache will automatically:

  • Cache responses for 10 minutes (600 seconds)
  • Reduce redundant API calls
  • Improve response times for cached requests

Note: If REDIS_URL is not set, the server will work normally without caching - no errors, no warnings, just direct API calls.
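
In practice, a cache hit looks roughly like the sketch below. This is a minimal illustration only, assuming a SHA-256 hash of the request body as the key and the 10-minute TTL mentioned above; the real key format and wiring live in src/utils/cache.ts and may differ.

import { createHash } from "node:crypto";
import Redis from "ioredis";

// Hypothetical sketch: the proxy's actual key format in src/utils/cache.ts may differ.
const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

type ChatBody = { provider: string; model: string; messages: unknown[] };

function cacheKey(body: ChatBody): string {
  // Deterministic key: identical provider + model + messages yields the same cached response
  const hash = createHash("sha256").update(JSON.stringify(body)).digest("hex");
  return `aiproxy:chat:${hash}`;
}

async function cachedChat(body: ChatBody, callProvider: () => Promise<unknown>) {
  const key = cacheKey(body);
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit); // cache hit: no provider call, no cost

  const response = await callProvider();
  await redis.set(key, JSON.stringify(response), "EX", 600); // 600 s = the 10-minute TTL above
  return response;
}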

🔌 MCP (Model Context Protocol) Server Support

The proxy supports connecting to MCP servers to enable automatic tool execution. MCP servers expose tools and resources that AI models can use during conversations.

Configuration

Create an mcp-servers.json file at the project root:

[
  {
    "name": "filesystem",
    "transport": "stdio",
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "."]
  },
  {
    "name": "remote-http",
    "transport": "http",
    "url": "http://localhost:8000/mcp",
    "headers": {
      "Authorization": "Bearer your-token-here"
    },
    "timeout": 30000
  },
  {
    "name": "remote-sse",
    "transport": "sse",
    "url": "http://localhost:8001",
    "headers": {
      "Authorization": "Bearer your-token-here"
    },
    "timeout": 30000,
    "reconnectDelay": 1000,
    "maxReconnectAttempts": 5
  }
]

Transport Types:

  • stdio: Local process transport (requires command and args)
  • http: HTTP-based transport for remote servers (requires url)
  • sse: Server-Sent Events transport for remote servers (requires url)

Configuration Options:

  • name: Unique identifier for the MCP server
  • transport: Transport type (stdio, http, or sse)
  • url: Server URL (required for http and sse transports)
  • command: Command to execute (required for stdio transport)
  • args: Command arguments (optional, for stdio transport)
  • env: Environment variables (optional, for stdio transport)
  • headers: HTTP headers (optional, for http and sse transports)
  • timeout: Request timeout in milliseconds (optional, default: 30000)
  • reconnectDelay: Initial reconnect delay in ms (optional, for sse, default: 1000)
  • maxReconnectAttempts: Maximum reconnect attempts (optional, for sse, default: 5)

The server will automatically discover and connect to configured MCP servers on startup.
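
Put together, a valid entry in mcp-servers.json matches a shape along these lines. The TypeScript type below is a sketch inferred from the option list above, not necessarily the exact declaration in src/mcp/types.ts.

// Sketch of an MCP server config entry, inferred from the options listed above.
type McpServerConfig = {
  name: string;                    // unique identifier for the MCP server
  transport: "stdio" | "http" | "sse";

  // stdio transport
  command?: string;                // required for stdio
  args?: string[];
  env?: Record<string, string>;

  // http / sse transports
  url?: string;                    // required for http and sse
  headers?: Record<string, string>;
  timeout?: number;                // ms, default 30000

  // sse only
  reconnectDelay?: number;         // ms, default 1000
  maxReconnectAttempts?: number;   // default 5
};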

MCP API Endpoints

List all available MCP tools:

curl http://localhost:8080/v1/mcp/tools

Get tools for a specific server:

curl http://localhost:8080/v1/mcp/filesystem/tools

Get information about a specific tool:

curl http://localhost:8080/v1/mcp/tools/read_file

List MCP servers and their status:

curl http://localhost:8080/v1/mcp/servers

Call an MCP tool directly (for testing):

curl -X POST http://localhost:8080/v1/mcp/tools/read_file/call \
  -H "Content-Type: application/json" \
  -d '{"arguments":{"path":"README.md"}}'

Using MCP Tools in Chat

Enable automatic tool execution by setting useMcpTools: true:

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Read README.md and summarize it"}
    ],
    "useMcpTools": true
  }'

What happens automatically:

  1. Proxy includes MCP tools in the request
  2. AI model requests a tool (e.g., read_file)
  3. Proxy executes the tool via MCP protocol
  4. Tool results are sent back to the model
  5. Model generates final response using tool results

All tool execution happens automatically in a single request!
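
Conceptually, those five steps boil down to a loop like the one below. This is an illustrative sketch only; the types and the injected helpers (listMcpTools, callProvider, executeMcpTool) are placeholders, not the proxy's real internals.

// Hypothetical sketch of the automatic tool-execution loop (steps 1-5 above).
type Message = { role: string; content: string; name?: string };
type ToolCall = { name: string; arguments: Record<string, unknown> };
type ProviderReply = { message: Message; toolCalls?: ToolCall[] };

async function runWithMcpTools(
  messages: Message[],
  listMcpTools: () => Promise<unknown[]>,                        // step 1: gather MCP tools
  callProvider: (msgs: Message[], tools: unknown[]) => Promise<ProviderReply>,
  executeMcpTool: (call: ToolCall) => Promise<unknown>
): Promise<ProviderReply> {
  const tools = await listMcpTools();
  const history = [...messages];

  while (true) {
    const reply = await callProvider(history, tools);
    if (!reply.toolCalls?.length) return reply;                  // step 5: final answer

    for (const call of reply.toolCalls) {                        // step 2: model asked for a tool
      const result = await executeMcpTool(call);                 // step 3: run it over MCP
      history.push({                                             // step 4: feed the result back
        role: "tool",
        name: call.name,
        content: JSON.stringify(result),
      });
    }
  }
}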

Example with filesystem server:

# List directory contents
curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What files are in the current directory?"}
    ],
    "useMcpTools": true
  }'

For more details, see MCP_AUTOMATIC_TOOLS.md.

📡 API Usage

Health Check

curl http://localhost:8080/

Response:

{
  "status": "ok",
  "service": "AI Proxy",
  "mcp": {
    "enabled": true,
    "serverCount": 2,
    "toolCount": 15
  }
}

The health check includes MCP status when MCP servers are configured.

Chat Completions

Non-Streaming (default):

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

With MCP Tools (Automatic Tool Execution):

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Read package.json and list the dependencies"}
    ],
    "useMcpTools": true
  }'

When useMcpTools: true is set, the proxy automatically:

  • Includes all available MCP tools in the request
  • Executes tools when the model requests them
  • Returns tool results to the model
  • Continues the conversation until a final response

With Custom Tools:

curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "What is the weather?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "description": "Get weather for a location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {"type": "string"}
            },
            "required": ["location"]
          }
        }
      }
    ]
  }'

Note: Custom tools must be executed by the client. Only MCP tools are automatically executed by the proxy.

Streaming Responses (SSE)

Stream responses using Server-Sent Events (SSE) for real-time token delivery:

Request:

curl -N -H "Accept: text/event-stream" \
  "http://localhost:8080/v1/chat?stream=true" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story"}]
  }'

Or with stream in the request body:

curl -N -H "Accept: text/event-stream" \
  "http://localhost:8080/v1/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Tell me a story"}],
    "stream": true
  }'

Response (SSE format):

event: chunk
data: {"content":"Once","role":"assistant"}

event: chunk
data: {"content":" upon","role":"assistant"}

event: chunk
data: {"content":" a","role":"assistant"}

event: done
data: {"usage":{"prompt_tokens":10,"completion_tokens":150,"total_tokens":160},"cost":{"total_tokens":160,"estimated_cost_usd":"0.001600",...},"latency_ms":2345}

JavaScript Client Example:

// Note: EventSource only supports GET requests, so for POST you'll need fetch or a library
const response = await fetch("http://localhost:8080/v1/chat?stream=true", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    provider: "openai",
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // Each SSE event arrives as an "event: <name>\ndata: <json>" block, separated by blank lines
  const chunk = decoder.decode(value);
  const events = chunk.split("\n\n");

  for (const block of events) {
    const [eventLine, dataLine] = block.split("\n");
    if (!dataLine || !dataLine.startsWith("data: ")) continue;
    const data = JSON.parse(dataLine.slice(6));

    if (eventLine === "event: chunk") {
      console.log(data.content); // Accumulate content
    } else if (eventLine === "event: done") {
      console.log("Usage:", data.usage);
      console.log("Cost:", data.cost);
    }
  }
}

Python Client Example:

import requests
import json

url = "http://localhost:8080/v1/chat?stream=true"
data = {
    "provider": "openai",
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
}

response = requests.post(url, json=data, stream=True)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            try:
                data = json.loads(line[6:])
                print(data)
            except json.JSONDecodeError:
                pass

Important Notes:

  • Streaming requests are not cached - Each streaming request makes a fresh API call
  • The stream parameter can be passed as a query parameter (?stream=true) or in the request body ("stream": true)
  • Use the curl -N flag to disable buffering and see chunks in real time
  • Postman and similar tools may buffer the entire response - use curl -N or browser EventSource for real streaming
  • Anthropic streaming: Usage data is not available in streaming mode (will show zeros) - use non-streaming requests for accurate token counts

Supported Providers and Models

OpenAI:

  • gpt-4o, gpt-4o-mini
  • gpt-4-turbo, gpt-4
  • gpt-3.5-turbo
  • o1, o1-mini, o3, o3-mini

Anthropic:

  • claude-sonnet-4-5, claude-haiku-4-5, claude-opus-4-1
  • Legacy: claude-sonnet-4, claude-3-7-sonnet, etc.

Mistral:

  • mistral-large, mistral-small, mistral-medium
  • mistral-nemo, pixtral-12b, codestral

Response Format

{
  "provider": "openai",
  "message": {
    "role": "assistant",
    "content": "Hello! I'm doing well, thank you for asking..."
  },
  "usage": {
    "prompt_tokens": 10,
    "completion_tokens": 25,
    "total_tokens": 35
  },
  "cost": {
    "total_tokens": 35,
    "input_tokens": 10,
    "output_tokens": 25,
    "estimated_cost_usd": "0.000350",
    "input_cost_usd": "0.000025",
    "output_cost_usd": "0.000250"
  },
  "latency_ms": 1234
}
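
If you consume the proxy from TypeScript, the non-streaming response maps onto an interface along these lines. It is written from the example above for convenience; the project does not export it.

// Informal typing of the non-streaming /v1/chat response, derived from the example above.
interface ChatProxyResponse {
  provider: "openai" | "anthropic" | "mistral";
  message: { role: "assistant"; content: string };
  usage: { prompt_tokens: number; completion_tokens: number; total_tokens: number };
  cost: {
    total_tokens: number;
    input_tokens: number;
    output_tokens: number;
    estimated_cost_usd: string;   // decimal string, e.g. "0.000350"
    input_cost_usd: string;
    output_cost_usd: string;
  };
  latency_ms: number;
}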

Error Handling

If validation fails, you'll get a helpful error response:

{
  "error": "Invalid request",
  "details": {
    "provider": {
      "_errors": ["Required"]
    }
  },
  "expectedSchema": {
    "structure": {
      "type": "object",
      "properties": {
        "provider": {
          "type": "enum",
          "options": ["openai", "anthropic", "mistral"]
        },
        "model": { "type": "string" },
        "messages": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "role": {
                "type": "enum",
                "options": ["system", "user", "assistant"]
              },
              "content": { "type": "string" }
            }
          }
        }
      }
    },
    "description": "provider: enum: openai | anthropic | mistral\nmodel: string\nmessages: array of:\n  role: enum: system | user | assistant\n  content: string",
    "example": {
      "provider": "openai",
      "model": "gpt-4-turbo",
      "messages": [{ "role": "user", "content": "Hello" }]
    }
  }
}
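
The expectedSchema block mirrors a Zod schema roughly like the one below. This is a reconstruction from the structure shown above, not a copy of the project's source.

import { z } from "zod";

// Sketch of the chat request schema implied by expectedSchema above.
const ChatRequestSchema = z.object({
  provider: z.enum(["openai", "anthropic", "mistral"]),
  model: z.string(),
  messages: z.array(
    z.object({
      role: z.enum(["system", "user", "assistant"]),
      content: z.string(),
    })
  ),
});

// safeParse reports failures without throwing; error.format() yields the nested
// "_errors" shape shown in the "details" field above.
const result = ChatRequestSchema.safeParse({ model: "gpt-4o", messages: [] });
if (!result.success) {
  console.log(result.error.format()); // e.g. { provider: { _errors: ["Required"] }, ... }
}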

πŸ—οΈ Project Structure

src/
├── config.ts              # Environment configuration
├── index.ts               # Application entry point
├── server.ts              # Fastify server setup
├── providers/             # AI provider implementations
│   ├── base.ts           # Provider routing logic
│   ├── types.ts          # Shared provider types
│   ├── openai.ts         # OpenAI integration
│   ├── anthropic.ts      # Anthropic integration
│   ├── mistral.ts        # Mistral integration
│   └── streaming/        # Streaming implementations
│       ├── index.ts      # Streaming router
│       ├── openai.ts     # OpenAI streaming
│       ├── anthropic.ts  # Anthropic streaming
│       └── mistral.ts    # Mistral streaming
├── mcp/                   # MCP (Model Context Protocol) support
│   ├── types.ts          # MCP protocol types
│   ├── client.ts         # MCP client base class
│   ├── manager.ts        # MCP server manager
│   └── transport/        # MCP transport implementations
│       ├── index.ts      # Transport factory
│       ├── stdio.ts      # Stdio transport
│       ├── http.ts       # HTTP transport
│       └── sse.ts        # SSE transport
├── routes/                # API routes
│   ├── chat.ts           # Chat completions endpoint
│   ├── health.ts         # Health check endpoint
│   └── mcp.ts            # MCP API endpoints
└── utils/                 # Utility modules
    ├── cache.ts          # Redis caching
    ├── costTracker/      # Cost tracking module
    │   ├── index.ts      # Main cost tracker
    │   ├── types.ts      # Cost types
    │   ├── pricing/      # Provider pricing data
    │   └── calculators/  # Cost calculation logic
    ├── logger.ts         # Winston logger
    ├── schemaFormatter.ts # Schema validation helpers
    ├── sse.ts            # SSE utilities
    └── mcpToolConverter.ts # MCP tool conversion utilities

💰 Cost Tracking

The cost tracker provides detailed per-model pricing:

  • OpenAI: Comprehensive pricing for all GPT models, O1/O3 series
  • Anthropic: Claude Sonnet, Haiku, and Opus models
  • Mistral: Large, Small, Medium, Nemo, Pixtral, and Codestral models

Costs are calculated based on:

  • Separate input/output token pricing
  • Per-million-token rates
  • Detailed breakdowns in responses
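
As a worked example of the arithmetic, take illustrative rates of $2.50 per million input tokens and $10.00 per million output tokens (the authoritative numbers live in src/utils/costTracker/pricing/ and change as providers update pricing):

// Per-million-token cost arithmetic with illustrative rates.
const inputRatePerMillion = 2.5;    // USD per 1M input tokens (assumed for illustration)
const outputRatePerMillion = 10.0;  // USD per 1M output tokens (assumed for illustration)

const inputTokens = 10;
const outputTokens = 25;

const inputCost = (inputTokens / 1_000_000) * inputRatePerMillion;    // 0.000025
const outputCost = (outputTokens / 1_000_000) * outputRatePerMillion; // 0.000250

console.log((inputCost + outputCost).toFixed(6)); // "0.000275"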

πŸ› οΈ Development

# Development with hot reloading
pnpm dev

# Build for production
pnpm build

# Run tests
pnpm test

# Type checking
pnpm tsc --noEmit

Tech Stack

  • Runtime: Node.js with TypeScript
  • Framework: Fastify
  • Validation: Zod
  • Caching: Redis (ioredis)
  • Logging: Winston
  • HTTP Client: Axios
  • MCP Protocol: JSON-RPC 2.0 over stdio, HTTP, or SSE transport
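
On the wire, each MCP transport carries the same JSON-RPC 2.0 messages; a tool invocation, for example, looks like the request below (method and parameter names follow the public MCP specification; only the framing differs per transport):

// A JSON-RPC 2.0 "tools/call" request as used by MCP.
// stdio sends it newline-delimited, HTTP POSTs it as the request body, SSE pairs it with an event stream.
const toolCallRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "read_file",
    arguments: { path: "README.md" },
  },
};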

🤝 Contributing

We welcome contributions! This project is designed to be extensible and easy to contribute to.

How to Contribute

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes: Follow the existing code style and patterns
  4. Add tests: If applicable, add tests for new functionality
  5. Commit your changes: git commit -m 'Add amazing feature'
  6. Push to the branch: git push origin feature/amazing-feature
  7. Open a Pull Request: Describe your changes and why they're valuable

Areas for Contribution

  • New Providers: Add support for additional AI providers (Cohere, Google, etc.)
  • MCP Enhancements: Resource handling, prompt templates, WebSocket transport
  • Pricing Updates: Keep pricing data current as providers update their rates
  • Features: Caching improvements, rate limiting, request queuing, etc.
  • Documentation: Improve docs, add examples, tutorials
  • Testing: Add unit tests, integration tests, E2E tests
  • Performance: Optimize caching, reduce latency, improve throughput
  • Error Handling: Better error messages, retry logic, circuit breakers

Code Style

  • Use TypeScript with strict mode
  • Follow existing patterns and conventions
  • Use meaningful variable and function names
  • Add JSDoc comments for public APIs
  • Keep functions focused and modular

Questions?

Feel free to open an issue for:

  • Bug reports
  • Feature requests
  • Questions about implementation
  • Documentation improvements

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

📚 Additional Documentation

  • MCP_AUTOMATIC_TOOLS.md - automatic MCP tool execution in detail

πŸ™ Acknowledgments

  • OpenAI, Anthropic, and Mistral for their excellent AI APIs
  • The Fastify team for the amazing web framework
  • The Model Context Protocol team for the MCP specification
  • All contributors who help improve this project

Made with ❤️ for the AI community
