Add Semantic Routing or Mixture-of-Models skill to Emerging Techniques

## Issue Description

### Overview

Add a new skill for **Semantic Routing** or **Mixture-of-Models** (vLLM Semantic Router) to the `19-emerging-techniques` category. Semantic Routing provides system-level intelligence for Mixture-of-Models (MoM) through signal-driven decision engine and plugin chain architecture for intelligent LLM routing, security, and optimization.

### What is Semantic Routing?

Semantic Routing is an intelligent routing layer that uses **signal-driven decisions** and **plugin chains** to:

1. **Route queries intelligently** across multiple specialized models (math → Qwen-Math, code → DeepSeek-Coder)
2. **Optimize costs** by using smaller models for simple tasks, larger models for complex ones
3. **Secure LLM systems** with built-in jailbreak, PII, and hallucination detection
4. **Reduce latency** through semantic caching (10-100× speedup)
5. **Enable model collaboration** through Mixture-of-Models (MoM) architecture

### Key Features

**Signal-Driven Decision Engine:**
- **10 signal types**: keyword , embedding, domain/MMLU, fact_check, user_feedback, preference, language, latency (TPOT/TTFT), context, complexity
- **Flexible combination**: AND/OR operators for complex routing logic
- **Multi-signal fusion**: Combine signals for higher accuracy than single classifiers

**Plugin Chain Architecture:**
- `semantic-cache` - 10-100× latency reduction for similar queries
- `jailbreak` - Adversarial prompt detection and blocking
- `pii` - Personally identifiable information detection
- `system_prompt` - Dynamic system prompt injection per route
- `header_mutation` - HTTP header manipulation for routing control
- `hallucination` - Token-level hallucination detection during generation

**Model Training:**
https://huggingface.co/llm-semantic-router

### Why This Belongs in Emerging Techniques

1. **Novel approach**: System-level intelligence for MoM (vs. model-level MoE)
2. **Production-ready**: Used in real-world vLLM deployments
3. **Research-backed**: NeurIPS 2025 MLForSys paper, ICLR 2026 RouterArena #1 ranking
4. **Cost-effective**: 80-90% cost reduction vs. always using largest model
5. **Active development**: Regular releases, bi-weekly community meetings, AMD partnership

### Proposed Skill Structure

```
19-emerging-techniques/semantic-routing/
├── SKILL.md                    # 200-500 lines main guidance
├── references/
│   ├── README.md              # Architecture overview
│   ├── signals.md             # 10 signal types deep dive
│   ├── plugins.md             # Plugin chain architecture
│   ├── training.md            # ModernBERT + LoRA training guide
│   ├── deployment.md          # Docker/Kubernetes deployment
│   ├── api.md                 # API reference
│   └── issues.md              # Common issues and solutions
└── examples/
    ├── basic-routing.yaml     # Simple keyword routing
    ├── multi-signal.yaml      # Complex signal combination
    └── production-stack.yaml  # Full production setup
```

### Content Outline

**SKILL.md (200-500 lines):**

1. **When to Use**
   - Multi-model collaboration scenarios
   - Cost optimization needs
   - Security requirements (jailbreak/PII/hallucination)
   - Semantic caching for latency reduction

2. **Quick Start**
   ```bash
   pip install vllm-sr
   vllm-sr serve
   ```

3. **Core Concepts**
   - Mixture of Models (MoM) vs. Mixture of Experts (MoE)
   - Signal-Driven Decisions (10 signal types overview)
   - Plugin Chain Architecture (6 plugins overview)

4. **Two Complete Workflows with Checklists**
   - Workflow 1: Basic Multi-Model Routing
     - [ ] Define signals (keyword + domain)
     - [ ] Configure decision rules (AND/OR)
     - [ ] Set model mappings
     - [ ] Test routing decisions
     - [ ] Validate routing accuracy
   
   - Workflow 2: Production Deployment
     - [ ] Configure security plugins (jailbreak + PII)
     - [ ] Enable semantic cache
     - [ ] Set up monitoring metrics
     - [ ] Configure multiple backend models
     - [ ] Load testing
     - [ ] Deploy to Kubernetes

5. **When to Use vs Alternatives**
   - vs. LiteLLM (simple routing only)
   - vs. LangChain Router (slow LLM-based routing)
   - vs. Hand-written if-else (hard to maintain)

6. **Common Issues**
   - Signal conflicts resolution
   - Inaccurate routing decisions
   - High latency troubleshooting
   - Low cache hit rate optimization
   - Model loading failures

**references/ (300KB+ target):**

- **signals.md**: Detailed documentation of all 10 signal types with configuration examples, latency comparison, use cases, and combination strategies
- **plugins.md**: Deep dive into 6 plugins, plugin development guide, execution order
- **training.md**: Why ModernBERT, 4 classifier models, LoRA training methodology, datasets, performance metrics
- **deployment.md**: Docker Compose, Kubernetes + Helm, production configuration, performance tuning, observability
- **api.md**: OpenAI-compatible API, routing API, classification API, configuration API
- **issues.md**: Real GitHub issues, common errors and solutions, debugging methods

**examples/:**

- **basic-routing.yaml**: Simple keyword-based routing
- **multi-signal.yaml**: Multi-signal combination (keyword + domain + embedding)
- **production-stack.yaml**: Full production config with plugins, monitoring, multiple models

### Key Highlights to Emphasize

**Why Use Semantic Router?**
- **Cost optimization**: Use Llama-3-8B for simple queries, GPT-4 for complex ones
- **Quality improvement**: Route math to Qwen-Math, code to DeepSeek-Coder
- **Security built-in**: Jailbreak, PII, hallucination detection
- **Performance boost**: 10-100× latency reduction via semantic cache

**Core Advantages:**
1. **Multi-signal fusion**: 10 signals combined > single classifier
2. **Low latency**: keyword 1ms, embedding 10-50ms, domain 50-100ms
3. **Extensible**: Plugin architecture for custom signals and processing
4. **Production-ready**: Kubernetes-native, Prometheus metrics, OpenTelemetry tracing

### Resources

- **GitHub**: https://github.com/vllm-project/semantic-router (513 source files)
- **Documentation**: https://vllm-semantic-router.com (24,000+ lines)
- **Paper**: [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) (NeurIPS 2025)
- **Blog**: https://blog.vllm.ai/2025/09/11/semantic-router.html
- **Community**: vLLM Slack #semantic-router channel

### Acceptance Criteria

- [ ] SKILL.md with proper YAML frontmatter (`name: semantic-routing`)
- [ ] 200-500 lines of focused guidance in SKILL.md
- [ ] 300KB+ reference documentation from official sources
- [ ] At least 2 complete workflows with checklists
- [ ] Code examples with language tags (```yaml, ```bash, ```python)
- [ ] "When to use vs alternatives" section
- [ ] Common issues and solutions section
- [ ] References one level deep from SKILL.md (no nested references)
- [ ] Examples directory with 3 runnable configuration files

### Related Skills

- **12-inference-serving/vllm** - vLLM inference engine (backend for semantic router)
- **14-agents/langchain** - Agent frameworks that can benefit from intelligent routing
- **15-rag** - RAG systems that benefit from semantic caching and routing
- **16-prompt-engineering/dspy** - Prompt optimization with routing decisions

---

**Labels**: `enhancement`, `new-skill`, `emerging-techniques`, `documentation`


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Semantic Routing or Mixture-of-Models skill to Emerging Techniques #23

Issue Description

Overview

What is Semantic Routing?

Key Features

Why This Belongs in Emerging Techniques

Proposed Skill Structure

Content Outline

Key Highlights to Emphasize

Resources

Acceptance Criteria

Related Skills

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Add Semantic Routing or Mixture-of-Models skill to Emerging Techniques #23

Description

Issue Description

Overview

What is Semantic Routing?

Key Features

Why This Belongs in Emerging Techniques

Proposed Skill Structure

Content Outline

Key Highlights to Emphasize

Resources

Acceptance Criteria

Related Skills

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions