@lukebrady commented on Sep 3, 2025

Infrastructure Changes:

  • Add Gemma 3 27B model deployment (google/gemma-3-27b-it)
  • Configure g6.12xlarge instance type for 4-GPU tensor parallelism
  • Set tensor-parallel-size=4 for distributed inference
  • Increase timeout to 900s for large model loading
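
A rough sizing check for the 4-GPU choice above, as a sketch rather than a measurement. It assumes bf16 weights (2 bytes per parameter), 24 GB of memory per GPU, and that tensor parallelism splits the weights roughly evenly across GPUs. None of these figures come from this PR, and KV cache, activations, and runtime overhead are ignored.

```python
# Back-of-the-envelope check: do the Gemma 3 27B weights fit once sharded 4 ways?
# Assumptions (not stated in the PR): bf16 weights, 24 GB per GPU,
# an even split of weights across GPUs, and no KV cache or runtime overhead.

GPU_MEMORY_GB = 24.0  # assumed per-GPU memory


def weight_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


def per_gpu_weight_gb(
    params_billion: float,
    tensor_parallel_size: int,
    bytes_per_param: float = 2.0,
) -> float:
    """Approximate weight size held by each GPU under an even split."""
    return weight_gb(params_billion, bytes_per_param) / tensor_parallel_size


total = weight_gb(27)
shard = per_gpu_weight_gb(27, tensor_parallel_size=4)

print(f"total weights: {total:.1f} GB")
print(f"per-GPU shard: {shard:.1f} GB")
print(f"fits on one GPU: {total <= GPU_MEMORY_GB}")
print(f"fits when sharded 4 ways: {shard <= GPU_MEMORY_GB}")
```

Under these assumptions the weights alone exceed a single GPU but leave headroom per GPU at tensor-parallel-size=4. Real headroom is smaller once the KV cache is allocated.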

Documentation Updates:

  • Update main README to reflect three-model deployment (Qwen 3 0.6B, GPT-OSS 20B, Gemma 3 27B)
  • Create comprehensive OpenTofu README with detailed deployment guide
  • Add cost considerations and instance type selection guidance
  • Include tensor parallelism configuration examples
  • Document security features and customization options

Model Configuration Details:

  • Qwen 3 0.6B: Lightweight testing model on g5.2xlarge
  • GPT-OSS 20B: Medium production model on g5.2xlarge
  • Gemma 3 27B: High-performance model on g6.12xlarge with 4-GPU parallelism
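
The model-to-instance mapping above can be sanity-checked with a small script. This is an illustrative sketch, not code from this PR: the `Deployment` class, the `validate` helper, the GPU counts per instance type, and the tensor-parallel size of 1 for the two smaller models are all assumptions made for the example.

```python
from dataclasses import dataclass

# Assumed GPU counts per instance type (not stated in the PR; confirm with AWS docs).
GPUS_PER_INSTANCE = {"g5.2xlarge": 1, "g6.12xlarge": 4}


@dataclass(frozen=True)
class Deployment:
    """Hypothetical record describing one model deployment."""
    name: str
    instance_type: str
    tensor_parallel_size: int = 1  # assumed default; the PR only states 4 for Gemma


def validate(deployment: Deployment) -> None:
    """Raise ValueError if the tensor-parallel size exceeds the instance's GPUs."""
    gpus = GPUS_PER_INSTANCE.get(deployment.instance_type)
    if gpus is None:
        raise ValueError(f"unknown instance type: {deployment.instance_type}")
    if deployment.tensor_parallel_size > gpus:
        raise ValueError(
            f"{deployment.name}: tensor_parallel_size="
            f"{deployment.tensor_parallel_size} exceeds {gpus} GPU(s) "
            f"on {deployment.instance_type}"
        )


DEPLOYMENTS = [
    Deployment("Qwen 3 0.6B", "g5.2xlarge"),
    Deployment("GPT-OSS 20B", "g5.2xlarge"),
    Deployment("Gemma 3 27B", "g6.12xlarge", tensor_parallel_size=4),
]

for d in DEPLOYMENTS:
    validate(d)
    print(f"{d.name}: {d.instance_type}, tensor_parallel_size={d.tensor_parallel_size}")
```

A check like this catches the common mistake of raising the tensor-parallel size without also moving to an instance type with enough GPUs.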

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>

lukebrady and others added 2 commits September 3, 2025 00:04
…ocumentation

@lukebrady changed the title from "Adding Gemma3 inference server" to "feat: Adding Gemma3 inference server" on Sep 5, 2025
@lukebrady merged commit 097799e into main on Sep 5, 2025
6 checks passed
@lukebrady deleted the gemma-27b branch on September 5, 2025 03:49