Commit 14ebed4

add image captioning
1 parent 5378bdd commit 14ebed4

7 files changed: 598 additions and 26 deletions

.env.example

Lines changed: 4 additions & 1 deletion
@@ -8,4 +8,7 @@ PACKAGE_NAME=org.yourname.appname
 MENTRAOS_API_KEY=your_api_key_here
 
 # Suno API key - Your Suno HackMIT API key for music generation (optional - only needed for song generation feature)
-SUNO_API_KEY=your_suno_api_key_here
+SUNO_API_KEY=your_suno_api_key_here
+
+# Anthropic API key - Your Claude API key for photo captioning (optional - captions will be skipped if not provided)
+ANTHROPIC_API_KEY=your_anthropic_api_key_here
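Reviewer sketch: the new comment says captions are skipped when the key is not provided. One way that check could look is shown below. This is a minimal sketch, not the code from this commit: `loadCaptionConfig`, `CaptionConfig`, and `EnvLike` are hypothetical names, and treating the `.env.example` placeholder value as "not provided" is an assumption.

```typescript
// Hypothetical helper; not taken from the commit's src/index.ts.
type EnvLike = Record<string, string | undefined>;

interface CaptionConfig {
  enabled: boolean;
  apiKey?: string;
}

// Placeholder value copied from .env.example. Treating it as "unset" is an assumption.
const PLACEHOLDER = "your_anthropic_api_key_here";

// Decide whether photo captioning should run, based on the environment.
function loadCaptionConfig(env: EnvLike): CaptionConfig {
  const key = env.ANTHROPIC_API_KEY?.trim();
  if (!key || key === PLACEHOLDER) {
    // No usable key: captioning is skipped, the rest of the app keeps working.
    return { enabled: false };
  }
  return { enabled: true, apiKey: key };
}
```

In a Bun or Node process this would presumably be called once at startup as `loadCaptionConfig(process.env)`, so the optional feature is decided in one place.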

CLAUDE.md

Lines changed: 10 additions & 5 deletions
@@ -4,10 +4,11 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-This is an enhanced MentraOS application that combines camera functionality, voice transcription, and AI music generation. The app demonstrates how to:
+This is an enhanced MentraOS application that combines camera functionality, voice transcription, AI photo captioning, and AI music generation. The app demonstrates how to:
 - Take photos from smart glasses with voice activation
+- Generate automatic captions for photos using Claude Vision API
 - Maintain photo galleries and transcription history
-- Generate AI music using Suno API based on selected photos and transcriptions
+- Generate contextual AI music using Suno API with photo captions and transcriptions
 - Provide a rich web interface for content selection and music creation
 
 ## Development Commands
@@ -25,6 +26,7 @@ The application requires a `.env` file with these variables:
 - `PACKAGE_NAME`: Unique app identifier matching MentraOS Developer Console
 - `MENTRAOS_API_KEY`: API key from MentraOS Developer Console
 - `SUNO_API_KEY`: API key from Suno for music generation (optional)
+- `ANTHROPIC_API_KEY`: Claude API key for photo captioning (optional)
 
 Copy `.env.example` to `.env` and configure these values before running the app.
 
@@ -34,8 +36,9 @@ Copy `.env.example` to `.env` and configure these values before running the app.
 
 - **ExampleMentraOSApp class** (`src/index.ts`): Enhanced application server extending `@mentra/sdk` AppServer
 - Handles MentraOS session lifecycle with photo capture and voice transcription
-- Manages photo galleries and transcription history per user
-- Integrates with Suno API for AI music generation
+- Manages photo galleries with automatic Claude Vision captioning
+- Stores transcription history per user
+- Integrates with Suno API for contextual AI music generation
 - Provides comprehensive REST API endpoints
 
 - **Data Management**: In-memory storage system with Maps tracking:
@@ -181,7 +184,9 @@ Requires a `.env.local` file with:
 - All operations are user-scoped and require MentraOS authentication
 - Voice transcription only processes final speech results, ignoring interim transcriptions
 - Multiple activation phrases are supported for natural voice interaction
-- Suno integration requires a separate API key and builds prompts from selected content
+- Photo captions are generated asynchronously using Claude Vision API (Claude 3.5 Sonnet)
+- Caption generation is optional - photos work without captions if no Anthropic API key
+- Suno integration uses photo captions and transcriptions to create contextual song prompts
 - The webview interface includes auto-refresh every 10 seconds for real-time updates
 - Song generation uses individual polling every 5 seconds per active generation
 - Streaming audio is available ~30-60 seconds after generation starts
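
Reviewer sketch: the notes added in this hunk describe three behaviors. Captions arrive asynchronously, photos remain usable without a caption, and song prompts combine captions with transcriptions. A minimal sketch of that flow follows. All names (`Photo`, `Captioner`, `storePhoto`, `buildSongPrompt`) and the prompt wording are hypothetical, and the captioner is passed in as a plain function so that no real Claude or Suno API call is assumed.

```typescript
// Hypothetical types and names; the commit's actual src/index.ts is not shown on this page.
interface Photo {
  id: string;
  caption?: string; // stays undefined when captioning is skipped or fails
}

// Stand-in for whatever calls the vision model; injected so this sketch makes no network calls.
type Captioner = (photo: Photo) => Promise<string>;

// Store the photo immediately, then attach the caption once it arrives.
function storePhoto(gallery: Photo[], photo: Photo, captioner?: Captioner): Promise<void> {
  gallery.push(photo);
  if (!captioner) {
    // No API key configured: the photo works without a caption.
    return Promise.resolve();
  }
  return captioner(photo)
    .then((caption) => {
      photo.caption = caption;
    })
    .catch(() => {
      // Captioning is best-effort; on failure the photo simply stays uncaptioned.
    });
}

// Combine available captions and transcriptions into a single prompt string.
function buildSongPrompt(photos: Photo[], transcriptions: string[]): string {
  const captions = photos
    .map((p) => p.caption)
    .filter((c): c is string => typeof c === "string" && c.length > 0);
  const parts: string[] = [];
  if (captions.length > 0) parts.push(`Scenes: ${captions.join("; ")}`);
  if (transcriptions.length > 0) parts.push(`Spoken: ${transcriptions.join(" ")}`);
  return parts.join("\n");
}
```

Because `storePhoto` pushes to the gallery before awaiting the captioner, a slow or failing caption request never blocks photo capture; uncaptioned photos are simply left out of the prompt.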

bun.lock

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default.
