Ezi is a zero-friction AI lecture assistant designed to help students and professionals convert spoken content into structured study material. By leveraging the Google Gemini API, Ezi automates the most time-consuming parts of studying.
- Live Transcription: Real-time speech-to-text directly in the browser.
- YouTube Import: Fetch transcripts from educational videos via URL.
- AI Study Aids: Instantly generate:
  - Structured Summaries & Key Terms
  - Interactive 3D Flip Flashcards
  - Multiple-Choice Quizzes with explanations
  - Markdown Study Notes with "Deep Dive" explanations
  - Visual Mind Maps (Tree/Flow layouts) with Zoom/Pan
- AI Tutor Chat: Ask specific questions about the lecture content.
- Local Persistence: All data is saved to your browser's `localStorage`.
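Because everything lives in `localStorage`, persistence needs no backend at all. A minimal sketch of how lectures could be serialized (the `Lecture` shape and the `ezi:lectures` key are illustrative assumptions, not the app's actual schema):

```typescript
// Minimal localStorage persistence sketch. The Lecture shape and the
// "ezi:lectures" key are illustrative, not the app's real schema.
interface Lecture {
  id: string;
  title: string;
  transcript: string;
}

// Accept any Storage-like object so the logic is testable outside a browser.
type StorageLike = { getItem(k: string): string | null; setItem(k: string, v: string): void };

const KEY = "ezi:lectures";

function loadLectures(store: StorageLike): Lecture[] {
  const raw = store.getItem(KEY);
  return raw ? (JSON.parse(raw) as Lecture[]) : [];
}

function saveLecture(store: StorageLike, lecture: Lecture): void {
  // Replace any existing lecture with the same id, then append.
  const all = loadLectures(store).filter((l) => l.id !== lecture.id);
  all.push(lecture);
  store.setItem(KEY, JSON.stringify(all));
}
```

In the browser you would pass `window.localStorage` directly; the `StorageLike` indirection just keeps the save/load logic unit-testable.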
Stop paying $20/month for AI wrappers. This project is 100% free code that runs on your own keys.
- Google Gemini AI: Uses the Free Tier API key. You get a massive amount of free generation and transcription every month directly from Google.
- YouTube Transcripts: Powered by RapidAPI. The free plan includes 100 requests per month, which is plenty for most students.
- Need more? You can upgrade to a pro plan on RapidAPI if you're a heavy user (pricing visible on their page).
| YouTube Import | Mind Map |
|---|---|
| ![]() | ![]() |

| Flashcard | Quiz |
|---|---|
| ![]() | ![]() |

| Study Notes | AI Chat |
|---|---|
| ![]() | ![]() |

| Transcripts | Live Transcript |
|---|---|
| ![]() | ![]() |
This project is intended for LOCAL USE ONLY.
- Client-Side API Usage: This application initializes the Gemini SDK (`@google/genai`) on the frontend. In a production/online environment, your `API_KEY` would be exposed to anyone inspecting the network traffic.
- Environment Variables: RapidAPI and Gemini keys are accessed via `process.env`.
- Production Recommendation: To use this online, you must port the API calls (Gemini and RapidAPI) to a secure backend (Node.js, Python, etc.) to act as a proxy and keep your keys hidden.
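To make that recommendation concrete, here is a minimal sketch of such a proxy in Node. The route path, the `buildUpstreamUrl` helper, and the omission of body/header forwarding are all simplifications; the only point is that the key is read from the server's environment and never reaches the browser:

```typescript
// Hypothetical server-side proxy sketch: the browser calls this server,
// and only the server attaches the secret key to the upstream request.
// Method/header/body forwarding and error handling are omitted.
import http from "node:http";

const GEMINI_API_KEY = process.env.GEMINI_API_KEY ?? "";

// Attach the key on the server so it never appears in client traffic.
// (buildUpstreamUrl is an illustrative helper, not part of the app.)
function buildUpstreamUrl(path: string): string {
  return `https://generativelanguage.googleapis.com/v1beta${path}?key=${GEMINI_API_KEY}`;
}

const server = http.createServer(async (req, res) => {
  if (!req.url?.startsWith("/api/gemini/")) {
    res.writeHead(404).end();
    return;
  }
  // Forward to Gemini and relay the reply back to the browser.
  const upstream = buildUpstreamUrl(req.url.slice("/api/gemini".length));
  const reply = await fetch(upstream, { method: req.method });
  res.writeHead(reply.status).end(await reply.text());
});

// server.listen(8787); // uncomment to run locally
```

The same pattern applies to the RapidAPI calls: move them behind a server route and keep `RAPID_API_KEY` out of the bundle.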
- Node.js (v18 or higher recommended) - Download here
- A modern web browser (Chrome, Edge, or Brave recommended for microphone APIs).
- A Google Gemini API Key (obtainable from Google AI Studio).
- A RapidAPI Key for the YouTube Transcript API.
Install dependencies:

```bash
npm install
```

Create a `.env.local` file in the project root with the following variables:

```bash
GEMINI_API_KEY=your_gemini_api_key_here
RAPID_API_KEY=your_rapidapi_key_here
RAPID_API_HOST=youtube-captions-transcript-subtitles-video-combiner.p.rapidapi.com
```

Important: The `.env.local` file is gitignored for security. Never commit your API keys.

Start the dev server:

```bash
npm run dev
```

The app will start at http://localhost:3000.
When prompted, allow the browser to access your Microphone to enable the recording feature.
This project uses Vite as the build tool. Environment variables are loaded from .env.local and exposed to the app via vite.config.ts:
```typescript
define: {
  'process.env.GEMINI_API_KEY': JSON.stringify(env.GEMINI_API_KEY),
  'process.env.RAPID_API_KEY': JSON.stringify(env.RAPID_API_KEY),
  'process.env.RAPID_API_HOST': JSON.stringify(env.RAPID_API_HOST)
}
```

Note: If you change environment variables, you must restart the dev server for changes to take effect.
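For context, a `define` block like this typically sits inside a `defineConfig` callback that loads the env files with Vite's `loadEnv`. The surrounding options below are assumptions about this project's config, shown only to place the snippet:

```typescript
// vite.config.ts — sketch showing where the define block lives.
// Everything outside `define` is an assumption about this project.
import { defineConfig, loadEnv } from "vite";

export default defineConfig(({ mode }) => {
  // Third argument "" loads all variables from .env.local regardless of prefix.
  const env = loadEnv(mode, process.cwd(), "");
  return {
    define: {
      "process.env.GEMINI_API_KEY": JSON.stringify(env.GEMINI_API_KEY),
      "process.env.RAPID_API_KEY": JSON.stringify(env.RAPID_API_KEY),
      "process.env.RAPID_API_HOST": JSON.stringify(env.RAPID_API_HOST),
    },
  };
});
```

Because `define` performs compile-time string replacement, these values are baked into the bundle at build/dev-server start, which is why a restart is required after editing `.env.local`.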
- Framework: React 19 with TypeScript
- Build Tool: Vite
- Styling: Tailwind CSS with Typography plugin
- AI Model: `gemini-2.5-flash` (optimized for speed and a high context window)
- Visualization: Mermaid.js for Mind Maps
- Markdown: Marked.js for note rendering
- Data: Handled via the standard Web Storage API (`localStorage`)
This app uses a hybrid approach for live transcription:
| Component | Technology |
|---|---|
| Audio Capture | Browser Web API (getUserMedia, AudioContext) |
| Transcription | Google Gemini Live API (@google/genai) |
The Gemini Live API is activated when the user clicks the microphone button to start a new recording. The flow is:
- User navigates to "New Recording" or clicks "Continue Recording" on an existing lecture
- User selects their microphone from the dropdown (optional)
- User clicks the microphone button to start recording
- This triggers the `startRecording()` function, which:
  - Initializes `GoogleGenAI` with your `GEMINI_API_KEY`
  - Opens a WebSocket connection via `ai.live.connect()`
  - Begins streaming audio to Gemini in real time
```
[User Clicks Mic]
  ↓ getUserMedia() captures audio
  ↓ AudioContext processes PCM at 16kHz
  ↓ ai.live.connect() opens WebSocket to Gemini
  ↓ Audio chunks sent via session.sendRealtimeInput()
  ↓ Gemini returns inputTranscription events
  ↓ Text displayed in real-time
```
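The audio leg of this flow hinges on converting the `AudioContext`'s float samples into the base64-encoded 16-bit PCM that the Live API expects. A sketch of that conversion — the `floatTo16BitPcmBase64` helper name is illustrative, and the `session.sendRealtimeInput()` usage in the comments is a sketch of the `@google/genai` call rather than the app's exact code:

```typescript
// Convert Web Audio float samples ([-1, 1]) to 16-bit little-endian PCM,
// base64-encoded for the Gemini Live API's realtime input.
// (Helper name is illustrative, not from the app's source.)
function floatTo16BitPcmBase64(samples: Float32Array): string {
  const buf = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buf);
  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale asymmetrically so -1 maps to -32768 and 1 to 32767.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  // Buffer works in Node; in the browser you would use btoa() instead.
  return Buffer.from(buf).toString("base64");
}

// Sketch of how a chunk would be streamed (browser side, not runnable here):
//   session.sendRealtimeInput({
//     media: {
//       data: floatTo16BitPcmBase64(chunk),
//       mimeType: "audio/pcm;rate=16000",
//     },
//   });
```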
| Setting | Value |
|---|---|
| Model | gemini-2.5-flash-native-audio-preview-09-2025 |
| Audio Format | PCM 16-bit, 16kHz sample rate, mono |
| Response Type | inputTranscription (real-time text) |
| System Prompt | "You are a professional stenographer. Transcribe the user's speech exactly as spoken." |
Note: This provides better accuracy than the browser's native `webkitSpeechRecognition` API, especially for longer recordings and complex audio.
| Issue | Solution |
|---|---|
| "RapidAPI configuration missing" | Ensure RAPID_API_KEY and RAPID_API_HOST are set in .env.local, then restart the dev server |
| Microphone not working | Ensure you are served over localhost or https. Browsers block microphone access on insecure http origins |
| AI Content failing | Check the browser console. Usually due to an invalid/expired GEMINI_API_KEY or free-tier rate limits |
| YouTube Import Error | Verify the video has captions enabled and your RapidAPI credentials are correct |
| Changes to .env.local not working | Restart the dev server (npm run dev) after modifying environment variables |