Welcome to the Audio Transcription App! This repository contains a Python application that uses the OpenAI API and the FastAPI framework to transcribe audio files. You upload an MP3 file, the app sends it to OpenAI's transcription model, and the transcribed text is returned to you.
This application is built on the following technologies:
- Python: A versatile programming language known for its simplicity and readability.
- OpenAI API: Utilized to transcribe audio content and convert it into text.
- FastAPI: A modern, high-performance web framework for building APIs with Python.
- pydub: A library for audio file manipulation, used to handle audio conversion and processing.
- tempfile: A built-in Python module used for managing temporary files.
Before you begin, ensure you have the following:
- Python (>=3.6) installed on your system.
- An OpenAI API key. If you don't have one, you can sign up and obtain an API key from the OpenAI website.
- Basic familiarity with FastAPI, OpenAI API, audio file handling, and Python programming.
1. Clone this repository to your local machine:

       git clone https://github.com/your-username/your-repo.git
       cd your-repo

2. Install the required Python packages using pip:

       pip install fastapi uvicorn openai pydub

3. Open the `main.py` file and replace `'OPENAI_API_KEY'` with your actual OpenAI API key.
4. Run the FastAPI server using the following command:

       uvicorn main:app

5. Once the server is running, you can access the FastAPI documentation at http://127.0.0.1:8000/docs. Here, you can test the `/api/transcribe` endpoint by uploading an MP3 audio file.
6. The uploaded audio file will be transcribed using the OpenAI API, and the resulting transcription will be returned as a JSON response.
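
Besides the interactive docs page, the endpoint can also be called from a script. The following is a minimal sketch using only the standard library. It assumes the endpoint accepts a `POST` request with the file in a multipart form field named `file`; that field name, the helper names `build_multipart` and `transcribe_file`, and the `audio/mpeg` content type are assumptions for illustration, so check `main.py` or the `/docs` page for the actual parameter name.

```python
import json
import os
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field_name: str, filename: str, data: bytes,
                    content_type: str = "audio/mpeg") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe_file(path: str,
                    url: str = "http://127.0.0.1:8000/api/transcribe",
                    field_name: str = "file") -> dict:
    """POST an MP3 file to the running server and return the decoded JSON.

    The form field name is an assumption; adjust it to match main.py.
    """
    with open(path, "rb") as f:
        body, header = build_multipart(field_name, os.path.basename(path), f.read())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `transcribe_file("sample.mp3")` (where `sample.mp3` is a placeholder for any local MP3 file) would return the parsed JSON response as a dictionary.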
1. When you upload an MP3 file to the `/api/transcribe` endpoint, the application reads the binary content of the audio file.
2. The binary content is converted into an `AudioSegment` object using the `pydub` library, which makes the audio data suitable for processing.
3. The `AudioSegment` is temporarily saved as an MP3 file using the `tempfile.NamedTemporaryFile` function.
4. The temporary MP3 file's content is read and passed to the `transcription` function, which sends an API call to the OpenAI transcription model via the OpenAI Python library.
5. The transcription text is extracted from the API response and returned to the user as JSON.
6. The temporary MP3 file is then deleted.
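
The temporary-file part of this flow (save, transcribe, delete) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in `main.py`: the `pydub` conversion is skipped, the OpenAI call is replaced by a hypothetical `transcribe` callable that takes a file path and returns text, and the `"transcription"` response key is a guess at the JSON shape.

```python
import os
import tempfile
from typing import Callable


def transcribe_via_tempfile(audio_bytes: bytes,
                            transcribe: Callable[[str], str]) -> dict:
    """Write audio to a temporary .mp3 file, transcribe it, then delete it.

    `transcribe` is a hypothetical stand-in for the OpenAI call: it takes
    a file path and returns the transcribed text.
    """
    # delete=False lets the file be reopened by path after it is closed,
    # which is also required on Windows.
    tmp = tempfile.NamedTemporaryFile(suffix=".mp3", delete=False)
    try:
        tmp.write(audio_bytes)
        tmp.close()  # flush to disk before another reader opens the file
        text = transcribe(tmp.name)
    finally:
        tmp.close()  # no-op if the file is already closed
        os.remove(tmp.name)  # clean up even if transcription fails
    # The response key is an assumption about the JSON shape.
    return {"transcription": text}
```

Placing the cleanup in a `finally` block means the temporary file is removed even when the transcription call raises, so failed requests do not leave files behind.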