Skip to content

This project provides a unified pipeline for speech transcription, diarization, and emotion analysis using various open source models. It includes a Gradio-based UI for user interaction.

License

Notifications You must be signed in to change notification settings

fvanevski/transmotify

Repository files navigation

Speech Transcription and Emotion Analysis

## Overview
This project provides a unified pipeline for speech transcription, diarization, and emotion analysis using WhisperX and BERT. It includes a Gradio-based UI for user interaction.

## Features
- Transcription using WhisperX
- Diarization to identify speakers
- Emotion analysis using BERT
- Post-processing for speaker labeling and emotion summary
- Batch processing support
- Robust error handling and logging

## Installation
1. Clone the repository:
   ```sh
   git clone https://github.com/your-username/speech-transcription-system.git
   cd speech-transcription-system
   ```
2. Create and activate a virtual environment:
   ```sh
   virtualenv venv
   .\venv\Scripts\activate  # On Windows
   source venv/bin/activate  # On macOS and Linux
   ```
3. Install dependencies:
   ```sh
   pip install -r requirements.txt
   ```

## Usage
1. Run the application:
   ```sh
   python main.py
   ```
2. Open the Gradio UI in your web browser and follow the instructions to upload files and perform tasks.

## Directory Structure
```
speech_transcription_system/
├── config/
│   └── config.py
├── core/
│   ├── __init__.py
│   ├── transcription.py
│   ├── diarization.py
│   ├── emotion_analysis.py
│   ├── file_management.py
│   └── utils.py
├── ui/
│   ├── __init__.py
│   ├── main_gui.py
│   └── postprocess_gui.py
├── logs/
├── temp/
├── output/
├── requirements.txt
└── main.py
```

## Configuration
- `config.json`: Configuration file for output directories, batch size, and log level.

## Contributing
Contributions are welcome! Please open an issue or submit a pull request.

## License
This project is licensed under the MIT License.
```

4. Create a Pull Request

  • Push Your Changes:

    • Ensure your changes are committed and pushed to the refactor-core branch:
      git add .
      git commit -m "Refactor core modules and improve directory structure"
      git push origin refactor-core
  • Create a Pull Request:

    • Go to your GitHub repository.
    • Click on the "New pull request" button.
    • Select the refactor-core branch as the base branch and the main (or master) branch as the compare branch.
    • Write a detailed description of the changes you made.
    • Request a code review from your team or peers.

Additional Tips

  • Continuous Integration (CI):

    • Consider setting up a CI/CD pipeline to automate testing and deployment.
    • Tools like GitHub Actions, GitLab CI, or Jenkins can be used for this purpose.
  • Code Linting and Formatting:

    • Use tools like flake8, black, or pylint to ensure your code is well-formatted and follows best practices.
  • Documentation:

    • Consider using tools like Sphinx or MkDocs to generate comprehensive documentation for your project.

By following these steps, you can ensure that your refactored code is well-organized, thoroughly tested, and ready for integration into the main branch. If you have any more questions or need further assistance, feel free to ask!

About

This project provides a unified pipeline for speech transcription, diarization, and emotion analysis using various open source models. It includes a Gradio-based UI for user interaction.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages