Traveler Application is a multimodal processing pipeline built with Python and Flask and integrated with Discord for real-time communication. The application handles several types of input, including GPS data, images, audio, and text messages, and provides dynamic responses such as restaurant recommendations, image analysis, and voice synthesis.

Multimodal Input Handling:
- GPS Data: Processes latitude, longitude, street, and city details to provide location-based recommendations.
- Image Processing: Supports image uploads, including resizing for large files, image analysis, and contextual interpretation.
- Audio Processing: Converts uploaded audio to text using speech-to-text transcription and synthesizes voice responses using TTS.
- Text Messages: Processes user text messages to generate context-sensitive responses.

Flask API Endpoints:
- /upload: The main endpoint that handles incoming POST requests, validates API keys, and processes data through multiple pipelines based on input type.
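
The key-validation step described for `/upload` could look roughly like the sketch below. It is framework-agnostic and uses only the standard library; the form field name `api_key`, the placeholder key, and both function names are assumptions made for illustration, not the project's actual code.

```python
import hmac

# Hypothetical form field name and key; the real values would come from config.py.
API_KEY_FIELD = "api_key"
EXPECTED_API_KEY = "change-me"


def is_authorized(form, expected_key=EXPECTED_API_KEY):
    """Return True when the submitted form carries the expected API key."""
    supplied = form.get(API_KEY_FIELD, "")
    # compare_digest avoids leaking key contents through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_key.encode())


def check_upload_request(method, form):
    """Return an (HTTP status, message) pair for an incoming /upload request."""
    if method != "POST":
        return 405, "method not allowed"
    if not is_authorized(form):
        return 401, "invalid API key"
    return 200, "ok"
```

A Flask view could call `check_upload_request(request.method, request.form)` first and only continue to the processing pipelines when the status is 200.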

Discord Integration:
- Real-time Notifications: Sends processed responses directly to Discord channels using a custom Discord bot, handling both text and multimedia content.
- Background Processing: Utilizes asynchronous task execution (with ThreadPoolExecutor and asyncio) for efficient message handling and to avoid blocking operations.
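
One common way to combine `ThreadPoolExecutor` with `asyncio` so that a request handler does not block is sketched below. The background event loop stands in for the Discord bot's loop, and `send_text`, `notify`, and `handle_request` are hypothetical names; the project's own wiring may differ.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# A dedicated event loop in a background thread stands in for the Discord
# bot's loop (assumption: the real bot exposes its own running loop).
loop = asyncio.new_event_loop()
threading.Thread(target=loop.run_forever, daemon=True).start()

executor = ThreadPoolExecutor(max_workers=4)


async def send_text(channel, text):
    """Hypothetical stand-in for a Discord send coroutine."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"sent to {channel}: {text}"


def notify(channel, text):
    """Blocking helper: schedule the coroutine on the loop and wait for it."""
    future = asyncio.run_coroutine_threadsafe(send_text(channel, text), loop)
    return future.result(timeout=5)


def handle_request(channel, text):
    """Called from a request handler; returns immediately with a Future."""
    return executor.submit(notify, channel, text)
```

The handler gets a `Future` back right away, so the HTTP response does not have to wait for the Discord delivery to finish.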
The application pipeline is centralized within the routes.py file, which orchestrates the following workflow:

Request Validation and Data Extraction:
- Validates API keys and extracts data from form fields including GPS coordinates, image/audio files, and additional messages.

Conditional Processing:
- GPS Only: Generates restaurant recommendations using location data. It builds a system prompt and integrates with external search functions.
- Image + GPS: Processes images by checking file size, optionally resizing them, and then sending the image for analysis to generate contextual insights.
- Image + Text + GPS: Uses the uploaded image along with user messages to generate a detailed response through an advanced language model.
- Audio + GPS: Converts audio to text via speech-to-text services, then processes the text similarly to generate context-aware responses.
- Fallback Handling: If the input does not match any specific case, the application compiles the available data and sends a basic notification to Discord.
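
The branching above can be sketched as a small selection function. The function name, the returned labels, and the precedence between overlapping cases (for example image plus audio) are assumptions; the source lists the cases but does not specify how ties are resolved.

```python
def select_pipeline(has_gps, has_image, has_audio, has_text):
    """Pick a processing branch from the kinds of input present.

    The order of the checks encodes an assumed precedence: the most
    specific combination wins, and anything unmatched falls through.
    """
    if has_gps and has_image and has_text:
        return "image_text_gps"
    if has_gps and has_image:
        return "image_gps"
    if has_gps and has_audio:
        return "audio_gps"
    if has_gps and not has_text:
        return "gps_only"
    # No specific case matched: compile what is available and notify Discord.
    return "fallback"
```

For example, `select_pipeline(True, True, False, True)` selects the image, text, and GPS branch, while a request without GPS data goes to the fallback.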

Response Generation:
- Employs `generate_content_with_history` to integrate dynamic LLM responses based on a system prompt generated by the `System_Prompt` function. This function adapts its prompt based on the nature of inputs (e.g., GPS, image, audio).

Discord Messaging:
- Processes responses using Discord bot functions like `send_location_to_discord` and `send_text_to_channel` to deliver message content, images, and audio files asynchronously.
- Backend: Python with Flask for API endpoint management.
- Asynchronous Execution: Utilizes `concurrent.futures.ThreadPoolExecutor` and `asyncio` for non-blocking operations.
- Multimodal Processing:
- Image Management: Resizes uploaded images for optimal processing.
- Audio Transcription: Integrates with speech-to-text services (such as Whisper) for audio processing.
- Text-to-Speech (TTS): Synthesizes voice responses from generated text.
- Discord Bot Integration: Custom bot implementation for real-time notifications and user interaction.
- Utility Modules: Managed via helper functions for GPS time conversion, unique filename generation, and search functionalities (e.g., `maps_search_nearby`).
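
As an illustration of the unique filename generation mentioned above, a minimal sketch might look like the following. The function name and the choice of a UUID-based scheme are assumptions; the project's own helper may work differently.

```python
import uuid
from pathlib import Path


def unique_filename(original_name):
    """Return a collision-resistant name that keeps the original extension."""
    suffix = Path(original_name).suffix.lower()  # e.g. ".jpg"
    # uuid4 is random, so two uploads with the same name get different files.
    return f"{uuid.uuid4().hex}{suffix}"
```

Keeping the extension lets later steps (image resizing, audio transcription) infer the file type without trusting the rest of the user-supplied name.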
- Intelligent System Prompts: The application dynamically generates system prompts through the `System_Prompt` function that adapts based on the type of input (GPS only, Image+GPS, Image+Text+GPS, etc.), providing contextualized responses tailored to the user's needs.
- Web Search Integration: Incorporates web search capabilities via DuckDuckGo API (`get_search_results` function) to enhance responses with real-time information from the internet.
- Context-Aware Language Processing: Maintains conversation history through the `Global_History` variable, allowing for contextually relevant responses that remember past interactions.
- Time and Location Awareness: Uses TimezoneFinder to determine the local time based on GPS coordinates, enhancing the relevance of recommendations (e.g., breakfast restaurants in the morning, dinner venues in the evening).
- Automatic Image Optimization: For large images (over 7.5 MB), the application automatically resizes them before processing or sending to Discord, ensuring optimal performance.
- Error Handling and Resilience: Implements robust error handling throughout the pipeline, with detailed logging to track application behavior and troubleshoot issues.
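
To make the time-awareness idea concrete, here is a minimal sketch of mapping a local time to a meal period. The hour boundaries and the `meal_period` name are assumptions, and a fixed UTC offset stands in for the timezone lookup that the application performs from GPS coordinates.

```python
from datetime import datetime, timedelta, timezone


def meal_period(local_time):
    """Map a local datetime to a coarse meal period (assumed hour ranges)."""
    hour = local_time.hour
    if 5 <= hour < 11:
        return "breakfast"
    if 11 <= hour < 15:
        return "lunch"
    if 17 <= hour < 22:
        return "dinner"
    return "other"


# Usage: convert a UTC timestamp to local time, then classify it.
# The UTC-8 offset is a stand-in for a timezone resolved from coordinates.
utc_now = datetime(2024, 1, 1, 16, 30, tzinfo=timezone.utc)
local = utc_now.astimezone(timezone(timedelta(hours=-8)))
print(local.strftime("%H:%M"), meal_period(local))  # 08:30 breakfast
```

The resulting label could then be passed into the system prompt so that recommendations match the time of day at the user's location.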
Traveler/
├── api/
│ └── routes.py # Main application pipeline handling multimodal data and routing logic.
├── discord_bot/
│ └── bot.py # Discord bot setup and functions for message handling.
├── utils/
│ ├── image_resize.py # Image resizing utilities.
│ ├── new_utils.py # Utility functions for GPS conversion, search, filename generation, etc.
│ └── whisper_gen.py # Audio transcription and TTS synthesis.
├── config.py # Configuration settings and API keys.
├── main.py # Application entry point.
├── README.md # This file.
└── requirements.txt # Project dependencies.
When processing location data, the application generates restaurant recommendations that include:
- Local Time-Sensitive Suggestions: Breakfast options in the morning, lunch venues at midday, dinner restaurants in the evening.
- Personalized Recommendations: Takes into account user preferences stored in the system.
- Detailed Information: Restaurant names, addresses, ratings, price levels, and opening hours when available.
- Contextual Reasoning: Explanations for why each restaurant is recommended (e.g., high ratings, cuisine type matching preferences, proximity to location).
Example Output:
# Restaurant Recommendations for Downtown Seattle
Based on your current location and the time (8:30 AM), here are some breakfast options:
## 1. The Crumpet Shop (4.7⭐)
- **Address:** 1503 1st Ave, Seattle
- **Details:** Authentic British crumpets and tea in a casual setting
- **Why:** Highly-rated breakfast spot with house-made crumpets and organic ingredients
## 2. Biscuit Bitch (4.5⭐)
- **Address:** 1909 1st Ave, Seattle
- **Details:** Southern-style breakfast with various biscuit options
- **Why:** Popular local spot with generous portions and creative menu items

When processing images, the system provides:
- Visual Content Description: Detailed analysis of what appears in the image.
- Contextual Interpretation: Commentary relevant to the user's location and query.
- Actionable Insights: Suggestions or recommendations based on image content.
Example Output:
# Image Analysis
I can see you've shared a photo of Pike Place Market's iconic sign and entrance. This famous market is one of Seattle's most popular attractions.
## Highlights Nearby:
- The original Starbucks store (just a block away)
- Fresh seafood vendors with their famous fish-throwing tradition
- Local artisan crafts and specialty food shops
Would you like recommendations for things to do at Pike Place Market, or places to eat nearby?

When combining different input types (e.g., image + audio + location):
- Integrated Analysis: Combines insights from all sources to provide a comprehensive response.
- Multilingual Support: Detects the language of audio/text input and responds in the same language.
- Rich Media Responses: Can include synthesized audio replies along with text.
- Dynamic Responses: Provides real-time, context-aware outputs for varied input types including location data, images, audio, and text.
- Enhanced User Experience: Seamless integration with Discord ensures users receive prompt notifications and interactive responses.
- Scalability: The modular design facilitates easy expansion and integration with advanced AI and processing tools.
- Versatility: Suitable for travel recommendations, digital concierge services, and any application requiring multimodal data processing.
- Travel Guide Services: Provides on-demand information about local attractions, restaurants, and activities based on the user's current location.
- Accessibility Tool: Audio transcription and voice synthesis make information accessible for users with different needs.
- Remote Collaboration: Allows teams to share location-based insights with rich context and media.
- Customer Service Enhancement: Can be integrated into support systems to handle multimodal customer inquiries with detailed, contextual responses.
- Event Documentation: Captures and processes information from events, providing structured, analyzable data with geographic context.

Install Dependencies:
- Run `pip install -r requirements.txt` to install all necessary packages.

Configuration:
- Update `config.py` with your API keys, upload directories, and other configuration parameters.

Run the Application:
- Start the Flask server by running `python main.py`.

Discord Setup:
- Ensure the Discord bot is properly configured and active for real-time communication.
Traveler Application leverages robust multimodal input handling and advanced language processing to deliver tailored, instantaneous responses. Its integration with Discord enhances user interaction, making it a versatile and scalable solution for modern travel and digital concierge services.