A Python implementation task to work with AI image generation and analysis capabilities via DIAL API
By completing these tasks, you will learn:
- How to generate images from text prompts using DALL-E 3
- How to analyze images using different AI models (GPT-4o, Claude-3-Sonnet)
- Two different approaches for handling images in AI systems:
- OpenAI approach: Base64 encoding for direct embedding
- DIAL approach: Bucket storage with attachment references
- How to work with file uploads, downloads with DIAL bucket
- How DIAL adapts requests for different AI model vendors
- Python 3.11+
- pip
- API key for DIAL service
-
Install dependencies:
pip install -r requirements.txt
-
Set your API key:
- Ensure that you are connected to the EPAM VPN
- Get the DIAL API key here: https://support.epam.com/ess?id=sc_cat_item&table=sc_cat_item&sys_id=910603f1c3789e907509583bb001310c
- Update the
API_KEYconstant intask/_utils/constants.py - Get available models from: https://ai-proxy.lab.epam.com/openai/models
-
Project structure:
task/ ├── _models/ │ ├── conversation.py # ✅ Complete │ ├── message.py # ✅ Complete │ ├── role.py # ✅ Complete │ └── custom_content.py # ✅ Complete ├── _utils/ │ ├── model_client.py # ✅ Complete │ ├── bucket_client.py # ✅ Complete │ ├── constants.py # ✅ Complete │ └── request.py # ✅ Complete ├── image_to_text/ │ ├── openai/ │ │ ├── message.py # ✅ Complete │ │ └── task_openai_itt.py # 🚧 TODO │ └── task_dial_itt.py # 🚧 TODO └── text_to_image/ └── task_tti.py # 🚧 TODO dialx-banner.png # 📁 Sample image
Complete the implementation of these three practice files:
Goal: Analyze an image using base64 encoding approach
- Create DialModelClient with GPT-4o model (and other models)
- Encode image as base64 data URL
- Send ContentedMessage with text and image content
- Key Learning: Direct image embedding in messages
Goal: Analyze an image using bucket storage approach
- Upload image to DIAL bucket storage
- Create message with attachment reference
- Test with different AI models
- Key Learning: File storage and attachment handling
Goal: Generate images from text prompts
- Create text prompt for image generation
- Use DALL-E 3 model for generation
- Download and save generated images
- Experiment with
size,qualityandstyleof output viacustom_fieldsconfiguration parameter - Key Learning: AI image generation and file handling
- Generated image file saved locally with timestamp
- Console output showing request/response details
- AI description of the
dialx-banner.pngimage - Comparison between different models' responses
Common Issues:
- Empty API key: Update
constants.pywith your DIAL API key - VPN connection: Ensure EPAM VPN is active
- File not found: Verify
dialx-banner.pngexists in project root - Network errors: Check DIAL service status and connectivity
Once you complete the basic tasks, try these extensions:
- Multi-model comparison: Run the same image analysis with different models
- Custom prompts: Create your own text-to-image prompts
- Batch processing: Analyze multiple images at once
- Error handling: Add robust error handling and retries
- Multimodal AI: Working with both text and images
- API Design Patterns: Different approaches to handle media content
- Async Programming: File operations and HTTP requests
- Model Abstraction: How DIAL Core adapts requests across vendors
- File Management: Upload, download, and storage operations
