Skip to content

khshanovskyi/ai-dial-content-generation

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Image Generation and Analysis with DIAL API

A Python implementation task to work with AI image generation and analysis capabilities via DIAL API

🎓 Learning Goals

By completing these tasks, you will learn:

  • How to generate images from text prompts using DALL-E 3
  • How to analyze images using different AI models (GPT-4o, Claude-3-Sonnet)
  • Two different approaches for handling images in AI systems:
    • OpenAI approach: Base64 encoding for direct embedding
    • DIAL approach: Bucket storage with attachment references
  • How to work with file uploads, downloads with DIAL bucket
  • How DIAL adapts requests for different AI model vendors

📋 Requirements

  • Python 3.11+
  • pip
  • API key for DIAL service

🔧 Setup

  1. Install dependencies:

    pip install -r requirements.txt
  2. Set your API key:

  3. Project structure:

    task/
    ├── _models/
    │   ├── conversation.py          # ✅ Complete
    │   ├── message.py               # ✅ Complete  
    │   ├── role.py                  # ✅ Complete
    │   └── custom_content.py        # ✅ Complete
    ├── _utils/
    │   ├── model_client.py          # ✅ Complete
    │   ├── bucket_client.py         # ✅ Complete
    │   ├── constants.py             # ✅ Complete
    │   └── request.py               # ✅ Complete
    ├── image_to_text/
    │   ├── openai/
    │   │   ├── message.py           # ✅ Complete
    │   │   └── task_openai_itt.py   # 🚧 TODO
    │   └── task_dial_itt.py         # 🚧 TODO
    └── text_to_image/
        └── task_tti.py              # 🚧 TODO
    dialx-banner.png                 # 📁 Sample image
    

📝 Your Tasks

If the task in the main branch is hard for you, then switch to the with-detailed-description branch

Complete the implementation of these three practice files:

1. task_openai_itt.py - OpenAI-Style Image Analysis

Goal: Analyze an image using base64 encoding approach

  • Create DialModelClient with GPT-4o model (and other models)
  • Encode image as base64 data URL
  • Send ContentedMessage with text and image content
  • Key Learning: Direct image embedding in messages

2. task_dial_itt.py - DIAL-Style Image Analysis

Goal: Analyze an image using bucket storage approach

  • Upload image to DIAL bucket storage
  • Create message with attachment reference
  • Test with different AI models
  • Key Learning: File storage and attachment handling

3. task_tti.py - Text-to-Image Generation

Goal: Generate images from text prompts

  • Create text prompt for image generation
  • Use DALL-E 3 model for generation
  • Download and save generated images
  • Experiment with size, quality and style of output via custom_fields configuration parameter
  • Key Learning: AI image generation and file handling

🎯 Expected Outputs

Text-to-Image Task

  • Generated image file saved locally with timestamp
  • Console output showing request/response details

Image-to-Text Tasks

  • AI description of the dialx-banner.png image
  • Comparison between different models' responses

🛠️ Troubleshooting

Common Issues:

  • Empty API key: Update constants.py with your DIAL API key
  • VPN connection: Ensure EPAM VPN is active
  • File not found: Verify dialx-banner.png exists in project root
  • Network errors: Check DIAL service status and connectivity

🌟 Bonus Challenges

Once you complete the basic tasks, try these extensions:

  1. Multi-model comparison: Run the same image analysis with different models
  2. Custom prompts: Create your own text-to-image prompts
  3. Batch processing: Analyze multiple images at once
  4. Error handling: Add robust error handling and retries

📚 Key Concepts Covered

  • Multimodal AI: Working with both text and images
  • API Design Patterns: Different approaches to handle media content
  • Async Programming: File operations and HTTP requests
  • Model Abstraction: How DIAL Core adapts requests across vendors
  • File Management: Upload, download, and storage operations

About

Task to work with different content generation via DIAL API

Topics

Resources

Stars

Watchers

Forks

Languages