Read this blog first for more context.
A RAG-based GenAI prompting application that showcases the GenAI patterns.
This application can answer customer queries (like "suggest me a good phone") based on the product catalog and the customer's purchase history.
This setup runs entirely on a local developer machine and is intended for PoC/learning purposes only.
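Under the hood the app follows the standard RAG loop: embed the question, retrieve matching products from the vector store, then let the LLM answer from that context plus the customer's purchase history. The sketch below illustrates that flow; the "products" collection name, payload fields, and purchase-history lookup are illustrative assumptions, not this repo's actual code.

```python
# Minimal RAG-loop sketch -- the "products" collection and payload
# fields are assumptions for illustration, not this repo's schema.
import ollama
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

def answer(question: str, purchase_history: list[str]) -> str:
    # 1. Embed the customer's question with the local embedding model.
    vector = ollama.embeddings(model="all-minilm", prompt=question)["embedding"]

    # 2. Retrieve the closest catalog entries from Qdrant.
    hits = client.search(collection_name="products", query_vector=vector, limit=3)
    catalog = "\n".join(str(hit.payload) for hit in hits)

    # 3. Ground the answer in the retrieved catalog and purchase history.
    prompt = (
        f"Product catalog:\n{catalog}\n\n"
        f"Past purchases: {', '.join(purchase_history)}\n\n"
        f"Question: {question}"
    )
    reply = ollama.chat(model="llama3.2", messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]
```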
- Python: The programming language used for the script.
- Ollama: A local LLM server that runs on your machine.
- Docker: A containerization platform used to run Qdrant.
- Qdrant: A vector database used for storing and retrieving embeddings.
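To make that last item concrete: before the app can retrieve anything, the product catalog has to be embedded and stored in Qdrant. Here is a minimal indexing sketch, assuming a hypothetical "products" collection and made-up sample items (all-minilm produces 384-dimensional vectors):

```python
# Indexing sketch: embed catalog entries and upsert them into Qdrant.
# The collection name and sample products are illustrative assumptions.
import ollama
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(host="localhost", port=6333)
client.recreate_collection(
    collection_name="products",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),  # all-minilm is 384-d
)

catalog = ["Acme Phone X: 6.1-inch OLED, 128 GB", "Acme Tab S: 11-inch tablet"]
client.upsert(
    collection_name="products",
    points=[
        PointStruct(
            id=i,
            vector=ollama.embeddings(model="all-minilm", prompt=item)["embedding"],
            payload={"description": item},
        )
        for i, item in enumerate(catalog)
    ],
)
```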
Follow these steps to set up and run the script:
- Install and start Ollama, then pull the chat, embedding, and guardrail models:

```
brew install ollama
ollama serve &
ollama pull llama3.2:latest
ollama pull all-minilm
ollama pull llama-guard3:1b
```
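Why llama-guard3:1b? It points at the input-guardrail pattern: screening a customer's message before it reaches the main model. Below is a minimal sketch of such a check, assuming (as the Llama Guard 3 model card describes) that the model replies starting with "safe" or "unsafe"; the function name and wiring are illustrative, not this app's actual code.

```python
# Input-guardrail sketch: classify a customer message with Llama Guard
# before passing it to the main chat model. Assumes the model answers
# with "safe" or "unsafe ..." as described on its Ollama model card.
import ollama

def is_safe(user_message: str) -> bool:
    response = ollama.chat(
        model="llama-guard3:1b",
        messages=[{"role": "user", "content": user_message}],
    )
    # Llama Guard's verdict is the first token of its reply.
    return response["message"]["content"].strip().lower().startswith("safe")
```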
- Install Docker and start Qdrant:

```
brew install docker
docker run -p 6333:6333 -p 6334:6334 qdrant/qdrant
```
- Create and activate a virtual environment, then install the dependencies:

```
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
- Start the app:

```
uvicorn src.api.product_chat:app --host 0.0.0.0 --port 8000
```

- Trigger the API:

```
curl -X POST "http://localhost:8000/product/prompt" \
-H "Content-Type: application/json" \
-d '{ "customer_id": "1", "customer_name": "Bob", "question": "Can you suggest me a good phone?"}'
```

- When you're done, deactivate the virtual environment:

```
deactivate
```

This application is intended for learning purposes only: specifically, to explore what a GenAI application looks like and to help understand these patterns better. Though it covers aspects of accuracy, security, and evaluations, it is not optimized for any of them. There are plenty of tools/frameworks on the market (refer to the AI Stack image below) for each of the aspects this application covers, and each one adds more power to the app. Explore them for production use cases.
[AI Stack image]