
Commit 3f1e7e4

Author jpaulrajredhat committed: otel-iceberg integration

2 parents f044520 + cfd6d10, commit 3f1e7e4


50 files changed, +9985 −0 lines changed

.github/workflows/main.yaml

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
name: Run Podman Compose (Build & Deploy)

on:
  push:
    branches: [ main ]
  pull_request:

jobs:
  podman-compose:
    runs-on: ubuntu-latest

    steps:
      # Step 1: Checkout repository
      - name: Checkout code
        uses: actions/checkout@v4

      # Step 2: Install Podman & podman-compose
      - name: Install Podman
        run: |
          sudo apt-get update -y
          sudo apt-get install -y podman python3-pip
          pip install podman-compose
          echo "Podman version:"
          podman --version

      # Step 3: Configure Podman storage correctly
      - name: Configure Podman storage
        run: |
          echo "Configuring Podman storage..."

          # Podman default storage root under runner home
          STORAGE_ROOT="/home/runner/work/_containers"

          sudo mkdir -p $STORAGE_ROOT
          sudo mkdir -p /etc/containers

          # Create storage.conf if missing
          if [ ! -f /etc/containers/storage.conf ]; then
            echo "[storage]" | sudo tee /etc/containers/storage.conf
          fi

          # Apply correct graphroot path
          sudo sed -i '/graphroot/d' /etc/containers/storage.conf
          echo "graphroot=\"$STORAGE_ROOT\"" | sudo tee -a /etc/containers/storage.conf

          echo "Final /etc/containers/storage.conf:"
          cat /etc/containers/storage.conf

          # Initialize storage
          podman system migrate

      # Step 4: Ensure Podman network exists
      - name: Ensure Podman network
        run: |
          NETWORK_NAME="anomaly-network"
          if ! podman network exists "$NETWORK_NAME"; then
            echo "Creating network $NETWORK_NAME..."
            podman network create "$NETWORK_NAME"
          else
            echo "Network $NETWORK_NAME already exists."
          fi

      # Step 5: Build images using Podman Compose
      - name: Build Podman images
        working-directory: demo
        run: |
          echo "Building images using podman-compose..."
          podman-compose -f docker-compose.yaml build
          echo "Build completed."

      # Step 6: Start containers
      - name: Run Podman Compose
        working-directory: demo
        run: |
          echo "Starting Podman Compose containers..."
          podman-compose -f docker-compose.yaml up -d
          echo "Containers running."
          podman ps

      # Step 7: Verify Podman environment
      - name: Verify Podman environment
        run: |
          echo "Networks:"
          podman network ls

          echo "Containers:"
          podman ps -a

      # Step 8: Show logs of each container
      - name: Show Container Logs
        working-directory: demo
        run: |
          echo "Displaying logs for all containers..."
          podman ps --format "{{.Names}}" | while read -r c; do
            echo "----------------------------"
            echo "Logs for: $c"
            echo "----------------------------"
            podman logs "$c" || echo "No logs for $c"
          done

      # Step 9: Cleanup (always runs)
      - name: Stop and Clean Up
        if: always()
        working-directory: demo
        run: |
          echo "Stopping Podman Compose..."
          podman-compose -f docker-compose.yaml down
          echo "Cleanup completed."

AIOpsOverview.png

56 KB

demo/README.md

Lines changed: 144 additions & 0 deletions
@@ -0,0 +1,144 @@
# AIOPS Demo - Anomaly Detection & Remedy Generation using LLM

A multi-stage pipeline for detecting anomalies in CPU and memory usage of edge devices or Kubernetes clusters and generating intelligent remedy content using AI/ML and LLMs.

# Overview

This demo provides an end-to-end pipeline that simulates resource usage, detects anomalies, and generates actionable remedies:

1. Simulation: Generates synthetic CPU and memory consumption for edge devices or Kubernetes clusters.

2. Anomaly Detection: Uses a Random Forest model to identify anomalies.

3. Remedy Generation: Uses an LLM-Faiss system to produce context-aware remediation steps.

The pipeline is fully asynchronous and uses Redis queues for communication between components.

## AIOps Demo workflow and high-level component overview

[Anomaly Simulation API (CPU/Memory)] --> [Redis Queue] --> [Anomaly Consumer] --> [Random Forest Model]

--> [Redis Queue] --> [LLM Consumer] --> [LLM-Faiss] --> Remedy Content

![Alt text](demo-flow.png)
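
Because every hop in the diagram above is a Redis queue hand-off, each component only needs to push JSON events and block on the next queue. Below is a minimal sketch of that pattern with the `redis` Python client; the queue name `anomaly_events` and the event fields are illustrative assumptions, not the demo's actual names.

```python
# Minimal producer/consumer sketch for the Redis-queue hand-off between components.
# Assumptions: redis-py is installed, Redis runs on localhost:6379, and the queue
# name "anomaly_events" is hypothetical; the real names live in the demo services.
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def publish_event(cpu: float, memory: float) -> None:
    """Simulation side: push a synthetic usage sample onto the queue."""
    event = {"device": "edge-node-1", "cpu": cpu, "memory": memory}
    r.lpush("anomaly_events", json.dumps(event))

def consume_events() -> None:
    """Consumer side: block until an event arrives, then hand it to the detector."""
    while True:
        _, raw = r.brpop("anomaly_events")   # blocking pop from the queue
        event = json.loads(raw)
        print("received event:", event)      # here the demo would run inference

if __name__ == "__main__":
    publish_event(cpu=93.5, memory=88.2)
```
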
# Components

**1. Anomaly Simulation API**

Simulates CPU and memory usage of edge devices or Kubernetes clusters.

Pushes synthetic anomaly events to Redis for downstream processing.

**2. Anomaly Consumer**

Consumes messages from Redis.

Sends the data to the Anomaly Isolation Random Forest Model for anomaly detection.

**3. Anomaly Isolation (Random Forest Model)**

Detects anomalies in resource usage.

Annotates and classifies anomalies.

Pushes detected anomalies back to Redis.

**4. LLM Consumer**

Consumes anomaly messages from Redis.

Sends anomaly data to LLM-Faiss for context-aware remedy generation.

**5. LLM-Faiss**

Generates relevant remedy content using vector search.

Produces actionable insights for alerting or automated remediation.
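
To make the vector-search step concrete, here is a minimal sketch of how a FAISS index over remedy snippets can be queried with an embedded anomaly description. The remedy texts and the `all-MiniLM-L6-v2` embedding model are assumptions for illustration; the demo's actual index and model live in the llm-faiss service.

```python
# Sketch of remedy retrieval with FAISS vector search.
# Assumptions: faiss-cpu and sentence-transformers are installed; the remedy
# texts and the embedding model are illustrative only.
import faiss
from sentence_transformers import SentenceTransformer

remedies = [
    "Scale the deployment by adding one replica to relieve CPU pressure.",
    "Increase the pod memory limit and restart the affected pod.",
    "Evict low-priority workloads from the edge node to free resources.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode(remedies).astype("float32")

index = faiss.IndexFlatL2(vectors.shape[1])   # exact L2 index over remedy embeddings
index.add(vectors)

anomaly = "CPU usage on edge-node-1 stayed above 95% for ten minutes"
query = model.encode([anomaly]).astype("float32")
_, hits = index.search(query, k=1)            # nearest remedy for the anomaly
print("suggested remedy:", remedies[hits[0][0]])
```
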
# Key Features

- Simulated CPU/Memory metrics for edge devices or Kubernetes clusters.

- Random Forest-based anomaly detection (see the sketch after this list).

- LLM-Faiss integration for intelligent remedy content.

- Fully asynchronous architecture using Redis queues.
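
The anomaly detection feature above is described as a Random Forest model, and the detection service announces itself as an Isolation Forest API. A minimal, self-contained sketch of that idea with scikit-learn's `IsolationForest` follows; the synthetic CPU/memory samples, feature layout, and contamination value are illustrative assumptions, not the demo's trained model.

```python
# Sketch of anomaly detection on CPU/memory samples with an isolation-forest model.
# Assumptions: scikit-learn is installed; the data and parameters are illustrative
# and do not reflect the demo's trained model.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Normal behaviour: CPU and memory hover around 50-55%.
normal = rng.normal(loc=[50.0, 55.0], scale=5.0, size=(500, 2))
# A few spikes that should be flagged as anomalies.
spikes = np.array([[97.0, 92.0], [95.0, 99.0]])

model = IsolationForest(contamination=0.01, random_state=42).fit(normal)

samples = np.vstack([normal[:3], spikes])
labels = model.predict(samples)               # +1 = normal, -1 = anomaly
for (cpu, mem), label in zip(samples, labels):
    status = "ANOMALY" if label == -1 else "ok"
    print(f"cpu={cpu:5.1f} mem={mem:5.1f} -> {status}")
```
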
# Getting Started: how to run the demo on your local laptop/desktop

**Step 1:** Clone the repository and navigate to the demo folder as shown below.

```bash
git clone https://github.com/lfedgeai/AIOps.git
```

**Step 2:** Build the images using Docker Compose:

```bash
cd AIOps/demo
docker compose build --no-cache
```

**Step 3:** Run Docker Compose to start the Postgres database and the Redis cache/queue. Make sure the database and Redis start with no errors.

```bash
docker compose up -d anomaly-db redis
```

**Step 4:** Run Docker Compose to start the LLM component. Make sure the LLM component starts with no errors.

```bash
docker compose up -d llm-faiss
```

**Step 5:** Run Docker Compose to start the anomaly detection component. Make sure the anomaly component starts with no errors.

```bash
docker compose up -d anomaly-detection
```

A built-in Swagger UI is provided so you can interact with the API directly from your browser.

Open your browser and go to the Swagger API URL (for example: http://localhost:8001/docs).

## How to trigger an anomaly

Open your browser and go to the Swagger API URL http://localhost:8002/

Use the provided endpoint to generate synthetic CPU/Memory anomaly data, which triggers anomaly inference:

**GET /generate-anomaly-data/{10}**

The events generated from the synthetic data will automatically flow through the pipeline:

Simulation API → Redis Queue → Anomaly Consumer → Random Forest Model → Redis Queue → LLM Consumer → LLM-Faiss → Remedy Content.

You can verify the anomaly detection and the LLM remedy content from the Docker console output in your terminal.

- First, create new anomalies through the Swagger API:
<img width="1447" height="1177" alt="image" src="https://github.com/user-attachments/assets/02ea0368-51c3-4788-8164-404a9f71748c" />

- Then you'll see the entries in the terminal window:

<img width="1701" height="585" alt="image" src="https://github.com/user-attachments/assets/caa75a65-9733-4975-8938-d5733ec34ef5" />

- And the recommended remediation steps:
<img width="1707" height="604" alt="image" src="https://github.com/user-attachments/assets/3670c597-c054-4aa8-a4c9-116aaeb3156c" />

**To view the remedy content generated for the detected anomalies**, use the provided endpoint **http://localhost:8002/get-processed-anomalies**

<img width="1912" height="1422" alt="image" src="https://github.com/user-attachments/assets/67473426-6a15-491f-a71e-09c638541fd2" />
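
If you prefer to drive the demo from a script rather than the Swagger UI, the following is a minimal sketch using the `requests` library. It assumes the simulation API is reachable on http://localhost:8002 as described above; the count of 10 simply mirrors the example endpoint.

```python
# Sketch: trigger synthetic anomalies and fetch the processed results over HTTP.
# Assumptions: the simulation API is on http://localhost:8002 and the endpoint
# paths match the README above; adjust the host/port for your environment.
import requests

BASE_URL = "http://localhost:8002"

# Generate 10 synthetic CPU/memory anomaly events (same as the Swagger call above).
resp = requests.get(f"{BASE_URL}/generate-anomaly-data/10", timeout=30)
resp.raise_for_status()
print("trigger response:", resp.json())

# Later, read back the anomalies and the remedy content attached by the LLM stage.
processed = requests.get(f"{BASE_URL}/get-processed-anomalies", timeout=30)
processed.raise_for_status()
print("processed anomalies:", processed.json())
```
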
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
logs/
tmp/
release.sh
vscode
*.pyc
__pycache__/

demo/anomaly-llm-faiss/Dockerfile

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
# Use a lightweight Python image
FROM python:3.11-slim

# Install build dependencies
RUN apt-get update && apt-get install -y curl \
    gcc \
    python3-dev \
    build-essential \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/*

# Prepare world-writable directories (OpenShift runs containers as a random UID by default)
RUN mkdir -p /app /tmp/huggingface /mnt/hf_cache && chmod -R 777 /app /tmp/huggingface /mnt/hf_cache

# Set the working directory
WORKDIR /app

# Set environment variables for Hugging Face
ENV HF_HOME=/tmp/huggingface
ENV TRANSFORMERS_CACHE=/tmp/huggingface/transformers

# Copy requirement files and install dependencies
COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the application
COPY . .

# Set PYTHONPATH so the 'app' package is importable
ENV PYTHONPATH=/app

# Command to run the app
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8002", "--log-level", "debug", "--access-log"]

demo/anomaly-llm-faiss/app/README.md

Whitespace-only changes.

demo/anomaly-llm-faiss/app/__init__.py

Whitespace-only changes.

demo/anomaly-llm-faiss/app/api/__init__.py

Whitespace-only changes.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# FastAPI Routes for Anomaly Detection
from fastapi import FastAPI, APIRouter, HTTPException
from app.models.models import AnomalyData

# router = APIRouter()
from typing import Dict

app = FastAPI()


@app.get("/")
async def root():
    return {"message": "Isolation Forest Anomaly Detection API is running."}


@app.post("/detect-anomaly/")
async def handle_anomaly(anomaly: AnomalyData):
    try:
        # AnomalyData is a Pydantic model, so convert it to a dict before using .get()
        data = anomaly.dict()
        app_name = data.get("app_name", "unknown_app")
        pod_name = data.get("pod_name", "unknown_pod")
        cluster_info = data.get("cluster_info", "unknown_cluster")
        # response = analyze_anomaly_with_llm(anomaly)
        # Placeholder until the LLM analysis call above is wired in: echo the parsed fields.
        response = {"app_name": app_name, "pod_name": pod_name, "cluster_info": cluster_info}
        return {"resolution": response}
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"An error occurred: {e}")
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
local_dir = "./llmmodels/gpt2"

# Download the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save both locally
model.save_pretrained(local_dir)
tokenizer.save_pretrained(local_dir)
