Add query to calculate cost of PV change with CloudCost exporter metrics #35

Merged
Pokom merged 1 commit into main from feat/cutover-persistent-volume-queries-to-cloudcost-exporter
Mar 13, 2025

@Pokom Pokom commented Mar 13, 2025

This one's a bit more of a challenge than CPU/memory, due to three problems:

  1. CloudCost Exporter does not emit metrics for persistent volumes on Azure (grafana/cloudcost-exporter#236, "Implement persistent volumes in Azure").
  2. AWS EBS cost metrics do not have a cluster label (grafana/cloudcost-exporter#450, "AWS EC2 Persistent Volumes missing cluster_name label").
  3. Persistent volumes in GKE and EKS emit the total hourly cost of the volume, _not_ the hourly cost per GiB, which we used previously to figure out the change in cost.
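
To make the third problem concrete, here is a minimal sketch of the arithmetic involved: turning a volume's total hourly cost into a per-GiB rate, then using that rate to estimate the cost of a resize. The function names and the numbers are made up for illustration and are not taken from this PR.

```go
package main

import "fmt"

// bytesPerGiB is the number of bytes in one GiB (2^30).
const bytesPerGiB = 1 << 30

// hourlyCostPerGiB converts a volume's total hourly cost into a per-GiB rate.
// It returns 0 for a non-positive capacity to avoid dividing by zero.
func hourlyCostPerGiB(totalHourlyCost, capacityBytes float64) float64 {
	if capacityBytes <= 0 {
		return 0
	}
	return totalHourlyCost / (capacityBytes / bytesPerGiB)
}

// hourlyCostDelta estimates the change in hourly cost when a volume is
// resized from fromGiB to toGiB at the given per-GiB rate.
func hourlyCostDelta(ratePerGiB, fromGiB, toGiB float64) float64 {
	return ratePerGiB * (toGiB - fromGiB)
}

func main() {
	// Hypothetical numbers: a 100 GiB volume with a total cost of $0.5 per hour.
	rate := hourlyCostPerGiB(0.5, 100*bytesPerGiB)
	fmt.Printf("rate per GiB-hour: %.4f\n", rate)
	fmt.Printf("hourly delta for 100 -> 150 GiB: %.4f\n", hourlyCostDelta(rate, 100, 150))
}
```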

I used the Prometheus `or` operator (https://prometheus.io/docs/prometheus/latest/querying/operators/#logical-set-binary-operators) to work around the missing Azure PV costs. Effectively, the query attempts to find the average cost of PVs for:

  1. EKS volumes via CloudCost Exporter
  2. GKE volumes via CloudCost Exporter
  3. Azure volumes via OpenCost

This works because we're only querying one cluster at a time _by name_, and we rely upon the fact that cluster names are unique within Grafana Labs infrastructure, so at most one of the three branches should return data for a given cluster.

The missing cluster label on EKS cost metrics, and on the persistent volumes themselves, can be overcome by using the `kube_persistentvolume_capacity_bytes` metric emitted by kube-state-metrics.
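
One way such a workaround could look, as a sketch only: match the cost series against `kube_persistentvolume_capacity_bytes` on the volume name, so that the cluster filter on the kube-state-metrics side scopes the result and the division yields a per-GiB rate. The cost metric `example_pv_usd_per_hour` and the assumption that it carries a `persistentvolume` label are hypothetical, as is the presence of a `cluster` label on the kube-state-metrics series (that usually comes from the scrape configuration). The actual query in this PR may differ.

```go
package main

import "fmt"

// pvCostPerGiBTemplate divides each volume's total hourly cost by its capacity
// in GiB. Matching on persistentvolume drops cost series for volumes outside
// the selected cluster, because they have no capacity series to pair with.
// example_pv_usd_per_hour is a hypothetical metric name.
const pvCostPerGiBTemplate = `avg(
    example_pv_usd_per_hour
  / on (persistentvolume)
    (kube_persistentvolume_capacity_bytes{cluster="%s"} / 1024^3)
)`

// pvCostPerGiBQuery fills in the cluster name.
func pvCostPerGiBQuery(cluster string) string {
	return fmt.Sprintf(pvCostPerGiBTemplate, cluster)
}

func main() {
	fmt.Println(pvCostPerGiBQuery("dev-us-east-0"))
}
```

This relies on volume names not colliding across clusters; if they can, the match would need an additional distinguishing label.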

This was tested by looking at an EKS cluster like so:

```shell
go run ./cmd/estimator/ \
  -use.cloud.cost.exporter.metrics=true \
  -from "$PWD/pkg/costmodel/testdata/resource/StatefulSet.json" \
  -to "$PWD/pkg/costmodel/testdata/resource/StatefulSet-more-storage.json" \
  -http.config.file ~/.config/dev.yaml \
  -prometheus.address "$PROMETHEUS_ADDRESS" \
  dev-us-east-0
```
@Pokom Pokom requested a review from a team as a code owner March 13, 2025 00:25
pv_hourly_cost{cluster="%s"}
)[24h:1m]
)`
cloudcostQueryPersistentVolumeCost = `

Any particular reason to use a 24h window for OpenCost vs. an instant query for the cloudcost-exporter metrics?

@Pokom Pokom Mar 13, 2025

Great question! During the hackathon, I had arbitrarily picked a 24h lookback window. When testing out the new queries, there was a negligible difference between using a lookback and an instant query: the resulting values were ~$0.00001 apart. I don't think that difference is worth the extra computation of issuing the queries with a 24h lookback.

I can't really link to the Explore views, as they would surface internal metrics. If you want, I can share them in Slack.

thanks for the details, no need to hard proof them :) 👍🏼

@Pokom Pokom requested review from a team and jjo March 13, 2025 14:50
@Pokom Pokom merged commit 658bee6 into main Mar 13, 2025
2 checks passed
@Pokom Pokom deleted the feat/cutover-persistent-volume-queries-to-cloudcost-exporter branch March 13, 2025 17:32