33 changes: 18 additions & 15 deletions bundle/manifests/netobserv-operator.clusterserviceversion.yaml
@@ -586,14 +586,16 @@ spec:
kubectl edit flowcollector cluster
```
As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
Only a single `FlowCollector` is allowed, and it has to be named `cluster`.
A couple of settings deserve special attention:
- Sampling (`spec.agent.ebpf.sampling`): a value of `100` means that one flow out of every 100 is sampled; `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics are, but the more resources are consumed. By default, sampling is set to 50 (that is, 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.
- Loki (`spec.loki`): configure how to reach Loki here. The default values match the Loki quick install paths mentioned above, but you might have to configure things differently if you used another installation method. Make sure to disable it (`spec.loki.enable`) if you don't want to use Loki.
- Processor replicas (`spec.processor.consumerReplicas`): how many replicas of `flowlogs-pipeline` should be deployed. These pods collect, transform and re-export network flows. They can also be configured as unmanaged via `unmanagedReplicas`, if you want to use an auto-scaler.
- Kafka (`spec.deploymentModel: Kafka` and `spec.kafka`): when enabled, integrates the flow collection pipeline with Kafka by splitting ingestion from transformation (kube enrichment, derived metrics, and so on). Kafka can provide better scalability, resiliency and high availability ([view more details](https://www.redhat.com/en/topics/integration/what-is-apache-kafka)). This assumes Kafka is already deployed and a topic has been created.
- Exporters (`spec.exporters`): an optional list of exporters to send enriched flows to. KAFKA and IPFIX exporters are supported. This allows you to define any custom storage or processing that can read from Kafka or use the IPFIX standard.
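As a rough sketch, a `FlowCollector` combining several of the settings above might look like the following. The `apiVersion` may differ depending on your operator version, and the Kafka `address` and `topic` values are hypothetical placeholders to replace with your own deployment's details:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster            # the single allowed instance must be named "cluster"
spec:
  deploymentModel: Kafka   # split ingestion from transformation via Kafka
  agent:
    ebpf:
      sampling: 50         # 1:50 sampling, the default
  loki:
    enable: true           # set to false if you don't want to use Loki
  kafka:
    address: kafka-cluster-kafka-bootstrap.netobserv   # hypothetical bootstrap address
    topic: network-flows                               # hypothetical; must already exist
```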
@@ -603,23 +605,24 @@ spec:
## Resource considerations
The following table outlines examples of resource considerations for clusters with certain workload sizes.
The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs. The test beds are:
- Extra small: 10-node cluster, 4 vCPUs and 16GiB mem per worker, LokiStack size `1x.extra-small`, tested on AWS M6i instances.
- Small: 25-node cluster, 16 vCPUs and 64GiB mem per worker, LokiStack size `1x.small`, tested on AWS M6i instances.
- Large: 250-node cluster, 16 vCPUs and 64GiB mem per worker, LokiStack size `1x.medium`, tested on AWS M6i instances. In addition to these workers and the control plane, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were used.
| Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ** | Large (120 nodes) ** |
| ----------------------------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------------- |
| *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem * | 16 vCPUs\| 64GiB mem * | 16 vCPUs\| 64GiB mem * |16 vCPUs\| 64GiB Mem * |
| *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium` |
| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 800Mi |
| *eBPF sampling interval* | 50 (default) | 50 (default) | 50 (default) | 50 (default) |
| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 2000Mi | 800Mi (default) |
| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) |
| *FLP Kafka partitions* | N/A | 48 | 48 | 48 |
| *Kafka consumer replicas* | N/A | 24 | 24 | 24 |
| *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default) |
*. Tested with AWS M6i instances.
**. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.
| Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) |
| --------------------------------------------------------------------------------- | ---------------------- | ------------------- | -------------------- |
| Operator memory limit<br>*In `Subscription` `spec.config.resources`* | 400Mi (default) | 400Mi (default) | 400Mi (default) |
| eBPF agent sampling interval<br>*In `FlowCollector` `spec.agent.ebpf.sampling`* | 50 (default) | 50 (default) | 50 (default) |
| eBPF agent memory limit<br>*In `FlowCollector` `spec.agent.ebpf.resources`* | 800Mi (default) | 800Mi (default) | 1600Mi |
| eBPF agent cache size<br>*In `FlowCollector` `spec.agent.ebpf.cacheMaxSize`* | 50,000 | 120,000 (default) | 120,000 (default) |
| Processor memory limit<br>*In `FlowCollector` `spec.processor.resources`* | 800Mi (default) | 800Mi (default) | 800Mi (default) |
| Processor replicas<br>*In `FlowCollector` `spec.processor.consumerReplicas`* | 3 (default) | 6 | 18 |
| Deployment model<br>*In `FlowCollector` `spec.deploymentModel`* | Service (default) | Kafka | Kafka |
| Kafka partitions<br>*In your Kafka installation* | N/A | 48 | 48 |
| Kafka brokers<br>*In your Kafka installation* | N/A | 3 (default) | 3 (default) |
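Note that, per the first row, the operator memory limit is set on the OLM `Subscription` rather than on the `FlowCollector`. A minimal sketch, assuming the operator is subscribed in a `netobserv` namespace (the channel, catalog source, and namespace are assumptions; adjust them to your installation):

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: netobserv-operator
  namespace: netobserv       # assumption: wherever the operator is subscribed
spec:
  channel: stable            # assumption: use your catalog's channel
  name: netobserv-operator
  source: operatorhubio-catalog   # assumption: your catalog source
  sourceNamespace: olm
  config:
    resources:
      limits:
        memory: 400Mi        # default value from the table above
```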
> **Member:** we're dropping LokiStack size recommendation? IMO we should keep it 1x.extra-small (10 nodes), 1x.small (25 nodes) and 1x.medium (250 nodes)
>
> **@jotak (Member, Author), Feb 10, 2026:** I moved that out of the recommendations, to the test beds. It's still visible above. idk, I think we have a different place where we document the recommended loki stack size with different criteria, I don't want to mix up things here, wdyt?
>
> See also: https://docs.redhat.com/en/documentation/openshift_container_platform/4.21/html/network_observability/installing-network-observability-operators#loki-deployment-sizing_network_observability
>
> **Member:** I see, okay, that sounds good to me then.
## Further reading
39 changes: 21 additions & 18 deletions config/descriptions/ocp.md
@@ -50,14 +50,16 @@ To edit configuration in cluster, run:
oc edit flowcollector cluster
```

As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
Only a single `FlowCollector` is allowed, and it has to be named `cluster`.

A couple of settings deserve special attention:

- Sampling (`spec.agent.ebpf.sampling`): a value of `100` means that one flow out of every 100 is sampled; `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics are, but the more resources are consumed. By default, sampling is set to 50 (that is, 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.

- Loki (`spec.loki`): configure how to reach Loki here. The default values match the Loki quick install paths mentioned above, but you might have to configure things differently if you used another installation method. Make sure to disable it (`spec.loki.enable`) if you don't want to use Loki.

- Processor replicas (`spec.processor.consumerReplicas`): how many replicas of `flowlogs-pipeline` should be deployed. These pods collect, transform and re-export network flows. They can also be configured as unmanaged via `unmanagedReplicas`, if you want to use an auto-scaler.

- Kafka (`spec.deploymentModel: Kafka` and `spec.kafka`): when enabled, integrates the flow collection pipeline with Kafka by splitting ingestion from transformation (kube enrichment, derived metrics, and so on). Kafka can provide better scalability, resiliency and high availability ([view more details](https://www.redhat.com/en/topics/integration/what-is-apache-kafka)). This assumes Kafka is already deployed and a topic has been created.

- Exporters (`spec.exporters`): an optional list of exporters to send enriched flows to. KAFKA and IPFIX exporters are supported. This allows you to define any custom storage or processing that can read from Kafka or use the IPFIX standard.
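For the processor-replicas setting, a `FlowCollector` fragment might look like the sketch below. The exact shape of the `unmanagedReplicas` field is an assumption here; check the `FlowCollector` API reference for your operator version:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  processor:
    consumerReplicas: 6     # fixed number of flowlogs-pipeline replicas
    # Alternatively, leave replicas unmanaged so an external autoscaler
    # can own the replica count (field shape is an assumption):
    # unmanagedReplicas: true
```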
@@ -67,23 +69,24 @@ A couple of settings deserve special attention:
## Resource considerations

The following table outlines examples of resource considerations for clusters with certain workload sizes.
The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.


| Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ** | Large (120 nodes) ** |
| ----------------------------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------------- |
| *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem * | 16 vCPUs\| 64GiB mem * | 16 vCPUs\| 64GiB mem * |16 vCPUs\| 64GiB Mem * |
| *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium` |
| *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 800Mi |
| *eBPF sampling interval* | 50 (default) | 50 (default) | 50 (default) | 50 (default) |
| *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 2000Mi | 800Mi (default) |
| *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) |
| *FLP Kafka partitions* | N/A | 48 | 48 | 48 |
| *Kafka consumer replicas* | N/A | 24 | 24 | 24 |
| *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default) |

*. Tested with AWS M6i instances.
**. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.
The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs. The test beds are:

- Extra small: 10-node cluster, 4 vCPUs and 16GiB mem per worker, LokiStack size `1x.extra-small`, tested on AWS M6i instances.
- Small: 25-node cluster, 16 vCPUs and 64GiB mem per worker, LokiStack size `1x.small`, tested on AWS M6i instances.
- Large: 250-node cluster, 16 vCPUs and 64GiB mem per worker, LokiStack size `1x.medium`, tested on AWS M6i instances. In addition to these workers and the control plane, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were used.


| Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) |
| --------------------------------------------------------------------------------- | ---------------------- | ------------------- | -------------------- |
| Operator memory limit<br>*In `Subscription` `spec.config.resources`* | 400Mi (default) | 400Mi (default) | 400Mi (default) |
| eBPF agent sampling interval<br>*In `FlowCollector` `spec.agent.ebpf.sampling`* | 50 (default) | 50 (default) | 50 (default) |
| eBPF agent memory limit<br>*In `FlowCollector` `spec.agent.ebpf.resources`* | 800Mi (default) | 800Mi (default) | 1600Mi |
| eBPF agent cache size<br>*In `FlowCollector` `spec.agent.ebpf.cacheMaxSize`* | 50,000 | 120,000 (default) | 120,000 (default) |
| Processor memory limit<br>*In `FlowCollector` `spec.processor.resources`* | 800Mi (default) | 800Mi (default) | 800Mi (default) |
| Processor replicas<br>*In `FlowCollector` `spec.processor.consumerReplicas`* | 3 (default) | 6 | 18 |
| Deployment model<br>*In `FlowCollector` `spec.deploymentModel`* | Service (default) | Kafka | Kafka |
| Kafka partitions<br>*In your Kafka installation* | N/A | 48 | 48 |
| Kafka brokers<br>*In your Kafka installation* | N/A | 3 (default) | 3 (default) |
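As an illustration, the large-cluster column above maps onto `FlowCollector` settings roughly as follows. This is a sketch only: field paths are taken from the table's annotations, values from its large column, and the `apiVersion` may differ by operator version:

```yaml
apiVersion: flows.netobserv.io/v1beta2
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka       # large clusters use the Kafka model per the table
  agent:
    ebpf:
      sampling: 50             # default sampling interval
      cacheMaxSize: 120000     # default agent cache size
      resources:
        limits:
          memory: 1600Mi       # raised from the 800Mi default for 250 nodes
  processor:
    consumerReplicas: 18       # per the large column
```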

## Further reading
