
Commit 1faaa78

Update recommendations (#2443)

* Update recommendations:
  - Refresh from the downstream doc; some items were outdated (e.g. test beds still had the old 65 nodes)
  - Extract test bed data out of the table, to clarify these are *not* recommendations
  - Add info about where to find the mentioned settings
  - Refresh `cacheMaxSize` with updated information from this release
  - Rename Kafka consumer replicas => Consumer replicas, following this release's updates
  - Add the recommended deployment model more explicitly
  - Recommend Service rather than Direct in 10-node clusters
* Address feedback
1 parent 7913271 commit 1faaa78

File tree

3 files changed: +60 −51 lines changed

bundle/manifests/netobserv-operator.clusterserviceversion.yaml

Lines changed: 18 additions & 15 deletions
@@ -586,14 +586,16 @@ spec:
      kubectl edit flowcollector cluster
      ```

- As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
+ Only a single `FlowCollector` is allowed, and it has to be named `cluster`.

  A couple of settings deserve special attention:

  - Sampling (`spec.agent.ebpf.sampling`): a value of `100` means one flow out of every 100 is sampled, while `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics, but the more resources are consumed. By default, sampling is set to 50 (i.e. 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.

  - Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you might have to configure them differently if you used another installation method. Make sure to disable it (`spec.loki.enable`) if you don't want to use Loki.

+ - Processor replicas (`spec.processor.consumerReplicas`): how many replicas of `flowlogs-pipeline` should be deployed. Those pods collect, transform and re-export network flows. They can also be configured as unmanaged via `unmanagedReplicas`, if you want to use an auto-scaler.
+
  - Kafka (`spec.deploymentModel: Kafka` and `spec.kafka`): when enabled, integrates the flow collection pipeline with Kafka, splitting ingestion from transformation (kube enrichment, derived metrics, ...). Kafka can provide better scalability, resiliency and high availability ([view more details](https://www.redhat.com/en/topics/integration/what-is-apache-kafka)). This assumes Kafka is already deployed and a topic is created.

  - Exporters (`spec.exporters`): an optional list of exporters to which to send enriched flows. KAFKA and IPFIX exporters are supported. This allows you to define any custom storage or processing that can read from Kafka or use the IPFIX standard.
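Taken together, the settings called out above can be sketched in a single `FlowCollector` manifest. This is a hedged illustration, not a file from this commit; the `apiVersion` shown (`v1beta2`) is an assumption to be checked against your installed CRD:

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumed API version; verify with `kubectl api-resources`
kind: FlowCollector
metadata:
  name: cluster              # only one FlowCollector is allowed, and it must be named "cluster"
spec:
  deploymentModel: Service   # or "Kafka" to split ingestion from transformation
  agent:
    ebpf:
      sampling: 50           # 1:50 sampling (default); lower = more flows, more resources
  processor:
    consumerReplicas: 3      # flowlogs-pipeline replicas; see also unmanagedReplicas
  loki:
    enable: true             # set to false if you don't use Loki
```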
@@ -603,23 +605,24 @@ spec:
  ## Resource considerations

  The following table outlines examples of resource considerations for clusters with certain workload sizes.
- The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
+ The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs. The test beds are:

+ - Extra small: 10-node cluster, 4 vCPUs and 16GiB memory per worker, LokiStack size `1x.extra-small`, tested on AWS M6i instances.
+ - Small: 25-node cluster, 16 vCPUs and 64GiB memory per worker, LokiStack size `1x.small`, tested on AWS M6i instances.
+ - Large: 250-node cluster, 16 vCPUs and 64GiB memory per worker, LokiStack size `1x.medium`, tested on AWS M6i instances. In addition to these workers and their controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.

- | Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ** | Large (120 nodes) ** |
- | ----------------------------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------------- |
- | *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem * | 16 vCPUs\| 64GiB mem * | 16 vCPUs\| 64GiB mem * |16 vCPUs\| 64GiB Mem * |
- | *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium` |
- | *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 800Mi |
- | *eBPF sampling interval* | 50 (default) | 50 (default) | 50 (default) | 50 (default) |
- | *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 2000Mi | 800Mi (default) |
- | *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) |
- | *FLP Kafka partitions* | N/A | 48 | 48 | 48 |
- | *Kafka consumer replicas* | N/A | 24 | 24 | 24 |
- | *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default) |
-
- *. Tested with AWS M6i instances.
- **. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.
+ | Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) |
+ | --------------------------------------------------------------------------------- | ---------------------- | ------------------- | -------------------- |
+ | Operator memory limit<br>*In `Subscription` `spec.config.resources`* | 400Mi (default) | 400Mi (default) | 400Mi (default) |
+ | eBPF agent sampling interval<br>*In `FlowCollector` `spec.agent.ebpf.sampling`* | 50 (default) | 50 (default) | 50 (default) |
+ | eBPF agent memory limit<br>*In `FlowCollector` `spec.agent.ebpf.resources`* | 800Mi (default) | 800Mi (default) | 1600Mi |
+ | eBPF agent cache size<br>*In `FlowCollector` `spec.agent.ebpf.cacheMaxSize`* | 50,000 | 120,000 (default) | 120,000 (default) |
+ | Processor memory limit<br>*In `FlowCollector` `spec.processor.resources`* | 800Mi (default) | 800Mi (default) | 800Mi (default) |
+ | Processor replicas<br>*In `FlowCollector` `spec.processor.consumerReplicas`* | 3 (default) | 6 | 18 |
+ | Deployment model<br>*In `FlowCollector` `spec.deploymentModel`* | Service (default) | Kafka | Kafka |
+ | Kafka partitions<br>*In your Kafka installation* | N/A | 48 | 48 |
+ | Kafka brokers<br>*In your Kafka installation* | N/A | 3 (default) | 3 (default) |

  ## Further reading
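As a worked example, applying the large-cluster column of the new table to a `FlowCollector` might look like the sketch below. The field paths are the ones the table names; the `apiVersion` and the exact nesting under `resources.limits` are assumptions, so verify them against your installed CRD before use (the operator memory limit lives separately, in the `Subscription` `spec.config.resources`):

```yaml
apiVersion: flows.netobserv.io/v1beta2   # assumed API version
kind: FlowCollector
metadata:
  name: cluster
spec:
  deploymentModel: Kafka      # large clusters: split ingestion from transformation
  agent:
    ebpf:
      sampling: 50            # default
      cacheMaxSize: 120000    # default in this release
      resources:
        limits:
          memory: 1600Mi      # raised from the 800Mi default per the table
  processor:
    consumerReplicas: 18      # per the 250-node test bed
    resources:
      limits:
        memory: 800Mi         # default
```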

config/descriptions/ocp.md

Lines changed: 21 additions & 18 deletions
@@ -50,14 +50,16 @@ To edit configuration in cluster, run:
      oc edit flowcollector cluster
      ```

- As it operates cluster-wide on every node, only a single `FlowCollector` is allowed, and it has to be named `cluster`.
+ Only a single `FlowCollector` is allowed, and it has to be named `cluster`.

  A couple of settings deserve special attention:

  - Sampling (`spec.agent.ebpf.sampling`): a value of `100` means one flow out of every 100 is sampled, while `1` means all flows are sampled. The lower the value, the more flows you get and the more accurate the derived metrics, but the more resources are consumed. By default, sampling is set to 50 (i.e. 1:50). Note that more sampled flows also means more storage is needed. We recommend starting with the default values and refining empirically, to figure out which setting your cluster can manage.

  - Loki (`spec.loki`): configure here how to reach Loki. The default values match the Loki quick install paths mentioned above, but you might have to configure them differently if you used another installation method. Make sure to disable it (`spec.loki.enable`) if you don't want to use Loki.

+ - Processor replicas (`spec.processor.consumerReplicas`): how many replicas of `flowlogs-pipeline` should be deployed. Those pods collect, transform and re-export network flows. They can also be configured as unmanaged via `unmanagedReplicas`, if you want to use an auto-scaler.
+
  - Kafka (`spec.deploymentModel: Kafka` and `spec.kafka`): when enabled, integrates the flow collection pipeline with Kafka, splitting ingestion from transformation (kube enrichment, derived metrics, ...). Kafka can provide better scalability, resiliency and high availability ([view more details](https://www.redhat.com/en/topics/integration/what-is-apache-kafka)). This assumes Kafka is already deployed and a topic is created.

  - Exporters (`spec.exporters`): an optional list of exporters to which to send enriched flows. KAFKA and IPFIX exporters are supported. This allows you to define any custom storage or processing that can read from Kafka or use the IPFIX standard.
@@ -67,23 +69,24 @@ A couple of settings deserve special attention:
  ## Resource considerations

  The following table outlines examples of resource considerations for clusters with certain workload sizes.
- The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs.
+ The examples outlined in the table demonstrate scenarios that are tailored to specific workloads. Consider each example only as a baseline from which adjustments can be made to accommodate your workload needs. The test beds are:

+ - Extra small: 10-node cluster, 4 vCPUs and 16GiB memory per worker, LokiStack size `1x.extra-small`, tested on AWS M6i instances.
+ - Small: 25-node cluster, 16 vCPUs and 64GiB memory per worker, LokiStack size `1x.small`, tested on AWS M6i instances.
+ - Large: 250-node cluster, 16 vCPUs and 64GiB memory per worker, LokiStack size `1x.medium`, tested on AWS M6i instances. In addition to these workers and their controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.

- | Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Medium (65 nodes) ** | Large (120 nodes) ** |
- | ----------------------------------------------- | ---------------------- | ---------------------- | ----------------------- | ----------------------------- |
- | *Worker Node vCPU and memory* | 4 vCPUs\| 16GiB mem * | 16 vCPUs\| 64GiB mem * | 16 vCPUs\| 64GiB mem * |16 vCPUs\| 64GiB Mem * |
- | *LokiStack size* | `1x.extra-small` | `1x.small` | `1x.small` | `1x.medium` |
- | *Network Observability controller memory limit* | 400Mi (default) | 400Mi (default) | 400Mi (default) | 800Mi |
- | *eBPF sampling interval* | 50 (default) | 50 (default) | 50 (default) | 50 (default) |
- | *eBPF memory limit* | 800Mi (default) | 800Mi (default) | 2000Mi | 800Mi (default) |
- | *FLP memory limit* | 800Mi (default) | 800Mi (default) | 800Mi (default) | 800Mi (default) |
- | *FLP Kafka partitions* | N/A | 48 | 48 | 48 |
- | *Kafka consumer replicas* | N/A | 24 | 24 | 24 |
- | *Kafka brokers* | N/A | 3 (default) | 3 (default) | 3 (default) |
-
- *. Tested with AWS M6i instances.
- **. In addition to this worker and its controller, 3 infra nodes (size `M6i.12xlarge`) and 1 workload node (size `M6i.8xlarge`) were tested.
+ | Resource recommendations | Extra small (10 nodes) | Small (25 nodes) | Large (250 nodes) |
+ | --------------------------------------------------------------------------------- | ---------------------- | ------------------- | -------------------- |
+ | Operator memory limit<br>*In `Subscription` `spec.config.resources`* | 400Mi (default) | 400Mi (default) | 400Mi (default) |
+ | eBPF agent sampling interval<br>*In `FlowCollector` `spec.agent.ebpf.sampling`* | 50 (default) | 50 (default) | 50 (default) |
+ | eBPF agent memory limit<br>*In `FlowCollector` `spec.agent.ebpf.resources`* | 800Mi (default) | 800Mi (default) | 1600Mi |
+ | eBPF agent cache size<br>*In `FlowCollector` `spec.agent.ebpf.cacheMaxSize`* | 50,000 | 120,000 (default) | 120,000 (default) |
+ | Processor memory limit<br>*In `FlowCollector` `spec.processor.resources`* | 800Mi (default) | 800Mi (default) | 800Mi (default) |
+ | Processor replicas<br>*In `FlowCollector` `spec.processor.consumerReplicas`* | 3 (default) | 6 | 18 |
+ | Deployment model<br>*In `FlowCollector` `spec.deploymentModel`* | Service (default) | Kafka | Kafka |
+ | Kafka partitions<br>*In your Kafka installation* | N/A | 48 | 48 |
+ | Kafka brokers<br>*In your Kafka installation* | N/A | 3 (default) | 3 (default) |

  ## Further reading
