
Commit e4ea8aa

jotak and jpinsonneau authored
Refresh archicture doc (#1022)
* Refresh archicture doc
* Add CONTRIBUTING
* Update docs/Architecture.md

  Co-authored-by: Julien Pinsonneau <91894519+jpinsonneau@users.noreply.github.com>
* Update docs/Architecture.md

  Co-authored-by: Julien Pinsonneau <91894519+jpinsonneau@users.noreply.github.com>
* More details on CLI mdoes

---------

Co-authored-by: Julien Pinsonneau <91894519+jpinsonneau@users.noreply.github.com>
1 parent 4859d5a commit e4ea8aa


5 files changed, +101 -49 lines changed


CONTRIBUTING.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
## Contributing
2+
3+
Please refer to [NetObserv projects contribution guide](https://github.com/netobserv/documents/blob/main/CONTRIBUTING.md).

docs/Architecture.md

Lines changed: 98 additions & 49 deletions
@@ -1,49 +1,98 @@
1-
# Network Observability Architecture
2-
3-
The Network Observability solution consists on a [Network Observability Operator (NOO)](https://github.com/netobserv/network-observability-operator)
4-
that deploys, configures and controls the status of the following components:
5-
6-
* [Network Observability eBPF Agent](https://github.com/netobserv/netobserv-ebpf-agent/)
7-
* It is attached to all the interfaces in the host network and listen for each network packet that
8-
is submitted or received by their egress/ingress. The agent aggregates all the packets by source
9-
and destination addresses, protocol, etc... into network flows that are submitted to the
10-
Flowlogs-Pipeline flow processor.
11-
* [Network Observabiilty Flowlogs-Pipeline (FLP)](https://github.com/netobserv/flowlogs-pipeline)
12-
* It receives the raw flows from the agent and decorates them with Kubernetes information (Pod
13-
and host names, namespaces, etc.), and stores them as JSON into a [Loki](https://grafana.com/oss/loki/)
14-
instance.
15-
* [Network Observability Console Plugin](https://github.com/netobserv/network-observability-console-plugin)
16-
* It is attached to the Openshift console as a plugin (see Figure 1, though it can be also
17-
deployed in standalone mode). The Console Plugin queries the flows information stored in Loki
18-
and allows filtering flows, showing network topologies, etc.
19-
20-
![Netobserv frontend architecture](./assets/frontend.png)
21-
Figure 1: Console Plugin deployment
22-
23-
There are two existing deployment modes for Network Observability: direct mode and Kafka mode.
24-
25-
## Direct-mode deployment
26-
27-
In direct mode (figure 2), the eBPF agent sends the flows information to Flowlogs-Pipeline encoded as Protocol
28-
Buffers (binary representation) via [gRPC](https://grpc.io/). In this scenario, Flowlogs-Pipeline
29-
is usually deployed as a DaemonSet so there is a 1:1 communication between the Agent and FLP internal
30-
to the host, so we minimize cluster network usage.
31-
32-
![Netobserv component's architecture (direct mode)](./assets/architecture-direct.png)
33-
Figure 2: Direct deployment
34-
35-
## Kafka-mode deployment
36-
37-
In Kafka mode (figure 3), the communication between the eBFP agent and FLP is done via a Kafka topic.
38-
39-
![Netobserv component's architecture (Kafka mode)](./assets/architecture-kafka.png)
40-
Figure 3: Kafka deployment
41-
42-
This has some advantages over the direct mode:
43-
1. The flows' are buffered in the Kafka topic, so if there is a peak of flows, we make sure that
44-
FLP will receive/process them without any kind of denial of service.
45-
2. Flows are persisted in the topic, so if FLP is restarted by any reason (an update in the
46-
configuration or just a crash), the forwarded flows are persisted in Kafka for its later
47-
processing, and we don't lose them.
48-
3. Deploying FLP as a deployment, you don't have to keep the 1:1 proportion. You can scale up and
49-
down FLP pods according to your load.
1+
# NetObserv architecture
2+
3+
_See also: [architecture in the downstream documentation](https://docs.openshift.com/container-platform/latest/observability/network_observability/understanding-network-observability-operator.html#network-observability-architecture_nw-network-observability-operator)_
4+
5+
NetObserv is a collection of components that can run together as a whole, some of which can also run independently.
6+
7+
The components are:
8+
9+
- An [eBPF agent](https://github.com/netobserv/netobserv-ebpf-agent) that generates network flows from captured packets.
10+
- It is attached to any/all of the network interfaces in the host, and listens for packets (ingress+egress) with [eBPF](https://ebpf.io/).
11+
- Packets are aggregated into logical flows (similar to NetFlows), which are periodically exported to a collector, generally FLP.
12+
- Optional features can add richer data, such as TCP latency or DNS information.
13+
- It is able to correlate those flows with other events such as network policy rules and drops (network policy correlation requires the [OVN Kubernetes](https://github.com/ovn-org/ovn-kubernetes/) network plugin).
14+
- When used with the CLI or as a standalone, the agent can also do full packet captures instead of generating logical flows.
15+
- [Flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) (FLP), a component that collects, enriches and exports these flows.
16+
- It uses Kubernetes informers to enrich flows with details such as Pod names, namespaces, availability zones, etc.
17+
- It derives metric counters from the flows, for Prometheus.
18+
- Raw flows can be exported to Loki and/or custom exporters (Kafka, IPFIX, OpenTelemetry).
19+
- As a standalone, FLP is very flexible and configurable. It supports more inputs and outputs, and allows more arbitrary filters, sampling, aggregations, relabelling, etc. When deployed via the operator, only a subset of its capabilities is used.
20+
- When used in OpenShift, [a Console plugin](https://github.com/netobserv/network-observability-console-plugin) for flow visualization, with powerful filtering options, a topology representation and more (outside of OpenShift, [it can be deployed as a standalone](https://github.com/netobserv/network-observability-operator/blob/main/FAQ.md#how-do-i-visualize-flows-and-metrics)).
21+
- It provides a polished web UI to visualize and explore the flow logs and metrics stored in Loki and/or Prometheus.
22+
- Its views include a metrics overview, a network topology, and a table listing raw flow logs.
23+
- It supports multi-tenant access, making it relevant for various use cases: cluster/network admins, SREs, development teams...
24+
- [An operator](https://github.com/netobserv/network-observability-operator) that manages all of the above.
25+
- It provides two APIs (CRDs): [FlowCollector](https://github.com/netobserv/network-observability-operator/blob/main/docs/FlowCollector.md), which configures and pilots the whole deployment, and [FlowMetric](https://github.com/netobserv/network-observability-operator/blob/main/docs/FlowMetric.md), which lets you customize which metrics are generated from flow logs.
26+
- As an [OLM operator](https://olm.operatorframework.io/), it is built with `operator-sdk` and supports subscriptions for easy updates.
27+
- [A CLI](https://github.com/netobserv/network-observability-cli) that also manages some of the above components, for on-demand monitoring and packet capture.
28+
- It is provided as a `kubectl` or `oc` plugin, allowing you to capture flows (similar to what the operator does, except on demand and in the terminal), full packets (much like a `tcpdump` command) or metrics.
29+
- It is also available via [Krew](https://krew.sigs.k8s.io/).
30+
- It offers a live visualization via a TUI. For metrics, when used in OpenShift, it provides out-of-the-box dashboards.
31+
- Check out the blog post: [Network observability on demand](https://developers.redhat.com/articles/2024/09/17/network-observability-demand#what_is_the_network_observability_cli_).
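To make the path from the agent to FLP to metrics more concrete, here is a minimal Go sketch. It is not NetObserv code: the types `FlowKey`, `Packet`, `FlowStats`, `PodInfo` and the functions `aggregate` and `bytesByNamespace` are hypothetical, and a plain map stands in for the informer-backed cache that FLP uses. It aggregates packets into flows keyed by a 5-tuple, enriches each flow with the source Pod's namespace, then derives a per-namespace byte counter.

```go
package main

import (
	"fmt"
	"sort"
)

// FlowKey is a hypothetical, simplified flow identifier (a 5-tuple).
type FlowKey struct {
	SrcIP, DstIP     string
	SrcPort, DstPort uint16
	Proto            uint8
}

// Packet stands in for one captured packet.
type Packet struct {
	Key   FlowKey
	Bytes uint64
}

// FlowStats holds the counters accumulated for one logical flow.
type FlowStats struct {
	Packets, Bytes uint64
}

// aggregate groups packets sharing the same key into one logical flow.
func aggregate(pkts []Packet) map[FlowKey]FlowStats {
	flows := make(map[FlowKey]FlowStats)
	for _, p := range pkts {
		s := flows[p.Key]
		s.Packets++
		s.Bytes += p.Bytes
		flows[p.Key] = s
	}
	return flows
}

// PodInfo is the kind of Kubernetes metadata used for enrichment.
type PodInfo struct{ Name, Namespace string }

// bytesByNamespace enriches each flow with its source Pod's namespace, looked up
// in a hypothetical IP-indexed cache, and sums bytes per namespace.
func bytesByNamespace(flows map[FlowKey]FlowStats, cache map[string]PodInfo) map[string]uint64 {
	counters := make(map[string]uint64)
	for k, s := range flows {
		ns := "unknown" // no matching Pod in the cache
		if pod, ok := cache[k.SrcIP]; ok {
			ns = pod.Namespace
		}
		counters[ns] += s.Bytes
	}
	return counters
}

func main() {
	web := FlowKey{"10.0.0.1", "10.0.0.2", 43210, 8080, 6} // TCP
	dns := FlowKey{"10.0.0.3", "10.0.0.10", 51000, 53, 17} // UDP
	pkts := []Packet{{web, 100}, {web, 1500}, {dns, 80}}
	cache := map[string]PodInfo{"10.0.0.1": {"frontend-abc", "shop"}}

	counters := bytesByNamespace(aggregate(pkts), cache)

	// Sort namespaces so the output is deterministic.
	namespaces := make([]string, 0, len(counters))
	for ns := range counters {
		namespaces = append(namespaces, ns)
	}
	sort.Strings(namespaces)
	for _, ns := range namespaces {
		fmt.Printf("%s: %d bytes\n", ns, counters[ns])
	}
}
```

Running it prints `shop: 1600 bytes` and `unknown: 80 bytes`: the two web packets collapse into a single flow, and the DNS flow's source IP has no entry in the cache.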
32+
33+
## Direct deployment model
34+
35+
When using the operator with `FlowCollector` `spec.deploymentModel` set to `Direct`, agents and FLP are both deployed per node (as `DaemonSets`). This is well suited to evaluating the technology and to small clusters, but it isn't memory efficient on large clusters, because every FLP instance ends up caching the same cluster information, which can be huge.
36+
37+
Note that Loki isn't managed by the operator and must be installed separately, for example with the Loki operator. The same goes for Prometheus and any custom receiver.
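The memory cost of the `Direct` model grows linearly with the node count, since each node runs its own FLP holding a copy of the cluster cache. A back-of-the-envelope Go sketch makes this explicit; all numbers (100 nodes, 500 MB of cache per FLP instance, 5 replicas for an FLP scaled independently of the node count) and the `flpCacheMB` helper are hypothetical, not measured values.

```go
package main

import "fmt"

// flpCacheMB estimates the total memory spent on FLP's cluster cache:
// every FLP instance keeps its own copy of roughly the same metadata.
func flpCacheMB(flpInstances int, perInstanceCacheMB float64) float64 {
	return float64(flpInstances) * perInstanceCacheMB
}

func main() {
	const nodes = 100        // hypothetical cluster size
	const cacheMB = 500.0    // hypothetical per-instance cache size
	const scaledReplicas = 5 // hypothetical FLP replica count when not tied to nodes

	direct := flpCacheMB(nodes, cacheMB)          // Direct: one FLP per node
	scaled := flpCacheMB(scaledReplicas, cacheMB) // FLP scaled independently
	fmt.Printf("Direct: %.0f MB, independently scaled: %.0f MB\n", direct, scaled)
}
```

With these assumed values it prints `Direct: 50000 MB, independently scaled: 2500 MB`. Only the ratio matters here; actual cache size depends on the cluster.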
38+
39+
<!-- You can use https://mermaid.live/ to test it -->
40+
41+
```mermaid
42+
flowchart TD
43+
subgraph "for each node"
44+
A[eBPF Agent] -->|generates flows| F[FLP]
45+
end
46+
F -. exports .-> E[(Kafka/Otlp/IPFIX)]
47+
F -->|raw logs| L[(Loki)]
48+
F -->|metrics| P[(Prometheus)]
49+
C[Console plugin] <-->|fetches| L
50+
C <-->|fetches| P
51+
O[Operator] -->|manages| A
52+
O -->|manages| F
53+
O -->|manages| C
54+
```
55+
56+
## Kafka deployment model
57+
58+
When using the operator with `FlowCollector` `spec.deploymentModel` set to `Kafka`, only the agents are deployed per node as a `DaemonSet`. FLP becomes a Kafka consumer that can be scaled independently. This is the recommended mode for large clusters, and is a more robust and resilient solution.
59+
60+
Like in `Direct` mode, data stores aren't managed by the operator. The same applies to the Kafka brokers and stores; the Strimzi operator is one option to install them.
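The decoupling that Kafka brings (agents produce, FLP consumes at its own pace, with a replica count independent of the node count) can be illustrated with a minimal Go sketch. This rests on a loud assumption: a buffered channel stands in for the Kafka topic and goroutines stand in for FLP replicas. It does not model partitions, offsets, consumer groups or persistence, and the `consume` function is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// consume publishes flows into a stand-in "topic" and drains it with a
// configurable number of FLP-like consumers. It returns how many flows
// were processed.
func consume(flows []string, buffer, consumers int) int64 {
	topic := make(chan string, buffer) // stand-in for a Kafka topic
	var processed int64
	var wg sync.WaitGroup

	for i := 0; i < consumers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range topic {
				atomic.AddInt64(&processed, 1) // enrichment/export would happen here
			}
		}()
	}

	// Producer side (the agents): a burst can be published without waiting
	// for consumers, as long as the buffer has room.
	for _, f := range flows {
		topic <- f
	}
	close(topic)
	wg.Wait()
	return processed
}

func main() {
	burst := make([]string, 1000)
	for i := range burst {
		burst[i] = fmt.Sprintf("flow-%d", i)
	}
	fmt.Println(consume(burst, 256, 3))
}
```

Changing the `consumers` argument changes how many workers share the load without touching the producer side; with the values in `main`, it prints `1000`.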
61+
62+
<!-- You can use https://mermaid.live/ to test it -->
63+
64+
```mermaid
65+
flowchart TD
66+
subgraph "for each node"
67+
A[eBPF Agent]
68+
end
69+
A -->|produces flows| K[(Kafka)]
70+
F[FLP] <-->|consumes| K
71+
F -. exports .-> E[(Kafka/Otlp/IPFIX)]
72+
F -->|raw logs| L[(Loki)]
73+
F -->|metrics| P[(Prometheus)]
74+
C[Console plugin] <-->|fetches| L
75+
C <-->|fetches| P
76+
O[Operator] -->|manages| A
77+
O -->|manages| F
78+
O -->|manages| C
79+
```
80+
81+
## CLI
82+
83+
When using the CLI, the operator is not involved, which means you can use it without installing NetObserv as a whole. It uses a special mode of the eBPF agents that embeds FLP.
84+
85+
When capturing flows or packets, a collector Pod is deployed in addition to the agents. When capturing only metrics, the collector isn't deployed: metrics are exposed directly by the agents and pulled by Prometheus.
86+
87+
<!-- You can use https://mermaid.live/ to test it -->
88+
89+
```mermaid
90+
flowchart TD
91+
subgraph "for each node"
92+
A[eBPF Agent w/ embedded FLP]
93+
end
94+
A -->|generates flows or packets| C[Collector]
95+
CL[CLI] -->|manages| A
96+
CL -->|manages| C
97+
A -..->|metrics| P[(Prometheus)]
98+
```
-1.04 MB
Binary file not shown.

docs/assets/architecture-kafka.png

-1.21 MB
Binary file not shown.

docs/assets/frontend.png

-35.9 KB
Binary file not shown.
