docs/showroom/hardware-recommendations/control-plane.md (+7 −7)

@@ -6,23 +6,23 @@ title: Control Plane Hardware
 The minimal control plane footprint is designed for reliability and cost efficiency.
 We do not include the required capacity to run other optional Apeiro services from the COS layer and above in this consideration for the control plane sizing.
 Depending on the complete target scenario, additional capacity needs to be reserved to run Gardener and other services on the control plane.
-This page focuses on the recommended size of the control plane for a plain installation of the BOS layer and bare metal automation to manage the infrastructure in the data plane that will carry the workload.
+This page focuses on the recommended size of the control plane for a plain installation of the BOS layer and bare metal automation to manage the infrastructure in the work plane that will carry the workload.
 
-Additional sizing optimization for minimal footprint installations might be achieved by merging control and data plane into a single rack.
+Additional sizing optimization for minimal footprint installations might be achieved by merging control and work plane into a single rack.
 Our focus here is on a sustainable setup that can also be scaled out during productive operations, depending on increased resource demand.
 
 ## Bare Metal Hardware Specifications
 The Control Plane of a pure bare metal setup that focuses on managing hardware resources without the additional IaaS capabilities requires a single rack for the complete stack.
 
 The minimal setup for a bare metal offering includes:
 - Management Nodes: Minimum of three servers to ensure high availability and redundancy for orchestration, monitoring, and API endpoints.
-- Network Switch: One management switch for interconnecting control and data plane components, supporting both internal and external traffic.
+- Network Switch: One management switch for interconnecting control and work plane components, supporting both internal and external traffic.
 - Compute Nodes: Two or more servers dedicated to workload execution and storage, sized according to anticipated resource demand.
 - Storage: Shared storage system accessible by all compute nodes for persistent data and VM images.
-- Firewall: At least one firewall for basic network segmentation and security between control, data plane, and external connections.
+- Firewall: At least one firewall for basic network segmentation and security between control, work plane, and external connections.
 - Console/Management Access: One console for out-of-band management and troubleshooting.
 
-This list presents the essential hardware components for a minimal yet scalable single rack deployment, combining both control plane and data plane functions for the Apeiro cloud infrastructure.
+This list presents the essential hardware components for a minimal yet scalable single rack deployment, combining both control plane and work plane functions for the Apeiro cloud infrastructure.
 Please refer to the subsequent sections for more detail on the respective components.
 
 ## CobaltCore Hardware Specifications
@@ -37,7 +37,7 @@ A typical **network fabric** pod for a Cobalt Core deployment in a modern data c
 - Console Server: One or more console servers for centralized access to the serial management ports of network and compute devices, supporting remote troubleshooting and maintenance.
 
 Specifications for each component may vary depending on performance requirements and vendor selection, but common features include support for high-speed interfaces (such as 100G QSFP28), redundant power supplies, and advanced network protocols (e.g., DMTF Redfish, VXLAN, EVPN).
-This general architecture is designed to provide scalable, resilient, and secure networking for control plane and data plane operations in bare metal and IaaS environments.
+This general architecture is designed to provide scalable, resilient, and secure networking for control plane and work plane operations in bare metal and IaaS environments.
 
 A typical **compute pod** deployment is designed to deliver scalable, efficient, and manageable compute resources.
 These pods commonly consist of a set of servers, network switches, and management components that together provide the necessary performance, connectivity, and operational flexibility for a wide variety of workloads.
@@ -50,7 +50,7 @@ Support for standardized management protocols, such as DMTF Redfish, is recommen
 This ensures seamless integration with automation tools and reduces complexity.
 Compute pods are architected to be modular, allowing for easy expansion and maintenance.
 Power efficiency, density, and cooling requirements are key factors in hardware selection.
-Network topology is optimized for low latency and high bandwidth, supporting both control plane and data plane operations.
+Network topology is optimized for low latency and high bandwidth, supporting both control plane and work plane operations.
 
 ## IronCore Hardware Specifications
 The Control Plane of IronCore consists typically of a single rack, holding management functionality for network, compute, and storage.
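The control-plane sections above twice call out DMTF Redfish as the standardized out-of-band management protocol. As a minimal sketch of what that integration looks like for automation tooling, the snippet below enumerates the servers exposed by one BMC through the standard Redfish `Systems` collection. The BMC host, session token, and helper names are illustrative assumptions, not part of these docs.

```python
# Illustrative sketch: listing servers behind a BMC via the DMTF Redfish API.
# The host and token values are hypothetical placeholders.
import json
import urllib.request

REDFISH_ROOT = "/redfish/v1"


def systems_url(bmc_host: str) -> str:
    """Build the URL of the standard Redfish Systems collection on a BMC."""
    return f"https://{bmc_host}{REDFISH_ROOT}/Systems"


def list_systems(bmc_host: str, token: str) -> list[str]:
    """Return the resource paths of all systems exposed by one BMC."""
    req = urllib.request.Request(
        systems_url(bmc_host),
        headers={"X-Auth-Token": token},  # Redfish session-token auth
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Each member is a reference like {"@odata.id": "/redfish/v1/Systems/1"}.
    return [member["@odata.id"] for member in body["Members"]]
```

Because the `Systems` collection and `@odata.id` references are mandated by the Redfish specification, the same enumeration works across vendors, which is the interoperability benefit the specifications above point to.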
docs/showroom/hardware-recommendations/index.md (+5 −5)

@@ -8,13 +8,13 @@ This section provides hardware recommendations for typical Apeiro cloud infrastr
 The Control Plane forms the backbone of the Apeiro cloud, managing orchestration, monitoring, and API endpoints.
 All Apeiro management components are deployed in the control plane.
 
-The Data Plane is responsible for workload execution, storage, AI training/inference, and networking.
-The specific buildout of the Data Plane depends on the requirements and expected workload to be handled on the infrastructure.
+The Work Plane is responsible for workload execution, storage, AI training/inference, and networking.
+The specific buildout of the Work Plane depends on the requirements and expected workload to be handled on the infrastructure.
 
 <ApeiroFigure src="/showroom/showroom-planes.png"
-    alt="An illustration of the layout of the control plane and data plane, consisting of multiple pods"
-    caption="The high-level control plane and data plane layout (shows optional components)"
+    alt="An illustration of the layout of the control plane and work plane, consisting of multiple pods"
+    caption="The high-level control plane and work plane layout (shows optional components)"
     width="100%"/>
 
-The recommendations focus on minimal and scalable footprints for both the [Control Plane](./control-plane.md) and [Data Plane](./data-plane.md), while illustrating options for deployments with one and three [availability zones](./scaling.md).
+The recommendations focus on minimal and scalable footprints for both the [Control Plane](./control-plane.md) and [Work Plane](./work-plane.md), while illustrating options for deployments with one and three [availability zones](./scaling.md).
 This ensures both robustness and flexibility for various enterprise workloads.
docs/showroom/hardware-recommendations/scaling.md (+5 −5)

@@ -5,12 +5,12 @@ title: Availability Zones and Scaling
 
 ## Availability Zones
 
-With a single Control Plane and Data Plane, only limited SLAs (Service Level Agreements) can be guaranteed, as the entire system relies on a single set of resources and is thus more susceptible to outages or failures.
+With a single Control Plane and Work Plane, only limited SLAs (Service Level Agreements) can be guaranteed, as the entire system relies on a single set of resources and is thus more susceptible to outages or failures.
 To achieve higher SLAs and ensure greater system resilience, we recommend deploying multiple availability zones.
 
 An availability zone is an isolated location within a data center region, designed with independent power, cooling, and networking to reduce the risk of simultaneous failures. By distributing workloads across at least three identical availability zones, you can significantly improve fault tolerance and disaster recovery capabilities.
 
-This approach typically involves multiplying the Control Plane and Data Plane investments to create three separate, fully functional zones.
+This approach typically involves multiplying the Control Plane and Work Plane investments to create three separate, fully functional zones.
 In addition, deploying multiple availability zones requires robust load balancing to distribute traffic and workloads evenly, as well as data replication strategies to ensure data consistency and availability even in the event of a zone failure.
 
 Leveraging multiple availability zones is a best practice adopted by leading cloud providers to meet stringent uptime and reliability requirements for enterprise and mission-critical applications.
@@ -31,12 +31,12 @@ Vertical scaling can be achieved by upgrading to more powerful nodes, which may
 
 Alternatively, horizontal scaling is possible by incorporating additional racks, thereby increasing the number of nodes that share the workload and enhance redundancy.
 
-This approach to rightsizing can be implemented proactively as part of scheduled hardware refresh cycles, aligning with the natural depreciation of equipment, or reactively in response to sudden surges in Data Plane demand.
+This approach to rightsizing can be implemented proactively as part of scheduled hardware refresh cycles, aligning with the natural depreciation of equipment, or reactively in response to sudden surges in work plane demand.
 This ensures that the Control Plane remains resilient and capable of supporting evolving infrastructure requirements without causing disruptions to ongoing services.
 
-### Data Plane
+### Work Plane
 
-The Data Plane can be scaled horizontally on demand by adding additional compute, storage, network, and AI nodes as needed based on the required capacity for the expected workload.
+The Work Plane can be scaled horizontally on demand by adding additional compute, storage, network, and AI nodes as needed based on the required capacity for the expected workload.
 Horizontal scaling, also known as "scaling out," involves increasing the number of nodes or servers in the system rather than upgrading the hardware of existing nodes.
 
 This approach enables organizations to handle greater workloads, improve fault tolerance, and maintain high availability.
docs/showroom/hardware-recommendations/work-plane.md (+10 −10)

@@ -1,13 +1,13 @@
 ---
 sidebar_position: 2
-title: Data Plane Hardware
+title: Work Plane Hardware
 ---
 
-The Data Plane footprint is determined by the required performance and horizontal scalability.
-The hardware recommendations for Data Plane are shared between CobaltCore and IronCore and will need to be adapted based on the actual workload profiles that need to be supported in the specific target setup.
-For simplicity in procurement, operations, and management of resources, it should be considered to use the same hardware specifications for control and data plane servers.
+The Work Plane footprint is determined by the required performance and horizontal scalability.
+The hardware recommendations for Work Plane are shared between CobaltCore and IronCore and will need to be adapted based on the actual workload profiles that need to be supported in the specific target setup.
+For simplicity in procurement, operations, and management of resources, it should be considered to use the same hardware specifications for control and work plane servers.
 
-## Compute Data Plane
+## Compute Work Plane
 
 General Purpose Compute Pod nodes are designed to provide flexible, scalable computing resources suitable for a wide range of workloads, including virtualization, container orchestration, and cloud-native applications.
 Typical hardware configurations for these nodes include a high-core-count, single-socket server processor (such as the latest Intel Xeon or AMD EPYC CPUs) with 128 or more cores, paired with substantial system memory, commonly 512GB RAM or higher, to support resource-intensive tasks and multiple virtual machines or containers simultaneously.
@@ -19,7 +19,7 @@ Additional 1G Base-T Ethernet ports are often included for management or out-of-
 For data center deployments, compute pod nodes are typically designed to maximize density and power efficiency, allowing up to 16 or more nodes per standard 10kW rack, depending on specific power, cooling, and workload requirements.
 These generalized specifications ensure that the compute pods can meet the demands of modern enterprise IT environments and scale effectively as business needs evolve.
 
-## Storage Data Plane
+## Storage Work Plane
 
 A typical storage node in modern data center environments is designed to deliver high-capacity, high-throughput, and reliable storage services for a variety of applications, such as distributed file systems, object storage, and database backends.
 These nodes commonly feature a high-core-count server processor, such as an AMD EPYC or Intel Xeon CPU, with at least 32 to 64 cores (and corresponding threads) to handle intensive I/O and background processing tasks.
@@ -36,7 +36,7 @@ In cost-sensitive deployments, hardware acceleration features like SmartNICs may
 To optimize for data center density and power efficiency, storage nodes are designed to maximize the number of units per rack, often supporting 12 to 18 nodes per standard 10kW rack, with scalability in modular increments to match capacity and redundancy requirements.
 This general specification ensures that storage nodes can be flexibly deployed in a wide range of enterprise and cloud environments, scaling efficiently as business needs grow.
 
-## Mixed Data Plane
+## Mixed Work Plane
 
 A typical combined compute and storage setup in modern data centers is designed to balance high performance, scalability, and efficient resource utilization.
 These solutions often involve distributing compute and storage resources across multiple racks to optimize power consumption and provide flexible scaling options.
@@ -54,9 +54,9 @@ These nodes may also incorporate hardware accelerators for networking and storag
 By modularly combining compute and storage resources across racks, organizations can scale their infrastructure in units that balance processing power, storage capacity, and network throughput.
 This approach supports a broad range of workloads, from distributed storage systems and virtualization to high-performance computing and data analytics, making it well-suited for both enterprise and cloud data center environments.
 
-## Network Data Plane
+## Network Work Plane
 
-For CobaltCore deployments we recommend a dedicated network pod (whereas for IronCore we recommend deploying the network data plane on the compute nodes).
+For CobaltCore deployments we recommend a dedicated network pod (whereas for IronCore we recommend deploying the network work plane on the compute nodes).
 A typical hardware specification for such a pod includes the following components:
 - Top-of-Rack (ToR) Switches: Multiple high-throughput ToR switches (commonly 2–4 per pod) provide primary network connectivity for compute, storage, and other devices within the rack. These switches often support high-speed Ethernet (e.g., 25/40/100/400GbE) to ensure low-latency and high-bandwidth communication between nodes.
 - Tenant Switches: One or more switches dedicated to tenant or customer network segments, enabling secure and isolated networking for different workloads or clients within the data center.
@@ -67,7 +67,7 @@ A typical hardware specification for such a pod includes the following component
 This modular pod-based architecture enables flexible scaling, robust fault tolerance, and streamlined management.
 The actual number and specification of devices may vary based on the data center's size, workload requirements, and specific use cases, but the general principle is to provide a balanced, redundant network foundation for efficient and secure operations.
 
-## AI Training and Inference Data Plane
+## AI Training and Inference Work Plane
 
 A typical hardware specification for **AI training pods** in modern data centers is designed to deliver high computational power, robust memory bandwidth, and scalable network connectivity to support demanding machine learning and deep learning workloads.
 - GPU Acceleration: AI training nodes commonly feature multiple high-end GPUs (such as NVIDIA H100, H200, A100, or AMD MI300 series), each equipped with large amounts of high-bandwidth memory (HBM2e or HBM3/3e, typically 40GB–141GB per GPU). These GPUs are selected for their performance in parallel processing and support for advanced features like mixed-precision (FP8, FP16) and Multi-Instance GPU (MIG) capability, which allow efficient resource partitioning and flexible scaling.