Bug 2209942 - Incomplete networking specifications in ODF documentation
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: documentation
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Anjana Suparna Sriram
QA Contact: Neha Berry
Reported: 2023-05-25 09:44 UTC by Sergii Mykhailushko
Modified: 2023-08-09 16:43 UTC (History)




Description Sergii Mykhailushko 2023-05-25 09:44:07 UTC
Ceph, a core component of the OCS suite, has its network requirements explicitly outlined in the product documentation. The crucial part of these requirements concerns bandwidth:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/hardware_guide/index#network-considerations_hw

---8<---
2.5. Network considerations

Carefully consider bandwidth requirements for the cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-cluster traffic. 

Important
Red Hat recommends using 10 GB Ethernet for Ceph production deployments. 1 GB Ethernet is not suitable for production storage clusters. 
...
At a minimum, a single 10 GB Ethernet link should be used for storage hardware. If the Ceph nodes have many drives each, add additional 10 GB Ethernet links for connectivity and throughput. 
--->8---

However, the ODF documentation says nothing about the required bandwidth:

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/planning_your_deployment/index#network-requirements_rhodf

The only topics mentioned there are IPv6 addressing and Multus support, the latter currently being a Technology Preview feature.

I believe the networking requirements for an OCS deployment should include those for a standalone Ceph installation, since a slow network between the worker nodes can become a bottleneck and cause communication failures between the Ceph components, which in turn leads to:

- poor performance of the storage cluster (under heavy workloads and/or during recovery operations)
- missed heartbeats from the osd/mon/mgr daemons, resulting in the respective pods crashing or in constant monitor re-elections due to quorum changes

We have actually observed this behaviour in quite a few customer scenarios where an OCS cluster was running on a network slower than 10G and therefore hit the issues above.
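
For triage, a node's negotiated link speed can be compared against the 10G minimum quoted from the Ceph hardware guide. The following is only a minimal sketch, not ODF or Ceph tooling: it assumes a Linux node where /sys/class/net/<iface>/speed reports the speed in Mb/s, and the names MIN_LINK_SPEED_MBPS, read_link_speed_mbps and meets_minimum are hypothetical helpers introduced here for illustration.

```python
from pathlib import Path
from typing import Optional

# Assumption: the 10G minimum from the Ceph hardware guide, expressed in Mb/s.
MIN_LINK_SPEED_MBPS = 10_000


def read_link_speed_mbps(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mb/s, or None if it cannot be determined.

    The sysfs file may be missing, unreadable (for example when the link is
    down), or contain a non-positive value for interfaces that do not report
    a speed; all of these cases are mapped to None.
    """
    try:
        raw = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(raw)
    except (OSError, ValueError):
        return None
    return speed if speed > 0 else None


def meets_minimum(speed_mbps: Optional[int], minimum: int = MIN_LINK_SPEED_MBPS) -> bool:
    """True only when the speed is known and at least the required minimum."""
    return speed_mbps is not None and speed_mbps >= minimum
```

Usage would look like meets_minimum(read_link_speed_mbps("eth0")), where "eth0" is a placeholder interface name. An unknown speed is treated as not meeting the minimum, so such nodes get flagged for manual inspection rather than silently passing.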

Based on the above, I suggest updating the ODF docs to include network specifications in the same way as the Ceph documentation currently does.

Regards,
Sergii

