Bug 2209942

Summary: Incomplete networking specifications in ODF documentation
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Sergii Mykhailushko <smykhail>
Component: documentation
Assignee: Anjana Suparna Sriram <asriram>
Status: NEW
QA Contact: Neha Berry <nberry>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.12
CC: aglotov, odf-bz-bot
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug

Description Sergii Mykhailushko 2023-05-25 09:44:07 UTC
Ceph, a core component of the OCS suite, has its network requirements explicitly outlined in the product documentation. The crucial part of these requirements concerns bandwidth:

https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html-single/hardware_guide/index#network-considerations_hw

---8<---
2.5. Network considerations

Carefully consider bandwidth requirements for the cluster network, be mindful of network link oversubscription, and segregate the intra-cluster traffic from the client-to-cluster traffic. 

Important
Red Hat recommends using 10 GB Ethernet for Ceph production deployments. 1 GB Ethernet is not suitable for production storage clusters. 
...
At a minimum, a single 10 GB Ethernet link should be used for storage hardware. If the Ceph nodes have many drives each, add additional 10 GB Ethernet links for connectivity and throughput. 
--->8---

However, the ODF documentation says nothing about the required bandwidth:

https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.12/html-single/planning_your_deployment/index#network-requirements_rhodf

The only topics mentioned there are IPv6 addressing and Multus support, the latter currently being a Technology Preview feature.

I believe the networking requirements for an OCS deployment should include those for a standalone Ceph installation. A slow network between the worker nodes can become a bottleneck and cause communication failures between the Ceph components, which in turn leads to:

- poor performance of the storage cluster (under heavy workloads and/or during recovery operations)
- missed heartbeats from the osd/mon/mgr daemons, resulting in the respective pods crashing or in constant monitor re-elections due to quorum changes
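
As a rough illustration of the first point, the sketch below estimates how long it takes to move a given amount of recovery data over a single link. It is an idealized back-of-the-envelope calculation, not a Ceph tool: the function name `transfer_seconds`, the 4 TB data volume, and the assumption that the link runs at full line rate (no protocol overhead, no competing client traffic) are all hypothetical, and "10 GB Ethernet" is read here as 10 Gbit/s.

```python
def transfer_seconds(data_bytes: float, link_gbits: float, efficiency: float = 1.0) -> float:
    """Idealized time (seconds) to move data_bytes over a link of link_gbits Gbit/s.

    efficiency is the fraction of line rate actually usable (1.0 = full line rate).
    """
    bits = data_bytes * 8
    return bits / (link_gbits * 1e9 * efficiency)


TB = 1e12  # decimal terabyte, in bytes

# Hypothetical example: 4 TB of data has to be re-replicated after a failure.
for gbits in (1, 10):
    hours = transfer_seconds(4 * TB, gbits) / 3600
    print(f"{gbits:>2} Gbit/s: {hours:.2f} h to move 4 TB")
```

Even in this best case the 1 Gbit/s link needs roughly 8.9 hours versus about 0.9 hours on 10 Gbit/s, and in a real cluster that link is also shared with client I/O and daemon heartbeats.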

We have actually observed this behaviour in quite a few customer cases where an OCS cluster was running on a network slower than 10G and hit the issues above.

Based on the above, I suggest we update the ODF docs to include the network specifications in the same way we currently have them in the Ceph documentation.

Regards,
Sergii