Bug 2225468

Summary: Changing the content of the MULTUS document
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Oded <oviner>
Component: documentation
Assignee: Kusuma <kbg>
Status: ASSIGNED
QA Contact: Neha Berry <nberry>
Severity: unspecified
Priority: unspecified
Version: 4.13
CC: brgardne, kbg, odf-bz-bot
Keywords: NoDocsQEReview
Flags: brgardne: needinfo? (kbg)
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Oded 2023-07-25 11:34:56 UTC
Describe the issue:

Requested changes to the Multus document:
1. The NIC speed should be at least 10 Gbps.
I tested Multus with a 1 Gbps interface and got health warnings such as "Slow OSD heartbeats on back":

sh-5.1$ ceph -s
  cluster:
    id:     25d4c3cc-98fc-4721-9ab4-157552715dfc
    health: HEALTH_WARN
            Slow OSD heartbeats on back (longest 29066.453ms)
            Slow OSD heartbeats on front (longest 28512.216ms)
            Degraded data redundancy: 467/1401 objects degraded (33.333%), 80 pgs degraded
            2 slow ops, oldest one blocked for 234 sec, mon.a has slow ops

deadline 2023-06-18T09:58:25.633590+0000)
2023-06-18T09:58:34.426413442Z debug 2023-06-18T09:58:34.425+0000 7f727ff6b640  0 auth: could not find secret_id=28
2023-06-18T09:58:34.426413442Z debug 2023-06-18T09:58:34.425+0000 7f727ff6b640  0 cephx: verify_authorizer could not get service secret for service osd secret_id=28
2023-06-18T09:58:35.164616560Z debug 2023-06-18T09:58:35.164+0000 7f7277d7b640 -1 osd.2 101 heartbeat_check: no reply from 192.168.20.23:6802 osd.1 since back 2023-06-18T09:57:59.833534+0000 front 2023-06-18T09:58:34.437892+0000 (oldest deadline 2023-06-18T09:58:25.633590+0000)
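
A quick way to check point 1 on each node is sketched below. It is a minimal sketch, assuming a Linux node where the kernel reports the link speed in Mb/s under /sys/class/net/<iface>/speed. The 10 Gbps threshold is the one suggested in this report, not an official requirement, and X and Y are the placeholder interface names used in point 2.

```python
"""Sketch: check that a node's Multus NICs report at least 10 Gbps.

Assumptions: runs on the node itself, and the kernel exposes the link
speed in Mb/s under /sys/class/net/<iface>/speed.
"""
import sys
from pathlib import Path
from typing import Optional

MIN_SPEED_MBPS = 10_000  # 10 Gbps, the minimum suggested in this report


def read_link_speed(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the link speed in Mb/s, or None if no speed is reported."""
    try:
        raw = (Path(sysfs_root) / iface / "speed").read_text().strip()
        speed = int(raw)
    except (OSError, ValueError):
        # Missing interface, or a read error (often EINVAL when the link is down).
        return None
    # Some drivers report -1 when the speed is unknown.
    return speed if speed > 0 else None


def meets_min_speed(speed_mbps: Optional[int], minimum: int = MIN_SPEED_MBPS) -> bool:
    """True only when a speed is known and is at least `minimum` Mb/s."""
    return speed_mbps is not None and speed_mbps >= minimum


if __name__ == "__main__":
    # X and Y are placeholders; pass the real public-net and cluster-net NICs.
    for iface in sys.argv[1:] or ["X", "Y"]:
        speed = read_link_speed(iface)
        status = "OK" if meets_min_speed(speed) else "TOO SLOW OR UNKNOWN"
        print(f"{iface}: {speed} Mb/s -> {status}")
```

With a 1 Gbps NIC, like the one used in the test above, the script reports the interface as too slow.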

2. Use separate NICs for public-net and cluster-net:
The document should mention that public-net and cluster-net need to be configured on different interfaces.
For example, the public-net NIC is X and the cluster-net NIC is Y:


---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: public-net
 namespace: default
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "X", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
 name: cluster-net
 namespace: default
 labels: {}
 annotations: {}
spec:
 config: '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "Y", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'
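
The two definitions can be sanity-checked offline before they are applied. This is a minimal sketch, assuming the `spec.config` strings are copied verbatim from the YAML above. It only compares the macvlan "master" interfaces and the whereabouts "range" subnets, and does not talk to a cluster.

```python
"""Sketch: check that public-net and cluster-net use different NICs and subnets."""
import ipaddress
import json
from typing import List


def check_multus_nads(public_config: str, cluster_config: str) -> List[str]:
    """Return a list of problems; an empty list means both checks passed."""
    public = json.loads(public_config)
    cluster = json.loads(cluster_config)
    problems: List[str] = []

    # Point 2: each network should sit on its own physical interface.
    if public.get("master") == cluster.get("master"):
        problems.append(f"both networks use master interface {public.get('master')!r}")

    # Point 3: the whereabouts ranges should be separate, non-overlapping subnets.
    # strict=False tolerates ranges written with host bits set.
    public_net = ipaddress.ip_network(public["ipam"]["range"], strict=False)
    cluster_net = ipaddress.ip_network(cluster["ipam"]["range"], strict=False)
    if public_net.overlaps(cluster_net):
        problems.append(f"subnets overlap: {public_net} and {cluster_net}")

    return problems


# Config strings copied from the two NetworkAttachmentDefinitions above.
PUBLIC = '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "X", "mode": "bridge", "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" } }'
CLUSTER = '{ "cniVersion": "0.3.1", "type": "macvlan", "master": "Y", "ipam": { "type": "whereabouts", "range": "192.168.30.0/24" } }'

if __name__ == "__main__":
    print(check_multus_nads(PUBLIC, CLUSTER) or "OK")
```

For the example above (X with 192.168.20.0/24, Y with 192.168.30.0/24), the check returns no problems.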



3. Use separate VLANs/subnets:

The switch ports to which X and Y are connected should be in different VLANs.
For example:
NIC X [on the cluster node] is connected to port A [on the switch]
NIC Y [on the cluster node] is connected to port B [on the switch]

Port A should be configured in VLAN 2 and port B in VLAN 3 (the VLAN IDs themselves do not matter; they only need to be different).
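
The VLAN rule can be checked against a cabling inventory. This is a minimal sketch under a hypothetical data model: the `PortAttachment` record and the node name "worker-0" are illustrative only, the inventory would have to be transcribed from the switch configuration by hand, and nothing here queries a real switch.

```python
"""Sketch: verify that each node's public-net and cluster-net NICs land in different VLANs."""
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PortAttachment:  # hypothetical inventory record
    node: str         # cluster node name
    network: str      # "public-net" or "cluster-net"
    nic: str          # NIC on the node, e.g. "X" or "Y"
    switch_port: str  # switch port the NIC is cabled to, e.g. "A" or "B"
    vlan_id: int      # VLAN configured on that switch port


def nodes_with_shared_vlan(attachments: List[PortAttachment]) -> List[str]:
    """Return the nodes whose public-net and cluster-net ports share a VLAN."""
    by_node: Dict[str, Dict[str, int]] = {}
    for a in attachments:
        by_node.setdefault(a.node, {})[a.network] = a.vlan_id
    bad = []
    for node, vlans in sorted(by_node.items()):
        public = vlans.get("public-net")
        if public is not None and public == vlans.get("cluster-net"):
            bad.append(node)
    return bad


# The example from this report: X -> port A (VLAN 2), Y -> port B (VLAN 3).
EXAMPLE = [
    PortAttachment("worker-0", "public-net", "X", "A", 2),
    PortAttachment("worker-0", "cluster-net", "Y", "B", 3),
]

if __name__ == "__main__":
    print(nodes_with_shared_vlan(EXAMPLE) or "OK: public-net and cluster-net use different VLANs")
```

Only the inequality of the two VLAN IDs is checked, matching the point that the specific IDs are irrelevant.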

I attached a screenshot to explain the idea.

4. Avoid restarting all pods in the openshift-storage namespace when Multus is used:
See bug 2223780, "Multus, Connection issue to Noobaa DB after resetting all pods in openshift-storage ns":
https://bugzilla.redhat.com/show_bug.cgi?id=2223780


Suggestions for improvement:

Document URL:
https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.13/html-single/planning_your_deployment/index#recommended-network-configuration-and-requirements-for-a-multus-configuration_rhodf

Chapter/Section Number and Title:
7.7.3. Recommended network configuration and requirements for a Multus configuration

Product Version:
ODF 4.13

Environment Details:

Any other versions of this document that also need this update:

Additional information:

Comment 3 Blaine Gardner 2023-08-07 15:31:40 UTC
I'm not sure that points #2 or #3 are technical requirements. I think those are recommendations, but I don't know of anything that prevents users from doing that if that is what's best for their environment.

Comment 4 Blaine Gardner 2023-08-07 15:35:12 UTC
I also noticed an additional issue: the 4.13 docs use the same diagram for both the dual-network and triple-network configurations here: https://access.redhat.com/documentation/en-us/red_hat_openshift_data_foundation/4.13/html-single/planning_your_deployment/index#recommended-network-configuration-and-requirements-for-a-multus-configuration_rhodf

Comment 5 Blaine Gardner 2023-08-07 15:37:21 UTC
The dual-network diagram should show only the "Public network", with no "cluster network".

Comment 8 Oded 2023-08-10 09:04:21 UTC
Hi Kusuma,

I have some comments:
1. What is the difference between SDN SW and OCS SW?
2. The internal-net and public-net can work on the same SW.
3. What is the difference between a storage node and a worker node? The disks are attached to the worker nodes.
4. It should be noted that you have to use two different physical network interfaces.
5. We need to add a note: "Each port [cluster-net and public-net] is associated with a different VLAN ID in the Ethernet switch."
6. I added a screenshot to explain the idea.