Bug 2231834 - [4.13.z clone][Multus] rbd command hung in toolbox pod on Multus enabled OCS cluster
Summary: [4.13.z clone][Multus] rbd command hung in toolbox pod on Multus enabled OCS cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.13.4
Assignee: Nikhil Ladha
QA Contact: Oded
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-08-14 11:02 UTC by Nikhil Ladha
Modified: 2023-10-26 17:48 UTC
CC List: 5 users

Fixed In Version: 4.13.4-1
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-10-26 17:47:55 UTC
Embargoed:




Links:
- GitHub: red-hat-storage/ocs-operator pull 2179 (Merged) - Bug 2231834: [release-4.13] Add multus related annotations to ceph toolbox, and verify network namespace (2023-09-29 05:27:40 UTC)
- Red Hat Product Errata: RHBA-2023:6146 (2023-10-26 17:48:45 UTC)

Description Nikhil Ladha 2023-08-14 11:02:39 UTC
This bug was initially created as a copy of Bug #1982721


Description of problem (please be as detailed as possible and provide log
snippets):
In a Multus-enabled OCS internal mode cluster, not all commands work as expected in the toolbox pod.
Simple ceph commands work, but rbd commands hang.

```
sh-4.4# ceph -s
  cluster:
    id:     465f536e-3fed-41ec-88c5-00584ad4f069
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 6h)
    mgr: a(active, since 6h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 6h), 3 in (since 6h)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
 
  data:
    pools:   10 pools, 176 pgs
    objects: 351 objects, 204 MiB
    usage:   3.5 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     176 active+clean
 
  io:
    client:   852 B/s rd, 11 KiB/s wr, 1 op/s rd, 1 op/s wr
 
sh-4.4# rbd ls -p ocs-storagecluster-cephblockpool
>> hangs & no output after several minutes

```

It was observed that the toolbox pod does not have the Multus-related annotations/interfaces by default.
The operator should take care of this automatically and apply the proper Multus annotations to the toolbox.
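
For reference, a minimal sketch of the kind of annotation involved. The network name below (openshift-storage/public-net) is an assumption taken from the NetworkAttachmentDefinition names that appear later in this report, not necessarily the value the operator applies:

```
# Hypothetical pod-template snippet; the network name is an assumption.
metadata:
  annotations:
    k8s.v1.cni.cncf.io/networks: openshift-storage/public-net
```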

Version of all relevant components (if applicable):
OCP: 4.8.0-0.nightly-2021-07-14-153019
OCS: ocs-operator.v4.8.0-452.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Users are unable to execute all commands via the toolbox pod.

Is there any workaround available to the best of your knowledge?
Yes:
Add an annotation to the "rook-ceph-tools" deployment similar to the one the MONs have,
and set hostNetwork to false.
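
A sketch of applying that workaround with oc patch, assuming the public network attachment is openshift-storage/public-net (copy the actual value from a MON pod's k8s.v1.cni.cncf.io/networks annotation):

```
# Assumed network name; substitute the value from a MON pod's annotations.
oc -n openshift-storage patch deployment rook-ceph-tools --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"openshift-storage/public-net"}},"spec":{"hostNetwork":false}}}}'
```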

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
1. Install OCS operator
2. Create Storage Cluster with Multus
3. Use this command to start the toolbox pod:
$ oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

4. Run a simple rbd command
$ rbd ls -p ocs-storagecluster-cephblockpool

Actual results:
rbd commands hang in the toolbox pod

Expected results:
all ceph & rbd commands should work same with and without multus

Additional info:

Comment 15 Oded 2023-10-14 22:35:57 UTC
Bug Fixed

Setup:
OCP Version: 4.13.0-0.nightly-2023-10-13-013258
ODF Version: 4.13.4-3
Platform: vSphere


1. Deploy a cluster with Multus:
$ oc get storagecluster -o yaml
```
network:
  multiClusterService: {}
  provider: multus
  selectors:
    cluster: openshift-storage/private-net
    public: openshift-storage/public-net
```
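
(The public/private selectors above reference NetworkAttachmentDefinitions in the openshift-storage namespace. A minimal sketch of what such a definition can look like is below; the macvlan type, master interface eth0, and IP range are assumptions, not the exact definitions used in this setup.)

```
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: public-net
  namespace: openshift-storage
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "eth0",
      "mode": "bridge",
      "ipam": { "type": "whereabouts", "range": "192.168.20.0/24" }
    }'
```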

2. Add the tool pod:
oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

3. Check tool pod status:
$ oc get pods rook-ceph-tools-555575c74-wqmn7
NAME                              READY   STATUS    RESTARTS   AGE
rook-ceph-tools-555575c74-wqmn7   1/1     Running   0          9s
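
(Optionally, confirm the Multus annotation landed on the toolbox pod; a sketch, with the expected value depending on the NetworkAttachmentDefinitions in use:)

```
$ oc -n openshift-storage get pod rook-ceph-tools-555575c74-wqmn7 \
    -o jsonpath='{.metadata.annotations.k8s\.v1\.cni\.cncf\.io/networks}'
```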

4. Check rbd list:
$ oc rsh rook-ceph-tools-555575c74-wqmn7
sh-5.1$  rbd ls -p ocs-storagecluster-cephblockpool
csi-vol-3f268117-7543-49b8-9942-fa3cb8718a0e
csi-vol-5744f8b2-be6d-4857-b588-0e2c6fc22c75
csi-vol-647633d2-6bcd-4d44-b10d-a30c0a03caf7
csi-vol-69fcc985-dca9-4d6c-af75-27eb75969f55
csi-vol-9006565e-e8a0-41fb-b7ef-7fb08a4af543

Comment 20 errata-xmlrpc 2023-10-26 17:47:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.4 security and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:6146

