Bug 1982721 - [Multus] rbd command hung in toolbox pod on Multus enabled OCS cluster [NEEDINFO]
Summary: [Multus] rbd command hung in toolbox pod on Multus enabled OCS cluster
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.14.0
Assignee: Nikhil Ladha
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-07-15 14:41 UTC by Sidhant Agrawal
Modified: 2023-08-16 14:48 UTC (History)
CC List: 12 users

Fixed In Version: 4.14.0-110
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
shan: needinfo? (rohgupta)
shan: needinfo? (jrivera)


Attachments
default_pod_created_without_annotations (8.40 KB, text/plain)
2021-07-15 14:41 UTC, Sidhant Agrawal


Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-operator pull 2128 0 None Merged Add multus related annotations to ceph toolbox 2023-08-14 11:00:02 UTC
Github red-hat-storage ocs-operator pull 2135 0 None open [release-4.14] Add multus related annotations to ceph toolbox 2023-08-14 11:11:24 UTC
Red Hat Issue Tracker RHSTOR-2067 0 None None None 2022-01-20 16:11:36 UTC

Description Sidhant Agrawal 2021-07-15 14:41:48 UTC
Created attachment 1801897 [details]
default_pod_created_without_annotations

Description of problem (please be as detailed as possible and provide log
snippets):
In a Multus-enabled OCS internal mode cluster, not all commands work as expected in the toolbox pod.
Simple ceph commands work, but rbd commands hang.

```
sh-4.4# ceph -s
  cluster:
    id:     465f536e-3fed-41ec-88c5-00584ad4f069
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum a,b,c (age 6h)
    mgr: a(active, since 6h)
    mds: ocs-storagecluster-cephfilesystem:1 {0=ocs-storagecluster-cephfilesystem-a=up:active} 1 up:standby-replay
    osd: 3 osds: 3 up (since 6h), 3 in (since 6h)
    rgw: 1 daemon active (ocs.storagecluster.cephobjectstore.a)
 
  data:
    pools:   10 pools, 176 pgs
    objects: 351 objects, 204 MiB
    usage:   3.5 GiB used, 1.5 TiB / 1.5 TiB avail
    pgs:     176 active+clean
 
  io:
    client:   852 B/s rd, 11 KiB/s wr, 1 op/s rd, 1 op/s wr
 
sh-4.4# rbd ls -p ocs-storagecluster-cephblockpool
>> hangs & no output after several minutes

```

It was observed that the toolbox pod doesn't have the Multus-related annotations/interfaces by default.
The operator should take care of this automatically and apply the proper Multus annotations to the toolbox.
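
For illustration, one quick way to confirm the missing annotation (the label selectors below are the stock Rook/OCS ones and may differ on a given cluster):

```
# Print the annotations of a mon pod and of the toolbox pod; on an affected cluster
# the toolbox output has no k8s.v1.cni.cncf.io/networks entry at all.
$ oc -n openshift-storage get pod -l app=rook-ceph-mon -o jsonpath='{.items[0].metadata.annotations}{"\n"}'
$ oc -n openshift-storage get pod -l app=rook-ceph-tools -o jsonpath='{.items[0].metadata.annotations}{"\n"}'
```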

Version of all relevant components (if applicable):
OCP: 4.8.0-0.nightly-2021-07-14-153019
OCS: ocs-operator.v4.8.0-452.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Users won't be able to execute all commands via the toolbox; rbd commands hang.

Is there any workaround available to the best of your knowledge?
Yes: add the Multus network annotation to the "rook-ceph-tools" deployment, similar to what the MONs have,
and set hostNetwork to false (a sketch of this patch follows below).
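
A minimal sketch of that workaround, assuming the default namespace and deployment name. The NetworkAttachmentDefinition name "openshift-storage/ocs-public" is a placeholder; substitute whatever value the mon pods carry in their k8s.v1.cni.cncf.io/networks annotation:

```
# Placeholder NAD name below; copy the real value from a mon pod's
# k8s.v1.cni.cncf.io/networks annotation before applying.
$ oc -n openshift-storage patch deployment rook-ceph-tools --type merge -p \
  '{"spec":{"template":{"metadata":{"annotations":{"k8s.v1.cni.cncf.io/networks":"openshift-storage/ocs-public"}},"spec":{"hostNetwork":false}}}}'
```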

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
-

Steps to Reproduce:
1. Install OCS operator
2. Create a Storage Cluster with Multus (a sanity-check sketch follows after these steps)
3. Use this command to start the toolbox pod:
$ oc patch ocsinitialization ocsinit -n openshift-storage --type json --patch  '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'

4. Run a simple rbd command
$ rbd ls -p ocs-storagecluster-cephblockpool
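
A sanity-check sketch for steps 2 and 3; the StorageCluster name "ocs-storagecluster" is the usual default and is assumed here:

```
# Step 2: a Multus-enabled StorageCluster should carry a spec.network section with
# provider "multus" and NetworkAttachmentDefinition selectors for the public (and
# optionally cluster) networks.
$ oc -n openshift-storage get storagecluster ocs-storagecluster -o jsonpath='{.spec.network}{"\n"}'

# Step 3: the ocsinitialization patch creates a deployment named rook-ceph-tools.
$ oc -n openshift-storage get deployment rook-ceph-tools
```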

Actual results:
rbd commands hang in the toolbox pod

Expected results:
All ceph and rbd commands should work the same with and without Multus.

Additional info:

Comment 2 Sébastien Han 2021-07-19 07:42:32 UTC
Rook does not create the toolbox; ocs-operator does, on demand.
So this change will have to be made in ocs-operator.

Comment 3 Mudit Agarwal 2021-07-21 06:29:12 UTC
Not a 4.8 blocker, can be fixed in 4.8.1

Comment 5 Jose A. Rivera 2021-10-11 16:22:34 UTC
This is definitely something we should fix. But since Multus is only going GA in ODF 4.10, I'm moving this accordingly, as we have time to fix it there. Also giving devel_ack+.

Rohan, if you could take a quick look, see if you can take care of this and set it to ASSIGNED. Otherwise let me know and I'll see if I can find anyone else.

Comment 6 Jose A. Rivera 2022-01-20 16:11:37 UTC
This BZ is directly related to the Multus Epic, which is not targeted for ODF 4.10: https://issues.redhat.com/browse/RHSTOR-2067

As we have hit feature freeze for 4.10, moving this to ODF 4.11.

Comment 7 Sébastien Han 2022-01-21 09:54:17 UTC
Rohan, any plan to work on this? Thanks

Comment 9 Sébastien Han 2022-05-09 15:44:07 UTC
Ideally yes, if someone from ocs-op can pick it up; it looks small. It's only about adding the network annotations to the toolbox when requested.
José, do we have someone other than Rohan? He moved to a different team, so I'm not sure we can count on him :)

Thanks!

Comment 10 Martin Bukatovic 2022-05-10 13:22:29 UTC
Reproducer looks clear.

Comment 28 Blaine Gardner 2023-05-30 16:53:09 UTC
Eran and I talked about targeting 4.13.z for this. @uchapaga could you help me be sure this is in the work queue for a 4.13 z-stream?

Comment 29 umanga 2023-05-31 05:13:09 UTC
Since we have all the acks on this BZ, will take this up for 4.14.
I will create a 4.13.z clone of this BZ when it's available.

