Bug 2086855

Summary: nfs-ganesha server remains running and the cephnfs resource is still available after NFS is disabled for the storage cluster
Product: [Red Hat Storage] Red Hat OpenShift Data Foundation
Reporter: Amrita Mahapatra <ammahapa>
Component: ocs-operator
Assignee: Rakshith <rar>
Status: CLOSED WONTFIX
QA Contact: Amrita Mahapatra <ammahapa>
Severity: high
Priority: high
Version: 4.11
CC: brgardne, etamir, madam, mbukatov, mmuench, ndevos, ocs-bugs, odf-bz-bot, rar, sostapov
Target Release: ODF 4.11.0
Hardware: Unspecified
OS: Unspecified
Last Closed: 2022-05-18 10:51:37 UTC
Type: Bug

Description Amrita Mahapatra 2022-05-16 16:57:07 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The nfs-ganesha server remains up and running, and the cephnfs resource is still available, after the NFS feature is disabled for the storage cluster.

[ammahapa@ammahapa ~]$ oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": false}}}' --type merge
storagecluster.ocs.openshift.io/ocs-storagecluster patched

[ammahapa@ammahapa ~]$ oc get storageclusters.ocs.openshift.io ocs-storagecluster -o yaml
nfs:
    enable: false

[ammahapa@ammahapa ~]$ oc get pods | grep rook-ceph-nfs
rook-ceph-nfs-ocs-storagecluster-cephnfs-a-846955ddc5-v2n5k       2/2     Running     0             7h13m


[ammahapa@ammahapa ~]$ oc get cephnfs
NAME                         AGE
ocs-storagecluster-cephnfs   7h13m


Version of all relevant components (if applicable):
OCP: 4.11.0-0.nightly-2022-05-11-054135
ODF full version: 4.11.0-69


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)? No


Is there any workaround available to the best of your knowledge? No


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)? 2


Is this issue reproducible? Yes


Can this issue be reproduced from the UI? No


If this is a regression, please provide more details to justify this: No


Steps to Reproduce:
1. Enable the NFS feature for the storage cluster with a patch command.
[ammahapa@ammahapa ~]$ oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": true}}}' --type merge
storagecluster.ocs.openshift.io/ocs-storagecluster patched

2. The nfs-ganesha server comes up.
[ammahapa@ammahapa ~]$ oc get pods | grep rook-ceph-nfs
rook-ceph-nfs-ocs-storagecluster-cephnfs-a-846955ddc5-v2n5k       2/2     Running     0  

3. The cephnfs resource is available.
[ammahapa@ammahapa ~]$ oc get cephnfs
NAME                         AGE
ocs-storagecluster-cephnfs   6h32m
        
4. Enable "ROOK_CSI_ENABLE_NFS" in the rook-ceph-operator-config ConfigMap.
[ammahapa@ammahapa ~]$ oc patch cm rook-ceph-operator-config -n openshift-storage -p $'data:\n "ROOK_CSI_ENABLE_NFS":  "true"'
configmap/rook-ceph-operator-config patched

5. Disable the NFS feature for the storage cluster with a patch command.
[ammahapa@ammahapa ~]$ oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": false}}}' --type merge
storagecluster.ocs.openshift.io/ocs-storagecluster patched

6. Check whether the cephnfs resource is still available and whether the nfs-server is still running.
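The check in step 6 can be scripted. A minimal sketch, assuming the openshift-storage namespace and the resource names shown in the transcripts above:

```shell
# Verify whether NFS components were removed after disabling the feature.
# Assumes the openshift-storage namespace and the pod/resource names
# quoted earlier in this report.

# The CephNFS custom resource should be gone once NFS is disabled:
oc get cephnfs -n openshift-storage

# The nfs-ganesha server pod should no longer be running:
oc get pods -n openshift-storage | grep rook-ceph-nfs
```

In this bug, both commands still return results after step 5, which is the reported problem.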


Actual results:
The cephnfs resource is still available and the nfs-server is still running after the NFS feature is disabled for the storage cluster.


Expected results:
The cephnfs resource should no longer be available and the nfs-server should not be running after the NFS feature is disabled for the storage cluster.

Comment 3 Blaine Gardner 2022-05-16 17:10:49 UTC
I think I see 2 things here....

-------
 ( 1 )

I suspect this is an issue with OCS-Operator. @rar can you take a look and make sure that when nfs.enable is set to false...
 - the CephNFS and the Service that OCS operator creates are removed
 - the ROOK_CSI_ENABLE_NFS config is set to false

-------
 ( 2 )

@ammahapa, enabling/disabling the ODF NFS feature is intended to be done with only a single patch command (below):
oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": <true/false>}}}' --type merge 

Please be sure to remove `oc patch cm rook-ceph-operator-config -n openshift-storage -p $'data:\n "ROOK_CSI_ENABLE_NFS":  "<true/false>"'` from *all* CI test procedures.

Comment 9 Martin Bukatovic 2022-05-17 12:59:03 UTC
QE team provides ack under the following assumptions:

- the ocs operator should at least stop NFS components when NFS is disabled (as explained in the original bug description)
- some manual steps are acceptable, assuming they have already been documented either in a doc draft or in JIRA at the time I write this comment (if this is not the case, QE assumes that it should be fixed as well, and failure to do so will constitute a reason to fail QA on this bug)
- user data should not be removed by disabling the NFS feature

If there is a disagreement with this, we need to discuss it together with the status of the original feature.

Comment 12 Blaine Gardner 2022-05-17 16:37:50 UTC
We spoke with Eran about priorities for the 4.11 TP, and I have some points to clarify. 

Due to some technical limitations with ocs-operator that I was unaware of, both `oc patch` commands are required for enabling NFS. @ammahapa please ignore my comment here that suggests otherwise: https://bugzilla.redhat.com/show_bug.cgi?id=2086855#c3
Sorry for the confusion.

Critical for this BZ, the ability to disable NFS and have NFS components automatically removed is *not* a requirement for 4.11 TP. This is not a scenario we need to test. @ammahapa I believe we should remove this test from the NFS requirements for 4.11. 


If QA tests need to remove NFS components for teardown or for other reasons, the following should work:

Either delete the entire storagecluster, or follow the steps below:
1. delete all NFS export PVCs, and wait for them to be deleted
2. oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster --patch '{"spec": {"nfs":{"enable": false}}}' --type merge
3. oc patch cm rook-ceph-operator-config -n openshift-storage -p $'data:\n "ROOK_CSI_ENABLE_NFS":  "false"'
4. manually delete CephNFS, ocs nfs Service, and the nfs StorageClass
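The manual teardown steps above can be collected into a single sketch. The namespace and the ocs-storagecluster/cephnfs names are taken from this report; the PVC, Service, and StorageClass names are placeholders and must be replaced with the actual names in your environment:

```shell
# Sketch of the manual NFS teardown described above. Anything in <angle
# brackets> is a placeholder, not a name confirmed by this report.

# 1. Delete all NFS export PVCs first and wait for them to be gone.
oc delete pvc -n openshift-storage <nfs-export-pvc> --wait=true

# 2. Disable the NFS feature on the StorageCluster.
oc patch -n openshift-storage storageclusters.ocs.openshift.io ocs-storagecluster \
  --patch '{"spec": {"nfs":{"enable": false}}}' --type merge

# 3. Disable the NFS CSI driver in the Rook operator config.
oc patch cm rook-ceph-operator-config -n openshift-storage \
  -p $'data:\n "ROOK_CSI_ENABLE_NFS":  "false"'

# 4. Manually delete the CephNFS resource, the NFS Service, and the NFS
#    StorageClass (Service and StorageClass names are placeholders).
oc delete cephnfs -n openshift-storage ocs-storagecluster-cephnfs
oc delete service -n openshift-storage <nfs-service>
oc delete storageclass <nfs-storageclass>
```

Step 1 must complete before the rest, since removing the CSI driver and CephNFS while exports are still mounted could strand the PVCs.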


Regarding @mbukatov's comment here https://bugzilla.redhat.com/show_bug.cgi?id=2086855#c9, I believe the information I have from Eran is contrary to your expectations. I will ask Eran to comment here with any follow-up and to help us make sure we are resolving this in the right way.


I will leave this BZ open for now to resolve Martin's concerns and in case there are follow-up questions.