Description of problem (please be as detailed as possible and provide log snippets):
----------------------------------------------------------------
The following 2 new behaviors are seen during an OCS 4.7 install:

a) After installing the OCS operator, for a few seconds there were 2 noobaa-operator and 2 rook-ceph-operator pods, but ultimately there were 4 operator pods, as expected:

Fri Dec 18 17:21:11 UTC 2020
--------------
========CSV ======
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-199.ci   OpenShift Container Storage   4.7.0-199.ci              Installing
--------------
=======PODS ======
NAME                                    READY   STATUS        RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
noobaa-operator-68789479f9-crfs9        0/1     Terminating   0          42s   10.131.0.12   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>
noobaa-operator-79647874ff-jr6m6        1/1     Running       0          33s   10.129.2.30   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
ocs-metrics-exporter-85479869bc-lr46n   1/1     Running       0          33s   10.131.0.13   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>
ocs-operator-548bfccdd9-gx6bf           0/1     Terminating   0          42s   10.129.2.28   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-operator-5465bd45c4-gtjn8     0/1     Terminating   0          42s   10.128.2.17   ip-10-0-192-119.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-7b6d685f58-j96hm     1/1     Running       0          33s   10.131.0.14   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>

b) On installing the Storage Cluster, the operator pods respun again on their own. This was never seen in any OCS version before 4.7:

noobaa-operator-68789479f9-lljjv        1/1     Running       0          20s   10.129.2.39   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
noobaa-operator-79647874ff-jr6m6        1/1     Terminating   0          15m   10.129.2.30   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
ocs-metrics-exporter-6f94c4fb96-7djnn   1/1     Running       0          20s   10.129.2.38   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
ocs-metrics-exporter-85479869bc-lr46n   1/1     Terminating   0          15m   10.131.0.13   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>
ocs-operator-6bccd6f885-5fss7           0/1     Terminating   0          13m   10.129.2.31   ip-10-0-175-26.us-east-2.compute.internal    <none>           <none>
rook-ceph-operator-5465bd45c4-cptdv     1/1     Running       0          20s   10.131.0.16   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>
rook-ceph-operator-7b6d685f58-j96hm     0/1     Terminating   0          15m   10.131.0.14   ip-10-0-129-108.us-east-2.compute.internal   <none>           <none>

Version of all relevant components (if applicable):
======================================================
$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.7.0-199.ci   OpenShift Container Storage   4.7.0-199.ci              Succeeded

[nberry@localhost nberry-aws-199.ci]$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-18-120350   True        False         75m     Cluster version is 4.7.0-0.nightly-2020-12-18-120350

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
------------------------------------------------------------------
No, but pod restarts should happen for a reason.

Is there any workaround available to the best of your knowledge?
---------------------------------------------------
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
----------------------------------------------------------------
4

Is this issue reproducible?
-----------------------------
Yes. Seen on bare-metal LSO, VMware, and AWS clusters.

Can this issue be reproduced from the UI?
-------------------------------------
Yes

If this is a regression, please provide more details to justify this:
----------------------------------------------------------------
Yes. In previous releases, the operator pods never respun on their own while creating a storage cluster.

Steps Performed
=====================
1. Installed OCS Operator ocs-operator.v4.7.0-199.ci on OCP 4.7.0-0.nightly-2020-12-18-120350
   ** The namespace is created but the monitoring label is not added, as per the fix for Bug 1866298
2. Checked the 4 operator pods; they eventually reached Running state (age > 14m)
3. Installed the Storage cluster, and ultimately all OCS pods were created
   ** The monitoring label is still not added to the namespace openshift-storage, see Bug 1866298#c7 (a quick way to check the label is sketched below)
4. Checked that all operator pods had restarted on their own

Actual results:
-------------------
Operator pods are respinning on their own, both after Operator install and after storage cluster creation.

Expected results:
-------------------
No unexpected restarts of the pods.
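For reference, a minimal way to check whether the monitoring label from Bug 1866298 is present on the namespace. This is a sketch assuming the label key is openshift.io/cluster-monitoring (the cluster-monitoring label that bug discusses); adjust if your build uses a different key:

# Show all labels on the namespace:
oc get namespace openshift-storage --show-labels

# Or query the single label explicitly (prints nothing if unset);
# the dots in the label key must be escaped in jsonpath:
oc get namespace openshift-storage \
  -o jsonpath='{.metadata.labels.openshift\.io/cluster-monitoring}'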
Fixed via https://github.com/openshift/ocs-operator/pull/1022.
Umanga, this is not in 4.7 yet, right?
Fix is merged and backported to 4.7. Clearing needinfo.
To reproduce the BZ, I performed the steps below:

1. Install an OCP 4.7 cluster on vSphere without OCS.

2. Run the script below in the CLI to check the operator pod status and the status of the resources:

while true; do
  date --utc
  echo --------------
  echo ========"CSV" ======
  oc get csv -n openshift-storage
  echo --------------
  echo ======="PODS" ======
  oc get pods -o wide -n openshift-storage
  echo --------------
  echo ======= "PVC" ==========
  oc get pvc -n openshift-storage
  echo --------------
  echo ======= "storagecluster" ==========
  oc get storagecluster -n openshift-storage
  echo --------------
  echo ======= "cephcluster" ==========
  oc get cephcluster -n openshift-storage
  echo ======= "backingstore" ==========
  oc get backingstore -n openshift-storage
  echo ======= "bucketclass" ==========
  oc get bucketclass -n openshift-storage
  echo ======= "bucketclaim" ==========
  oc get obc -n openshift-storage
  sleep 10
done | tee csv-pods-pvc-ceph-storage-cluster-obc-BS.txt

3. After approximately 12 minutes, I created the OCS 4.7 storage cluster (the script was still running in the background).

4. After approximately 30 minutes, I installed the LSO operator and created a new storage cluster using LSO (the script was still running in the background).

5. Waited another 40 minutes. Looking at the file "csv-pods-pvc-ceph-storage-cluster-obc-BS.txt", where the script wrote all the results, I saw that the operator pods were never restarted or deleted. All the other resources also look fine.
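As an extra sanity check on top of the collected log, one-liners along these lines can confirm that no operator pod respun. This is a sketch, not part of the original verification; the pod selectors and file name are taken from this comment:

# A respun pod shows up as a new pod name with a recent creation time,
# rather than an incremented restart count on the old pod:
oc get pods -n openshift-storage \
  -o custom-columns=NAME:.metadata.name,RESTARTS:.status.containerStatuses[0].restartCount,CREATED:.metadata.creationTimestamp

# Scan the script output for any pod that ever left Running state:
grep -E 'Terminating|CrashLoopBackOff|Error' csv-pods-pvc-ceph-storage-cluster-obc-BS.txt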
Created attachment 1771885 [details] csv pods, ceph storage cluster resources, obc BS
Versions:

OCP version:
Client Version: 4.7.0-0.nightly-2021-04-10-082109
Server Version: 4.7.0-0.nightly-2021-04-10-082109
Kubernetes Version: v1.20.0+c8905da

OCS version:
ocs-operator.v4.7.0-324.ci   OpenShift Container Storage   4.7.0-324.ci   Succeeded

Cluster version:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-04-10-082109   True        False         20h     Cluster version is 4.7.0-0.nightly-2021-04-10-082109

Rook version:
rook: 4.7-121.436d4ed74.release_4.7
go: go1.15.7

Ceph version:
ceph version 14.2.11-138.el8cp (18a95d26e01b87abf3e47e9f01f615b8d2dd03c4) nautilus (stable)
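For anyone re-collecting this version data, commands along these lines should work. This is a sketch; the exec targets assume the rook and ceph binaries ship in the rook-ceph-operator image, which can vary by build:

# Client, server, and Kubernetes versions:
oc version

# OCP cluster version:
oc get clusterversion

# OCS operator version:
oc get csv -n openshift-storage

# Rook and Ceph versions (assumes the binaries are present in the operator image):
oc -n openshift-storage exec deploy/rook-ceph-operator -- rook version
oc -n openshift-storage exec deploy/rook-ceph-operator -- ceph --version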
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days