Description of problem (please be as detailed as possible and provide log snippets):

In OCS 4.8 External mode, deployment is failing because the storage cluster is stuck in the Progressing state. Following are the observations from the cluster:

1) storagecluster stuck in Progressing state

$ oc get storagecluster
NAME                          AGE    PHASE         EXTERNAL   CREATED AT             VERSION
ocs-external-storagecluster   131m   Progressing   true       2021-05-21T11:30:10Z   4.8.0

2) Even though the storagecluster is in Progressing state, the CSV is in Succeeded phase and the operator pods are in 1/1 state

$ oc get csv
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-399.ci   OpenShift Container Storage   4.8.0-399.ci              Succeeded

$ oc get pod | grep operator
noobaa-operator-5d46769fdc-htzlk      1/1   Running   0   140m
ocs-operator-7f64b96dd5-9794h         1/1   Running   0   140m
rook-ceph-operator-77bd5678b9-h7kz6   1/1   Running   0   140m

3) RGW is present in the external RHCS cluster, and its details were passed when creating the JSON output from the exporter script. OCS creates a backingstore of `pv-pool` type instead of `s3-compatible`

$ oc get backingstore
NAME                           TYPE      PHASE   AGE
noobaa-default-backing-store   pv-pool   Ready   141m

The RGW StorageClass is also present in the cluster:

$ oc get sc
NAME                                   PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
ocs-external-storagecluster-ceph-rbd   openshift-storage.rbd.csi.ceph.com      Delete          Immediate           true                   145m
ocs-external-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate           false                  145m
ocs-external-storagecluster-cephfs     openshift-storage.cephfs.csi.ceph.com   Delete          Immediate           true                   145m
openshift-storage.noobaa.io            openshift-storage.noobaa.io/obc         Delete          Immediate           false                  143m
thin                                   kubernetes.io/vsphere-volume            Delete          Immediate           false                  168m

4) cephobjectstore and cephobjectstoreuser are absent

$ oc get cephobjectstore -A
No resources found

$ oc get cephobjectstoreuser -A
No resources found

5) An OBC created using the `ocs-external-storagecluster-ceph-rgw` SC is stuck in Pending

$ oc get obc
NAME   STORAGE-CLASS                          PHASE     AGE
obc1   openshift-storage.noobaa.io            Bound     52m
obc2   ocs-external-storagecluster-ceph-rgw   Pending   52m

Error from rook-ceph-operator logs:

I0521 13:47:15.800923 7 controller.go:212] "msg"="reconciling claim" "key"="openshift-storage/obc2"
I0521 13:47:15.800955 7 helpers.go:107] "msg"="getting claim for key" "key"="openshift-storage/obc2"
I0521 13:47:15.802999 7 helpers.go:213] "msg"="getting ObjectBucketClaim's StorageClass" "key"="openshift-storage/obc2"
I0521 13:47:15.804495 7 helpers.go:218] "msg"="got StorageClass" "key"="openshift-storage/obc2" "name"="ocs-external-storagecluster-ceph-rgw"
I0521 13:47:15.804516 7 helpers.go:90] "msg"="checking OBC for OB name, this indicates provisioning is complete" "key"="openshift-storage/obc2" "obc2"=null
I0521 13:47:15.804531 7 resourcehandlers.go:446] "msg"="updating status:" "key"="openshift-storage/obc2" "new status"="Pending" "obc"="openshift-storage/obc2" "old status"="Pending"
I0521 13:47:15.807810 7 controller.go:273] "msg"="syncing obc creation" "key"="openshift-storage/obc2"
I0521 13:47:15.807831 7 controller.go:620] "msg"="updating OBC metadata" "key"="openshift-storage/obc2"
I0521 13:47:15.807841 7 resourcehandlers.go:436] "msg"="updating" "key"="openshift-storage/obc2" "obc"="openshift-storage/obc2"
I0521 13:47:15.811577 7 resourcehandlers.go:148] "msg"="seeing if OB for OBC exists" "key"="openshift-storage/obc2" "checking for OB name"="obc-openshift-storage-obc2"
I0521 13:47:15.813455 7 controller.go:396] "msg"="provisioning" "key"="openshift-storage/obc2" "bucket"="obc2-1bf185cb-3795-40b3-87e5-5eea52985f9a"
2021-05-21 13:47:15.813465 I | op-bucket-prov: initializing and setting CreateOrGrant services
2021-05-21 13:47:15.813474 I | op-bucket-prov: getting storage class "ocs-external-storagecluster-ceph-rgw"
E0521 13:47:15.816402 7 controller.go:199] error syncing 'openshift-storage/obc2': error provisioning bucket: failed to get cephObjectStore: cephObjectStore not found: cephobjectstores.ceph.rook.io "ocs-external-storagecluster-cephobjectstore" not found, requeuing
W0521 13:50:53.491520 7 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0521 13:56:26.493072 7 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget

Version of all relevant components (if applicable):
OCP: 4.8.0-0.nightly-2021-05-19-123944
OCS: ocs-operator.v4.8.0-399.ci
External RHCS: ceph version 14.2.11-146.el8cp (c5c2c77b05b124fcbbe81df2cd4b3739215f88ad) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, deployment failure

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
Yes. Deployment passes on OCS 4.7 with the same external RHCS cluster.

Steps to Reproduce:
1. Deploy an OCP cluster
2. Install OCS 4.8 External mode
3. Check the storagecluster status

Actual results:
Deployment failure due to the storagecluster being stuck in Progressing state

Expected results:
Deployment should succeed and the storagecluster should not get stuck in Progressing
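For reference, the Pending OBC in observation 5 can be reproduced with a manifest along these lines (the names `obc2` and the StorageClass are from this cluster; the spec itself is a sketch of a standard ObjectBucketClaim, not copied from the test run):

```yaml
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: obc2
  namespace: openshift-storage
spec:
  # Rook generates a unique bucket name from this prefix
  generateBucketName: obc2
  # RGW-backed StorageClass created by the external-mode deployment
  storageClassName: ocs-external-storagecluster-ceph-rgw
```

Because the `ocs-external-storagecluster-cephobjectstore` CR is missing, the provisioner cannot resolve the object store behind this StorageClass and the claim stays Pending, as shown in the rook-ceph-operator log above.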
Somehow the CephObjectStore CR was not injected by the OCS operator, so moving this to ocs-op for further investigation. For external mode, it is expected (and has been the case since 4.7) that a CephObjectStore needs to be created.
Arun, PTAL since you were the last one working on this code :) Thanks
Need to delete the cluster now because of 4.6.5 and 4.7.1 testing. If you need a running cluster to reproduce this, please let us know.
We should definitely at least look into this, even if it turns out it's not a problem that requires code changes. Giving devel_ack+.
The `CephObjectStore` spec has a `GatewaySpec` object, and the gateway spec expects its `Instances` field to be at least ONE. PR raised: https://github.com/openshift/ocs-operator/pull/1209 Jose, please take a look...
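To illustrate the constraint described above, here is a minimal sketch of the external-mode CephObjectStore CR with a valid gateway spec. The CR name is taken from the rook-ceph-operator error log; the `port` value is illustrative and the fields are not copied from the actual PR:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephObjectStore
metadata:
  name: ocs-external-storagecluster-cephobjectstore
  namespace: openshift-storage
spec:
  gateway:
    # Rook's validation requires at least one gateway instance,
    # even when the RGW daemons run in the external RHCS cluster
    instances: 1
    port: 80
```

With `instances` left at zero, the CR fails validation and is never created, which matches the "cephObjectStore not found" error seen during OBC provisioning.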
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:3003