Description of problem (please be as detailed as possible and provide log snippets):
When deploying a cluster with the configuration AWS 1 AZ, OCP 4.7, OCS 4.6, the ocs-operator is stuck with status "Installing" and the noobaa pod "noobaa-db-0" is in a "Pending" state.

Version of all relevant components (if applicable):

OCP version:
Client Version: 4.6.0-0.nightly-2020-12-08-021151
Server Version: 4.7.0-0.nightly-2020-12-21-131655
Kubernetes Version: v1.20.0+87544c5

OCS version:
ocs-operator.v4.6.0   OpenShift Container Storage   4.6.0   Installing

cluster version:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-12-21-131655   True        False         107m    Cluster version is 4.7.0-0.nightly-2020-12-21-131655

Rook version:
rook: 4.6-80.1ae5ac6a.release_4.6
go: go1.15.2

Ceph version:
ceph version 14.2.8-115.el8cp (183dfafff0de1f79fccc983d82e733fedc0e988b) nautilus (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, we can't install the OCS operator successfully.

Is there any workaround available to the best of your knowledge?
No.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:
It may be a regression, because I didn't notice this issue on OCP 4.6.

Steps to Reproduce:
1. Install an OCP 4.7 cluster on AWS with 1 AZ (this issue may also be reproducible on VMware).
2. Install OCS 4.6 from OperatorHub.
3. Create the storage cluster.

Actual results:
The noobaa-db-0 pod is in status "Pending", and the ocs-operator is stuck in the "Installing" phase.

Expected results:
All the pods should be in status "Running", and the ocs-operator should be in the "Succeeded" phase.
Additional info:
Jenkins job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/15865/
Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/logs-20201224-200538/
In the must gather it looks like noobaa-db-0 is pending, waiting for a PV.

From the noobaa-db-0 pod:

status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-12-24T14:32:08Z"
    message: '0/6 nodes are available: 6 pod has unbound immediate PersistentVolumeClaims.'
    reason: Unschedulable

From describe pvc:

Name:          db-noobaa-db-0
Namespace:     openshift-storage
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Pending
Volume:
Labels:        app=noobaa
               noobaa-db=noobaa
Annotations:   volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Mounted By:    noobaa-db-0
Events:
  Type     Reason                Age                      From                                                                                                               Message
  ----     ------                ----                     ----                                                                                                               -------
  Warning  ProvisioningFailed    177m (x10 over 3h30m)    openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-b46dd5c7-bmqqp_520fb3a8-db03-49a6-9f34-8636f580900e   failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 already exists
  Normal   ExternalProvisioning  3m38s (x863 over 3h33m)  persistentvolume-controller                                                                                        waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          2m51s (x51 over 3h33m)   openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-b46dd5c7-bmqqp_520fb3a8-db03-49a6-9f34-8636f580900e   External provisioner is provisioning volume for claim "openshift-storage/db-noobaa-db-0"
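For reference, the same information can be re-checked on a live cluster with standard oc commands; the namespace and resource names below are the ones from this report's environment:

$ oc -n openshift-storage get pod noobaa-db-0 -o yaml          # pod conditions / scheduling message
$ oc -n openshift-storage describe pvc db-noobaa-db-0          # PVC status and provisioning events
$ oc -n openshift-storage get events --sort-by=.lastTimestamp | grep db-noobaa-db-0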
Niels, PTAL
Looks like this is happening because cephblockpool is not yet created:

csi.storage.k8s.io/node-stage-secret-name:rook-csi-rbd-node csi.storage.k8s.io/node-stage-secret-namespace:openshift-storage csi.storage.k8s.io/provisioner-secret-name:rook-csi-rbd-provisioner csi.storage.k8s.io/provisioner-secret-namespace:openshift-storage imageFeatures:layering imageFormat:2 pool:ocs-storagecluster-cephblockpool] Secrets:map[] VolumeContentSource:<nil> AccessibilityRequirements:<nil> XXX_NoUnkeyedLiteral:{} XXX_unrecognized:[] XXX_sizecache:0}
2020-12-24T14:32:08.690567060Z I1224 14:32:08.690541 1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
2020-12-24T14:32:08.690849103Z I1224 14:32:08.690556 1 connection.go:183] GRPC request: {"capacity_range":{"required_bytes":53687091200},"name":"pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2","parameters":{"clusterID":"openshift-storage","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
2020-12-24T14:32:08.756186544Z I1224 14:32:08.756093 1 connection.go:185] GRPC response: {}
2020-12-24T14:32:08.756186544Z I1224 14:32:08.756172 1 connection.go:186] GRPC error: rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
2020-12-24T14:32:08.756210761Z I1224 14:32:08.756197 1 controller.go:642] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Finished: rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
2020-12-24T14:32:08.756245976Z I1224 14:32:08.756231 1 controller.go:1084] Final error received, removing PVC 135dd866-4826-4aff-b4f2-d90bd4d327f2 from claims in progress
2020-12-24T14:32:08.756253895Z W1224 14:32:08.756243 1 controller.go:943] Retrying syncing claim "135dd866-4826-4aff-b4f2-d90bd4d327f2", failure 0
2020-12-24T14:32:08.756274547Z E1224 14:32:08.756261 1 controller.go:966] error syncing claim "135dd866-4826-4aff-b4f2-d90bd4d327f2": failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
2020-12-24T14:32:08.756302696Z I1224 14:32:08.756291 1 controller.go:1317] provision "openshift-storage/db-noobaa-db-0" class "ocs-storagecluster-ceph-rbd": started
2020-12-24T14:32:08.756363375Z I1224 14:32:08.756330 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-storage", Name:"db-noobaa-db-0", UID:"135dd866-4826-4aff-b4f2-d90bd4d327f2", APIVersion:"v1", ResourceVersion:"49156", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster

Moving it to rook
Rook took about 4 minutes to initialize the cluster, then created the pool [1]:

2020-12-24T14:32:16.585089490Z 2020-12-24 14:32:16.585050 I | ceph-block-pool-controller: creating pool "ocs-storagecluster-cephblockpool" in namespace "openshift-storage"

This timestamp is about 10s after the CSI driver logged above that it could not find the pool. After that, the CSI provisioner seems to be stuck in a loop indicating that the volume already exists [2]:

2020-12-24T14:35:41.314353576Z I1224 14:35:41.314337 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-storage", Name:"db-noobaa-db-0", UID:"135dd866-4826-4aff-b4f2-d90bd4d327f2", APIVersion:"v1", ResourceVersion:"49156", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Aborted desc = an operation with the given Volume ID pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 already exists

CSI team, can you take another look at this?

[1] http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/logs-20201224-200538/ocs-must-gather-us/quay-io-ocs-dev-ocs-must-gather-sha256-fbcaa53e653a3bb4c281dbb3e98706cfe9ee5f8effb1d177cbf6a694555c4fe5/namespaces/openshift-storage/pods/rook-ceph-operator-868c759664-w4k57/rook-ceph-operator/rook-ceph-operator/logs/current.log
[2] http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/logs-20201224-200538/ocs-must-gather-us/quay-io-ocs-dev-ocs-must-gather-sha256-fbcaa53e653a3bb4c281dbb3e98706cfe9ee5f8effb1d177cbf6a694555c4fe5/namespaces/openshift-storage/pods/csi-rbdplugin-provisioner-b46dd5c7-bmqqp/csi-provisioner/csi-provisioner/logs/current.log
Thanks Travis for looking.

>> Yeah, initially it was failing because of the unavailability of the pool:

2020-12-24T14:32:15.288395183Z I1224 14:32:15.288384 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-storage", Name:"db-noobaa-db-0", UID:"135dd866-4826-4aff-b4f2-d90bd4d327f2", APIVersion:"v1", ResourceVersion:"49156", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "ocs-storagecluster-ceph-rbd": rpc error: code = Internal desc = pool not found: pool (ocs-storagecluster-cephblockpool) not found in Ceph cluster
2020-12-24T14:32:23.288535724Z I1224 14:32:23.288497 1 controller.go:1317] provision "openshift-storage/db-noobaa-db-0" class "ocs-storagecluster-ceph-rbd": started
2020-12-24T14:32:23.288654535Z I1224 14:32:23.288621 1 event.go:282] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"openshift-storage", Name:"db-noobaa-db-0", UID:"135dd866-4826-4aff-b4f2-d90bd4d327f2", APIVersion:"v1", ResourceVersion:"49156", FieldPath:""}): type: 'Normal' reason: 'Provisioning' External provisioner is provisioning volume for claim "openshift-storage/db-noobaa-db-0"

>> Once the pool got created, the same request failed with DeadlineExceeded error:

2020-12-24T14:32:23.297253984Z I1224 14:32:23.297236 1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
2020-12-24T14:32:23.297335778Z I1224 14:32:23.297248 1 connection.go:183] GRPC request: {"capacity_range":{"required_bytes":53687091200},"name":"pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2","parameters":{"clusterID":"openshift-storage","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
2020-12-24T14:34:53.297438383Z I1224 14:34:53.297381 1 connection.go:185] GRPC response: {}
2020-12-24T14:34:53.297438383Z I1224 14:34:53.297421 1 connection.go:186] GRPC error: rpc error: code = DeadlineExceeded desc = context deadline exceeded
2020-12-24T14:34:53.297484023Z I1224 14:34:53.297445 1 controller.go:642] CreateVolume failed, supports topology = false, node selected false => may reschedule = false => state = Background: rpc error: code = DeadlineExceeded desc = context deadline exceeded

>> Ultimately, it went into an endless loop saying another operation is in progress for the same volume:

2020-12-24T14:35:09.303817190Z I1224 14:35:09.303799 1 connection.go:182] GRPC call: /csi.v1.Controller/CreateVolume
2020-12-24T14:35:09.303906196Z I1224 14:35:09.303812 1 connection.go:183] GRPC request: {"capacity_range":{"required_bytes":53687091200},"name":"pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2","parameters":{"clusterID":"openshift-storage","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
2020-12-24T14:35:09.305811989Z I1224 14:35:09.305783 1 connection.go:185] GRPC response: {}
2020-12-24T14:35:09.305822036Z I1224 14:35:09.305807 1 connection.go:186] GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 already exists
Looks like this was hit only once, reducing the severity and removing the blocker flag. We might need to reproduce it to see what process was actually running on the volume.
Debugging this might require getting the goroutines from the provisioner to see if something is stuck. That requires https://github.com/ceph/ceph-csi/issues/1699 (not yet available).
I seem to have hit the same issue deploying an internal OCS 4.6.1 cluster running in OCP 4.6.12 on VMware (VMware ESXi, 6.7.0, 15160138)
From the logs mentioned in comment #2, we can confirm that the first CreateVolume request after the pool was created did not return, and is indeed still in progress: "ID: 26" in http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/logs-20201224-200538/ocs-must-gather-us/quay-io-ocs-dev-ocs-must-gather-sha256-fbcaa53e653a3bb4c281dbb3e98706cfe9ee5f8effb1d177cbf6a694555c4fe5/namespaces/openshift-storage/pods/csi-rbdplugin-provisioner-b46dd5c7-bmqqp/csi-rbdplugin/csi-rbdplugin/logs/current.log

2020-12-24T14:32:23.297624597Z I1224 14:32:23.297583 1 utils.go:159] ID: 26 Req-ID: pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 GRPC call: /csi.v1.Controller/CreateVolume
2020-12-24T14:32:23.298876783Z I1224 14:32:23.298860 1 utils.go:160] ID: 26 Req-ID: pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 GRPC request: {"capacity_range":{"required_bytes":53687091200},"name":"pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2","parameters":{"clusterID":"openshift-storage","imageFeatures":"layering","imageFormat":"2","pool":"ocs-storagecluster-cephblockpool"},"secrets":"***stripped***","volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"ext4"}},"access_mode":{"mode":1}}]}
2020-12-24T14:32:23.299017801Z I1224 14:32:23.299004 1 rbd_util.go:771] ID: 26 Req-ID: pvc-135dd866-4826-4aff-b4f2-d90bd4d327f2 setting disableInUseChecks on rbd volume to: false

It is not possible to identify where exactly, and how, this request is stuck. It is possibly blocking on some operation in the Ceph libraries (through go-ceph). The ability to get a stack of the running goroutines would help with finding the place where the request is blocked. I do not think there is any further analysis that can be done with the currently available data.

A workaround could be to only start deploying NooBaa once the CephBlockPool is successfully created.
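As a rough sketch of that workaround (the pool name is the default from this environment, and the exact status fields of the CephBlockPool resource may differ between Rook/OCS versions), pool readiness could be checked before letting the NooBaa deployment proceed:

$ oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool
$ # wait until the pool reports Ready before the noobaa-db PVC is created:
$ until [ "$(oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool -o jsonpath='{.status.phase}')" = "Ready" ]; do sleep 10; done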
Right now, we don't have enough data to debug this issue. As Niels mentioned in the previous comment, enabling golang profiling will help us debug such issues, so I am moving this to the next release, where hopefully we will have profiling enabled. That will not help in debugging this occurrence, but it will help with similar issues in the future. Not closing it so that we don't lose track of it.
I also came across this issue twice while deploying OCS 4.6.3 on 2 clusters running OCP 4.6.19, a UPI bare metal install on VMware ESXi, 6.5.0, 15256549. On a 3rd OCP setup the installation of OCS went through without any issues. The environment and versions were exactly the same on all installs.
I am currently experiencing this issue with an OCP 4.7.2 "Compact Cluster" configuration on 3 bare metal machines:
CPU: i7-8750H
RAM: 64 GB
Storage: 1x 256 GB NVMe for boot, 1x 2 TB 2.5" Seagate HDD
NIC: 1x 10GbE Mellanox ConnectX-3
I'm also experiencing this bug on Red Hat Virtualization 4.4.3.
Client Version: 4.7.5
Server Version: 4.7.8
Kubernetes Version: v1.20.0+7d0a2b2
I upgraded my cluster from 4.7.5 to 4.7.8 and then installed OCS 4.6 using the OperatorHub.
As it seems that the provisioning is stuck somewhere in the Ceph libraries (our current best guess), we may be able to add a workaround in the ocs-operator, which takes care of the deployment.

The ocs-operator creates the CephCluster, StorageClass(es) and related objects. It currently waits for the CephCluster to become healthy before continuing with the next steps. It would be cleaner (and would probably prevent the issue in this BZ) if the ocs-operator created the CephBlockPool (and CephFilesystem), waited for their completion, and only then created the StorageClasses. This would result in Ceph-CSI getting requests only once the CephBlockPool is ready to be used.

Umanga, does the above seem correct? Should I create a separate BZ for the ocs-operator to introduce the additional status checks before creating the StorageClasses?
(In reply to Niels de Vos from comment #21)
> As it seems that the provisioning is stuck somewhere in the Ceph libraries
> (our current best guess), we may be able to add a workaround in the
> ocs-operator, which takes care of the deployment.
>
> The ocs-operator creates the CephCluster, StorageClass(es) and related
> objects. It currently waits for the CephCluster to become healthy before
> continuing with the next steps. It would be cleaner (and would probably
> prevent the issue in this BZ) if the ocs-operator created the CephBlockPool
> (and CephFilesystem), waited for their completion, and only then created
> the StorageClasses. This would result in Ceph-CSI getting requests only
> once the CephBlockPool is ready to be used.
>
> Umanga, does the above seem correct? Should I create a separate BZ for the
> ocs-operator to introduce the additional status checks before creating the
> StorageClasses?

Yes, you should. Or this BZ is also fine if that is the only fix required.
Is there a workaround that can be applied on a cluster once a customer hits this?
I think the below steps can help to recover (see the example commands after this list):
* Make sure the Ceph cluster is healthy.
* Confirm that the pool creation has succeeded.
* Restart the rbd provisioner pod.
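A minimal sketch of that recovery flow, assuming the default openshift-storage namespace and the resource names used in this report (adjust for your environment):

$ oc -n openshift-storage get cephcluster                                        # HEALTH should report HEALTH_OK
$ oc -n openshift-storage get cephblockpool ocs-storagecluster-cephblockpool     # the pool must exist
$ oc -n openshift-storage delete pod -l app=csi-rbdplugin-provisioner            # restart the RBD provisioner pods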
Umanga, are we using this bug for fix? If so, can you reassign and set the component correctly?
I expect that this can be solved (prevented in the future) by a change in the ocs-operator. By creating the StorageClass(es) only after the CephBlockPool and CephFilesystem are created and ready, there should not be any problems/hangs during PVC creation.
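For illustration, the ordering that such a change should guarantee can be checked manually today; the resource names below are the defaults created by the ocs-operator in this report's environment:

$ oc -n openshift-storage get cephblockpool,cephfilesystem                       # should be created and ready
$ oc get storageclass ocs-storagecluster-ceph-rbd ocs-storagecluster-cephfs      # should only be consumed once the above are ready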
Since we've hit this more than once, it would be good to finally get this resolved. Giving devel_ack+.
Based on today's bug triage and original bug description, providing qa ack.
This has existed since 4.6 and we don't have a fix ready yet. Moving it out of 4.8; it can be taken in 4.8.z if required.
I have a client with this problem on AWS. (running 4.6.32 I think) There is a description up above that says a potential work-around is to "Restart the rbd provisioner pod". Can you tell me which NS and the exact name of the pod(s) that need restarting? csi-rbdplugin-provisioner-xxxxx.xxxxx and rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-xxxxx in openshift-storage. Is it one of these?
(In reply to Stu Lipshires from comment #37)
> I have a client with this problem on AWS. (running 4.6.32 I think) There is
> a description up above that says a potential work-around is to "Restart the
> rbd provisioner pod". Can you tell me which NS and the exact name of the
> pod(s) that need restarting? csi-rbdplugin-provisioner-xxxxx.xxxxx and
> rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-xxxxx in
> openshift-storage. Is it one of these?

The provisioner pods are in the openshift-storage namespace and have the label app=csi-rbdplugin-provisioner. You can restart them by running this command:

$ oc -n openshift-storage delete pod -l app=csi-rbdplugin-provisioner
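After the restart, the replacement provisioner pods and the stuck PVC can be checked with the commands below (the PVC name is the one reported earlier in this bug; yours may differ):

$ oc -n openshift-storage get pods -l app=csi-rbdplugin-provisioner    # new pods should reach Running
$ oc -n openshift-storage get pvc db-noobaa-db-0                       # should become Bound once provisioning is retried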
*** Bug 1931875 has been marked as a duplicate of this bug. ***
Please test with the latest build.
I tried to test the bug again.

Steps I did to reproduce:
1. I deployed an AWS cluster with 1 AZ, OCP 4.9, without OCS.
2. Installed OCS 4.9, the latest build, from the UI.
3. Created the storage cluster.

After all the steps above, the OCS pods were running, and the ocs-operator was in the "Succeeded" phase.

Link to the Jenkins job: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/5909/

Versions of all components:

OCP version:
Client Version: 4.8.0-0.nightly-2021-09-11-042202
Server Version: 4.9.0-0.nightly-2021-09-10-170926
Kubernetes Version: v1.22.0-rc.0+75ee307

OCS version:
ocs-operator.v4.9.0-132.ci   OpenShift Container Storage   4.9.0-132.ci   Succeeded

cluster version:
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-10-170926   True        False         102m    Cluster version is 4.9.0-0.nightly-2021-09-10-170926

Rook version:
rook: 4.9-187.c4c788f.release_4.9
go: go1.16.6

Ceph version:
ceph version 16.2.0-117.el8cp (0e34bb74700060ebfaa22d99b7d2cdc037b28a57) pacific (stable)
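For reference, a typical way to collect the version information above (the rook version command assumes the default rook-ceph-operator deployment name in the openshift-storage namespace):

$ oc version
$ oc get clusterversion
$ oc -n openshift-storage get csv
$ oc -n openshift-storage exec deploy/rook-ceph-operator -- rook version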
Can I move the bug to Verified according to the results above?
lgtm
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:5086