Description of problem (please be as detailed as possible and provide log snippets):

OCS 4.2 does not provision volumes. A partner installed OCS 4.2 on OCP 4.2 on KVM/libvirt with spinning disks underneath. OCS sits on top of PVs provisioned by the Local Storage Operator, using qemu images attached as disks to the VMs.

Version of all relevant components (if applicable):

$ oc version
Client Version: openshift-clients-4.2
Server Version: 4.2.20
Kubernetes Version: v1.14.6+999bb21

$ oc get csv -n openshift-storage
NAME                  DISPLAY                       VERSION   REPLACES   PHASE
ocs-operator.v4.2.2   OpenShift Container Storage   4.2.2                Installing

$ oc get csv -n local-storage
NAME                                         DISPLAY         VERSION               REPLACES   PHASE
local-storage-operator.4.2.22-202003020552   Local Storage   4.2.22-202003020552              Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes.

Is there any workaround available to the best of your knowledge?
Using LSO directly, without OCS on top.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Can this issue be reproduced?
Not sure; the installation has not been repeated yet.

Can this issue be reproduced from the UI?
Mostly performed from the CLI.

If this is a regression, please provide more details to justify this:
Probably not.

Steps to Reproduce:
1. Deploy OCP 4.2 on KVM/libvirt with spinning drives underneath
2. Deploy LSO with enough volumes
3. Deploy OCS 4.2 on top of the LSO PVs

Actual results:
Installation does not finish because NooBaa's PV cannot be attached:

  Warning  FailedMount  16s (x8 over 2m25s)  kubelet, pvx180.wdf.sap.corp  MountVolume.MountDevice failed for volume "pvc-14e97ec0-685b-11ea-b5ce-52540017001e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-213a69c7-685b-11ea-9cc1-0a580a820214 already exists

Subsequent attempts to provision volumes using OCS storage classes end up as pending PVs.
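The "Aborted ... already exists" message is the error ceph-csi returns when its in-flight-operation lock for a volume is still held, i.e. an earlier NodeStage on the same volume (likely the one that ended in DeadlineExceeded) never completed, so retries are rejected. A minimal sketch, using the event text above as sample input, of pulling the volume ID out of the message so it can be matched against the VolumeHandle field in `oc describe pv` output:

```shell
# Sample input: the kubelet event message quoted above (not live cluster output).
msg='MountVolume.MountDevice failed for volume "pvc-14e97ec0-685b-11ea-b5ce-52540017001e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-213a69c7-685b-11ea-9cc1-0a580a820214 already exists'

# Extract the CSI volume ID (the token following "Volume ID").
volid=$(echo "$msg" | grep -o 'Volume ID [^ ]*' | awk '{print $3}')
echo "$volid"
```

The extracted ID can then be grepped for in `oc get pv -o yaml` or in the csi-rbdplugin pod logs to find the stuck operation.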
Expected results:
OCS is usable.

Additional info:
The following guide was followed to deploy LSO and OCS:
https://blog.openshift.com/ocs-4-2-in-ocp-4-2-14-upi-installation-in-rhv/

$ oc rsh -n openshift-storage $TOOLS_POD ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 42 pgs inactive; Degraded data redundancy: 62 pgs undersized
MDS_SLOW_METADATA_IO 1 MDSs report slow metadata IOs
    mdsocs-storagecluster-cephfilesystem-a(mds.0): 14 slow metadata IOs are blocked > 30 secs, oldest blocked for 600793 secs
PG_AVAILABILITY Reduced data availability: 42 pgs inactive
    pg 1.3 is stuck inactive for 600826.571739, current state undersized+peered, last acting [1]
    pg 2.4 is stuck inactive for 600824.554328, current state undersized+peered, last acting [1]
    pg 2.6 is stuck inactive for 600824.554328, current state undersized+peered, last acting [1]
    pg 3.1 is stuck inactive for 600821.758956, current state undersized+peered, last acting [1]
    pg 3.4 is stuck inactive for 600821.758956, current state undersized+peered, last acting [1]
    pg 3.6 is stuck inactive for 600821.758956, current state undersized+peered, last acting [1]
    pg 3.7 is stuck inactive for 600821.758956, current state undersized+peered, last acting [1]
    pg 4.0 is stuck inactive for 600816.054681, current state undersized+peered, last acting [1]
    ...
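Every stuck PG in the output above reports "last acting [1]": only a single OSD ever joined the acting set, so the (presumably replica-3) pools can never reach an active+clean state. A minimal sketch, using lines from the excerpt above as sample input, that condenses such a dump to the distinct acting sets (real triage on the cluster would use `ceph pg dump_stuck` and `ceph osd tree` from the toolbox pod):

```shell
# Sample input: three pg lines copied from the 'ceph health detail' excerpt above.
acting=$(printf '%s\n' \
  'pg 1.3 is stuck inactive for 600826.571739, current state undersized+peered, last acting [1]' \
  'pg 2.4 is stuck inactive for 600824.554328, current state undersized+peered, last acting [1]' \
  'pg 4.0 is stuck inactive for 600816.054681, current state undersized+peered, last acting [1]' |
  grep -o 'last acting \[[0-9,]*\]' | sort -u)

# A single distinct acting set containing one OSD means the other OSDs
# never came up (consistent with very slow qemu-backed spinning disks).
echo "$acting"
```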
$ oc describe pod noobaa-core-0 -n openshift-storage
<snip>
Events:
  Type     Reason                  Age                   From                          Message
  ----     ------                  ----                  ----                          -------
  Warning  FailedScheduling        4m43s (x5 over 5m7s)  default-scheduler             pod has unbound immediate PersistentVolumeClaims (repeated 4 times)
  Normal   Scheduled               4m42s                 default-scheduler             Successfully assigned openshift-storage/noobaa-core-0 to pvx180.wdf.sap.corp
  Normal   SuccessfulAttachVolume  4m42s                 attachdetach-controller       AttachVolume.Attach succeeded for volume "pvc-14e97ec0-685b-11ea-b5ce-52540017001e"
  Warning  FailedMount             2m26s                 kubelet, pvx180.wdf.sap.corp  MountVolume.MountDevice failed for volume "pvc-14e97ec0-685b-11ea-b5ce-52540017001e" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             25s (x2 over 2m39s)   kubelet, pvx180.wdf.sap.corp  Unable to mount volumes for pod "noobaa-core-0_openshift-storage(15139cdb-685b-11ea-b5ce-52540017001e)": timeout expired waiting for volumes to attach or mount for pod "openshift-storage"/"noobaa-core-0". list of unmounted volumes=[db]. list of unattached volumes=[db logs mgmt-secret s3-secret noobaa-token-gzql6]
  Warning  FailedMount             16s (x8 over 2m25s)   kubelet, pvx180.wdf.sap.corp  MountVolume.MountDevice failed for volume "pvc-14e97ec0-685b-11ea-b5ce-52540017001e" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0011-openshift-storage-0000000000000001-213a69c7-685b-11ea-9cc1-0a580a820214 already exists

$ oc describe pvc db-noobaa-core-0
Name:          db-noobaa-core-0
Namespace:     openshift-storage
StorageClass:  ocs-storagecluster-ceph-rbd
Status:        Bound
Volume:        pvc-8b6a17cb-6863-11ea-b5ce-52540017001e
Labels:        noobaa-core=noobaa
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-provisioner: openshift-storage.rbd.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      50Gi
Access Modes:  RWO
VolumeMode:    Filesystem
Events:
  Type    Reason                 Age  From                                                                                                                Message
  ----    ------                 ---- ----                                                                                                                -------
  Normal  ExternalProvisioning   15m  persistentvolume-controller                                                                                         waiting for a volume to be created, either by external provisioner "openshift-storage.rbd.csi.ceph.com" or manually created by system administrator
  Normal  Provisioning           15m  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-5dcdb49bb9-s254c_6b362ba7-685a-11ea-95f7-0a580a820214  External provisioner is provisioning volume for claim "openshift-storage/db-noobaa-core-0"
  Normal  ProvisioningSucceeded  14m  openshift-storage.rbd.csi.ceph.com_csi-rbdplugin-provisioner-5dcdb49bb9-s254c_6b362ba7-685a-11ea-95f7-0a580a820214  Successfully provisioned volume pvc-8b6a17cb-6863-11ea-b5ce-52540017001e
Mounted By:    noobaa-core-0

$ oc describe pv pvc-8b6a17cb-6863-11ea-b5ce-52540017001e
Name:            pvc-8b6a17cb-6863-11ea-b5ce-52540017001e
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: openshift-storage.rbd.csi.ceph.com
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    ocs-storagecluster-ceph-rbd
Status:          Bound
Claim:           openshift-storage/db-noobaa-core-0
Reclaim Policy:  Delete
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        50Gi
Node Affinity:   <none>
Message:
Source:
    Type:          CSI (a Container Storage Interface (CSI) volume source)
    Driver:        openshift-storage.rbd.csi.ceph.com
    VolumeHandle:  0001-0011-openshift-storage-0000000000000001-8ba71bf5-6863-11ea-9cc1-0a580a820214
    ReadOnly:      false
    VolumeAttributes:  clusterID=openshift-storage
                       imageFeatures=layering
                       imageFormat=2
                       pool=ocs-storagecluster-cephblockpool
                       storage.kubernetes.io/csiProvisionerIdentity=1584454814327-8081-openshift-storage.rbd.csi.ceph.com
Events:            <none>

$ oc describe sc ocs-storagecluster-ceph-rbd
Name:                  ocs-storagecluster-ceph-rbd
IsDefaultClass:        No
Annotations:           <none>
Provisioner:           openshift-storage.rbd.csi.ceph.com
Parameters:            clusterID=openshift-storage,csi.storage.k8s.io/fstype=ext4,csi.storage.k8s.io/node-stage-secret-name=rook-csi-rbd-node,csi.storage.k8s.io/node-stage-secret-namespace=openshift-storage,csi.storage.k8s.io/provisioner-secret-name=rook-csi-rbd-provisioner,csi.storage.k8s.io/provisioner-secret-namespace=openshift-storage,imageFeatures=layering,imageFormat=2,pool=ocs-storagecluster-cephblockpool
AllowVolumeExpansion:  <unset>
MountOptions:          <none>
ReclaimPolicy:         Delete
VolumeBindingMode:     Immediate
Events:                <none>

More information in the attachments.
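For mapping the PV back to a Ceph object: ceph-csi volume handles have the form <version>-<cluster-id-length>-<cluster-id>-<pool-id>-<image-uuid>, and the backing RBD image is named with the default "csi-vol-" prefix plus the UUID. A minimal sketch decoding the VolumeHandle shown above (it assumes the trailing UUID is a standard 5-field UUID, which holds for ceph-csi-provisioned volumes):

```shell
# VolumeHandle taken from the 'oc describe pv' output above.
handle="0001-0011-openshift-storage-0000000000000001-8ba71bf5-6863-11ea-9cc1-0a580a820214"

# The UUID is always the last 5 hyphen-separated fields; the pool ID is the
# field before it; the cluster ID is everything between field 3 and the pool ID.
cluster_id=$(echo "$handle" | awk -F- '{out=$3; for (i=4; i<=NF-6; i++) out=out "-" $i; print out}')
pool_id=$(echo "$handle" | awk -F- '{print $(NF-5)}')
image=$(echo "$handle" | awk -F- '{print "csi-vol-" $(NF-4) "-" $(NF-3) "-" $(NF-2) "-" $(NF-1) "-" $NF}')
echo "$cluster_id $pool_id $image"
```

The decoded image name can then be inspected from the toolbox pod, e.g. `rbd -p ocs-storagecluster-cephblockpool info csi-vol-8ba71bf5-6863-11ea-9cc1-0a580a820214`, to check for stale watchers or locks.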
Created attachment 1673137 [details] OCS container logs and ceph health
Created attachment 1673160 [details] Described storage related objects