Description of problem (please be as detailed as possible and provide log snippets):
The OSD pod is stuck in Init:0/9 after performing the TestNodeReplacement proactive test case.

Version of all relevant components (if applicable):
OCS operator: 4.11.0-131
Cluster version: 4.11.0-0.nightly-2022-08-04-081314
Ceph version: 16.2.8-84.el8cp (c2980f2fd700e979d41b4bad2939bb90f0fe435c) pacific (stable)

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy a KMS Vault v2 cluster
2. Run IO
3. Perform the node replacement test case

Actual results:
```
Events:
  Type     Reason              Age                 From                     Message
  ----     ------              ----                ----                     -------
  Warning  FailedScheduling    50m (x4 over 50m)   default-scheduler        0/6 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 1 node(s) were unschedulable, 3 Insufficient cpu, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector. preemption: 0/6 nodes are available: 6 Preemption is not helpful for scheduling.
  Warning  FailedScheduling    44m (x3 over 45m)   default-scheduler        0/5 nodes are available: 3 Insufficient cpu, 3 node(s) didn't match pod topology spread constraints (missing required label), 3 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 5 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 5 Preemption is not helpful for scheduling.
  Normal   Scheduled           43m                 default-scheduler        Successfully assigned openshift-storage/rook-ceph-osd-0-6cc5ff4c5c-8vt7g to compute-3 by control-plane-1
  Warning  FailedMount         41m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[run-udev ocs-deviceset-1-data-0k5qzf-bridge kube-api-access-tnm5p dev-mapper osd-encryption-key vault rook-ceph-log ocs-deviceset-1-data-0k5qzf rook-config-override rook-ceph-crash rook-data]: timed out waiting for the condition
  Warning  FailedMount         38m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[rook-ceph-crash rook-data rook-config-override run-udev ocs-deviceset-1-data-0k5qzf-bridge vault rook-ceph-log kube-api-access-tnm5p osd-encryption-key dev-mapper ocs-deviceset-1-data-0k5qzf]: timed out waiting for the condition
  Warning  FailedMount         36m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[dev-mapper rook-config-override osd-encryption-key ocs-deviceset-1-data-0k5qzf-bridge kube-api-access-tnm5p ocs-deviceset-1-data-0k5qzf rook-data rook-ceph-log rook-ceph-crash vault run-udev]: timed out waiting for the condition
  Warning  FailedMount         33m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[ocs-deviceset-1-data-0k5qzf-bridge kube-api-access-tnm5p rook-ceph-log run-udev vault rook-config-override rook-ceph-crash osd-encryption-key dev-mapper rook-data ocs-deviceset-1-data-0k5qzf]: timed out waiting for the condition
  Warning  FailedMount         31m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[ocs-deviceset-1-data-0k5qzf kube-api-access-tnm5p dev-mapper rook-config-override vault rook-data run-udev ocs-deviceset-1-data-0k5qzf-bridge rook-ceph-log osd-encryption-key rook-ceph-crash]: timed out waiting for the condition
  Warning  FailedMount         29m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[vault ocs-deviceset-1-data-0k5qzf rook-ceph-crash run-udev kube-api-access-tnm5p rook-ceph-log ocs-deviceset-1-data-0k5qzf-bridge rook-data rook-config-override osd-encryption-key dev-mapper]: timed out waiting for the condition
  Warning  FailedMount         26m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[ocs-deviceset-1-data-0k5qzf-bridge rook-data ocs-deviceset-1-data-0k5qzf kube-api-access-tnm5p vault rook-ceph-crash run-udev rook-ceph-log rook-config-override osd-encryption-key dev-mapper]: timed out waiting for the condition
  Warning  FailedMount         24m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[dev-mapper kube-api-access-tnm5p ocs-deviceset-1-data-0k5qzf rook-ceph-log vault osd-encryption-key rook-data rook-config-override rook-ceph-crash run-udev ocs-deviceset-1-data-0k5qzf-bridge]: timed out waiting for the condition
  Warning  FailedMount         21m                 kubelet                  Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[rook-ceph-crash kube-api-access-tnm5p rook-data rook-config-override ocs-deviceset-1-data-0k5qzf-bridge rook-ceph-log run-udev osd-encryption-key vault dev-mapper ocs-deviceset-1-data-0k5qzf]: timed out waiting for the condition
  Warning  FailedAttachVolume  75s (x22 over 43m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-5dfe54c2-1c36-4bf2-bbce-d18b4f615af6" : Failed to add disk 'scsi0:2'.
  Warning  FailedMount         66s (x9 over 19m)   kubelet                  (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-1-data-0k5qzf], unattached volumes=[rook-data osd-encryption-key rook-config-override rook-ceph-log rook-ceph-crash run-udev vault ocs-deviceset-1-data-0k5qzf-bridge kube-api-access-tnm5p dev-mapper ocs-deviceset-1-data-0k5qzf]: timed out waiting for the condition
```

Expected results:
The pod should be in a Running state.

Additional info:
Jenkins job: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/4969/consoleFull
must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-015vukv21cs33-t4a/j-015vukv21cs33-t4a_20220808T045215/
Jenkins job rerun: https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster-prod/4998/
must-gather: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/j-016vukv21cs33-t4a/j-016vukv21cs33-t4a_20220809T051710/
```
Warning  FailedAttachVolume  75s (x22 over 43m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-5dfe54c2-1c36-4bf2-bbce-d18b4f615af6" : Failed to add disk 'scsi0:2'.
AttachVolume.Attach failed for volume "pvc-5dfe54c2-1c36-4bf2-bbce-d18b4f615af6" : Failed to add disk 'scsi0:2'.
```

Prima facie, this looks like an environment issue related to the shifting of pods onto a new node.

This is the closest resolution I could find for this error: https://access.redhat.com/solutions/5917391
(In reply to Santosh Pillai from comment #3)
> ```
> Warning  FailedAttachVolume  75s (x22 over 43m)  attachdetach-controller
> AttachVolume.Attach failed for volume
> "pvc-5dfe54c2-1c36-4bf2-bbce-d18b4f615af6" : Failed to add disk 'scsi0:2'.
> AttachVolume.Attach failed for volume
> "pvc-5dfe54c2-1c36-4bf2-bbce-d18b4f615af6" : Failed to add disk 'scsi0:2'.
> ```
>
> Prima facie, this looks like an environment issue related to the shifting of
> pods onto a new node.
>
> This is the closest resolution I could find for this error:
> https://access.redhat.com/solutions/5917391

Pratik?