Description of problem (please be as detailed as possible and provide log snippets):
OSD pods are not created in the latest OCS version (4.7.0-242.ci).

Version of all relevant components (if applicable):
openshift installer (4.7.0-0.nightly-2021-01-29-024753)
ocs-registry:4.7.0-242.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
Yes

Is there any workaround available to the best of your knowledge?
NA

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1/1

Can this issue be reproduced?
NA

Can this issue be reproduced from the UI?
Not tried

If this is a regression, please provide more details to justify this:
Yes

Steps to Reproduce:
1. Install OCS using ocs-ci.
2. Installation fails because the OSD pods are not created.

Actual results:
OSD pods are not created.

Expected results:
OSD pods should be running.

Additional info:
job: https://ocs4-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/qe-deploy-ocs-cluster/16815/console
must-gather logs: http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/vavuthua-lso/vavuthua-lso_20210112T143119/logs/failed_testcase_ocs_logs_1611905038/deployment_ocs_logs/
> pods list

$ oc get pods
NAME   READY   STATUS   RESTARTS   AGE
csi-cephfsplugin-fcsjh   3/3   Running   0   71m
csi-cephfsplugin-g84rg   3/3   Running   0   71m
csi-cephfsplugin-provisioner-6fdfcfdc9c-4fjjh   6/6   Running   0   71m
csi-cephfsplugin-provisioner-6fdfcfdc9c-826ks   6/6   Running   0   71m
csi-cephfsplugin-t7hxr   3/3   Running   0   71m
csi-rbdplugin-f5h5h   3/3   Running   0   71m
csi-rbdplugin-g5rx7   3/3   Running   0   71m
csi-rbdplugin-gjvrn   3/3   Running   0   71m
csi-rbdplugin-provisioner-69596c4bf8-7dwmz   6/6   Running   0   71m
csi-rbdplugin-provisioner-69596c4bf8-mp97b   6/6   Running   0   71m
must-gather-2x2w4-helper   1/1   Running   0   28m
must-gather-l8g9k-helper   1/1   Running   0   53m
noobaa-core-0   1/1   Running   0   68m
noobaa-db-pg-0   0/1   Pending   0   68m
noobaa-operator-6f4db5955b-w6n65   1/1   Running   0   71m
ocs-metrics-exporter-5b8cb75c94-btcvh   1/1   Running   0   71m
ocs-operator-6cb6dff494-4gzc9   1/1   Running   0   71m
rook-ceph-crashcollector-compute-0-7fdccc8889-fbfg7   1/1   Running   0   69m
rook-ceph-crashcollector-compute-1-f5d447b76-z5qkn   1/1   Running   0   70m
rook-ceph-crashcollector-compute-2-868fcfc97f-455dx   1/1   Running   0   69m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-76d9b79c5tw6p   2/2   Running   0   67m
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-76f7d6b67bdjm   2/2   Running   0   67m
rook-ceph-mgr-a-55cc97b766-jhv9l   2/2   Running   0   69m
rook-ceph-mon-a-69d864d7f8-7r4kv   2/2   Running   0   70m
rook-ceph-mon-b-5db5d5b599-qgqjj   2/2   Running   0   69m
rook-ceph-mon-c-7d7bc4cb9d-kzr5x   2/2   Running   0   69m
rook-ceph-operator-5fd7d877f-tc2qq   1/1   Running   0   71m
rook-ceph-osd-prepare-ocs-deviceset-0-data-0-zbf6n-mvh49   0/1   Completed   0   69m
rook-ceph-osd-prepare-ocs-deviceset-1-data-0-zql2z-8q9hx   0/1   Completed   0   69m
rook-ceph-osd-prepare-ocs-deviceset-2-data-0-f2rf5-4d9gf   0/1   Completed   0   69m
> pvc

$ oc get pvc
NAME   STATUS   VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
db-noobaa-db-pg-0   Pending   ocs-storagecluster-ceph-rbd   77m
ocs-deviceset-0-data-0-zbf6n   Bound   pvc-771f1c90-446e-48a0-9d99-5addbdf559b4   100Gi   RWO   thin   78m
ocs-deviceset-1-data-0-zql2z   Bound   pvc-36d044e8-82e4-418e-bf62-a97fb28674c5   100Gi   RWO   thin   78m
ocs-deviceset-2-data-0-f2rf5   Bound   pvc-a5d8ded1-c88d-4f8b-80c9-7014074878e1   100Gi   RWO   thin   78m
rook-ceph-mon-a   Bound   pvc-29933f95-a7f8-4d17-bb84-8b6e47b653b5   10Gi   RWO   thin   80m
rook-ceph-mon-b   Bound   pvc-e4bc6993-0c00-488d-9a1a-9bcbb5cccaf7   10Gi   RWO   thin   80m
rook-ceph-mon-c   Bound   pvc-3b480e9b-26c6-4df1-a10d-0ec2fe0a7d10   10Gi   RWO   thin   80m

> From the rook-ceph-operator-5fd7d877f-tc2qq log, it looks like the rook-ceph-osd-* Deployments are invalid:

2021-01-29 08:16:09.112448 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-1-data-0-zql2z", osd {0 ceph 9491a35d-8bb6-413d-a06b-c192a50e4d6a /mnt/ocs-deviceset-1-data-0-zql2z true root=default host=ocs-deviceset-1-data-0-zql2z rack=rack0 false raw bluestore}. Deployment.apps "rook-ceph-osd-0" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:18.403045 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is computingDiff
2021-01-29 08:16:18.423835 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is orchestrating
2021-01-29 08:16:19.219964 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is computingDiff
2021-01-29 08:16:19.243202 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is orchestrating
2021-01-29 08:16:22.635392 I | op-osd: osd orchestration status for node ocs-deviceset-2-data-0-f2rf5 is completed
2021-01-29 08:16:22.635406 I | op-osd: starting 1 osd daemons on pvc ocs-deviceset-2-data-0-f2rf5
2021-01-29 08:16:22.635411 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-2-data-0-f2rf5"
2021-01-29 08:16:22.635416 I | cephclient: getting or creating ceph auth key "osd.1"
2021-01-29 08:16:22.957492 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-2-data-0-f2rf5", osd {1 ceph b628e29d-abc1-4dc0-90b6-9e91dab2af8c /mnt/ocs-deviceset-2-data-0-f2rf5 true root=default host=ocs-deviceset-2-data-0-f2rf5 rack=rack1 false raw bluestore}. Deployment.apps "rook-ceph-osd-1" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:23.431758 I | op-osd: osd orchestration status for node ocs-deviceset-0-data-0-zbf6n is completed
2021-01-29 08:16:23.431775 I | op-osd: starting 1 osd daemons on pvc ocs-deviceset-0-data-0-zbf6n
2021-01-29 08:16:23.431780 I | op-osd: OSD will have its main bluestore block on "ocs-deviceset-0-data-0-zbf6n"
2021-01-29 08:16:23.431785 I | cephclient: getting or creating ceph auth key "osd.2"
2021-01-29 08:16:23.742978 W | op-osd: failed to create osd deployment for pvc "ocs-deviceset-0-data-0-zbf6n", osd {2 ceph 0632f858-89fe-4fc6-ab92-9074d5ef1fd5 /mnt/ocs-deviceset-0-data-0-zbf6n true root=default host=ocs-deviceset-0-data-0-zbf6n rack=rack2 false raw bluestore}. Deployment.apps "rook-ceph-osd-2" is invalid: spec.template.spec.securityContext.shareProcessNamespace: Invalid value: true: ShareProcessNamespace and HostPID cannot both be enabled
2021-01-29 08:16:23.748060 I | op-osd: 3/3 node(s) completed osd provisioning
2021-01-29 08:16:23.748091 I | op-osd: start provisioning the osds on nodes, if needed
2021-01-29 08:16:23.752693 I | op-osd: 0 of the 0 storage nodes are valid
2021-01-29 08:16:23.752752 W | op-osd: no valid nodes available to run osds on nodes in namespace openshift-storage
2021-01-29 08:16:23.752760 I | op-osd: start osds after provisioning is completed, if needed
2021-01-29 08:16:24.048234 I | op-osd: completed running osds in namespace openshift-storage
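For context, the rejected field combination boils down to the following pod-level settings (a minimal Go sketch using the k8s.io/api/core/v1 types, not actual Rook code; the toggle mentioned in the comment is hypothetical):

package osdspec

import (
	corev1 "k8s.io/api/core/v1"
)

// conflictingOSDPodSpec shows the combination the API server rejects: the OSD
// pod requests the host PID namespace, and the pod-level security context also
// asks for a shared process namespace. Kubernetes validation refuses a pod
// that enables both, which is the exact error in the operator log above.
func conflictingOSDPodSpec() corev1.PodSpec {
	share := true // hypothetical toggle, set when the log collector is enabled
	return corev1.PodSpec{
		HostPID: true, // already set on the OSD deployments in this cluster
		SecurityContext: &corev1.PodSecurityContext{
			ShareProcessNamespace: &share, // conflicts with HostPID above
		},
	}
}

Because kube-apiserver rejects any pod that sets both fields, the operator's Deployment create calls fail and no rook-ceph-osd-* pods ever appear, even though the osd-prepare jobs complete.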
Created attachment 1751984: rook-ceph-operator log
ocs-operator does not directly influence the OSD Deployments; those are created and managed by Rook-Ceph. As such, moving this BZ to the relevant component.
This looks related to enabling the log collector sidecar, which sets shareProcessNamespace on pods that also need hostPID, such as the OSD deployments. @Seb, can you take a look?
Merged downstream with https://github.com/openshift/rook/pull/157
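Not quoting the actual change from that PR, but the shape of the guard such a fix needs is roughly the following (Go sketch; the helper name and flag are made up for illustration):

package osdspec

import (
	corev1 "k8s.io/api/core/v1"
)

// applyLogCollectorShareProcess requests a shared process namespace for the
// log collector only when the pod does not already use the host PID
// namespace, so hostPID pods such as the OSD deployments no longer trip the
// "ShareProcessNamespace and HostPID cannot both be enabled" validation.
func applyLogCollectorShareProcess(spec *corev1.PodSpec, logCollectorEnabled bool) {
	if !logCollectorEnabled || spec.HostPID {
		return
	}
	if spec.SecurityContext == nil {
		spec.SecurityContext = &corev1.PodSecurityContext{}
	}
	share := true
	spec.SecurityContext.ShareProcessNamespace = &share
}

With a guard like this, the OSD Deployments keep hostPID and simply skip the shared process namespace, so the API server accepts them.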
Verified deployment job here: https://ocs4-jenkins-csb-ocsqe.apps.ocp4.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/167/console

openshift installer (4.7.0-0.nightly-2021-02-02-223803)
ocs-operator.v4.7.0-249.ci

> All the OSD pods are created.

10:59:17 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod -n openshift-storage --selector=app=rook-ceph-osd -o yaml
10:59:17 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-0-6bf5df9459-hzgpt -n openshift-storage
10:59:18 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-1-77749bb57d-df8tb -n openshift-storage
10:59:18 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n openshift-storage get Pod rook-ceph-osd-2-75ccb4599f-r4snx -n openshift-storage
10:59:18 - MainThread - ocs_ci.ocs.ocp - INFO - 3 resources already reached condition!

> The job itself failed due to bug 1924211.

Marking as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041